The BioMassters Challenge Starter Code
Getting Started with MATLAB
Hello all! We at MathWorks®, in collaboration with DrivenData, are excited to bring you The BioMassters Challenge. The objective of this challenge is to estimate the annual aboveground biomass (AGBM) in a given patch of Finland when provided satellite imagery of that patch. In this blog, we provide a basic starter example in MATLAB®. In this code, I create a basic image-to-image regression model and train it to predict annual AGBM for each pixel in the input data. Then I use this model on test data and save the results in the format required for the challenge. This should serve as basic starting code to help you start analyzing the data and work toward developing a more efficient, optimized, and accurate model that uses more of the available training data. To request your complimentary MATLAB license and other getting-started resources, visit the MathWorks BioMassters challenge homepage. You can also access and download this code by visiting this GitHub repository.
The Data
Each chip_id represents one patch of land in a given year. For each chip, you are provided approximately 24 satellite images (one per month from each of two satellites, with some months possibly missing) and one AGBM image.
The satellite imagery comes from two satellites called Sentinel-1 (S1) and Sentinel-2 (S2), covering nearly 13,000 patches of forest in Finland from 2016 to 2021. Each chip is 2,560 by 2,560 meters, and the images of these chips are 256 by 256 pixels, so each pixel represents a 10 by 10 meter area of land within the chip. You are provided a single image from each satellite for each calendar month. For S1, each image is generated by taking the mean across all images acquired by S1 for the chip during that month. For S2, you are provided the best image for each month.
The AGBM image serves as the label for each chip in a given year. Just like the satellite data, the AGBM data is provided as images that cover 2,560 by 2,560 meter areas at 10 meter resolution, which means they are 256 by 256 pixels in size. Each pixel in the satellite imagery corresponds to the pixel at the same location in the AGBM image with the same chip ID.
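The chip geometry described above comes down to a few lines of arithmetic. This sketch only restates the numbers given in the description (2,560 m chips imaged at 256 by 256 pixels); the variable names are our own and are not part of the starter code.

```matlab
% Chip and image dimensions as stated in the challenge description
chipSizeMeters  = 2560;   % each chip is 2,560 m on a side
imageSizePixels = 256;    % each image (satellite or AGBM) is 256-by-256 pixels

metersPerPixel    = chipSizeMeters / imageSizePixels;  % ground distance covered by one pixel side
pixelAreaSqMeters = metersPerPixel^2;                  % ground area covered by one pixel
pixelsPerChip     = imageSizePixels^2;                 % number of AGBM values to predict per chip

fprintf('%g m per pixel, %g m^2 per pixel, %d pixels per chip\n', ...
    metersPerPixel, pixelAreaSqMeters, pixelsPerChip);
```

Because the satellite and AGBM images share this 256-by-256 grid, the value at row r and column c of an S1 or S2 band describes the same 10 by 10 meter cell as the AGBM value at row r and column c.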
For the competition, you will use this data to train a model that can predict the AGBM value for each pixel when provided with only the satellite imagery. To learn more about the images, features, labels and submission metrics, head over to the challenge's Problem Description page!
Preview the Data
To understand the data that we will be working with, let's look at a few example images for a specific chip_id. In the sections below, the images correspond to chip_id 0a8b6998.
First, define a variable that points to the S3 bucket so that we can access the data. You can find this path in the 'biomassters_download_instructions.txt' file provided on the data download page. Make sure this is the path for the entire bucket, not a specific folder; it should start with 's3://'.
% Example path, you will need to replace this
s3Path = 's3://competition-bucket-name-location/';
We will use this variable throughout the blog.
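Because a wrong path only surfaces later as a failed read, it can help to sanity-check s3Path right after defining it. The following is a minimal sketch that checks only the string itself and never contacts S3; the bucket name is the same placeholder used above, and treating "no further '/' after the bucket name" as "bucket-level path" is our own simplifying assumption.

```matlab
% Placeholder bucket path; replace it with the path from biomassters_download_instructions.txt
s3Path = 's3://competition-bucket-name-location/';

% The path must use the S3 scheme
assert(startsWith(s3Path, 's3://'), 's3Path must start with ''s3://''.');

% Drop the scheme and any trailing slash; a bucket-level path has no further '/'
bucketPart = strip(extractAfter(s3Path, 's3://'), 'right', '/');
isBucketRoot = ~contains(bucketPart, '/');
if ~isBucketRoot
    warning('s3Path appears to point to a folder inside the bucket: %s', s3Path);
end
```

A path such as 's3://bucket-name/train_features/' would pass the first check but trigger the warning, which is exactly the "specific folder" case the instructions warn against.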
Sentinel-1:
For each chip_id, we expect to see 12 images from Sentinel-1 with the naming convention {chip_id}_S1_{month}, where month is a value between 00 and 11. In some cases data is missing, so one or more of these images may be absent.
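Since some monthly images can be absent, one simple approach is to build the 12 expected file names from the naming convention and compare them with the files you actually have. This is a sketch under stated assumptions: `availableNames` is a hypothetical, hard-coded stand-in for whatever file listing you obtain from the train_features folder, and the `.tif` extension follows the example file name used below.

```matlab
% Hypothetical example: which monthly S1 images exist for one chip?
chipId   = '0a8b6998';
monthIdx = 0:11;   % months are numbered 00 through 11

% Build the expected names {chip_id}_S1_{month}.tif with a two-digit month
expectedNames = arrayfun(@(m) sprintf('%s_S1_%02d.tif', chipId, m), ...
    monthIdx, 'UniformOutput', false);

% Hypothetical listing of the files actually present (month 03 removed for illustration)
availableNames = expectedNames;
availableNames(4) = [];

% Flag the months whose images are missing
isMissing     = ~ismember(expectedNames, availableNames);
missingMonths = monthIdx(isMissing);
fprintf('Missing S1 months for %s: %s\n', chipId, mat2str(missingMonths));
```

Knowing which months are missing up front lets you decide later whether to skip, fill, or mask those inputs when assembling training data.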
Each Sentinel-1 image has four bands, where each band is one 256x256 matrix that contains a specific measurement for the chip. Let's visualize each band of one of these S1 images:
exampleS1Path = fullfile(s3Path, 'train_features', '0a8b6998_S1_00.tif');
exampleS1 = imread(exampleS1Path);
% To visualize each band, rescale all pixel values to the range [0, 1]
% (rescale uses the minimum and maximum across all four bands)
% Darker pixels indicate lower values, lighter pixels indicate higher values
montage(rescale(exampleS1));
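Calling rescale on the whole array maps every band with the same global minimum and maximum, so a band whose values span a narrow range can look nearly uniform. If you would rather see the structure within each band, a sketch like the following rescales the four bands independently using the 'InputMin' and 'InputMax' options of rescale. It uses random placeholder data (`s1Image`, our own name) so that it runs on its own; in practice you would set `s1Image = exampleS1;`. The vecdim syntax `min(A, [], [1 2])` assumes MATLAB R2018b or later.

```matlab
% Placeholder standing in for a 256-by-256-by-4 S1 image; in practice use s1Image = exampleS1;
rng(0);
s1Image = single(rand(256, 256, 4)) .* reshape(single([1 10 100 1000]), 1, 1, 4);

% Per-band minimum and maximum, computed over rows and columns (dimensions 1 and 2)
bandMin = min(s1Image, [], [1 2]);   % 1-by-1-by-4
bandMax = max(s1Image, [], [1 2]);   % 1-by-1-by-4

% Rescale each band to [0, 1] using its own range instead of the global range
perBandScaled = rescale(s1Image, 'InputMin', bandMin, 'InputMax', bandMax);

% montage(perBandScaled);  % uncomment to display the four bands side by side
```

The trade-off: global rescaling preserves how bright one band is relative to another, while per-band rescaling discards that comparison in exchange for better contrast within each band.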