
Meet the winners of STAC Overflow: Map Floodwater from Radar Imagery Challenge

Meet the winners who were best able to detect floodwater using synthetic-aperture radar (SAR) imagery! These winners developed solutions that can help to strengthen early warning systems and direct emergency relief.

Tammy Glazer
DrivenData

The motivation

Flooding is the most frequent and costly natural disaster in the world. According to the World Health Organization, floods affected more than two billion people worldwide between 1998 and 2017. As global warming continues to accelerate sea level rise, prolong rainstorms, and increase snowmelt, the frequency and severity of extreme weather are only expected to rise.

During a flood event, it is critical that governments and humanitarian organizations be able to accurately measure flood extent in near real-time to strengthen early warning systems, assess risk, and target relief. Historically, hydrologists, governments, and emergency response teams have relied on readings from rain and stream gauging stations to understand flood reach and develop forecasts. Though helpful, these gauges capture only water height, are spatially limited, and can be expensive to maintain.

High resolution synthetic-aperture radar (SAR) imaging has strengthened monitoring systems by providing data in otherwise inaccessible areas at frequent time intervals. Specifically, C-band SAR from the Sentinel-1 mission provides an all-weather, day-and-night supply of images of the Earth’s surface. By operating in the microwave band of the electromagnetic spectrum, SAR can capture images through clouds, precipitation, smoke, and vegetation, making it especially valuable for flood detection.

False color composite representing the VV and VH polarization bands of a SAR image, alongside its water label, from the competition dataset assembled by Cloud to Street and Microsoft AI for Earth.

The goal of the STAC Overflow: Map Floodwater from Radar Imagery challenge was to build machine learning algorithms that can map floodwater using Sentinel-1 global SAR imagery. A newly updated dataset of satellite images captured between 2016 and 2020 was prepared and labeled by Cloud to Street and made available through Microsoft’s Planetary Computer. The Planetary Computer combines a multi-petabyte catalog of analysis-ready environmental data with intuitive APIs and a flexible development environment to support scientists, developers, and policy makers applying artificial intelligence to environmental challenges.

Models that can effectively use remote sensing to improve flood mapping have the potential to strengthen risk assessment, relief targeting, and disaster preparedness.

The results

Over 660 participants from around the world stepped up to this important challenge, generating more than 1,400 entries. To measure performance, we used a metric called Intersection over Union (IoU) or Jaccard Index, a common similarity measure that compares the size of the intersection relative to the size of the union for predicted and ground truth image pixels. At the start of the competition, DrivenData and Cloud to Street released a benchmark solution that used a ResNet-34 encoder and a U-Net decoder for semantic segmentation, which achieved an IoU of 0.44 on the test set. Within a week, multiple teams had already beaten this performance!
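For intuition, here is a minimal illustration of the metric on binary masks, written in Python. This is a sketch of the standard formula only, not the competition's exact scoring code:

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over Union (Jaccard index) for binary water masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    # Define IoU as 1 when both masks are empty (no water predicted or present)
    return float(intersection) / float(union) if union > 0 else 1.0
```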

The top-performing models obtained IoUs above 0.80, a significant improvement over the benchmark solution. They were able to capture signal from polarized SAR data across different geographies and vegetation types. All winning solutions successfully leveraged the Planetary Computer STAC API to bolster their models with supplementary NASA Digital Elevation Model (NASADEM) elevation data and/or the European Commission's Joint Research Centre (JRC) global surface water data, allowing them to learn about a geography's natural topography. To secure the top leaderboard positions, they used a combination of U-Net and U-Net++ convolutional neural networks (CNNs) and gradient boosted decision trees.
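As a rough illustration of that workflow, the sketch below queries the Planetary Computer STAC API for NASADEM elevation covering a chip's footprint. It assumes the pystac-client and planetary-computer packages; the bounding box is a placeholder, not a value from any winning solution:

```python
import planetary_computer
from pystac_client import Client

# Open the Planetary Computer STAC catalog, signing asset URLs for access
catalog = Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

# Search the NASADEM collection for items covering a chip's bounding box
search = catalog.search(
    collections=["nasadem"],
    bbox=[14.0, 45.0, 14.2, 45.2],  # placeholder lon/lat bounding box
)
for item in search.items():
    print(item.id, item.assets["elevation"].href)  # signed URL to the elevation raster
```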

In addition to using different models, competitors experimented with different sampling strategies, adversarial training schemes, and image augmentations to handle label imbalances and enhance the training dataset. As a result, these solutions can help to improve disaster risk management and response around the world by equipping response teams with more accurate and timely flood maps.

All of the prize-winning solutions from this competition are linked below and made available for anyone to use and learn from. Meet the winners and learn how they built their master mapping models!


Meet the winners

Daniil Stepanov and Anna Belyaeva

Place: 1st

Prize: $10,000

Hometown: Moscow, Russia

Usernames: sweetlhare, Belass

Background:

We are a team of two people from Russia. Anna leads the team and allocates time and resources. Daniil is responsible for developing the models. Together, we can outline and solve any task.

What motivated you to compete in this challenge?

We are fans of competitions and hackathons. Recently, many problems have involved computer vision and working with satellite images. When we saw this competition, we immediately decided to try it. We did not imagine that we would win until the last moment!

Summary of approach:

First, we trained a single U-Net model, as this is a classic approach for segmentation. Next, we studied articles that attempted to solve similar problems. We learned that flooding is often detected using mathematical formulas, which gave us the idea to try pixel-by-pixel classification by converting the images into tabular data. We noticed that both approaches helped to predict excess flooding but did not fill in all water pixels. Therefore, we decided to combine these two approaches by taking the maximum output value rather than the average. It was important to incorporate the NASADEM elevation band into our model. This was our final approach. The models are light enough that inference can be reproduced easily.
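Here is a minimal sketch of that combination step, assuming each model outputs per-pixel water probabilities. The random arrays and the 0.5 threshold are illustrative assumptions, not the team's exact code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for per-pixel water probabilities from the U-Net and from the
# pixel-by-pixel (tabular) classifier; chips in this dataset are 512x512
unet_probs = rng.random((512, 512))
tabular_probs = rng.random((512, 512))

# Take the element-wise maximum rather than the average: a pixel is called
# water if either model is confident, filling in pixels each model misses
combined = np.maximum(unet_probs, tabular_probs)
water_mask = (combined > 0.5).astype(np.uint8)  # assumed threshold
```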

Check out team Moscow Hares’ full write-up and solution in the competition repo.

Max Lutz

Place: 2nd

Prize: $6,000

Hometown: Paris, France

Username: Max_Lutz

Background:

My name is Max Lutz. I'm a Master's-level graduate of an engineering school in France, where I specialized in energy and the environment. I have been studying data science and machine learning since March 2021. My goal is to work with data to help solve climate change and to work on projects that involve energy, the environment, and society.

What motivated you to compete in this challenge?

I was interested in this competition because the data and the objective are linked to the environment. Since my objective is to work with data to help solve climate change, it was a very good match. I thought it would be a very good opportunity to learn how to work with satellite data and how to make predictions using images.

Summary of approach:

My best solution includes the following steps:

  • Load the ids of the VV and VH files.
  • Split the ids into a training set and a test set. I created three different splits so that I could train three models and average the output for prediction.
  • Load the VV and VH images, labels, and auxiliary data files from the Planetary Computer. For the VV and VH images, clip values outside the range -30 to 0, map them to 0-255, and convert to uint8. For the NASADEM images, clip values outside the range 0 to 255 and convert to uint8. For the rest of the data, do not apply any conversion (see the preprocessing sketch after this list).
  • Use the albumentations library to generate new images from the original ones. Use RandomRotate90, HorizontalFlip, and VerticalFlip to produce random transformations, and append these new images to the original data.
  • Train a U-Net model on the data using dice loss with a squared denominator as the loss function (I got the idea for this loss function from this article).
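The following is a minimal sketch of the preprocessing, augmentation, and loss steps above. The rescaling arithmetic and the loss implementation are assumptions for illustration, not Max's exact code:

```python
import albumentations as A
import numpy as np

def preprocess_sar(band: np.ndarray) -> np.ndarray:
    """Clip SAR backscatter to [-30, 0] dB and rescale linearly to 0-255 uint8."""
    clipped = np.clip(band, -30.0, 0.0)
    return ((clipped + 30.0) / 30.0 * 255.0).astype(np.uint8)

# Random rotations and flips applied jointly to each image and its mask
augment = A.Compose([
    A.RandomRotate90(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
])

def dice_loss_squared(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Dice loss with a squared denominator: 1 - 2*sum(p*t) / (sum(p^2) + sum(t^2))."""
    intersection = (pred * truth).sum()
    denominator = (pred ** 2).sum() + (truth ** 2).sum()
    return 1.0 - 2.0 * intersection / (denominator + eps)
```

Calling augment(image=image, mask=mask) returns a dictionary with the transformed pair, which can then be appended to the original training data as described above.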

Check out Max_Lutz’s full write-up and solution in the competition repo.

Will L.

Place: 3rd

Prize: $4,000

Hometown: Walpole, USA

Username: loweew

Background:

I am a biochemist who switched to computational work when I heard about CUDA in 2007. It seemed like it would be an important technology. I spent a few years as a research professor at Vanderbilt University leading a group developing novel, GPU-accelerated machine learning techniques for computer-aided drug design. I currently serve as the VP of Data Science at Lose It!, and as a user advisor to the TensorFlow team at Google.

What motivated you to compete in this challenge?

I had the week off from work and decided to use that time to learn PyTorch for segmentation. I came across the blog post for the competition and it seemed like a great place to start. I have been a prize winner in a couple of other DrivenData competitions and was already familiar with the great work they do.

Summary of approach:

My focus was really around building a generalizable ensemble. In my previous work on these types of competitions, I’ve found that my cross-validation models have suffered from hold-out variance issues. I tried to identify chip ids that contributed to this variation and kept them in the training folds during all cross-validation rounds. I figured that if the other chips could not generalize to those “outliers”, then it made sense to keep them in the training set and not try to use them for early stopping. I have found that using 5-fold cross-validation models as an ensemble or jury typically results in better generalization versus performing a final fit using the hyperparameters discovered through the cross-validation process. I used a variety of augmentations during training.
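Here is a minimal sketch of that jury idea, assuming five fold models that each output per-pixel water probabilities. The averaging and threshold are illustrative, not loweew's exact code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for predictions from the five cross-validation fold models on one chip
fold_probs = [rng.random((512, 512)) for _ in range(5)]

# Average the fold models as an ensemble "jury" rather than refitting one
# final model with the hyperparameters discovered during cross-validation
ensemble_probs = np.mean(fold_probs, axis=0)
water_mask = (ensemble_probs > 0.5).astype(np.uint8)  # assumed threshold
```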

Check out loweew's full write-up and solution in the competition repo.


Thanks to all the participants and to our winners! Special thanks to Microsoft AI for Earth and Cloud to Street for enabling this important and interesting challenge and for providing the data to make it possible!

Stay updated

Join our newsletter or follow us for the latest on our social impact projects, data science competitions, and open source work.


