Open data using AI for climate action

by Greg Lipstein

This is a follow-on post to a previous summary of the ways that organizations are using AI for climate action. It consolidates a range of real-world datasets from DrivenData challenges for anyone looking to get hands-on experience.

Climate change poses a massive challenge, with far-reaching impacts. Many data scientists and developers are aware of the need for action and are interested in finding problems to take on. At the same time, it can be hard to know where machine learning and AI are actually needed by organizations, and what skills to build to be part of the solution.

In our previous post, we summarized 5 ways organizations are using AI for climate action. They included:

  1. 💧 Managing natural resources
  2. 🌪️ Disaster resilience
  3. 🐒 Biodiversity conservation
  4. ⚡ Energy efficiency
  5. 🛰️ Upstream enablers

These areas were distilled from over 20 machine learning competitions over the past few years, with more than $1 million awarded to developers of the top-performing approaches. They reflect problems where public, private, and social sector organizations are actively investing in developing better algorithmic solutions.

As the costs of climate change continue to mount, efforts in these areas will increasingly need more people with the relevant skills and experience to take on the critical opportunities that arise.

In this post, we recap each area above and pull together applied use cases where partners have made datasets openly available for ongoing practice and learning. Each project includes links to the data, challenge, and winning solutions for anyone looking to get more experience working in these domains.

1. 💧 Managing natural resources

Recap: The first area is focused on how we monitor and manage natural resources like water, air, and forests. Having better estimates of how these resources are changing — both from climate change and in connection with decisions by landowners, public officials, and policymakers — will allow us to detect issues earlier and get a better sense for what actions are worth taking.

Common data types: geospatial, satellite, LiDAR, ground and IoT sensors, meteorological

Use case: Estimating water from snowpack

With the Bureau of Reclamation

Seasonal mountain snowpack is a critical water resource throughout the Western U.S. Better estimates of "snow water equivalent" (SWE) from real-time satellite, ground station, and meteorological data help water managers plan resources and respond to extreme weather events like floods and droughts.

Use case: Monitoring annual forest biomass

With University of Liege, Avoin, Mathworks

Above-ground biomass is a widespread measure of carbon release and sequestration by forests. Satellite imagery can help estimate and map changes over time, helping landowners and policymakers make better decisions for the conservation of forests.

2. 🌪️ Disaster resilience

Recap: The second area addresses the increase in extreme weather events and natural disasters like tropical storms, floods, and fires. Efforts using AI may target monitoring and response efforts when an event occurs, like tracking flood extent to inform the approach of emergency responders. They may also target preparation and mitigation efforts before an event, like identifying at-risk areas for building damage to prioritize retrofits.

Common data types: Aerial imagery, satellite, geospatial

Use case: Estimating the intensity of tropical storms

With Radiant Earth Foundation

Overhead satellite imagery gathered throughout a hurricane's life cycle can be used to generate more accurate, near-real-time wind speed estimates, which in turn help to power the risk approximation models that humanitarian response efforts rely on during a storm.

Use case: Mapping flood extent

With Microsoft Planetary Computer, Cloud to Street

Flooding has become the most frequent and costly natural disaster in the world. The ability of synthetic aperture radar (SAR) satellite imagery to penetrate storms and vegetation makes it particularly useful for monitoring events in order to help strengthen early warning systems, assess risk, and target relief.

3. 🐒 Biodiversity conservation

Recap: The third area is focused on how we monitor and protect wildlife populations to determine when and how best to intervene for the sustainability of the world’s ecosystems. As climate change increasingly affects life-sustaining habitats on land and at sea, conservationists are using more data, from trail cameras to drones, to track changes in wildlife populations and study their behavior. Advances in computer vision are helping to process all the raw information from these observation systems, freeing up more time for humans to focus on interpreting and responding to the results.

Common data types: video, imagery

Use case: Automated wildlife identification from camera traps

With the Max Planck Institute for Evolutionary Anthropology, Wild Chimpanzee Foundation, Microsoft AI for Earth, Snapshot Serengeti

Trail cameras help conservation researchers study and monitor a wide range of ecologies while limiting human interference. To work with this imagery at scale, these groups rely on accurate computer vision models to detect and classify the species that appear.

Wildlife identification in the rainforests of West Africa:

Wildlife identification on the Serengeti:

Use case: Photo-identification of endangered beluga whales

With the Bureau of Ocean Energy Management, NOAA Fisheries, WildMe

Matching new drone images from annual photo-identification surveys to known individuals helps power more scalable population assessment and protection efforts for Cook Inlet beluga whales.
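To give a feel for what "matching to known individuals" can look like in code, here is a minimal sketch that ranks catalog whales by cosine similarity to a new image. It assumes each image has already been turned into a numeric embedding by some image model; the whale IDs, the three-dimensional vectors, and the function names are all hypothetical, and this is not necessarily how the winning solutions worked.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_matches(query, catalog, top_k=3):
    """Return the top_k catalog IDs ordered from most to least similar to the query."""
    scored = [(cosine(query, emb), whale_id) for whale_id, emb in catalog.items()]
    scored.sort(reverse=True)
    return [whale_id for _, whale_id in scored[:top_k]]

# Hypothetical embeddings for known individuals (real ones would have many more dimensions).
catalog = {
    "whale_a": [1.0, 0.0, 0.0],
    "whale_b": [0.0, 1.0, 0.0],
    "whale_c": [0.7, 0.7, 0.0],
}
new_image = [0.9, 0.1, 0.0]  # hypothetical embedding of a new drone image
print(rank_matches(new_image, catalog))
```

Returning a ranked shortlist rather than a single answer leaves the final call to a human reviewer, which suits survey work where some whales in a new image may not be in the catalog at all.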

Use case: Monitoring catch for sustainable fisheries

With The Nature Conservancy, Gulf of Maine Research Institute

AI tools that automatically analyze footage from on-board video systems help responsible fisheries comply with regulations, save time, and lower the safety risk and cost of having an auditor on board.

4. ⚡ Energy efficiency

Recap: The fourth area deals with how we generate and consume energy. The production of electricity and heat is the largest single source of global greenhouse gas emissions, accounting for around 25% of all emissions. Efforts to decarbonize the economy involve increasing the efficiency in how we use energy while also expanding generation from clean power sources. AI is being used for a range of tasks where automated systems and better predictions can help, for instance by reducing waste from unnecessary use.

Common data types: IoT, time series, meteorological, simulation

Use case: Forecasting building energy consumption

With Schneider Electric

More accurate forecasts of building energy consumption mean better planning and more efficient energy use, working with a range of historical data while factoring in relevant information like holidays and weather patterns.
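As a starting point for experimenting with this kind of data, a simple baseline is often useful before trying heavier models. The sketch below averages past consumption for each hour of each day type and treats holidays as their own day type. The readings, the holiday calendar, and the function names are hypothetical, and weather is left out entirely to keep the example short.

```python
from collections import defaultdict
from datetime import date, datetime

def day_type(ts, holidays):
    """Label a timestamp as 'holiday' or by its weekday number (Monday is 0)."""
    return "holiday" if ts.date() in holidays else ts.weekday()

def fit_profile(history, holidays):
    """Average past consumption for each (day type, hour) slot."""
    totals = defaultdict(lambda: [0.0, 0])
    for ts, kwh in history:
        slot = totals[(day_type(ts, holidays), ts.hour)]
        slot[0] += kwh
        slot[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

def forecast(profile, ts, holidays, default=None):
    """Look up the historical average for the slot that ts falls in."""
    return profile.get((day_type(ts, holidays), ts.hour), default)

# Hypothetical 9:00 readings (kWh) for one building on three Mondays.
holidays = {date(2024, 1, 1)}
history = [
    (datetime(2024, 1, 1, 9), 20.0),   # holiday: building mostly empty
    (datetime(2024, 1, 8, 9), 50.0),
    (datetime(2024, 1, 15, 9), 54.0),
]
profile = fit_profile(history, holidays)
print(forecast(profile, datetime(2024, 1, 22, 9), holidays))  # an ordinary Monday
```

Keeping holidays in a separate slot stops the low holiday reading from dragging down the forecast for ordinary Mondays.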

Energy consumption forecasting:

Energy forecasting with limited historical data:

Use case: Optimizing battery charging systems for renewable energy storage

With Schneider Electric

Traditional optimization methods and machine learning can be combined with battery charging simulation data to build algorithms for controlling a battery charging system as efficiently as possible, helping increase the flexibility and reliability of energy storage.

Use case: Detecting anomalies in energy use to identify potential sources of waste

With Schneider Electric

Commercial buildings waste an estimated 15-30% of energy used due to poorly maintained, degraded, and improperly controlled equipment. Quick-response algorithms trained on historical records can help managers find anomalies in energy use and elevate them for attention.
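One simple way to start exploring this problem is a robust outlier check against recent history. The sketch below flags readings that sit far from the median of the preceding window, scaled by the median absolute deviation (MAD). The window size, the threshold, the readings, and the function name are all assumptions chosen for illustration, not the method used in the challenge.

```python
from statistics import median

def flag_anomalies(readings, window=24, threshold=3.5):
    """Return indices of readings that deviate strongly from the preceding window.

    Each reading is scored with a robust z-score: its distance from the window
    median, divided by the window's median absolute deviation (MAD).
    """
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        center = median(history)
        mad = median(abs(x - center) for x in history)
        if mad == 0:
            continue  # flat history gives no scale to compare against
        score = 0.6745 * abs(readings[i] - center) / mad
        if score > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical hourly kWh readings: a repeating daily pattern with one injected spike.
daily = [10, 10, 11, 12, 14, 18, 22, 25, 26, 25, 24, 23,
         22, 22, 23, 24, 25, 22, 18, 15, 13, 12, 11, 10]
readings = daily * 3
readings[60] = 80  # simulated equipment fault
print(flag_anomalies(readings))
```

Median-based statistics are used here so that a single spike does not distort the baseline it is being compared against; a mean and standard deviation would be pulled toward the outlier.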

5. 🛰️ Upstream enablers

Recap: The fifth area looks at applications that are not focused on specific downstream use cases, but rather are intended to help enable a range of uses by improving the usefulness of interim data outputs further upstream. Many examples we have seen focus on important earth observation sources such as satellites and drones, which feed into many downstream use cases including forest monitoring, disaster planning, and emergency response.

Common data types: Aerial imagery, satellite, geospatial

Use case: Mapping building footprints to support disaster risk management

With the Global Facility for Disaster Reduction and Recovery / World Bank and Azavea

High-resolution drone imagery can be used to automatically map building footprints across a range of urban settings, a first step toward many important interventions that help mitigate disaster risk and enable more effective disaster response.

Use case: Identifying cloud cover interference in Sentinel-2 satellite imagery

With Microsoft Planetary Computer, Radiant Earth Foundation

Detecting and removing clouds in satellite imagery is a critical step to reduce noise and inaccuracy in a wide range of earth observation models, improving downstream applications like land cover classification, agriculture, and disaster management.
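To see why a cloud mask matters downstream, consider a toy example where a statistic is computed over only the clear pixels. This is a minimal sketch using nested lists in place of real Sentinel-2 rasters; the pixel values, the mask, and the function name are hypothetical.

```python
def clear_sky_mean(band, cloud_mask):
    """Average the pixel values in band wherever cloud_mask is 0 (clear sky).

    Returns None when every pixel is covered by cloud.
    """
    clear = [
        value
        for band_row, mask_row in zip(band, cloud_mask)
        for value, is_cloud in zip(band_row, mask_row)
        if not is_cloud
    ]
    return sum(clear) / len(clear) if clear else None

# Hypothetical 2x3 tile: bright values (0.75) are clouds over darker ground (0.25).
band = [[0.25, 0.25, 0.75],
        [0.25, 0.75, 0.75]]
cloud_mask = [[0, 0, 1],
              [0, 1, 1]]  # 1 marks a cloudy pixel

print(clear_sky_mean(band, cloud_mask))
```

Without the mask, the bright cloud pixels would pull the tile average well above the true ground value, which is the kind of noise that propagates into downstream models.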

Use case: Rectifying 3-D maps from oblique satellite imagery to support time-sensitive applications

With the National Geospatial-Intelligence Agency, Johns Hopkins University Applied Physics Laboratory

The first images available from satellites are often taken from an angle, so models that can accurately map ground objects from different orientations help make satellite imagery more useful for time-sensitive needs like emergency response and navigation.


Given the immense scale and high stakes posed by climate change, we need the best tools and talent we can bring to bear. AI is an important tool in this effort, and we are seeing more organizations in a position to put it to use.

This post pulls together some of the concrete ways we have already seen organizations investing in solutions and making data available for ongoing use and learning. Our hope is to share these examples as a hands-on learning resource for more people to build out skills and experience working in these important areas. We know there will be many more needs and opportunities in the coming years.

If you're interested in more great competitions with open data, the latest updates can always be found on the DrivenData website. Happy building!

Thanks to the clients and partners above who have made this work possible, and to the competition participants whose hard work, ingenuity and passion have helped push forward new solutions to these challenges.

DrivenData is a social enterprise that brings the power of data science and AI to organizations tackling the world’s biggest challenges. Check out our competitions, and if you have a problem to solve or thoughts on this post we'd love to hear from you.