30 Mar 2021

From competition to deployment: forecasting geomagnetic storms with NOAA

We designed and ran a global challenge that produced open-source models now powering public, real-time Dst predictions to help mitigate space weather risks.

The organizations

The National Oceanic and Atmospheric Administration (NOAA) is a leading U.S. science agency within the Department of Commerce. Its mission is to understand and predict changes in climate, weather, oceans, and coasts—and to use that knowledge to protect communities and natural resources. Predicting space weather, including geomagnetic storms, is a critical part of that mission.

DrivenData partnered directly with two key NOAA-affiliated organizations:

The National Centers for Environmental Information (NCEI), which maintains one of the world’s largest environmental data archives and develops tools for scientific data discovery and use.
The Cooperative Institute for Research in Environmental Sciences (CIRES), a joint institute of NOAA and the University of Colorado Boulder, which brings together over 800 researchers focused on advancing Earth and space science.

The challenge

Geomagnetic storms—caused by solar activity—can disrupt power grids, satellite operations, GPS, and communications. To reduce those risks, forecasters rely on the Disturbance Storm-time Index (Dst), a key measure of geomagnetic activity.

NOAA set out to improve real-time Dst forecasting by inviting the global data science community to help build better models. The goal: develop accurate, operationally viable models to forecast Dst for the current and following hour, using real-time solar-wind data from spacecraft like ACE and DSCOVR.
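The forecasting task can be sketched roughly as follows. This is a minimal illustration, not the competition's actual data format: the function name `make_targets` and the sample values are hypothetical, and we assume an hourly Dst series in nanotesla.

```python
from typing import List, Tuple


def make_targets(dst_hourly: List[float]) -> List[Tuple[int, float, float]]:
    """Pair each hour t with the two forecast targets: Dst at t and Dst at t+1."""
    # The final hour has no following hour, so it produces no example.
    return [
        (t, dst_hourly[t], dst_hourly[t + 1])
        for t in range(len(dst_hourly) - 1)
    ]


# Made-up hourly Dst values (nT); storms appear as large negative excursions.
dst = [-5.0, -12.0, -48.0, -90.0, -60.0]
examples = make_targets(dst)

for hour, dst_now, dst_next in examples:
    print(f"hour {hour}: target Dst(t)={dst_now}, Dst(t+1)={dst_next}")
```

A model for this task would map solar-wind measurements available at hour t to both targets in each tuple.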

The approach

DrivenData partnered with NOAA, NCEI, and CIRES to design and run the MagNet: Model the Geomagnetic Field data science competition. Our work included:

Problem framing and dataset preparation
We worked closely with NOAA scientists to define the forecasting task, identify operationally relevant evaluation metrics, and prepare a high-quality dataset from real-time solar-wind data.

Global competition management
We hosted the challenge on the DrivenData platform, providing clear instructions, a real-time leaderboard, and community support for more than 600 participating data scientists and researchers.

Operational realism
We designed a custom code execution environment to simulate real-world constraints, including data latency and resource limitations, ensuring that models were not only accurate but also practical for deployment. This infrastructure allowed participants to submit fully executable code, which was evaluated in conditions that mirrored NOAA’s operational forecasting environment.

Evaluation and verification
We implemented a rigorous, multi-stage evaluation process, and NOAA independently validated the top-performing models on recent, unseen data to confirm real-world applicability.
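To illustrate what evaluating under operational constraints can look like, here is a hedged sketch of an hour-by-hour evaluation loop. It is not the actual MagNet execution environment. The function `run_hour_by_hour`, the parameters `history_hours` and `time_limit_s`, and the toy model are all assumptions made for illustration. The idea is that the model only ever sees measurements from before the forecast hour and must answer within a time budget.

```python
import time
from typing import Callable, List, Sequence, Tuple

Forecast = Tuple[float, float]  # (Dst at hour t, Dst at hour t+1)


def run_hour_by_hour(
    predict: Callable[[Sequence[float]], Forecast],
    solar_wind: Sequence[float],
    history_hours: int = 3,
    time_limit_s: float = 1.0,
) -> List[Forecast]:
    """Call the model once per hour, exposing only earlier measurements."""
    forecasts: List[Forecast] = []
    for t in range(history_hours, len(solar_wind)):
        # Only hours strictly before t are visible, mimicking data latency.
        visible = solar_wind[t - history_hours:t]
        start = time.perf_counter()
        forecast = predict(visible)
        elapsed = time.perf_counter() - start
        if elapsed > time_limit_s:
            raise TimeoutError(f"prediction for hour {t} took {elapsed:.2f}s")
        forecasts.append(forecast)
    return forecasts


def toy_predict(window: Sequence[float]) -> Forecast:
    """Placeholder model: not a real Dst predictor, just a stand-in."""
    level = -sum(window)
    return (level, level)


# Made-up hourly solar-wind values, one number per hour for simplicity.
solar_wind = [1.0, 2.0, 3.0, 4.0, 5.0]
forecasts = run_hour_by_hour(toy_predict, solar_wind)
print(forecasts)
```

Running submitted code through a loop like this, rather than scoring a file of precomputed predictions, makes it hard for a model to use future data by accident.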

Comparison of real-time Disturbance Storm-time (Dst) index values with predictions from an operational machine learning model ("Dst CNN") developed through the MagNet data science competition. The model, based on the second-place winning solution, runs in real time using solar wind inputs and is now publicly available via NOAA and CIRES.

The results

High-performing forecasting models
More than 600 participants made over 1,200 submissions. The winning models, which used techniques such as LSTMs, GRUs, CNNs, and LightGBM, significantly outperformed existing benchmarks for forecasting the Dst index. An ensemble of the top solutions advanced the state of the art on unseen data, reducing error by 30% compared with the NCEI benchmark model.

Operational adoption and public release
Following the challenge, NOAA adopted the top-performing solutions and integrated them into operational forecasting systems. NOAA and NCEI researchers partnered with one of the winners to productionize an ensemble of the two best models and integrated it into NOAA’s High Definition Geomagnetic Model (HDGM). Real-time predictions are now publicly available, advancing national capabilities for space weather prediction.

Scientific recognition
The competition and its results were published in Space Weather, a journal of the American Geophysical Union. The paper was among the top 10% most-read in 2023, highlighting the project's impact on the research community.
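For readers who want to see how an ensemble's error reduction against a benchmark can be computed, here is a small sketch. It rests on several assumptions: the error metric is taken to be root mean squared error (RMSE), the ensemble is a plain average of model outputs, and every number is invented for illustration. None of these values are the competition's data or results, and the production ensemble may combine models differently.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between observations and predictions."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def average_ensemble(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average several models' predictions hour by hour."""
    return [sum(values) / len(values) for values in zip(*predictions)]


def relative_reduction(baseline_error: float, model_error: float) -> float:
    """Fraction by which model_error improves on baseline_error."""
    return (baseline_error - model_error) / baseline_error


# Invented Dst values (nT) for four hours.
observed = [-10.0, -20.0, -30.0, -40.0]
model_a = [-14.0, -22.0, -34.0, -42.0]
model_b = [-8.0, -20.0, -28.0, -40.0]
benchmark = [-12.0, -22.0, -32.0, -42.0]

ensemble = average_ensemble([model_a, model_b])
reduction = relative_reduction(rmse(observed, benchmark), rmse(observed, ensemble))
print(f"ensemble RMSE reduction vs. benchmark: {reduction:.0%}")
```

In this toy case the two models err in partly opposite directions, so averaging them cancels some of the error. That is one reason ensembles often beat their individual members.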
