A view of smaller water bodies, some with HABs, as captured by Sentinel-2
For years, harmful algal bloom (HAB) monitoring has lived with a simple but frustrating truth: our tools work well, just not at the spatial and temporal scales we need. Large lakes and coastal systems can be monitored with in-water sensors, such as Imaging FlowCytobots (IFCBs), and with remote sensing tools such as the EPA’s CyANWeb. Unfortunately, ~98% of U.S. lakes are too small for effective HAB detection using CyAN because of Sentinel-3’s 300 m spatial resolution.
As a result, HAB monitoring for small lakes, reservoirs, rivers, and inland freshwater systems relies on labor-intensive field sampling programs and fixed-sensor systems, each with limitations:
- Precision measurements obtained from field monitoring programs are limited by geographic coverage, sampling intervals, access to water bodies, volunteer or paid staff participation and coordination, and lab resources.
- Precision measurements from fixed sensor systems are limited by trade-offs between system costs, pinpoint measurement, and the number of sampling sensors needed for representative areal sampling.
In short, there is a gap in the tools available to state and regional HAB monitoring programs: the ability to monitor the nation’s hundreds of thousands of smaller water bodies (<250 acres) across large regions at high spatial and temporal resolution.
A new tool, CyFi (short for Cyanobacteria Finder), was developed to close that gap. It’s a machine-learning model packaged as an open-source command-line tool that uses 10–30 m resolution Sentinel-2 satellite data to identify HABs in inland water bodies smaller than ~250 acres (100 hectares).
The initial models that led to CyFi are the outcome of an open machine learning competition, hosted by DrivenData and funded by NASA, with collaboration from NOAA, EPA, USGS, DOD's Defense Innovation Unit, Berkeley AI Research, and Microsoft AI for Earth. CyFi was developed through multiple phases of research and development.

This blog presents the story of how CyFi enhances modern HAB monitoring approaches, extending the reach of HAB monitoring systems while complementing and strengthening all other components.
The problem: The smaller water bodies at risk for HABs are the hardest to monitor
The difficulty starts with scale—the number of water bodies, their geographic distribution over large areas, and their smaller size.
In situ sampling: Pinpoint accuracy and precision, but challenging to scale
Field teams have always been the backbone of freshwater HAB monitoring. They take water samples, ship them to labs, and get back precise cyanobacteria cell counts and toxin measurements. Similarly, fixed sensor systems installed in water bodies offer precision and continuous monitoring. But precision comes at a cost. One or more of these limitations apply:
- Sensor equipment is too expensive to deploy for hundreds or thousands of lakes
- Manual field sampling is too resource-intensive for broad area, frequent, and routine surveillance
- Pinpoint samples do not accurately represent the areal extent of a bloom
In situ sampling is irreplaceable for confirming toxicity and regulatory thresholds. But it can’t be everywhere at once.
Satellite monitoring: Frequent and large area coverage, but with limitations on resolution and precision
Remote sensing promises an answer—continuous, passive, wide-area imagery. But inland lakes and water bodies smaller than ~250 acres represent a major limitation: the satellites that can detect cyanobacteria aren’t sharp enough to resolve smaller water bodies.
- Sentinel-3 has the spectral bands needed to detect cyanobacteria-specific pigments, such as phycocyanin. Problem: 300–500 m resolution completely washes out most small lakes and rivers.
- Sentinel-2 has the spatial resolution (10–30 m) needed to clearly see small water bodies. Problem: Its broad spectral bands don’t isolate cyanobacteria, making direct detection unreliable.
A visual comparison of Sentinel-3 resolution (left) and Sentinel-2 resolution (right)
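To make the resolution gap concrete, here is a rough back-of-the-envelope sketch of how many pixels fit inside a 250-acre lake at each pixel size. It assumes an idealised lake with no shoreline effects, and the helper name `pixels_covering` is ours, purely for illustration.

```python
ACRE_M2 = 4046.8564224  # square metres in one international acre


def pixels_covering(area_acres: float, pixel_size_m: float) -> float:
    """Idealised count of square pixels that fit in a water body of this area."""
    return area_acres * ACRE_M2 / (pixel_size_m ** 2)


for label, size_m in [("Sentinel-3 (300 m)", 300), ("Sentinel-2 (10 m)", 10)]:
    count = pixels_covering(250, size_m)
    print(f"{label}: ~{count:,.0f} pixels in a 250-acre lake")
# Sentinel-3 (300 m): ~11 pixels in a 250-acre lake
# Sentinel-2 (10 m): ~10,117 pixels in a 250-acre lake
```

In practice the usable count is lower still, since pixels that straddle the shoreline mix land and water. For lakes well under 250 acres, a 300 m sensor may not yield a single clean water pixel.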
Monitoring small water bodies at scale became a classic “pick two” dilemma. But for comprehensive and responsive HAB monitoring, you need all three.
| Approach | Scalable? | Works on small lakes? | Cyanobacteria-specific? |
|---|---|---|---|
| In situ | ❌ | ✅ | ✅ |
| Sentinel-3 | ✅ | ❌ | ✅ |
| Sentinel-2 | ✅ | ✅ | ❌ |
Potential HAB monitoring breakthrough: Machine learning with Sentinel-2 satellite data
Instead of relying on direct spectral detection of cyanobacteria, AI models can infer bloom conditions from combinations of Sentinel-2’s broad-band features. A well-trained model effectively translates Sentinel-2 imagery into an estimate of cyanobacteria density—even though the satellite itself lacks cyanobacteria-specific bands.
This is the core insight behind CyFi.
What CyFi does
CyFi is an open-source Python package that:
- Takes simple input—location and date.
- Downloads the relevant Sentinel-2 imagery.
- Generates cyanobacteria cell density estimates.
- Assigns a severity level using WHO guidelines (or custom cutoffs).
- Outputs both data and optional overlays on satellite imagery.
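The severity step in the list above can be sketched in a few lines. This is a minimal illustration, not CyFi's own code: the function name `severity`, the level names, and the default cutoffs of 20,000 and 100,000 cells/mL are assumptions chosen to resemble WHO-style recreational guidance, and custom cutoffs can be passed in their place.

```python
from bisect import bisect_right

# Assumed cutoffs in cells/mL (illustrative; check the CyFi docs for its defaults).
DEFAULT_CUTOFFS = (20_000, 100_000)
DEFAULT_LEVELS = ("low", "moderate", "high")


def severity(density_cells_per_ml, cutoffs=DEFAULT_CUTOFFS, levels=DEFAULT_LEVELS):
    """Map an estimated cell density to a severity label.

    A density equal to a cutoff falls into the higher level.
    """
    if list(cutoffs) != sorted(cutoffs):
        raise ValueError("cutoffs must be in ascending order")
    if len(levels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more level than cutoffs")
    return levels[bisect_right(cutoffs, density_cells_per_ml)]


# Hypothetical density estimates for three sampled points
for density in (5_000, 20_000, 150_000):
    print(density, severity(density))
```

A program with its own advisory thresholds could call, for example, `severity(d, cutoffs=(10_000,), levels=("ok", "alert"))`.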
The data behind CyFi: National-scale training and testing
CyFi is grounded in one of the largest curated datasets of in-situ cyanobacteria measurements from across the United States:
- ~9,000 observations in the training set
- ~4,000 observations in the test set
- Carefully vetted data from 14 organizations including state agencies, federal programs, and scientific monitoring networks
- Published publicly via NASA’s SeaBASS archive
This breadth makes the model robust across diverse water types and seasonal conditions.

While CyFi was developed from both national and state datasets, we recognize that results may vary when applied to local contexts. We recommend that each monitoring program perform validation testing using its own data. The DrivenData team is available to help any team test and onboard CyFi.
How CyFi compares to CyAN
A natural benchmark is the Cyanobacteria Index (CI) from CyAN, which is built on Sentinel-3.
When comparing 756 paired observations from U.S. lakes large enough for Sentinel-3 to capture, CyFi's bloom detection (presence/absence accuracy) was 72% and CyAN’s was 66%. In other words, CyFi performs on par with the leading Sentinel-3-based method—but covers 10x more water bodies thanks to Sentinel-2’s higher resolution. For context, there are more than 400,000 lakes under 250 acres in the contiguous US and these account for 98% of CONUS lakes.
CyFi performs at least as well as Sentinel-3-based tools but has 10x greater coverage of lakes. Accuracy was evaluated using a dataset of 756 ground measurements from across the U.S.
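For readers who want to run a similar comparison on their own paired data, presence/absence accuracy can be computed as sketched below. The bloom threshold and the toy numbers are assumptions for illustration only; the source does not state the cutoff used in the 756-observation comparison. On that sample size, 72% versus 66% corresponds to roughly 544 versus 499 correctly classified observations.

```python
def presence_absence_accuracy(pairs, bloom_threshold):
    """Share of pairs where observation and prediction agree on bloom presence.

    pairs: iterable of (observed_density, predicted_density) in cells/mL
    bloom_threshold: density at or above which a bloom counts as present
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one paired observation")
    agree = sum(
        (observed >= bloom_threshold) == (predicted >= bloom_threshold)
        for observed, predicted in pairs
    )
    return agree / len(pairs)


# Toy paired data (made up): two agreements, one missed bloom, one false alarm
toy_pairs = [(5_000, 8_000), (50_000, 30_000), (150_000, 12_000), (1_000, 40_000)]
print(presence_absence_accuracy(toy_pairs, bloom_threshold=20_000))  # 0.5
```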
Where CyFi fits within the typical HAB monitoring ecosystem
CyFi doesn't replace manual sampling, Sentinel-3 algorithms or on-site sensors. Instead, it complements each of them by filling crucial observational gaps and helping improve the overall system.
1. Manual sampling + CyFi
Use CyFi for statewide or regional passive coverage. Use sampling for confirmation, toxin analysis, and regulatory decisions. CyFi helps teams decide where to sample—increasing impact without increasing budgets.
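One simple way to turn predictions into a sampling plan is to rank lakes by their highest estimated density and send crews to the top few. The sketch below assumes predictions are already available as (lake, density) pairs; the function name, lake names, and values are hypothetical.

```python
from collections import defaultdict


def prioritize_lakes(predictions, max_visits):
    """Rank lakes by peak predicted density and keep the top `max_visits`.

    predictions: iterable of (lake_name, density_cells_per_ml), one per sampled point
    """
    peak = defaultdict(float)
    for lake, density in predictions:
        peak[lake] = max(peak[lake], density)
    ranked = sorted(peak.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_visits]


# Hypothetical estimates for several points across three lakes
example = [("Lake A", 12_000), ("Lake B", 250_000), ("Lake A", 90_000), ("Lake C", 4_000)]
print(prioritize_lakes(example, max_visits=2))
# [('Lake B', 250000), ('Lake A', 90000)]
```

Ranking on the peak rather than the mean favors lakes with localized hotspots, which may or may not suit a given program's priorities.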
2. Sentinel-3 remote sensing + CyFi
Sentinel-3 provides useful, accurate coverage for large lakes, oceans, and coastal areas where phycocyanin detection is possible. CyFi extends this insight to small inland and freshwater bodies where Sentinel-3 cannot resolve features. Seen together, they provide tiered coverage for all freshwater lake, river, and pond sizes.
3. Sensors and buoys + CyFi
In situ instruments capture detailed dynamics in high-priority locations. CyFi identifies hotspots and seasonal patterns across the broader landscape. It becomes a wider net that helps managers identify gaps in current in situ systems and confirm where high-intensity monitoring is worthwhile and cost-effective.
Adding CyFi to an existing HAB monitoring system can provide weekly HAB estimates for every lake in a region
Because CyFi produces structured predictions, it integrates cleanly into dashboards, alerts, and workflows—turning satellite data into actionable information for water managers, public health officials, and environmental agencies.
With CyFi integrated into an in situ monitoring system, agencies can now operate with broad, repeatable, near-real-time situational awareness that complements existing regimes and directs attention where it's needed, with these potential incremental benefits:
- More informed water body and beach closures from area estimates of HAB
- Faster and more comprehensive drinking water advisories by increasing the frequency of sampling over a larger area
- More targeted and strategic sampling plans for in situ monitoring, with potential to further optimize placement of sensors and better direct manual field teams
The result is more confidence in identifying the places that are and aren’t blooming.
How to add remote sensing to your HAB monitoring system
CyFi is a relatively new offering for the HAB monitoring stack. We’re looking for partnerships to strengthen and verify CyFi’s contribution. These high-impact options are ready to pursue immediately through partnerships:
1. Ground-truthing and training CyFi with new field samples
If your organization collects cyanobacteria samples, we can compare CyFi predictions against your data to evaluate performance and establish baseline accuracy. For your implementation, whether in or outside the U.S., we recommend retraining CyFi on your data for highest accuracy.
2. Integrating CyFi outputs into monitoring dashboards or workflows
If you manage a large region and need process automation, we can help you integrate CyFi for automated, large-area weekly monitoring that feeds directly into your dashboards and decision-support systems.
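As a starting point for weekly automation, the input side can be as simple as a table of sample points repeated for each week of the season. The sketch below writes such a table as CSV with `latitude`, `longitude`, and `date` columns; that layout is our assumption about a convenient input format, so check the CyFi documentation for the exact columns it expects. The coordinates are placeholders.

```python
import csv
import io
from datetime import date, timedelta


def weekly_sample_rows(points, start, weeks):
    """Yield one row per (point, week), starting at `start` and stepping 7 days."""
    for week in range(weeks):
        day = start + timedelta(weeks=week)
        for lat, lon in points:
            yield {"latitude": lat, "longitude": lon, "date": day.isoformat()}


def rows_to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["latitude", "longitude", "date"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder points; a real deployment would use one or more points per lake
points = [(35.6, -78.7), (41.2, -73.9)]
print(rows_to_csv(weekly_sample_rows(points, start=date(2024, 6, 3), weeks=2)))
```

The resulting file can then be handed to the prediction step on a schedule, with outputs loaded into whatever dashboard or alerting system is already in place.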
DrivenData is looking for partners to validate and test advanced HAB monitoring systems

The Python package is just a building block. We want to collaborate with freshwater monitoring programs to further validate, improve, and deploy CyFi. CyFi is open source and freely available for exploration and experimentation.
We invite you to reach out to partner with us for validation, dashboard integration, a customized CyFi HAB monitoring deployment, or any manner of technical support from the team that built CyFi.