Sample Projects
Automating wildlife identification for research and conservation
Partners: Max Planck Institute for Evolutionary Anthropology, Arcus Foundation
Detected wildlife in video footage automatically and at scale by running a global algorithm development challenge, then developing an open source Python package and web application that build on the winning solution (Project Zamba).
Approaches include: Deep learning, computer vision, transfer learning, data science competition, crowdsourced data annotations, open source software
Building LLM solutions
Partners: private sector, social sector
Built LLM solutions for multiple real-world applications, across tasks including text summarization, semantic search, named entity recognition (NER), and multimodal analysis. This work has ranged from research on state-of-the-art transformer models tuned for specific use cases to production-ready retrieval-augmented AI applications.
Approaches include: LLMs, transformers, fine-tuning, prompt-tuning, LLM evaluation, retrieval-augmentation
Identifying crop types using satellite imagery in Yemen
Partners: The World Bank, The Conflict and Environment Observatory
Used satellite imagery to identify crop extent, crop types and climate risks to agriculture in Yemen, informing World Bank development programs in the country after years of civil war.
Approaches include: Deep learning, computer vision, earth observation data
Illuminating mobile money experiences in Tanzania
Partners: IDEO.org
Analyzed millions of mobile money records to uncover patterns in behavior, and then combined these insights with human-centered design to shape new approaches to delivering mobile money to low-income populations in Tanzania.
Approaches include: Human-centered design + data science, exploratory analysis, interactive visualization, rapid prototyping
Tracking attacks on health care in Ukraine
Partners: Insecurity Insight, Physicians for Human Rights
Built a real-time, interactive map to visualize attacks on the Ukrainian health care system since the Russian invasion began in February 2022. The map supports partner efforts to provide aid, hold aggressors accountable in court, and increase public awareness.
Approaches include: Interactive visualization, open data, geospatial data, production web application
Mining chat messages with plant doctors using language models
Partners: CABI Plantwise
Automated recognition of agricultural entities (such as crops, pests, diseases, and chemicals) in WhatsApp and Telegram messages among plant doctors, enabling new ways to surface emerging trends and improve science-based guidance for smallholder farmers.
Approaches include: Natural language processing (NLP), named-entity recognition (NER), fuzzy matching, human-in-the-loop data annotation
Matching students with schools where they are likely to succeed
Partners: Data science company foundation
Used machine learning to match students with higher education programs where, based on their individual profiles, they are more likely to be admitted and to graduate. Focused on serving students from backgrounds traditionally less likely to attend college or to apply to more competitive programs.
Approaches include: Recommender systems, predictive modeling, software engineering
Mapping fair trade products from source to shelf
Partners: Fair Trade USA
Visualized the flow of fair trade coffee products from the farms where they are grown to the stores where they are sold, connecting the nodes in supply chain transactions and increasing transparency for customers and auditors.
Approaches include: Interactive dashboarding, GIS analysis, Tableau
Developing performance indicators and repayment models in off-grid solar
Partners: The World Bank, Angaza, GOGLA, Lighting Global
Analyzed repayment behaviors across dozens of pay-as-you-go (PAYG) solar energy companies serving off-grid populations throughout Africa; developed key performance indicators (KPIs) to facilitate standardized measurement and reporting for PAYG portfolios.
Approaches include: Predictive modeling, exploratory analytics, open source software, key performance indicators (KPIs), public-private partnerships
Modeling patient pathways through hospitals
Partners: Haystack Informatics
Mapped probabilistic patient journeys through hospitals based on tens of thousands of patient experiences, giving hospitals better insight into the timing of activities in their departments and how that timing relates to operational efficiency.
Approaches include: Predictive modeling, activity-based costing, Spark, production web application
Predicting public health risks from restaurant reviews
Partners: Yelp, Harvard University, City of Boston
Flagged public health risks at restaurants by combining Yelp reviews with open city data on past inspections. The algorithmic approach discovered 25% more violations with the same number of inspections.
Approaches include: Machine learning challenge, natural language processing (NLP), open data, alternative data sources
Smart auto-tagging of K-12 school spending
Partners: Education Resource Strategies
Built algorithms that put apples-to-apples labels on school budget line items so that districts understand how their spending stacks up and where they can improve, saving months of manual processing each year.
Approaches include: Natural language processing (NLP), machine learning challenge, Excel tooling, ranked prioritization for manual follow-up
Building data tools to fight human trafficking in Nepal
Partners: Love Justice
Aided anti-trafficking efforts at border crossings and airports by combining data across locations and surfacing insights that give interviewers greater intelligence about the right questions to ask and how to direct them.
Approaches include: Data entry user experience design, data repository, GIS analysis, dynamic dashboard
Putting AI into the hands of lung cancer clinicians
Partners: GO2 Foundation for Lung Cancer
Translated advances in machine learning research to practical software for clinical settings, building an open source application through a new kind of data challenge.
Approaches include: Data challenge, deep learning, open source software, computer vision, predictive modeling, computer-aided diagnosis
Driving data education through custom competitions
Partners: Microsoft
Developed online, white-label data science competitions that let students synthesize what they have learned and test their skills on applied challenges. Each capstone features a real-world dataset focused on an important issue in the social sector.
Approaches include: Private data challenge, regression analysis, predictive modeling, data science education