Data & AI solutions for problems that matter

We help mission-driven organizations harness their data to work smarter and offer more impactful services using data science, machine learning, and AI.

What we're known for

Our services

DrivenData is an AI firm with over a decade of experience building end-to-end solutions with organizations ranging from tech-enabled nonprofits to Fortune 100 companies. We help with everything from strategy to implementation and from R&D to production, often working as a thought partner to define what's worth doing and how to get it done. To learn more, book a free discovery call.

Data science & AI

Machine learning

We develop machine learning models, drawing on leading tools and practices in the field. We deliver well-designed, accessible solutions that have real-world impact.

What we do
  • Data identification and exploration
  • Literature review
  • Computer vision
  • Natural language processing (NLP)
  • Geospatial analysis and remote sensing
  • Multimodal AI
What we deliver
  • Model code
  • Summary of results and insights
  • Evaluation and recommendations
For example

We worked with the World Bank to develop machine learning models using satellite imagery to identify crops and agricultural climate risks in Yemen, informing development programs in the country after years of civil war.

Generative AI & LLMs

We build solutions that take advantage of modern advances in generative AI and large language models like ChatGPT. We help teams explore what is possible with the latest technology applied to their needs.

What we do
  • Model experimentation and evaluation
  • Text generation, analysis, and summarization
  • Chatbot development
  • Retrieval augmented generation (RAG)
  • Sandbox environments for secure access
  • Multi-agent systems
What we deliver
  • Deployed applications
  • Prototypes and demos
  • Private AI sandbox environments
  • Evaluation and recommendations
For example

We partnered with the National Center for State Courts to develop a private sandbox environment for testing large language models (LLMs) like ChatGPT, supporting secure experimentation with new capabilities in generative AI.

Analysis & visualization

We use best practices in data science to pull together data sources for analysis, drive inquiry based on organizational needs, and synthesize results. We bring the expertise and tools to illuminate the most useful information from data.

What we do
  • Identifying and evaluating data sources
  • Exploratory data analysis (EDA)
  • Iterative analysis with rapid feedback loops
  • Interactive data visualization
  • Geospatial analysis
  • Dashboarding and deployment
What we deliver
  • Well-organized analysis code
  • Visualizations and insights
  • Deployed data products and dashboards
  • Data-driven reports
For example

We worked with Science for America to analyze data from the Department of Education and develop an interactive data visualization platform and open source code library, supporting greater understanding of trends, patterns, and disparities in STEM education.

Data science competitions

Our competition platform channels the skills of 100K+ data scientists and developers around the world to build solutions for social good. Competitions engage a large expert community with real-world problems and elevate the best solutions.

What we do
  • Challenge design, including problem statement, prize structure, evaluation metrics, and rules
  • Data source assessment
  • Data preparation and benchmarking
  • Competition hosting
  • Marketing and outreach
  • Participant management and discussion forum
  • Solution evaluation (automated scoring and/or judging)
  • Prize administration and reporting
What we deliver
  • Competition website with all challenge content, participant registration, and a live leaderboard
  • Winning model code, documentation, and write-up of approaches
  • Winner announcement and results summary
For example

We partnered with Meta AI to run the Hateful Memes Challenge, where thousands of participants around the world competed to build the best machine learning models for identifying hate speech in multimodal memes containing images and text.

Software engineering

Application development

We build modern web applications grounded in best practices in software engineering, data science, and human-centered design.

What we do
  • Conduct user interviews
  • Scope technical requirements
  • Design data architecture
  • Develop backend software
  • Design and develop frontend interfaces
  • Migrate old data sources
  • Test prototypes
  • Deploy applications for use
What we deliver
  • Deployed products and applications
  • Well-structured code and product summaries for development teams
  • User interview notes, findings, and recommendations
For example

We worked with EverFree, a leader in supporting survivors of human trafficking, to develop their Freedom LifeMap technology. The fully featured platform checks in with survivors on their current needs and connects them with personalized support, while strengthening organizations with data to guide and improve programs, measure outcomes, and stop exploitation.

Applied machine learning

We help take machine learning applications from development to production, bridging the “missing middle” between static research code and the ability to use regularly generated model outputs for real user needs.

What we do
  • Conduct user interviews and landscaping
  • Assess model performance and efficiency opportunities
  • Restructure and robustify code into a runnable pipeline
  • Engineer a clean, reproducible repository with access points for processing new data, tests, and continuous integration
  • Training and documentation
  • Operationalization or integration into user workflows
What we deliver
  • Clean, configurable code package capable of generating predictions on new input data
  • User interview notes, findings, and recommendations
  • Documentation and user training resources
For example

We partnered with the Max Planck Institute for Evolutionary Anthropology and the Wild Chimpanzee Foundation to take competition-winning models for detecting wildlife from trail camera videos and build them into a production system runnable on new data. The solution included an open-source tool called Zamba (“forest” in Lingala) for identifying hundreds of kinds of animals, and a user-friendly web application called Zamba Cloud for training and running models in the cloud.

Open source software

We help you turn your project into an open source software package, incorporating conventions and best practices from the open source community.

What we do
  • Ensure your project meets the standards, best practices, and conventions that users expect from modern, high-quality open source software
  • Set up modern tooling, automation infrastructure, and workflows for a better maintainer experience
  • Coordinate clear, comprehensive documentation for accessible use
  • Educate you, drawing on our experience as maintainers, on best practices for sustainable project maintenance and for accepting contributions
What we deliver
  • Your project as a professional, ready-to-distribute software package
  • A polished project documentation website that comprehensively covers user needs
For example

We built and currently maintain Cookiecutter Data Science, a widely used (>8,500 stars, >2,500 forks) open source project template for organizing data science work so that it’s easy to reproduce and build upon. See our open source page for additional examples.

Data engineering

Data pipelines

We design and implement scalable, efficient data pipelines to process, clean, and transform data for analysis and machine learning. Our pipelines ensure reliable, reproducible workflows tailored to organizational goals.

What we do
  • Data ingestion from diverse sources
  • Data cleaning and transformation
  • ETL/ELT pipeline development
  • Workflow orchestration
  • Monitoring and error handling
What we deliver
  • Clean, reproducible workflows
  • Documentation and user training resources
  • Integrated pipeline automation with alerting
For example

We partnered with Candid (formerly Foundation Center and GuideStar) to configure pipelines for increasing data quality and matching nonprofits and funders with the correct organizations in their central database. Deploying and monitoring these systems at scale helps increase the reliability of the core information on philanthropic activity that Candid provides for the social sector.

Data warehousing

We develop data warehousing solutions that consolidate and structure your data for easy retrieval and analysis, especially by data analysts and less technical staff. Our solutions are tailored to meet performance needs while ensuring scalability and security.

What we do
  • Design data warehouse architecture
  • Optimize for query performance and readiness for reporting tools (e.g., Tableau, Power BI)
  • Build and deploy cloud-based data warehouses, helping decide on support plans and negotiate with vendors
  • Data governance and security best practices
What we deliver
  • Scalable, performant data storage
  • Comprehensive documentation
  • Access and security configurations
For example

We worked with CodePath to plan and implement their core data infrastructure to answer critical questions on organizational effectiveness, operations, and impact. The infrastructure brought together data streams from across the organization for accurate, secure, self-service reporting.

Dataset development

We help turn raw data into well-structured datasets prepared for development and use. Our approach builds on extensive experience developing, evaluating, and sharing a wide range of datasets through our data science competitions.

What we do
  • Prepare data for external use, including aspects such as quality, structure, formats, and distribution infrastructure
  • Develop reproducible data pipelines for processing data
  • Support data publication and sharing
  • Lower the barrier to use through demos and tutorials
  • Offer a discount on any future competitions engaging the ML community with the dataset
What we deliver
  • Clean dataset with clear documentation and metadata for use
  • Demos and tutorials illustrating how to engage with the data
  • Organized, reproducible code for generating processed dataset
For example

We worked with researchers at BetterUp and the University of Pennsylvania to develop the CANDOR corpus, a large, novel multimodal dataset of naturalistic conversations including audio, video, and transcripts as well as extracted vocal, facial, and semantic features and post-conversation reflections.

Data strategy

Opportunity assessment

We help you understand what is possible with advances in data science and AI, define what is actually worth doing, and scope the right approach for your goals.

What we do
  • Assessment of organizational needs
  • Insight into relevant current methodologies and limitations
  • Landscape of available data
  • Detailed project roadmap
What we deliver
  • Summary report on opportunities, prioritization, and roadmap
  • Tear-sheets with initial descriptions of opportunities, benefits, costs, and approach to pursue
  • Data / methodology landscape with prototype analysis and recommendations
For example

We worked with Fair Trade USA to assess opportunities to use their data, along with potential improvements to their data model for tracking and communicating impact. These recommendations, together with initial prototypes, helped demonstrate the untapped value of existing data and led to investment in a new team with supporting infrastructure to level up the use of data across the organization.

State-of-the-field research

We inventory how data, AI, and algorithms are used in target domains and synthesize findings for action through summary reports and technical resources.

What we do
  • Literature review
  • Stakeholder interviews
  • Data inventory and assessment
  • Development of technical resources and demos
  • Summarization and reporting
What we deliver
  • Summary reports
  • Stakeholder interview notes
  • Data / methodology landscape with prototype analysis and recommendations
For example

We partnered with Wellcome to investigate the risks of bias in the fields of computational neuroscience and psychiatry along with approaches to address them. As part of the work, we created a public annotated case study illustrating best practices for measuring, mitigating, and communicating algorithmic bias.

Hiring & capacity building

We help hire, advise, and train data teams. We help organizations bring on their first data science and engineering hires, and help upskill data leaders and teams looking to elevate their practices.

What we do
  • Source, evaluate, and interview data science and engineering candidates
  • Provide fractional data science management
What we deliver
  • Candidate recommendations, with support through to hire
  • Interview notes and recordings
  • Fractional data leadership
  • Delivered trainings and participant outputs
For example

We partnered with the technical leadership team at the Harvard Business Review to hire their first data scientist, helping to craft the job description, source candidates, and lead the applied technical interview to identify the best fit for the role.

In-person workshops

We lead workshops bringing together stakeholders to brainstorm, design, and prioritize new opportunities for deploying modern data and AI technologies at their organizations.

What we do
  • Opportunity identification and brainstorming
  • Needs assessment
  • Human-centered design for data systems
  • Opportunities with generative AI
  • Actionable ethics for data scientists
What we deliver
  • Half-day or full-day workshops
  • In-person, remote, or hybrid facilitation
  • Notes and recordings
  • Collaborative outputs from sessions
For example

We worked with the Black Wealth Data Center to run a half-day design workshop on how generative AI could support their goals of analyzing drivers of Black wealth. In particular, the session generated ideas for understanding media narratives around Black leadership, and for identifying which questions, hypotheses, and data sources modern AI tools could shed light on.

What our clients say

I work with the DrivenData team again and again. The reason I keep going back to them, and recommending them to everyone I know, is that they are some of the smartest, most caring, and humble folks I've had the privilege of working with. If you are wanting to deploy cutting edge AI for social impact, this is the team to work with.
Andrew Means
Senior Director of Global Impact Data Strategy
The DrivenData team was absolutely fantastic! Their deep experience in how to set up effective data challenges was critical to making our competition a success. They did not just provide a hosting platform; they were true partners in assessing what was possible with our data and designing an end-to-end challenge to meet our goals.
Victoria Da Poian
Data Scientist & Engineer
DrivenData did an outstanding job building our Freedom LifeMap platform, providing us with innovative solutions to visualize complex data and track survivor pathways. Their expertise and dedication were invaluable in creating a tool that enhances our impact and helps us better support vulnerable populations.
Katie Rootlieb
Chief Operating Officer
DrivenData helped us clearly communicate the opportunities and room for improvement in our data model, and created prototypes which showed Fair Trade USA leadership the potential data insights that could better inform our work. This partnership was instrumental in gaining the support necessary to create a data team.
Bennett Wetch
VP of Technology Innovation
DrivenData did an excellent job understanding our unique data engineering needs and devising smart solutions that worked for our data. Thanks to the engagement, not only did we get excellent products, we also accelerated our own learning.
Marina Turlakova
Director of Research & Evaluation
From our initial conceptual discussion to the delivery of the final product the team was professional, clear on scope and timing, incredibly competent and creative in achieving our goals within our budget. We have been thrilled with the end result and have used it for pitches and press events. I couldn't recommend them more highly than to say we would love to work with them again and have tremendous respect for the core thesis behind their mission.
Rupal Patel
Founder and President

Book a discovery call

Our experts will work with you to understand the problems you face and how we can help you use your data to offer more impactful services.


Our real-world impact

Partners: Max Planck Institute for Evolutionary Anthropology, Arcus Foundation, WILDLABS

Automating wildlife identification for research and conservation

Detected wildlife in images and videos, automatically and at scale, by building the winning algorithm from a DrivenData competition into an open source Python package and a web application that runs models in the cloud.

Partners: Private sector, social sector

Building LLM solutions

Built solutions using LLMs for multiple real-world applications, across tasks including semantic search, summarization, named entity recognition, and multimodal analysis. Work has spanned everything from research on state-of-the-art models tuned for specific use cases to production-ready retrieval-augmented AI applications.

Partners: The World Bank, The Conflict and Environment Observatory

Identifying crop types using satellite imagery in Yemen

Used satellite imagery to identify crop extent, crop types, and climate risks to agriculture in Yemen, informing World Bank development programs in the country after years of civil war.

Partners: IDEO.org

Illuminating mobile money experiences in Tanzania

Analyzed millions of mobile money records to uncover patterns in behavior, and then combined these insights with human-centered design to shape new approaches to delivering mobile money to low-income populations in Tanzania.

Partners: Insecurity Insight, Physicians for Human Rights

Tracking attacks on health care in Ukraine

Built a real-time, interactive map to visualize attacks on the Ukrainian health care system since the Russian invasion began in February 2022. The map will support partner efforts to provide aid, hold aggressors accountable in court, and increase public awareness.

Partners: CABI Plantwise

Mining chat messages with plant doctors using language models

Automated recognition of agricultural entities (such as crops, pests, diseases, and chemicals) in WhatsApp and Telegram messages among plant doctors, enabling new ways to surface emerging trends and improve science-based guidance for smallholder farmers.

Partners: Data science company foundation

Matching students with schools where they are likely to succeed

Used machine learning to match students with higher education programs where they are more likely to get in and graduate based on their unique profile, with a focus on backgrounds traditionally less likely to attend college or apply to more competitive programs.

Partners: Fair Trade USA

Mapping fair trade products from source to shelf

Visualized the flow of fair trade coffee products from the farms where they are grown to the stores where they are sold, connecting the nodes in supply chain transactions and increasing transparency for customers and auditors.

Partners: The World Bank, Angaza, GOGLA, Lighting Global

Developing performance indicators and repayment models in off-grid solar

Analyzed repayment behaviors across dozens of pay-as-you-go (PAYG) solar energy companies serving off-grid populations throughout Africa, and developed KPIs to facilitate standardized reporting for PAYG portfolios.

Partners: Haystack Informatics

Modeling patient pathways through hospitals

Mapped out the probabilistic patient journeys through hospitals based on tens of thousands of patient experiences, giving hospitals a better view into the timing of the activities in their departments and how they relate to operational efficiency.

Partners: Yelp, Harvard University, City of Boston

Predicting public health risks from restaurant reviews

Flagged public health risks at restaurants by combining Yelp reviews with open city data on past inspections. An algorithmic approach discovers 25% more violations with the same number of inspections.

Partners: Education Resource Strategies

Smart auto-tagging of K-12 school spending

Built algorithms that put apples-to-apples labels on school budget line items so that districts understand how their spending stacks up and where they can improve, saving months of manual processing each year.

Partners: Love Justice

Building data tools to fight human trafficking in Nepal

Aided anti-trafficking efforts at border crossings and airports by combining data across locations and surfacing insights that give interviewers greater intelligence about the right questions to ask and how to direct them.

Partners: GO2 Foundation for Lung Cancer

Putting AI into the hands of lung cancer clinicians

Translated advances in machine learning research to practical software for clinical settings, building an open source application through a new kind of data challenge.

Partners: Microsoft

Driving data education through custom competitions

Developed online, white-label data science competitions for students to synthesize their learnings and test their skills on applied challenges. Each capstone features a real-world dataset that focuses on an important issue in the social sector.

Our story

Founded in 2014, we're an expert team of data scientists and engineers interested in doing good work for good reason. DrivenData was started by three grad students out of the Harvard Innovation Lab with a mission to bring the power of data science to organizations tackling the world's biggest social challenges.

Over more than a decade, our team has worked with some of the most prominent organizations in the social, public, and private sectors. Building on our machine learning competitions, we have expanded into a full range of data science and AI services, backed by a top-notch team.


How we work

Quality engineering

We engineer code so that it's clear, correct, and reproducible. We've seen a lot of data science in the wild, and we know the importance of organizing work so that it's easy to understand and build upon. Our technical team takes pride in bringing the best tools and practices to the job at hand.

Human-centered design

Part of our job is connecting the technical capabilities of what we do with why they matter for people and organizations. We often draw on the tools of human-centered design to help shape the opportunities worth solving and to refine solutions with end users.

Ethics & privacy

We take our responsibility as stewards of data seriously. Our team adheres to best practices in data security, has experience working with extremely sensitive datasets, and has developed tools for data scientists to tackle critical discussions around issues like privacy and bias.

Rapid feedback loops

We bring a highly client-focused and iterative approach to our work. By interspersing collaborative discussion with agile development sprints, we are able to respond to emerging priorities and drive towards high-value work outputs that provide the greatest return for development resources.

Open source & crowdsourcing

We believe in the power of collective progress. We support open source solutions where possible, actively maintain open source projects for the data science community, and run competitions where technical experts around the world advance solutions for tough problems in AI/ML.

Flexible expertise

Delivering impactful work means meeting organizations where they are. We bring a rich range of experience to serve the end-to-end needs of our partners, from strategy and R&D to prototyping and production.

Latest posts


insights

Life beyond the leaderboard

What happens to winning solutions after a machine learning competition?

winners

Meet the winners of Phase 2 of the PREPARE Challenge

Learn about how winners detected cognitive decline using speech recordings and social determinants of health survey data

resources

Open-source packages for using speech data in ML

Overview of key open-source packages for extracting features from voice data to support ML applications

tutorial

Getting started with LLMs: a benchmark for the 'What's Up, Docs?' challenge

An introduction to using large language models via the benchmark for a document summarization challenge.

winners

Meet the Winners of the Goodnight Moon, Hello Early Literacy Screening Challenge

Learn about the results and winning methods from the early literacy screening challenge.

resources

Where to find a data job for a good cause

Finding data jobs for good causes can be difficult. Learn strategies, job lists, and tips to find organizations with open positions working on causes you care about.

winners

Meet the Winners of the Youth Mental Health Narratives Challenge

Learn about the winning solutions from the Youth Mental Health Challenge Automated Abstraction and Novel Variables Tracks

winners

Meet the winners of the Forecast and Final Prize Stages of the Water Supply Forecast Rodeo

Learn about the winners and winning solutions from the final stages of the Water Supply Forecast Rodeo.

insights

10 takeaways from 10 years of data science for social good

This year DrivenData celebrates our 10th birthday! We've spent the past decade working to use data science and AI for social good. Here are some lessons we've learned along the way.

tutorial

Goodnight Moon, Hello Early Literacy Screening Benchmark

In this guest post from the MIT Gabrieli Lab, we'll show you how to get started with the literacy screening challenge!

tutorial

Youth Mental Health: Automated Abstraction Benchmark

Learn how to process text narratives using open-source LLMs for the Youth Mental Health: Automated Abstraction challenge

winners

Meet the winners of Phase 1 of the PREPARE Challenge

Learn about the top datasets sourced for Phase 1 of the PREPARE Challenge.

resources

Teaching with DrivenData Competitions

Inspiration and resources for teaching students data science, machine learning, and AI skills with DrivenData competitions.

winners

Meet the winners of the Pose Bowl challenge

Learn about the top solutions submitted for the Pose Bowl: Spacecraft Detection and Pose Estimation Challenge.

winners

Meet the winners of the Water Supply Forecast Rodeo Hindcast Stage

Learn about the winning models for forecasting seasonal water supply from the first stage of the Water Supply Forecast Rodeo.

tools

Cookiecutter Data Science V2

Announcing the V2 release of Cookiecutter Data Science, the most widely adopted data science project template.

resources

How to make data science projects more open and inclusive

Key practices from the field of open science for making data science work more transparent, inclusive, and equitable.

winners

Meet the winners of the Kelp Wanted challenge

Dive into the solutions from the super segmenters who best detected kelp in Landsat imagery!

winners

Meet the winners of the SNOMED CT Entity Linking Challenge

Meet the winners with the best systems for detecting clinical terms in medical notes.

winners

Meet the winners of the Pale Blue Dot challenge

Learn about the top visuals created for the Pale Blue Dot: Visualization Challenge and the solvers behind them.

Stay updated

Join our newsletter or follow us for the latest on our social impact projects, data science competitions and open source work.


Common questions

Have a question about how we can work together or something else? Feel free to contact us!

What kind of organizations do you work with?

DrivenData does not have any restrictions in terms of size, mission, software tools, or database structure. We have worked on applications in fields from education to climate, tech to healthcare. Our goal is to partner with organizations that share our vision of using data to make the world a better place. If you're an organization excited about innovative ways to leverage your data, please get in touch.

How are DrivenData challenges and DrivenData Labs connected?

DrivenData is a social enterprise that works with mission-driven organizations to drive change through data science and engineering.

Challenges are one way we work with organizations to develop data science and AI solutions, by structuring problems so that a global community of experts can drive progress. We think it's a great option for suitable problems, and we've spent over a decade building out a competition platform and community that we're proud of.

We also know that competitions aren't right for every problem. For more flexible needs or sensitive data sources, we have our own team of experienced data scientists and engineers to take on the work. Our team has extensive experience developing machine learning and artificial intelligence solutions, from exploratory research to advanced modeling to production systems, with a range of partners including Microsoft, Meta AI, NASA, Gates Foundation, and the World Bank.

How do I join or host a competition?

To join a competition, check out our open competitions or learn more about how it works. You can also explore our past competitions that often include open solutions and open data.

To run a competition, read our information for partners or reach out with information on the competition you have in mind and someone on our team will follow up to discuss.

How do you typically structure your engagements?

Engagements are typically structured in one of two ways.

Deliverable-based engagements target a specific set of outputs. This tends to be a good fit when those outputs are clear and the path to achieving them is well understood.

Sprint-based engagements allow our team to work more flexibly through a series of development sprints, iterating with clients and responding to emerging priorities and learnings as we go. This tends to be a good fit when there is a wider set of potentially useful directions or the nature of the work is more iterative (R&D tends to fall in this bucket).

In both cases, we work with our partners to intersperse collaborative discussion with agile development. This allows us to stay closely aligned throughout the work and drive towards high-value work outputs that provide the greatest return for development resources.

Where can I find more info about projects you've worked on?

Check out our sample projects, or take a look through our case studies.

If you have additional questions about specific domains or technologies you're interested in, drop us a line below.

Would you come speak at our conference or podcast?

Contact us below. It's likely someone on our team will be a good fit and interested in following up.

Get in touch

Let us know the challenges your organization faces and where you're looking for help. We'll reply within 2 working days, and can arrange a free discovery call.

You'll get to talk directly with our data science and AI experts. We'll advise on the opportunities we can see to help and which of our services would have the most impact.