
Jump-starting data infrastructure and in-house data expertise

DrivenData designed and built a data warehouse to centralize, organize, and visualize data across CodePath's operations. Our team also provided technical hiring assistance to find the right talent to carry the work forward.


The organization

CodePath is a national nonprofit dedicated to transforming computer science education for first-generation and low-income students. By offering no-cost technical courses, career support, and a robust community network, CodePath equips college students with the skills and experience necessary to launch thriving careers in tech. The organization partners with top universities and major tech employers to bridge the gap between traditional education and industry expectations.

The challenge

As CodePath expanded its reach, its evolving data needs prompted a closer look at how to better support growth by engineering more robust data infrastructure. It’s a scenario that is familiar to ambitious and rapidly growing organizations: as operational data grows across a range of platforms, managing and unifying this information becomes increasingly complex. Definitions and reporting practices grow organically across teams, leading to duplicated effort and inconsistent results.

In light of this, CodePath recognized the benefit of building a more centralized and robust data foundation, but it faced a common challenge: how do you bootstrap data infrastructure, and how do you identify and hire experienced data professionals without prior in-house data expertise?

The approach

To bridge that gap, DrivenData partnered with CodePath to design and implement modern, organization-wide data infrastructure centered on clarity, consistency, and sustainability, while also launching a hiring process to assemble a team to sustain and advance the work. DrivenData worked closely with CodePath to carry out the following approach:

  • Collaboration with data stakeholders: Designing data infrastructure may sound like a purely technical challenge, but its success hinges on integrating the extensive domain knowledge of staff across an organization. DrivenData collaborated closely with stakeholders at all levels to understand how data is generated and used in day-to-day operations. This collaboration ensured that the data systems aligned with the organization's operational realities.

  • Organizing the data: We employed software engineering best practices and industry-standard tools to ingest, clean, organize, and document data so that everyone works from the same definitions and calculations, building trust in the data and its derivatives.

  • Empowering staff with tools to deliver business value from the data: Data infrastructure is only as useful as the tools it supports. We created a suite of interactive dashboards and reports, allowing staff across departments to get the insights they need on demand and without needing to write code.

  • Building internal capacity: To support long-term success, CodePath launched a hiring initiative for a full-time data engineer and a data scientist. DrivenData served as trusted data experts throughout the hiring process, deploying a technical take-home project and evaluating the candidates’ qualifications and suitability for their roles in the organization.
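To make the "same definitions and calculations" idea concrete, here is a minimal Python sketch. It is purely illustrative and not CodePath's actual pipeline: the records, field names, and the `completion_rate` metric are all hypothetical. It shows one cleaning step and one shared metric definition that every dashboard or report would call, rather than each team recomputing the number its own way.

```python
# Hypothetical raw enrollment records; field names and values are invented
# for illustration and do not come from CodePath's systems.
ENROLLMENTS_RAW = [
    {"student_id": " S1 ", "course": "ios101", "status": "Completed"},
    {"student_id": "S2", "course": "IOS101", "status": "dropped"},
    {"student_id": "S3", "course": "web102", "status": "COMPLETED"},
    {"student_id": "S3", "course": "web102", "status": "COMPLETED"},  # duplicate row
]


def clean_enrollments(rows):
    """Normalize whitespace and casing, then drop exact duplicate records."""
    seen = set()
    cleaned = []
    for row in rows:
        record = (
            row["student_id"].strip(),
            row["course"].strip().upper(),
            row["status"].strip().lower(),
        )
        if record not in seen:
            seen.add(record)
            cleaned.append(
                {"student_id": record[0], "course": record[1], "status": record[2]}
            )
    return cleaned


def completion_rate(rows):
    """Shared definition: completed enrollments divided by all enrollments."""
    if not rows:
        return 0.0
    completed = sum(1 for r in rows if r["status"] == "completed")
    return completed / len(rows)


if __name__ == "__main__":
    cleaned = clean_enrollments(ENROLLMENTS_RAW)
    print(f"{len(cleaned)} enrollments, completion rate {completion_rate(cleaned):.2f}")
```

In a production warehouse this logic would more likely live in SQL transformations with accompanying documentation, but the principle is the same: the metric is defined once and reused everywhere.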

The results

CodePath now has robust data infrastructure that supports reliable decision-making across the organization. The key components include:

  • A scalable cloud data warehouse and transformation pipeline, built on industry-standard technologies and modern architectural best practices, that serves as a centralized data system (a “single source of truth”) bringing together information from across CodePath’s systems

  • Interactive dashboards and automated reports enabling staff to monitor program performance, trends, and student outcomes with up-to-date data

  • Training and documentation consisting of a library of resources such as dashboard user guides, video walkthroughs, and detailed business logic definitions, in addition to synchronous training sessions and office hours as new data products are introduced

  • An in-house data team including a dedicated data engineer and data scientist, empowering CodePath to independently carry forward the data infrastructure work and pursue more advanced analytics and machine learning initiatives in the future

Our real-world impact

Partners: CodePath

Data engineering from the ground up

Built data infrastructure to ingest, clean, integrate, and organize data across CodePath, created interactive dashboards for accurate monitoring of program trends, and provided trusted data expertise to identify and hire talent to carry the work forward.

Partners: Max Planck Institute for Evolutionary Anthropology, Arcus Foundation, WILDLABS

Automating wildlife identification for research and conservation

Detected wildlife in images and videos, automatically and at scale, by building the winning algorithm from a DrivenData competition into an open-source Python package and a web application running models in the cloud.

Partners: Private sector, social sector

Building LLM solutions

Built solutions using LLMs for multiple real-world applications, across tasks including semantic search, summarization, named entity recognition, and multimodal analysis. The work has ranged from research on state-of-the-art models tuned for specific use cases to production-ready retrieval-augmented AI applications.

Partners: The World Bank, The Conflict and Environment Observatory

Identifying crop types using satellite imagery in Yemen

Used satellite imagery to identify crop extent, crop types and climate risks to agriculture in Yemen, informing World Bank development programs in the country after years of civil war.

Partners: Bureau of Ocean Energy Management, NOAA Fisheries, Wild Me

Protecting endangered beluga whales with computer vision

Designed and administered a computer vision challenge that produced state-of-the-art machine learning models to identify and match individual endangered beluga whales from photo surveys.

Partners: EverFree

A production application to support survivors of human trafficking

Built the Freedom Lifemap platform, a digital tool designed to support survivors of human trafficking on their journey toward reintegration and independence.

Partners: ReadNet

Crowdsourcing solutions for AI-assisted early literacy screening

Ran a machine learning challenge to develop automatic scoring methods for audio clips from literacy screener exercises. Automated scoring can help teachers quickly and reliably identify children in need of early literacy intervention.

Partners: Science for America

Making higher education data more accessible

Created an open-source Python library and interactive data visualization platform for analyzing U.S. higher education data and illuminating trends and disparities in STEM education.

Partners: IDEO.org

Illuminating mobile money experiences in Tanzania

Analyzed millions of mobile money records to uncover patterns in behavior, and then combined these insights with human-centered design to shape new approaches to delivering mobile money to low-income populations in Tanzania.

Partners: Insecurity Insight, Physicians for Human Rights

Tracking attacks on health care in Ukraine

Built a real-time, interactive map to visualize attacks on the Ukrainian health care system since the Russian invasion began in February of 2022. The map will support partner efforts to provide aid, hold aggressors accountable in court, and increase public awareness.

Partners: Wellcome

Addressing algorithmic bias in medical research

Conducted a literature review to understand the current state of bias identification and mitigation in mental health research, and synthesized recommended best practices from the field of machine learning.

Partners: CABI Plantwise

Mining chat messages with plant doctors using language models

Automated recognition of agricultural entities (such as crops, pests, diseases, and chemicals) in WhatsApp and Telegram messages among plant doctors, enabling new ways to surface emerging trends and improve science-based guidance for smallholder farmers.

Partners: NASA

Monitoring water quality from satellite imagery

Created an open-source package to detect harmful algal blooms using machine learning and satellite imagery. The work included running a machine learning competition, conducting end-user interviews, and engineering a robust, deployable pipeline.

Partners: Data science company foundation

Matching students with schools where they are likely to succeed

Used machine learning to match students with higher education programs where they are more likely to get in and graduate based on their unique profile, with a focus on backgrounds traditionally less likely to attend college or apply to more competitive programs.

Partners: Fair Trade USA

Mapping fair trade products from source to shelf

Visualized the flow of fair trade coffee products from the farms where they are grown to the stores where they are sold, connecting the nodes in supply chain transactions and increasing transparency for customers and auditors.

Partners: University of Maryland

Processing multimodal tutoring data

Built well-engineered data pipelines to extract machine learning features from audio, video and transcript data collected from online tutoring sessions, enabling a team at the University of Maryland to study how relationship-building affects student outcomes.

Partners: The World Bank, Angaza, GOGLA, Lighting Global

Developing performance indicators and repayment models in off-grid solar

Analyzed repayment behaviors across dozens of pay-as-you-go (PAYG) solar energy companies serving off-grid populations throughout Africa, and developed KPIs to facilitate standardized reporting for PAYG portfolios.

Partners: Haystack Informatics

Modeling patient pathways through hospitals

Mapped out the probabilistic patient journeys through hospitals based on tens of thousands of patient experiences, giving hospitals a better view into the timing of the activities in their departments and how they relate to operational efficiency.

Partners: Yelp, Harvard University, City of Boston

Predicting public health risks from restaurant reviews

Flagged public health risks at restaurants by combining Yelp reviews with open city data on past inspections. An algorithmic approach discovers 25% more violations with the same number of inspections.

Partners: Education Resource Strategies

Smart auto-tagging of K-12 school spending

Built algorithms that put apples-to-apples labels on school budget line items so that districts understand how their spending stacks up and where they can improve, saving months of manual processing each year.

Work with us to build a better world

Learn more about how our team is bringing the transformative power of data science and AI to organizations tackling the world's biggest challenges.