Open source projects

DrivenData builds and maintains a number of popular open source projects for the data science, machine learning, and software engineering communities. Check them out here!

Open source aligns with our mission to support the community of developers devoted to reproducible science and social impact. For organizations looking for support with their own projects, learn more about our open source services. For developers interested in contributing, you can find us on GitHub.

Developer tools

We open source practical tools we use in our own work to support reproducible, responsible, and maintainable software.

Cookiecutter Data Science

A logical, reasonably standardized, and flexible project structure for doing and sharing data science work

Since starting DrivenData, we’ve seen a lot of data science in the wild. As the field develops, it’s becoming increasingly important to organize data science work so that it’s easy to reproduce and build upon. Cookiecutter Data Science is a widely used project template that keeps data scientists organized and on track.

cloudpathlib

Pathlib-style classes for cloud storage services

Have you wished for a consistent and easy interface in Python to access files in cloud storage like S3 and Azure? cloudpathlib is an extensible Python library that provides pathlib.Path-style classes for dealing with files in various cloud storage services, with seamless local caching.
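
One practical consequence of a pathlib.Path-style interface is that code written against that interface does not need to know where a file lives. The sketch below is a minimal illustration under stated assumptions: `count_lines` is a hypothetical helper, not part of cloudpathlib, and the demonstration uses a local temporary file so it runs without cloud credentials.

```python
import tempfile
from pathlib import Path


def count_lines(path):
    """Count lines in a text file, given any pathlib.Path-style object.

    Hypothetical helper for illustration; it relies only on the
    Path-style open() method.
    """
    with path.open("r") as f:
        return sum(1 for _ in f)


# Demonstrate with a local file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    local_file = Path(tmp) / "example.csv"
    local_file.write_text("id,value\n1,a\n2,b\n")
    n_lines = count_lines(local_file)

print(n_lines)

# Assumption: with cloudpathlib installed and credentials configured,
# a cloud path object (for example one built from a made-up URI such as
# "s3://hypothetical-bucket/example.csv") could be passed to
# count_lines in place of the local Path.
```

Because the helper touches only the shared interface, swapping a local path for a cloud path should not require changing the function body.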

erdantic

Entity relationship diagrams for Python data model classes like Pydantic

Looking for an easy, clean way to visualize your data model? erdantic is a simple tool for drawing entity relationship diagrams (ERDs) that show how data model classes are connected. Generate ERDs from models defined with multiple supported frameworks, such as Pydantic and dataclasses.

nbautoexport

Making it easier to code review Jupyter notebooks, one script at a time

nbautoexport automatically exports Jupyter notebooks to other file formats (.py, .html, and more) each time you save in Jupyter. One great use case is keeping script versions of your notebooks up to date so that reviewers can comment on the code directly.

pandas_path

Path-style access for pandas

Love pathlib.Path? Love pandas? pandas_path makes it easy to use pathlib methods on pandas Series. Just one import adds a .path accessor to any pandas Series or Index so that you can use all of the methods on a Path object.
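
The element-wise idea behind a .path accessor can be pictured with plain pathlib. The sketch below is not pandas_path's implementation and does not use pandas; `apply_path_attr` is a hypothetical helper that looks up one Path attribute for every string in a list, which is roughly what an expression like `series.path.name` would be expected to do across a Series.

```python
from pathlib import PurePosixPath


def apply_path_attr(values, attr):
    """Return the named path attribute for each string in values.

    Hypothetical helper for illustration only. PurePosixPath keeps the
    results identical across operating systems.
    """
    return [str(getattr(PurePosixPath(v), attr)) for v in values]


file_paths = ["cat/1.jpg", "cat/2.jpg", "dog/1.jpg"]

names = apply_path_attr(file_paths, "name")      # file name of each path
parents = apply_path_attr(file_paths, "parent")  # containing folder of each path

print(names)
print(parents)
```

With the accessor, the same lookups stay attached to the Series itself, so the result keeps its index and can go straight back into a DataFrame column.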

Responsible AI tools

We help data scientists be more intentional in their choices and more aware of the ethical implications of their work.

Deon: An Ethics Checklist for Data Scientists

A command line tool to easily add an ethics checklist to your data science projects

When there's a lot at stake, checklists make sure big questions don't slip through the cracks and tough conversations happen even (especially) in fast-moving environments. The goal of deon is to push that conversation forward and provide concrete, actionable reminders to the developers who have influence over how data science gets done.

Real-world applications

We partner with organizations to build open source applications that address domain-specific social impact challenges.

Project Zamba

Computer vision for wildlife research and conservation

Zamba is an open-source Python package that uses machine learning and computer vision to help automate time-intensive image and video processing tasks for wildlife monitoring. Zamba includes multiple state-of-the-art, pretrained machine learning models for species and blank detection in different geographies. It can also be used to train custom models on new species and geographies based on user-provided labeled data.

CyFi: Cyanobacteria Finder

Harmful algal bloom detection from satellite imagery

CyFi is a command line tool that uses satellite imagery and machine learning to detect dangerous concentrations of cyanobacteria in small, inland water bodies. The goal of CyFi is to help water quality managers better allocate resources for in situ sampling and make more informed decisions around public health warnings for critical resources like lakes and reservoirs.

scipeds

A Python package for working with higher education data from IPEDS

The Integrated Postsecondary Education Data System (IPEDS) offers a wealth of comprehensive data on U.S. higher education institutions, but this data is spread across multiple files and formats, making longitudinal analyses and cross-institution comparisons challenging. scipeds is an open-source Python library that simplifies the analysis of IPEDS data.

Benchmarked models

We publish winning solutions from past data science competitions to support learning and reuse.

Winning models from DrivenData competitions

Check out how ML experts built their winning algorithms

DrivenData runs machine learning competitions to help non-profits, NGOs, governments, and other social impact organizations use data science in service of humanity. To enable data scientists and mission-driven organizations to learn from the work done in these competitions, we open source the code submitted by winners so that others can use and adapt it.

Stay updated

Join our newsletter or follow us for the latest on our social impact projects, data science competitions, and open source work.

Work with us to build a better world

Learn more about how our team is bringing the transformative power of data science and AI to organizations tackling the world's biggest challenges.