Open source projects
DrivenData maintains a number of popular open source projects for the data science, machine learning, and software engineering communities. Check them out here!
Since starting DrivenData, we’ve seen a lot of data science in the wild. As the field develops, it’s becoming increasingly important to organize data science work so that it’s easy to reproduce and build upon. Cookiecutter Data Science is a widely used project template that keeps data scientists organized and on track.
When there's a lot at stake, checklists make sure big questions don't slip through the cracks and tough conversations happen even (especially) in fast-moving environments. deon is a command line tool that adds an ethics checklist to a data science project. Its goal is to push the conversation about data ethics forward and provide concrete, actionable reminders to the developers who have influence over how data science gets done.
One command jumpstarts the conversation all data teams should be having. Explore the checklist here!
Have you wished for a consistent and easy interface in Python to access files in cloud storage like S3 and Azure? cloudpathlib is an extensible Python library that provides pathlib.Path-style classes for dealing with files in various cloud storage services, with seamless local caching.
Our goal is to be the meringue of file management libraries: the subtle sweetness of pathlib working in harmony with the ethereal lightness of the cloud.
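For readers who have not used the pathlib idiom that cloudpathlib mirrors, here is a minimal sketch using only the standard library on a local temporary directory. Per the description above, cloudpathlib offers this style of interface for cloud storage locations; the helper `summarize_reports` and the file names are made up for illustration and are not part of cloudpathlib.

```python
import tempfile
from pathlib import Path


def summarize_reports(root: Path) -> dict[str, int]:
    """Map each .csv file name directly under root to its number of data rows."""
    return {
        p.name: len(p.read_text().splitlines()) - 1  # subtract the header row
        for p in sorted(root.glob("*.csv"))
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "reports"  # the "/" operator joins path segments
    root.mkdir()
    (root / "2023.csv").write_text("id,value\n1,10\n2,20\n")
    (root / "notes.txt").write_text("not a csv\n")  # ignored by the glob
    summary = summarize_reports(root)
```

The appeal of the idiom is that joining, globbing, reading, and writing all hang off one path object, so code written this way does not need to change shape when the location of the files does.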
Looking for an easy, clean way to visualize your data model? erdantic is a simple tool for drawing entity relationship diagrams (ERDs) that show how data model classes are connected. Generate ERDs from models defined with multiple supported frameworks, such as Pydantic and dataclasses.
If you have data models in Python, this is a great way to illustrate your schema and add a visual reference to your documentation.
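To make "how data model classes are connected" concrete, the sketch below uses only the standard library to list the relationships an ERD would draw between a few dataclasses. It is not erdantic's API or implementation; the models (`Author`, `Tag`, `Post`) and the helper `find_edges` are hypothetical, and the helper only looks one level into generic types such as `list[Tag]`.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import get_args, get_type_hints


@dataclass
class Author:
    name: str


@dataclass
class Tag:
    label: str


@dataclass
class Post:
    title: str
    author: Author
    tags: list[Tag]


def find_edges(model: type) -> list[tuple[str, str, str]]:
    """Return (source model, field name, target model) for each field whose
    annotation is, or directly contains, another dataclass."""
    edges = []
    hints = get_type_hints(model)
    for f in fields(model):
        hint = hints[f.name]
        # Check the annotation itself plus its type arguments (list[Tag] -> Tag).
        for candidate in (hint, *get_args(hint)):
            if isinstance(candidate, type) and is_dataclass(candidate):
                edges.append((model.__name__, f.name, candidate.__name__))
    return edges


edges = find_edges(Post)
```

Each tuple in `edges` corresponds to one arrow in a diagram: `Post` points to `Author` through its `author` field and to `Tag` through its `tags` field.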
nbautoexport automatically exports Jupyter notebooks to various file formats (.py, .html, and more) upon save while using Jupyter. One great use case is to automatically have script versions of your notebooks to facilitate code review commenting.
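As an illustration of what a "script version" of a notebook contains, the sketch below pulls the code cells out of notebook JSON using only the standard library. It is not how nbautoexport is implemented; the function `notebook_to_script` and the sample notebook are assumptions made for this example.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of a notebook (nbformat 4 JSON) into a script."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["import math\n", "r = 2"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": "area = math.pi * r ** 2"},
    ],
})
script = notebook_to_script(example)
```

A plain-text script like this diffs cleanly line by line, which is what makes review comments practical compared with the raw notebook JSON.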
DrivenData runs machine learning competitions to help non-profits, NGOs, governments, and other social impact organizations use data science in service of humanity. Part of our mission is to enable data scientists and mission-driven organizations to learn from the work done in these competitions. To this end, the code submitted by winners is released under an open source license for others to learn from, use, and adapt.
Check out how ML experts built their winning algorithms!
Zamba (meaning "forest" in Lingala) is an open-source Python package that uses machine learning and computer vision to help automate time-intensive image and video processing tasks for wildlife monitoring, enabling researchers to focus on interpreting the content and using the results.
Zamba builds on the winning solution from the Pri-matrix Factorization challenge and includes multiple state-of-the-art, pretrained machine learning models for species and blank detection in different geographies. It can also be used to train custom models on new species and geographies based on user-provided labeled data.
Harmful algal blooms, such as those caused by cyanobacteria, occur all around the world and endanger both human and aquatic health.
CyFi is a command line tool that uses satellite imagery and machine learning to detect dangerous concentrations of cyanobacteria in small, inland water bodies. Built on the winning solutions of the Tick Tick Bloom competition, the goal of CyFi is to help water quality managers better allocate resources for in situ sampling and make more informed decisions around public health warnings for critical resources like lakes and reservoirs.
In the Concept to Clinic challenge, hundreds of data scientists and engineers from around the world came together to build open source tools to fight lung cancer, the world’s deadliest cancer. The prototype developed during the live challenge period between August 2017 and January 2018 focused on helping clinicians flag, assess, and report concerning nodules from CT scans.
This open-source project is an end-to-end application that allows radiologists to better interact with state-of-the-art AI as part of their diagnostic process.