
Our Minimum Viable Process for changing and deploying software

Over the years, we have gradually settled on a process for changing and releasing software that is robust enough to prevent common failure modes but lightweight enough to be minimally annoying. Here’s what it looks like when we make a change.

Isaac Slavitt
Co-founder

DrivenData’s web deployments are supported by our friends at Heroku. This post highlights the ways Heroku underlies our Minimum Viable Process for deployments. We also recently recorded a podcast that explores this topic further, including some deployed projects we’ve been working on, which you can find here.


Our Minimum Viable Process for deployments

At DrivenData, we operate a production platform with tens of thousands of users. We are always working on adding features and improving the software, so we often make changes to our codebase.

Of course, making changes in software can sometimes introduce bugs, and bugs are not fun. Another thing that is not a lot of fun is process, but some amount of process is necessary to prevent bugs.

Ah, the glamorous life of an engineer: making tradeoffs to balance negatives!

[Chart: pain (y-axis) versus amount of process (x-axis), a U-shaped parabola with its minimum point highlighted]

Over the years, we have gradually settled on a process for changing and releasing software that is robust enough to prevent common failure modes but lightweight enough to be minimally annoying. Here’s what it looks like when we make a change.

  • Check out a branch to work on.
  • Write or update a test relevant to the change.
  • Make the actual code change.
  • Push the named branch to our repo, open a PR against master, and assign a fellow engineer to do a code review.
  • Our CI will automatically run the test suite. Any of the following conditions will fail the build (a rough sketch of running these checks locally appears after this list):
    • Linting. The flake8 tool has any warnings or errors.
    • Code style. The black tool would change any files.
    • Tests. Any test fails or errors.
    • Code coverage. The overall project coverage drops by more than a small threshold, or coverage of the changed lines (diff coverage) is too low.

  • If the build succeeds, it will place a green checkmark on the PR. If it fails, there will be a scary red ‘x’ icon.
  • If the build succeeds and was on the master or staging branch, it will get deployed to Heroku.
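The same checks are easy to run locally before pushing. Here is a rough sketch of what that looks like, assuming flake8, black, pytest, and pytest-cov are installed; the commands mirror the checks above, but the coverage threshold is illustrative and this is not our actual CI configuration.

```python
# Rough local approximation of the CI checks -- not our actual CI config.
import subprocess
import sys

CHECKS = [
    ["flake8", "."],                               # linting: any warning or error fails
    ["black", "--check", "."],                     # style: fails if black would change any file
    ["pytest", "--cov=.", "--cov-fail-under=95"],  # tests plus an overall coverage floor
]


def main() -> int:
    for cmd in CHECKS:
        print("$ " + " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            print("This is where the build would fail.")
            return 1
    print("All checks passed; the build would be green.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```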

Let’s break this process down piece by piece.

Branching

We use an extremely lightweight version of feature branches. What this means in practice is that we almost never make changes without a well-defined GitHub issue specifying the changes to be made.

When we make a new branch, we use the name to refer back to the GitHub issue tracking the desired change (e.g. 254-add-error-message-on-timeout).

When we open the pull request, the commit message or PR description will finish with a string like Closes #254, which automatically closes the relevant issue when merged to master.

Writing tests and code coverage

Though we have never dipped below 95% test coverage on our platform, a figure we take some pride in, we are not sticklers for particular code coverage numbers. Nor are we sticklers for test-driven development (TDD) in that we don’t mandate that the tests get written first.

But we are sticklers for exercising most of the plausible pathways that the code can take. It’s not great if a certain conditional in the code never gets triggered by any of the tests, and as developers we have definitely been burned in the past by assuming that a change doesn’t need a test because it’s "not that complicated."
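As a purely hypothetical illustration (the function and tests below are invented for this post, not taken from our codebase), even a "not that complicated" helper usually has a branch that is easy to forget to exercise:

```python
# Hypothetical example, not from our codebase: an exponential backoff helper
# with an easy-to-overlook branch (the cap).
def retry_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Return the backoff delay in seconds for the given attempt, capped at `cap`."""
    delay = base * (2 ** attempt)
    return cap if delay > cap else delay


def test_retry_delay_grows_exponentially():
    assert retry_delay(0) == 1.0
    assert retry_delay(2) == 4.0


def test_retry_delay_is_capped():
    # Without this test, the `delay > cap` branch never runs under the test suite.
    assert retry_delay(10) == 30.0
```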

A failing test is not a slap in the face. It’s your past self saying "you’re welcome" to your future self for preventing you from having to fix a bug in the middle of the night.

Code reviews, linting, and style

Ordinarily, nothing goes to the master branch without a second set of eyes — even if that review is just a gut check.

In our code review, we don’t usually have to worry about nitpicking code style because the build was already the jerk and complained if anything was off. This way, we can focus more on core concepts like security and functionality.

[Meme: Lumbergh from Office Space asking you to go ahead and rerun flake8]

Unless the change is trivial, we often ask each other open-ended questions such as "how would this affect [other piece of the system that is marginally concerned here but might get overlooked]?" These questions aren’t gotchas; they are a recognition that simple questions are often extremely important for checking assumptions.

Effective code reviews are a huge topic in their own right, and we don’t address how to do those here, but we feel strongly that not doing code reviews at all is a smell — even in data science work.

Deployment

We deploy our code on Heroku, and have been doing so since the beginning of 2014. For us, it was a great way of abstracting away the machine so that we could stay focused on building the software, and our use of Heroku has scaled up as we have grown from an early alpha to tens of thousands of users.

One of the nice things about a git-based deploy workflow is that we have automated pretty much everything based on our branch strategy. For example, a passing build on our staging branch automatically deploys to our staging instance. A passing build on master automatically deploys first to staging and then production.

Not only does our CI deploy the application, it also triggers a snapshot of the database using Heroku’s PG Backups, runs any database migrations, uploads static assets, and performs a post-deploy HTTP 200 health check, all as part of the same run. These are exactly the kinds of steps that get missed when deployment is a manual process.
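As a rough sketch of what that last step can look like, here is a minimal post-deploy health check in Python; the URL, retry budget, and timeouts are placeholders, not our actual configuration or deploy tooling.

```python
# Minimal post-deploy health check sketch. The URL, retry count, and timeouts
# are hypothetical placeholders, not our actual configuration.
import sys
import time
import urllib.request

HEALTH_URL = "https://example-app.herokuapp.com/health/"


def is_healthy(url: str, attempts: int = 5, wait_seconds: float = 5.0) -> bool:
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                if response.status == 200:
                    return True
        except Exception:
            pass  # the dyno may still be restarting; wait and retry
        time.sleep(wait_seconds)
    return False


if __name__ == "__main__":
    sys.exit(0 if is_healthy(HEALTH_URL) else 1)
```

If a check like this exits non-zero, the pipeline fails loudly instead of leaving a broken deploy to be discovered by users.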

This is a very boring deployment workflow. There is nothing novel about this deployment workflow. We prefer it that way.

Closing thoughts

One of the gratifying things about working directly with many organizations is getting to see how they work. We have worked with two-person startups all the way up to Fortune 50 companies, and the sheer variety in software development and deployment process between organizations is incredible.

There is a popular belief that all startups are nimble and older "enterprise" companies are stultifyingly slow, but the relationship between size and software practices is not always straightforward.

We have worked with new-ish startups that have a staggering amount of process: thousands of JIRA tickets in dozens of repositories, organized into epics, communicated about over email/Slack/Bitbucket and more. Meanwhile, we have worked with teams that are part of giant organizations where a tiny handful of people make ad hoc changes and carry out deployments however they feel like.

Process can be a liberating tool to reduce stress and worry, or it can be a crushing burden that makes engineers start looking for work at other companies. This is a classic case not just of diminishing returns with increasing amounts of process but also of a parabola of sadness with output vanishing as bureaucracy increases.

For our small team, we are always experimenting with ways to introduce tiny amounts of process that have an outsized impact on reducing our stress levels and making work more enjoyable. This post has only described how we actively make changes and hasn’t even scratched the surface of how we plan and prioritize work, but the same principle applies across the board.
