Community Spotlight: Brett Mullins

The Community Spotlight features fantastic members from our DrivenData community. Brett Mullins is a graduate student whose work focuses on differential privacy and explainable machine learning.

The Community Spotlight celebrates the diversity of expertise, perspectives, and experiences of our community members. In this post we sit down with Brett Mullins, a winner of the Differential Privacy Temporal Map Challenge and a graduate student at the University of Massachusetts at Amherst.


Picture of Brett Mullins

Name: Brett Mullins

Hometown: Atlanta, Georgia


Tell us a little about yourself. Where do you live, what do you do professionally, what do you enjoy doing outside of work?

I am a graduate student in Computer Science at UMass Amherst working with Gerome Miklau and Dan Sheldon. My research focuses on differential privacy and explainable machine learning but extends to other areas where applying formal models brings new ideas to the table.

Outside of work, I spend quite a bit of time reading, sampling coffees from all over New England, and tinkering with various Linux-friendly computers. I'm currently getting set up on a Framework 13.

How did you get started in data science?

Like many data scientists in the 2010s, I stumbled my way into the field. During undergrad, I studied economics and philosophy at Georgia State University. Afterward, I worked as a research assistant at the Fiscal Research Center, a research group at GSU, on a project measuring income mobility in Georgia using government administrative data. This involved lots of dirty data work since we were linking datasets from multiple state and federal agencies.

Later, I did a summer internship with Allstate's data science group and returned full-time the next spring. My time at Allstate allowed me to learn a great deal while working on a variety of projects from model building - which mostly consisted of mining XGBoost but included a brief excursion into computer vision - to deployment and all of the wonderful messiness that comes along with getting a model integrated into a production environment.

What is a recent project you’ve worked on that you’re excited about or other interesting work you can share?

Two exciting recent projects I've worked on involve introducing a new algorithm for differentially private synthetic data generation - the focus of the DrivenData competition I participated in - and the foundations of rule-based explanations in machine learning. Below is a brief description of each.

In 2022, my colleagues and I at UMass released AIM, a novel algorithm for differentially private synthetic data generation which outperforms many of the existing mechanisms across high-dimensional datasets. Our mechanism follows the Select-Measure-Generate paradigm, which iteratively learns low-dimensional statistics from the dataset, and introduces several features to use the privacy budget efficiently. AIM grew out of a line of research which included participating in the 2020 Differential Privacy Temporal Map Challenge.
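To make the three steps concrete, here is a toy sketch of a select, measure, generate loop. It is not AIM: the function names, the `sigma` noise scale, and the choice to measure only one-way marginals are assumptions made for illustration. Selection here does not look at the data, and privacy accounting is omitted entirely.

```python
import numpy as np

def measure_marginal(column, domain_size, sigma, rng):
    """Measure: histogram of one attribute plus Gaussian noise."""
    counts = np.bincount(column, minlength=domain_size).astype(float)
    return counts + rng.normal(0.0, sigma, size=domain_size)

def generate(noisy_marginals, n_rows, rng):
    """Generate: sample each attribute independently from its noisy marginal."""
    columns = []
    for noisy in noisy_marginals:
        probs = np.clip(noisy, 0.0, None)  # noise can push counts below zero
        if probs.sum() == 0:
            probs = np.ones_like(probs)    # fall back to uniform
        probs = probs / probs.sum()
        columns.append(rng.choice(len(probs), size=n_rows, p=probs))
    return np.column_stack(columns)

def toy_select_measure_generate(data, domain_sizes, sigma=1.0, seed=0):
    """Toy pipeline over integer-coded categorical data (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Select: every one-way marginal, chosen without consulting the data.
    selected = range(data.shape[1])
    noisy = [
        measure_marginal(data[:, j], domain_sizes[j], sigma, rng)
        for j in selected
    ]
    return generate(noisy, n_rows=data.shape[0], rng=rng)
```

Because only one-way marginals are measured, the synthetic data keeps none of the correlations between attributes. Mechanisms in this family measure carefully chosen multi-attribute statistics to capture them.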

For a classifier, a rule-based explanation is a set of inputs defined by a rule that the model labels the same way. For example, if there's a model that predicts that I have a cold, then a potential explanation is that my fever is above 100 F and I have a cough. There are several techniques available to generate these sorts of explanations such as Anchors and LoRE. In a recent paper, I argue that these algorithms can find explanations because most classifiers have a simple topological structure relative to the shape of potential explanations. This paper is part of an emerging area called Formal XAI.
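As a rough illustration of the cold example, the sketch below checks a candidate rule against a model by sampling. The classifier, its 99.5 F threshold, the sampling ranges, and the function names are all hypothetical, and this is not how Anchors or LoRE work internally. Sampling can only estimate how consistently the model labels the inputs covered by the rule; it does not prove it.

```python
import random

def toy_cold_classifier(fever_f, has_cough):
    """Hypothetical model: predicts a cold for fever >= 99.5 F with a cough."""
    return fever_f >= 99.5 and has_cough

def rule(fever_f, has_cough):
    """Candidate explanation: fever above 100 F and a cough."""
    return fever_f > 100.0 and has_cough

def rule_precision(model, rule, target_label, n_samples=10_000, seed=0):
    """Estimate the share of rule-covered inputs the model labels as target_label."""
    rng = random.Random(seed)
    covered = 0
    agree = 0
    for _ in range(n_samples):
        fever_f = rng.uniform(96.0, 104.0)  # assumed input range
        has_cough = rng.random() < 0.5
        if rule(fever_f, has_cough):
            covered += 1
            agree += model(fever_f, has_cough) == target_label
    return agree / covered if covered else float("nan")

precision = rule_precision(toy_cold_classifier, rule, target_label=True)
```

For this toy model every input covered by the rule is labeled as a cold, so the estimated precision is 1.0 and the rule qualifies as an explanation in the sense described above.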

What motivated you to join a DrivenData competition or to continue participating in challenges?

Going into the 2020 Differential Privacy Temporal Map Challenge, I was relatively new to differential privacy and was interested in learning about synthetic data and its applications. Fortunately, several folks from UMass shared my interests and we fielded a team. My colleague Ryan McKenna had success in the 2018 Synthetic Data Challenge and was instrumental in getting me up to speed for this one.

It also didn't hurt that the prizes on offer were quite generous! I look forward to any privacy-based competitions hosted in the future.

What hurdles have you had to overcome to become a data scientist? What advice would you give to others facing the same challenges?

Getting started in data science can feel like a daunting task. Not only is the field/role amorphous, but if you asked ten people what makes a data scientist you would get eleven different answers. Two areas that I always suggest people work to improve are communication and math/CS fundamentals.

Communication in data science is often pitched as storytelling. My practical interpretation of this is being able to make clear and engaging/entertaining slides. Being able to do this will take one far!

Suggesting that people learn math/CS fundamentals is more controversial, since these ideas may not be immediately useful to all or even most roles. Plus, it's more fun to dive into neural networks than to figure out how to insert records into or delete them from a balanced tree. However, when you run into a problem or want to use something new, you'll have a much easier time if you start from a solid base and only have to learn the particulars at hand. Here I recommend a bit of linear algebra, probability, convex optimization, and data structures.

Have you read any good books or articles recently?

A book which I started a while ago and am eager to pick back up is John Stillwell's Reverse Mathematics: Proofs from the Inside Out (2018). This is an introduction to a program in logic which studies the relative strength of theorems, particularly in analysis. The book is both readable and approachable for those with a math background up through analysis or advanced calculus.

Reverse Mathematics Cover

Photo: Reverse Mathematics book cover/Wikipedia.

I also write reading recommendations at the end of each year. Here are my lists for books and articles from 2022.

Where can the community find you online?

You can visit my website, find my papers on Semantic Scholar, and view some of my projects on GitHub.


Thank you for sharing your experiences and advice, Brett (@bmullins)! If you or someone you know would like to be in the Community Spotlight, let us know!
