Announcing the results of our Keeping It Fresh competition

See who kept it the freshest — meet the winners of the Keeping It Fresh competition.

We are excited to put a lid (cough cough) on this competition and release the winning solutions to the world. Congratulations to all of our competitors for their work during the competition, and particularly to our top three finalists, who you can meet below!

This challenge ran for two months, and we saw a lot of improvement over that time. Then, at the beginning of July, the top modelers predicted where minor, major, and severe hygiene violations would surface over the following six weeks. During that period, we compared their predictions with what public health inspectors actually found when they went into restaurants! At the end of the evaluation period, we saw which teams had developed the most accurate predictions.

You can read about how they made their models below. We've already seen some exciting follow-up, including an article in the Washington Post. Check it out:

"Using the winning algorithm, [Harvard researcher Mike] Luca says, Boston could catch the same number of health violations with 40 percent fewer inspections, simply by better targeting city resources at what appear to be dirty-kitchen hotspots. The city of Boston is now considering ways to use such a model."

Way to go DrivenDatistas! If you're geared up to work on more data challenges, check out our competitions page for the latest. For the data used in Keeping It Fresh, check out this competition's open data page. Thanks for all your submissions and keep up the good work!

Meet the winners!

1st Place

Name: Liliana Medina

Home base: St. Ives, England

Background: My name is Liliana Medina, and I'm a Portuguese citizen currently living and working in Cambridge, UK. I am presently a data scientist at ForecastThis, where I've been involved in developing DSX, a machine-learning-as-a-service platform, as well as in financial forecasting, web traffic fraud detection, and text analysis projects. Before this I was a data and text mining specialist at 365Media. I have an MSc in Electrical and Computer Engineering, and my thesis focused on pattern detection in electrophysiological signals using unsupervised learning methods. I'm going back to college soon, as I have recently enrolled in an Astronomy certification programme at the University of Cambridge's Institute of Continuing Education.

Method overview: Liliana extracted three classes of features: information about a restaurant's history of inspections, metadata about restaurants from Yelp (e.g., the type of cuisine), and data she extracted from the Yelp reviews. Her approach to creating features from the review text involved both sentiment analysis and topic modeling. Ultimately, she combined these features into a model that averaged the predictions of a random forest and gradient boosted decision trees.
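The final averaging step can be sketched in a few lines of scikit-learn. This is a minimal sketch under stated assumptions, not Liliana's actual code: the feature matrix is synthetic (a stand-in for her inspection-history, Yelp metadata, and review-derived sentiment and topic features), the hyperparameters are placeholders, and the two models are weighted equally.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor


def averaged_forecast(X_train, y_train, X_test, seed=0):
    """Fit a random forest and a gradient-boosted tree model; return their mean prediction."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_train, y_train)
    gbdt = GradientBoostingRegressor(n_estimators=200, random_state=seed).fit(X_train, y_train)
    # Unweighted average of the two models (equal weights are an assumption)
    return (rf.predict(X_test) + gbdt.predict(X_test)) / 2.0


# Synthetic stand-in for the engineered features described above
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
preds = averaged_forecast(X[:250], y[:250], X[250:])
print(preds.shape)  # (50,)
```

Averaging two different tree ensembles tends to smooth out each model's individual errors; a weighted average tuned on held-out data is a common variation.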

2nd Place

Name: Qingchen Wang

Home base: Surrey, Canada

Background: My name is Qingchen Wang, and I have just started as a PhD student in data science at the University of Amsterdam. I've also just started work as a consultant at ORTEC.

Method overview: Qingchen focused on building features about the restaurants based entirely on information about the restaurant inspection history and data from Yelp. He noted that many restaurants don't clean up their act after a failed inspection, so he was able to special-case predictions for these repeat-violators. Combining this inspection history information with Yelp information like the average star rating and number of reviews, he was able to train a random forest model that earned him second place.
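Here is a rough sketch of this kind of approach. It assumes a hypothetical table with one row per inspection and the columns `restaurant_id`, `date`, `violations`, `stars`, and `review_count`. The history features use only earlier inspections, so the model never sees the outcome it is predicting. The description above doesn't say exactly how Qingchen special-cased repeat violators, so the sketch assumes a simple rule: a repeat violator's prediction is floored at its previous violation count. The data is randomly generated for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature columns: inspection history plus Yelp metadata
FEATURES = ["prev_violations", "mean_past_violations", "stars", "review_count"]


def add_history_features(inspections: pd.DataFrame) -> pd.DataFrame:
    """Add features computed only from each restaurant's earlier inspections."""
    df = inspections.sort_values(["restaurant_id", "date"]).reset_index(drop=True)
    past = df.groupby("restaurant_id")["violations"]
    prev1, prev2 = past.shift(1), past.shift(2)  # NaN where no earlier inspection exists
    df["prev_violations"] = prev1.fillna(0)
    # Mean over all earlier inspections, excluding the current one
    df["mean_past_violations"] = past.transform(lambda s: s.shift(1).expanding().mean()).fillna(0)
    # Hypothetical repeat-violator flag: violations at both of the last two inspections
    df["repeat_violator"] = ((prev1 > 0) & (prev2 > 0)).astype(int)
    return df


def fit_and_predict(train: pd.DataFrame, test: pd.DataFrame) -> np.ndarray:
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(train[FEATURES], train["violations"])
    preds = model.predict(test[FEATURES])
    # Special case (assumption): never predict fewer violations for a repeat
    # violator than were found at its previous inspection
    repeat = test["repeat_violator"].to_numpy() == 1
    floor = test["prev_violations"].to_numpy()
    preds[repeat] = np.maximum(preds[repeat], floor[repeat])
    return preds


# Synthetic data: 20 restaurants, 5 inspections each, 90 days apart
rng = np.random.default_rng(0)
start = pd.Timestamp("2015-01-01")
rows = []
for rid in range(20):
    stars, n_reviews = rng.uniform(1, 5), int(rng.integers(5, 500))
    for k in range(5):
        rows.append({"restaurant_id": rid, "date": start + pd.Timedelta(days=90 * k),
                     "violations": int(rng.poisson(2)), "stars": stars, "review_count": n_reviews})

df = add_history_features(pd.DataFrame(rows))
cutoff = start + pd.Timedelta(days=360)  # hold out each restaurant's last inspection
train, test = df[df["date"] < cutoff], df[df["date"] >= cutoff]
preds = fit_and_predict(train, test)
```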

3rd Place

Name: Shane Teehan

Home base: Dublin, Ireland

Background: I am an advanced analytics professional with an academic background in Operations Research and about ten years of industry experience. I am currently managing a team of data scientists working in the aviation sector.

Method overview: To extract the most effective features from the data, Shane normalized the JSON data dump from Yelp and imported it into a Postgres database. Working in SQL allowed him to quickly explore the dataset and construct complex combinations of features about restaurants, their reviews, and Yelp users. Finally, Shane included features about each restaurant's history of violations. He built four models for each target (random forest, extremely randomized trees, gradient boosting machine, and L2-regularized logistic regression) and blended their predictions to create his final submission.
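The blending step might look something like the following. This sketch makes several assumptions beyond the description above: each target is framed as a binary classification problem, the data is synthetic, and the four models' predicted probabilities are combined with equal weights. With several targets, the function would simply be called once per target.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def blended_probabilities(X_train, y_train, X_test, weights=None, seed=0):
    """Fit four model types and return a weighted blend of their positive-class probabilities."""
    models = [
        RandomForestClassifier(n_estimators=200, random_state=seed),
        ExtraTreesClassifier(n_estimators=200, random_state=seed),  # extremely randomized trees
        GradientBoostingClassifier(random_state=seed),
        # LogisticRegression applies an L2 penalty by default; scaling helps it converge
        make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000)),
    ]
    if weights is None:
        weights = np.full(len(models), 1.0 / len(models))  # equal-weight blend (assumption)
    # One column of P(y = 1) per model, shape (n_test, 4)
    probs = np.column_stack([m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in models])
    return probs @ np.asarray(weights)


X, y = make_classification(n_samples=400, n_features=10, random_state=0)
blend = blended_probabilities(X[:300], y[:300], X[300:])
```

Passing a `weights` array makes it easy to try unequal blends, for example weights chosen on a held-out validation set.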
