
Trying to Get It Right when Everyone was Wrong

The best we can do with the polls.

Casey Fitzpatrick
DrivenData

Now that the Electoral College has signed, sealed, and certified its votes in sextuplicate, we thought it'd be a great time to dive into the results of our recent lightning competition for election predictions.

When we partnered with Civis Analytics to find America's Next Top (Statistical) Model, we were not expecting that the 2016 U.S. presidential election would come to be described as the biggest political upset in U.S. history. The upset, of course, was that nearly all the data (including the Republican National Committee's) seemed to favor the Democratic candidate. Nevertheless, when the Electoral College met on December 19, Republican President-elect Donald J. Trump received 304 electoral votes to Democratic challenger Hillary Clinton's 227. (Seven electors, two of Trump's and five of Clinton's, defected and voted for somebody else.) Trump's electoral victory comes despite Clinton's popular vote lead of nearly 3 million, distinguishing the 2016 results as the largest gap between the popular vote and the electoral college in nearly 150 years.

While some see the predictions of 2016 as an absolute failure of data-driven approaches to politics, it is important to remind ourselves that not all approaches were equally wrong. Some models were all but certain that Clinton would win in a landslide. But others, notably Nate Silver's 538 model, assigned a non-trivial 10.5% chance to precisely the scenario that played out: Trump wins the Electoral College while Clinton wins the popular vote. With these facts in mind, let's take a quick look at our top submissions and try to gain a sense of what made some approaches more right when everyone was wrong.

Our competitors were asked to submit specific percentage predictions for each of the four major candidates in each state. Competitors were allowed to use any data, from polls to posts, to generate their predictions. Submissions were scored by root mean squared error (RMSE). To get a quick sense of how competitors did overall, let's look at the error chart for the mean predictions. For each candidate in each state, we took the mean prediction across all submissions and subtracted it from the actual outcome, so results less than zero correspond to a candidate underperforming expectations, and results greater than zero correspond to a candidate overperforming.
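To make the scoring and the error chart concrete, here is a minimal sketch of both computations in Python. It assumes a hypothetical long-format table of submissions with columns submission_id, state, candidate, predicted_pct, and actual_pct; this is not the competition's actual file format.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (submission, state, candidate).
# Columns assumed for illustration only: submission_id, state, candidate,
# predicted_pct, actual_pct.
subs = pd.read_csv("submissions.csv")

# Competition-style RMSE for a single submission: root mean squared error
# over every (state, candidate) prediction.
def rmse(df):
    return np.sqrt(np.mean((df["predicted_pct"] - df["actual_pct"]) ** 2))

scores = subs.groupby("submission_id").apply(rmse).sort_values()
print(scores.head())  # lower RMSE = closer to the actual returns

# Error chart input: actual outcome minus the mean prediction across all
# submissions for each candidate in each state. Negative values mean the
# candidate underperformed expectations; positive values mean overperformed.
mean_error = (
    subs.groupby(["state", "candidate"])
        .apply(lambda g: g["actual_pct"].mean() - g["predicted_pct"].mean())
        .rename("actual_minus_mean_prediction")
)
print(mean_error.sort_values(ascending=False).head(10))
```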

Clearly Trump overperformed and Clinton underperformed throughout the submissions. That makes sense, since Trump beat his polls in many key states, particularly in battleground and so-called "Democratic firewall" states, leading to the upset. Given that the polls were off, however, these errors also indicate an overreliance on polling data as inputs to the model predictions. This can be seen very clearly in Trump's overperformance in some of the key states thought to have flipped the election, such as PA, FL, NC, OH, MI, and WI.

Let's focus specifically on Pennsylvania (PA), Michigan (MI), and Wisconsin (WI). These states are interesting for a couple of reasons. First, WI hadn't voted Republican in a presidential election since 1984, and PA and MI hadn't since 1988, making these states part of the supposed Democratic firewall. Second, given that these states were expected to go to the Democrats, it is worth noting that their electoral votes alone (PA: 20, MI: 16, and WI: 10, for 46 in total) would have been more than enough to push Clinton past the 270 votes needed to make her the 45th president.

As expected, Trump overperformed in each of these key states, consistent with the general overreliance on polls noted above. The question now is: did the winning models do better in the key "surprise" states, and if so, why?

Aside from polling data and historical returns, several winners enriched their models with other data. The top finisher, a quantitative policy researcher from the Netherlands, and the second place team, composed of an analyst and a former political journalist turned communications executive, both included demographic variables and other specific domain knowledge in their predictions.

The top finisher relied on demographic information from the census, including race, education level, and income. Nevertheless, despite finishing first overall on our leaderboard (and therefore closest to the actual outcome of the election), when comparing reality to the winner's best submission, Trump overperformed the key state predictions even more than he did against the mean of submissions. This result is consistent with the position that "demographics aren't destiny."
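For a sense of what folding census demographics into a poll-based prediction can look like, here is a minimal sketch. The file names, column names (poll_avg_pct, pct_white, pct_college_degree, median_income), and the choice of a ridge regression are all hypothetical illustrations; this is not the winner's actual pipeline.

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Hypothetical file and column names, for illustration only; the winner's
# actual features and pipeline are not reproduced here.
train = pd.read_csv("train_2012.csv")   # 2012 poll averages + census features + 2012 result
test = pd.read_csv("predict_2016.csv")  # 2016 poll averages + census features

FEATURES = ["poll_avg_pct", "pct_white", "pct_college_degree", "median_income"]

# Regularized linear regression: polls are one input among several
# demographic features, rather than the whole story.
model = Ridge(alpha=1.0)
model.fit(train[FEATURES], train["actual_vote_pct"])

test["predicted_pct"] = model.predict(test[FEATURES])
print(test[["state", "candidate", "predicted_pct"]].head())
```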

As for the second place team, they took a slightly different approach to getting information from beyond the polls. Leaning on nearly two decades of experience in political reporting, the team used demographic and historical variables to categorize states into the following scenarios (a rough sketch in code follows the list):

  • Time for Change - predicted to return to the Bush Republican years (Trump-like demographics, e.g., a high percentage of less-educated voters)
  • Partial Time for Change - predicted partial reversion but not full based on demographics (e.g., reduced Black vote but higher Hispanic vote)
  • No Time for Change - predicted to remain similar to the Obama years (e.g., large minority population, Mexican border states)
  • Trend - states consistently trending Democratic or Republican since 2000 and predicted to keep doing so
  • Mormon Corridor Pattern - large Mormon population and predicted higher 3rd party vote
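To make the idea concrete, here is a minimal sketch of how scenario categories like these could be blended with poll averages and weighted above them. The scenario labels mirror the list above, but the shift values, weights, and blending function are hypothetical illustrations, not the second place team's actual method.

```python
# Hypothetical scenario shifts (percentage points added to the Republican
# candidate's poll average); illustration only -- the second place team's
# actual categories, weights, and data are not reproduced here.
SCENARIO_SHIFT = {
    "time_for_change": 4.0,
    "partial_time_for_change": 2.0,
    "no_time_for_change": 0.0,
    "trend": 1.0,
    "mormon_corridor": -3.0,   # part of the vote expected to go third party
}

POLL_WEIGHT, SCENARIO_WEIGHT = 0.4, 0.6   # domain knowledge weighted above polls

def blended_gop_prediction(poll_avg_pct: float, scenario: str) -> float:
    """Blend a state's Republican poll average with a scenario-adjusted estimate."""
    scenario_estimate = poll_avg_pct + SCENARIO_SHIFT[scenario]
    return POLL_WEIGHT * poll_avg_pct + SCENARIO_WEIGHT * scenario_estimate

# Example: a "firewall" state where the Republican polls at 46% but
# demographics point to a Time for Change reversion.
print(blended_gop_prediction(46.0, "time_for_change"))  # -> 48.4
```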

This additional information was weighted more heavily than polls by the second place model. The resulting key state error is given below:

As we can see from the figure, Trump still overperformed the predictions, but the second place model's predictions come much closer to the actual outcome, significantly outperforming both the mean of submissions and the first place finisher in all key states (including key states other than MI, PA, and WI).

In fact, Trump's actual results did not outperform the second place predictions by more than 2.5% in any key state, whereas the winning model was off by over 3% in each key state shown, and nearly 5% in some others such as Ohio.

So, what can we take away from a contest where even the winning models failed overall? Well, one clear takeaway is that more than polling data was needed to get closer to anticipating the upset. Does that mean that "data is dead," as some have said? No, it doesn't. It simply means that polling doesn't tell the whole story; if it did, we wouldn't need elections. Here at DrivenData, we are fortunate to have competitors who were able to leverage this insight, getting closer to right when pretty much everyone was wrong.
