
The State of Competitive Machine Learning 2023 is a great read! Every working data scientist can learn from what wins challenges, so I've picked out some key insights below.
Challenges are like prototype environments: participants want to get to the best answer quickly, efficiently, and with minimal code. If you want quick results and improved accuracy in your own work, you'll learn something here.
-
We know Python, deep learning, and PyTorch are a common stack for winners, but the extent of their dominance is surprising: it's basically a clean sweep. This has definitely changed in the last couple of years. In my opinion, TensorFlow and Keras are still worth considering, especially for edge, mobile, and browser deployment, but to keep up with the latest you've got to know PyTorch.
-
Obviously, deep learning won most NLP and CV challenges, and a unified API for pretrained models is essential, particularly timm and Hugging Face's transformers. Both let you try a different architecture or pretraining regime with very little code change. They turn "no free lunch" into a buffet!
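To make the "very little code change" point concrete, here's a minimal sketch using Hugging Face's transformers. The checkpoint names are just illustrative examples, not models called out in the report; swapping the pretrained backbone is a one-string change.

```python
# Minimal sketch: trying two different pretrained checkpoints with the same code.
# The checkpoint names below are illustrative examples, not report recommendations.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

for checkpoint in ["distilbert-base-uncased", "roberta-base"]:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    inputs = tokenizer("Competitions reward fast iteration.", return_tensors="pt")
    logits = model(**inputs).logits  # classification head is freshly initialized, ready to fine-tune
    print(checkpoint, logits.shape)
```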
-
Some interesting things that don't happen in challenges: training from scratch, custom pretraining, and novel architectures. These likely don't make sense for challenges because of time and computation constraints, but we often explore them in our own work. Competitions help demonstrate that for many problems, the work-to-return ratio of starting from first principles just isn't worth it.
-
Interesting to see EfficientNet as the most popular pretrained architecture; we've found it to be sensitive to hyperparameters when fine-tuning. Competitions saw some ConvNeXt activity, and I think more is coming, so give it a shot.
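On the vision side, a quick timm sketch in the same spirit (the architecture names are just examples from timm's model zoo, not code from any winning solution): swapping EfficientNet for ConvNeXt is a one-line change.

```python
# Sketch: EfficientNet vs. ConvNeXt behind the same timm API.
import timm
import torch

for arch in ["efficientnet_b0", "convnext_tiny"]:
    model = timm.create_model(arch, pretrained=True, num_classes=10)  # new 10-class head for fine-tuning
    x = torch.randn(1, 3, 224, 224)  # dummy image batch
    print(arch, model(x).shape)  # both produce (1, 10) logits
```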
-
LightGBM and XGBoost continue to be what you should reach for first for anything that isn't computer vision or NLP. Both perform well for tabular data and model ensembling, and both appear across ML competition solutions. In my mind, neither is the clear winner, and I still recommend trying both and settling the question empirically.
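"Settling the question empirically" can be as simple as the sketch below, with synthetic data standing in for your own features and target:

```python
# Sketch: fit LightGBM and XGBoost on the same split and compare a validation metric.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("LightGBM", LGBMClassifier()), ("XGBoost", XGBClassifier())]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    print(f"{name}: validation AUC = {auc:.3f}")
```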
-
Augmentation is common for computer vision (albumentations), and test-time augmentation sometimes appears in winning solutions. I think both training-time and test-time augmentation are under-utilized in production ML workflows at the moment. It is also clear that NLP is missing both the methods and the libraries for effective augmentation, but I think that will change in the near future.
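For reference, here's a hedged sketch of what that looks like in practice: a generic albumentations training pipeline plus a crude form of test-time augmentation (averaging predictions over the original image and a horizontal flip). The `model` argument is assumed to be any image classifier returning logits, and the transforms are illustrative rather than a recipe from the report.

```python
# Sketch of train-time augmentation (albumentations) and simple test-time augmentation.
import albumentations as A
from albumentations.pytorch import ToTensorV2
import numpy as np
import torch

# Train-time pipeline: apply as train_tfms(image=img)["image"] inside your Dataset.
train_tfms = A.Compose([
    A.Resize(256, 256),
    A.RandomCrop(224, 224),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(p=0.3),
    A.Normalize(),
    ToTensorV2(),
])

def predict_with_tta(model, image: np.ndarray) -> torch.Tensor:
    """Average logits over the original image and a horizontally flipped copy."""
    base = A.Compose([A.Resize(224, 224), A.Normalize(), ToTensorV2()])
    views = [image, np.ascontiguousarray(image[:, ::-1, :])]  # original + horizontal flip (HWC)
    batch = torch.stack([base(image=v)["image"] for v in views])
    with torch.no_grad():
        return model(batch).mean(dim=0)  # average logits across augmented views
```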
A couple bonus observations:
- 😍 More challenges in areas the DrivenData community cares about, particularly conservation, energy, and medicine
- 😍 Novel competition structures for things like reinforcement learning and weak supervision
- 😍 Cross-validation gets a big shoutout! Do it. We still see it neglected (a quick sketch follows this list)
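Since cross-validation is one of the cheapest upgrades available, here's a minimal scikit-learn sketch (toy data and model, purely illustrative); the spread across folds is as informative as the mean.

```python
# Sketch: 5-fold stratified cross-validation; report the mean and the spread, not just one split.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(GradientBoostingClassifier(), X, y, cv=cv, scoring="roc_auc")
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```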
I'm proud of DrivenData's representation in the report, especially the callout to our openness.
HUGE thanks to ML Contests for all their hard work! 🙏