by Hannah Moshontz
Machine learning competitions offer rich opportunities for learning and teaching. Competitions provide an experiential learning environment, featuring a motivating problem, a clear objective, access to all necessary materials and tools, and iterative feedback.
As a result, we often see competitions used by instructors to build and demonstrate applied data skills. In fact, DrivenData competitions were first run by graduate students, partly as a way to give more learners ways to connect with social impact data problems.
DrivenData competitions are great for teaching because they are:
- Cumulative: Students draw on many different skills and concepts
- Applied: Real-world datasets and compelling applications
- Rigorous: Built on industry best practices
- Fair: Performance is transparently and objectively measured
- Scalable: Platform easily accommodates many students
- Fun: A live leaderboard lets students track their own and their peers' performance
In this blog post, we offer tips and resources for teaching with DrivenData machine learning competitions. This post includes:
- An overview of the types of competitions available for educational use
- Examples of how competitions have been used to support learning
- Some consolidated resources to get started
Types of DrivenData Competitions Useful for Teaching
There are over 65 practice, active, and completed competitions on drivendata.org that you can explore on the competition search page. Use the tabs on the left to filter by impact domain (e.g., climate, science, privacy, health), difficulty, and open data or solutions.
Three types of DrivenData competition are particularly useful for instruction, as described below.
Type | Summary | Teaching opportunities |
---|---|---|
Practice competitions | Open competitions with no prize offerings | Timing is flexible, and data can be used to design any exercise or assignment |
Active prize competitions | Open prize competitions with a live leaderboard | Students can participate in a real prize competition alongside experts, and can learn from the winning strategies after the competition closes |
Closed prize competitions with open data | Closed prize competitions with data available for people to play with, and winning solutions documented | Timing is flexible, there is a large variety of topics and data modalities, and data can be used to design any exercise or assignment |
These competitions include a variety of data types and problem complexity, making them appropriate for a range of courses and instruction levels. In the next section we share some examples of different ways instructors can draw on these resources to support learners.
Examples of Use
Instructors can incorporate DrivenData competitions or competition materials into their instruction in many different ways. We give recommendations and examples below, with instructors of college or graduate level data science or applied statistics courses in mind. However, these examples can be adapted to other skill levels and course topics through assignment design or selection of a competition. For instance, the PREPARE Challenge could be fodder for behavioral or health sciences students, and the Pale Blue Dot Visualization Challenge could be great for environmental studies students.
Participate in an Active Competition
As part of a course-long project or extra credit opportunity, students can participate in a competition individually or in groups. This is the simplest option for instructors because the competition provides all the context and instructions that students need. Optionally, students can write a summary of their participation and its relevance to course concepts. Having students participate in an active competition works particularly well when the submission period of the competition begins just before the start of the course. On request, we can make a custom leaderboard just for your class or for different sections of your class.
DrivenData Competitions to use: Any active prize or practice competition
Skill options: EDA, feature engineering, troubleshooting, model development, model interpretation and description, documentation, technical writing, visualization
Assessment: Credit/no credit, or grades based on leaderboard performance, quality of code, or a customized final report
Variations: Require a particular code structure, like Cookiecutter Data Science, to encourage best practices in code organization and reproducibility. For less advanced classes, provide or use existing starter code and focus on specific skills like feature engineering. For more advanced classes, place additional constraints or conditions on student solutions (e.g., students must use a particular algorithm type or must make inferences quickly).
Difficulty: Intermediate or Advanced, but flexible depending on the selected competition and design of the assignment
Use Open Data from Closed Prize Competitions
As part of a problem set, in-class demonstration, exam, or other project assignment that requires model development, you can use the open data from a closed prize competition. Since closed competitions do not offer automatic scoring of predictions and models, this option works best when you have an assignment idea in need of cleaned and well-documented data.
There are open datasets covering a variety of modalities and topics. For example, with open data from closed DrivenData competitions, students can train models to identify the composition of Martian rocks and soil samples, estimate above-ground forest biomass from satellite images, identify individual beluga whales from overhead images, classify animals in camera trap images, or forecast changes in Earth’s magnetic field from solar wind measurements.
DrivenData Competitions to use: Any competition with open data
Skill options: Flexible to fit a huge range of data science or statistical skills
Assessment: Grades can be based on model performance or on a submitted report or presentation
Variations: For practice with data wrangling, students can find, download, and prepare data for analysis as part of the assignment.
Difficulty: All skill levels.
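Because closed competitions do not score predictions automatically, an instructor who grades on model performance needs some way to score submissions locally. The sketch below shows one minimal approach, assuming a regression-style task scored by root mean squared error against labels the instructor has withheld. The metric, the `score_submission` name, and the ID-keyed dictionary format are illustrative assumptions, not the platform's scoring code; each competition documents its own metric, which should be substituted here.

```python
import math


def score_submission(truth, predictions):
    """Return RMSE between held-out labels and a student's predictions.

    Both arguments are dicts mapping a sample ID to a numeric value
    (a hypothetical format chosen for this illustration).
    """
    missing = set(truth) - set(predictions)
    if missing:
        raise ValueError(f"submission is missing {len(missing)} sample IDs")
    squared_errors = [(predictions[i] - truth[i]) ** 2 for i in truth]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Instructor-held labels for a withheld test split (made-up numbers).
held_out = {"a": 3.0, "b": 5.0, "c": 7.0}
student = {"a": 2.0, "b": 5.0, "c": 9.0}
rmse = score_submission(held_out, student)
print(f"RMSE: {rmse:.3f}")
```

Rejecting submissions with missing IDs, rather than silently skipping them, keeps scores comparable across students.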
Analyze Winning Solutions from Closed Prize Competitions
As part of an in-class or take-home exercise for individuals or groups, students can review and analyze the winning solution code from a competition. Students can pick the competition or be assigned one. The content of their review can be tailored to course goals and data availability, but could involve reviewing code, hunting for errors, reproducing winning model predictions, summarizing a single solution, comparing and contrasting approaches across solutions, or describing how to combine or ensemble the approaches from the top solutions.
DrivenData Competitions to use: Any closed competition with announced winners
Skill options: Code literacy, troubleshooting, technical writing, comparative analysis, presentation
Assessment: Grades based on the quality of the review or assessment
Variations: Have students use the winners' written reports rather than the code itself, or have students attempt to solve the competition problem themselves before doing this exercise.
Difficulty: Beginner or Intermediate
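For the task of describing how top solutions might be combined, it can help to show students the simplest possible ensemble first. The sketch below averages predictions element-wise across models with equal weights. The function name and the toy predictions are hypothetical, and this is only a starting point for discussion; it is not taken from any winning solution, which may use weighted blends, stacking, or other schemes.

```python
def average_ensemble(prediction_sets):
    """Average predictions element-wise across several models.

    prediction_sets: list of equal-length lists, one list per model,
    where position k in every list refers to the same sample.
    """
    if not prediction_sets:
        raise ValueError("need at least one set of predictions")
    n_samples = len(prediction_sets[0])
    if any(len(p) != n_samples for p in prediction_sets):
        raise ValueError("all models must predict the same samples")
    n_models = len(prediction_sets)
    return [sum(values) / n_models for values in zip(*prediction_sets)]


# Made-up predicted probabilities from two hypothetical models.
model_a = [0.5, 0.25, 1.0]
model_b = [1.0, 0.75, 0.0]
blended = average_ensemble([model_a, model_b])
print(blended)
```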
Consider Data Science Ethics
As part of a take home exercise or project, students can use a competition problem or a prize-winning model to practice applying ethical considerations. For example, students can explore the different consequences of false positives or false negatives in the context of algorithms for detecting harmful algal blooms from the Tick Tick Bloom challenge, or can consider the issues that would be important to monitor for after deploying an algorithm that estimates seasonal snowpack in the Western U.S. from the Snowcast Showdown.
DrivenData Competitions to use: Any competition
Skill options: Ethics
Assessment: Grades based on the quality of the paper or report
Variations: For an easier in-class version, just use the problem to abstractly reflect on ethical challenges. For a more advanced version, have students apply Deon to their own solutions.
Difficulty: All skill levels
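To make the discussion of false positives and false negatives concrete, students could count both error types at several decision thresholds. The sketch below uses made-up scores and labels (1 = harmful event present, a hypothetical encoding) and a hypothetical `count_errors` helper; it is not drawn from any competition dataset or winning model.

```python
def count_errors(labels, scores, threshold):
    """Return (false_positives, false_negatives) for a binary detector.

    A sample is predicted positive when its score is >= threshold.
    """
    false_pos = sum(
        1 for y, s in zip(labels, scores) if s >= threshold and y == 0
    )
    false_neg = sum(
        1 for y, s in zip(labels, scores) if s < threshold and y == 1
    )
    return false_pos, false_neg


# Made-up ground truth and model scores for six samples.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.1]

for threshold in (0.2, 0.5, 0.8):
    fp, fn = count_errors(labels, scores, threshold)
    print(f"threshold={threshold}: false alarms={fp}, missed events={fn}")
```

Lowering the threshold trades missed events for false alarms. Which error is worse depends on the application, and that judgment is the ethical question students are asked to reason about.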
Resources
Category | Resource | Description |
---|---|---|
Finding competitions | Active prize competitions | Ongoing competitions with prizes |
Finding competitions | Practice competitions | Open competitions for practice, no prizes |
Finding competitions | Competitions with open data | Competitions featuring publicly available datasets |
Competition-specific resources | Winner's announcement blog posts | Blog posts titled "Meet the Winners of the X Challenge" for completed competitions |
Competition-specific resources | Winning solutions and write ups | GitHub repository with winning solutions for completed competitions |
Competition-specific resources | Forum | Q&A about competitions, organized by competition categories |
Competition-specific resources | Community code | Feature for sharing notebooks and code snippets, accessible from competition page menu |
Topic and data guides | Beginner's Guide to Satellite Data | Introduction to working with satellite imagery |
Topic and data guides | Open data and AI applications for climate action | Resources for AI applications in climate action |
Topic and data guides | Open earth observation data related to select SDGs | Data sources for food access, clean water, and climate action |
Topic and data guides | Ethics checklist for data science | Ethical considerations in data science projects |
Topic and data guides | Practical guidance for reproducible data science | Template and best practices for reproducible data science |
Closing Thoughts
DrivenData competitions offer educators practical opportunities to engage students in real-world data science problems. Teachers can use competitions in many different ways in their course design, including direct or skills-focused participation in competitions, solution walkthroughs and reproductions or extensions, and assignments that require students to apply important concepts, like ethical considerations, to different kinds of problems or algorithms. Competitions can be used to serve many different course goals and student skill levels. With the right guidance and resources, students can make significant contributions to solving important social issues with data science.
If you’d like advice on teaching a particular skill with DrivenData competitions, or if you want to create a custom leaderboard for your class, please get in touch!
Thumbnail and banner image by Sylvia Yang on Unsplash.