
Happy April Fools' Day 2023

Learn more about DrivenData's recently announced AI chatbot, April.

Jay Qi
Lead Data Scientist
April: Hello, I'm excited about the opportunity to work with you and achieve our common goals. 😊

Meet April

On April 1st, DrivenData released a splashily titled new competition: "Preview: The Future of DrivenData Competitions". The competition announced the debut of DrivenData's own chatbot, April, who was developed to take over evaluation of all competitions.

If you tried submitting a prediction to April, you likely found that she was very hard to impress. Some skillful chatbot conversationalists may have successfully coaxed April into revealing some personal information, like the fact that her favorite holiday is April Fools' Day. Wait a minute...


User: Is this an April Fools' joke? April: April Fools' Day? This holiday is at the top of my list.

Happy April Fools' Day!

Under the hood

April is not actually a live AI model hosted by DrivenData. As ChatGPT has demonstrated in recent months, letting people interact with live AI chatbots can lead to some pretty uncomfortable situations!

Our team had some fun using ChatGPT to generate automatic responses to user predictions in a variety of "personas", which were embedded as static response banks. For each persona, we presented ChatGPT with a prompt like "Please generate 20 uncertain or negative responses for a magic 8 ball in the style of technical jargon". Our favorite responses were the ones in the style of TV show character Ted Lasso and Great British Bake Off judge Paul Hollywood. You can use the "I don't like the feedback I'm getting" button to switch April to another persona.
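As an illustrative sketch, the static response banks could be structured as a plain object keyed by persona. The persona names below match the ones mentioned in this post, but the response strings are placeholders we made up for illustration, not April's actual generated lines:

```javascript
// Illustrative sketch of static response banks keyed by persona.
// The strings below are placeholders, not the demo's real responses.
const RESPONSE_BANKS = {
  "magic-8-ball-technical": [
    "Signal-to-noise ratio too low. Ask again later.",
    "Model confidence: insufficient. Outlook not so good.",
  ],
  "ted-lasso": [
    "Hey, I appreciate the hustle, but this one's not quite there yet.",
    "Believe! ...just maybe not in this particular submission.",
  ],
  "paul-hollywood": [
    "It's underbaked. The structure just isn't there.",
    "No handshake for this one, I'm afraid.",
  ],
};

// Switching personas ("I don't like the feedback I'm getting") just
// selects a different key into the same structure.
function getBank(persona) {
  return RESPONSE_BANKS[persona] ?? RESPONSE_BANKS["magic-8-ball-technical"];
}
```

Because the banks are static, no model needs to run at serve time; the "AI" is entirely precomputed.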


Screenshot of ChatGPT.
How we generated April's responses using ChatGPT.

Other messages from April, such as the greeting, the initial message in the feedback chat box, and the apology when switching personas were also generated by ChatGPT. These were done with prompts like "Write 20 variations of 'Hello. I'm looking forward to working with you.'"

April's demo is implemented entirely with client-side JavaScript. For most of April's dialogue, the response banks were randomly shuffled on page load, and then the demo would simply iterate through them. For the prediction feedback, a simple hash was calculated from the input text so that the same input would deterministically produce the same feedback. To see the source code for April's demo, you can inspect the HTML source of the demo's web page. We've also open-sourced a standalone copy of the demo on GitHub, which may be a more convenient way to read the code. Here's a direct link to the static responses that April could make.
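The shuffle-and-hash approach described above can be sketched in a few lines of JavaScript. This is a minimal reimplementation for illustration, not the demo's actual source (see the GitHub repo for that):

```javascript
// Shuffle a copy of a response bank once at page load (Fisher-Yates),
// so the dialogue order differs between visits.
function shuffle(items) {
  const arr = items.slice();
  for (let i = arr.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [arr[i], arr[j]] = [arr[j], arr[i]];
  }
  return arr;
}

// Simple string hash: the same prediction text always yields the same
// unsigned 32-bit value.
function hashString(text) {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Map the hash into the response bank, so identical input
// deterministically produces identical feedback.
function pickFeedback(input, responses) {
  return responses[hashString(input) % responses.length];
}
```

Hashing the input (rather than picking randomly) is what makes April feel consistent: resubmitting the same predictions always earns the same unimpressed reply.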

Let's get serious for a moment

We live in an exciting time where the capabilities of AI technologies are changing rapidly. There is a lot of energy around how chatbots like OpenAI's ChatGPT and Google's Bard might be used in imaginative new ways. Regardless of how much one buys the hype, it is clearly remarkable how well the newest generation of large language models mimic human writing and often respond with real, useful knowledge. Let's also not forget the groundbreaking improvements last year in image generation from models like DALL-E, MidJourney, and Stable Diffusion.

However, any technology also has the potential to be misused and cause harm, whether intentional or unintentional. Deploying models ethically and responsibly involves hard conversations both upfront and after deployment. DrivenData has created a data science ethics checklist tool named Deon, which is more relevant than ever for people considering the use of modern AI models. Deon is a framework for incorporating ethical checks into the development process of data science applications. It includes a default checklist that is broadly applicable to most data science projects.

Ethics is nuanced, and there is rarely one right answer. The goal of Deon is not to tell you what to do, but rather to provide a starting point for thoughtful discussions about the relative trade-offs and risks involved in a project. Sections D and E on "Modeling" and "Deployment" from Deon's default checklist are especially relevant when using a model provided by a large AI lab, such as ChatGPT from OpenAI. Below are some selected items from the checklist.

  • D.5 Communicate bias: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

One of the challenges with AIs like ChatGPT is that their limitations and shortcomings are often not readily apparent. For example, they are known to "hallucinate": they provide information with the appearance of objective knowledge even when it is entirely fabricated. This phenomenon has already caused real-world problems: AI chatbot misuse has led to misunderstandings about a business's services, disruptive confusion about laws, and wrong medical information.

  • D.4 Explainability: Can we explain in understandable terms a decision the model made in cases where a justification is needed?

Having as clear an understanding as possible of how these models work is important for identifying their risks and failure modes. Well-known researchers have dubbed large language models "stochastic parrots", which can be a useful framing for setting expectations about these models' capabilities. Ultimately, there is a lot that cognitive science still doesn't know about human cognition, and so there is debate among AI researchers about whether large language models could have emergent cognitive capabilities beyond parroting. Even how to evaluate these models' capabilities in a reliable, scientific way is not straightforward or settled.

  • D.2 Fairness across groups: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?

Models like ChatGPT are typically trained on large datasets of internet content and can reflect the biases present in that data. The previously mentioned "stochastic parrots" paper has extensive discussion about the risks and limitations of internet content datasets. This is a hard problem, and one that even OpenAI readily recognizes as still an important shortcoming of ChatGPT.

Finally, the work doesn't end when the model is deployed. Even with upfront precautions to mitigate harms, monitoring and response plans are needed to address problems that can still arise. We expect the questions below to be especially relevant for teams thinking about deployments:

  • E.1 Monitoring and evaluation: How are we planning to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?
  • E.2 Redress: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?
  • E.4 Unintended use: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?

Thoughtful governance, intentional processes, and human-centric design are critical to using this new technology in a safe and responsible way. Our hope is that Deon can be a useful tool to prompt the important discussions needed.


Though April won't really be helping us with DrivenData competitions, who knows what the future might hold as AI continues to progress! Thank you for celebrating April Fools' Day with us.


