
Data Tornadoes: What the flurry of AI ethics conversations misses

This era of generative AI is changing how we think and talk about AI ethics, but we shouldn't lose sight of what matters.

Peter Bull
Co-founder

The ChatGPT AI explosion is changing the questions we ask about data ethics. For example: what is consciousness?

JK JK JK JK. This is not a troll piece. I promise this is real and practical. But, it’s not going to start that way, so stick with me.

There has been a lot of talk about emergent properties recently, which is really bringing me back to my undergrad philosophy days. Is consciousness (or, to a lesser degree, reasoning) an emergent property of neural nets—specifically LLMs like ChatGPT?

In a philosophy class, one of the classic examples when discussing emergent properties is a tornado. A tornado is made up of debris and dust and air particles. But, the behavior of the tornado is not dependent on any particular particle in the tornado. The interactions in the system define the behavior of a tornado, define what it means to be a tornado. The property of being a tornado is emergent from the system.

The analogy to LLMs is simple: just as no single particle explains the tornado, no single node in a neural network explains how the network works. So philosophers ask the question: How do we explain where tornado-ness comes from?

I took an entire class on the 17th-century Dutch philosopher Baruch de Spinoza, and he tackled the problem of emergent properties by saying that every particle is God. Not a piece of God, just is God. This leads to very sensible conclusions, like the conclusion that chairs think. No, seriously, Spinoza believed that chairs think. I guess that if you believe that chairs think, you may not be too worried about neural networks also thinking.

But, as we see right now, lots of people are worried about neural networks thinking. So, let’s talk about those worries. I think some very important properties may be emergent, and that AGI is potentially dangerous, and yada yada yada future of the human race. However, these conversations are drowning out the real, substantive harms of AI systems that are already here. There has been a flurry of media attention to AI alignment and longtermist risks, and a good portion of the ChatGPT conversations I have jump to “killer AI” in about two leaps from “it drafted my email.” But that misses the well-understood harms that are right in front of us, and these are only going to get worse. Here’s why.

In the case of ChatGPT, the scale of the model was one of the critical breakthroughs that made a difference in degree look like a difference in category, largely because it resulted in specific emergent properties. For example, the emergent property of convincing an otherwise skeptical journalist that it was in love with him. There’s another difference of degree that is ramping up the harm caused by AI.

It used to be the case that there was a talent crunch on data scientists and ML engineers who could build AI systems. Now, there are probably hundreds of AI start-ups launched a day built with no ML knowledge or background, just enough software engineering to use an API key. This has meant the scale and scope of AI applications are also ballooning.
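To make that concrete, the sketch below is roughly the entire "ML stack" of many of these start-ups. It's a minimal sketch using the OpenAI Python client; the draft_email helper, the model choice, and the prompts are illustrative, not any particular product.

```python
# A minimal sketch of an "AI product": the whole ML stack is one API call.
# Assumes the openai package is installed and OPENAI_API_KEY is set in the
# environment. The helper, model name, and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def draft_email(notes: str) -> str:
    """Turn rough notes into a polished email draft."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # any hosted chat model works here
        messages=[
            {"role": "system", "content": "You draft professional emails."},
            {"role": "user", "content": notes},
        ],
    )
    return response.choices[0].message.content


print(draft_email("- running late\n- ask to move the meeting to Thursday"))
```

No training data, no evaluation, no model to understand: everything hard lives on the other side of the API key. And so does everything that would prompt a builder to think about the risks.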

Five years ago we started a project called Deon, an ethics checklist for data scientists. The purpose was simple: catalog the harms specific to machine learning in a way that helps practicing data scientists mitigate those harms. It was meant to be a grassroots way of building data ethics into projects. For each checklist item, we provide examples where ignoring that item resulted in harm. We architected the checklist around the data science process so you could easily map the relevant concerns to wherever you were in your project. Being grassroots was important: a checklist like this was easy to socialize within the relatively small group of data science practitioners working in good faith to build responsibly. That is not how data products on top of LLMs are being built now. And, importantly, that is not who is building them.

Our theory of change was that we could harness the community of data scientists to be stewards of data ethics by integrating it into their workflow. We spent a lot of time on talks and workshops at gatherings of AI practitioners, like this one from my colleague Emily Miller called “Actionable Ethics in AI.” We talked about a long history of things gone wrong and the practices that address them: preventing unintentional leaks of PII, measuring the fairness of algorithms across groups, explaining model inferences in critical decision-making scenarios, preventing unintended uses and abuses of models, and making sure model results present uncertainty accurately. What we did not anticipate was how quickly AI deployment would shift to a broader audience of software engineers who aren’t attuned to these risks. There’s no gatekeeping now that is going to change that, and there’s no pause in model training that is going to make a difference.

Deon’s approach fails to meet the moment. The potential for harm no longer takes a direct route through a small community of practitioners that we can easily transmit ideas to. I now think that the only path forward is not practitioner-led. Instead, organizations and institutions need to take data ethics seriously. We need AI governance policies at companies that emphasize responsible deployment. We need institutional watchdogs that monitor and report on what companies and governments are doing. We need real government regulation with actual teeth that forces company lawyers to hold back engineering teams. We need expert journalists who can explain these complex and technical stories of harm to the public. These interventions need to not be zero-interest-rate phenomena. The scope of the controls needs to match the scope of the tools themselves. I do think there’s some reason to be optimistic this is possible, but that’s a topic for another post.

The tornado as the introductory metaphor was not an accident. A tornado can do a lot of damage without us needing to know if it is capable of thinking, or if it “wants” to destroy us. Which, in Spinoza’s view, it very well might.
