The organization
BetterUp Labs is the research and development arm of BetterUp, a platform that combines coaching, AI, and behavioral science to drive personal and professional growth. With expertise spanning business, academia, and behavioral science, the Labs team generates research-driven insights using the world’s largest real-time dataset on workplace coaching and performance. Its pioneering work shapes BetterUp’s coaching products and methodologies.
The challenge
To advance the science of coaching, BetterUp Labs set out to explore the deep dynamics of human conversation. This required designing and executing the collection and analysis of large-scale multimodal (video, audio, text) data from the ground up. The team needed robust technical infrastructure and advanced machine learning methods to transform this raw data into meaningful scientific insights.
The approach
BetterUp Labs partnered with DrivenData on a multi-year R&D initiative to make this vision a reality. Our work focused on three core areas:
- Building the end-to-end data pipeline
We engineered a robust system to handle the full lifecycle of more than 1,000 hours of newly collected multimodal conversational data. This pipeline supported raw data ingestion, validation, cross-modal synchronization, and complex feature extraction—creating the CANDOR Corpus, a novel and large-scale research asset.
- Applying advanced AI/ML for multimodal analysis
We developed and validated models to extract deep insights across modalities:
- Visual: Gaze direction, smiles, head gestures
- Acoustic: Vocal prosody and paralinguistic features
- Textual and Semantic: Sentiment analysis, emotion recognition, semantic embeddings
- Conversational Dynamics: Turn-taking behaviors and transcription enhancement
- Delivering the research toolkit
We created a comprehensive Python toolkit that gave researchers streamlined access to every layer of the data—from raw media and transcripts to structured multimodal features and metadata. The research toolkit enabled ML-powered feature generation and accelerated exploration of the CANDOR Corpus.
Throughout the project, we collaborated closely with BetterUp Labs on experimental design and analytical approaches to ensure the tools directly supported their scientific goals.
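To make the "turn-taking behaviors" item above concrete, here is a minimal sketch of one such measure: the gap (or overlap) between speakers at each turn transition, computed from timestamped utterances. It is an illustration only, not the project's actual toolkit code. The `Utterance` record, the function names, and the example timings are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Utterance:
    """Hypothetical record for one transcribed utterance (times in seconds)."""
    speaker: str
    start: float
    end: float


def turn_transitions(utterances):
    """Return (previous_speaker, next_speaker, gap_seconds) per speaker change.

    A positive gap is silence between turns; a negative gap means the next
    speaker started before the previous one finished (an overlap).
    """
    ordered = sorted(utterances, key=lambda u: u.start)
    transitions = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker != nxt.speaker:
            transitions.append((prev.speaker, nxt.speaker, nxt.start - prev.end))
    return transitions


def summarize(transitions):
    """Aggregate transition gaps into simple conversation-level features."""
    gaps = [gap for _, _, gap in transitions]
    if not gaps:
        return {"n_transitions": 0, "mean_gap": None, "overlap_rate": None}
    return {
        "n_transitions": len(gaps),
        "mean_gap": mean(gaps),
        "overlap_rate": sum(gap < 0 for gap in gaps) / len(gaps),
    }


# Made-up example data, not drawn from the corpus.
example = [
    Utterance("A", 0.0, 2.0),
    Utterance("B", 2.25, 4.0),
    Utterance("B", 4.5, 5.0),   # same speaker continues: not a transition
    Utterance("A", 4.75, 6.0),  # starts before B finishes: an overlap
]
features = summarize(turn_transitions(example))
```

In this sketch, consecutive utterances by the same speaker are not counted as transitions, and overlaps are kept as negative gaps rather than clipped to zero so that interruptions remain visible in the summary.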
Topics in the CANDOR corpus track major societal shifts in 2020, with COVID-19 rapidly emerging as a dominant subject alongside enduring themes like family, and spikes in discussion around the presidential election and policing reflecting national attention.
The results
- The CANDOR corpus and pipeline
We delivered a unique, 1,000+ hour dataset of real-world conversations between Americans, collected in 2020 and enriched with hundreds of behavioral measures. By automating validation and processing, the pipeline ensured consistent data quality and made it easier for researchers to build on the corpus. The dataset has since supported a range of downstream applications, including the interactive storytelling project Hello Stranger, bringing insights from real conversations to broader audiences.
- The research toolkit and algorithms
The toolkit gave researchers programmatic access to the multimodal data and to the machine learning algorithms used for feature extraction, supporting analysis of conversational behavior across the visual, acoustic, and textual signals described above.
- Landmark scientific contributions
Our technical work led directly to the CANDOR corpus and its publication in Science Advances, a leading, high-impact journal from the American Association for the Advancement of Science (AAAS). The dataset and tools continue to support research into the science of conversation and behavior.
- Foundation for future innovation
This collaboration established a lasting technical foundation—including data, tools, and validated methods—that continues to support BetterUp Labs' cutting-edge research and product innovation in conversational AI.
"DrivenData is consistently excellent—world-class caliber. It's so rare to find the mix of technical skill, critical thinking, business savvy, and just-plain-fun-to-work-with vibes that hallmark their entire team. Whenever we have a gnarly problem to solve at the intersection of AI, analytics, and real-world application, my first thought is, 'Call DrivenData!' True data science superheroes."