
Community Spotlight: Paola Ruiz, Néstor González, Daniel Crovo

The Community Spotlight features fantastic members from our DrivenData community. Three members of the IGCPHARMA team, Paola Ruiz, Néstor González, and Daniel Crovo, talk to us about data science, drug discovery, diverse databases, and more!

The Community Spotlight celebrates the diversity of expertise, perspectives, and experiences of our community members. In this post we sit down with Paola Ruiz, Néstor González, and Daniel Crovo, members of the IGCPHARMA team that earned prizes in both Phase 1 and Phase 2 of the PREPARE Challenge.



Names: Paola Ruiz Puentes, Néstor González, Daniel Andrés Crovo Pérez

Hometown: Bogotá, Colombia


To get started, tell us a little about yourself!

Paola: I am a biomedical engineer living in Bogotá, Colombia, with an M.Sc. in biomedical engineering focused on AI applications in healthcare. Currently, I am the AI Manager at IGC Pharma, where I am working on an AI model for predicting cognitive decline in at-risk individuals. Outside of work, I enjoy volunteering at my local church, reading everything from academic papers to science-fiction novels, and watching K-dramas.

Néstor: I live in Bogotá, Colombia. Most of my family lives here, and it’s where I’ve lived most of my life. I am a systems and computing engineer. Currently, I’m working as an AI Engineer at IGC Pharma, a pharmaceutical company committed to improving the lives of those affected by neurodegenerative diseases, especially Alzheimer’s disease (AD). I prototype and develop AI models that can provide insights into AD and into the cognitive side of the general aging process. Outside of work, I enjoy cooking and watching TV shows, and most recently, I’ve been training for a half marathon.

Daniel: I am an electronics engineer living in Bogotá, Colombia, with a master’s degree in Artificial Intelligence and Electronics Engineering. I work as an AI Engineer at IGC Pharma, where we are advancing Alzheimer's disease research with AI. My primary focus is using machine learning to optimize clinical trials and training large language models on specialized Alzheimer's disease and aging-related datasets. Outside of work, I enjoy reading (particularly about technology, science, and philosophy), exercising, and playing guitar.

How did you get started in data science?

Paola: I’m passionate about drug discovery. During my undergrad I conducted wet‑lab research on targeted drug delivery, and it remained my primary focus at the start of my master’s, although I also began learning the basics of AI. When the COVID pandemic closed the labs, I dedicated myself to AI and discovered its power for drug discovery, disease diagnosis, and treatment optimization. Once the labs reopened, I chose to stay on the AI path and changed the main track of my degree. Today, my main focus is applying artificial intelligence to accelerate drug discovery and improve disease diagnosis, specifically for Alzheimer’s disease.

Néstor: Although I’m currently working as an AI Engineer, data science is part of my daily life, and it has been since my undergraduate studies. Even in classes that weren’t focused on data, many of the concepts were applied to data that I had to filter, clean, and review in detail before I could use it. During my thesis, which focused on epilepsy lesions, my first steps were finding good-quality data and then selecting the appropriate targets, which gave me a solid base for developing AI models. In my current job, I face challenges such as finding databases, harmonizing them, and incorporating them into our AI pipelines.

Daniel: I got started in data science during the last years of my undergraduate studies, when I first learned about machine learning. I kept going through self-learning, investing time in online courses and projects to build my skills. Then I worked as a Research Engineer implementing real-time deep learning systems for waste product classification. During that time, I decided to start a master’s degree in Artificial Intelligence, where I had the opportunity to work in different areas, such as NLP and deep learning for medical applications. That ultimately led to my current role as an AI Engineer at IGC Pharma.

Data science is a broad field. What areas are you particularly interested in? Are there particular problems you’re interested in solving?

Daniel: Overall, I am interested in computer vision and NLP applied to the health and medical domains. I am particularly interested in early detection of neurological diseases and in creating and finding new architectures that are inherently interpretable and provide clear, factual insights, since this is of the utmost importance in these domains. My focus is on developing AI solutions that not only detect patterns in medical data but also explain their reasoning in ways that healthcare professionals can trust.

What motivated you to join a DrivenData competition?

Paola: My motivation for joining the EUREKA Challenge comes from my interest in building AI-driven early-detection tools for populations disproportionately affected by Alzheimer’s and related dementias, which was the focus of the challenge. For example, regions such as Antioquia, Colombia, are affected by a hereditary mutation that causes early-onset Alzheimer’s, yet such communities are often missing from the large datasets used to train most models. The same is true of other communities that, despite being among the most affected by this disease, are the least represented in the databases. Contributing solutions that improve diagnosis and prognosis in underrepresented countries, and ultimately improve outcomes where the need is greatest, feels both meaningful and urgent.

Néstor: I participated in this challenge as part of a group representing IGC Pharma. We were motivated to test our ability to develop cutting-edge models related to Alzheimer’s disease and cognitive decline. We also wanted people to get to know the important work we’re doing at IGC and to position ourselves as an AI-driven pharmaceutical company.

Daniel: I enjoyed working on the EUREKA Phase 2 Challenge. I found the approach of classifying mild cognitive impairment (MCI) and AD from raw audio data very innovative. Working with audio is also always fun for me: my training in audio engineering gave me additional knowledge about how to process the audio files.

After training our first baselines, we reviewed every audio file and discovered that some contained only noise, with no intelligible speech. We excluded these files from training and applied a voice enhancement pipeline to reduce noise and improve overall audio quality. We also used audio augmentation in our data loaders to reduce overfitting and make the models more robust to variability in the data.

We experimented with several models (wav2vec, HuBERT, Whisper) and modalities (text, audio features, and raw audio). After extensive experimentation, we found that Whisper Distilled v3 using only the raw audio delivered the best performance.
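The team doesn't share its preprocessing code, but the general idea of screening out noise-only recordings and augmenting the rest can be sketched in plain Python. This is a minimal illustration under our own assumptions, not IGC Pharma's pipeline: audio is a list of float samples in [-1, 1] at an assumed 16 kHz, a recording is flagged when its loudest frames barely rise above its quietest ones, and augmentation is limited to a random gain plus light Gaussian noise. The helper names, frame length, ratio threshold, and noise level are all hypothetical values chosen for the example.

```python
import math
import random

# Hypothetical parameters: 400-sample frames (25 ms at an assumed 16 kHz rate)
# and a loud-to-quiet energy ratio of 3.0. These would need tuning on real data.
FRAME_LEN = 400
RATIO_THRESHOLD = 3.0


def frame_rms(samples, frame_len=FRAME_LEN):
    """Root-mean-square energy of consecutive, non-overlapping frames."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def looks_noise_only(samples, frame_len=FRAME_LEN, ratio_threshold=RATIO_THRESHOLD):
    """Flag a recording whose loudest frames barely rise above its noise floor.

    Speech alternates between voiced bursts and pauses, so its loudest frames
    sit far above its quietest ones; steady background noise does not.
    """
    rms = sorted(frame_rms(samples, frame_len))
    if not rms:
        return True  # too short to contain a single full frame
    k = max(1, len(rms) // 10)
    floor = sum(rms[:k]) / k + 1e-12  # mean of the quietest 10% of frames
    peak = sum(rms[-k:]) / k          # mean of the loudest 10% of frames
    return peak / floor < ratio_threshold


def augment(samples, rng, max_gain_db=6.0, noise_std=0.005):
    """Random gain plus additive Gaussian noise, clipped to [-1, 1]."""
    gain = 10 ** (rng.uniform(-max_gain_db, max_gain_db) / 20)
    return [max(-1.0, min(1.0, gain * x + rng.gauss(0.0, noise_std))) for x in samples]


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16000
    noise = [rng.gauss(0.0, 0.01) for _ in range(sr)]
    # Synthetic stand-in for speech: the same noise floor plus two loud tone bursts.
    bursty = [
        x + (0.3 * math.sin(2 * math.pi * 220 * i / sr)
             if 4000 <= i < 8000 or 10000 <= i < 12000 else 0.0)
        for i, x in enumerate(noise)
    ]
    print(looks_noise_only(noise), looks_noise_only(bursty))  # True False
    augmented = augment(bursty, rng)
```

An energy ratio like this can't tell speech apart from other loud sounds such as door slams or music, so flagged files would still need a listen, much like the manual review described above.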

Did you have any domain expertise, or any particular insights from exploring the data, that helped you solve the problem?

Paola: One of our biggest advantages was pairing real clinical experience with a careful data exploration process. We built a team of neuropsychiatrists, clinical psychologists, and AI experts who helped us design our approach in both Phase 1 and Phase 2 of the challenge to enable early detection of cognitive decline. At the same time, we thoroughly reviewed the raw data to identify and correct issues such as mislabeled samples, inconsistent recording conditions, and language biases. By addressing these issues early, we were able to propose adjustment strategies and train robust models.

Daniel: I had some previous training in audio engineering, which gave me prior knowledge of how to approach the voice enhancement process. Excluding the recordings that were just noise improved model performance. Additionally, by digging deeper into the data, we found class imbalances that could introduce bias into the model. For example, the Mandarin audio files in the training dataset had only one of the three labels.
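An imbalance like the Mandarin one can be spotted with a simple per-group tally before any training starts. The sketch below assumes each training example has been reduced to a (language, label) pair and that the three classes are named control, MCI, and ADRD; those label names, the helper functions, and the toy records are hypothetical and do not reflect the challenge's actual data format or contents.

```python
from collections import Counter, defaultdict

# Assumed names for the three diagnostic classes; adjust to the dataset's own labels.
LABELS = ("control", "MCI", "ADRD")


def label_counts_by_group(records):
    """Count labels per group from (group, label) pairs, e.g. (language, diagnosis)."""
    counts = defaultdict(Counter)
    for group, label in records:
        counts[group][label] += 1
    return counts


def missing_labels(counts, labels=LABELS):
    """For each group, list the labels that never occur in it."""
    return {group: [lab for lab in labels if c[lab] == 0] for group, c in counts.items()}


if __name__ == "__main__":
    # Made-up toy records with placeholder language codes, not the challenge data.
    records = [
        ("lang_a", "control"), ("lang_a", "MCI"), ("lang_a", "ADRD"),
        ("lang_b", "control"), ("lang_b", "ADRD"),
        ("lang_c", "control"), ("lang_c", "control"),
    ]
    counts = label_counts_by_group(records)
    for group, missing in missing_labels(counts).items():
        print(group, dict(counts[group]), "missing:", missing)
```

A group that carries only one label lets a model treat language as a shortcut for diagnosis, so such groups are candidates for stratified splits, reweighting, or at least a caveat when reporting results.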

What was the most important technical tip or trick you used to solve the problem? Non-technical?

Néstor: Building on previous work is always a good starting point for any challenge. Before developing and testing AI models, we started by identifying the state of the art in speech-driven cognitive assessment and its intersection with AI.

Daniel: The most important technical tip was to establish baseline performance with simpler models, then experiment with pretrained models and build from there. Non-technically, the key was to thoroughly inspect and understand the data before trying to train big models. This approach saved us time and resources by surfacing issues early in the process.
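A baseline can be as simple as predicting the training-set class frequencies for every recording. The sketch below computes that prior-only baseline and scores it with multiclass log loss; we assume log loss as the metric here purely for illustration, and the labels are made up. Any more complex model worth keeping should beat this number.

```python
import math
from collections import Counter


def class_priors(train_labels):
    """Class frequencies in the training labels: the whole 'model' of a prior baseline."""
    counts = Counter(train_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    predicted_probs holds one {label: probability} dict per example; a label
    missing from a dict is treated as probability eps to avoid log(0).
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        p = min(max(probs.get(label, 0.0), eps), 1.0)
        total -= math.log(p)
    return total / len(true_labels)


if __name__ == "__main__":
    train = ["control", "control", "MCI", "ADRD"]  # made-up labels
    val = ["control", "MCI", "control", "ADRD"]
    priors = class_priors(train)
    baseline = multiclass_log_loss(val, [priors] * len(val))
    print(f"prior baseline log loss: {baseline:.4f}")  # prints "prior baseline log loss: 1.0397"
```

Because this baseline ignores the audio entirely, the gap between its score and a trained model's score shows how much signal the model is actually extracting from the recordings.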

Shout outs: Any good tools, posts, or projects from other developers that you appreciate or think the community might enjoy?

Paola: For drug discovery, it’s worth familiarizing yourself with the latest AI‑driven platforms. For protein folding, there are DeepMind’s AlphaFold, the University of Washington’s RoseTTAFold, and Meta’s ESMFold. For genomics, there are the Evo 2 genomic language model and ATOMICA’s RNA modeling toolkit. Finally, there is Insilico Medicine’s Pharma.AI suite for end-to-end therapeutic design.

What advice (career or otherwise) would you give to your younger self?

Néstor: I would tell myself to always look for opportunities for growth, whether they are inside or outside the institutions I’ve been part of, and to talk more with people who have experience or knowledge in the fields and opportunities that motivate me.

Daniel: To develop the product and service ideas I had some years ago.

Data science as a field has been struggling with a well-documented lack of diversity. How have you seen diversity and representation, or the lack thereof, impact data science problems?

Paola: In medical research, most large‑scale databases still come from high‑income countries and represent mostly white populations. This limited diversity makes it hard for AI models, aimed at improving diagnosis and treatment, to generalize to underrepresented and often more vulnerable groups. Multiple programs are helping to broaden the genetic and clinical data landscape, but current efforts have not yet achieved the level of inclusivity needed for robust, widely applicable models. Investment in recruiting and integrating diverse cohorts is essential to ensure AI‑driven advances benefit all populations.

Néstor: I have seen this issue very clearly in some of our AI work at IGC. We always look for the most diverse and complete databases we can find. However, that is not an easy task, and many times we have to settle for the available data, even though some groups are underrepresented. A very clear example relates to the database we proposed for the first phase of this challenge. We found a Mexican database on aging and cognition, with which we aimed to balance the representation of the Latino population across all the databases we had found. More broadly, there is a lack of large-scale, harmonized data on Latino populations, which are known to have very different elder-care systems, for example, caring for older relatives in a family household rather than in nursing homes. Also, Latinos are known to be more likely to develop Alzheimer’s disease, which is why we are committed to including these groups in our models.

What hurdles have you had to overcome to become a data scientist? What advice would you give to others facing the same challenges?

Daniel: Start by learning machine learning, deep learning, and AI in general. Get a good understanding of the concepts and implement things from scratch if possible. Then develop end-to-end projects, including a UI for interacting with the model.

If you could run a competition, what would it be about?

Paola: I would like to run a competition to advance targeted drug delivery. Today’s medications often cause significant off-target and systemic adverse effects, but by engineering more precise delivery vehicles, we can greatly reduce those complications. Modern technologies, such as nanoparticle carriers, antibody–drug conjugates, and AI-driven design, will allow us to tailor delivery systems to each therapeutic use case and maximize both safety and efficacy.

Have you read any good books or articles recently?

Daniel: I am reading Proofs: A Long-Form Mathematics Textbook by Jay Cummings. It is a very fun and easy way to learn about mathematical rigor and proofs.

Where can the community find you online?


Thanks to Paola, Néstor, and Daniel (@IGCPHARMA) for sharing their experiences and advice with us! We are excited to feature more great community members. If you think you or someone you know would be a great addition to a future Community Spotlight, let us know!

