The Community Spotlight celebrates the diversity of expertise, perspectives, and experiences of our community members. In this post we sit down with Paola Ruiz, Néstor González, and Daniel Crovo, members of the IGCPHARMA team that earned prizes in both Phase 1 and Phase 2 of the PREPARE Challenge.
Names: Paola Ruiz Puentes, Néstor González, Daniel Andrés Crovo Pérez
Hometown: Bogotá, Colombia
To get started, tell us a little about yourself!
Paola: I am a biomedical engineer living in Bogotá, Colombia, with an M.Sc. in BME focused on AI applications in healthcare. Currently, I am the AI Manager at IGC Pharma, where I am working on creating an AI model for predicting cognitive decline in at-risk individuals. Outside of work, I enjoy volunteering at my local church, reading everything from academic papers to science-fiction novels, and watching K-dramas.
Néstor: I live in Bogotá, Colombia. Most of my family lives here and it’s where I’ve lived most of my life. I am a systems and computing engineer. Currently, I’m working as an AI Engineer at IGC Pharma, a pharmaceutical company invested in improving the lives of those affected by neurodegenerative diseases, especially Alzheimer’s disease (AD). I work on prototyping and developing AI models that can provide insights into AD and the general aging process as it relates to cognition. Outside of work, I enjoy cooking, watching TV shows, and most recently, training for a half marathon.
Daniel: I am an electronics engineer with a master’s degree in Artificial Intelligence and Electronics Engineering living in Bogotá, Colombia. I work as an AI Engineer at IGC Pharma, where we are advancing Alzheimer's Disease research with AI. My primary focus is leveraging machine learning to optimize clinical trials and to train large language models with specialized Alzheimer's Disease and aging-related datasets. Outside of work, I enjoy reading (particularly in the fields of technology, science, and philosophy), exercising, and playing guitar.
How did you get started in data science?
Paola: I’m passionate about drug discovery. During my undergrad I conducted wet-lab research on targeted drug delivery, and at the beginning of my master’s it remained my primary focus, though I had started learning the basics of AI. When the COVID pandemic closed the labs, I dedicated myself to AI and discovered its power for drug discovery, disease diagnosis, and treatment optimization. Once laboratories reopened, I elected to continue on the AI path and changed the main track of my degree. Currently, my main focus is applying artificial intelligence to accelerate drug discovery and enhance disease diagnosis, specifically for Alzheimer’s Disease.
Néstor: Although I’m currently working as an AI Engineer, data science is part of my daily life, and it has been since my undergrad. Even in classes that weren’t focused on data, many of the concepts were applied to data that I had to filter, clean, and review in detail before I could use it. During my thesis, which focused on epilepsy lesions, my first steps were finding good-quality data and then selecting the appropriate targets to build a solid base for developing AI models. In my current job, I face challenges like finding databases, harmonizing them, and incorporating them into our AI pipelines.
Daniel: I got started in data science during the last years of my undergraduate studies, when I first learned about machine learning. I continued through self-learning, investing time in online courses and projects to build my skills. I then worked as a Research Engineer implementing real-time deep learning systems for waste product classification, and decided to start my master’s degree in Artificial Intelligence, where I had the opportunity to work in areas such as NLP and deep learning for medical applications. That path ultimately led to my current role as an AI Engineer at IGC Pharma.
Data science is a broad field. What areas are you particularly interested in? Are there particular problems you’re interested in solving?
Daniel: Overall, I’m interested in computer vision and NLP applied to the health and medical domains. I am particularly interested in working on early detection of neurological diseases and in finding and creating new architectures that are fundamentally interpretable, providing clear and factual insights, which is of the utmost importance in these domains. My focus is on developing AI solutions that can not only detect patterns in medical data but also explain their reasoning in ways that healthcare professionals can trust.
What motivated you to join a DrivenData competition?
Paola: My motivation for joining the EUREKA Challenge comes from my interest in building AI-driven early-detection tools that serve populations disproportionately affected by Alzheimer’s and related dementias, which was the focus of the challenge. For example, the region of Antioquia in Colombia carries a hereditary mutation that causes early-onset Alzheimer’s, yet such communities are often missing from the large datasets used to train most models. The same happens with other communities that are among the most affected by this disease yet the least represented in the databases. Contributing solutions that enhance diagnosis and prognosis in underrepresented countries and ultimately improve outcomes where the need is greatest feels both meaningful and urgent.
Néstor: I participated in this challenge as part of a group representing IGC Pharma. We were motivated to participate to test our abilities and knowledge in developing cutting-edge models related to Alzheimer’s disease and cognitive decline. We also want people to get to know the important work we’re doing at IGC and to position ourselves as an AI-driven pharmaceutical company.
Daniel: I enjoyed working on the Eureka Phase 2 Challenge. I found the approach of MCI and AD classification based on raw audio data to be very innovative. Additionally, working with audio is always fun for me, as I have audio engineering training that gave me additional knowledge of how to process the audio files.
After training our first baselines, we performed a complete review of each audio file and discovered that some contained only noise without any intelligible speech. We decided to exclude these files from training and apply a voice enhancement processing pipeline to reduce noise and improve the overall audio quality. Additionally, we used audio augmentation in our data loaders to reduce overfitting and make the models more robust against data variability.
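To make that concrete, here is a minimal sketch (not the team’s exact pipeline) of this kind of data loading: a rough energy-based screen to flag clips with no intelligible speech, and light on-the-fly waveform augmentation (random gain plus additive noise) using PyTorch and torchaudio. The file list, thresholds, and augmentation strengths are illustrative assumptions.

```python
# Illustrative sketch only: dataset paths, labels, thresholds, and
# augmentation parameters below are assumptions, not the team's exact setup.
import torch
import torchaudio
from torch.utils.data import Dataset


def looks_like_noise_only(path, energy_thresh=1e-4):
    """Rough screen for clips with no intelligible speech (near-silent audio).
    Flagged files would still be reviewed by hand before exclusion."""
    wav, _ = torchaudio.load(path)
    energy = wav.pow(2).mean(dim=0)                   # per-sample energy
    frames = energy.unfold(0, 400, 200).mean(dim=-1)  # ~25 ms frames at 16 kHz
    return frames.max().item() < energy_thresh


class SpeechClipDataset(Dataset):
    """Loads (path, label) pairs, resamples to 16 kHz, and augments during training."""

    def __init__(self, file_label_pairs, target_sr=16_000, train=True):
        self.items = file_label_pairs
        self.target_sr = target_sr
        self.train = train

    def _augment(self, wav):
        # Random gain and additive Gaussian noise: cheap augmentations that
        # make models more robust to recording-condition variability.
        gain = torch.empty(1).uniform_(0.8, 1.2)
        noise = torch.randn_like(wav) * 0.005
        return wav * gain + noise

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        path, label = self.items[idx]
        wav, sr = torchaudio.load(path)
        wav = torchaudio.functional.resample(wav, sr, self.target_sr).mean(dim=0)
        if self.train:
            wav = self._augment(wav)
        return wav, label
```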
We experimented with several models (wav2vec, HuBERT, Whisper) and modalities (text, audio features, and raw audio). After extensive experimentation, we found that Whisper Distilled v3 using only the raw audio delivered the best performance.
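For readers who want to try something similar, here is a minimal sketch of using a distilled Whisper encoder for audio classification with Hugging Face transformers. The checkpoint name, the three-class label setup, and the 16 kHz input are assumptions for illustration; the team’s exact configuration isn’t described here, and the classification head below starts from random weights and would need to be fine-tuned on the challenge data.

```python
# Illustrative sketch: checkpoint, label count, and sampling rate are assumptions.
import torch
from transformers import AutoFeatureExtractor, WhisperForAudioClassification

checkpoint = "distil-whisper/distil-large-v3"  # a distilled Whisper v3 checkpoint
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)
model = WhisperForAudioClassification.from_pretrained(checkpoint, num_labels=3)


def classify(waveform_16khz):
    """Return class probabilities for a 1-D raw-audio array sampled at 16 kHz."""
    inputs = feature_extractor(
        waveform_16khz, sampling_rate=16_000, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(inputs.input_features).logits
    return logits.softmax(dim=-1)
```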
Did you have any domain expertise, or any particular insights from exploring the data, that helped you solve the problem?
Paola: One of our biggest advantages was pairing real clinical experience with a careful data exploration process. We built a team of neuropsychiatrists, clinical psychologists, and AI experts who helped us design our approach in both Phase 1 and Phase 2 of the challenge to enable early detection of cognitive decline. Simultaneously, we conducted a thorough review of the raw data to identify and correct issues such as mislabeled samples, inconsistent recording conditions, and language biases. By addressing these issues early, we were able to propose adjustment strategies and train robust models.
Daniel: I had some previous training in audio engineering, which provided prior knowledge on how to approach the voice enhancement process. By excluding the audio files that were just noise, we were able to improve model performance. Additionally, by digging deeper into the data, we found class imbalances that could potentially introduce bias into the model. For example, the Mandarin audio files in the training dataset covered only one of the three labels.
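A quick check like the one Daniel describes can be done in a few lines of pandas. The metadata file name and column names below are hypothetical; the idea is simply to cross-tabulate labels by language and see which classes each language actually covers.

```python
# Hypothetical file and column names; the point is the label-by-language cross-tab.
import pandas as pd

meta = pd.read_csv("train_metadata.csv")

# Count how many recordings each language has per label...
counts = pd.crosstab(meta["language"], meta["label"])
print(counts)

# ...and how many of the label classes each language actually covers.
print((counts > 0).sum(axis=1).rename("classes_present"))
```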
What was the most important technical tip or trick you used to solve the problem? Non-technical?
Néstor: Building on previous work is always a good starting point for solving any challenge. Before developing and testing AI models, we began by identifying the state of the art in speech-driven cognitive assessment and its intersection with AI.
Daniel: The most important technical tip was to establish baseline performance with simpler models, then experiment with pretrained models and build from there. Non-technically, the key was to thoroughly inspect and understand the data before trying to train big models. This approach saved us time and resources by identifying issues early in the process.
Shout outs: Any good tools, posts, or projects from other developers that you appreciate or think the community might enjoy?
Paola: For drug discovery, it’s worth familiarizing yourself with the latest AI‑driven platforms. For protein folding, DeepMind’s AlphaFold, the University of Washington’s RoseTTAFold, and Meta’s ESMFold. For genomics, the Evo 2 genomic language model, and ATOMICA’s RNA modeling toolkit. Finally, Insilico Medicine’s Pharma.AI suite for end-to-end therapeutic design.
What advice (career or otherwise) would you give to your younger self?
Néstor: I would suggest always looking for opportunities for growth, whether inside or outside the institutions I’ve been part of. Also, to talk more with people who have experience or knowledge in the fields and opportunities that motivate me.
Daniel: To develop the product and service ideas I had some years ago.
Data science as a field has been struggling with a well-documented lack of diversity. How have you seen diversity and representation, or the lack thereof, impact data science problems?
Paola: In medical research, most large‑scale databases still come from high‑income countries and represent mostly white populations. This limited diversity makes it hard for AI models, aimed at improving diagnosis and treatment, to generalize to underrepresented and often more vulnerable groups. Multiple programs are helping to broaden the genetic and clinical data landscape, but current efforts have not yet achieved the level of inclusivity needed for robust, widely applicable models. Investment in recruiting and integrating diverse cohorts is essential to ensure AI‑driven advances benefit all populations.
Néstor: I have seen this issue very clearly in some of the work we do at IGC regarding AI. We always look for the most diverse and complete databases we can find. However, it’s not an easy task, and many times we have to settle for the available data even when some groups are underrepresented. A very clear example of this relates to the database we proposed for the first phase of this challenge. We found a Mexican database on aging and cognition, with which we aimed to balance the Latino population across the whole set of databases we had encountered. There is a lack of large-scale, harmonized data on Latino populations, which are known to have very different systems of care for the elderly, for example, caring for them in a family household rather than in nursing homes. Latinos are also known to be more prone to developing Alzheimer’s disease, which is why we are devoted to including these groups in our models.
What hurdles have you had to overcome to become a data scientist? What advice would you give to others facing the same challenges?
Daniel: Start learning machine learning, deep learning, and AI in general. Get a good understanding of the concepts and start implementing things from scratch if possible. Then develop end-to-end projects, including the UI to interact with the model.
If you could run a competition, what would it be about?
Paola: I would like to run a competition to advance targeted drug delivery. Today’s medications often cause significant off-target and systemic adverse effects, but by engineering more precise delivery vehicles, we can greatly reduce those complications. Leveraging modern technologies such as nanoparticle carriers, antibody–drug conjugates, and AI-driven design will allow us to tailor delivery systems to each therapeutic use case and maximize both safety and efficacy.
Have you read any good books or articles recently?
Daniel: I am reading Proofs: A Long-Form Mathematics Textbook by Jay Cummings. It is a very fun and easy way to learn about mathematical rigor and proofs.
Where can the community find you online?
Thanks to Paola, Néstor, and Daniel (@IGCPHARMA) for sharing their experiences and advice with us! We are excited to feature more great community members. If you think you or someone you know would be a great addition to a future Community Spotlight, let us know!