
Community Spotlight: Quy Nguyen

The Community Spotlight features fantastic members from our DrivenData community. Quy Nguyen, a data scientist from Vietnam, talks about building time series models with lag features and the importance of storytelling in data science.

We are starting an initiative at DrivenData to feature some of the fantastic members in our data science community. The goal of the Community Spotlight is to bring greater visibility to the diversity of expertise, perspectives, and experiences of our community members.

In this post we sit down with Quy Nguyen, a winner of the Safe Aging challenge and a data scientist working in Singapore.



Name: Quy Nguyen

Hometown: Binh Dinh Province, Vietnam


To get started, tell us a little about yourself.

Having grown up in a rural area of Vietnam, I have had quite a journey to become a senior data scientist. I currently work in Singapore. I enjoy making sense of data through exploration, modeling, and experiments, and using it to create impact in social and business settings.

When I am not working, I like traveling, reading, cooking and spending time with my family and friends.

How did you get started in data science?

My major was information systems, which taught me how to collect, organize, process, and present data, so moving into data science felt natural. In college, I took fundamental courses in programming, artificial intelligence, data mining, and databases. Most of my modules were project-based, and we were expected to build interesting things. A few challenging projects I still remember are detecting contours in images, password authentication using keystroke biometrics, protein structure prediction, and location-based restaurant recommendation.

My journey into the real world started with an internship where my mentor held a PhD in computer science from Georgia Tech. I was introduced to data science competitions and have been hooked ever since. I spent most of my free time learning data science on Coursera and taking part in competitions. Along the way, I met and learned from many talented data scientists around the world. I am thankful to the encouraging professors, teachers, mentors, talented classmates, and colleagues who guided me and helped me grow my interest in data science.

What motivated you to join a DrivenData competition?

As an applied data scientist, I always want to work on different kinds of problems to gain domain knowledge and keep my skills sharp. I found that DrivenData offers a diverse and interesting set of real-world challenges.


Is there a particular DrivenData challenge you’ve enjoyed working on?

I was very excited to work on the Safe Aging challenge, in which I was one of the winners. The competition was challenging and exciting because human activity recognition has many potential applications in healthcare, fitness, and public safety. By predicting participants' activities of daily living and their posture or ambulation, I had the opportunity to help elderly people live safely at home, and this was my main motivation for taking part. I was happy to be part of preparing for the future of an aging population and realizing the project's very meaningful vision. The challenge also provided many kinds of data, from wearable sensor data to RGB-D camera data and passive environmental sensor data.

I worked with a teammate on this problem: he designed the main flow of the modeling, stacking, and validation processes, while I worked on feature engineering. Because the data were time series, lag features built from previous data points were very strong predictors. Before generating the lag features, I reduced the noise in the signal. I extracted a few sets of features, such as statistical features and features based on physical interpretations of human motion. Building an extra model to generate more features, for example one that detects the person's location, was also helpful. Intuitively, location should be very useful for predicting a person's activity: for instance, when someone is in a bathroom, it is very unlikely that they are jumping or lying down!
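The idea of denoising a signal and then building lag features can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the team's actual pipeline: it smooths one sensor channel with a centered moving average (just one simple way to reduce noise; the interview does not say which method was used), then builds, for each time step, the previous few smoothed values plus the mean and standard deviation of those lags. The function names, window size, and number of lags are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional


def moving_average(signal: List[float], window: int = 3) -> List[float]:
    """Smooth a signal with a centered moving average.

    The window shrinks at the edges of the series. Note that a centered
    window uses future samples, which is fine offline but not for live use.
    """
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(mean(signal[lo:hi]))
    return smoothed


def lag_features(signal: List[float], n_lags: int = 3) -> List[dict]:
    """For each time step t, return the previous n_lags values plus lag statistics.

    Lags that would reach before the start of the series are None, and the
    statistics stay None until all n_lags previous values exist.
    """
    rows = []
    for t in range(len(signal)):
        row = {"value": signal[t]}
        lags: List[Optional[float]] = []
        for k in range(1, n_lags + 1):
            lag = signal[t - k] if t - k >= 0 else None
            row[f"lag_{k}"] = lag
            lags.append(lag)
        available = [x for x in lags if x is not None]
        if len(available) == n_lags:
            row["lag_mean"] = mean(available)
            row["lag_std"] = pstdev(available)
        else:
            row["lag_mean"] = None
            row["lag_std"] = None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Hypothetical accelerometer readings for a single axis.
    raw = [0.0, 1.0, 0.0, 3.0, 2.0, 4.0]
    for row in lag_features(moving_average(raw, window=3), n_lags=2):
        print(row)
```

In practice the resulting rows would be joined with the other feature sets (statistical, motion-based, predicted location) before training a classifier. If features had to be computed as data streams in, a trailing window would avoid the one-step look-ahead of the centered average.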

The details of our approach can be found here. After the competition, we collaborated with the 1st place winner and published a paper, which was accepted at the Advanced Data Mining and Applications conference (ADMA).

What hurdles have you had to overcome to become a data scientist? What advice would you give to others facing the same challenges?

I would say storytelling, or effective communication, is a very important skill for any data scientist. It is just as important as technical skills. When defining problems or business objectives, it is extremely useful for data scientists to be able to communicate project outcomes effectively to business stakeholders.

In the beginning, most of my projects stopped at the proof-of-concept (POC) stage. It was hard for me to convince my product managers to put my models into production because they could not see their viability. Thanks to my managers and mentors, I gradually got better at this. For other data scientists facing the same challenge, I would say "practice makes perfect." For instance, mentoring interns and teaching are excellent opportunities to practice explaining complex technical terms and concepts. Reading books, blog posts, and articles on related business topics is also a great way to learn about the business and get a sense of how a product or process works.

Have you read any good books or articles recently?

I recently watched The Social Dilemma, a documentary that illustrates the societal impacts of machine learning, with implications for human and artificial intelligence (AI) safety. Many advanced AI systems nudge people to spend more time on services so that companies can grow their revenue. It would be nice to work on AI systems that nudge individuals to improve their health, wealth, and happiness while keeping data ethics and rights in mind.


Photo: The Social Dilemma promotional poster/Netflix.

Where can the community find you online?

I can be found on LinkedIn at http://sg.linkedin.com/in/quyntk/.


Thanks to Quy (@kimquy06) for sharing her thoughts on work, life, and data science! We are excited to feature more great community members. If you think you or someone you know would be a great addition to a future Community Spotlight, let us know!
