We are starting an initiative at DrivenData to feature some of the fantastic members in our data science community. The goal of the Community Spotlight is to bring greater visibility to the diversity of expertise, perspectives, and experiences of our community members.
In this post we sit down with Will Koehrsen, a winner of the Power Laws Forecasting Challenge and full-stack data scientist.
Name: Will Koehrsen
Hometown: Metamora, Illinois
To get started, tell us a little about yourself.
I’m currently a full-stack data scientist at Cortex Building Intelligence, a startup that helps office buildings operate more energy-efficiently. Full-stack means I’m responsible for the whole data science pipeline, taking concepts from idea to continuous, real-time operating recommendations. The work is very rewarding, in terms of both the day-to-day construction of machine learning systems and the larger mission of reducing carbon dioxide emissions while helping building owners save on energy bills. Energy efficiency is a field where the best decision for the environment is also the best for your wallet, and where a little data science can go a long way.
Until March of this year, I lived in New York City; since then, I’ve been sheltering back home in the Midwestern US. Outside of work, I like to give back to the community through volunteering. In-person volunteering has been put on hold with the pandemic. Still, there are ways to help out online, like calling to check in on seniors, or working on data science problems with a social benefit on DrivenData!
I used to write frequently about data science on Towards Data Science, but I’ve put that on hold while pursuing an online Master’s in Computer Science from Georgia Tech.
For stress relief, I go on long runs every day and occasionally compete in ultramarathons (the farthest so far was 100 miles).
How did you get started in data science?
It’s tempting to tell a neat story about how I got into data science, but it was a random series of events that led me to this career. In college, I studied mechanical engineering but had two terrible internships that convinced me to change fields. While working one of those internships at NASA, I decided to explore this new field of data science I had heard so much about. Since I didn’t have real responsibilities, I spent almost every hour on the job taking Udacity courses in data science and machine learning.
After the internship, I went back for my senior year. While it was too late to switch my major, I was able to get on a research project using machine learning to help buildings save energy. Little did I realize this project would precisely match a DrivenData challenge and, eventually, my career at Cortex. I spent that year self-studying data science, writing an article a week, and competing in the Power Laws: Forecasting Energy Consumption challenge. Coming in third in that competition was a huge confidence boost and convinced me I could make it in this field even without formal university training.
After graduation, I was fortunate to be hired at Feature Labs as a data scientist on the strength of my blogs and projects. A few months later, I came to work at Cortex, where I’ve been for almost two years.
Data science is a broad field. What areas are you particularly interested in? Are there particular problems you’re interested in solving?
The applied aspect of data science (how we can get machine learning models into production and put the results in front of decision-makers) interests me the most. I’m thankful for the researchers who develop the algorithms (the random forest being my go-to), but I’m more concerned with changing real-world outcomes.
That means I spend a lot of time thinking about infrastructure, including databases, cloud computing, monitoring, and what is generally considered software engineering (testing, deployment, writing high-quality software). At the end of the day, you can use the most cutting-edge algorithms, but if you can’t get those results to the people who need them when they need them, you can’t improve real-world systems.
There are so many fields with so many problems that need data science, but I’ve chosen to focus on energy, in particular energy efficiency. Climate change is the defining problem of our generation, and data science can help immensely on both the supply (clean energy sources) and demand (reducing energy needs) sides. Admittedly, I’m somewhat of an idealist, and I firmly believe data science should be used to make the world a better place.
It’s frustrating seeing all the talent and effort devoted to getting people to click on ads, spend more time on social media, or buy more consumer goods. I want to tell my grandkids I was part of the solution, not the problem, and climate change is where I can have the most positive impact. Each of us has a choice to make, and I believe our obligation is not to maximize material gain but to ensure humanity’s flourishing far into the future.
What motivated you to join a DrivenData competition?
What initially drew me to DrivenData was the combination of gaining experience by solving actual problems with data science and working on projects with socially beneficial objectives. The first competition I participated in was Forecasting Energy Consumption, part of the Power Laws series of challenges. Seeing that I could predict actual energy usage with machine learning was a transformational experience. I saw that what I had read about and studied could be put to use. Working with the toy datasets used in books and college classes gets tedious pretty quickly, but DrivenData provided real-world data and the reward of tackling more ambitious problems. That first competition cemented not only my desire to work in the data science field, but also my desire to use machine learning for positive ends.
I continue participating in challenges to keep my skills up to date. There’s always something new on the data science scene, and, while I mostly use proven, battle-tested methods in my job, it’s still fun to see the latest developments and try to put them into practice. It feels like if you’re not continually learning in data science, you’re falling behind (the Red Queen’s race).
The community aspect of challenges is also a big draw: many of the competitions are more like collaborations (especially if you’re not in it for the cash), and it’s enjoyable to work with others. Combine the prospects of improving my skill set, doing a small part to make the world better, and connecting with others in the field, and DrivenData competitions are a win all around.
Shout outs: Any good tools, posts, or projects from other developers that you appreciate or think the community might enjoy?
There are too many great things in data science to list, but a few of my favorites are:
- Editor: VSCode for development, classic Jupyter Notebook for exploration
- Data science book: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron - taught me 90%+ of what I know about data science
- PSQL CLI: pgcli - excellent tab-completion + nice aesthetics, what’s not to like?
- Blog: Normcore Tech - talks about tech without all the hype
- Code autocompletion: Tabnine - worryingly accurate AI code completion
- Visualization library: plotly - switching to plotly from matplotlib was life-altering
- YouTube channel: sentdex - made me fall for Python
- Podcast: AI Podcast with Lex Fridman - good in-depth conversations on AI
- Udacity course: Data Analyst Nanodegree - much better than any of my university courses
- Data Science Newsletter: The Data Science Roundup by Tristan Handy - filters through all the articles to send you a few quality ones every week
- Ethics checklist: Deon - every organization should have a data science code of ethics
What advice (career or otherwise) would you give to your younger self?
Prioritize mental health. Reach out and get help as soon as you need it. There’s nothing stigmatizing about mental health issues, and there are effective ways of addressing these very real problems.
Besides your internships, no employers will look at your resume, so spend less time worrying about grades and more time enjoying the college experience. Who you know is often more important than what you know, so spend at least as much time forming relationships as studying. You don’t understand something until you can implement it, so spend less time reading and more time doing. Have a bias towards action.
Explaining a topic to others is also a good test of whether you know the concept, and communication is one of the essential skills you’ll need in any career. Share your results early and often to get feedback and spot issues while they can be quickly addressed. Constructive criticism is invaluable - always be asking what you can do better.
What hurdles have you had to overcome to become a data scientist? What advice would you give to others facing the same challenges?
There aren’t yet established best practices for many aspects of data science, so if you’re looking for the optimal solution, you’re probably not going to find a consensus. As a perfectionist, I often struggled because I couldn’t find the ideal answer. I probably spent too much time reading about methods and not enough time trying them out. Data science is mostly an empirical (as opposed to theoretical) field, which means that if you want to know whether a given method will work, you just have to try it.
Being self-taught, I often worried I didn’t have the credentials to make it as a data scientist. That first DrivenData competition helped boost my confidence, as did writing about my data science projects and seeing the positive feedback (and occasional constructive criticism). I’d tell others who don’t have the largely meaningless yet highly valued pieces of paper known as degrees to do projects with real-world data and make them public, for example through blog posts. Having a body of work to point to can help you get in the door and gives you a chance to demonstrate your skills.
Moreover, be ambitious when applying for jobs. My current position asked for 5-10 years of data science experience (as if anyone had that much at the time!); I had three months at another startup and nine months of research. Fortunately, my body of data science projects spoke volumes, and I was given the chance to prove myself with an assignment. That might not have been possible at a larger company, but still, try reaching out and pointing to your public work.
If you found out today was your last day on Earth, what would you do?
My last day on Earth will probably be around 2030 when Elon Musk’s settlement on Mars opens to the public. I’ll likely be pretty busy making sure I packed everything for the trip, but I’d get in one last long run, preferably somewhere in the mountains.
Have you read any good books or articles recently?
Books are the best tools we have for sharing knowledge, so I try to read a good deal across a wide range of subjects. Here are a few books I’ve recently found worthwhile:
- Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker - In almost every measurable category (wealth, health, longevity, rights), the world is better now than ever; it’s only our perception, shaped by the news, that is getting more negative. This is critical because it shows that progress is possible, and we can continue to increase human flourishing with the tools of science and technology.
- Rewiring America: A Handbook for Winning the Climate Fight by Saul Griffith (available for free) - A no-nonsense plan for combating climate change by electrifying America, creating millions of jobs and improving quality of life in the process. This report is a welcome contrast to the usual doom and gloom that pervades climate change discussions and provides an optimistic path to address climate change.
- Utopia for Realists: How We Can Build the Ideal World by Rutger Bregman - Universal Basic Income (UBI) should form the foundation of our social safety net so people are not forced to work meaningless jobs just to survive. (Yes, I’m solidly on Team Yang.)
- Thirst: A Story of Redemption, Compassion, and a Mission to Bring Clean Water to the World by Scott Harrison - A nightclub promoter gives up a life of drinking and partying to start a charity to bring clean water to every person on Earth. Doesn’t get much more inspiring than that.
- Uncanny Valley by Anna Wiener - A non-technologist takes her skills to Silicon Valley and achieves impressive success while detailing many of the faults of the tech industry.
- The Road to Character by David Brooks - Biographies of individuals who dedicated their lives to socially beneficial causes. From the stories, it’s clear that there is nothing more fulfilling than spending one’s life working for the good of humanity.
Photo: The cover of Rewiring America/rewiringamerica.org.
Where can the community find you online?
- GitHub: https://github.com/WillKoehrsen/
- Medium: https://medium.com/@williamkoehrsen
- Twitter: https://twitter.com/koehrsen_will
Thanks to Will (@willkoehrsen) for sharing his thoughts on work, life, and data science! We are excited to feature more great community members. If you think you or someone you know would be a great addition to a future Community Spotlight, let us know!