It’s been 3 months since I started my new position as a Machine Learning Engineer (MLE) at Spotify. What I like most about this position is that I get to participate in building an end-to-end pipeline, including ideation and experimentation, data engineering, machine learning modeling, model serving, online A/B testing, monitoring, and more.
Disclaimer: All posts represent my personal experience only. Opinions expressed are solely my own and do not express the views or opinions of any organization or group.
Sometimes I feel that it was just yesterday when I was greeted by my onboarding “buddy” and went through the process of setting up my laptop and access; and sometimes I feel that I have been here for a long time with a lot of memories: orientation sessions, engineering bootcamp, offsite meetings, holiday parties, daily standup, sprint planning, pair coding, more pair coding, and even more pair coding.
In my first job (see previous posts), I focused exclusively on the machine learning modeling part, and most of the time I worked in Jupyter notebooks to investigate ways to improve the model. Occasionally I contributed to the production code repo, but I was not actively involved in the engineering heavy lifting and pipelining. However, as elaborated in Hidden Technical Debt in Machine Learning Systems, machine learning modeling accounts for only a small fraction of the whole system. A working Jupyter notebook is a long way from a working machine learning system in the real world.
I did not know this system diagram when I first took data science courses on Coursera, and I did not take it seriously even when I started working as a data scientist. I worked on improving my knowledge of math, statistics, and modeling in the “ML code” box (see the Connect the Dots series), and did not spend much time on the rest of the system, falsely assuming that “it is just tedious engineering work and does not require much creativity”.
Over time, it became a challenge for me to bring my research ideas and discoveries into production, not only because of my lack of engineering knowledge and experience, but also because the expectation for my position was “scientific ideation”, not implementation. When a research project was eventually handed over to engineers who were not necessarily machine learning experts, it could result in duplicated work, cross-team dependencies, and unclear ownership.
It was this challenge that brought me to the MLE position at Spotify. During the interview process, I learned that the expectation for an MLE here went beyond research notebooks to deploying an end-to-end production pipeline, which was exactly what I was looking for.
Since joining Spotify, I have spent most of my time improving my engineering knowledge and skills, and less on the math and statistics that I focused on in my previous role. My current project involves all the components shown in the ML system diagram above, and the learning curve is certainly steep. While I still use Jupyter notebooks to experiment with ideas and test models, I am glad to be able to work on the production code base directly as an “engineer”: understanding each part of the system, writing production code in Python/Scala/Java, adding new features, creating pull requests, and performing code review. It is very rewarding to see my code merged into master and running in production, serving Spotify users in real time.
One year ago, I conducted a survey, Do you write production code as a data scientist?, and reported that 60% of data scientists are hybrid, describing their work as both research and engineering. Nomenclature aside, “data scientists” are becoming more and more involved in production. It may become essential not only to learn ML on Coursera and do Kaggle projects, but also to learn fundamental computer science and system design. Here are some recommendations for learning ML systems: