“Which language do you use at work?” I get this question quite often. My short answer is usually “Python for research and Scala for production”. In this post, I will give a more detailed answer, with examples.
Time flies. It’s been 7 months since I started to work as a full-time Data Scientist. It sometimes feels much shorter than 7 months: imaging neurons in the laboratory as a graduate student and walking on the stairs in front of Alma Mater on campus feel like just yesterday. It sometimes feels much longer than that: working is so vastly different from academic study, and I’ve learned so much about so many aspects of data science within this short-long period of time, that my mindset and my understanding of the industry and the world hardly resemble those of my graduate-student self.
Here I summarize a few lessons I have learned at work.
My internship at Tapad officially came to an end last week, after I gave a final presentation of my project (see the intro video below). It’s been a very memorable and rewarding summer. I not only learned about the latest developments and applications in machine learning and big data, but also got to experience the industrial work style and start-up culture.
During my final-round interview at Tapad, the VP of Data Science noticed my academic background in neuroscience, and we started to discuss how neuroscience research could contribute to online advertising. Later in the interview, he asked, “Have you heard of reinforcement learning (RL)?” I said no. He explained this reward-guided learning paradigm, which immediately reminded me of my undergraduate research on aversive learning in fruit flies. After the interview, he emailed me a link to the RL course on YouTube taught by David Silver, the lead researcher on AlphaGo.
After a month of investigating a sample dataset in Python on a local computer, I discovered some interesting patterns and optimized my algorithms to achieve a high measure of goodness. This week, I got the opportunity to run my algorithms on a much larger dataset – too large to be stored or processed on a single computer. Finally, it is time to Spark!
In my very first meeting with my mentor at Tapad, after an introduction to DeviceGraph and AdTech 101, we started to discuss the project I would be working on during my internship. The first week was very intense, and my mentor understood my struggle in the face of information overload. He showed me his system for organizing information: writing down what he learns about a given technology, bit by bit, along with any ideas it sparks. I adapted his system and developed my own archive in Google Docs to track my learning process and ideas (the picture above). Now, 5 weeks have passed and I have continuously updated the documents as I learn: the more I know, the more I realize I do not know, and the more I want to learn. Learning is an iterative process.
Halfway mark of my summer internship at Tapad! The past 4 weeks have been very exciting and fulfilling. Learning from experienced colleagues and marketing gurus is not only educational from a technical perspective, but also quite inspiring for my personal development. This Friday, we had a team-building improv workshop (I truly recommend that everyone and every team check it out. Really mind-blowing!). I was impressed by the “Yes, and…” spirit of teamwork and the power of encouragement.