I watched the 3rd video in Andrej Karpathy’s YouTube playlist, Neural Networks: Zero to Hero, and built an MLP (multilayer perceptron) from scratch that embeds characters in a feature vector space and predicts the next character from the previous 3 characters. The approach comes from the 2003 paper “A Neural Probabilistic Language Model” by Bengio et al., published 10 years before the classic Word2Vec paper in 2013. The core idea is similar: representing discrete entities in a continuous feature vector space. My code can be found on GitHub.
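As a rough illustration of the architecture described above (not the code from the repository), here is a minimal sketch of the forward pass in plain Python. The layer sizes, the `next_char_probs` helper, and the random initialization are all assumptions made for the example, and the weights are untrained, so the output is close to uniform.

```python
import math
import random

random.seed(0)

vocab = list(".abcdefghijklmnopqrstuvwxyz")  # "." pads the start of a word
stoi = {c: i for i, c in enumerate(vocab)}
V, CONTEXT, EMB, HIDDEN = len(vocab), 3, 2, 16  # illustrative sizes, not the post's

def rand_matrix(rows, cols, scale=0.1):
    return [[random.gauss(0, scale) for _ in range(cols)] for _ in range(rows)]

C = rand_matrix(V, EMB)                  # embedding table: one row per character
W1 = rand_matrix(CONTEXT * EMB, HIDDEN)  # hidden layer weights
b1 = [0.0] * HIDDEN
W2 = rand_matrix(HIDDEN, V)              # output layer weights
b2 = [0.0] * V

def affine(x, W, b):
    # Computes x @ W + b for a single input vector x.
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def next_char_probs(context):
    """Look up embeddings, concatenate them, apply a tanh hidden layer, then softmax."""
    x = [v for ch in context for v in C[stoi[ch]]]
    h = [math.tanh(a) for a in affine(x, W1, b1)]
    logits = affine(h, W2, b2)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = next_char_probs("emm")  # distribution over the character following "emm"
```

Training would adjust `C`, `W1`, `b1`, `W2`, and `b2` together, so the embedding of each character is learned alongside the predictor.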
Posts in the Tech category:
2. Build a bigram language model from scratch
I watched the 2nd video in Andrej Karpathy’s YouTube playlist, Neural Networks: Zero to Hero, and built a bigram language model from scratch in Python. Training a neural network with gradient descent produces results shockingly(?) similar to those from a direct statistical analysis of bigram counts. My code can be found on GitHub.
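The similarity is less surprising once you see that normalized bigram counts are the maximum-likelihood solution that gradient descent is moving toward. The sketch below is an illustration under assumptions, not the code from the repository: the four-word corpus, the learning rate, and the step count are made up, and it uses plain Python lists instead of a tensor library.

```python
import math
from collections import Counter

# Hypothetical toy corpus; "." marks the start and end of each word.
words = ["emma", "ava", "mia", "anna"]
chars = ["."] + sorted(set("".join(words)))
stoi = {c: i for i, c in enumerate(chars)}
V = len(chars)

# Collect (previous char, next char) index pairs.
pairs = []
for w in words:
    seq = ["."] + list(w) + ["."]
    pairs += [(stoi[a], stoi[b]) for a, b in zip(seq, seq[1:])]

# 1) Statistical approach: count bigrams and normalize each row.
counts = Counter(pairs)
row_totals = Counter(a for a, _ in pairs)

def count_prob(a, b):
    return counts[(a, b)] / row_totals[a]

# 2) Gradient-descent approach: one row of logits per previous character.
W = [[0.0] * V for _ in range(V)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def train(steps=2000, lr=1.0):
    for _ in range(steps):
        grad = [[0.0] * V for _ in range(V)]
        for a, b in pairs:
            p = softmax(W[a])
            for j in range(V):
                # Gradient of the mean negative log-likelihood w.r.t. logit W[a][j].
                grad[a][j] += (p[j] - (1.0 if j == b else 0.0)) / len(pairs)
        for i in range(V):
            for j in range(V):
                W[i][j] -= lr * grad[i][j]

train()

# Largest difference between the learned and counted probabilities
# over the bigrams that actually occur in the corpus.
max_gap = max(abs(softmax(W[a])[b] - count_prob(a, b)) for a, b in set(pairs))
```

The gap shrinks as training continues, but it never reaches exactly zero, because a softmax cannot assign a probability of exactly 0 to bigrams that never occur.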
1. Build backpropagation from scratch
This is the first time I have implemented automatic differentiation with the chain rule in Python and built a backpropagation and gradient descent algorithm from scratch. Thank you so much, Andrej Karpathy, for your amazing YouTube playlist, Neural Networks: Zero to Hero. My code can be found on GitHub.
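To give a flavor of what this involves, here is a minimal sketch of scalar reverse-mode automatic differentiation, in the spirit of the video but not the code from the repository. The `Value` class and its method names are assumptions made for this example, and only addition, multiplication, and tanh are supported.

```python
import math

class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            # d tanh(a)/da = 1 - tanh(a)^2
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from output to inputs.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# A single neuron: y = tanh(x * w + b)
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = (x * w + b).tanh()
y.backward()  # fills in x.grad, w.grad, and b.grad
```

A gradient descent step then amounts to nudging each parameter against its gradient, for example `w.data -= 0.01 * w.grad`, and resetting the gradients to zero before the next backward pass.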
Senior engineer and then what?
This March, I was promoted to senior machine learning engineer. Stepping into the new role, I thought I was more than prepared. After all, people only get promoted once they are already functioning at the next level. Now that I have worked as a senior engineer for half a year, I have gradually realized that while the daily technical work may look similar, being a senior engineer opens many doors and prompts me to think about what I truly want for my career.