I watched the 3rd video in Andrej Karpathy’s YouTube playlist “Neural Networks: Zero to Hero” and built an MLP (multilayer perceptron) from scratch. It embeds characters in a feature vector space and predicts the next character from the previous 3 characters. The approach comes from “A Neural Probabilistic Language Model”, published in 2003 by Bengio et al. (the paper models words rather than characters), 10 years before the classic Word2Vec paper appeared in 2013. The core idea is the same: represent discrete entities in a continuous feature vector space. My code can be found on GitHub.
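To make the setup concrete, here is an illustrative sketch of the model’s data preparation and forward pass. It is not the code from the GitHub repository: the video uses PyTorch, while this version uses plain Python so it runs without dependencies. The vocabulary (26 lowercase letters plus '.' as a start/end marker), the 2-dimensional embeddings, the 100 hidden units and the helper names `build_dataset`, `init_params`, `forward` and `nll` are assumptions chosen for illustration.

```python
import math
import random

# Vocabulary: '.' marks the start/end of a name and gets index 0; 'a'..'z' get 1..26.
chars = "abcdefghijklmnopqrstuvwxyz"
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
VOCAB = len(stoi)  # 27

def build_dataset(words, block_size=3):
    """Turn names into (context, next-char) pairs; the context is the previous block_size chars."""
    xs, ys = [], []
    for w in words:
        context = [0] * block_size  # pad with '.' at the start of each name
        for ch in w + ".":
            ix = stoi[ch]
            xs.append(list(context))
            ys.append(ix)
            context = context[1:] + [ix]  # slide the window by one character
    return xs, ys

def init_params(emb_dim=2, block_size=3, hidden=100, seed=42):
    rng = random.Random(seed)

    def mat(rows, cols, scale=1.0):
        return [[rng.gauss(0.0, 1.0) * scale for _ in range(cols)] for _ in range(rows)]

    return {
        "C": mat(VOCAB, emb_dim),                 # embedding table: one feature vector per character
        "W1": mat(emb_dim * block_size, hidden),  # concatenated embeddings -> hidden layer
        "b1": [0.0] * hidden,
        "W2": mat(hidden, VOCAB, scale=0.01),     # small init keeps early predictions near uniform
        "b2": [0.0] * VOCAB,
    }

def forward(params, context):
    """Return the predicted probability distribution over the next character."""
    # 1. Look up each context character's embedding and concatenate them.
    emb = [v for ix in context for v in params["C"][ix]]
    # 2. Hidden layer with a tanh nonlinearity.
    h = [math.tanh(sum(e * params["W1"][i][j] for i, e in enumerate(emb)) + params["b1"][j])
         for j in range(len(params["b1"]))]
    # 3. Output logits, then a numerically stable softmax.
    logits = [sum(hj * params["W2"][j][k] for j, hj in enumerate(h)) + params["b2"][k]
              for k in range(VOCAB)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(params, xs, ys):
    """Average negative log likelihood of the true next characters."""
    return -sum(math.log(forward(params, x)[y]) for x, y in zip(xs, ys)) / len(xs)

X, Y = build_dataset(["emma"])
params = init_params()
print(X)  # [[0, 0, 0], [0, 0, 5], [0, 5, 13], [5, 13, 13], [13, 13, 1]]
print(Y)  # [5, 13, 13, 1, 0]
print(nll(params, X, Y))  # roughly log(27) ≈ 3.30 before any training
```

Training then adjusts all five parameter tables (`C`, `W1`, `b1`, `W2`, `b2`) by gradient descent on this loss; the video does that with PyTorch’s autograd rather than by hand. Because `C` is learned jointly with the rest of the network, characters that behave similarly end up with nearby feature vectors.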
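I watched the 2nd video in Andrej Karpathy’s YouTube playlist “Neural Networks: Zero to Hero” and built a character-level bigram language model from scratch in Python. Training a neural network with gradient descent produces shockingly(?) similar results to simply counting bigrams and normalizing the counts. In hindsight this makes sense: with one-hot encoded inputs, the network’s single weight matrix acts as a table of log-counts, and minimizing the negative log likelihood pushes its softmax toward the same normalized counts. My code can be found on GitHub.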
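A minimal plain-Python sketch of that comparison follows. It is not the code from the repository, and the tiny five-name training list, the learning rate `lr=10.0`, the `steps=2000` budget and the helper names (`bigram_pairs`, `count_model`, `train_neural`, `mean_nll`) are all assumptions made for illustration. The “network” is one linear layer (a 27×27 weight matrix) applied to one-hot encoded characters, followed by softmax and trained with full-batch gradient descent.

```python
import math

chars = "abcdefghijklmnopqrstuvwxyz"
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0  # '.' marks both the start and the end of a name
V = len(stoi)  # 27

def bigram_pairs(words):
    """Return (current char index, next char index) pairs for every bigram."""
    xs, ys = [], []
    for w in words:
        seq = ["."] + list(w) + ["."]
        for a, b in zip(seq, seq[1:]):
            xs.append(stoi[a])
            ys.append(stoi[b])
    return xs, ys

def count_model(xs, ys):
    """Statistical approach: count bigrams, then normalize each row."""
    counts = [[0] * V for _ in range(V)]
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    # Rows for characters never seen as input are left as None.
    return [[c / sum(row) for c in row] if sum(row) else None for row in counts]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_neural(xs, ys, lr=10.0, steps=2000):
    """Neural approach: one linear layer + softmax, full-batch gradient descent on mean NLL."""
    n = len(xs)
    W = [[0.0] * V for _ in range(V)]  # zero init: every row starts as a uniform distribution
    for _ in range(steps):
        grad = [[0.0] * V for _ in range(V)]
        for x, y in zip(xs, ys):
            probs = softmax(W[x])  # one_hot(x) @ W simply selects row x of W
            for k in range(V):
                grad[x][k] += probs[k] / n  # d(mean NLL)/d(logit_k) = (p_k - [k == y]) / n
            grad[x][y] -= 1.0 / n
        for i in range(V):
            for k in range(V):
                W[i][k] -= lr * grad[i][k]
    return [softmax(row) for row in W]

def mean_nll(P, xs, ys):
    """Average negative log likelihood of the observed bigrams under model P."""
    return -sum(math.log(P[x][y]) for x, y in zip(xs, ys)) / len(xs)

words = ["emma", "olivia", "ava", "isabella", "sophia"]
xs, ys = bigram_pairs(words)
P_count = count_model(xs, ys)
P_nn = train_neural(xs, ys)
print(mean_nll(P_count, xs, ys), mean_nll(P_nn, xs, ys))
```

The count-based table is the maximum-likelihood solution on these pairs, so the network’s loss can only approach it from above. The gap shrinks slowly rather than vanishing, because the probabilities of bigrams that never occur can only approach zero asymptotically.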