I watched the third video in Andrej Karpathy’s YouTube playlist, Neural Networks: Zero to Hero, and built an MLP (multilayer perceptron) from scratch that embeds characters in a feature vector space and predicts the next character from the previous 3 characters. The approach comes from the 2003 paper by Bengio et al., “A Neural Probabilistic Language Model”, published 10 years before the classic Word2Vec paper of 2013. The core idea is similar: representing discrete entities in a continuous feature vector space. My code can be found on GitHub.
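To make the setup concrete, here is a minimal sketch of the two pieces described above: turning words into (previous 3 characters, next character) pairs, and a single forward pass through an embedding table, a tanh hidden layer, and a softmax. It is not the code from my repository. The toy word list, the layer sizes (`emb_dim`, `hidden`), the `.` padding/end token, and all function names are assumptions chosen for illustration, and it uses only the Python standard library instead of a tensor library. The weights are random and untrained, so the output is just a valid probability distribution, not a useful prediction.

```python
import math
import random

random.seed(0)

# Toy data (hypothetical); a real run would use a large list of names or words.
words = ["emma", "olivia", "ava"]

# Character vocabulary; index 0 is reserved for "." (padding and end-of-word).
chars = sorted(set("".join(words)))
stoi = {ch: i + 1 for i, ch in enumerate(chars)}
stoi["."] = 0
vocab_size = len(stoi)

block_size = 3  # number of previous characters used as context


def build_dataset(words):
    """Return (contexts, targets): each context is block_size char indices."""
    xs, ys = [], []
    for w in words:
        context = [0] * block_size  # start with padding
        for ch in w + ".":
            ix = stoi[ch]
            xs.append(list(context))
            ys.append(ix)
            context = context[1:] + [ix]  # slide the window forward
    return xs, ys


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


emb_dim, hidden = 2, 8  # assumed sizes, small enough to read

C = rand_matrix(vocab_size, emb_dim)             # embedding table: one row per character
W1 = rand_matrix(block_size * emb_dim, hidden)   # hidden layer weights
b1 = [0.0] * hidden
W2 = rand_matrix(hidden, vocab_size)             # output layer weights
b2 = [0.0] * vocab_size


def forward(context):
    """Probabilities over the next character given block_size char indices."""
    # Look up each character's embedding and concatenate them.
    x = [v for ix in context for v in C[ix]]
    # Hidden layer with tanh nonlinearity.
    h = [
        math.tanh(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j])
        for j in range(hidden)
    ]
    # Output logits, then a numerically stable softmax.
    logits = [
        sum(h[j] * W2[j][k] for j in range(hidden)) + b2[k]
        for k in range(vocab_size)
    ]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


xs, ys = build_dataset(words)
probs = forward(xs[0])
```

Training would then adjust `C`, `W1`, `b1`, `W2`, and `b2` to minimize the negative log-likelihood of `ys` under `forward(xs)`. The embedding table `C` is learned along with the other weights, which is how characters end up placed in a continuous feature space.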