A comprehensive linear algebraic representation of the original algorithm for math-minded developers

Motivations

(1) While there are some great tutorials explaining the Word2Vec embedding concept and its applications to downstream Natural Language Processing (NLP) tasks, a comprehensive analytical treatment of the original work is not readily available online. Many NLP developers who already know the Word2Vec concept still need to derive the formulas for the loss functions and gradient matrices in order to build a computationally efficient software implementation on hardware platforms such as CPUs and GPUs.
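To make concrete the kind of derivation meant here, below is a minimal NumPy sketch (illustrative, not the article's code) of the naive-softmax skip-gram loss and its gradients for a single (center, context) word pair. The matrix names V_in and U_out and the function signature are assumptions chosen for this example.

```python
import numpy as np

def skipgram_loss_and_grads(V_in, U_out, center_idx, context_idx):
    """Naive-softmax skip-gram loss and gradients for one (center, context) pair.

    Assumed shapes: V_in is the (vocab_size x d) input-embedding matrix,
    U_out the (vocab_size x d) output-embedding matrix.
    """
    v_c = V_in[center_idx]              # center word input vector, shape (d,)
    scores = U_out @ v_c                # logits over the vocabulary, shape (vocab_size,)
    scores -= scores.max()              # shift logits for numerical stability
    probs = np.exp(scores)
    probs /= probs.sum()                # softmax probabilities y_hat

    loss = -np.log(probs[context_idx])  # cross-entropy for the true context word

    # Gradients: dJ/dv_c = U^T (y_hat - y), dJ/dU = (y_hat - y) v_c^T,
    # where y is the one-hot vector for the context word.
    delta = probs.copy()
    delta[context_idx] -= 1.0           # y_hat - y
    grad_v_c = U_out.T @ delta          # shape (d,)
    grad_U = np.outer(delta, v_c)       # shape (vocab_size, d)
    return loss, grad_v_c, grad_U
```

Writing the gradients as the matrix products U^T(y_hat - y) and (y_hat - y)v_c^T, rather than as per-word loops, is what makes a vectorized CPU/GPU implementation efficient.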

(2) Recently, Transformer-based language models and embeddings such as BERT have shown state-of-the-art performance. Several NLP scientists argue that Transformer-based pre-trained models (which are trained on some generic corpora such…

Nima PourNejatian

Sr. Data Scientist focused on modern NLP techniques and large language models; entrepreneur; currently at Nvidia; ex-CTO/CEO; EE PhD; Wharton MBA.
