Natural Language Processing • Language Models
- Language Modeling and Recurrent Neural Networks (RNN)
- N-gram language models
- Neural Language Model
- Evaluating language models
- Contextual Embeddings
- ELMo: Embeddings from Language Models
- GPT-3: Generative Pre-trained Transformer 3
- Citation
Language Modeling and Recurrent Neural Networks (RNN)
- Language modeling involves predicting the next word in a sequence based on the context of the preceding words.
- A language model assigns a probability to a piece of text; equivalently, given the preceding context, it defines a probability distribution over the next word. Everyday examples include Gboard's next-word suggestions and search-query completion.
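- Concretely, writing \(x^{(t)}\) for the word at position \(t\), a language model computes \(P\big(x^{(t+1)} \mid x^{(t)}, \dots, x^{(1)}\big)\); by the chain rule, the probability of the whole text factorizes as:

$$P\big(x^{(1)}, \dots, x^{(T)}\big) = \prod_{t=1}^{T} P\big(x^{(t)} \mid x^{(t-1)}, \dots, x^{(1)}\big)$$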
N-gram language models
- An n-gram is a chunk of n consecutive words (e.g., a 4-gram or 5-gram). An n-gram language model makes a Markov assumption: to predict word t+1, only the preceding n-1 words are needed.
- For instance, a 4-gram model conditions only on the last three words. If a particular n-gram never occurs in the training data, the model assigns it zero probability, which is the sparsity problem.
- Furthermore, because only 3-4 words of context are considered at a time, generated text tends to lack longer-range context and coherence.
- The below slide (source) depicts this.
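- A minimal counting-based sketch of the idea (the toy corpus and function names below are illustrative, not from the notes): probabilities come from n-gram counts, and an unseen continuation gets probability zero.

```python
from collections import defaultdict, Counter

def train_ngram_lm(tokens, n=4):
    """Count n-grams so that P(w | context) = count(context + w) / count(context)."""
    context_counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])   # previous n-1 words (Markov assumption)
        next_word = tokens[i + n - 1]
        context_counts[context][next_word] += 1
    return context_counts

def next_word_prob(context_counts, context, word):
    counts = context_counts[tuple(context)]
    total = sum(counts.values())
    # Unseen context or unseen continuation -> probability 0 (the sparsity problem).
    return counts[word] / total if total else 0.0

tokens = "the students opened their books and the students opened their laptops".split()
lm = train_ngram_lm(tokens, n=4)
print(next_word_prob(lm, ["students", "opened", "their"], "books"))   # 0.5
print(next_word_prob(lm, ["students", "opened", "their"], "exams"))   # 0.0 (never seen)
```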
Neural Language Model
- Like an n-gram model, a neural language model takes a sequence of words as input and outputs a probability distribution over the next word, but it represents words with learned embeddings instead of relying on counts of exact word sequences, which sidesteps the sparsity problem.
- Early models were window-based, focusing on a fixed range of preceding words and discarding those farther away, but these were only precursors to the more advanced RNN models.
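- A minimal sketch of such a fixed-window model in PyTorch (the class name and dimensions are illustrative assumptions): the last few words are embedded, concatenated, passed through a hidden layer, and mapped to a distribution over the vocabulary.

```python
import torch
import torch.nn as nn

class FixedWindowLM(nn.Module):
    """Window-based neural LM: predicts the next word from the previous `window` words."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, window=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.hidden = nn.Linear(window * embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context_ids):            # (batch, window)
        e = self.embed(context_ids)            # (batch, window, embed_dim)
        e = e.flatten(start_dim=1)             # concatenate the window's embeddings
        h = torch.tanh(self.hidden(e))
        return self.out(h)                     # logits over the next word

model = FixedWindowLM(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (2, 3)))   # batch of 2 contexts, 3 words each
probs = torch.softmax(logits, dim=-1)              # distribution over the next word
print(probs.shape)                                 # torch.Size([2, 10000])
```

- Replacing the fixed window with a recurrent layer (e.g., `nn.LSTM`) run over the full sequence gives the RNN language models the notes refer to, which are not limited to a fixed context size.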
Evaluating language models
- Perplexity is the standard metric for evaluating language models; it is the exponentiated average negative log-likelihood that the model assigns to a held-out corpus, so lower scores indicate better performance.
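- A small sketch of the computation (the per-token probabilities below are made-up numbers): perplexity equals the geometric mean of the inverse probabilities the model assigns to each actual next word.

```python
import math

# Probabilities the model assigned to the actual next words (illustrative values).
token_probs = [0.20, 0.05, 0.50, 0.10]

avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_nll)
print(perplexity)   # ~6.69; a lower value means the model is less "surprised" by the text
```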
Contextual Embeddings
- While traditional word embeddings in NLP are context-free, meaning a word like “bank” would have the same representation regardless of context (e.g., river bank vs. financial institution), contextual embeddings provide a solution.
- These generate different representations based on the surrounding text.
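- As a quick illustration (using `bert-base-uncased` from the Hugging Face `transformers` library purely as an off-the-shelf contextual encoder; the notes themselves discuss ELMo), the vector for "bank" differs between the two contexts:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence, word):
    """Return the contextual vector of `word` within `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state[0]   # (seq_len, hidden_dim)
    idx = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

river = embedding_of("He sat on the bank of the river.", "bank")
money = embedding_of("She deposited cash at the bank.", "bank")
print(torch.cosine_similarity(river, money, dim=0))  # well below 1.0: same word, different vectors
```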
ELMo: Embeddings from Language Models
- ELMo (Peters et al., 2018) introduced contextualized word embeddings.
- Instead of using a fixed embedding for each word, ELMo takes the entire sentence into account before assigning each word an embedding.
- This is achieved with a bidirectional LSTM trained with a language-modeling objective on a large corpus. ELMo, therefore, can assign different representations to the same word based on its context.
- The below slide (source) depicts this.
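- A minimal sketch of the underlying mechanism (a toy bidirectional LSTM with assumed dimensions, not the actual ELMo architecture, which stacks two biLSTM layers over character-level convolutions and is pre-trained on a large corpus):

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 64, 128
embed = nn.Embedding(vocab_size, embed_dim)
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

token_ids = torch.randint(0, vocab_size, (1, 6))   # one sentence of 6 tokens
outputs, _ = bilstm(embed(token_ids))              # (1, 6, 2 * hidden_dim)

# Each token's representation concatenates the forward state (left context)
# and the backward state (right context), so it depends on the whole sentence.
print(outputs.shape)   # torch.Size([1, 6, 256])
```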
GPT-3: Generative Pre-trained Transformer 3
- GPT-3 is a powerful language model that predicts the probability of the next word given a context.
- Its novelty lies in its capacity for flexible "in-context" learning.
- Given only a few examples (or just an instruction) in the prompt, it adapts rapidly to completely new tasks without any gradient updates, essentially learning how to learn from the context.
- GPT-3 can also be used as a knowledge base, given its vast training data and ability to generate contextually relevant responses.
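- A sketch of what few-shot in-context learning looks like from the user's side (the translation examples follow the GPT-3 paper's English-to-French illustration; the completion API call itself is not shown): the "training examples" live entirely in the prompt, and the model continues the pattern without any weight updates.

```python
# Few-shot prompt for an English -> French translation task.
# GPT-3 infers the task from the examples in the context alone.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]
query = "peppermint"

prompt = "Translate English to French.\n\n"
prompt += "\n".join(f"{en} => {fr}" for en, fr in examples)
prompt += f"\n{query} =>"

print(prompt)
# Sent to the model, the expected continuation is "menthe poivrée" --
# learned from the context, not from fine-tuning.
```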
Citation
If you found our work useful, please cite it as:
@article{Chadha2021Distilled,
  title   = {Language Models},
  author  = {Jain, Vinija and Chadha, Aman},
  journal = {Distilled Notes for Stanford CS224n: Natural Language Processing with Deep Learning},
  year    = {2021},
  note    = {\url{https://aman.ai}}
}