A language model learns to predict the probability of a sequence of words, i.e. the task of predicting which word comes next given the previous words (the context). By the chain rule, the probability of a whole sequence factorises as P(w1, ..., wn) = P(w1) · P(w2 | w1) · ... · P(wn | w1, ..., wn-1).
Statistical Language Models
These models use traditional statistical techniques such as N-grams, Hidden Markov Models (HMMs) and certain linguistic rules to learn the probability distribution of words.
Applications can be (a toy sketch of comparing such sentence probabilities follows the list):
- Machine Translation: P(high winds tonight) > P(large winds tonight)
- Spell Correction: P(about fifteen minutes from) > P(about fifteen minuets from)
- Speech Recognition: P(I saw a van) >> P(eyes awe of an)
- Summarisation, queries and much more
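As a rough illustration, the spell-correction comparison above can be reproduced with a hand-built bigram table. The probabilities below are invented purely for the example, and the crude fallback for unseen bigrams stands in for a proper smoothing scheme:

```python
# Toy illustration: comparing two candidate sentences with a hand-built
# bigram table. All probabilities here are made up for illustration only.

bigram_table = {
    ("about", "fifteen"): 0.02,
    ("fifteen", "minutes"): 0.30,
    ("fifteen", "minuets"): 0.0001,
    ("minutes", "from"): 0.10,
    ("minuets", "from"): 0.001,
}

def sentence_prob(words, table, unseen=1e-6):
    """Score a sentence as the product of its bigram probabilities."""
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= table.get((prev, cur), unseen)  # crude fallback for unseen bigrams
    return p

good = "about fifteen minutes from".split()
bad = "about fifteen minuets from".split()
print(sentence_prob(good, bigram_table) > sentence_prob(bad, bigram_table))  # True
```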
Naive Bayes
N-Grams
When dealing with statistical language models we need to compute and store a large number of probabilities, and the problems of unseen words and smoothing also need to be addressed.
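A minimal sketch, assuming a tiny in-memory toy corpus, of how bigram probabilities can be estimated with add-one (Laplace) smoothing so that unseen bigrams still receive a small non-zero probability:

```python
from collections import Counter

# Tiny toy corpus, purely for illustration.
corpus = [
    "i saw a van",
    "i saw a cat",
    "the van was red",
]

tokens = [sentence.split() for sentence in corpus]
unigrams = Counter(w for sent in tokens for w in sent)
bigrams = Counter((a, b) for sent in tokens for a, b in zip(sent, sent[1:]))
vocab_size = len(unigrams)

def bigram_prob(prev, cur):
    """Add-one smoothed estimate of P(cur | prev)."""
    return (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)

print(bigram_prob("saw", "a"))    # seen bigram: relatively high
print(bigram_prob("saw", "red"))  # unseen bigram: small but non-zero
```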
Neural Language Models
These models use various neural network architectures to build a predictive language model.
Logistic Regression
- Logistic regression works with linear combinations and the sigmoid activation function, but modern language problems are not linearly mappable, so it may not be able to learn complex patterns within the linguistic context (a small sketch of this limitation follows)
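A minimal sketch in plain NumPy, using a made-up XOR-style toy dataset as a stand-in for structure that no single line can separate: logistic regression applies the sigmoid to a linear combination of features, so its decision boundary is linear and it never gets past chance accuracy here.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid over a linear combination of features
    return 1.0 / (1.0 + np.exp(-z))

# XOR-style toy data: the two classes cannot be separated by one line.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(5000):                   # plain gradient descent on log loss
    p = sigmoid(X @ w + b)              # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

preds = (sigmoid(X @ w + b) > 0.5).astype(int)
print(preds, "accuracy:", np.mean(preds == y))  # stuck around chance level on XOR
```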
Standard Neural Language Model
- Words in embedded form are given as input
- The dot products of the input embeddings with the learned weights are then taken
- The final (next) word is given as output
- Positional information is not catered for in this model: a word occurring before a particular word is not treated any differently, so the sequence or context order is not consumed (see the sketch after this list)
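A minimal sketch in plain NumPy of the forward pass described above, assuming an untrained model with randomly initialised weights and a made-up vocabulary. Because the context embeddings are simply summed, word order is discarded, which is the limitation noted in the last point.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["i", "saw", "a", "van", "cat"]          # made-up toy vocabulary
word_to_id = {w: i for i, w in enumerate(vocab)}

embed_dim, hidden_dim, V = 8, 16, len(vocab)
E = rng.normal(size=(V, embed_dim))              # word embeddings (input)
W_h = rng.normal(size=(embed_dim, hidden_dim))   # learned weights
W_out = rng.normal(size=(hidden_dim, V))         # output projection

def next_word_distribution(context):
    ids = [word_to_id[w] for w in context]
    x = E[ids].sum(axis=0)          # summing embeddings: order is lost
    h = np.tanh(x @ W_h)            # dot product with learned weights
    logits = h @ W_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()          # softmax over the vocabulary

p = next_word_distribution(["i", "saw", "a"])
print(vocab[int(np.argmax(p))])     # most likely next word (untrained, so arbitrary)
```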
Recurrent Neural Language Model
- RNNs leverage information from prior inputs to influence the current input and output
- As the sequence grows longer, the RNN starts to forget the earlier context, which makes learning long data sequences difficult; this is also known as the Vanishing Gradient Problem (see the sketch after this list)
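A minimal sketch in plain NumPy, with untrained, randomly initialised weights, of a vanilla recurrent cell: the hidden state carries information from prior inputs forward, and the same recurrent weight matrix is applied at every time step, which is the repeated multiplication behind vanishing (or exploding) gradients on long sequences.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 8, 16
W_xh = rng.normal(size=(embed_dim, hidden_dim))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b_h = np.zeros(hidden_dim)

def rnn_forward(sequence):
    """Run a vanilla RNN over a sequence of embedding vectors."""
    h = np.zeros(hidden_dim)                      # initial hidden state
    for x_t in sequence:
        # The new hidden state mixes the current input with the previous state,
        # so earlier inputs influence every later step.
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
    return h

sequence = [rng.normal(size=embed_dim) for _ in range(5)]  # dummy embeddings
print(rnn_forward(sequence).shape)  # (16,) final context vector

# During training, gradients flow backwards through W_hh once per time step;
# repeatedly multiplying by it shrinks (or blows up) the gradient, which is
# the vanishing/exploding gradient problem on long sequences.
```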