A language model learns to estimate the probability of a sequence of words.

Its core task is to predict which word comes next, given the previous words or context.

Statistical Language Models

These models use traditional statistical techniques such as N-grams, Hidden Markov Models (HMMs), and certain linguistic rules to learn the probability distribution of words.
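
Concretely, such a model factors the probability of a sequence with the chain rule, and an N-gram model approximates each term with a short history; the bigram case (a standard illustration, not spelled out in these notes) looks like this:

```latex
P(w_1,\dots,w_n) \;=\; \prod_{i=1}^{n} P(w_i \mid w_1,\dots,w_{i-1})
\;\approx\; \prod_{i=1}^{n} P(w_i \mid w_{i-1}) \quad \text{(bigram approximation)}
```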

Applications include:

  • Machine Translation

    • P(high winds tonight) > P(large winds tonight)
  • Spell Correction

    • P(about fifteen minutes from) > P(about fifteen minuets from)
  • Speech Recognition

    • P(I saw a van) >> P(eyes awe of an)
  • Summarisation, queries, and much more

Naive Bayes

N-Grams

When dealing with statistical language models, we need to compute and store many probabilities; the problems of unseen words and smoothing also need to be addressed, as in the sketch below.
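
As a minimal sketch of these issues, the toy bigram model below (the corpus and tokenisation are made up for illustration) uses add-one (Laplace) smoothing so unseen bigrams still receive non-zero probability, and it can compare candidate sentences as in the applications above:

```python
# Toy bigram language model with add-one (Laplace) smoothing.
import math
from collections import Counter

corpus = ["high winds tonight", "large crowds tonight", "high tides tonight"]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split()      # <s> marks the sentence start
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))
vocab_size = len(unigrams)

def log_prob(sentence: str) -> float:
    """Log-probability of a sentence under the smoothed bigram model."""
    words = ["<s>"] + sentence.split()
    # Add-one smoothing: unseen bigrams get count 1 instead of 0.
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(words, words[1:])
    )

# Comparing candidates, as in the machine-translation example above:
print(log_prob("high winds tonight"), log_prob("large winds tonight"))
```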

Neural Language Models

These models use various neural network architectures to build a predictive language model.

Logistic Regression

Standard Neural Language Model

  • Words in embedded (vector) form are given as input

  • The dot product of the input embeddings with the learned weights is then taken

  • The predicted next word is given as output

  • Positional information is not captured by this model: a word occurring before another is not treated differently, so the order of the sequence or context is not consumed (see the sketch after this list)
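
A minimal sketch of such a model, assuming PyTorch (the class name and hyperparameters are illustrative, not from the notes):

```python
# Feed-forward neural language model: context embeddings are summed (so word
# order is lost), multiplied with learned weights, and the next word predicted.
import torch
import torch.nn as nn

class StandardNLM(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # word -> vector
        self.output = nn.Linear(embed_dim, vocab_size)        # learned weights

    def forward(self, context_ids: torch.Tensor) -> torch.Tensor:
        # context_ids: (batch, context_len). Summing the embeddings discards
        # positional information, which is exactly the limitation noted above.
        vecs = self.embedding(context_ids).sum(dim=1)         # (batch, embed_dim)
        return self.output(vecs)                              # logits over next word

model = StandardNLM(vocab_size=10_000, embed_dim=64)
logits = model(torch.randint(0, 10_000, (2, 3)))              # 2 samples, 3 context words
next_word = logits.argmax(dim=-1)                             # predicted next-word ids
```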

Recurrent Neural Language Model

  • RNNs leverage information from prior inputs to influence the current output

  • As the sequence grows longer, the RNN starts to forget the earlier context, which makes learning from long data sequences difficult; this is known as the vanishing gradient problem (a minimal sketch follows below)
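
A corresponding minimal sketch, again assuming PyTorch with illustrative names and sizes:

```python
# Recurrent language model: the hidden state carries information from prior
# inputs forward, addressing the word-order limitation of the model above.
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len). Each hidden state depends on all earlier
        # tokens, but gradients shrink as they flow back through long sequences,
        # which is the vanishing-gradient problem noted above.
        vecs = self.embedding(token_ids)
        hidden, _ = self.rnn(vecs)                # (batch, seq_len, hidden_dim)
        return self.output(hidden)                # next-word logits at each step

model = RNNLM(vocab_size=10_000, embed_dim=64, hidden_dim=128)
logits = model(torch.randint(0, 10_000, (2, 5)))  # 2 sequences of 5 tokens
```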