Perplexity is an intrinsic measure used to evaluate the performance of a language model in natural language processing (NLP). It quantifies how well a language model predicts a sample or a sequence of words.

Lower perplexity values indicate better performance, meaning the model provides more accurate predictions.

The perplexity of a language model is calculated based on the probability assigned by the model to a given sequence of words. It measures how surprised or uncertain the model is when predicting the next word in a sequence. A lower perplexity value indicates that the model is less surprised or uncertain, meaning it provides more confident and accurate predictions.

Formula

$$PP(W) = P(w_1 w_2 \dots w_N)^{-\frac{1}{N}} = \sqrt[N]{\frac{1}{P(w_1 w_2 \dots w_N)}}$$

Where:

  • $P(w_1 w_2 \dots w_N)$ is the probability assigned by the language model to the sequence of words $w_1, w_2, \dots, w_N$.
  • $N$ is the total number of words in the sequence.
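
To make the formula concrete, here is a minimal Python sketch that applies it directly; the probability values are invented stand-ins for what a real model would assign to each word:

```python
import math

def perplexity(word_probs):
    """Perplexity of a sequence, given each word's model-assigned probability.

    word_probs: list of P(w_i | w_1 ... w_{i-1}) for every word in the sequence.
    """
    n = len(word_probs)
    sequence_prob = math.prod(word_probs)  # P(w_1 w_2 ... w_N)
    return sequence_prob ** (-1 / n)       # P(w_1 w_2 ... w_N)^(-1/N)

# Hypothetical probabilities a model might assign to a 4-word sentence.
print(perplexity([0.2, 0.5, 0.1, 0.4]))  # ~3.98
```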

With Log Probabilities

The above formula, when rewritten with log probabilities to avoid numerical underflow, becomes:

$$PP(W) = \exp\left(-\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_1, \dots, w_{i-1})\right)$$

That is, the exponential of the negative mean log-likelihood of all the words in an input sequence. The exponential function converts the averaged log-probabilities back out of log space, recovering the same value as the original formula.
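
A sketch of the log-space version, which produces the same result as the direct formula above while staying numerically stable for long sequences:

```python
import math

def perplexity_from_logprobs(log_probs):
    """Numerically stable perplexity: exp of the negative mean log-likelihood."""
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# The same hypothetical 4-word sentence as above, now in log space.
log_probs = [math.log(p) for p in [0.2, 0.5, 0.1, 0.4]]
print(perplexity_from_logprobs(log_probs))  # ~3.98, matching the direct formula
```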

In practice, the perplexity of a language model is calculated on a test or validation dataset: the model is trained on a training dataset, and its performance is then evaluated by measuring perplexity on this unseen data. Low perplexity on held-out data indicates that the model generalizes well to word sequences it has not seen during training.
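
As an illustration (not part of the original text), here is one common way to measure perplexity on held-out text with a pretrained causal model from the Hugging Face transformers library; the model name "gpt2" and the sample sentence are assumptions:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

# Stand-in for real held-out text; in practice this would be the test set.
text = "The quick brown fox jumps over the lazy dog."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing the input ids as labels makes the model return the mean
    # cross-entropy over predicted tokens (the negative mean
    # log-likelihood) as `loss`.
    out = model(**enc, labels=enc["input_ids"])

print(torch.exp(out.loss).item())  # perplexity = exp(mean NLL)
```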

Perplexity of Entire Corpus

  • If we want to know the perplexity of the whole corpus $C$ that contains $m$ sentences and $N$ words in total, we want to find out how well the model can predict all the sentences together.
  • How to find the perplexity of a corpus: assuming the sentences are independent, the probability of the corpus is the product of the individual sentence probabilities, normalized by the total word count $N$ (see the sketch below):

$$PP(C) = P(s_1, s_2, \dots, s_m)^{-\frac{1}{N}} = \left(\prod_{j=1}^{m} P(s_j)\right)^{-\frac{1}{N}}$$
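
A minimal sketch of that computation, where `score_sentence` is a hypothetical, model-dependent function returning the per-word log-probabilities for one sentence. Pooling log-probabilities across sentences before exponentiating is equivalent to the product formula above, but avoids underflow just as in the single-sequence case:

```python
import math

def corpus_perplexity(corpus, score_sentence):
    """Perplexity over a whole corpus: pool the log-probabilities of every
    word in every sentence, then exponentiate the negative mean.

    corpus: list of sentences.
    score_sentence: hypothetical callable returning a list of per-word
        log-probabilities for one sentence (model-dependent).
    """
    total_log_prob = 0.0
    total_words = 0  # N: word count across the entire corpus
    for sentence in corpus:
        log_probs = score_sentence(sentence)
        total_log_prob += sum(log_probs)
        total_words += len(log_probs)
    return math.exp(-total_log_prob / total_words)
```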