Bidirectional Encoder Representations from Transformers (BERT)

BERT is based on the Transformer, a deep learning architecture in which every output element is connected to every input element and the weightings between them are computed dynamically through attention.

  • Introduced in the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (Devlin et al., 2018)

  • Model size and performance depend on which configuration we choose (the parameter counts below can be verified as in the first sketch after this list):

    • BERT Base

      12 encoder layers, ~110M parameters

    • BERT Large

      24 encoder layers, ~340M parameters

  • A major application is feature extraction: producing dynamic / contextual embeddings (see the embedding sketch after this list)

    Dynamic (contextual) embeddings: the embedding of a word changes depending on the context surrounding it

  • Conditions on context in both directions, left and right, thanks to self-attention (see the masked-LM sketch after this list)

    This allows BERT to capture truly bidirectional semantics of language and understand it at a depth that even LSTMs and BiLSTMs could not reach

  • BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine-tuning, [CLS] token (see the fine-tuning sketch below)
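
The configuration sizes quoted above can be sanity-checked directly. A minimal sketch using the Hugging Face transformers library; the checkpoint names bert-base-uncased and bert-large-uncased are illustrative choices, not from the notes:

```python
from transformers import AutoModel

# Load each checkpoint and count its trainable parameters.
for name in ("bert-base-uncased", "bert-large-uncased"):
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {model.config.num_hidden_layers} encoder layers, "
          f"{n_params / 1e6:.0f}M parameters")
```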
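
To illustrate contextual embeddings, the sketch below extracts the vector for the word "bank" in two different sentences and compares them. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; the sentences are invented for illustration:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Return the contextual embedding of the first occurrence of `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)          # last_hidden_state: (1, seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index(word)               # position of the word piece
    return outputs.last_hidden_state[0, idx]

# The same surface form "bank" gets different vectors in different contexts.
v_river = word_embedding("he sat on the bank of the river .", "bank")
v_money = word_embedding("she deposited cash at the bank .", "bank")
cos = torch.nn.functional.cosine_similarity(v_river, v_money, dim=0)
print(f"cosine similarity between the two 'bank' embeddings: {cos.item():.3f}")
```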
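
The bidirectional behaviour comes from the masked-language-modelling pre-training objective: a token is hidden and must be predicted from both its left and right context, which self-attention makes visible at once. A minimal sketch, again assuming transformers and bert-base-uncased:

```python
import torch
from transformers import AutoTokenizer, BertForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# The hidden word can only be recovered by using context on BOTH sides:
# the left context "The" and the right context "of France is Paris."
text = "The [MASK] of France is Paris."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits        # (1, seq_len, vocab_size)

mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_pos].argmax().item()
print(tokenizer.decode([predicted_id]))    # expected: "capital"
```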
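
For fine-tuning, a classification head is typically placed on top of the [CLS] token's final representation; BertForSequenceClassification follows this pattern. The sketch below runs a few toy training steps; the data, labels, and hyperparameters are placeholders chosen only to show the mechanics:

```python
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy, invented data: two sentences with binary sentiment labels.
texts = ["great movie, loved it", "terrible, a waste of time"]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for step in range(3):                      # a few illustrative steps
    optimizer.zero_grad()
    outputs = model(**inputs, labels=labels)   # logits are built on the [CLS] representation
    outputs.loss.backward()
    optimizer.step()
    print(f"step {step}: loss = {outputs.loss.item():.3f}")
```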