Bidirectional Encoder Representations from Transformers
Based on the Transformer, a deep learning architecture in which every output element is connected to every input element, and the weightings between them are calculated dynamically from the input itself via self-attention.
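A minimal sketch of that dynamic weighting (scaled dot-product self-attention) in plain NumPy; the shapes, random weights, and the self_attention helper are illustrative toys, not BERT's actual parameters:

import numpy as np

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_q / w_k / w_v: (d_model, d_head)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])                   # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax: weights depend on the inputs
    return weights @ v                                        # each output mixes all input positions

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                                   # 5 tokens, toy model dimension 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)                 # (5, 8)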
-
Pre-training of Deep Bidirectional Transformers for Language Understanding
-
Performance depends on how big we want BERT to be; two standard sizes were released (see the parameter-count check after the list):
-
BERT Base
12 encoder layers and 110M parameters
-
BERT Large
24 encoder layers and 340M parameters
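A quick way to verify those numbers, assuming the Hugging Face transformers library and the public bert-base-uncased / bert-large-uncased checkpoints (the large model lands near 336M, commonly rounded to 340M):

from transformers import AutoModel

for name in ["bert-base-uncased", "bert-large-uncased"]:
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {model.config.num_hidden_layers} encoder layers, ~{n_params / 1e6:.0f}M parameters")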
-
-
Major application: feature engineering, i.e. producing dynamic / contextual embeddings
Dynamic embeddings: a word's embedding changes with the context surrounding it, so the same word gets a different vector in a different sentence
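A small sketch of that behaviour, assuming the Hugging Face transformers library and bert-base-uncased; the word_embedding helper and the example sentences are just for illustration:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_embedding(sentence, word):
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    idx = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

river = word_embedding("He sat on the bank of the river.", "bank")
money = word_embedding("She deposited the cash at the bank.", "bank")
print(torch.cosine_similarity(river, money, dim=0))     # clearly below 1.0: same word, different vectors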
-
Trains on both left and right context at once, thanks to self-attention combined with the masked language modelling objective.
This lets BERT capture the bidirectional semantics of language at a level that even LSTMs and BiLSTMs could not reach: a BiLSTM only concatenates two independently run one-directional passes, while BERT's self-attention conditions every token on the full sentence.
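The masked language modelling objective makes this concrete: a token is hidden and must be predicted from the words on both sides of it. A sketch with the Hugging Face fill-mask pipeline and bert-base-uncased (the example sentence is arbitrary):

from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Resolving [MASK] requires reading the context to its left AND its right.
for pred in fill("The [MASK] barked at the mailman all night.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))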
-
BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine-tuning, [CLS] token