Speeding up Attention Layers
Multi-head, Multi-Query & Grouped-Query Attention layers clearly explained, and how the KV cache works in attention layers
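Before diving in, here is a minimal sketch (not the post's own code, and with illustrative names and sizes) of grouped-query attention combined with a KV cache in plain PyTorch. Multi-query attention is the special case with a single KV head, and standard multi-head attention is the case where the number of KV heads equals the number of query heads.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k_new, v_new, cache=None, n_kv_heads=2):
    """
    q:     (batch, n_heads,    seq_q, head_dim)  query projections
    k_new: (batch, n_kv_heads, seq_q, head_dim)  keys for the new tokens only
    v_new: (batch, n_kv_heads, seq_q, head_dim)  values for the new tokens only
    cache: optional (k_past, v_past) tensors from previous decoding steps
    """
    if cache is not None:
        k_past, v_past = cache
        k = torch.cat([k_past, k_new], dim=2)  # reuse cached keys
        v = torch.cat([v_past, v_new], dim=2)  # reuse cached values
    else:
        k, v = k_new, v_new

    # Each group of query heads shares one KV head:
    # n_kv_heads == 1 is MQA, n_kv_heads == n_heads is plain MHA.
    n_heads = q.shape[1]
    group = n_heads // n_kv_heads
    k_rep = k.repeat_interleave(group, dim=1)
    v_rep = v.repeat_interleave(group, dim=1)

    scores = q @ k_rep.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    out = F.softmax(scores, dim=-1) @ v_rep
    return out, (k, v)  # updated cache for the next decoding step

# Toy decoding: 8 query heads share 2 KV heads, one new token per step.
q = torch.randn(1, 8, 1, 64)
k = torch.randn(1, 2, 1, 64)
v = torch.randn(1, 2, 1, 64)
out, cache = grouped_query_attention(q, k, v)          # first token, empty cache
out, cache = grouped_query_attention(q, k, v, cache)   # next token reuses cached K/V
```

The cache grows with the number of KV heads rather than the number of query heads, which is why MQA and GQA shrink the memory traffic of autoregressive decoding compared to full multi-head attention.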