Speeding up Attention Layers
September 11, 2024
Multi-Head, Multi-Query, and Grouped-Query Attention clearly explained, and how the KV cache works in attention layers
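As a preview of the mechanics covered here, below is a minimal sketch of grouped-query attention with a KV cache. The function name, tensor shapes, and the use of PyTorch are my assumptions, not code from the post; Multi-Head Attention is the special case `n_kv_heads == n_q_heads`, and Multi-Query Attention is `n_kv_heads == 1`.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, kv_cache=None):
    """Hypothetical sketch of Grouped-Query Attention with a simple KV cache.

    q:        (batch, n_q_heads,  q_len,   head_dim) queries for new tokens
    k, v:     (batch, n_kv_heads, new_len, head_dim) keys/values for new tokens
    kv_cache: optional (past_k, past_v), each (batch, n_kv_heads, past_len, head_dim)
    Causal masking is omitted for brevity (fine for one-token-at-a-time decoding).
    """
    if kv_cache is not None:
        past_k, past_v = kv_cache
        k = torch.cat([past_k, k], dim=2)  # reuse cached keys instead of recomputing
        v = torch.cat([past_v, v], dim=2)  # reuse cached values

    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group = n_q_heads // n_kv_heads  # query heads sharing each KV head
    # Repeat each KV head so every query head in its group attends to it.
    k_exp = k.repeat_interleave(group, dim=1)
    v_exp = v.repeat_interleave(group, dim=1)

    scores = q @ k_exp.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    out = F.softmax(scores, dim=-1) @ v_exp
    return out, (k, v)  # updated cache for the next decode step
```

A toy decode loop shows how the cache grows by one position per step while keys and values are stored only once per KV head:

```python
# 8 query heads share 2 KV heads (GQA); setting Hkv = 1 would give MQA.
B, Hq, Hkv, D = 1, 8, 2, 16
cache = None
for step in range(3):
    q = torch.randn(B, Hq, 1, D)   # one new query token per step
    k = torch.randn(B, Hkv, 1, D)  # new key/value for that token
    v = torch.randn(B, Hkv, 1, D)
    out, cache = grouped_query_attention(q, k, v, cache)
print(cache[0].shape)  # torch.Size([1, 2, 3, 16]): keys accumulated across steps
```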