Speeding up Attention Layers
September 11, 2024 • 7 min read
Multi-Head, Multi-Query & Grouped-Query Attention layers clearly explained, and how the KV cache works in attention layers