SmolLM on a Smol Machine: Optimizing LLM Inference on a $15 Computer
I got SmolLM, a 360M-parameter LLM, running at 2.6 tokens/s on a $15 Raspberry Pi Zero 2 W. The naive implementation ran at 0.015 tokens/s, effectively unusable. This post breaks down exactly what made it ~170x faster.
