SmolLM on a Smol Machine: Optimizing LLM Inference on a $15 Computer
I got SmolLM, a 360M-parameter LLM, running at 2.6 tokens/s on a $15 Raspberry Pi Zero 2 W. The naive implementation ran at 0.015 tokens/s, effectively unusable. This post breaks down exactly what made it ~170x faster.
