Embedded AI solves a problem ChatGPT-4o can't
Thanks to DeepSeek, embedded AI can now solve a problem that ChatGPT-4o can't. Here's how to replicate the result, which runs on my desk with the AI private and 100% on-device.

# The setup
- Board: Synaptics Astra SL1680 (4x Arm Cortex-A73, integrated NPU, 4GB RAM)
- Model: DeepSeek-R1-Distill-Qwen-1.5B (DeepSeek R1 distilled into Qwen2.5-Math-1.5B)
- Inference engine: llama.cpp
- Quantization: Q6

The test prompt is question 6 from the AIME 2025 I paper, which is commonly used as a math reasoning benchmark.
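
For anyone replicating the setup, here is a minimal sketch of how the llama.cpp command line could be assembled. It assumes the `llama-cli` binary from a llama.cpp build is on the PATH; the model path, context size, token limit, and temperature are illustrative assumptions, not the values used for this run.

```python
import shlex


def build_llama_cli_command(model_path, prompt, max_tokens=4096,
                            ctx_size=8192, threads=4, temperature=0.6):
    """Build an argv list for llama.cpp's llama-cli.

    All default values here are assumptions for illustration:
    threads=4 matches the four Cortex-A73 cores, and the token
    limit leaves headroom above the ~3000 reasoning tokens.
    """
    return [
        "llama-cli",
        "-m", model_path,          # path to the GGUF model file
        "-p", prompt,              # the question to solve
        "-n", str(max_tokens),     # maximum tokens to generate
        "-c", str(ctx_size),       # context window size
        "-t", str(threads),        # CPU threads
        "--temp", str(temperature),
    ]


# Hypothetical file name and placeholder prompt; substitute your own.
cmd = build_llama_cli_command(
    "models/model-q6.gguf",
    "<paste the AIME question text here>",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell on the board, or the list can be passed to `subprocess.run(cmd)` directly.
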

Seeing the chain of thought from such a tiny model is fascinating. To be clear, it doesn't reach the answer immediately: on average it spends around 3000 tokens thinking, self-correcting, and summarising. Even so, the cost is still a fraction of that of a cloud service, and this AI has the potential to be embedded in a device.
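
To put the roughly 3000 reasoning tokens in perspective, the sketch below turns a token count into wall-clock time and checks that the prompt plus reasoning fits the context window. The throughput figures, prompt length, and context size are hypothetical placeholders, not measurements from the board.

```python
def generation_seconds(n_tokens, tokens_per_second):
    """Time to generate n_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


def fits_in_context(prompt_tokens, reasoning_tokens, ctx_size):
    """True if the prompt and the generated reasoning fit in the context."""
    return prompt_tokens + reasoning_tokens <= ctx_size


REASONING_TOKENS = 3000  # approximate average from the text above

# Hypothetical decode rates; measure your own with your build and model.
for rate in (5, 10, 20):
    secs = generation_seconds(REASONING_TOKENS, rate)
    print(f"{rate} tok/s -> {secs / 60:.1f} min")

# Assumed prompt length and context size, for illustration only.
print(fits_in_context(prompt_tokens=200,
                      reasoning_tokens=REASONING_TOKENS,
                      ctx_size=8192))
```

Plugging in a decode rate measured on your own hardware gives a quick feel for whether a reasoning model's latency suits a particular embedded use case.
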

This raises many questions, both about the nature of benchmarking (the AIME problems are meant to be an uncontaminated math test set, but problems of a similar form may exist on the internet) and about how applicable this type of reasoning is in embedded systems. But one thing is for sure: better on-device inference, combined with lower costs for training and fine-tuning models, can only mean great things for edge AI.