DeepSeek and reasoning models on the edge
DeepSeek feels like a huge step forward for edge AI. The 7B distill beats QwQ-32B-Preview, a model more than four times its size from last year, on the AIME 2024 reasoning benchmark, and it's great to see Hugging Face working to reproduce the training process in open source so the whole industry can benefit.
This morning I got DeepSeek R1 Distill Qwen 1.5B running on my Synaptics Astra SL1640 (4x Arm Cortex-A55), unoptimised, and still showing impressive reasoning behaviour for its size. The potential for on-device AI agents is huge.
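For a rough sense of why a model of this size is a plausible fit for a small Arm board, here is a back-of-the-envelope sketch of the memory needed for the weights alone at a few precisions. The parameter count is an assumption taken from the "1.5B" in the model name, and the helper name weight_memory_gib is made up for illustration. The estimate ignores the KV cache, activations, and any per-block overhead a real quantisation format adds, so actual usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


# Assumption: nominal parameter count read off the "1.5B" in the model name.
N_PARAMS = 1.5e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gib(N_PARAMS, bits):.2f} GiB")
```

Under these assumptions the weights come to roughly 2.79 GiB at 16-bit, 1.40 GiB at 8-bit, and 0.70 GiB at 4-bit, which is why quantisation is usually the first optimisation to try on a device like this.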
"Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. DeepSeek-R1-Distill-Qwen-7B achieves 55.5% on AIME 2024, surpassing QwQ-32B-Preview."
Quoted from "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" (arXiv:2501.12948).