Dominic Pajak | Blog

2025-12-29

Building Reachy Mini with on-device AI

Had a blast building Hugging Face / Pollen Robotics Reachy Mini with family over the holidays. Here are some thoughts for the new year, and the on-device stack behind my app.

Education in AI and robotics seems to me super important for society, and for kids especially. Learning by doing is key, and projects like Reachy Mini literally put AI in people's hands, and I'm here for it. Pollen Robotics did an amazing job on every aspect of the user experience; it was genuinely fun building and working on it as a family, and a real privilege to be part of the beta program — I tested one of the early 3D-printed prototypes from Pollen Robotics ♥. It feels like the start of a significant community.

Reachy Mini Reactions

My on-device app in the video, "Reachy Mini Reactions," got off to a solid start and then received some even more solid contributions from Sandeep Mistry. I vibe-coded a decent chunk just to prove the concept. The functionality is there, but the install process still needs work. It's open source on Hugging Face as Reachy Mini Reactions. It listens for speech, turns to face whoever's talking, matches intent with embeddings and semantic search, then responds with speech and gestures, all with inference running fully on the Raspberry Pi CPU, no offloading.

The on-device stack

Everything runs on the Raspberry Pi 5 Arm CPU (2.4 GHz max, ~2 GB RAM used). End to end, the voice-to-voice loop comes in at around 1–3 s:

On-device voice-to-voice loop on the Raspberry Pi 5 CPU:
1. Speech in
2. VAD and turn-to-face (Silero): under 100 ms
3. Speech-to-text (Moonshine): 200–500 ms
4. Embeddings and semantic search (all-MiniLM-L6-v2): 50–150 ms
5. Text-to-speech and gestures (Supertonic and Pollen Emotions): 500 ms–2 s
6. Speech and movement out
Total: about 1–3 s.
The whole loop runs on the Raspberry Pi 5 CPU — no offloading.
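
As a quick sanity check on that 1–3 s figure, the per-stage ranges can simply be added up. The sketch below is only an illustration: it assumes the stages run strictly one after another, and it treats "under 100 ms" as a 0–100 ms range, which is an assumption rather than a measured lower bound.

```python
# Per-stage latency ranges in milliseconds, taken from the figures quoted above.
# The 0 ms lower bound for VAD is an assumption: the post only says "under 100 ms".
STAGES_MS = {
    "VAD + turn-to-face (Silero)": (0, 100),
    "Speech-to-text (Moonshine)": (200, 500),
    "Embeddings + semantic search (all-MiniLM-L6-v2)": (50, 150),
    "Text-to-speech + gestures (Supertonic, Pollen Emotions)": (500, 2000),
}


def total_budget_ms(stages):
    """Sum best-case and worst-case latencies, assuming stages run sequentially."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = total_budget_ms(STAGES_MS)
print(f"Sequential loop: {best / 1000:.2f}-{worst / 1000:.2f} s")
```

The sum comes out at 0.75–2.75 s, which lines up with the "about 1–3 s" total, and it shows that text-to-speech and gestures take the largest share of the budget.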

On-device trade-offs

Keeping all AI inference on-device means some trade-offs, but real benefits in efficiency, privacy, and latency. (If you want to see what cloud-hosted AI can do, the official Reachy Mini conversation app using OpenAI's realtime API is for you.)

Low-latency reactions, like turning towards the user in under 100 ms, make a huge difference to the experience. That makes sense given how important non-verbal communication is; gaze detection or other cues might feel more natural than a pure wake word. The app should be able to run entirely on the internal Raspberry Pi CM4 in a standard Reachy Mini Wireless.

It uses semantic search to match responses and allows tool calling (text-to-speech, movements, a time or joke API), but it's not a full conversational assistant. The key goal was to keep voice-to-voice latency low. The web UI is designed to make it very simple to add items to a little vector database and trigger speech responses or gestures.
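
To make the matching idea concrete, here is a minimal sketch of a tiny vector store with a similarity threshold. It is not the app's code: `embed` is a toy bag-of-words stand-in so the example runs with only the standard library (the app uses all-MiniLM-L6-v2 embeddings), and `ReactionStore`, the action dictionaries, the `wave` gesture, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ReactionStore:
    """A tiny in-memory vector store mapping example phrases to actions (hypothetical)."""

    def __init__(self):
        self.items = []  # list of (embedding, action) pairs

    def add(self, phrase, action):
        self.items.append((embed(phrase), action))

    def match(self, utterance, threshold=0.5):
        """Return (action, score) for the closest phrase, or (None, score) on a miss."""
        query = embed(utterance)
        best_action, best_score = None, 0.0
        for vector, action in self.items:
            score = cosine(query, vector)
            if score > best_score:
                best_action, best_score = action, score
        if best_score < threshold:
            return None, best_score
        return best_action, best_score


store = ReactionStore()
store.add("what time is it", {"tool": "time"})
store.add("tell me a joke", {"tool": "joke"})
store.add("hello there", {"say": "Hi!", "gesture": "wave"})

action, score = store.match("can you tell me a joke")
print(action, round(score, 2))
```

With a real sentence-embedding model in place of `embed`, paraphrases that share no words with the stored phrase could still match, which is the point of using semantic search rather than keyword lookup.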

What's next

So many avenues to explore. One is adding a fine-tuned SLM, or even an LLM, for semantic-cache "misses." I also want to investigate hooking in vision, and agentic AI with the appropriate UX, at a later date. This has been a lot of fun.
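
One way the semantic-cache "miss" path could be wired up is sketched below. This is a guess at the shape, not a design the app has: `respond`, `fake_lookup`, `fake_slm`, and the 0.6 threshold are hypothetical, and the two stand-in functions exist only so the sketch runs without any model installed.

```python
def respond(utterance, lookup, fallback, threshold=0.6):
    """Use the cached response on a hit, otherwise defer to the slower fallback model.

    lookup(utterance) must return (response_or_None, similarity_score).
    fallback(utterance) must return a response string.
    """
    response, score = lookup(utterance)
    if response is not None and score >= threshold:
        return response, "cache"
    return fallback(utterance), "fallback"


# Hypothetical stand-ins for the vector store and a small language model.
def fake_lookup(utterance):
    known = {"hello": ("Hi there!", 0.92)}
    return known.get(utterance, (None, 0.0))


def fake_slm(utterance):
    return f"(generated reply to: {utterance})"


print(respond("hello", fake_lookup, fake_slm))
print(respond("why is the sky blue", fake_lookup, fake_slm))
```

Keeping the fast path first means common requests stay within the low-latency budget, and only the misses pay for the slower model.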

#edge-ai #robotics #raspberry-pi