2025-12-29

Building Reachy Mini with on-device AI

Had a blast building the Hugging Face / Pollen Robotics Reachy Mini with my family over the holidays. Here are some thoughts for the new year, plus the on-device stack behind my app.

Education in AI and robotics seems to me super important for society, and for kids especially. Learning by doing is key, and projects like Reachy Mini literally put AI in people's hands, and I'm here for it. Pollen Robotics did an amazing job on every aspect of the user experience; it was genuinely fun building and working on it as a family, and a real privilege to be part of the beta program — I tested one of the early 3D-printed prototypes from Pollen Robotics ♥. It feels like the start of a significant community.

Reachy Mini Reactions

My on-device app in the video, "Reachy Mini Reactions," got off to a solid start and then picked up some even more solid contributions from Sandeep Mistry. I vibe-coded a decent chunk just to prove the concept. The functionality is there, but the install process still needs work. It's open source on Hugging Face as Reachy Mini Reactions. It listens for speech, turns to face whoever's talking, matches intent with embeddings and semantic search, then responds with speech and gestures, all with inference running fully on the Raspberry Pi CPU, no offloading.

The on-device stack

Everything runs on the Raspberry Pi 5 Arm CPU (2.4 GHz max, ~2 GB RAM used):

On-device AI inference + robot daemon: all on the Raspberry Pi 5
Voice Activity Detection: Silero
Speech-to-Text: Moonshine AI
Embeddings: all-MiniLM-L6-v2
Text-to-Speech: Supertonic
Gestures: Pollen Robotics Emotions Library (adds tons of personality)
Turn to face: Seeed ReSpeaker audio Direction of Arrival

End to end, the voice-to-voice loop comes in at around 1–3 s.

The whole loop runs on the Raspberry Pi 5 CPU — no offloading.
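
To see where a 1–3 s budget like this goes, it helps to time each stage separately. Here is a minimal sketch of one way to do that, not the app's actual code: `run_timed_pipeline` is a hypothetical helper, and the three lambda stages are placeholder stand-ins for the real speech-to-text, matching, and text-to-speech calls.

```python
import time
from typing import Any, Callable

Stage = tuple[str, Callable[[Any], Any]]


def run_timed_pipeline(stages: list[Stage], payload: Any) -> tuple[Any, dict[str, float]]:
    """Run stages in order; return the final output and seconds spent per stage."""
    timings: dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)  # each stage consumes the previous stage's output
        timings[name] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return payload, timings


# Hypothetical stand-ins: swap in the real model calls to measure them.
stages: list[Stage] = [
    ("stt", lambda audio: "what time is it"),
    ("match", lambda text: {"say": "Let me check the clock."}),
    ("tts", lambda action: b"\x00\x00"),
]

audio_out, timings = run_timed_pipeline(stages, b"raw-audio")
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

Timing per stage rather than only end to end makes it obvious which model to shrink or swap first when the loop feels slow.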

On-device trade-offs

Keeping all AI inference on-device means some trade-offs, but real benefits in efficiency, privacy, and latency. (If you want to see what cloud-hosted AI can do, the official Reachy Mini conversation app using OpenAI's realtime API is for you.)

Low-latency reactions, like turning towards the user in under 100 ms, make a huge difference to the experience. That makes sense given how important non-verbal communication is; gaze detection or other cues might feel more natural than a pure wake word. The app should be able to run entirely on the internal Raspberry Pi CM4 in a standard Reachy Mini wireless.
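
The turn-to-face step is cheap because it is mostly arithmetic on the Direction of Arrival angle. A rough sketch, under assumptions that are mine rather than anything from the ReSpeaker or Reachy Mini docs: the angle arrives as 0–360 degrees with 0 meaning straight ahead, and the head yaw is limited to a hypothetical ±90 degrees. `doa_to_yaw` is an illustrative name, not a library function.

```python
def doa_to_yaw(doa_deg: float, max_yaw_deg: float = 90.0) -> float:
    """Map a 0-360 degree direction of arrival to a clamped, signed head yaw."""
    # Wrap into [-180, 180) so a speaker at 350 degrees becomes -10, not +350.
    signed = (doa_deg + 180.0) % 360.0 - 180.0
    # Clamp to the (assumed) mechanical range so a voice from behind
    # turns the head as far as it can go instead of requesting an impossible pose.
    return max(-max_yaw_deg, min(max_yaw_deg, signed))


print(doa_to_yaw(30.0))   # 30.0
print(doa_to_yaw(350.0))  # -10.0
print(doa_to_yaw(200.0))  # -90.0
```

Because there is no model in this path, the reaction can start before speech-to-text has even finished.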

It uses semantic search to match responses and allows tool calling (text-to-speech, movements, a time or joke API), but it's not a full conversational assistant. The key goal was to keep voice-to-voice latency low. The web UI is designed to make it very simple to add items to a little vector database and trigger speech responses or gestures.
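
The match-then-act idea can be sketched in a few lines. This is an illustration under stated assumptions, not the app's implementation: the 3-dimensional vectors are toy stand-ins for real all-MiniLM-L6-v2 embeddings (which are 384-dimensional), and `best_match`, `react`, `TOOLS`, the store layout, and the 0.6 threshold are all hypothetical.

```python
import math
from datetime import datetime


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query_vec, store, threshold=0.6):
    """Return the closest stored entry, or None if nothing clears the threshold."""
    if not store:
        return None
    score, entry = max(
        ((cosine(query_vec, e["vector"]), e) for e in store),
        key=lambda pair: pair[0],
    )
    return entry if score >= threshold else None


# Hypothetical tools: each returns a string describing what it would do.
TOOLS = {
    "say": lambda arg: f"[tts] {arg}",
    "gesture": lambda arg: f"[gesture] {arg}",
    "time": lambda arg: f"[tts] It is {datetime.now():%H:%M}",
}


def react(query_vec, store):
    """Look up the best entry and run its actions; an empty list means a miss."""
    entry = best_match(query_vec, store)
    if entry is None:
        return []
    return [TOOLS[name](arg) for name, arg in entry["actions"]]


store = [
    {"vector": [1.0, 0.0, 0.0], "actions": [("say", "Hello!"), ("gesture", "wave")]},
    {"vector": [0.0, 1.0, 0.0], "actions": [("time", None)]},
]

print(react([0.9, 0.1, 0.0], store))  # ['[tts] Hello!', '[gesture] wave']
print(react([0.0, 0.0, 1.0], store))  # []
```

The threshold is the knob that matters here: set it too low and the robot answers questions nobody asked, too high and it ignores reasonable paraphrases.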

What's next

So many avenues to explore. One is adding a fine-tuned SLM, or even an LLM, for semantic-cache "misses." I also want to investigate hooking in vision, and agentic AI with the appropriate UX, at a later date. This has been a lot of fun.

#edge-ai #robotics #raspberry-pi
