Dominic Pajak Blog

2024-12-16

Building a speech-to-speech assistant with only on-device AI

How did we build a speech-to-speech assistant using only on-device AI? This video shows how.

A sentence transformer takes your question and generates an embedding: a high-dimensional vector that represents the question's meaning. This enables semantic search, which matches queries by meaning, so it doesn't matter if the user doesn't use the exact phrase.

The encoder-only sentence transformer is faster to run than a full LLM and, because it only encodes text rather than generating it, it avoids hallucinations.
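To make the retrieval step concrete, here is a minimal sketch of embedding-based matching in plain Python. It is not the demo's code: `toy_embed`, `FAQ`, `best_match`, and the `threshold` value are all hypothetical names and values chosen for illustration. In particular, `toy_embed` is only a word-count stand-in, so it matches overlapping words rather than meaning; a real sentence transformer would return a dense vector and could be passed in through the `embed` parameter.

```python
import math
from collections import Counter

# Hypothetical question/answer pairs; a real appliance would ship its own.
FAQ = {
    "how do I descale the machine": "Run the descale cycle from the menu.",
    "what does the red light mean": "The water tank is empty.",
}


def toy_embed(text):
    """Stand-in for a sentence transformer: a sparse vector of word counts.

    A real model would return a dense vector that captures meaning,
    not just which words appear.
    """
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(question, faq, embed=toy_embed, threshold=0.3):
    """Return the stored answer whose question is closest to `question`.

    Returns None when nothing is similar enough, so the assistant can
    say it does not know instead of making something up.
    """
    query_vec = embed(question)
    best_score, best_answer = 0.0, None
    for known_question, answer in faq.items():
        score = cosine(query_vec, embed(known_question))
        if score > best_score:
            best_score, best_answer = score, answer
    return best_answer if best_score >= threshold else None


if __name__ == "__main__":
    print(best_match("what is the red light for?", FAQ))
    print(best_match("play some music", FAQ))
```

Because the answers are looked up rather than generated, the worst case is a wrong or missing match, never invented text. The threshold is the knob that trades missed matches against wrong ones, and its right value depends on the embedding model used.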

The pipeline

Any appliance in the home, retail, or industry that would benefit from a voice UI can now run one fully on-device.

Note: this demo is written in Python with off-the-shelf models, so there is still plenty of room for optimisation.

#edge-ai #llm