Pocster Posted Wednesday at 11:33
I'm on a roll here! Getting Alexa to run your own automations and use your own wording is a pita. Lots of "let's stop you" from Amazon. So I had a HA (Home Assistant) voice assistant - it's OK, but its microphones are poor. So! Let's make something that's as reliable as Alexa, but local and with no LLM. A zillion discussions with chat - and we have a planned approach. Bought an office-style conference speaker unit, so multi-directional etc. The Mac mini has good processing for noise/background suppression before passing the audio on. Speech back will be via a Squeezebox player.

VOICE SYSTEM – SOFTWARE STACK SUMMARY

Wake Word Detection
- Software options: Porcupine (Picovoice), openWakeWord, Snowboy (older/legacy)
- Purpose: continuously listens for a wake phrase locally with very low CPU.

Speech-to-Text (STT)
- Software: Whisper (OpenAI Whisper local model), whisper.cpp (faster C++ local version), Faster-Whisper, OpenAI Whisper API (cloud option)
- Purpose: converts recorded audio into text.

Speaker Identification (Voice ID)
- Software options: Resemblyzer (voice embeddings), pyannote.audio, SpeechBrain speaker recognition, Picovoice Eagle (commercial)
- Purpose: creates a voice fingerprint and compares it against enrolled users.
- Important: this runs separately from Whisper. Voice identity ≠ transcript content.

Intent Parsing / Command Understanding
- If rule-based: Home Assistant built-in intent engine, Rhasspy

Permission & Policy Layer
- Software: Home Assistant user permissions, custom Python logic, Node-RED (optional orchestration layer)
- Purpose: checks who spoke, whether they are authorised, and whether the command requires confirmation. Implements: "Pocster, is that OK?" → wait for a verified response.

Execution Layer
- Software: Home Assistant, MQTT broker (Mosquitto), ESPHome, custom Python services
- Purpose: triggers actual devices, UI events, or automations.

Text-to-Speech (TTS)
- Software: Piper (local neural TTS), Coqui TTS, ElevenLabs (cloud), Home Assistant TTS integrations
- Purpose: system speaks back to the user.

Clean Stack Example (Fully Local Setup)
- Wake word: openWakeWord
- STT: whisper.cpp
- Voice ID: Resemblyzer
- Intent: Home Assistant or local LLM via Ollama
- Permissions: custom Python layer
- Execution: Home Assistant + MQTT
- TTS: Piper

That is the full named software stack for the speech recognition + speaker ID + command system. A rough sketch of how a couple of these pieces glue together is below.
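For anyone curious how the STT and voice-ID stages might hang together in Python, here is a minimal sketch assuming Faster-Whisper for transcription and Resemblyzer for the voice fingerprint. The enrolled.npy embedding file, the 0.75 similarity threshold, the pre-captured WAV and the hand-off prints at the end are illustrative assumptions, not part of the actual build.

```python
# Minimal STT + voice-ID sketch (assumptions: faster-whisper and resemblyzer are
# installed, an "enrolled.npy" embedding was saved at enrolment time, and some
# other process has already captured a 16 kHz mono WAV of the spoken command).
import numpy as np
from faster_whisper import WhisperModel                 # speech-to-text
from resemblyzer import VoiceEncoder, preprocess_wav    # speaker embeddings

stt = WhisperModel("small", device="cpu", compute_type="int8")
encoder = VoiceEncoder()
enrolled = np.load("enrolled.npy")                      # enrolled user's voice print

def transcribe(wav_path: str) -> str:
    """Turn the recorded command into text."""
    segments, _info = stt.transcribe(wav_path)
    return " ".join(seg.text.strip() for seg in segments)

def is_enrolled_speaker(wav_path: str, threshold: float = 0.75) -> bool:
    """Compare this utterance's embedding with the enrolled one (cosine similarity)."""
    emb = encoder.embed_utterance(preprocess_wav(wav_path))
    sim = float(np.dot(emb, enrolled) /
                (np.linalg.norm(emb) * np.linalg.norm(enrolled)))
    return sim >= threshold

def handle_utterance(wav_path: str) -> None:
    text = transcribe(wav_path)
    if not is_enrolled_speaker(wav_path):
        print("Unknown voice - ask for confirmation")    # would go to Piper TTS
        return
    print(f"Authorised command: {text}")                 # hand off to intent layer
```

Voice ID and transcription run on the same recording but independently, which is the "voice identity ≠ transcript content" point above.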
SteamyTea Posted Wednesday at 15:01
WTF Just put a switch or dial on the wall FFS.
Bramco Posted Wednesday at 15:09
7 minutes ago, SteamyTea said: WTF Just put a switch or dial on the wall FFS.
That would make the tiling job harder and, worse still, make it the next job in the queue....
ProDave Posted Wednesday at 15:11
The thing that gets me about spending countless hours configuring and setting up some custom home-brew voice control system is: how do you easily back up all the configured software, so WHEN it goes wrong you can just reinstall it in a flash and it will all just work again? It's bad enough with my Pi music box, rebuilding that each time it crashes, and there is not much customisation of that.
SteamyTea Posted Wednesday at 15:16
Just now, ProDave said: bad enough with my Pi music box rebuilding that each time it crashes, and there is not much customisation of that
I think that may be a problem with the OS. Dedicated hardware does not need the same sort of heavy overhead systems to run. But yes, (expletive deleted)ing stupid idea.
Pocster Posted Wednesday at 16:44 (Author)
1 hour ago, ProDave said: The thing that gets me about spending countless hours configuring and setting up some custom home-brew voice control system is: how do you easily back up all the configured software, so WHEN it goes wrong you can just reinstall it in a flash and it will all just work again? It's bad enough with my Pi music box, rebuilding that each time it crashes, and there is not much customisation of that.
Easily! Mac Time Machine - backs up automatically every day/week.
Pocster Posted Wednesday at 16:53 (Author) (edited)
1 hour ago, SteamyTea said: WTF Just put a switch or dial on the wall FFS.
Clearly you don't understand the issue. You cannot get Alexa to do exactly what you want without jumping through hoops. "Alexa, play a random album by Coldplay" - and it selects a random Coldplay album from your Squeezebox and streams it to a default streamer. Or "Alexa, disable jamma cabinet" - recognises my voice only and does that. You can frig some of these, but it's a pita, and Amazon can also cause issues. When it's done I can and will have a dashboard where these things are selectable. Home Assistant Voice can do these things with some effort but, as said, its microphones are crap. "OK Nabu, radio on"; then asking it to turn the radio off, it won't be able to mask out the background radio!!! Once you have a stable, reliable system there are many things that can be achieved - the limitation is people's imagination. Posting here will of course land on lots of "why bother" views. Take my home cinema setup: 1 button does about 6 things (SWMBO friendly), yet some of this hardware has no Bluetooth/Zigbee/WiFi - so making dumb things work in the chain can be challenging. Some of you have no vision…. 😆
Edited Wednesday at 16:53 by Pocster
Pocster Posted Wednesday at 16:54 (Author) (edited)
Remember, this is local and with no LLM overkill. It was either look at this today or painting…
Edited Wednesday at 16:55 by Pocster
Pocster Posted Wednesday at 16:58 (Author)
1 hour ago, Bramco said: That would make the tiling job harder and, worse still, make it the next job in the queue....
You've got it! After building for 10 yrs, a bit of variety and some fun projects are required!
Pocster Posted 10 hours ago (Author) (edited)
Chatting to my best mate, we've had a strategic change. Going to use a local LLM and multiple 8-way microphone arrays. We'll then be able to localise where the speaker is, based on the timing of the wake word hitting each mic. An LLM allows a lot of fuzzy talk, e.g. "play something I like by Coldplay". Going to add a database so we have history, i.e. in the above example it would play the Coldplay album that I have played the most in the last 3 months, on the speaker nearest the microphone. This exceeds Alexa's capabilities. History of lighting/music etc. from previous actions. Equally I could record conversations as reminders to be replayed later 😊; can't see any issues with SWMBO there. Then we can turn intent into assumptions. I get home at 7pm on a Friday; rather than me asking for the radio on in Avalon (that's what it's called), it could work out the likelihood and ask "do you want the radio on, Pocster?". As constant verbal querying may become annoying, we can have a point where it simply puts the radio on based on the probability that that is what I want. Also, once the wake word is received, any playing audio has its volume reduced while doing speech-to-text as you speak, i.e. not waiting for the command to finish. Add in voice ID (Alexa does this, but not very well), i.e. SWMBO talking or me. If SWMBO requests something I haven't authorised then confirmation is requested from me. The more I think about this, the more uses I can think of - some OTT of course - but that's the fun! A sketch of the history lookup is below.
Edited 10 hours ago by Pocster
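To make the "album I've played most in the last 3 months" idea concrete, here is a rough sketch of the history lookup, assuming the execution layer logs every play into a SQLite table. The database file, the table and column names, and the play_on_nearest_speaker hand-off are made-up placeholders, not an existing schema or API.

```python
# History lookup sketch: pick the album by an artist that a given user has
# played most in the last few months (assumes plays are logged with ISO-format
# timestamps into play_history.db by the execution layer).
import sqlite3
from datetime import datetime, timedelta

def most_played_album(artist: str, user: str, months: int = 3) -> str | None:
    since = (datetime.now() - timedelta(days=months * 30)).isoformat()
    con = sqlite3.connect("play_history.db")
    try:
        row = con.execute(
            """
            SELECT album, COUNT(*) AS plays
            FROM plays
            WHERE artist = ? AND user = ? AND played_at >= ?
            GROUP BY album
            ORDER BY plays DESC
            LIMIT 1
            """,
            (artist, user, since),
        ).fetchone()
        return row[0] if row else None
    finally:
        con.close()

# The intent layer would then do something like:
#   album = most_played_album("Coldplay", "pocster")
#   if album:
#       play_on_nearest_speaker(album)   # speaker chosen from mic-array timing
```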
Temp Posted 10 hours ago
Most of us just yell at the kids... Turn the f... light off.
Pocster Posted 1 hour ago (Author) (edited)
8 hours ago, -rick- said: Just lay some flipping tiles! 😜
You know what - I did that. I bedded 4 tiles today.
Edited 1 hour ago by Pocster
-rick- Posted 1 hour ago
21 minutes ago, Pocster said: You know what - I did that. I bedded 4 tiles today.
Good job! As a master procrastinator myself, little and often seems to work quite well for jobs you'd really rather not do.
Onoff Posted 58 minutes ago
On 11/02/2026 at 11:33, Pocster said: I'm on a roll here! Getting Alexa to run your own automations and use your own wording is a pita… [full software-stack post quoted above]
All I got from that was "hands free".