Pocster Posted Wednesday at 11:33
I'm on a roll here! Getting Alexa to run your own automations and use your own wording is a pita. Lots of "let's stop you" from Amazon. So I had a HA (Home Assistant) voice assistant - it's OK, but its microphones are poor. So! Let's make something that's as reliable as Alexa, but local and with no LLM. A zillion discussions with chat - and we have a planned approach. Bought an office-style conference speaker unit, so multi-directional etc. The Mac mini has good processing for noise/background suppression before passing the audio on. Speech back will be via a Squeezebox player.

VOICE SYSTEM – SOFTWARE STACK SUMMARY

Wake Word Detection
- Software options: Porcupine (Picovoice), openWakeWord, Snowboy (older/legacy)
- Purpose: continuously listens for a wake phrase locally with very low CPU.

Speech-to-Text (STT)
- Software: Whisper (OpenAI Whisper local model), whisper.cpp (faster C++ local version), Faster-Whisper, OpenAI Whisper API (cloud option)
- Purpose: converts recorded audio into text.

Speaker Identification (Voice ID)
- Software options: Resemblyzer (voice embeddings), pyannote.audio, SpeechBrain speaker recognition, Picovoice Eagle (commercial)
- Purpose: creates a voice fingerprint and compares it against enrolled users.
- Important: this runs separately from Whisper. Voice identity ≠ transcript content.

Intent Parsing / Command Understanding
- If rule-based: Home Assistant built-in intent engine, Rhasspy

Permission & Policy Layer
- Software: Home Assistant user permissions, custom Python logic, Node-RED (optional orchestration layer)
- Purpose: checks who spoke, whether they are authorised, and whether the command requires confirmation. Implements: "Pocster, is that OK?" → wait for a verified response.

Execution Layer
- Software: Home Assistant, MQTT broker (Mosquitto), ESPHome, custom Python services
- Purpose: triggers actual devices, UI events, or automations.

Text-to-Speech (TTS)
- Software: Piper (local neural TTS), Coqui TTS, ElevenLabs (cloud), Home Assistant TTS integrations
- Purpose: system speaks back to the user.

Clean Stack Example (Fully Local Setup)
- Wake word: openWakeWord
- STT: whisper.cpp
- Voice ID: Resemblyzer
- Intent: Home Assistant or local LLM via Ollama
- Permissions: custom Python layer
- Execution: Home Assistant + MQTT
- TTS: Piper

That is the full named software stack for the speech recognition + speaker ID + command system. A rough sketch of how a couple of these pieces glue together is below.
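For anyone curious how the STT and voice-ID stages might hang together in Python, here is a minimal sketch assuming Faster-Whisper for transcription and Resemblyzer for the voice fingerprint. The enrolled.npy embedding file, the 0.75 similarity threshold, the pre-captured WAV and the hand-off prints at the end are illustrative assumptions, not part of the actual build.

```python
# Minimal STT + voice-ID sketch (assumptions: faster-whisper and resemblyzer are
# installed, an "enrolled.npy" embedding was saved at enrolment time, and some
# other process has already captured a 16 kHz mono WAV of the spoken command).
import numpy as np
from faster_whisper import WhisperModel                 # speech-to-text
from resemblyzer import VoiceEncoder, preprocess_wav    # speaker embeddings

stt = WhisperModel("small", device="cpu", compute_type="int8")
encoder = VoiceEncoder()
enrolled = np.load("enrolled.npy")                      # enrolled user's voice print

def transcribe(wav_path: str) -> str:
    """Turn the recorded command into text."""
    segments, _info = stt.transcribe(wav_path)
    return " ".join(seg.text.strip() for seg in segments)

def is_enrolled_speaker(wav_path: str, threshold: float = 0.75) -> bool:
    """Compare this utterance's embedding with the enrolled one (cosine similarity)."""
    emb = encoder.embed_utterance(preprocess_wav(wav_path))
    sim = float(np.dot(emb, enrolled) /
                (np.linalg.norm(emb) * np.linalg.norm(enrolled)))
    return sim >= threshold

def handle_utterance(wav_path: str) -> None:
    text = transcribe(wav_path)
    if not is_enrolled_speaker(wav_path):
        print("Unknown voice - ask for confirmation")    # would go to Piper TTS
        return
    print(f"Authorised command: {text}")                 # hand off to intent layer
```

Voice ID and transcription run on the same recording but independently, which is the "voice identity ≠ transcript content" point above.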
SteamyTea Posted Wednesday at 15:01
WTF Just put a switch or dial on the wall FFS.
Bramco Posted Wednesday at 15:09
7 minutes ago, SteamyTea said: WTF Just put a switch or dial on the wall FFS.
That would make the tiling job harder and, worse still, make it the next job in the queue....
ProDave Posted Wednesday at 15:11
The thing that gets me about spending countless hours configuring and setting up some custom home-brew voice control system is: how do you easily back up all the configured software, so WHEN it goes wrong you can just reinstall it in a flash and it will all just work again? It's bad enough with my Pi music box, rebuilding that each time it crashes, and there is not much customisation of that.
SteamyTea Posted Wednesday at 15:16
Just now, ProDave said: bad enough with my Pi music box rebuilding that each time it crashes, and there is not much customisation of that
I think that may be a problem with the OS. Dedicated hardware does not need the same sort of heavy overhead systems to run. But yes, (expletive deleted)ing stupid idea.
Pocster Posted Wednesday at 16:44 (Author)
1 hour ago, ProDave said: The thing that gets me about spending countless hours configuring and setting up some custom home-brew voice control system is: how do you easily back up all the configured software, so WHEN it goes wrong you can just reinstall it in a flash and it will all just work again? It's bad enough with my Pi music box, rebuilding that each time it crashes, and there is not much customisation of that.
Easily! Mac Time Machine - backs up automatically every day/week.
Pocster Posted Wednesday at 16:53 (Author) (edited)
1 hour ago, SteamyTea said: WTF Just put a switch or dial on the wall FFS.
Clearly you don't understand the issue. You cannot get Alexa to do exactly what you want without jumping through hoops. "Alexa, play a random album by Coldplay" - and it selects a random Coldplay album from your Squeezebox and streams it to a default streamer. Or "Alexa, disable jamma cabinet" - recognises my voice only and does that. You can frig some of these, but it's a pita, and Amazon can also cause issues. When it's done I can and will have a dashboard where these things are selectable. Home Assistant Voice can do these things with some effort but, as said, its microphones are crap. "OK Nabu, radio on"; then asking it to turn the radio off, it won't be able to mask out the background radio!!! Once you have a stable, reliable system there are many things that can be achieved - the limitation is people's imagination. Posting here will of course land on lots of "why bother" views. Take my home cinema setup: 1 button does about 6 things (SWMBO friendly), yet some of this hardware has no Bluetooth/Zigbee/WiFi - so making dumb things work in the chain can be challenging. Some of you have no vision…. 😆
Edited Wednesday at 16:53 by Pocster
Pocster Posted Wednesday at 16:54 (Author) (edited)
Remember, this is local and with no LLM overkill. It was either look at this today or painting…
Edited Wednesday at 16:55 by Pocster
Pocster Posted Wednesday at 16:58 (Author)
1 hour ago, Bramco said: That would make the tiling job harder and, worse still, make it the next job in the queue....
You've got it! After building for 10 yrs, a bit of variety and some fun projects are required!
Pocster Posted 10 hours ago (Author) (edited)
Chatting to my best mate, we've had a strategic change. Going to use a local LLM and multiple 8-way microphone arrays. We'll then be able to localise where the speaker is, based on the timing of the wake word hitting each mic. An LLM allows a lot of fuzzy talk, e.g. "play something I like by Coldplay". Going to add a database so we have history, i.e. in the above example it would play the Coldplay album that I have played the most in the last 3 months, on the speaker nearest the microphone. This exceeds Alexa's capabilities. History of lighting/music etc. from previous actions. Equally I could record conversations as reminders to be replayed later 😊; can't see any issues with SWMBO there. Then we can turn intent into assumptions. I get home at 7pm on a Friday; rather than me asking for the radio on in Avalon (that's what it's called), it could work out the likelihood and ask "do you want the radio on, Pocster?". As constant verbal querying may become annoying, we can have a point where it simply puts the radio on based on the probability that that is what I want. Also, once the wake word is received, any playing audio has its volume reduced while doing speech-to-text as you speak, i.e. not waiting for the command to finish. Add in voice ID (Alexa does this, but not very well), i.e. SWMBO talking or me. If SWMBO requests something I haven't authorised then confirmation is requested from me. The more I think about this, the more uses I can think of - some OTT of course - but that's the fun! A sketch of the history lookup is below.
Edited 10 hours ago by Pocster
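To make the "album I've played most in the last 3 months" idea concrete, here is a rough sketch of the history lookup, assuming the execution layer logs every play into a SQLite table. The database file, the table and column names, and the play_on_nearest_speaker hand-off are made-up placeholders, not an existing schema or API.

```python
# History lookup sketch: pick the album by an artist that a given user has
# played most in the last few months (assumes plays are logged with ISO-format
# timestamps into play_history.db by the execution layer).
import sqlite3
from datetime import datetime, timedelta

def most_played_album(artist: str, user: str, months: int = 3) -> str | None:
    since = (datetime.now() - timedelta(days=months * 30)).isoformat()
    con = sqlite3.connect("play_history.db")
    try:
        row = con.execute(
            """
            SELECT album, COUNT(*) AS plays
            FROM plays
            WHERE artist = ? AND user = ? AND played_at >= ?
            GROUP BY album
            ORDER BY plays DESC
            LIMIT 1
            """,
            (artist, user, since),
        ).fetchone()
        return row[0] if row else None
    finally:
        con.close()

# The intent layer would then do something like:
#   album = most_played_album("Coldplay", "pocster")
#   if album:
#       play_on_nearest_speaker(album)   # speaker chosen from mic-array timing
```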
Temp Posted 10 hours ago
Most of us just yell at the kids... Turn the f... light off.
Pocster Posted 1 hour ago (Author) (edited)
8 hours ago, -rick- said: Just lay some flipping tiles! 😜
You know what - I did that. I bedded 4 tiles today.
Edited 1 hour ago by Pocster
-rick- Posted 1 hour ago
21 minutes ago, Pocster said: You know what - I did that. I bedded 4 tiles today.
Good job! As a master procrastinator myself, little and often seems to work quite well for jobs you'd really rather not do.
Onoff Posted 58 minutes ago
On 11/02/2026 at 11:33, Pocster said: I'm on a roll here! Getting Alexa to run your own automations and use your own wording is a pita… [full software-stack post quoted above]
All I got from that was "hands free".