Pocster Posted 13 hours ago

I’m on a roll here! Getting Alexa to run your own automations and use your own wording is a PITA. Lots of “let’s stop you” from Amazon. So I had a HA voice assistant. It’s OK, but its microphones are poor. So! Let’s make something that’s as reliable as Alexa, that’s local, and has no LLM. A zillion discussions with chat, and we have a planned approach. Bought an office-style conference speaker unit, so multi-directional etc. The Mac mini has good processing for noise / background suppression before passing it on. Speech back will be via a Squeezebox player.

VOICE SYSTEM – SOFTWARE STACK SUMMARY

Wake Word Detection
Software options: Porcupine (Picovoice), openWakeWord, Snowboy (older / legacy)
Purpose: Continuously listens for a wake phrase locally with very low CPU.

Speech-to-Text (STT)
Software: Whisper (OpenAI Whisper local model), whisper.cpp (faster C++ local version), Faster-Whisper, OpenAI Whisper API (cloud option)
Purpose: Converts recorded audio into text.

Speaker Identification (Voice ID)
Software options: Resemblyzer (voice embeddings), pyannote.audio, SpeechBrain speaker recognition, Picovoice Eagle (commercial)
Purpose: Creates a voice fingerprint and compares it against enrolled users.
Important: This runs separately from Whisper. Voice identity ≠ transcript content.

Intent Parsing / Command Understanding
If rule-based: Home Assistant built-in intent engine, Rhasspy

Permission & Policy Layer
Software: Home Assistant user permissions, custom Python logic, Node-RED (optional orchestration layer)
Purpose: Checks: Who spoke? Are they authorised? Does this require confirmation?
Implements: “Pocster is that OK?” → wait for verified response.

Execution Layer
Software: Home Assistant, MQTT broker (Mosquitto), ESPHome, custom Python services
Purpose: Triggers actual devices, UI events, or automations.

Text-to-Speech (TTS)
Software: Piper (local neural TTS), Coqui TTS, ElevenLabs (cloud), Home Assistant TTS integrations
Purpose: System speaks back to the user.

Clean Stack Example (Fully Local Setup)
Wake word: openWakeWord
STT: whisper.cpp
Voice ID: Resemblyzer
Intent: Home Assistant or local LLM via Ollama
Permissions: Custom Python layer
Execution: Home Assistant + MQTT
TTS: Piper

That is the full named software stack for your speech recognition + speaker ID + command system.
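
As a rough illustration of how the Voice ID and Permission & Policy layers in the summary above might fit together, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the enrolled "fingerprints" are made-up 3-value vectors (real embeddings, e.g. from Resemblyzer, are much longer), the intent names, the policy table, and the 0.75 threshold are hypothetical, and comparing embeddings by cosine similarity is one common choice rather than anything the post specifies.

```python
import math

# Hypothetical enrolled voice fingerprints. Real embeddings are much longer
# vectors produced by a speaker-ID model; these values are made up.
ENROLLED = {
    "pocster": [0.9, 0.1, 0.2],
    "guest": [0.1, 0.8, 0.5],
}

# Hypothetical policy table: who may run an intent, and whether the system
# should ask for confirmation before executing it.
POLICY = {
    "disable_jamma_cabinet": {"allowed": {"pocster"}, "confirm": True},
    "radio_on": {"allowed": {"pocster", "guest"}, "confirm": False},
}

MATCH_THRESHOLD = 0.75  # assumed value; would need tuning on real recordings


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(embedding):
    """Return the best-matching enrolled user, or None if nobody is close enough."""
    best_name, best_score = None, 0.0
    for name, reference in ENROLLED.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= MATCH_THRESHOLD else None


def check_permission(speaker, intent):
    """Return 'allow', 'confirm' or 'deny' for this speaker and intent."""
    rule = POLICY.get(intent)
    if rule is None or speaker not in rule["allowed"]:
        return "deny"
    return "confirm" if rule["confirm"] else "allow"


if __name__ == "__main__":
    speaker = identify_speaker([0.88, 0.12, 0.25])
    print(speaker, check_permission(speaker, "disable_jamma_cabinet"))
```

Keeping the speaker match separate from the transcript mirrors the "Voice identity ≠ transcript content" note: the policy check only ever sees a user name (or None for an unrecognised voice) plus an intent name, so an unknown speaker is denied regardless of what was said.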
Bramco Posted 9 hours ago

7 minutes ago, SteamyTea said:
WTF Just put a switch or dial on the wall FFS.

That would make the tiling job harder and, worse still, make it the next job in the queue....
ProDave Posted 9 hours ago

The thing that gets me about spending countless hours configuring and setting up some custom home-brew voice control system is: how do you easily back up all the configured software, so WHEN it goes wrong you can just reinstall it in a flash and it will all just work again? It's bad enough with my Pi music box, rebuilding that each time it crashes, and there is not much customisation of that.
SteamyTea Posted 9 hours ago

Just now, ProDave said:
bad enough with my Pi music box rebuilding that each time it crashes, and there is not much customisation of that

I think that may be a problem with the OS. Dedicated hardware does not need the same sort of heavy overhead systems to run. But yes, (expletive deleted)ing stupid idea.
Pocster Posted 8 hours ago Author

1 hour ago, ProDave said:
The thing that gets me about spending countless hours configuring and setting up some custom home brew voice control system, is how do you easily back up all the configured software, so WHEN it goes wrong you can just re install it again in a flash and it will all just work again. It's bad enough with my Pi music box rebuilding that each time it crashes, and there is not much customisation of that.

Easily! Mac Time Machine backs up automatically every day / week.
Pocster Posted 7 hours ago Author (edited)

1 hour ago, SteamyTea said:
WTF Just put a switch or dial on the wall FFS.

Clearly you don’t understand the issue. You cannot get Alexa to do exactly what you want without jumping through hoops. “Alexa play a random album by Coldplay” and it selects a random Coldplay album from your Squeezebox and streams it to a default streamer. Or “Alexa disable JAMMA cabinet”: it recognises my voice only and does that. You can frig some of these but it’s a PITA, and Amazon can cause issues too. I can and will have a dashboard when done, where these things are selectable.

Home Assistant voice can do these things with some effort but, as said, its microphones are crap. “OK Nabu radio on”; ask it then to turn the radio off and it won’t be able to mask out the background radio!!!

Once you have a stable, reliable system there are many things that can be achieved. The limitation is people’s imagination. Posting here will of course land on lots of “why bother” views. Take my home cinema setup: 1 button does about 6 things (SWMBO friendly), yet some of this hardware has no Bluetooth / Zigbee / WiFi, so making dumb things work in the chain can be challenging. Some of you have no vision …. 😆

Edited 7 hours ago by Pocster
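
Fixed-phrase commands like the ones quoted in this post can be matched with plain rules rather than an LLM. The sketch below is one hedged way to do that in Python with regular expressions. The pattern list, intent names, and slot names are all hypothetical; it is not how Home Assistant's intent engine or Rhasspy work internally, only an illustration of the rule-based idea applied to an STT transcript.

```python
import re

# Hypothetical phrase patterns mapped to made-up intent names. A real setup
# would register equivalent sentences with its own intent engine.
RULES = [
    (re.compile(r"^play a random album by (?P<artist>.+)$"), "play_random_album"),
    (re.compile(r"^(?P<action>enable|disable) (?P<device>.+)$"), "set_device_state"),
    (re.compile(r"^radio (?P<state>on|off)$"), "radio"),
]

WAKE_WORDS = ("alexa", "ok nabu")  # wake phrases quoted in the post


def normalise(text):
    """Lower-case, drop punctuation and strip a leading wake phrase."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    text = " ".join(text.split())
    for wake in WAKE_WORDS:
        if text.startswith(wake + " "):
            return text[len(wake) + 1:]
    return text


def parse(transcript):
    """Return (intent, slots) for the first matching rule, or (None, {})."""
    text = normalise(transcript)
    for pattern, intent in RULES:
        match = pattern.match(text)
        if match:
            return intent, match.groupdict()
    return None, {}


if __name__ == "__main__":
    print(parse("Alexa, play a random album by Coldplay"))
    print(parse("OK Nabu radio on"))
```

Anything that matches no rule comes back as (None, {}), so an unrecognised phrase can simply be ignored or answered with a "didn't catch that" instead of being guessed at.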
Pocster Posted 7 hours ago Author (edited)

Remember this is local, with no LLM overkill. It was either look at this today or painting …

Edited 7 hours ago by Pocster
Pocster Posted 7 hours ago Author

1 hour ago, Bramco said:
That would make the tiling job harder and, worse still, make it the next job in the queue....

You’ve got it! After building for 10 yrs, a bit of variety and some fun projects are required!