
Posted

I’m on a roll here!

Getting Alexa to run your own automations and use your own wording is a PITA. Lots of “let’s stop you” from Amazon.

So I had a HA voice assistant. It’s OK, but its microphones are poor.

So! Let’s make something that’s as reliable as Alexa, but local and with no LLM.

A zillion discussions with chat later, we have a planned approach. I bought an office-style conference speaker unit, so it’s multi-directional etc. The Mac mini has good processing for noise/background suppression before passing the audio on. Speech back will be via a Squeezebox player.

 

VOICE SYSTEM – SOFTWARE STACK SUMMARY

 

  1. Wake Word Detection
    Software options:

 

 

 

  • Porcupine (Picovoice)
  • openWakeWord
  • Snowboy (older / legacy)

 

 

Purpose:

Continuously listens for a wake phrase locally with very low CPU.

 

 

 

 

  2. Speech-to-Text (STT)
    Software:

 

 

 

  • Whisper (OpenAI Whisper local model)
  • whisper.cpp (faster C++ local version)
  • Faster-Whisper
  • OpenAI Whisper API (cloud option)

 

 

Purpose:

Converts recorded audio into text.

 

 

 

 

  3. Speaker Identification (Voice ID)
    Software options:

 

 

 

  • Resemblyzer (voice embeddings)
  • pyannote.audio
  • SpeechBrain speaker recognition
  • Picovoice Eagle (commercial)

 

 

Purpose:

Creates a voice fingerprint and compares it against enrolled users.

 

Important:

This runs separately from Whisper.

Voice identity ≠ transcript content.
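
To make the "compare against enrolled users" step concrete, here is a minimal sketch of the matching part only. It assumes an embedding library such as Resemblyzer has already turned each recording into a vector of floats; the three-number vectors, the names in `ENROLLED` and the 0.75 threshold are made up for illustration, and real embeddings are much longer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(embedding, enrolled, threshold=0.75):
    """Return (name, score) for the closest enrolled voice, or (None, score) if below threshold."""
    best_name, best_score = None, -1.0
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical enrolled voice fingerprints (toy values, not real embeddings).
ENROLLED = {
    "pocster": [0.9, 0.1, 0.2],
    "swmbo": [0.1, 0.8, 0.3],
}

if __name__ == "__main__":
    print(identify_speaker([0.88, 0.12, 0.21], ENROLLED))
    print(identify_speaker([0.0, 0.0, 1.0], ENROLLED))
```

The threshold is the knob that trades false accepts against false rejects, so it would need tuning on real recordings from the room the microphone sits in.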

 

 

 

 

  4. Intent Parsing / Command Understanding

 

 

If rule-based:

 

  • Home Assistant built-in intent engine
  • Rhasspy
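
As a rough idea of what "rule-based" means here, the sketch below matches a transcript against a few fixed patterns and pulls out the variable parts. It is not how Home Assistant's or Rhasspy's engines are implemented; the rule list, intent names and slot names are all invented for the example.

```python
import re

# Hypothetical rules: (compiled pattern, intent name). Order matters, first match wins.
RULES = [
    (re.compile(r"^play a random album by (?P<artist>.+)$"), "play_random_album"),
    (re.compile(r"^(?P<action>enable|disable) (?P<device>.+)$"), "set_device_state"),
    (re.compile(r"^(?:turn )?radio (?P<state>on|off)$"), "radio_power"),
]


def parse_intent(transcript):
    """Normalise the transcript, then return {"intent", "slots"} for the first matching rule."""
    text = re.sub(r"[^\w\s]", "", transcript.lower()).strip()  # drop punctuation
    text = re.sub(r"\s+", " ", text)  # collapse repeated whitespace
    for pattern, intent in RULES:
        match = pattern.match(text)
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    return None  # nothing matched: no guessing, unlike an LLM


if __name__ == "__main__":
    print(parse_intent("Play a random album by Coldplay."))
    print(parse_intent("Disable jamma cabinet"))
    print(parse_intent("What's the weather?"))
```

The appeal of this approach is that an unmatched sentence simply returns nothing, so the system never acts on a phrase it was not explicitly taught.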

 

 

  5. Permission & Policy Layer

 

 

Software:

 

  • Home Assistant user permissions
  • Custom Python logic
  • Node-RED (optional orchestration layer)

 

 

Purpose:

Checks:

 

  • Who spoke?
  • Are they authorised?
  • Does this require confirmation?

 

 

Implements:

“Pocster is that OK?” → wait for verified response.
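
The three checks above could be sketched as a small "custom Python logic" layer along these lines. The policy table, intent names and speaker names are hypothetical; the point is only that the layer answers allow, deny, or ask for confirmation.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    CONFIRM = "confirm"  # ask "is that OK?" and wait for a verified reply


@dataclass(frozen=True)
class Policy:
    allowed_speakers: frozenset
    needs_confirmation: bool = False


# Hypothetical policy table keyed by intent name.
POLICIES = {
    "radio_power": Policy(frozenset({"pocster", "swmbo"})),
    "set_device_state": Policy(frozenset({"pocster"}), needs_confirmation=True),
}


def check_permission(speaker, intent, confirmed=False):
    """Decide whether an identified speaker may run an intent."""
    policy = POLICIES.get(intent)
    if policy is None or speaker is None:
        return Decision.DENY  # unknown intent or unrecognised voice
    if speaker not in policy.allowed_speakers:
        return Decision.DENY
    if policy.needs_confirmation and not confirmed:
        return Decision.CONFIRM
    return Decision.ALLOW


if __name__ == "__main__":
    print(check_permission("pocster", "set_device_state"))
    print(check_permission("pocster", "set_device_state", confirmed=True))
    print(check_permission("swmbo", "set_device_state"))
```

On `Decision.CONFIRM` the caller would speak the confirmation question, run the reply through speaker ID again, and only then call `check_permission` with `confirmed=True`.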

 

 

 

 

  6. Execution Layer

 

 

Software:

 

  • Home Assistant
  • MQTT broker (Mosquitto)
  • ESPHome
  • Custom Python services

 

 

Purpose:

Triggers actual devices, UI events, or automations.
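
One way a custom Python service could trigger a device is through Home Assistant's REST API, which accepts a POST to `/api/services/<domain>/<service>` with a long-lived access token as a Bearer header. The sketch below only uses the standard library; the host name, token and `switch.jamma_cabinet` entity are placeholders, not anything from the actual setup.

```python
import json
import urllib.request


def build_service_request(base_url, token, domain, service, data):
    """Build (but do not send) a Home Assistant service call request."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def call_service(request, timeout=5):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder host, token and entity: replace with real values before sending.
    request = build_service_request(
        "http://homeassistant.local:8123",
        "YOUR_LONG_LIVED_TOKEN",
        "switch",
        "turn_off",
        {"entity_id": "switch.jamma_cabinet"},
    )
    print(request.get_method(), request.full_url)
    # call_service(request)  # uncomment once the values above are real
```

Keeping request building separate from sending makes the layer easy to test without a live Home Assistant instance; MQTT would be an alternative transport for the same step.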

 

 

 

 

  7. Text-to-Speech (TTS)

 

 

Software:

 

  • Piper (local neural TTS)
  • Coqui TTS
  • ElevenLabs (cloud)
  • Home Assistant TTS integrations

 

 

Purpose:

System speaks back to the user.

 

 

 

Clean Stack Example (Fully Local Setup)

 

Wake word: openWakeWord

STT: whisper.cpp

Voice ID: Resemblyzer

Intent: Home Assistant or local LLM via Ollama

Permissions: Custom Python layer

Execution: Home Assistant + MQTT

TTS: Piper

 

 

 

That is the full named software stack for your speech recognition + speaker ID + command system.
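
For anyone trying to picture how the stages hang together, here is a hedged sketch of the glue code. Every stage is passed in as a plain function, so the real whisper.cpp, Resemblyzer, Home Assistant and Piper calls are left out; the class name, field names and the stub lambdas in the demo are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VoicePipeline:
    """Runs one utterance through the stages, in order, after the wake word fires."""

    transcribe: Callable[[bytes], str]               # STT
    identify: Callable[[bytes], Optional[str]]       # voice ID
    parse: Callable[[str], Optional[dict]]           # intent
    authorise: Callable[[Optional[str], dict], bool]  # permissions
    execute: Callable[[dict], str]                   # execution, returns a spoken reply
    speak: Callable[[str], None]                     # TTS

    def handle(self, audio: bytes) -> str:
        text = self.transcribe(audio)
        speaker = self.identify(audio)  # runs on the audio, not on the transcript
        command = self.parse(text)
        if command is None:
            reply = "Sorry, I did not understand that."
        elif not self.authorise(speaker, command):
            reply = "Sorry, you are not allowed to do that."
        else:
            reply = self.execute(command)
        self.speak(reply)
        return reply


if __name__ == "__main__":
    # Stub stages standing in for the real software.
    pipeline = VoicePipeline(
        transcribe=lambda audio: "radio off",
        identify=lambda audio: "pocster",
        parse=lambda text: {"intent": "radio_power"} if text == "radio off" else None,
        authorise=lambda speaker, command: speaker == "pocster",
        execute=lambda command: "Radio off.",
        speak=print,
    )
    pipeline.handle(b"")
```

Because each stage is just a function, any one of them can be swapped (for example openWakeWord for Porcupine upstream, or Piper for another TTS) without touching the rest.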

Posted
7 minutes ago, SteamyTea said:

WTF

 

Just put a switch or dial on the wall FFS.

 

 

 

That would make the tiling job harder and, worse still, make it the next job in the queue....

Posted

The thing that gets me about spending countless hours configuring and setting up some custom home-brew voice control system is: how do you easily back up all the configured software, so WHEN it goes wrong you can just reinstall it in a flash and it will all just work again?

 

It's bad enough rebuilding my Pi music box each time it crashes, and there is not much customisation of that.

Posted
Just now, ProDave said:

bad enough rebuilding my Pi music box each time it crashes, and there is not much customisation of that

I think that may be a problem with the OS. Dedicated hardware does not need the same sort of heavy overhead systems to run.

But yes, (expletive deleted)ing stupid idea.

Posted
1 hour ago, ProDave said:

The thing that gets me about spending countless hours configuring and setting up some custom home-brew voice control system is: how do you easily back up all the configured software, so WHEN it goes wrong you can just reinstall it in a flash and it will all just work again?

 

It's bad enough rebuilding my Pi music box each time it crashes, and there is not much customisation of that.

Easily! Mac Time Machine backs up automatically every day/week.

Posted (edited)
1 hour ago, SteamyTea said:

WTF

 

Just put a switch or dial on the wall FFS.

 

 

Clearly you don’t understand the issue. You cannot get Alexa to do exactly what you want without jumping through hoops, e.g. “Alexa, play a random album by Coldplay” and it selects a random Coldplay album from your Squeezebox and streams it to a default streamer. Or “Alexa, disable JAMMA cabinet” - it recognises my voice only and does that.

You can frig some of these, but it’s a PITA, and Amazon can cause issues.

I can and will have a dashboard, when done, where these things are selectable. Home Assistant Voice can do these things with some effort but, as said, its microphones are crap. Say “OK Nabu, radio on”; ask it then to turn the radio off and it won’t be able to mask out the background radio!!!
Once you have a stable, reliable system there are many things that can be achieved - the limitation is people’s imagination.

Posting here will of course land on lots of “why bother” views.

Take my home cinema setup. 1 button does about 6 things (SWMBO friendly), yet some of this hardware has no Bluetooth/Zigbee/WiFi, so making dumb things work in the chain can be challenging.
Some of you have no vision … 😆

Edited by Pocster
Posted (edited)

Remember, this is local and no LLM overkill.

It was either look at this today or painting …

Edited by Pocster
Posted
1 hour ago, Bramco said:

 

That would make the tiling job harder and, worse still, make it the next job in the queue....

You’ve got it! After building for 10 yrs, a bit of variety and some fun projects are required!
