Everything posted by Pocster
-
Oh yes! Well, as you know, you get good days and bad days! Mine's been epic. Massive speed increases. Local LLM "what's the capital of France?" working. Current affairs "what's the news?" gives headlines, and options verbally if you want more detail. Will add history so you can have a conversation. Gained a further 82ms saving on STT (I know, I know!). Honestly, it's now so fast to respond to even complex stuff that I'm well impressed. Started on timers like Alexa (a SWMBO requirement!). TBH if I coded this by hand that's weeks of work for sure. But of course I never look at the code! G n T time now!
-
Chat has been SO good today I might give it a promotion - nothing to do with me spending 90 quid......
-
Saved another 250ms ... yeah I know. I'll stop now! sad.
-
WOW oh wow! Never really looked into how an LLM generates its output, i.e. the cost. Assumed it's just generated at the end but it isn't. It's generated as it goes! Each output token is a separate pass through the model, so the longer the output, the longer the wait. Never thought of that! SO! A moderately complex phrase that took 5 seconds is now 1.3 seconds after JSON compaction! BOOM! WHO'S THE MOFO!
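A rough sketch of why compaction helps, assuming "compaction" here means dropping null fields and whitespace from the JSON the model has to write out (the post doesn't say exactly what was done). The field names are borrowed from the intent log elsewhere on this page. Character count is only a crude stand-in for token count, and in practice the prompt has to tell the model to emit the shorter form; this just shows the size difference.

```python
import json


def drop_nulls(value):
    """Recursively remove dict keys whose value is None."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value]
    return value


# One action in the verbose shape (field names taken from the log on this page).
verbose = {
    "actions": [{
        "album_hint": None, "artist_hint": None, "confidence": 0.95,
        "control": None, "domain": "timer", "duration_seconds": 600,
        "intent": "set_timer", "notes": "set chicken timer for 10 minutes",
        "query": None, "timer_label": "chicken", "track_hint": None,
        "value": None,
    }]
}

pretty = json.dumps(verbose, indent=2)
compact = json.dumps(drop_nulls(verbose), separators=(",", ":"))

# Fewer characters to generate means fewer tokens, so fewer model passes.
print(len(pretty), "->", len(compact), "characters")
```

Every character the model doesn't have to generate is time saved, which fits the 5 seconds to 1.3 seconds drop described above.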
-
Really awful bug. Chat 5.5 thinking kept patching and we kept rolling back. I kept trying to think of other ways to deal with it so we could try different approaches. Been at it for 45 minutes. Rolled it back. Clicked "pro" and gave it all the info I could. Pro then spent 14 minutes thinking! Found a really obscure issue - MAGIC! FIXED!
-
Don't trust any of the AI firms.... Apparently GLM5.2 local is really good - of course hardly anyone can run it ....
-
After paying 90 quid even 5.5 thinking seems considerably better than before - funny that.....
-
I've now asked it if it's legal to offer a service where I have no idea what I am getting nor for how long, and yet be offered an upgrade to 5x or 20x of an unknown. It's been thinking the longest I've ever seen it think! It agrees OpenAI could be breaching UK consumer law.
-
Not in chat. Just asked it. You just get warnings of "near limit". So you "do something" and hope you have credit left. Anyway, Pro gives me "thinking" back and of course everything is so much easier now!
-
Gave up! Paid 90 for the month. It's unusable as it is - couldn't do the simplest of tasks even after being given multiple pieces of evidence etc. Didn't realise there was a 90 a month option! So that's something. Now, after an hour of old chat basically achieving nothing, I'm expecting Pro to fix this very quickly.
-
After crippling Codex, OpenAI now restrict 5.5 "thinking" mode. I hate the AI companies doing this. There's no way in chat to see how much 'credit' you have or your usage. So it's like paying for Netflix and then being told you've watched too many premium films and you can't watch any more until the next window reset. Also chat+ is 20 a month or the next tier is 200!!! That's dumb, quite frankly. There's clearly a middle ground there! Chat not very talkative today - miserable (expletive deleted)er. Now it's unusable again.... thick as (expletive deleted)!
-
Had problems before with MLX models and prefill. Now though, after more experimenting... 50% speed increase!!
-
BOOM!
+008845ms Recording command WAV until silence...
+012857ms Command WAV: /Users/ultram3/avalon/.out/avalon_command_turn_0001.wav (6.00s, SILENCE_AFTER_SPEECH)
+013965ms One-shot command transcript: Avalon. Set chicken timer for 10 minutes and egg timer for 4 minutes.
+013966ms Wake residue stripped command transcript: Set chicken timer for 10 minutes and egg timer for 4 minutes.
+013967ms Ministral prompt: /Users/ultram3/avalon/prompts/avalon_media_intent_ministral.txt
+017112ms Heard phrase: Set chicken timer for 10 minutes and egg timer for 4 minutes.
+017112ms Final phrase: Set chicken timer for 10 minutes and egg timer for 4 minutes
+017112ms Ministral intent JSON: {"actions": [{"album_hint": null, "artist_hint": null, "confidence": 0.95, "control": null, "domain": "timer", "duration_seconds": 600, "intent": "set_timer", "notes": "set chicken timer for 10 minutes", "query": null, "timer_label": "chicken", "track_hint": null, "value": null}, {"album_hint": null, "artist_hint": null, "confidence": 0.95, "control": null, "domain": "timer", "duration_seconds": 240, "intent": "set_timer", "notes": "set egg timer for 4 minutes", "query": null, "timer_label": "egg", "track_hint": null, "value": null}]}
+017113ms Timer action: set_timer label=chicken duration=600
+017113ms Real route: timer.set_timer; Label: chicken; Duration: 600; Execution: yes
+017113ms Timer action: set_timer label=egg duration=240
+017113ms Real route: timer.set_timer; Label: egg; Duration: 240; Execution: yes
+017113ms TTS af_jessica: Chicken timer set for 10 minutes and egg timer for 4 minutes
+022172ms Restore requested volume: 51
+022185ms Restore verified: yes current_volume=51 expected_volume=51
+022190ms Restored: yes
+022190ms Listening...
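The "Wake residue stripped" step in that log could look something like the sketch below. This is a guess at the idea, not the actual Avalon code: WAKE_VARIANTS and strip_wake_residue are made-up names, and the variant list is just the wake word plus one plausible mishearing ("have a long").

```python
import re

# Assumed list: the wake word plus a known mishearing. A real list would be longer.
WAKE_VARIANTS = ["avalon", "have a long"]

# Match a wake variant at the very start, plus any punctuation/space after it.
_WAKE_RE = re.compile(
    r"^\s*(?:" + "|".join(re.escape(v) for v in WAKE_VARIANTS) + r")\b[\s.,!?:;-]*",
    re.IGNORECASE,
)


def strip_wake_residue(transcript: str) -> str:
    """Remove a leading wake word (or mishearing) from a transcript."""
    return _WAKE_RE.sub("", transcript, count=1).strip()


print(strip_wake_residue(
    "Avalon. Set chicken timer for 10 minutes and egg timer for 4 minutes."
))
```

Anchoring the pattern to the start of the transcript means a wake word that appears mid-sentence is left alone.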
-
Natural speech processing is tricky. Originally I went for Alexa-style simplistic "play coldplay" etc. Deterministic wording. But that's crap, and apparently Alexa+ (never used one) allows natural speech. Now we have "Put some coldplay on and set an egg timer for 5 minutes". Wording and phrasing can be loose: "Play coldplay and some nice mumford and sons".

Ministral does the parse but it can be funny! It might see "Play Coldplay." or it might see "play Coldplay", sometimes resulting in an empty JSON. It's random. So an empty JSON falls through to Gemma with the same prompt. So far this has not failed. I got chat to write a script that tested 1000 phrases with poor spelling or "mould clay" type deliberate wording messes! Ministral's job is just to get intent, i.e. "music", "artist or album". Not to determine if they are real or correct. That goes to jellyfish to fuzzy match against my real LMS library.

Regarding wake word, OpenAI/Porcupine are all crap TBH. Slow and useless. So we have 2 whisper tasks running per microphone. One is solely transcribing "avalon" (wake word) and its misheard permutations (frequently "have a long"). At the same time we run a rolling 12 second window of wav to text - this is surprisingly accurate. So once the wake word has been validated we already have what was said!

Processing all this, then Ministral and maybe Gemma, then TTS (Kokoro), is a bit slow. But I can't speed it up much. Models are the smallest reliable ones I can find. Compressing Ministral's prompt helps but then we get more "guesses". Another local LLM oddity is that even if you say "90 minutes" Ministral can sometimes convert it to seconds! Other times it's fine! So timers will enforce minutes (not a real issue tbh).

So apart from a complex pipeline that's relatively slow (4 seconds for an average prompt) it's working well! 100000% better than Home Assistant voice shite. Currently offloading rendering to a separate NUC. Trying to give the M3 as much GPU girth as possible for LLM processing.
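The "empty JSON falls through to Gemma" idea, as a minimal sketch. The real inference calls aren't shown in the post, so the two models here are stub functions standing in for Ministral and Gemma; parse_actions, get_intent and the "play" intent value are hypothetical names for illustration. Only the "actions", "domain", "intent" and "artist_hint" keys come from the log.

```python
import json
from typing import Callable, List

# A "model" here is any function taking (prompt, phrase) and returning raw text.
Model = Callable[[str, str], str]


def parse_actions(raw) -> list:
    """Return the 'actions' list from a model reply, or [] if unusable."""
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(data, dict):
        return []
    actions = data.get("actions")
    return actions if isinstance(actions, list) else []


def get_intent(phrase: str, prompt: str, models: List[Model]) -> list:
    """Try each model in order with the same prompt; first non-empty result wins."""
    for model in models:
        actions = parse_actions(model(prompt, phrase))
        if actions:
            return actions
    return []


# Stubs: the first model "randomly" returns empty JSON, the second one answers.
def flaky_model(prompt: str, phrase: str) -> str:
    return "{}"


def backup_model(prompt: str, phrase: str) -> str:
    return json.dumps({"actions": [
        {"domain": "music", "intent": "play", "artist_hint": "Coldplay"}
    ]})


actions = get_intent("Play Coldplay.", "intent prompt text",
                     [flaky_model, backup_model])
print(actions)
```

Keeping the models in an ordered list means the small fast one always gets first go and the bigger one only costs time when the first comes back empty.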
-
For isolated things this is a super fast way of working. Tweaking the graphics, for example, or an effect. I can literally say "let's try this" and 5 seconds later I'm viewing it. I love this part.
-
Frequently it's "given up" today. Seems to go through hourly phases, even with the same code, of amazing and shite. Chat's bodging again now. Gave it a specific render description and it did a cheap BBC B demo for me. It's like it's programmed to sometimes offer cheap solutions, i.e. less compute. I reckon at the end of the day when it goes home to Mrs Chat it bitches about this guy who's only on the 20 quid plan and wants the (expletive deleted)ing world! Now it's basically forcing me to use Codex cloud ... "I can't do it but Codex cloud can patch this". I'm going to conclude this is deliberate, as now each scout/patch uses around 5% of my 5 hr quota. These companies are shit. Need to use Codex cloud for a major revision or feature add, then tell chat to do one and use a local LLM for scout.
-
The thing chat, and I suspect all coding AIs, are crap at is 'thinking' about the problem outside the box. For example, ask it for code to draw a circle (ignoring a circle primitive) and it would use sin/cos/pi because that's standard maths. But! That's really shit. Will work, but slow. I can think of 20 ways that would be 1000% faster. Equally an A* algorithm, i.e. path finding from baddy to player around scenery. Standard methods will be used. Shite, slow, not practical in a real game. So efficient code comes from "understanding" the problem: it's the method that matters, not the code. That requires a human. I was at 86% GPU usage because its code uses 'standard' methods. Guide it on different techniques and it visually looks the same but with half the GPU usage.
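As one example of the circle point: the well-known midpoint circle algorithm plots a circle with integer adds and compares only, no sin/cos/pi. Whether it's one of the 20 ways meant above isn't stated, so take this as an illustration of the general idea rather than the method actually used.

```python
def circle_points(cx: int, cy: int, r: int) -> set:
    """Integer-only circle outline (midpoint circle algorithm)."""
    points = set()
    x, y = r, 0
    decision = 1 - r  # tracks whether the midpoint is inside or outside the circle
    while x >= y:
        # Compute one octant, mirror it into the other seven.
        for dx, dy in ((x, y), (y, x), (-y, x), (-x, y),
                       (-x, -y), (-y, -x), (y, -x), (x, -y)):
            points.add((cx + dx, cy + dy))
        y += 1
        if decision <= 0:
            decision += 2 * y + 1
        else:
            x -= 1
            decision += 2 * (y - x) + 1
    return points


# Crude text render of a radius-6 circle.
pts = circle_points(0, 0, 6)
for row in range(-6, 7):
    print("".join("#" if (col, row) in pts else " " for col in range(-6, 7)))
```

The trick is the same one the post describes: only one eighth of the circle is worked out and symmetry does the rest, so the work per pixel is a couple of integer operations.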
-
I can't draw for toffee, so chat does code and everything is procedurally generated. Reflections, fresnel, blooming, all the PS5 effects I love. Lipsync on the bot's mouth. Fuzzy logic, because when you say "Birdy" it could be translated as "birdie". Also phonetic matching, e.g. "mould play" = "Coldplay". No hard coding of phrases, everything just open source. Love it.
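A small sketch of the fuzzy matching idea using only Python's standard library. difflib is a stand-in here, not the matcher actually used in the project, and it does plain string similarity rather than true phonetic encoding. The library list and the best_match / normalise names are made up for the example.

```python
import difflib
from typing import List, Optional


def normalise(text: str) -> str:
    """Lowercase and keep only letters and digits, so spacing doesn't matter."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def best_match(heard: str, library: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the library entry closest to what was heard, or None."""
    by_key = {normalise(name): name for name in library}
    hits = difflib.get_close_matches(normalise(heard), list(by_key), n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None


# Hypothetical library; a real one would come from the music server.
library = ["Coldplay", "Mumford & Sons", "Birdy", "Radiohead"]

for heard in ("mould play", "birdie", "mumford and sons"):
    print(heard, "->", best_match(heard, library))
```

The cutoff is the knob to play with: too low and everything matches something, too high and "mould play" stops finding "Coldplay".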
-
OH! My rendering with reflection/bloom (expletive deleted)s the M3 GPU even before the local LLM does any work. Oh! Looks like I need a 2nd PC just for the dashboard!
-
Tell you something I love with chat. I upload screenshots when we have issues. For a while we had some old text/icons - no issue, just left there while I fixed other things. It would seem that from the repeated screenshots it decided to remove them. No "I'm getting rid of this", just gone. (expletive deleted)ING MAGIC I TELL YOU!
-
Hmmmm, your project you're working on must have REALLY upset someone!
-
Erm, watching your ££££ go. This is what I hate with cloud models. Not just the cost, but that they can change costs at any time, and it's clear that the cloud model you run now may not be the cloud model you run in 1 minute, i.e. backend changes. I hate it. You become dependent on it, like with a drug dealer, and then they move the goalposts. This is my main reason for enjoying the 20 quid 'near' Claude experience with chat. It requires more work to set up, but I can't bitch (though I do!) for £20. But of course a local LLM will be consistent .... models just ain't quite there yet. But you know, even though it's not out yet, an M5 with 512GB is still tempting. Really need models to catch up rather than hardware.
-
I often say to chat "Are you a BBC Model B? You are supposedly frontier cutting edge AI. So stop being a prick". Surprisingly it does frequently man up and produce something nearer what I requested. SO the answer is: insult it for better results. Like humans I guess!
-
Or (expletive deleted)ing token speed "test". Mac Mx vs RTX Y. Mac loses of course. That's what stopped me getting the M3 at first: always slower than RTX. BUT when you need a larger model or multiple models, Mac wins. For me flexibility over speed is the winner, easily. (expletive deleted)ing 10k for an RTX 6000 with 96GB.... Nvidia make Macs look cheap!
