Avalon local LLM

MikeSharp01 · May 25

12 minutes ago, Pocster said:

Bit like giving an 8 yr old a chainsaw and then telling them to chop trees….

You might see it that way, BUT I have seen very little research showing that people without any coding skills can drive LLM code developers successfully, i.e. taking the basic output of requirements engineering, in English if you like, and just expecting a fully worked solution. So I speculate that what we have is apparent, stress apparent, productivity improvements for the coders that are left, but where that productivity, as measured by the normal metrics, is coming out much lower than expected and may in the medium term, because of maintainability, be lower still. This is not to say I would advise cutting back on LLM coding, but rather seeing it for what it is: more of a challenge than expected, as it is now. The advent of more "agentic" LLM coding, along the road Codex seems to be heading, will change the perspective again. Anyways it's all good clean fun and I am off for cream cakes and lashings of ginger beer - tallyho.

Pocster · May 25

1 hour ago, MikeSharp01 said:

Anyways it's all good clean fun and I am off for cream cakes and lashings of ginger beer - tallyho.

Why not! It's hot! I'm in the garden fighting chat. I actually got it to say "I can't do it, we need a coder with repo access". I beat the bitch into submission. We'd been trying and failing for 15 minutes.

I asked it what it needed. It gave me a 'sed', 'tail', 'cat' answer. I did it manually in the terminal and uploaded it. 60 seconds later, fixed.

But yes "no coding skills" with an LLM is only going to get you toy stuff. Remember OpenAI said 70% of chat is written by chat and they expect that to be 100% soon. So the SE/coder role now changes to LLM manager/architect/controller.

But yes; I assume even Claude can get its knickers in a twist!

But the change from how I was using chat just 12 months ago (and its capability then) to today is phenomenal!

We go down rabbit holes and make poor design decisions. Sometimes it does things I didn't ask for, and when I find out later I tell it where to go!

But! What I am achieving in terms of workflow with zero human coding is pretty magical, even if not completely local yet. Imagine where it will all be just 12 months from now!

Oh! I can hear the clink of ice in a G n T .......

Pocster · May 25

Then you get sweet moments where 15 patches all go straight through 😎

Pocster · May 27

Awesomeness! Refined the harness code (i.e. told chat it was shit) - patch approval rate is really high now.

Implemented but not ‘voice’ tested …

“play Coldplay”

“play head full of dreams”

“play that Coldplay album I played last Tuesday”

“what Coldplay albums do I have?”

“play something by Coldplay”

etc etc

Nice and broad. Also, if you mis-say it, or more likely it mis-hears, “foldplay” will ask if you meant “Coldplay”.

Again, not linked into the whisper transcript yet - just fuzzy qwen:3b ‘guessing’ based on intent and what’s in my LMS database.
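A minimal sketch of the "did you mean" step, for anyone wanting to try the idea without an LLM. This is not the harness described above, which uses qwen:3b; it swaps in plain string similarity from Python's standard `difflib`. The artist list, the function names and the 0.6 cutoff are all assumptions made for illustration.

```python
import difflib

# Hypothetical stand-in for the artist names held in the LMS database.
KNOWN_ARTISTS = ["Coldplay", "Radiohead", "Portishead", "Massive Attack"]


def resolve_artist(heard, known=KNOWN_ARTISTS, cutoff=0.6):
    """Return (artist, needs_confirmation) for a possibly mis-heard name."""
    lookup = {name.lower(): name for name in known}
    key = heard.strip().lower()
    if key in lookup:
        return lookup[key], False  # exact match, no need to ask
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    if close:
        return lookup[close[0]], True  # near miss, ask before playing
    return None, False


def reply(heard):
    """Turn the lookup result into the spoken/printed response."""
    artist, confirm = resolve_artist(heard)
    if artist is None:
        return f"I could not find anything like '{heard}'."
    if confirm:
        return f"Did you mean '{artist}'?"
    return f"Playing {artist}."


print(reply("foldplay"))
print(reply("Coldplay"))
```

String similarity only covers the spelling side. A request like "that album I played last Tuesday" still needs intent handling plus a database query, which is where the LLM earns its keep.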

Currently adding Postgres database functionality

Edited May 27 by Pocster

SimonD · May 28

On 25/05/2026 at 16:50, Pocster said:

Then you get sweet moments where 15 patches all go straight through 😎

It's nice when that happens. And then it all goes wrong with a single update - I just had an entire afternoon and evening trying to resolve a bug; eventually it turned out to be a single line of code causing double parsing of the string from the DB. Essentially the code was parsing something already parsed automatically by Postgres, and the LLM was looking in the wrong files to fix the bug. It's these days that drive you potty.
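For anyone who hasn't met this class of bug, here is a small sketch of how it can look. It assumes the column held JSON and the code was Python, neither of which is stated in the post. Some Postgres drivers (psycopg2 with json/jsonb columns, for example) hand back an already-decoded dict, so a second `json.loads` fails. The helper name `load_json_field` is hypothetical.

```python
import json


def load_json_field(value):
    """Parse a JSON value only if it is still text."""
    if isinstance(value, (str, bytes, bytearray)):
        return json.loads(value)
    return value  # already a dict/list/number: do not parse it again


# Simulated column values: raw text, and what a decoding driver would return.
raw_text = '{"artist": "Coldplay", "plays": 3}'
already_parsed = {"artist": "Coldplay", "plays": 3}

print(load_json_field(raw_text))
print(load_json_field(already_parsed))

# The buggy version: parsing the already-parsed value a second time.
try:
    json.loads(already_parsed)
except TypeError as exc:
    print("double parse failed:", exc)
```

The guard is a workaround. The cleaner fix is usually to decide in one place whether the driver or the application decodes the column, and delete the other parse.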

Pocster · May 28

1 hour ago, SimonD said:

It's nice when that happens. And then it all goes wrong with a single update - I just had an entire afternoon and evening trying to resolve a bug; eventually it turned out to be a single line of code causing double parsing of the string from the DB. Essentially the code was parsing something already parsed automatically by Postgres, and the LLM was looking in the wrong files to fix the bug. It's these days that drive you potty.

LOL! I understand exactly what you mean and exactly that bug. But as I've said to date not had to look at 1 line of code.

Chat provides the patch but we have an array of multiple tests it has to pass. If it fails the tests etc. I get an output file of why. This is then chucked back at chat, where it either does another patch or a better scout.

I've added stuff to the pi harness to reduce failing patches quite substantially e.g. pi can now grep/sed/tail files into its prompt for local 120b scout.
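A rough sketch of what "grep/tail files into the prompt" might look like, written in Python rather than shell so it is self-contained. The function names, the prompt layout and the closing instruction are assumptions for illustration, not the actual pi harness.

```python
from pathlib import Path


def grep(path, needle):
    """Return 'lineno: text' for each line of path containing needle."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [f"{i}: {line}" for i, line in enumerate(lines, start=1) if needle in line]


def tail(path, count=20):
    """Return the last `count` lines of path (empty list if count <= 0)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return lines[-count:] if count > 0 else []


def build_scout_prompt(task, path, needle, tail_lines=20):
    """Assemble a prompt that carries real file excerpts, not guesses."""
    parts = [
        f"TASK: {task}",
        f"FILE: {path}",
        f"MATCHES for '{needle}':",
        *grep(path, needle),
        f"LAST {tail_lines} LINES:",
        *tail(path, tail_lines),
        "Report what you find. Do not propose code changes.",
    ]
    return "\n".join(parts)
```

The point of the design is that the scout model reasons over text pulled from the repo at that moment, so it has less room to invent what a file contains.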

Commits only happen when I say so. Ledger updates must match, i.e. what we did, what was tested, what was committed.

Basically so many guardrails that any patching applied has to be rock solid.
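The guardrails described above can be sketched roughly as follows. This is an illustration under assumptions, not the real harness: `LedgerEntry`, `GateResult`, `run_gate` and `try_commit` are made-up names, and "ledger must match" is reduced to "the ledger lists exactly the checks that were run".

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEntry:
    did: str                 # what we did
    tested: list             # names of the checks the ledger claims were run
    committed: bool = False  # what was committed


@dataclass
class GateResult:
    checked: list = field(default_factory=list)  # checks actually run
    report: list = field(default_factory=list)   # failure reasons for the LLM

    @property
    def passed(self):
        return not self.report


def run_gate(checks):
    """Run every check; collect failure reasons instead of stopping early."""
    result = GateResult()
    for name, check in checks.items():
        result.checked.append(name)
        try:
            if not check():
                result.report.append(f"{name}: failed")
        except Exception as exc:  # a crashing check counts as a failure
            result.report.append(f"{name}: raised {exc!r}")
    return result


def try_commit(entry, result, human_approved):
    """Commit only if tests passed, the ledger matches, and a human said so."""
    ledger_ok = bool(entry.did) and set(entry.tested) == set(result.checked)
    if result.passed and ledger_ok and human_approved:
        entry.committed = True
    return entry.committed
```

When the gate fails, `result.report` plays the part of the "output file of why" that gets chucked back at chat for another patch or a better scout.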

I spend 90% of my time after supplying the spec/task pressing '7' to scout or '2' to apply patch. A few copy n pastes when chat gets pissy and stuck, i.e. it tells me what it needs and I do it manually.

TBH it's working really well. Soon I'll tie this into the 1st version of voice transcript and upload a demo.

Here's where I spend most of my time!

Edited May 28 by Pocster

SimonD · May 28

28 minutes ago, Pocster said:

LOL! I understand exactly what you mean and exactly that bug. But as I've said to date not had to look at 1 line of code.

Chat provides the patch but we have an array of multiple tests it has to pass. If it fails the tests etc. I get an output file of why. This is then chucked back at chat, where it either does another patch or a better scout.

I've added stuff to the pi harness to reduce failing patches quite substantially e.g. pi can now grep/sed/tail files into its prompt for local 120b scout.

Commits only happen when I say so. Ledger updates must match, i.e. what we did, what was tested, what was committed.

Basically so many guardrails that any patching applied has to be rock solid.

I spend 90% of my time after supplying the spec/task pressing '7' to scout or '2' to apply patch. A few copy n pastes when chat gets pissy and stuck, i.e. it tells me what it needs and I do it manually.

TBH it's working really well. Soon I'll tie this into the 1st version of voice transcript and upload a demo.

Here's where I spend most of my time!

I clearly need to be a bit more disciplined and sort this side out. I've been finding some real limitations with not working locally and with version management. I haven't given it access to my local machine, so this is definitely causing some friction, and the Claude project area has some serious limitations. I uploaded all my project files into the depository, but then found out that as soon as Claude does a re-write/patch or whatever edit of the file, it provides an output for download but doesn't update the project file. Then it goes back to using the project file without telling you, ignoring the new output. Even with clear instructions to check the latest deployed files with me, it sometimes completely forgets, completes a new write and output, and some of the earlier code is then missing (VS Code deploy version comparison has been a bit of a godsend). I'm currently on very clear instructions, and when it creates a new output I still upload the deployed file to ask it to check the versions are consistent. Now, this is okay for small stuff, but it becomes a right PIA on anything larger (although it still saves hours, weeks and months of human time in what it's able to output - I'm just getting spoiled nowadays).

And another little bug I found is that sometimes, if I select download all for the new artifacts, some of the contents are actually previous versions, not what has just been produced!
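One way to make the "are these the same version?" check mechanical rather than something you ask the model to do is sketched below. It is only an illustration: the function names are hypothetical, and it uses Python's standard `hashlib` and `difflib` to compare the deployed file against a freshly downloaded output.

```python
import difflib
import hashlib


def fingerprint(text):
    """Short SHA-256 fingerprint of a file's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def same_version(deployed, candidate):
    """True when both texts are byte-for-byte identical."""
    return fingerprint(deployed) == fingerprint(candidate)


def dropped_lines(deployed, candidate):
    """Lines of the deployed file that were removed or changed in the new output."""
    old, new = deployed.splitlines(), candidate.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            dropped.extend(old[i1:i2])
    return dropped


deployed = "def a():\n    return 1\n\ndef b():\n    return 2\n"
candidate = "def a():\n    return 1\n"

print(same_version(deployed, candidate))
print(dropped_lines(deployed, candidate))
```

Writing the fingerprint next to each deployed file also gives a quick test for the "download all" problem: if a downloaded artifact's fingerprint matches an older one, it is a stale copy.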

So, having learnt what it takes to prompt, I'm now working out the next step which you've clearly nailed.

Edited May 28 by SimonD

SteamyTea · May 28

The thought of being trapped in @Pocster's basement with no stimulus, apart from Coldplay, makes me want to walk out to sea and not return.

Pocster · May 28

56 minutes ago, SimonD said:

I clearly need to be a bit more disciplined and sort this side out. I've been finding some real limitations with not working locally and with version management. I haven't given it access to my local machine, so this is definitely causing some friction, and the Claude project area has some serious limitations. I uploaded all my project files into the depository, but then found out that as soon as Claude does a re-write/patch or whatever edit of the file, it provides an output for download but doesn't update the project file. Then it goes back to using the project file without telling you, ignoring the new output. Even with clear instructions to check the latest deployed files with me, it sometimes completely forgets, completes a new write and output, and some of the earlier code is then missing (VS Code deploy version comparison has been a bit of a godsend). I'm currently on very clear instructions, and when it creates a new output I still upload the deployed file to ask it to check the versions are consistent. Now, this is okay for small stuff, but it becomes a right PIA on anything larger (although it still saves hours, weeks and months of human time in what it's able to output - I'm just getting spoiled nowadays).

And another little bug I found is that sometimes, if I select download all for the new artifacts, some of the contents are actually previous versions, not what has just been produced!

So, having learnt what it takes to prompt, I'm now working out the next step which you've clearly nailed.

Chat still ignores things which it gets in its prompt!

The 'loop' is everything goes back to chat. Ledger, patch outcome, scout etc etc etc.

I'm just the human doing boring stuff in the middle that creates the project but writes zero code.

Because my ambition was that I do ZERO code, we need a tight and rather repetitive loop.

My assumption is that Claude nails this in a far better way. But chat with pi as repo scout just requires drag n drop of output files into the ChatGPT window.

I was thrown by Codex, thinking it was chat but with repo access. But it isn't. It can patch etc. but it doesn't understand the project and its structure.

ChatGPT does - proves it constantly even if it has the odd wobble now and again.

@SimonD I'm confused what the issue is TBH. Claude sees the repo, that's the best bit. I have to use a local LLM to scout the repo to upload to chat. Also I naturally assume Claude is a better reasoner and coder than chat. So I'd assume this arrangement would work better TBH. After all I'm doing a 'poor man's' Claude Code.

Edited May 28 by Pocster

SimonD · May 28

8 minutes ago, Pocster said:

@SimonD I'm confused what the issue is TBH. Claude sees the repo, that's the best bit. I have to use a local LLM to scout the repo to upload to chat. Also I naturally assume Claude is a better reasoner and coder than chat. So I'd assume this arrangement would work better TBH.

I haven't given Claude access to the repo, so there's quite a bit of manual stuff - but the issues are bugs that require a lot of oversight, e.g. not updating the project files after an update, and referring back to superseded files in memory. Yes, the coding is amazing, and the commenting is on another level, so it's very easy to navigate the code where necessary. It's just these bugs that create workflow friction. Now, if I was brave enough to provide access to the repo, we'd probably be in a different place.

Pocster · May 28

Just now, SimonD said:

I haven't given Claude access to the repo, so there's quite a bit of manual stuff - but the issues are bugs that require a lot of oversight, e.g. not updating the project files after an update, and referring back to superseded files in memory. Yes, the coding is amazing, and the commenting is on another level, so it's very easy to navigate the code where necessary. It's just these bugs that create workflow friction. Now, if I was brave enough to provide access to the repo, we'd probably be in a different place.

Lol. Mine's a hobby, not "real" like yours. But my issue even with previous projects and chat was lack of repo access. Go on, give Claude repo access just to look/report. BE BRAVE!

SimonD · May 28

2 hours ago, Pocster said:

Go on, give Claude repo access just to look/report. BE BRAVE!

I'll take a deep breath and jump, feet first and see what happens!

Pocster · May 28

44 minutes ago, SimonD said:

I'll take a deep breath and jump, feet first and see what happens!

Lol, just get it to scout and suggest no code changes.

Pocster · May 29

“Programming” 2026 style

Edited May 29 by Pocster

SimonD · May 29

2 hours ago, Pocster said:

“Programming” 2026 style

According to Chillblast today, it's something like this:

https://www.chillblast.com/pcs/chillblast-synapse-frontier-ai-workstation - a mere £39,999.99 and it'll give you that magic 96gb. A snip with 0% finance too!

Pocster · May 29

1 hour ago, SimonD said:

According to Chillblast today, it's something like this:

https://www.chillblast.com/pcs/chillblast-synapse-frontier-ai-workstation - a mere £39,999.99 and it'll give you that magic 96gb. A snip with 0% finance too!

Shite for 40k. M3 Ultra 96GB under 4k. Slower but a tad cheaper.

MikeSharp01 · May 29

5 hours ago, Pocster said:

“Programming” 2026 style

You need to get a proper workstation else you will do your back in.

Pocster · May 30

8 hours ago, MikeSharp01 said:

You need to get a proper workstation else you will do your back in.

Back propped against the padded furniture. Comfy

Pocster · May 31

(expletive deleted)ing chat. Had our method/prompts/harness working pretty good yesterday. Today it doesn't understand anything!

After about an hour of wtf it would seem it's magically decided we are going to do it differently! (expletive deleted) knows why it all changed.

SimonD · May 31

2 hours ago, Pocster said:

(expletive deleted)ing chat. Had our method/prompts/harness working pretty good yesterday. Today it doesn't understand anything!

After about an hour of wtf it would seem it's magically decided we are going to do it differently! (expletive deleted) knows why it all changed.

Gotta love it. I had a similar thing where it told me that a load of functionality was completely different to what was in the repo. It was so certain, I had to upload the latest deployment to prove my point! And confirm that we had to audit some other stuff it had done, because we weren't sure what it had implemented. One reason why small steps closely supervised is a good thing. Every time I let my boundaries down, off it trots in some random direction. And I'm only just starting to play with local access!

On 29/05/2026 at 21:26, Pocster said:

Shite for 40k. M3 Ultra 96GB under 4k. Slower but a tad cheaper.

Yes, the M3 is an incredible package right now. I wonder how long that'll last! I wasn't seriously considering the PC, personally I'm just shocked at the recent inflation as I've run out of storage space and was floored by how much it costs now. I'm putting a hold on my plans to remove everything from the cloud and store locally.

Edited May 31 by SimonD

Pocster · May 31

9 minutes ago, SimonD said:

Gotta love it. I had a similar thing where it told me that a load of functionality was completely different to what was in the repo. It was so certain, I had to upload the latest deployment to prove my point! And confirm that we had to audit some other stuff it had done, because we weren't sure what it had implemented. One reason why small steps closely supervised is a good thing. Every time I let my boundaries down, off it trots in some random direction. And I'm only just starting to play with local access!

Yes, the M3 is an incredible package right now. I wonder how long that'll last! I wasn't seriously considering the PC, personally I'm just shocked at the recent inflation as I've run out of storage space and was floored by how much it costs now. I'm putting a hold on my plans to remove everything from the cloud and store locally.

I can sometimes tell when chat's going to be a little shit. It takes too long "thinking" or just doesn't refresh its chat window as it should. Just done it now! It starts claiming that the local LLM doesn't work correctly and the wrapper is wrong. I do tell it to get (expletive deleted)ed. I upload the menu system script as proof of what does work with the 120b local LLM. It then gets on with it. Really annoying how it drifts even though I'm as specific as you can be! Briefly tried GLM (220GB VRAM!); I think it could potentially do patches. But SO slow. I mean like 12 mins to scout and suggest a patch. Even fighting with chat takes less time than that.

When I get really pissed off I tell it to "shut the (expletive deleted) up", then I get no chatter, just download links. It's nicer, but then I haven't got a clue what they are for! It's a bloody child!

Edited May 31 by Pocster

Pocster · May 31

47 minutes ago, SimonD said:

Yes, the M3 is an incredible package right now. I wonder how long that'll last! I wasn't seriously considering the PC, personally I'm just shocked at the recent inflation as I've run out of storage space and was floored by how much it costs now. I'm putting a hold on my plans to remove everything from the cloud and store locally.

I recently bought a tiny 4TB external SSD for about 300 quid. It's fantastic. But yes! To go local with say min 96GB is either an M3 at an inflated price off eBay or an RTX 6000 PC at 10K just for the card!! I do chuckle now I have 256GB.
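For a feel of why figures like 96GB and 256GB keep coming up, here is the back-of-envelope arithmetic for model weights alone: parameters times bits per weight, divided by 8. The bit widths below are assumptions, since the thread doesn't say which quantisation is in use, and the figure ignores context cache and runtime overhead, so real usage is higher.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, memory_gb):
    """True if the weights alone fit in the given memory."""
    return weight_memory_gb(params_billion, bits_per_weight) <= memory_gb


# A 120b model at a few assumed precisions, against a 96GB machine.
for bits in (4, 8, 16):
    size = weight_memory_gb(120, bits)
    print(f"120b @ {bits}-bit: {size:.0f} GB, fits in 96GB: {fits(120, bits, 96)}")
```

On those assumptions a 120b model squeezes into 96GB only at around 4 bits per weight, and with limited headroom for everything else.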

SimonD · May 31

1 hour ago, Pocster said:

I recently bought a tiny 4TB external SSD for about 300 quid. It's fantastic. But yes! To go local with say min 96GB is either an M3 at an inflated price off eBay or an RTX 6000 PC at 10K just for the card!! I do chuckle now I have 256GB.

I need a little more than 4TB. I've currently got 4 x 8TB external drives full, plus 2 x 4TB full, plus 2 x 2TB mini externals full (a third one just failed on me and won't mount any more). I think I've got another 6TB drive floating about too. They all need quite a bit of consolidation, as 3 of the drives are backups from previous Macs and the rest are Windows. I wanted to put together a proper home NAS, but prices are just scaling so badly for this; I really should have done it 18 months ago when I first had the idea. Now I have 3TB on my local machine and only 197GB of storage capacity left.

Why the space? I have my fingers in multiple pots and have lots of video stored, some that need to be edited and rendered, but then I end up increasing the space I need to store the rendered content. So, I was planning on at least an extra 10TB at the very minimum, probably more for proper safe backup storage. YSWIM?

I was thinking about your project when watching this video by Braxman, maybe not massively interesting for you but maybe others. He has a video on setting up Openclaw locally on an old laptop and since I have an old gaming laptop I've just installed Linux on I wondered whether I should have a play:

Pocster · May 31

28 minutes ago, SimonD said:

Why the space? I have my fingers in multiple pots and have lots of video stored,

Oh ! Those kind of videos ! Yes they take lots of storage 😉

SimonD · May 31

1 hour ago, Pocster said:

Oh ! Those kind of videos ! Yes they take lots of storage 😉

'specially in 4k
