Avalon local LLM

Pocster · April 7

M3 96gb arrived ! Now the work begins !!

-rick- · April 7

Well this escalated.

Not trying to win an argument or even have an argument. When a response to a post of mine doesn't seem to engage on the points in my post I wonder if I didn't explain it well enough so try again. I think you understand my points now so not worth going further.

The one thing I will point out is that the shortages are all on older gen stuff. M4 mac minis, M3 mac studios. Products expected to have announcements of replacements in the next 2 months. It's a well documented Apple tactic to sell down inventory/put long lead times on products near this point (happens with iPhone, laptops, etc). It doesn't always happen because in general Apple has bigger surplus but if things are running low they don't bother to build more.

SteamyTea · April 7

Helium is essential for making all electronics these days.

Helium is a byproduct of oil and gas extraction, as well as radioactive geological processes.

We are basically running out of helium, so if you want some RAM, keep filling your car up with fossil fuels.

If, by chance, you own a small BEV, say a Renault Zoe, and you want some RAM, then you only have yourself to blame, you selfish (expletive deleted)er.

-rick- · April 7

7 minutes ago, SteamyTea said:

Helium is essential for making all electronics these days.

Helium is a byproduct of oil and gas extraction, as well as radioactive geological processes.

We are basically running out of helium, so if you want some RAM, keep filling your car up with fossil fuels.

If, by chance, you own a small BEV, say a Renault Zoe, and you want some RAM, then you only have yourself to blame, you selfish (expletive deleted)er.

This is not the point you are making but what's going on with Iran is massively constraining the supply of Helium. Qatar is the main producer.

US labs have been told to expect a 50% reduction in supply. This affects MRIs, chip fabs and a load of other things. Everything is going to get very difficult very soon.

If the global economy takes a real dive, one silver lining is that when the Iran crisis is over the AI companies might not have such deep pockets (or in some cases still exist) and so memory might become more available again. Then again, nobody else will have money either. 🥴

Pocster · April 7

Oi (expletive deleted)ers !

This is my local llm thread ! Not political or supply issues . Get a (expletive deleted)ing room 👊🏻

-rick- · April 7

7 minutes ago, Pocster said:

Oi (expletive deleted)ers !

This is my local llm thread ! Not political or supply issues . Get a (expletive deleted)ing room 👊🏻

Sorry, thought you'd be too busy chatting up your new llm girlfriend to notice 😏

SteamyTea · April 7

9 minutes ago, -rick- said:

18 minutes ago, Pocster said:

Oi (expletive deleted)ers !

This is my local llm thread ! Not political or supply issues . Get a (expletive deleted)ing room 👊🏻

Sorry, thought you'd be too busy chatting up your new llm girlfriend to notice

What do you think he calls her?

Pocster · April 7

11 minutes ago, -rick- said:

Sorry, thought you'd be too busy chatting up your new llm girlfriend to notice 😏

Thing is @-rick- I was just playing with you a bit . Clearly demand will outstrip supply for m5 ultra for sure, i.e. delivery times will slip like a dog . My intention is to order within minutes of it going live in the App Store .

My ambitions probably do not need m5 with 256gb or more but it bugs me to buy “old” m3 ultra at near full whack when we all know new and similar priced is coming .

-rick- · April 7

1 hour ago, Pocster said:

I was just playing with you a bit

Of course you were! (though I didn't see the buy of the Mac as part of that, just assumed you were impatient/had money to burn).

1 hour ago, Pocster said:

Clearly demand will outstrip supply for m5 ultra for sure, i.e. delivery times will slip like a dog . My intention is to order within minutes of it going live in the App Store .

Glastonbury tickets all over again 😛

1 hour ago, Pocster said:

My ambitions probably do not need m5 with 256gb or more but it bugs me to buy “old” m3 ultra at near full whack when we all know new and similar priced is coming .

Amen.

Though before you spend money I assume you have a prototype running (at lower model size) on existing hardware?

-rick- · April 7

1 hour ago, SteamyTea said:

What do you think he calls her?

I'm quite happy with my decision to spend zero time trying to imagine what's going on in @Pocster's head!

Pocster · April 7

47 minutes ago, -rick- said:

Though before you spend money I assume you have a prototype running (at lower model size) on existing hardware?

Of course not ! The m3 ultra 96gb will be the prototype!

Pocster · April 7

1 hour ago, -rick- said:

Of course you were! (though I didn't see the buy of the Mac as part of that, just assumed you were impatient/had money to burn).

Glastonbury tickets all over again 😛

Amen.

Though before you spend money I assume you have a prototype running (at lower model size) on existing hardware?

Go big or don’t go in 😉

Pocster · April 9

On qwen coder 30b getting a very respectable 80 tokens / sec .

Estimate on ultra m5 same model might touch 200 !

SteamyTea · April 9

6 minutes ago, Pocster said:

tokens / sec

Wrong metric.

https://www.forbes.com/councils/forbestechcouncil/2025/10/21/why-tokens-per-watt-is-crucial-for-measuring-ai-efficiency/
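For anyone wanting to put numbers on that: tokens per watt is just throughput divided by power draw, which works out as tokens per joule. A minimal Python sketch of the sum. The 80 tokens/sec is the figure quoted above; the wattages and the `box_a`/`box_b` names are made-up placeholders, not measurements of any real machine.

```python
def tokens_per_joule(tokens_per_sec: float, watts: float) -> float:
    """Tokens generated per joule of energy (1 watt = 1 joule per second)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tokens_per_sec / watts


def tokens_per_kwh(tokens_per_sec: float, watts: float) -> float:
    """Tokens generated per kilowatt-hour (1 kWh = 3.6 million joules)."""
    return tokens_per_joule(tokens_per_sec, watts) * 3_600_000


# 80 tokens/sec comes from the thread; both power figures are hypothetical.
HYPOTHETICAL_WATTS = {"box_a": 100.0, "box_b": 400.0}

for name, watts in HYPOTHETICAL_WATTS.items():
    print(name, tokens_per_joule(80.0, watts), "tokens per joule")
```

Same speed, four times the power draw, a quarter of the efficiency, which is why tokens/sec alone doesn't settle it.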

-rick- · April 9

You looked at Gemma 4? Supposed to be able to get qwen like performance/capability but in a much smaller model. 96GB bit of a waste for it

-rick- · April 9

2 minutes ago, SteamyTea said:

Wrong metric.

https://www.forbes.com/councils/forbestechcouncil/2025/10/21/why-tokens-per-watt-is-crucial-for-measuring-ai-efficiency/

Good job Macs are about the most efficient AI platform then

Pocster · April 9

55 minutes ago, -rick- said:

You looked at Gemma 4? Supposed to be able to get qwen like performance/capability but in a much smaller model. 96GB bit of a waste for it

Sure . Just trying to get visual studio to connect to anything . Will need multiple models anyway for full Avalon

Pocster · April 9

Ooooo !

Forget code complete ….

Used to be anticipate variable etc .

Now write function - but tbh a bit crap - “ simplistic “

Rephrase the prompt and let it write the entire program ( just a test example ) then refactor it . Wow ! . Local speed surprised me as did code quality .

Tomorrow setup a test local git and see what happens . Awesome stuff !

-rick- · April 9

7 minutes ago, Pocster said:

Ooooo !

Forget code complete ….

Used to be anticipate variable etc .

Now write function - but tbh a bit crap - “ simplistic “

Rephrase the prompt and let it write the entire program ( just a test example ) then refactor it . Wow ! . Local speed surprised me as did code quality .

Tomorrow setup a test local git and see what happens . Awesome stuff !

Oh....

You are still working that way? I thought that was considered very inefficient these days.

Better to have it help you write a spec, including testing regime, code-quality metrics, design philosophy and then ask it to go build (and spawn agents/tools to help).

Not that I'm doing any of this. Just watching people like Theo explain it:

https://www.youtube.com/@t3dotgg/videos

Had this in my 'to watch' list for a while:

Pocster · April 9

43 minutes ago, -rick- said:

Oh....

You are still working that way? I thought that was considered very inefficient these days.

Better to have it help you write a spec, including testing regime, code-quality metrics, design philosophy and then ask it to go build (and spawn agents/tools to help).

Not that I'm doing any of this. Just watching people like Theo explain it:

https://www.youtube.com/@t3dotgg/videos

Had this in my 'to watch' list for a while:

lol

I’m just testing it out.

So did simple “ how does it work “ tasks building up each time .

chat and I spec / architect the real thing then just get local llm to do the code .

So yeah I’ll be doing it that way . Don’t really need multiple agents etc .

But I’m so surprised at the speed and quality of code when spec’d properly .

But before this I got chat to write an entire esp32s3 project for me ( automated watering system - it’s on the forum ) .
This though - wow . Not only are my demands far more ; but its capabilities are also .

Edited April 9 by Pocster

-rick- · April 9

16 minutes ago, Pocster said:

This though - wow . Not only are my demands far more ; but its capabilities are also .

What was your previous setup?

Pocster · April 9

Just now, -rick- said:

What was your previous setup?

What do you mean ? To build my watering system ?

just chat . A bit of work to get it reliable with multi file projects ( it doesn’t reliably support git ) .

Now . Me and chat design / spec . Local llm will write code / test . Another 2 local llms will deal with reasoning from voice commands to control / anticipate home assistant stuff

-rick- · April 9

3 minutes ago, Pocster said:

just chat

Which means? ChatGPT? Free, Plus, Pro?

Codex? Claude?

Pocster · April 9

25 minutes ago, -rick- said:

Which means? ChatGPT? Free, Plus, Pro?

Codex? Claude?

Oh I see lol !

Well with free chat I’d run out of credits , so I tried free Claude and run out of credits . I then upgraded Claude and run out .

Went back to chat Plus. For 20 quid a month it’s exceptional I think . Our technical conversations can hit 50k tokens .

So I was trying to use chat like a company might pay a proper subscription and use Claude . But I ain’t paying 200 bucks a month

Edited April 9 by Pocster

-rick- · April 9

2 hours ago, Pocster said:

So I was trying to use chat like a company might pay a proper subscription and use Claude . But I ain’t paying 200 bucks a month

Just spend 5 years of subs money on a mac to run the LLM on and hope that in 2 years time it's still good enough to run the current models

Reason I asked is you said the new setup seemed to be performing better and everything I've seen suggests that local models are generally a little worse at chat type stuff and quite a bit worse for agentic workflows (still much better than the leading models from 9 months ago).

I guess with the $20 gpt sub you can't access the better OpenAI models?
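The subs-versus-hardware trade-off joked about above is a simple break-even sum: hardware cost divided by the monthly subscription it replaces. A rough Python sketch. The 20 and 200 a month figures are the ones mentioned in the thread (treated here as if in one currency); the hardware price is a placeholder because no price is quoted, and electricity and resale value are ignored.

```python
import math


def months_to_break_even(hardware_cost: float, monthly_sub: float) -> int:
    """Whole months of subscription fees needed to equal the hardware cost."""
    if monthly_sub <= 0:
        raise ValueError("monthly_sub must be positive")
    return math.ceil(hardware_cost / monthly_sub)


# Placeholder price, NOT taken from the thread; use the same currency throughout.
HYPOTHETICAL_HARDWARE_COST = 4000.0

for monthly_sub in (20.0, 200.0):
    months = months_to_break_even(HYPOTHETICAL_HARDWARE_COST, monthly_sub)
    print(f"{monthly_sub:.0f}/month sub: break-even after {months} months")
```

Whether the hardware is still good enough to run current models by the break-even point is the part the sum can't answer.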
