
MadDoc

macrumors 6502
Original poster
Apr 25, 2005
331
5
UK
This question comes up a lot but I haven’t found a good answer for my use case and am looking for advice.

I want to run GPT OSS 120b locally. This is to privately process medical reports.

I can’t decide whether to get the M4 Max with 128GB RAM or stretch myself and get the M3 Ultra with 256GB RAM.

I’m not sure whether the tokens/sec difference between the binned M3 Ultra and the 80-core one is significant, and I’m not sure what kind of tokens/sec the Max would get at long contexts. My understanding is that I’d need 256GB rather than the base 96GB for OSS 120b, or else memory will be an issue.

The computer will only be used for AI inference, nothing else.
 
I can’t decide whether to get the M4 Max with 128GB RAM or stretch myself and get the M3 Ultra with 256GB RAM.
I think a 120B LLM is probably too large for 128GB. At 8-bit quantization the weights alone are around 120 GB, and macOS has its own overhead, so it’s a very tight fit. The M4 Max also has fewer GPU and Neural Engine cores than the M3 Ultra.

I’d say the M3 Ultra with 256GB is the better option over the M4.
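As a back-of-envelope check on the "tight fit" claim above, here's a minimal sketch. The flat 8 GB overhead figure for macOS, KV cache, and activations is an assumption, not a measured number:

```python
def model_memory_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 8.0) -> float:
    """Rough memory estimate: billions of parameters times bits per weight,
    converted to GB, plus a flat (assumed) allowance for KV cache,
    activations, and macOS overhead."""
    weights_gb = params_b * bits_per_weight / 8  # 1e9 params * bits / 8 bits-per-byte ~= GB
    return weights_gb + overhead_gb

# 120B parameters at 8-bit: ~120 GB of weights alone, over 128 GB with overhead
print(model_memory_gb(120, 8))  # 128.0
# The same model at 4-bit quantization fits comfortably in 128 GB
print(model_memory_gb(120, 4))  # 68.0
```

So the 8-bit version genuinely does not leave headroom on a 128GB machine, while an aggressive 4-bit quant would; the 256GB Ultra removes the question entirely.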
 
That’s what I’ve gone for. A little pricey, but I think it’s my best bet to experiment with the higher-intelligence models at home.
 

Tokens/sec is really personal. Some people need to sit there and read the output and are dissatisfied with 40 t/s. I can personally prompt it, do something else, and check back a few minutes later, so 18 t/s is acceptable to me.

As the Ultra is so much more expensive, could you prep all your models and inputs, get the M4 Max 128GB, and run it for a few days to see how it goes?
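Before buying either machine, you can also put a rough ceiling on tokens/sec from memory bandwidth alone, since decode is usually bandwidth-bound. The bandwidth figures (M4 Max ~546 GB/s, M3 Ultra ~819 GB/s) and the ~5B active parameters at ~4-bit for GPT-OSS-120B (it's a mixture-of-experts model) are assumptions here, and real throughput will land well below the ceiling:

```python
def decode_tps_upper_bound(active_params_b: float, bits_per_weight: float,
                           bandwidth_gb_s: float) -> float:
    """Theoretical decode ceiling: each generated token re-reads the
    active weights, so t/s <= memory bandwidth / bytes read per token."""
    gb_per_token = active_params_b * bits_per_weight / 8  # active weights in GB
    return bandwidth_gb_s / gb_per_token

# Assumed: ~5B active params at ~4-bit, M4 Max ~546 GB/s, M3 Ultra ~819 GB/s
print(round(decode_tps_upper_bound(5, 4, 546)))  # ceiling on M4 Max
print(round(decode_tps_upper_bound(5, 4, 819)))  # ceiling on M3 Ultra
```

Both ceilings are far above the 18–40 t/s people actually care about, which is why the MoE design makes this model unusually pleasant on Apple Silicon; the real question is fitting the weights in RAM, not raw speed.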
 