This question comes up a lot but I haven’t found a good answer for my use case and am looking for advice.
I want to run GPT OSS 120b locally. This is to privately process medical reports.
I can’t decide between the M4 Max with 128GB of RAM or stretching to the M3 Ultra with 256GB.
I’m not sure what the tokens/sec difference is between the binned M3 Ultra and the 80-core one, and I’m also not sure what kind of tokens/sec the Max would get at long contexts. My understanding is that I’d need 256GB rather than the base 96GB for OSS 120b, or else memory will be an issue.
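For what it’s worth, here’s the rough back-of-envelope math I’ve been using for the memory question. The figures are assumptions on my part (roughly 117B total parameters and MXFP4 quantization at about 4.25 bits per parameter for the released weights), not official numbers:

```python
# Rough memory estimate for GPT OSS 120b weights.
# Assumed (not official) figures: ~117B params, MXFP4 at ~4.25 bits/param.
params = 117e9
bits_per_param = 4.25  # 4-bit values plus shared block scales

weights_gb = params * bits_per_param / 8 / 1e9
print(f"weights: ~{weights_gb:.0f} GB")
```

Under those assumptions the quantized weights come out to roughly 62GB, with the KV cache on top growing with context length, which is why I’m unsure whether 96GB/128GB is comfortable or tight for long contexts.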
The computer will only be used for AI inference, nothing else.