Gemma 4 is already live on Modular Cloud ⚡ Day-zero support, fastest performance across both NVIDIA and AMD, powered by MAX. One unified system, from kernel to cloud. No waiting on upstream kernels, no fragmentation, just consistent performance and portability. If you’re evaluating Gemma 4 for production, this is worth a look.
Gemma 4 is live on Modular Cloud on day zero, with the fastest performance on both NVIDIA and AMD. Our MAX inference framework delivers 15% higher throughput than vLLM on NVIDIA B200.

Both flagship models are available now:
→ Gemma 4 31B: dense, 256K context, built for deep reasoning
→ Gemma 4 26B A4B: MoE, 26B total parameters, 4B active per forward pass

Both are natively multimodal: text, images, and video.

Modular Cloud runs on MAX, which unifies GPU kernels, graph compilation, and high-performance serving in a single hardware-agnostic stack. When a new architecture drops, we're not waiting on upstream support or porting hand-tuned kernels. We went from new weights to SOTA performance on two hardware platforms in days.

We're the only inference provider shipping Google DeepMind's Gemma 4 on a framework we built ourselves, and the only team serving it across multiple GPU stacks: NVIDIA B200 or AMD MI355X. Same stack, same API. Pick the price-performance point that fits your workload on Modular Cloud.

Try Google's Gemma 4: https://lnkd.in/gxGVP4MA

What model are you trying first? #Gemma4