Inside the LPU
Deconstructing Groq's Speed
Legacy hardware forces a choice: faster inference with quality degradation, or accurate inference with unacceptable latency. This tradeoff exists because GPU architectures are optimized for training workloads. The LPU, purpose-built hardware for inference, preserves quality while eliminating the architectural bottlenecks that create latency in the first place.
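To put the latency side of that tradeoff in concrete terms, here is a minimal sketch that measures time-to-first-token against GroqCloud's OpenAI-compatible chat completions endpoint. It assumes the `groq` Python SDK is installed and a GROQ_API_KEY is set in the environment; the model id is illustrative and may differ from what is currently hosted.

```python
# Minimal sketch: measure time-to-first-token on GroqCloud.
# Assumes the `groq` SDK is installed and GROQ_API_KEY is set in the environment.
import time
from groq import Groq

client = Groq()  # picks up GROQ_API_KEY from the environment

start = time.perf_counter()
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # illustrative model id; substitute any hosted model
    messages=[{"role": "user", "content": "Explain the LPU in one sentence."}],
    stream=True,
)

first_token_at = None
tokens = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first generated token arrives here
        tokens.append(delta)

if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.3f}s")
    print(f"completion: {''.join(tokens)[:80]}...")
```

Running the same measurement against a GPU-backed endpoint with an identical prompt gives a like-for-like comparison of time-to-first-token, which is the latency this post is concerned with.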