Renegade 2026 Keynote

Hear from our CEO, June Paik, on Furiosa's vision for the future of AI infrastructure.  
The full keynote sessions are now available.


Tensor Contraction Processor


RNGD PCIe


NXT RNGD Server

FuriosaAI and Samsung SDS launch Korea’s first domestic NPUaaS to expand enterprise AI access

Samsung SDS

FuriosaAI expands European AI infrastructure with RNGD deployment at Equinix’s Lisbon data center

Equinix deployment

Furiosa SDK 2026.3: A new kernel framework, and the models it unlocks

SDK 2026.3

FuriosaAI and Broadcom partner on next-gen inference for Agentic AI

Broadcom partnership

RNGD outperforms RTX PRO 6000 with the latest SDK

Benchmark

3 kW INFERENCE APPLIANCE FOR AGENTIC SYSTEMS

Furiosa NXT RNGD Server delivers exceptional performance and cost-efficient scalability for advanced LLM and agentic AI inference. Designed for air-cooled data centers, the NXT RNGD Server can be deployed on-premises, in managed environments, or in colocation facilities.

Cards: 8× RNGD

Compute: 4 petaFLOPS (512 TFLOPS × 8 cards)

HBM3 capacity: 384 GB

Memory bandwidth: 12 TB/s

Power consumption: 3 kW
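The stat grid is internally consistent: each server-level figure is a multiple of eight per-card values. The Python sketch below reproduces that arithmetic. Only the 512 TFLOPS figure is quoted per card above; the per-card HBM3 and bandwidth values are back-calculated from the server totals, assuming they split evenly across cards, and the 3 kW figure is treated as whole-server power, so it is not divided per card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Aggregate figures for an NXT RNGD Server as listed above."""

    cards: int = 8
    tflops_per_card: float = 512.0      # quoted per-card figure
    hbm3_total_gb: float = 384.0        # quoted server total
    bandwidth_total_tbs: float = 12.0   # quoted server total
    power_kw: float = 3.0               # whole-server power, not split per card

    def total_tflops(self) -> float:
        # 512 TFLOPS x 8 cards = 4,096 TFLOPS, i.e. roughly 4 petaFLOPS
        return self.tflops_per_card * self.cards

    def hbm3_per_card_gb(self) -> float:
        # Back-calculated from the server total (assumption: even split across cards)
        return self.hbm3_total_gb / self.cards

    def bandwidth_per_card_tbs(self) -> float:
        # Back-calculated from the server total (same even-split assumption)
        return self.bandwidth_total_tbs / self.cards


spec = ServerSpec()
print(f"Total compute: {spec.total_tflops():,.0f} TFLOPS (~{spec.total_tflops() / 1000:.1f} PFLOPS)")
print(f"HBM3 per card: {spec.hbm3_per_card_gb():.0f} GB")
print(f"Bandwidth per card: {spec.bandwidth_per_card_tbs():.1f} TB/s")
```

Running it prints 4,096 TFLOPS (about 4.1 PFLOPS, which the page rounds to 4 petaFLOPS), 48 GB of HBM3 per card, and 1.5 TB/s of bandwidth per card.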

“Furiosa RNGD provides a compelling combination of benefits: excellent real-world performance, a dramatic reduction in our total cost of ownership, and a surprisingly straightforward integration.”

Kijeong Jeon, Product Unit Lead

RNGD enables 4× the inference capacity per rack

Enterprise AI scale is constrained by data center power density, with most infrastructure today limited to 15 kW per rack.

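Rack-level comparison, NXT RNGD Server vs. RTX PRO 6000-based server, under a 15 kW rack power budget:

Max. # of servers per rack: 5 vs. 2

Server power consumption: 3 kW vs. 7.5 kW

Tokens/s per rack: 26,400 vs. 6,600

Max. # of users per rack: 880 vs. 220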

Rack-scale comparison between Furiosa’s NXT RNGD AI infrastructure and an RTX PRO 6000-based deployment, showcasing high-density AI inference systems designed to maximize performance, efficiency, and data center utilization.
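Working through the rack figures shows where the 4× comes from. The Python sketch below back-calculates per-server throughput and implied per-user throughput from the published rack totals. Neither value is stated on this page, so treat both as derived under the assumption that every server in a rack runs at the same rate and the rack holds as many servers as the 15 kW budget allows.

```python
import math

RACK_POWER_BUDGET_KW = 15.0  # typical per-rack limit cited above


def rack_breakdown(server_kw: float, tokens_per_rack: float, users_per_rack: int) -> dict:
    """Back-calculate per-server and per-user figures from published rack totals."""
    servers = math.floor(RACK_POWER_BUDGET_KW / server_kw)  # whole servers only
    return {
        "servers": servers,
        "tokens_per_server": tokens_per_rack / servers,
        "tokens_per_user": tokens_per_rack / users_per_rack,
    }


rngd = rack_breakdown(server_kw=3.0, tokens_per_rack=26_400, users_per_rack=880)
rtx = rack_breakdown(server_kw=7.5, tokens_per_rack=6_600, users_per_rack=220)

density_gain = rngd["servers"] / rtx["servers"]                          # 5 / 2
per_server_gain = rngd["tokens_per_server"] / rtx["tokens_per_server"]   # 5,280 / 3,300
print(f"NXT RNGD: {rngd}")
print(f"RTX PRO 6000: {rtx}")
print(f"Rack-level gain: {density_gain:.1f} x {per_server_gain:.1f} = {density_gain * per_server_gain:.1f}x")
```

Under these assumptions, the 4× splits into 2.5× more servers per 15 kW rack and about 1.6× more tokens/s per server (5,280 vs. 3,300), and both configurations imply roughly 30 tokens/s per user.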

Tensor contraction, not matmul


At the heart of Furiosa RNGD is the Tensor Contraction Processor architecture (ISCA 2024), designed specifically for efficient tensor contraction.

The fundamental computation of modern-day deep learning is tensor contraction, a higher-dimensional generalization of matrix multiplication. However, most commercial deep learning accelerators today use fixed-size matmul instructions as their primitives.

RNGD breaks away from that approach, unlocking higher performance and efficiency.
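To make the distinction concrete, the Python sketch below implements a naive, einsum-style tensor contraction on flat row-major lists. Matrix multiplication is the special case "ij,jk->ik", while an attention-score-style contraction "bqd,bkd->bqk" sums over one index and carries a batch index through. The last function estimates how much of the work is padding when a matmul is forced onto a fixed-size tile; the 8×8×8 tile is a hypothetical value chosen for illustration, and none of this describes how RNGD or its compiler actually schedules work.

```python
from itertools import product
from math import ceil, prod


def _offset(indices: str, shape: list[int], env: dict[str, int]) -> int:
    """Row-major flat offset of the element selected by the index values in env."""
    off = 0
    for name, dim in zip(indices, shape):
        off = off * dim + env[name]
    return off


def contract(a, a_shape, a_idx, b, b_shape, b_idx, out_idx):
    """Naive tensor contraction; any index absent from out_idx is summed over."""
    sizes = dict(zip(a_idx, a_shape))
    sizes.update(zip(b_idx, b_shape))
    summed = [i for i in sizes if i not in out_idx]
    out_shape = [sizes[i] for i in out_idx]
    out = [0.0] * prod(out_shape)
    for out_vals in product(*(range(sizes[i]) for i in out_idx)):
        env = dict(zip(out_idx, out_vals))
        acc = 0.0
        for sum_vals in product(*(range(sizes[i]) for i in summed)):
            env.update(zip(summed, sum_vals))
            acc += a[_offset(a_idx, a_shape, env)] * b[_offset(b_idx, b_shape, env)]
        out[_offset(out_idx, out_shape, env)] = acc
    return out, out_shape


def tile_utilization(m: int, n: int, k: int, tile=(8, 8, 8)) -> float:
    """Useful fraction of multiply-accumulates when an (m x k) @ (k x n) product is
    padded up to whole tiles of a hypothetical fixed-size matmul primitive."""
    tm, tn, tk = tile
    padded = ceil(m / tm) * tm * ceil(n / tn) * tn * ceil(k / tk) * tk
    return (m * n * k) / padded


# Matmul is the 2-D special case of contraction: (2x3) @ (3x2).
A = [1, 2, 3, 4, 5, 6]
B = [7, 8, 9, 10, 11, 12]
C, C_shape = contract(A, [2, 3], "ij", B, [3, 2], "jk", "ik")
print(C_shape, C)  # [2, 2] [58.0, 64.0, 139.0, 154.0]

# A 3-D contraction: batch index b is kept, d is summed, q and k index the output.
Q = [float(x) for x in range(2 * 2 * 3)]
K = [float(x) for x in range(2 * 4 * 3)]
S, S_shape = contract(Q, [2, 2, 3], "bqd", K, [2, 4, 3], "bkd", "bqk")
print(S_shape)  # [2, 2, 4]

print(tile_utilization(8, 8, 8))  # 1.0
print(tile_utilization(1, 8, 8))  # 0.125, e.g. a single-row product
```

The tile check shows one plausible cost of fixed-size primitives: when an operand dimension is smaller than the tile, most multiply-accumulates in the tile do no useful work. How RNGD's architecture addresses this is beyond the scope of this sketch.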


INFERENCE WITHOUT CONSTRAINTS

Performance

Deploy the most capable models with high-throughput, low-latency execution.

Efficiency

Lower total cost of ownership with reduced energy draw, fewer racks, and standard air cooling.

Programmability

Maintain flexibility for tomorrow’s models with a compiler designed to optimize evolving AI workloads.

SOFTWARE FOR LLM DEPLOYMENT

Furiosa Software provides a comprehensive toolchain for LLM inference and agentic workloads, from compilation and optimization to production deployment.

FuriosaAI’s full-stack AI software platform integrates PyTorch, Tensor Contraction Language, compiler optimization, binary generation, LLM serving infrastructure, and Kubernetes-based deployment to deliver efficient AI inference from model development to production-scale systems.

Built for enterprise inference deployment

Comprehensive toolchain for LLM inference and agentic workloads, from compilation and optimization to production deployment.

Maximizing data center utilization

Ensure higher compute utilization and architectural flexibility across systems with containerization, SR-IOV, Kubernetes, and other cloud-native components.

Robust ecosystem support

Effortlessly deploy models from library to end-user with PyTorch 2.x integration. Leverage the vast advancements of open-source AI and seamlessly transition models into production.
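The Kubernetes support described above typically rests on a standard mechanism: a device plugin advertises each node's accelerators as an extended resource, and pods request them in whole units under resources.limits, so the scheduler can pack several workloads onto one server. The Python sketch below builds such a Pod manifest as a dict and prints it as JSON, which kubectl accepts. The resource name furiosa.ai/rngd, the image, the pod names, and the four-cards-per-pod split are all assumptions for illustration; check Furiosa's device-plugin documentation for the actual resource name and for how SR-IOV partitions are exposed.

```python
import json

# Hypothetical extended-resource name: verify the name actually advertised by
# Furiosa's Kubernetes device plugin before relying on it.
ACCELERATOR_RESOURCE = "furiosa.ai/rngd"


def inference_pod(name: str, image: str, accelerators: int) -> dict:
    """Build a minimal Pod manifest that requests whole accelerator devices.

    Extended resources are requested as integers in resources.limits; the
    scheduler only places the pod on a node with that many free devices.
    """
    if accelerators < 1:
        raise ValueError("request at least one accelerator")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": "server",
                    "image": image,  # hypothetical image reference
                    "resources": {"limits": {ACCELERATOR_RESOURCE: str(accelerators)}},
                }
            ]
        },
    }


# Two serving pods sharing one 8-card server instead of each claiming the whole node.
pods = [inference_pod(f"llm-{i}", "registry.example.com/llm-server:latest", 4) for i in range(2)]
print(json.dumps(pods[0], indent=2))
```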

Start testing with Furiosa Access


Interested in evaluating NXT RNGD Server? The Furiosa Access Program provides a structured path for customers and partners to evaluate, integrate, qualify, and deploy Furiosa accelerators through both online and offline access. Available worldwide.

Furiosa Access locations

Seoul, KOR

Bay Area, USA

Lisbon, PRT

Johor Bahru, MYS