Nebius AI Cloud “Aether 3.5”: Frictionless compute for real world AI
This release introduces new serverless capabilities, the NVIDIA RTX PRO™ 6000 Blackwell Server Edition GPU for applied AI use cases, improved cluster configuration tools, streamlined data operations and platform-level enhancements that reduce routine complexity while preserving full control.
MLPerf® Inference v6.0: Top-tier AI performance on NVIDIA Blackwell and Blackwell Ultra
The results of our MLPerf® Inference v6.0 submission demonstrate Nebius’ ability to maximize efficiency for modern AI inference workloads on the latest NVIDIA Blackwell and Blackwell Ultra platforms.
Nebius partners with Positronic on Physical AI Leaderboard (PhAIL)
Physical AI is moving from controlled demos to real-world deployment — and that shift demands benchmarks grounded in actual operations, not lab conditions. Today, Nebius joins Positronic as a founding consortium partner of the Physical AI Leaderboard (PhAIL), a new platform that evaluates robot AI models on real hardware using commercially relevant tasks and production-grade metrics.
Nebius VPN Gateway CLI: Easily manage site-to-site VPNs in AI Cloud
The Nebius VPN Gateway CLI provides a simple way to configure and operate site-to-site VPN connectivity in Nebius AI Cloud. In this post, we walk through how it enables infrastructure-as-code workflows for IPSec gateways, helping teams manage secure and reliable connectivity across cloud and on-prem environments.
Introducing NVIDIA RTX PRO 6000 Blackwell Server Edition on Nebius
NVIDIA RTX PRO 6000 Blackwell opens new opportunities for cost-efficient inference and increased performance for visual computing and scientific simulations.
Introducing DevPods, Jobs and Endpoints: Easy compute access with serverless AI
Serverless services are a natural step in the evolution of an AI infrastructure cloud, built on top of a mature, well-established platform. As the platform develops, it becomes possible to expose compute in more flexible and elastic forms that better match how AI workloads are actually consumed.
Nebius and PyTorch partner to accelerate frontier MoE training on NVIDIA Blackwell
In collaboration with PyTorch, Nebius helped demonstrate up to 41% faster pre-training of DeepSeek-V3 models on NVIDIA Blackwell GPUs.
Incident post-mortem analysis: us-central1 service disruption on March 10, 2026
A detailed analysis of the incident on March 10, 2026 that led to service outages in the us-central1 region.
Delivering a validated AI Factory stack for agent workloads on Nebius AI Cloud with DataRobot
At NVIDIA GTC 2026, Nebius and DataRobot, with NVIDIA, introduced a validated AI Factory stack for production-grade agent workloads. In this post, we outline how the DataRobot Agent Workforce Platform runs on Nebius AI Cloud to support sustained inference, governance and cost control for AI agents deployed in live business workflows.
From fragmented data to production-grade agents: Nebius, Nexla and Tripadvisor at NVIDIA GTC
Nexla and Nebius are partnering to deliver a production-ready data and agent stack that connects governed enterprise data with infrastructure built for sustained inference. In this post, we outline how this architecture enables multi-agent systems to move from fragmented data pipelines to reliable production workflows, and show it in action through a live “Inspiration to Trip” demo presented with Tripadvisor at NVIDIA GTC.
Incident post-mortem analysis: eu-north-1 service disruption on February 26, 2026
A detailed analysis of the incident on February 26, 2026 that led to service outages in the eu-north-1 region.
Nebius and Eigen AI partner to accelerate frontier open-source AI inference
Nebius and Eigen AI are partnering to bring optimized frontier open-source models to Nebius Token Factory. As part of the collaboration, optimized implementations of models such as DeepSeek, GLM, GPT-OSS, Kimi, Llama, MiniMax and Qwen will be published on the platform, giving developers direct access to high-performance inference through production-ready endpoints and APIs.
Elevating the craft: Introducing the Inference Frontier Program
Today we’re introducing the Inference Frontier Program, a new builder-to-builder initiative dedicated to production inference systems. The program surfaces real architectures, optimizations and engineering tradeoffs from teams running large-scale inference in production.
What is AI Cloud? Key features, use cases & how to choose
Modern ML and LLM workloads require environments equipped with specialized hardware, high-performance networking and integrated MLOps tools. In this article, we’ll explore how AI-focused clouds differ from general-purpose platforms — and what criteria define the right provider for building scalable AI systems.
NVIDIA Nemotron 3 Super now available on Nebius Token Factory
NVIDIA Nemotron 3 Super is now available on Nebius Token Factory, bringing a 120B hybrid MoE model optimized for multi-agent systems and complex reasoning workflows to production deployments. With long-context inference and OpenAI-compatible APIs, teams can run Nemotron 3 Super through dedicated GPU endpoints and autoscaling infrastructure without managing their own serving stack.
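Because Token Factory exposes OpenAI-compatible APIs, calling a hosted model reduces to a standard chat-completion request. The sketch below builds such a request body with only the standard library; the endpoint URL and model ID are placeholders, not the real values from the Token Factory console.

```python
import json

# Placeholder endpoint and model ID -- check the Token Factory console
# for the exact values exposed for your deployment.
BASE_URL = "https://api.tokenfactory.nebius.example/v1"  # assumed URL
MODEL_ID = "nvidia/nemotron-3-super"                     # assumed model ID

def build_chat_request(prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON body for a standard OpenAI-style /chat/completions call."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)

body = build_chat_request("Summarize our multi-agent plan in three bullets.")
```

POST this body to `{BASE_URL}/chat/completions` with your bearer token, exactly as you would against any OpenAI-compatible service; no custom SDK is required.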
OpenClaw security: architecture and hardening guide
Self-hosted AI agents offer control and flexibility, but they also introduce real security risks. Incidents involving malicious ClawHub skills, exposed default ports and prompt-injection attacks show that running OpenClaw is not just an installation task, but an infrastructure decision. This guide explains OpenClaw’s architecture and maps real threats to concrete hardening controls, so teams can deploy it safely in production.
Meet SWE-rebench-V2: A multilingual, executable dataset for training Software Engineering Agents
We’re introducing SWE-rebench-V2, the next iteration of our large-scale dataset of reinforcement learning (RL) environments for training autonomous software engineering agents (SWEs).
Nebius and Toloka to introduce integration to bring human experts-on-demand to AI agents
Today, Nebius and Toloka are announcing plans to bring Tendem into the Nebius ecosystem. This integration further strengthens the Nebius AI stack, anchoring the raw intelligence of Token Factory and the autonomy of Tavily agentic search with a programmable layer of human reliability. Originally designed as the market’s pioneering hybrid human-AI agent, Tendem is now the first platform to embed vetted human experts directly into agentic workflows — making expert judgment callable via the Model Context Protocol (MCP), the emerging standard for AI tool integration.
Introducing Dedicated Endpoints and Custom Weights Hub in Nebius Token Factory
We are introducing Dedicated Endpoints and a Custom Weights Hub in Nebius Token Factory. You can now choose GPU type, define GPUs per replica, set scaling limits, select region and deploy your own model weights to isolated endpoints. Deployment becomes a defined, controllable part of your production architecture.
Scaling efficient production-grade inference with NVIDIA Run:ai on Nebius
NVIDIA and Nebius ran joint benchmarks using NVIDIA Run:ai, the AI workload orchestration and optimization software platform, on Nebius AI Cloud. The goal was simple: test whether fractional GPU allocation could improve efficiency and scalability for real-world inference workloads — without compromising performance.
Routing in LLM inference is the difference between scaling and stalling
When inference becomes distributed, routing strategy can determine whether a system scales or stalls. In this article, we examine a real agent-style workload where cache-aware routing in vLLM reduced average step time by nearly 50 percent and cut P95 latency from over a minute to under 20 seconds — with the same model, hardware and traffic.
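The intuition behind cache-aware routing can be sketched in a few lines: if requests that share a long prompt prefix (for example, an agent’s system prompt) always land on the same replica, that replica’s KV cache for the prefix stays warm. The router below is an illustrative stand-in, not the actual implementation from the article; the replica pool and prefix length are assumptions.

```python
import hashlib

# Illustrative sketch only: the real router and vLLM integration from the
# article are not shown here. Requests are routed on a hash of the shared
# prompt prefix, so steps of the same agent session hit the same replica,
# where the KV cache for that prefix is likely to be warm.

REPLICAS = ["replica-0", "replica-1", "replica-2"]  # assumed replica pool

def route(prompt: str, prefix_len: int = 512) -> str:
    """Pick a replica from the first `prefix_len` chars of the prompt."""
    prefix = prompt[:prefix_len]
    digest = hashlib.sha256(prefix.encode()).digest()
    return REPLICAS[int.from_bytes(digest[:4], "big") % len(REPLICAS)]

# Two steps of one agent trajectory share a long system prompt, so they
# hash to the same replica and can reuse cached KV state for the prefix.
sys_prompt = "SYSTEM: you are a travel agent." + " tools..." * 80
step1 = route(sys_prompt + "\nUSER: find flights")
step2 = route(sys_prompt + "\nUSER: book the first one")
assert step1 == step2
```

A production router would also track replica load and cache occupancy rather than hashing alone, but the core idea is the same: keep prefix-sharing traffic sticky.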
The energy behind AI: Why power efficiency matters
As AI adoption accelerates, energy use increasingly sets the boundaries of how far these systems can scale. Power availability, efficiency and infrastructure design are becoming practical constraints. This shift is prompting the industry to look for concrete ways to manage the energy footprint of AI systems by optimizing energy use and creating measurable efficiency gains. In our latest whitepaper, we explain how Nebius improves efficiency across the stack, from software engineering to hardware design and data center operations.
FinOps efficiency for AI workloads with FOCUS-compliant billing data
We recently introduced support for exporting billing data from Nebius AI Cloud in the FOCUS format. This update is a small but important step toward making cloud integration simpler and financial operations smoother for teams building and scaling AI workloads. At Nebius, we believe billing data should be easy to work with, easy to integrate and easy to trust — especially when AI infrastructure costs are a core part of your business model.
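Because FOCUS defines standard column names, a billing export can be consumed with generic tooling instead of provider-specific parsers. A minimal sketch, using FOCUS-style columns such as `ServiceName` and `BilledCost`; the sample rows are made up for illustration and are not real Nebius billing data.

```python
import csv
import io
from collections import defaultdict

# Toy FOCUS-format export. Column names follow FOCUS conventions
# (ServiceName, BilledCost, BillingCurrency); the rows are invented.
sample = """ServiceName,BilledCost,BillingCurrency
Compute,120.50,USD
Compute,80.25,USD
ObjectStorage,15.00,USD
"""

def cost_by_service(focus_csv: str) -> dict[str, float]:
    """Aggregate BilledCost per ServiceName from a FOCUS-format CSV."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(focus_csv)):
        totals[row["ServiceName"]] += float(row["BilledCost"])
    return dict(totals)

print(cost_by_service(sample))
# → {'Compute': 200.75, 'ObjectStorage': 15.0}
```

The same aggregation works unchanged against any FOCUS-compliant export, which is precisely the portability the standard is meant to deliver.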
Why large MoE models break latency budgets and what speculative decoding changes in production systems
Large mixture-of-experts language models promise significant gains in model quality, but deploying them in real products often exposes hard latency limits. This article explains why MoE systems that look efficient in benchmarks struggle under production constraints, and how architectural decisions around routing, batching and serving determine whether latency budgets hold under worst-case inputs and real user behavior.
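The core mechanism of speculative decoding can be sketched independently of any serving stack: a cheap draft model proposes several tokens, and the target model verifies them in one forward pass, accepting each draft token with probability min(1, p_target / p_draft). The probabilities below are toy numbers, not measurements from a real MoE system.

```python
import random

# Conceptual sketch of the accept/reject rule in speculative decoding:
# each drafted token is kept with probability min(1, p_target / p_draft),
# so a single target-model pass can validate several tokens at once.

def accept_draft_token(p_target: float, p_draft: float, rng: random.Random) -> bool:
    """Accept the draft token with probability min(1, p_target / p_draft)."""
    return rng.random() < min(1.0, p_target / p_draft)

def speculate(draft_probs, target_probs, seed=0):
    """Return how many drafted tokens survive verification, in order."""
    rng = random.Random(seed)
    accepted = 0
    for p_d, p_t in zip(draft_probs, target_probs):
        if not accept_draft_token(p_t, p_d, rng):
            break  # first rejection ends the speculated run
        accepted += 1
    return accepted

# When the target assigns at least the draft's probability to every token,
# all three drafted tokens are accepted in a single verification pass.
print(speculate([0.5, 0.4, 0.3], [0.6, 0.5, 0.9]))
# → 3
```

This is why speculative decoding helps with latency: the expensive target model runs once per speculated run instead of once per token, and the acceptance rule keeps the output distribution identical to target-only decoding.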
OpenHands trajectories with Qwen3 Coder 480B
While reinforcement learning drives agents to state-of-the-art performance, rejection fine-tuning serves as a powerful baseline. Stemming from our extensive experiments with different models and scaffoldings, we are sharing a dataset of 67k high-quality OpenHands trajectories from Qwen3 Coder 480B for research purposes. We also include two RFT checkpoints — Qwen3 Instruct 30B and 235B, achieving ~50% and ~60% on SWE-bench Verified respectively.