Experiment tracker for foundation models

Used by OpenAI to monitor & debug GPT-scale training

Monitor thousands of per-layer metrics—losses, gradients, and activations—at any scale. Visualize them with no lag and no missed spikes. Drill down into logs and debug training issues fast. Keep your model training stable while reducing wasted GPU cycles.
The problem

Logging all metrics, including layer-level ones, is essential. Does it have to be so hard to analyze them?

You want to log thousands of metrics per run to have full visibility into your training process…
But then you struggle to browse and visualize them fast (or at all)…
And you can’t see half of the spikes because the data is downsampled for the sake of loading speed?
The solution

Neptune lets you monitor and debug model internals. Without tradeoffs.

Tracking & visualization at scale

Log metrics across all layers. Browse and visualize them in seconds.

Whether your model has 5B or 150T parameters, you generate tens of thousands of per-layer metrics—losses, gradients, and activations. With Neptune, you can not only record but also search through and visualize all of them. No slowdowns, 100% accurate chart rendering.
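For a concrete picture, here's a minimal sketch of per-layer logging with neptune-scale. The PyTorch-style model, the step count, and the metric namespace are illustrative assumptions, not Neptune requirements:

# Minimal sketch: log one gradient-norm metric per layer at each step.
# `model` (a PyTorch module) and `num_steps` come from your training code.
from neptune_scale import Run

run = Run(run_id="sketch-001", experiment_name="per-layer-monitoring")

for step in range(num_steps):
    ...  # forward pass, loss.backward(), optimizer step
    grad_norms = {
        f"gradients/{name}/norm": param.grad.norm().item()
        for name, param in model.named_parameters()
        if param.grad is not None
    }
    run.log_metrics(data=grad_norms, step=step)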

See the docs
Deep debugging of model internals

Spot hidden issues before they derail training

Issues in a few layers may not show in aggregate metrics but can still destabilize training. With Neptune, you can monitor across layers to isolate and address these issues quickly. Detect vanishing or exploding gradients, batch divergence, and loss convergence failures to ensure a stable training process. 
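As one illustration, a simple guardrail on top of the values you already log could look like the sketch below. The thresholds are arbitrary example values, and model, run, and step are assumed to come from a training loop like the one above:

# Illustrative sketch: flag vanishing/exploding gradients per layer.
# The thresholds are example values, not Neptune defaults.
VANISH_THRESHOLD = 1e-7
EXPLODE_THRESHOLD = 1e3

for name, param in model.named_parameters():
    if param.grad is None:
        continue
    norm = param.grad.norm().item()
    run.log_metrics(data={f"gradients/{name}/norm": norm}, step=step)
    if norm < VANISH_THRESHOLD:
        print(f"step {step}: possible vanishing gradient in {name}")
    elif norm > EXPLODE_THRESHOLD:
        print(f"step {step}: possible exploding gradient in {name}")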

See the docs
Forking of runs

Gain better visibility into training with many restarts and branches

  • Test many configs at the same time. Stop the runs that don’t improve accuracy, and branch from the best-performing step (sketched below). Don’t waste GPUs on training runs that won’t converge.
  • See lineage for forked experiments. The training history is inherited, so you can see your entire experiment on a single chart. Don’t waste time on manual plotting.
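Here's a hedged sketch of what forking can look like with neptune-scale; the run IDs and the fork step are placeholders, so check the docs for the exact parameters:

# Sketch: branch a new run from a chosen step of an existing run.
# All IDs and the step value below are placeholders.
from neptune_scale import Run

forked = Run(
    run_id="lr-sweep-branch-2",
    experiment_name="lr-sweep",
    fork_run_id="lr-sweep-base",  # parent run to inherit history from
    fork_step=12000,              # branch from this step of the parent
)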
See the docs

Deploy Neptune on your infra—on-premises or in a private cloud

Sandbox

Explore an example project: for each run, we tracked over 50k metrics, including losses, gradient norms, and activations across layers.

How to get started?

1

Connect to Neptune and create a run

# Install neptune-scale for logging metadata
pip install neptune-scale

# Then, in your training script:
from neptune_scale import Run

run = Run(
    run_id=...,
    experiment_name=...,
)
Use with your usual stack
2

Log hyperparameters & training process

run.log_configs(
    {
        "params/lr": 0.001,
        "params/optimizer": "Adam",
    }
)

for step in range(num_steps):  # your training loop
    run.log_metrics(
        data={
            "train/accuracy": 0.87,
            "train/loss": 0.14,
        },
        step=step,
    )
3

Query your logs for deeper analysis

# Install neptune-query for querying logs
pip install neptune-query

import neptune_query as nq

# List experiments matching a regex
nq.list_experiments(r"exp_.*")

# Fetch metadata as a table
nq.fetch_experiments_table(
    experiments=r"exp.*",
    attributes=r".*metric.*/val_.+",
)
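From there, you can pull raw metric series into a data frame for custom analysis. A minimal sketch, assuming the fetch_metrics call and these example patterns:

# Sketch: fetch raw metric values as a pandas DataFrame.
# The experiment and attribute patterns are example values.
df = nq.fetch_metrics(
    experiments=r"exp.*",
    attributes=r"train/loss",
)
print(df.describe())  # quick statistical summary of the fetched series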
Our customers

Loved by 60,000+ researchers. Trusted by enterprises.

Oliver Lammas, Founding Engineer at Navier AI
A tracker is deeply integrated into all of our everyday workflows. It’s being used almost every hour of the day. Neptune hasn’t been down on any of those checks. We’re not fighting it. It works great, it’s fast, it’s reliable, and it is designed for foundation model training. We’re very happy we made the switch.
Vadim Markovtsev, Founding Engineer at poolside
I really appreciate that I’ve never seen any outage in Neptune. And since we’re training an LLM, it’s super critical to not have any outages in our loss curve. Other than that, there are things you often take for granted in a product: reliability, flexibility, quality of support. Neptune nails those and gives us the confidence.

State of Foundation Model Training Report 2025

If you lead AI research, infrastructure, or product around foundation model training, this is your go-to reference for 2025.

State of Foundation Model Training report
Frequently asked questions

Yes, you can deploy Neptune on your infra (and other answers)

  • Yes! Neptune can be deployed on your on-prem infrastructure or private cloud.

    It’s a set of microservices distributed as a Helm chart for Kubernetes deployment. If you need help, our deployment engineers are here to assist every step of the way.

    If you’re interested in self-hosted Neptune, contact us.

  • Short version. Teams switch to Neptune when they need:

    • A snappier, scalable UI – Instantly render massive tables and charts, and search through your logs even with thousands of tracked metrics.
    • Pricing that’s more aligned with their needs – Our pricing model doesn’t limit tracked hours.
    • A dedicated experiment tracker – Neptune focuses on experiment tracking, not an end-to-end platform.

    For the long version, read this full feature-by-feature comparison.

  • Switching to Neptune is straightforward. Our client libraries are similar enough that you can migrate without breaking your workflow. Plus, you’ll get all the core experiment tracking and monitoring features you’re used to, with a UI designed for scale.

    Here’s how it works:

    • Migrate your historical data with our ready-to-go migration script. Check the script.
    • Update your code (most changes take just a few lines).
    • Enjoy a smooth transition; our team is on hand to resolve any migration issues within 24 hours.
  • Yes. Neptune makes it simple and fast to query, filter, and extract experiment data at scale.

    Using the neptune-query API, you can pull metrics, losses, validation results, and other metadata from thousands or even millions of data points with minimal latency. The data can be fetched directly into tables, data frames, or series, so you can run statistical analyses, compare experiments, or perform large-scale meta-analyses with ease.

    This is your data after all, so you should always have fast, direct access to it.
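    To make that concrete, here's a hedged sketch of a simple cross-experiment comparison; the attribute pattern is an assumption for the example:

    import neptune_query as nq

    # Sketch: compare a validation metric across many experiments.
    # The attribute pattern is illustrative, not fixed by Neptune.
    table = nq.fetch_experiments_table(
        experiments=r"exp.*",
        attributes=r"metrics/val_loss",
    )
    print(table.head(10))  # inspect the first experiments side by side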

Foundation models require a tracker ready for their scale and challenges

Interested in how Neptune can help you with that?