Used by OpenAI to monitor & debug GPT-scale training
Logging all metrics, including layer-level ones, is essential. Does analyzing them have to be so hard?
Neptune lets you monitor and debug model internals. Without tradeoffs.
Log metrics across all layers. Browse and visualize them in seconds.
Whether your model has 5B or 150T parameters, you generate tens of thousands of per-layer metrics—losses, gradients, and activations. With Neptune, you can not only record but also search through and visualize all of them. No slowdowns, 100% accurate chart rendering.
Spot hidden issues before they derail training
Issues in a few layers may not show in aggregate metrics but can still destabilize training. With Neptune, you can monitor across layers to isolate and address these issues quickly. Detect vanishing or exploding gradients, batch divergence, and loss convergence failures to ensure a stable training process.
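As a minimal sketch of what per-layer logging can look like: the helper below flattens per-layer gradient norms into namespaced metric names. The `gradients/<layer>/norm` naming scheme and the helper itself are illustrative assumptions, not Neptune conventions.

```python
def layer_grad_metrics(grad_norms, prefix="gradients"):
    """Flatten {layer_name: gradient_norm} into namespaced metric names.

    Hypothetical helper; the "gradients/<layer>/norm" scheme is an assumption.
    """
    return {f"{prefix}/{name}/norm": value for name, value in grad_norms.items()}

# Inside a training loop, with `run` a neptune_scale.Run and `norms` computed
# from your model's per-layer gradients:
# run.log_metrics(data=layer_grad_metrics(norms), step=step)

print(layer_grad_metrics({"encoder.layer_00": 0.42}))
# {'gradients/encoder.layer_00/norm': 0.42}
```

With a consistent namespace like this, per-layer series group naturally in the UI and can later be matched with a single attribute regex.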
Gain better visibility into training with many restarts and branches
- Test many configs at the same time. Stop the runs that don’t improve accuracy, and branch from the last good step. Don’t waste GPUs on training runs that won’t converge.
- See lineage for forked experiments. The training history is inherited, so you can see your entire experiment on a single chart. Don’t waste time on manual plotting.
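A sketch of what branching a run from a parent’s last good step might look like. The `fork_run_id`/`fork_step` parameter names below are assumptions about the neptune-scale forking API, and the run IDs are hypothetical; verify both against the current docs before relying on them.

```python
# from neptune_scale import Run  # requires `pip install neptune-scale`

# Hypothetical fork parameters; verify `fork_run_id`/`fork_step` against
# the current neptune-scale API:
fork_kwargs = dict(
    run_id="pretrain-branch-a",   # new child run (hypothetical ID)
    fork_run_id="pretrain-base",  # parent run to branch from (hypothetical ID)
    fork_step=12_000,             # last good step of the parent
)

# Creating the run needs a configured Neptune connection, so it is
# commented out here:
# forked = Run(experiment_name="pretraining", **fork_kwargs)
```

Because the child inherits the parent’s history up to `fork_step`, its charts continue seamlessly from the branch point.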
Deploy Neptune on your infra—on-premises or in a private cloud
Explore an example project: for each run we tracked over 50k metrics, including losses, gradient norms, and activations across layers
How to get started?
Connect to Neptune and create a run
# neptune-scale for logging metadata
pip install neptune-scale
from neptune_scale import Run

run = Run(
    run_id=...,
    experiment_name=...,
)
Log hyperparameters & training process
run.log_configs(
    {
        "params/lr": 0.001,
        "params/optimizer": "Adam",
    }
)

for step in epoch:
    run.log_metrics(
        data={
            "train/accuracy": 0.87,
            "train/loss": 0.14,
        },
        step=step,
    )
Query your logs for deeper analysis
# neptune-query for querying logs
pip install neptune-query
import neptune_query as nq
# List experiments
nq.list_experiments(r"exp_.*")
# Fetch metadata as table
nq.fetch_experiments_table(
    experiments=r"exp.*",
    attributes=r".*metric.*/val_.+",
)
Loved by 60,000+ researchers. Trusted by enterprises.
State of Foundation Model Training Report 2025
If you lead AI research, infrastructure, or product around foundation model training, this is your go-to reference for 2025.
Yes, you can deploy Neptune on your infra (and other answers)
-
Yes! Neptune can be deployed on your on-prem infrastructure or private cloud.
It’s a set of microservices distributed as a Helm chart for Kubernetes deployment. If you need help, our deployment engineers are here to assist every step of the way.
If you’re interested in self-hosted Neptune, contact us.
-
Short version. Teams switch to Neptune when they need:
- A snappier, scalable UI – Instantly render massive tables and charts, and search through your logs even with thousands of tracked metrics.
- Pricing that’s more aligned with their needs – Our pricing model doesn’t limit tracked hours.
- A dedicated experiment tracker – Neptune focuses on experiment tracking, not an end-to-end platform.
For the long version, read this full feature-by-feature comparison.
-
Switching to Neptune is straightforward. Our client libraries are similar enough that you can migrate without breaking your workflow. Plus, you’ll get all the core experiment tracking and monitoring features you’re used to, with a UI designed for scale.
Here’s how it works:
- Migrate your historical data with our ready-to-go migration script. Check the script.
- Update your code (most changes take just a few lines).
- Enjoy a smooth transition: our team is on hand to resolve any migration issues within 24 hours.
-
Yes. Neptune makes it simple and fast to query, filter, and extract experiment data at scale.
Using the neptune-query API, you can pull metrics, losses, validation results, and other metadata from thousands or even millions of data points with minimal latency. The data can be fetched directly into tables, data frames, or series, so you can run statistical analyses, compare experiments, or perform large-scale meta-analyses with ease.
This is your data after all, so you should always have fast, direct access to it.
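For example, here is a hedged sketch of pulling per-layer loss series into a DataFrame. The attribute regex is an assumption about how the metrics were namespaced, and the `fetch_metrics` call is assumed from the neptune-query API; check the docs for the exact signature.

```python
import re

# Assumed namespacing for per-layer losses; adjust to your own metric names.
layer_loss_pattern = r"layers/layer_\d+/loss"

# Fetching requires a configured Neptune connection, so the call is shown
# commented out; `fetch_metrics` is assumed from the neptune-query API:
# import neptune_query as nq
# df = nq.fetch_metrics(
#     experiments=r"exp_.*",
#     attributes=layer_loss_pattern,
# )
# `df` would hold one metric series per matching attribute, ready for
# pandas-style analysis.

# The regex itself can be sanity-checked locally:
assert re.fullmatch(layer_loss_pattern, "layers/layer_07/loss")
assert not re.fullmatch(layer_loss_pattern, "train/loss")
```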
Foundation models require a tracker ready for their scale and challenges
Interested in how Neptune can help you with that?