Dartayous/synthetic-data-generation-pipeline

Synthetic Data Generation Pipeline (NVIDIA Omniverse)

Overview

A synthetic data generation pipeline for computer vision, built in NVIDIA Omniverse. It combines a custom-authored data center digital twin, semantic labeling, multi-camera capture, and controlled camera perturbation to generate annotated RGB, semantic segmentation, and 2D bounding-box datasets.


Digital Twin Environment

A modular data center and office environment was built with OpenUSD, including:

  • Server racks and infrastructure zones
  • Office workstations and monitors
  • Support equipment (carts, cabinets)


Multi-Camera Capture System

Six cameras are placed to cover distinct semantic zones:

  • Server aisles
  • Office workspace
  • Support areas

Each camera produces an independent dataset, and controlled camera jitter (small, bounded perturbations of the camera pose) simulates real-world viewpoint variation.
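The jitter code is not shown in this README. The following is a minimal, hypothetical sketch of what "scoped" jitter can mean: each camera's position is perturbed by a bounded random offset per axis while its look-at target stays fixed, and each camera uses its own seeded random stream so its sequence is reproducible and independent of the others. The names `CameraPose`, `jitter_pose`, and `jittered_sequence` are illustrative; inside Replicator itself, a comparable effect is usually expressed with its distribution helpers (e.g. `rep.distribution.uniform`) rather than plain Python.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class CameraPose:
    name: str
    position: Vec3  # world-space camera position (hypothetical units)
    look_at: Vec3   # point the camera stays aimed at


def jitter_pose(base: CameraPose, max_offset: float, rng: random.Random) -> CameraPose:
    """Perturb the position by at most `max_offset` along each axis.

    The look-at target is left unchanged so the camera keeps framing
    its assigned zone; only the viewpoint varies.
    """
    offset = [rng.uniform(-max_offset, max_offset) for _ in range(3)]
    position = tuple(p + o for p, o in zip(base.position, offset))
    return CameraPose(base.name, position, base.look_at)


def jittered_sequence(base: CameraPose, num_frames: int,
                      max_offset: float, seed: int) -> List[CameraPose]:
    """One pose per frame, drawn from a per-camera seeded RNG so each
    camera's sequence is independent and reproducible."""
    rng = random.Random(seed)
    return [jitter_pose(base, max_offset, rng) for _ in range(num_frames)]
```

For example, `jittered_sequence(CameraPose("server_aisle_1", (0.0, 1.6, -4.0), (0.0, 1.0, 0.0)), 100, 0.05, seed=1)` yields 100 poses that stay within 0.05 units of the authored position. Keeping the bound small is what keeps the jitter "scoped": the camera still covers its zone, so labels remain meaningful.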


Semantic Labeling

Objects were assigned semantic class labels using Omniverse Replicator:

  • Servers
  • Desks
  • Monitors
  • Chairs
  • Equipment

This enables automated generation of training-ready annotations.
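Since the labeling is described as programmatic, a common approach is to derive each prim's class from its name. The sketch below is hypothetical: it assumes prims are named after their object type (e.g. `Server_Rack_01`), and the prim paths and patterns are illustrative, not taken from the project. In Replicator, each resulting label would then be attached to its prim, for example with `rep.modify.semantics([("class", label)])`.

```python
import re
from typing import Dict, Iterable, Optional

# Hypothetical naming rules mapping prim names to the classes listed above.
# Rules are checked in order; the first match wins.
LABEL_RULES = [
    (re.compile(r"rack|server", re.IGNORECASE), "server"),
    (re.compile(r"desk", re.IGNORECASE), "desk"),
    (re.compile(r"monitor|display", re.IGNORECASE), "monitor"),
    (re.compile(r"chair", re.IGNORECASE), "chair"),
    (re.compile(r"cart|cabinet", re.IGNORECASE), "equipment"),
]


def label_for(prim_path: str) -> Optional[str]:
    """Return the class label for a prim, or None if no rule matches."""
    leaf = prim_path.rsplit("/", 1)[-1]  # match the prim name, not parent scopes
    for pattern, label in LABEL_RULES:
        if pattern.search(leaf):
            return label
    return None


def build_label_map(prim_paths: Iterable[str]) -> Dict[str, str]:
    """Map each labelable prim path to its class; unmatched prims are skipped."""
    return {path: label for path in prim_paths
            if (label := label_for(path)) is not None}
```

Matching only the leaf name avoids mislabeling every child of a scope such as `/World/DataCenter`; unmatched prims (lights, walls) are left unlabeled and fall into the background class.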


Dataset Output

Each frame produces:

  • RGB images
  • Semantic segmentation
  • 2D bounding boxes
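Replicator can write bounding boxes directly from its annotators, so this step is not something the pipeline has to compute. The sketch below only illustrates how a tight 2D box relates to the segmentation mask, which can be handy for sanity-checking exported annotations. It assumes a mask given as rows of integer class IDs with 0 as background; because a semantic mask merges all objects of a class, it yields one box per class, and per-object boxes would need an instance mask instead.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixels


def tight_boxes(mask: List[List[int]], background: int = 0) -> Dict[int, Box]:
    """Return the tightest axis-aligned box around each non-background class ID."""
    bounds: Dict[int, List[int]] = {}
    for y, row in enumerate(mask):
        for x, class_id in enumerate(row):
            if class_id == background:
                continue
            if class_id not in bounds:
                bounds[class_id] = [x, y, x, y]
            else:
                b = bounds[class_id]
                b[0] = min(b[0], x)
                b[1] = min(b[1], y)
                b[2] = max(b[2], x)
                b[3] = max(b[3], y)
    return {class_id: tuple(b) for class_id, b in bounds.items()}
```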

Example Outputs

  • Server dataset view (server row)
  • Office dataset view
  • Semantic segmentation view


Pipeline Capabilities

  • Multi-camera dataset generation
  • Scoped camera jitter (controlled perturbation)
  • Lighting variation support
  • Structured dataset output for ML pipelines
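One way to make per-camera output easy to consume downstream is a manifest with one record per frame. The sketch below is hypothetical: the directory layout (one sub-directory per camera) and the file names are illustrative assumptions, not the names the project's writer actually produces, so they would need to match the real output before use.

```python
import json
from pathlib import PurePosixPath
from typing import Dict, List


def frame_records(root: str, cameras: List[str], num_frames: int) -> List[Dict[str, object]]:
    """Build one manifest record per (camera, frame) pair.

    Assumes a hypothetical layout of <root>/<camera>/<annotation>_<frame>.<ext>.
    """
    records = []
    for cam in cameras:
        cam_dir = PurePosixPath(root) / cam  # one sub-directory per camera
        for i in range(num_frames):
            records.append({
                "camera": cam,
                "frame": i,
                "rgb": str(cam_dir / f"rgb_{i:04d}.png"),
                "semantic_segmentation": str(cam_dir / f"semantic_segmentation_{i:04d}.png"),
                "bounding_box_2d": str(cam_dir / f"bounding_box_2d_tight_{i:04d}.npy"),
            })
    return records


def to_jsonl(records: List[Dict[str, object]]) -> str:
    """Serialize records as JSON Lines, one frame per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)
```

Keeping the camera name in every record makes it straightforward to split or filter by zone later, for example holding out one camera for validation.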

Engineering Challenges Solved

  • Replicator instability → resolved via staged execution strategy
  • Camera targeting issues → solved with authored camera system
  • Scene coverage gaps → resolved via multi-camera architecture
  • Semantic labeling pipeline → implemented programmatically

Tech Stack

  • NVIDIA Omniverse USD Composer
  • Omniverse Replicator
  • OpenUSD (USD / USDA)
  • Python

Future Work

  • Domain randomization (materials, lighting, layout)
  • Large-scale dataset generation (1k–10k frames)
  • Integration with training pipelines (PyTorch / TensorFlow)
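Domain randomization (listed above as future work) typically samples scene parameters independently per frame. The sketch below is a hypothetical illustration: the parameter names, ranges, and material names are assumptions, not values from this project. Sampling the whole schedule up front from one seed means the parameters used for every frame can be logged alongside the dataset and regenerated exactly.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

MATERIALS = ("brushed_steel", "matte_black", "painted_white")  # hypothetical names


@dataclass(frozen=True)
class FrameRandomization:
    light_intensity: float            # hypothetical intensity range
    color_temperature_k: float        # light color temperature in kelvin
    rack_material: str                # material variant applied to racks
    layout_shift: Tuple[float, float] # small (x, z) offset for movable props


def sample_frame(rng: random.Random) -> FrameRandomization:
    """Draw one frame's randomization parameters."""
    return FrameRandomization(
        light_intensity=rng.uniform(500.0, 3000.0),
        color_temperature_k=rng.uniform(3000.0, 6500.0),
        rack_material=rng.choice(MATERIALS),
        layout_shift=(rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2)),
    )


def sample_schedule(num_frames: int, seed: int = 0) -> List[FrameRandomization]:
    """Pre-sample parameters for every frame from a single seed (reproducible)."""
    rng = random.Random(seed)
    return [sample_frame(rng) for _ in range(num_frames)]
```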

Key Insight

This project demonstrates how a digital twin can generate scalable, automatically labeled datasets for computer vision, reducing the need for manual real-world data collection and annotation.
