Dresty is an advanced virtual try-on system that leverages Stable Diffusion 3 to generate realistic images of garments worn by a model. It lets users visualize how different clothing items would look on a person, producing high-quality results that preserve the model's original pose and appearance.
- Multi-category support: Try on upper-body garments (shirts, tops, sweaters), lower-body garments (skirts, pants), and dresses
- High-quality generation: Powered by Stable Diffusion 3 for realistic and detailed results
- Pose preservation: Maintains the original pose of the model through advanced pose guiding
- Interactive masking: Adjust mask parameters to fine-tune the try-on area
- Customizable generation: Control parameters like steps, guidance scale, and output resolution
- User-friendly interface: Simple two-step process with a Gradio web UI
- Deep Learning Framework: PyTorch, Torchvision
- Diffusion Models: Stable Diffusion 3 (customized for virtual try-on)
- Computer Vision: DWpose for pose detection, Human Parsing for segmentation
- Image Encoding: CLIP Vision models (large and bigG variants)
- Web Interface: Gradio
- Additional Libraries: OpenCV, scikit-image, NumPy, Einops
- CUDA-compatible GPU (recommended)
- Python 3.10
- Git LFS (for downloading models)
Run the setup script to create a conda environment and download required models:
```bash
bash setup.sh
```
Alternatively, set up the environment manually:

1. Create and activate a conda environment:

   ```bash
   conda create -y -n viton python=3.10
   conda activate viton
   ```

2. Install PyTorch with CUDA support:

   ```bash
   conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
   ```

3. Install the remaining dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Download the models from Hugging Face:

   ```bash
   git lfs install
   git clone https://huggingface.co/BoyuanJiang/Dresty ~/viton_models
   ```
Launch the application, pointing it at the downloaded models:

```bash
python app.py --model_path /path/to/models
```
Additional options:

- `--device cpu`: Run on CPU (much slower)
- `--fp16`: Use FP16 precision (faster but may be less accurate)
- `--offload`: Offload models to CPU when not in use (saves GPU memory)
- `--aggressive_offload`: Offload models more aggressively (for low-memory GPUs)
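For orientation, the flags above could be wired up roughly like this with `argparse` (a minimal sketch; the actual argument handling lives in `app.py` and may differ):

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="Dresty virtual try-on")
    parser.add_argument("--model_path", required=True,
                        help="Directory containing the downloaded model weights")
    parser.add_argument("--device", default="cuda",
                        help="Device to run on, e.g. 'cuda' or 'cpu'")
    parser.add_argument("--fp16", action="store_true",
                        help="Use FP16 precision (faster, possibly less accurate)")
    parser.add_argument("--offload", action="store_true",
                        help="Offload idle models to CPU to save GPU memory")
    parser.add_argument("--aggressive_offload", action="store_true",
                        help="Offload models more aggressively for low-memory GPUs")
    return parser.parse_args()
```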
The virtual try-on process consists of two main steps:
- Select a model image from the examples or upload your own
- Select a garment image from the examples or upload your own
- Choose the appropriate garment category (Upper-body, Lower-body, or Dresses)
- Adjust mask offsets if needed (top, bottom, left, right)
- Click "Step 1: Run Mask"
- Adjust generation parameters if desired:
  - Steps: number of diffusion steps (higher = better quality but slower)
  - Guidance scale: controls adherence to the input garment (higher = more faithful to the garment's style)
  - Seed: random seed for reproducibility (-1 for random)
  - Number of images: how many variations to generate
  - Resolution: output image resolution
- Click "Step 2: Run Try-on"
- View and save the generated results
1. Input Processing:
   - The model image and garment image are loaded and preprocessed
   - Human parsing identifies body parts in the model image
   - DWpose detects keypoints for pose information
2. Mask Generation:
   - Based on the garment category and body keypoints
   - Creates a mask indicating where the new garment will be placed
   - Users can adjust the mask boundaries with offset parameters (see the mask-offset sketch after this list)
3. Pose Encoding:
   - Extracts pose features from the model image
   - Helps maintain the person's pose in the final output (see the pose-map sketch below)
4. Garment Processing:
   - The garment image is encoded using CLIP vision models
   - Creates embeddings that guide the generation process (see the CLIP encoding sketch below)
5. Try-On Generation:
   - Combines the masked model image, garment image, pose features, and mask
   - Runs the Stable Diffusion 3 pipeline with two specialized transformers:
     - `transformer_garm`: processes the garment image
     - `transformer_vton`: handles the virtual try-on
   - Generates the final image with the garment realistically fitted on the model
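Mask-offset sketch: to illustrate the idea only (this is not the project's `utils_mask.py` code), growing a binary mask's bounding box by per-side pixel offsets could look like:

```python
import numpy as np

def expand_mask(mask: np.ndarray, top=0, bottom=0, left=0, right=0) -> np.ndarray:
    """Toy illustration: grow a 2D binary mask's bounding box by per-side offsets."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return mask
    h, w = mask.shape
    y0, y1 = max(ys.min() - top, 0), min(ys.max() + bottom, h - 1)
    x0, x1 = max(xs.min() - left, 0), min(xs.max() + right, w - 1)
    out = np.zeros_like(mask)
    out[y0:y1 + 1, x0:x1 + 1] = 1
    return out
```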
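Pose-map sketch: as general background on pose conditioning (Dresty's own logic lives in `src/pose_guider.py`), detected keypoints are typically rasterized into a guidance image, for example:

```python
import numpy as np
import cv2

def render_pose_map(keypoints, size=(512, 512)):
    """Toy example: draw body keypoints as dots on a blank conditioning image."""
    canvas = np.zeros((size[0], size[1], 3), dtype=np.uint8)
    for x, y in keypoints:  # keypoints given in pixel coordinates
        cv2.circle(canvas, (int(x), int(y)), 4, (0, 255, 0), -1)  # filled green dot
    return canvas
```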
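CLIP encoding sketch: a minimal example of CLIP vision encoding with the Hugging Face `transformers` library. The checkpoint name and image path are placeholders, not the exact large/bigG variants Dresty ships:

```python
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

# Placeholder checkpoint; Dresty uses its own large and bigG CLIP vision variants.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-large-patch14")

garment = Image.open("examples/garment/example.jpg").convert("RGB")  # path is illustrative
inputs = processor(images=garment, return_tensors="pt")
with torch.no_grad():
    embeds = encoder(**inputs).image_embeds  # pooled garment embedding
```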
The preprocessing components in the `preprocess` directory play a crucial role in the virtual try-on process:
Human Parsing:

Usage in the application: the human parsing functionality is used through the `Parsing` class defined in `preprocess/humanparsing/run_parsing.py`. This class is instantiated in `src/generator.py` and used in the `generate_mask` method to segment the different parts of the human body in model images. The parsing results are then used by `get_mask_location` in `src/utils_mask.py` to create masks for the different clothing categories (Upper-body, Lower-body, Dresses).
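A hypothetical call pattern (the constructor arguments and return format here are assumptions; the real interface lives in `run_parsing.py`):

```python
from PIL import Image
from preprocess.humanparsing.run_parsing import Parsing

# Hypothetical usage: the real constructor and return format may differ.
parsing = Parsing()
model_img = Image.open("examples/model/example.jpg").convert("RGB")
parse_map = parsing(model_img)  # assumed: a per-pixel body-part label map
```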
Implementation details: the human parsing functionality uses pre-trained ONNX models (`parsing_atr.onnx` and `parsing_lip.onnx`) for inference. The actual inference is performed by the `onnx_inference` function in `parsing_api.py`, which processes images to segment the different body parts. The segmentation results are used to create masks that define where garments should be placed on the model's body.
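Generic ONNX Runtime usage for such a model looks roughly like this (the input name, shape, and preprocessing are assumptions; the real preprocessing lives in `parsing_api.py`):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "parsing_atr.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name

# Stand-in for a preprocessed image tensor (NCHW float32; actual shape assumed).
image = np.random.rand(1, 3, 512, 512).astype(np.float32)
logits = session.run(None, {input_name: image})[0]
labels = logits.argmax(axis=1)  # per-pixel body-part class map
```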
Pose Detection (DWpose):

Usage in the application: the pose detection functionality is used through the `DWposeDetector` class defined in `preprocess/dwpose/__init__.py`. This class is instantiated in `src/generator.py` and used in the `generate_mask` method to detect body keypoints. The detected keypoints are used alongside the human parsing results to create accurate masks for virtual try-on.
Implementation details: pose detection uses pre-trained ONNX models (`yolox_l.onnx` and `dw-ll_ucoco_384.onnx`). The `Wholebody` class in `wholebody.py` handles the actual pose detection, identifying keypoints for the body, face, and hands. These keypoints are then used in `utils_mask.py` to determine the boundaries of the different body parts when creating masks.
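As an illustration of how keypoints can bound a garment region (this is not the project's `utils_mask.py` logic; COCO-style keypoint ordering is assumed):

```python
import numpy as np

def torso_bbox(keypoints: np.ndarray):
    """Bounding box spanning the shoulder and hip keypoints (COCO indices assumed)."""
    pts = keypoints[[5, 6, 11, 12]]  # left/right shoulder, left/right hip
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    return int(x0), int(y0), int(x1), int(y1)
```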
When a user uploads model and garment images in the application:

- The `DrestyGenerator.generate_mask` method uses both human parsing and pose detection to create a mask for the specified garment category; this mask defines where the garment should be placed on the model's body.
- The mask, along with the model and garment images, is then passed to the `process` method, which runs the Stable Diffusion 3 pipeline to generate the virtual try-on result.
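Putting it together, a hypothetical programmatic equivalent of the two UI steps (all argument names and return types here are assumptions; see `src/generator.py` for the real signatures):

```python
from PIL import Image
from src.generator import DrestyGenerator

generator = DrestyGenerator(model_path="/path/to/models")  # hypothetical constructor

model_img = Image.open("examples/model/example.jpg").convert("RGB")
garment_img = Image.open("examples/garment/example.jpg").convert("RGB")

# Step 1: build the inpainting mask for the chosen category.
mask = generator.generate_mask(model_img, category="Upper-body")

# Step 2: run the SD3 try-on pipeline on the masked region.
images = generator.process(model_img, garment_img, mask,
                           steps=28, guidance_scale=2.0, seed=42)
images[0].save("tryon_result.png")
```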
The preprocessing components are critical for the application's functionality as they enable:
- Accurate segmentation of the model's body parts
- Precise detection of body keypoints
- Creation of masks that define garment placement
- Generation of realistic virtual try-on results
The `DrestyGenerator` class in `src/generator.py` is responsible for:
- Human parsing and pose detection
- Mask generation based on garment category
- Image preprocessing (resizing, padding)
- CLIP encoding of garment images
- Pose feature extraction
- Running the diffusion model
- Post-processing of generated images
The Gradio UI (`src/ui.py`) lets users control:
- Selection of model image
- Selection of garment image
- Selection of garment category
- Adjustment of mask offsets
- Generation parameters (steps, guidance scale, seed, etc.)
```
Dresty/
├── examples/                                # Example images for models and garments
│   ├── garment/                             # Example garment images
│   └── model/                               # Example model images
├── preprocess/                              # Preprocessing modules
│   ├── dwpose/                              # DWpose for pose detection
│   └── humanparsing/                        # Human parsing for segmentation
├── resource/                                # Resource files
│   └── img/                                 # Images for documentation
├── src/                                     # Source code
│   ├── attention_garm.py                    # Attention mechanism for garment processing
│   ├── attention_processor_garm.py          # Attention processor for garment
│   ├── attention_processor_vton.py          # Attention processor for try-on
│   ├── attention_vton.py                    # Attention mechanism for try-on
│   ├── generator.py                         # Main generator class
│   ├── image_utils.py                       # Image processing utilities
│   ├── pipeline_stable_diffusion_3_tryon.py # Main pipeline
│   ├── pose_guider.py                       # Pose guiding module
│   ├── transformer_sd3_garm.py              # Transformer for garment processing
│   ├── transformer_sd3_vton.py              # Transformer for try-on
│   ├── ui.py                                # Gradio UI components
│   └── utils_mask.py                        # Mask utilities
├── app.py                                   # Main entry point
├── requirements.txt                         # Python dependencies
├── setup.sh                                 # Setup script
└── README.md                                # This file
```
- Best results are achieved with front-facing model images
- Performance depends on GPU capabilities
- Some garment types may produce better results than others
- Mask adjustments may be needed for optimal results with certain body types
This project builds upon several open-source projects and research:
- Stable Diffusion 3 by Stability AI
- DWpose for human pose estimation
- Human parsing for semantic segmentation
- CLIP by OpenAI for vision encoding
- Gradio for the web interface