ControlNet is a neural network structure to control diffusion models by adding extra conditions.
It copies the weights of neural network blocks into a "locked" copy and a "trainable" copy. The "trainable" copy learns your condition, while the "locked" copy preserves your original model.
Thanks to this, training with a small dataset of image pairs will not destroy production-ready diffusion models.
The "zero convolution" is 1×1 convolution with both weight and bias initialized as zeros.
Before training, all zero convolutions output zeros, and ControlNet will not cause any distortion.
No layer is trained from scratch. You are still fine-tuning. Your original model is safe.
This allows training on small-scale or even personal devices.
This also makes it friendly to merging, replacing, or offsetting models, weights, blocks, and layers.
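The locked/trainable pair bracketed by zero convolutions can be sketched in PyTorch roughly as follows. This is a minimal, hypothetical sketch: `ControlledBlock` and its wiring are simplified stand-ins for illustration, not the repository's actual classes.

```python
import copy

import torch
import torch.nn as nn


def zero_conv(channels):
    # 1x1 convolution with weight and bias initialized to zero
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv


class ControlledBlock(nn.Module):
    """Hypothetical sketch: a frozen ("locked") block plus a trainable
    copy, connected through zero convolutions."""

    def __init__(self, block, channels):
        super().__init__()
        self.trainable = copy.deepcopy(block)  # trainable copy of the weights
        self.locked = block                    # original weights, frozen below
        for p in self.locked.parameters():
            p.requires_grad_(False)
        self.zero_in = zero_conv(channels)     # zero conv on the condition
        self.zero_out = zero_conv(channels)    # zero conv on the output

    def forward(self, x, condition):
        h = self.locked(x)
        c = self.trainable(x + self.zero_in(condition))
        return h + self.zero_out(c)


block = nn.Conv2d(4, 4, kernel_size=3, padding=1)
net = ControlledBlock(block, channels=4)
x = torch.randn(1, 4, 8, 8)
cond = torch.randn(1, 4, 8, 8)

# Before training, both zero convs output zeros, so the controlled
# block reproduces the locked block exactly.
assert torch.allclose(net(x, cond), block(x))
```

Because the zero convolutions output zeros at initialization, attaching the ControlNet leaves the original model's behavior untouched, which is exactly the "no distortion" property described above.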
Q: But wait, if the weight of a conv layer is zero, the gradient will also be zero, and the network will not learn anything. Why does "zero convolution" work?

A: This is not true: only the layer's output is zero at initialization, not its weight gradients. See an explanation here.
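The intuition is that a zero weight does not imply a zero weight *gradient*: for y = w·x + b, the gradient ∂y/∂w = x depends on the input, not on w. A tiny hand-computed example in plain Python (the values x = 2, target t = 1 are arbitrary illustrations):

```python
# y = w*x + b with w = b = 0, and squared-error loss L = (y - t)^2
x, t = 2.0, 1.0
w = b = 0.0

y = w * x + b          # 0.0: the layer outputs zero at initialization
dL_dy = 2 * (y - t)    # -2.0
dL_dw = dL_dy * x      # -4.0: nonzero, so w receives a real update
dL_db = dL_dy          # -2.0: nonzero, so b receives a real update

assert dL_dw != 0.0 and dL_db != 0.0
```

So after the first optimization step the weights are no longer zero, and the zero convolution trains like any other layer; it would stall only in the degenerate case where the input itself is always zero.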
By repeating the above simple structure 14 times, we can control stable diffusion in this way:
In this way, ControlNet can reuse the SD encoder as a deep, strong, robust, and powerful backbone to learn diverse controls. Much evidence (like this and this) validates that the SD encoder is an excellent backbone.
Note that the way we connect layers is computationally efficient. The original SD encoder does not need to store gradients (the locked original SD Encoder Blocks 1-4 and Middle). The required GPU memory is not much larger than that of the original SD, even though many layers are added. Great!
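In PyTorch terms, this corresponds to freezing the locked encoder's parameters so that autograd never builds a graph through it. A minimal sketch; the tiny `encoder` below is a hypothetical stand-in, not the real SD encoder:

```python
import torch
import torch.nn as nn

# Stand-in for the locked SD encoder blocks.
encoder = nn.Sequential(
    nn.Conv2d(4, 8, kernel_size=3),
    nn.SiLU(),
    nn.Conv2d(8, 8, kernel_size=3),
)

# Freeze every parameter: autograd will not compute or store
# gradients for these weights during backward.
for p in encoder.parameters():
    p.requires_grad_(False)

out = encoder(torch.randn(1, 4, 16, 16))
# With frozen weights and a plain input, no autograd graph is
# recorded at all, so no activations are kept for backward.
assert out.requires_grad is False
```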
First, create a new conda environment:
conda env create -f environment.yaml
conda activate control
All models and detectors can be downloaded from our Hugging Face page. Make sure that SD models are put in "ControlNet/models" and detectors are put in "ControlNet/annotator/ckpts". Download all necessary pretrained weights and detector models from that Hugging Face page, including the HED edge-detection model, the Midas depth-estimation model, Openpose, and so on.
We provide 9 Gradio apps with these models.
All test images can be found at the folder "test_imgs".
Training a ControlNet is as easy as (or even easier than) training a simple pix2pix.
Special thanks to the great project: Mikubill's A1111 Webui Plugin!
We also thank Hysts for making the Hugging Face Space, as well as more than 65 models in that amazing Colab list!
Thanks to haofanwang for making ControlNet-for-Diffusers!
We also thank all the authors who made ControlNet demos, including but not limited to fffiloni, other-model, ThereforeGames, RamAnanth1, and many others!
Besides, you may also want to read these amazing related works:
Composer: Creative and Controllable Image Synthesis with Composable Conditions: A much bigger model to control diffusion!
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models: A much smaller model to control stable diffusion!
ControlLoRA: A Light Neural Network To Control Stable Diffusion Spatial Information: Implements ControlNet using LoRA!
And these amazing recent projects:
InstructPix2Pix: Learning to Follow Image Editing Instructions
Pix2pix-zero: Zero-shot Image-to-Image Translation
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
MaskSketch: Unpaired Structure-guided Masked Image Generation
SEGA: Instructing Diffusion using Semantic Dimensions
Universal Guidance for Diffusion Models
Region-Aware Diffusion for Zero-shot Text-driven Image Editing
Domain Expansion of Image Generators
Image Mixer
MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation
@misc{zhang2023adding,
title={Adding Conditional Control to Text-to-Image Diffusion Models},
author={Lvmin Zhang and Maneesh Agrawala},
year={2023},
eprint={2302.05543},
archivePrefix={arXiv},
primaryClass={cs.CV}
}