This is a community effort to replicate the RB-Modulation project.
- Link to the RB-Modulation paper
- Link to the original project description
- Link to Würstchen repository
RB-Modulation, introduced by Litu Rout et al., is a plug-and-play solution for (a) stylization with various prompts and (b) composition with reference content images, while maintaining sample diversity and prompt alignment. The official code has not yet been released, and this repository aims to collaboratively reproduce the project based on the available resources.
RB-Modulation utilizes the pre-trained weights from the Würstchen model. Two checkpoints are available:
| Download | Parameters | Conditioning | Training Steps | Resolution |
| --- | --- | --- | --- | --- |
| Hugging Face | 1B (Stage C) + 600M (Stage B) + 19M (Stage A) | CLIP-H-Text | 800,000 | 512x512 |
| Hugging Face | 1B (Stage C) + 600M (Stage B) + 19M (Stage A) | CLIP-bigG-Text | 918,000 | 1024x1024 |
RB-Modulation uses CSD (Contrastive Style Descriptor) as its style feature extractor. CSD is a high-performance model for representing style, and it outperforms other large-scale pre-trained models and prior style-retrieval methods on standard datasets. Its authors used CSD to examine the extent of style replication in the popular open-source text-to-image generative model Stable Diffusion, and to study the factors that impact the rate of style replication.
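In this setup, CSD maps an image to a style embedding, and style similarity between two images reduces to cosine similarity of their L2-normalized embeddings. Since the extractor is not yet wired up in this repository, here is a minimal NumPy sketch of that retrieval step, with random vectors standing in for real CSD features (the function names and the 768-dim size are illustrative assumptions, not the official API):

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two style descriptors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))


def rank_by_style(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Return gallery row indices sorted from most to least style-similar."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = g @ q  # cosine similarity per gallery row
    return np.argsort(-scores)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.normal(size=768)          # stand-in for a CSD embedding
    gallery = rng.normal(size=(5, 768))   # stand-in reference styles
    print(rank_by_style(query, gallery))
```

In a full replication, the random vectors would be replaced by embeddings from the actual CSD backbone; the ranking logic stays the same.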
We welcome contributions in the following areas:
- Code implementation
- Testing
- Documentation
- Code reviews
1. Fork the repository
2. Clone your forked repository
3. Create a new branch for your feature or bugfix
4. Commit your changes and push them to your fork
5. Create a pull request
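The steps above can be sketched as shell commands. Forking, cloning, and pushing need a real GitHub fork, so this sketch demonstrates the branch-and-commit part on a throwaway local repository (paths and branch names here are placeholders, not project conventions):

```shell
#!/bin/sh
set -e
# In practice you would first fork on GitHub and run:
#   git clone https://github.com/<your-username>/RB-Modulation.git
# Here we use a throwaway local repository instead.
repo=/tmp/rb-modulation-demo
rm -rf "$repo"
git init -q "$repo"
cd "$repo"
git config user.email "you@example.com"
git config user.name "Your Name"

echo "replication notes" > NOTES.md
git add NOTES.md
git commit -q -m "Initial commit"

# Create a new branch for your feature or bugfix:
git checkout -q -b feature/my-change

# ...edit files, then commit (empty commit used here as a stand-in)...
git commit -q --allow-empty -m "Work on the feature"
git log --oneline
```

After pushing such a branch to your fork (`git push origin feature/my-change`), open a pull request against this repository on GitHub.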