This repository provides a Colab notebook to produce images conditioned on text prompts with GLIDE [1].
- Run `text2im.ipynb` (tip: press Ctrl+F9 to run all cells).
Generation uses the small, filtered-data GLIDE model with classifier-free guidance.
Results consist of 64x64 images and their corresponding 256x256 upsampled versions.
Expected run-time: 2 min 30 s (one-time setup), 1 min (64x64 sampling), 30 s (256x256 upsampling).
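Classifier-free guidance works by running the diffusion model twice per denoising step, once with the text prompt and once without, and extrapolating away from the unconditional prediction. A minimal sketch of the combination rule (the function name, toy arrays, and scale value are illustrative, not taken from the notebook):

```python
import numpy as np

def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Combine conditional and unconditional noise predictions.

    Implements eps = eps_uncond + s * (eps_cond - eps_uncond):
    s = 0 gives the unconditional prediction, s = 1 the conditional one,
    and s > 1 amplifies the effect of the text conditioning.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy arrays standing in for the model's per-pixel noise predictions.
eps_cond = np.array([0.5, -0.2, 0.1])
eps_uncond = np.array([0.3, 0.0, 0.1])

guided = classifier_free_guidance(eps_cond, eps_uncond, guidance_scale=3.0)
print(guided)  # [ 0.9 -0.6  0.1]
```

This guided prediction replaces the model's raw output at each sampling step; larger scales trade sample diversity for closer adherence to the prompt.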
Below are several uncurated samples obtained with the same prompt: "a magnificent French rooster singing".
The small model has 300 million parameters, compared to the unreleased 3.5 billion parameter model.
As described in Appendix F.1, the training dataset was filtered so that it would not contain:
- images of humans and human-like objects,
- images of violent objects,
- two hate symbols prevalent in America (the swastika and the Confederate flag).
[1] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models. arXiv preprint arXiv:2112.10741, 2021.