A study on the application of using Hypernetwork and Low Rank Adaptation for text-to-image generation based on diffusion models

Artyom O. Levin¹, Yuri S. Belov²
Computer Science, Information Technology Department
Bauman Moscow State Technical University, Kaluga branch
Kaluga, Russian Federation

¹levinao@student.bmstu.ru, ²ysbelov@bmstu.ru

Abstract—Recent advances in the field of image generation have attracted attention due to the growing number of diverse data sources and test samples. A primary driver of this evolution is the application of neural networks, particularly for generating high-quality images from textual prompts. Despite the potential of diffusion models in this sector, they typically face computational challenges associated with vast datasets. This paper describes research on two existing solutions, Hypernetworks and Low-Rank Adaptation (LoRA), both aiming to streamline and optimize the image generation process. While hypernetworks dynamically adjust model parameters based on the input text, increasing flexibility and performance, LoRA efficiently adapts the primary model style without requiring it to be trained from scratch. Using the Stable Diffusion 1.5 model as a benchmark, this research evaluates the influence of hypernetwork and LoRA modifications. The results indicate that both approaches provide efficient and highly accurate image generation, confirming their efficacy in contemporary image generation tasks.

Keywords—image generation, text-to-image generation, diffusion models, hypernetwork, low-rank adaptation

I. INTRODUCTION

Recently, the field of image generation, a subfield of artificial intelligence and computer vision, has experienced significant growth due to the availability of a large number of valuable data sources and test examples. This growth in resources has not only helped developers gather valuable insights from ordinary users to analyse and improve their systems and technologies, but has also made the technology conveniently accessible to a wide audience.

This development is primarily related to the rapid progress of neural networks, which are extending their influence to various research areas. One of the important applications of neural networks is image generation, which consists of creating high-quality images from textual descriptions, utilizing neural networks and diffusion algorithms.

The sphere of image generation and diffusion models demonstrates considerable versatility, extending from the enhancement of low-light images, exemplified by the work of Ooi and Chan [1], to pioneering applications in medical imaging, as evidenced by the studies of Nguyen et al. [2] and Zhang [3]. Moreover, these models find application in urban scene editing, as elucidated by Park and Kang [4]. However, a notable ethical concern arises with the potential misuse of diffusion models in the creation of deepfake content, as exemplified by the investigation of Chen et al. [5]. In response to such concerns, Bammey introduces "Synthbuster," a significant advancement in the detection of diffusion model-generated images [6].

Diffusion models have emerged as powerful tools in the realm of image generation, leveraging latent diffusion techniques to transform textual descriptions into visually coherent images. Rauniyar et al. presented a Text to Image Generator with Latent Diffusion Models [7]. Their model utilizes diffusion processes to generate images from textual inputs, showing a novel approach to bridging the gap between text and visual representation. Building upon this, Kim and Kang proposed an enhancement to denoising models by introducing a Timestep-Aware Predictor for Latent Diffusion-Based Image Generation [8]. This innovation contributes to refining the performance of diffusion models, ensuring more accurate and high-quality image synthesis. In addition, Rombach et al. delved into high-resolution image synthesis with latent diffusion models [9]. Their work focuses on achieving impressive results in generating high-resolution images through the application of latent diffusion models. These advancements collectively underline the efficacy of latent diffusion models in the domain of image generation.

Among the cutting-edge technologies in this area, diffusion models hold promise; however, they face a major challenge when dealing with large datasets, specifically the issue of computational complexity.

To address this challenge, one potential solution is the utilization of Low-Rank Adaptation (LoRA), a technique that effectively reduces the number of trainable model parameters and significantly shortens the training time. This approach has the potential to enhance the efficiency of diffusion models in the context of image generation.

Low-rank adaptation plays a pivotal role in optimizing the efficiency of image generation models. Lv et al. introduced Dynamic Low-Rank Instance Adaptation [10], a method designed to adapt the model to instances dynamically, resulting in universal neural image compression. In a similar vein, Hu et al. proposed LoRA: Low-Rank Adaptation of Large Language Models [11], emphasizing the need for efficient adaptation of language models using low-rank techniques. These approaches underscore the importance of low-rank adaptation in streamlining large models, with potential applications in the field of image generation.

Another noteworthy approach that has gained popularity in recent research is the application of hypernetworks. Hypernetworks enable the generation of images based on textual descriptions, providing the possibility of dynamically adapting the model parameters depending on the input text, as mentioned by T. M. Dinh et al. in their paper [12].

In addition, the paper by Ruiz et al. introduced HyperDreamBooth [13], a model using hypernetworks for fast personalization of text-to-image models. Hypernetworks provide a dynamic framework for adapting models based on textual descriptions, offering a swift and personalized generation process. These studies collectively illuminate the significance of hypernetworks in pushing the boundaries of image synthesis with latent diffusion models, emphasizing adaptability and personalization and increasing the overall flexibility and performance of the image generation system, particularly when dealing with varied and complex textual descriptions.

Building on the information gained from these studies, the current research focuses on the application of hypernetworks and low-rank adaptation for text-to-image generation using diffusion models. In particular, we explore the integration of hypernetwork and low-rank adaptation models as complementary enhancements to a diffusion-based text-to-image generation model.

II. PROBLEM STATEMENT

The aim of this study is to investigate the impact of two different modifications of an image generation model driven by textual descriptions, namely the hypernetwork model and the low-rank adaptation model. To achieve this, it is necessary to explain the working principle of diffusion models and describe how these modifications affect the original model, demonstrating this through practical implementation by training the aforementioned models.

In general, diffusion models offer a way to create images based on textual descriptions by gradually evolving noise in the pixel space [14]. This evolution of noise passes through all spatial dimensions, ultimately forming an image endowed with certain characteristics. However, it is important to note that when working with extensive datasets, training diffusion models can be a time-intensive and resource-demanding endeavor.

Therefore, in order to introduce changes into the behavior of an already trained model, two approaches were chosen: a small Hypernetwork model and a Low-Rank Adaptation (LoRA) model. Neither of these methods requires retraining the original model from scratch, and both provide the opportunity to incorporate specific changes into the primary model [15]. This allows the primary model to be modified without affecting its checkpoint, significantly reducing the training time by working only with the small Hypernetwork or LoRA models.

The primary model chosen for this research is the Stable Diffusion 1.5 model, trained on the LAION Aesthetics v2 5+ dataset with various image sizes, but predominantly 512 x 512 pixels. Each image in the dataset is associated with a label reflecting its aesthetic evaluation.

The data is provided in JPEG format, and the research aims to demonstrate the influence of hypernetwork and low-rank adaptation models on the base model, highlighting their effectiveness in generating images from textual descriptions.

III. BACKGROUND

A. Diffusion Models

Diffusion models are probabilistic models used to explore the distribution of a given dataset. These models aim to remove noise from normally distributed variables, which reflects the reverse process of a fixed-length Markov chain [16]. In the field of image synthesis, diffusion models use a reweighted variational lower bound that mirrors denoising objectives. This complex process involves a sequence of denoising autoencoders, each equipped with identical weights [17]. Each autoencoder is trained to predict the original version of the input data, given a noised representation of the input at each step in the sequence.

Like other generative models, diffusion models have the ability to model conditional distributions [18]. This is achieved through the application of conditional denoising autoencoders, which control the synthesis process in alignment with the corresponding conditioning input. This input can encompass a wide variety of forms, ranging from text and semantic maps to diverse data types converted into images or text.

B. Hypernetwork Models

Hypernetworks are specialized lightweight tools used to fine-tune large models such as Stable Diffusion, allowing their style to be adjusted. They are typically small neural networks, often resembling simple linear networks with some additional elements such as dropout and activation functions.

The key area of hypernetwork integration in the Stable Diffusion model is the cross-attention module of the U-Net noise predictor. Here, hypernetworks play a crucial role by introducing two networks that transform the key and query vectors [19]. This alteration modifies the original model architecture.

During training, the Stable Diffusion model itself remains unchanged, while the attached hypernetwork adapts. Since hypernetworks are small and efficient, their training is fast and does not require extensive computing resources. This makes it feasible to train them on standard computers.

The main advantages of hypernetworks are their fast training process and the small model files they produce. LoRA models likewise fine-tune diffusion models, but they do it in a different way.

C. Low-Rank Adaptation Models

Low-Rank Adaptation is a specialized training technique designed for modifying diffusion models. It is crucial to emphasize that LoRA cannot function on its own; it requires a base model checkpoint file. LoRA's primary role is to make subtle style adjustments to the base model [20].

The focus of LoRA refinements is on the critical cross-attention layers of diffusion models. Researchers have established that fine-tuning these specific layers is sufficient to achieve training efficiency [10].

Cross-attention layers are essentially matrices composed of weight values arranged in rows and columns [21]. LoRA achieves fine-tuning by introducing its own weight parameters into these matrices.

A notable aspect of LoRA operation involves decomposing a weight-matrix update into two small low-rank matrices. This approach reduces the total number of stored numerical parameters in the model, enhancing its efficiency and reducing its resource requirements.

IV. EXPERIMENT

The model training process involves the following sequence of steps.

During the training process, images are encoded using an autoencoder, which transforms the images into latent representations. The autoencoder uses a relative downsampling factor f = 8, mapping images of shape Width x Height x 3 to latent images of shape Width/f x Height/f x 4.

Textual prompts are encoded by the ViT-L/14 text encoder.
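To make the shapes concrete, the following sketch encodes an image and a prompt in the way just described. It assumes the Hugging Face diffusers and transformers packages and the publicly available runwayml/stable-diffusion-v1-5 checkpoint; the paper does not specify its tooling, so this is an illustrative sketch rather than the authors' exact pipeline.

    import torch
    from diffusers import AutoencoderKL
    from transformers import CLIPTokenizer, CLIPTextModel

    vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
    tokenizer = CLIPTokenizer.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="text_encoder")

    image = torch.randn(1, 3, 512, 512)  # stand-in for a preprocessed 512 x 512 image (values in [-1, 1])
    with torch.no_grad():
        latents = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor
    print(latents.shape)  # torch.Size([1, 4, 64, 64]), i.e. Width/8 x Height/8 x 4

    tokens = tokenizer(["an orange cat"], padding="max_length",
                       max_length=tokenizer.model_max_length, return_tensors="pt")
    with torch.no_grad():
        text_embeddings = text_encoder(tokens.input_ids)[0]  # (1, 77, 768), fed to U-Net cross-attention

The printed latent shape shows the factor-8 spatial reduction described above, and the 77 x 768 text embedding is the conditioning tensor consumed by the cross-attention layers.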

The output of the text encoder is fed into the U-Net of the latent diffusion model via cross-attention mechanisms.

In both cases, it was decided to conduct additional model training based on the Stable Diffusion checkpoint 1.5-pruned-emaonly.ckpt. For that task, a dataset consisting of 15 images was created (Fig. 1).

Fig. 1. Dataset for the training process.

Prior to training, these images were preprocessed using the built-in Stable Diffusion preprocessing algorithm, resulting in 15 processed images with dimensions of 512 x 512 pixels.

A. Hypernetwork Model

1) Hypernetwork Formation

After creating the dataset, a hypernetwork based on Stable Diffusion 1.5 was formed, with a compact architecture optimized for training on a small dataset. Specifically, it consists of two fully connected layers with an intermediate size multiplier of 2. No dropout layers were used, as they were unnecessary in this context. A linear activation function was selected because hypernetworks generate parameters or weights for other models. They act as meta-networks that control or customize the primary model. The primary goal of hypernetworks is to generate parameters that perform optimally for a task while remaining independent of specific data or input examples.
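As an illustration of this architecture, the sketch below builds one such module in PyTorch: two fully connected layers with an intermediate size multiplier of 2, a linear (identity) activation, no dropout, and weights drawn from a zero-mean normal distribution as described in the weight-initialization step below. The residual form and the attachment to the cross-attention inputs follow the common Stable Diffusion hypernetwork implementation and are assumptions, not details stated in the paper.

    import torch
    import torch.nn as nn

    class HypernetworkModule(nn.Module):
        # Two fully connected layers (dim -> 2*dim -> dim), linear activation, no dropout.
        def __init__(self, dim: int, mult: int = 2, init_std: float = 0.01):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, dim * mult), nn.Linear(dim * mult, dim))
            for layer in self.net:
                nn.init.normal_(layer.weight, mean=0.0, std=init_std)  # zero-mean normal initialization
                nn.init.zeros_(layer.bias)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x + self.net(x)  # output is added to the original vector, so the base model stays intact

    # One module per transformed stream of a cross-attention block (the paper transforms keys and queries).
    dim = 768                          # context dimension of the SD 1.5 cross-attention layers
    hyper_k, hyper_q = HypernetworkModule(dim), HypernetworkModule(dim)
    context = torch.randn(1, 77, dim)  # text-encoder output entering cross-attention
    k_in, q_in = hyper_k(context), hyper_q(context)

Only the parameters of hyper_k and hyper_q would be trained; the surrounding Stable Diffusion weights remain frozen, which is what keeps the resulting files small.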
Using a linear activation function, such as an identity function, ensures a linear relationship between the input and output of the hypernetwork. This is crucial because linear functions possess the following properties:

• Linearity: Using a linear activation function allows the hypernetwork to interact linearly with the primary model parameters. This can be valuable, as linear combinations of parameters can represent specific structures or relationships within the model [22].

• Unconstrained Representation: Linear activation functions do not impose restrictions on the range of output values. This allows hypernetworks to freely generate parameters that can take positive and negative values without scaling limitations.

• Simplicity: Linear activation functions are straightforward and computationally efficient. They do not introduce nonlinearity into the hypernetwork, which can be useful if the primary model already contains a sufficient number of nonlinear layers or activation functions [23].

2) Weight Initialization

The normal weight initialization algorithm was used to initialize the layer weights. This method draws the initial weights at random from a normal distribution with zero mean and a specified standard deviation [24].

3) Training Parameters

With all necessary conditions for the hypernetwork training process in place, the following training parameters were selected and configured:

• The maximum number of training steps was set to 10,000, with intermediate results, including images and checkpoints, saved every 100 steps.

• The learning rate was made adaptive to prevent overfitting on the selected dataset. For the first 800 steps the learning rate was set to 2e-5, for the next 1,600 steps it was 8e-6, and for the remaining 7,600 steps a learning rate of 5e-6 was used, as experiments demonstrated that this schedule was the most efficient for training.

• The batch size was set to 1, and the gradient accumulation steps were also set to 1.

B. Low-Rank Adaptation Model

In order to train a low-rank adaptation model, it was necessary to create a dataset in the following format (Fig. 2):

Fig. 2. Example of the dataset format.

1) Dataset Formation

In order to train a small model using low-rank adaptation (LoRA), a dataset was created. This dataset consists of image-text description pairs. Given that LoRA does not require a large dataset (typically 12 to 24 images are sufficient), the same dataset used to train the hypernetwork was reused, but with an additional text-description file created for each image.

2) Model Configuration for LoRA

Similar to the previously trained hypernetwork, the LoRA model was initialized from the same Stable Diffusion checkpoint, 1.5-pruned-emaonly.ckpt. Several crucial parameters related to low-rank adaptation were considered (a minimal sketch of the resulting weight update follows the list):

• network_dim was set to 32, denoting the dimensionality (rank) of the low-rank matrices inserted into each adapted layer and thereby the number of trainable parameters LoRA adds.

• network_alpha was set to 1, determining the magnitude of the weight-coefficient changes during training. Higher values of alpha can lead to faster convergence but may increase the risk of overflow or divergence, whereas lower values of alpha can extend training time but provide more stable results [25].
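The sketch below shows how such a low-rank update is typically attached to a single cross-attention projection: the base weight is frozen, and a trainable pair of matrices of rank network_dim = 32, scaled by network_alpha / network_dim, is added on top. The specific layer shape and the wrapper class are illustrative assumptions; the paper does not list the individual layers that were adapted.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Wraps a frozen Linear layer with a trainable low-rank update: y = W x + (alpha / r) * B(A(x)).
        def __init__(self, base: nn.Linear, rank: int = 32, alpha: float = 1.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                               # the base checkpoint stays frozen
            self.down = nn.Linear(base.in_features, rank, bias=False)  # A: d_in -> r
            self.up = nn.Linear(rank, base.out_features, bias=False)   # B: r -> d_out
            nn.init.normal_(self.down.weight, std=1.0 / rank)
            nn.init.zeros_(self.up.weight)                            # the update starts at zero
            self.scale = alpha / rank                                 # network_alpha / network_dim

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scale * self.up(self.down(x))

    # Example: one cross-attention projection of SD 1.5 (768 -> 320) with network_dim=32, network_alpha=1.
    proj = LoRALinear(nn.Linear(768, 320), rank=32, alpha=1.0)
    out = proj(torch.randn(1, 77, 768))
    # The full 768 x 320 matrix holds 245,760 weights; the rank-32 pair holds 768*32 + 32*320 = 34,816.

The final comment illustrates the parameter reduction mentioned in Section III-C: only the two small matrices are stored and trained, roughly a sevenfold reduction for this particular layer shape.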

3) Optimizer

The AdamW8bit optimizer was used. This optimizer is a variation of the AdamW optimization algorithm and is designed to train models with low-precision numbers, e.g. 8-bit numbers [26]. This reduces the demands on computational resources and speeds up training.

4) Inner U-Net and Text Encoder Network Parameters

To ensure the correct operation of the obtained LoRA model, additional training of the inner U-Net and the text encoding network was conducted [27]. This significantly enhances the accuracy of the LoRA model. The following parameters were used (a minimal sketch of this setup follows the list):

• unet_learningRate was set to 1e-4.

• text_encoder_learningRate was set to 5e-5.

• The scheduler algorithm was set to "constant", because the main purpose of this model is to fine-tune weights for the main model.
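A minimal sketch of this optimizer and scheduler setup is shown below, assuming the bitsandbytes and diffusers packages; the parameter lists standing in for the LoRA weights of the U-Net and the text encoder are hypothetical placeholders, since the paper does not name its training code.

    import torch
    import bitsandbytes as bnb
    from diffusers.optimization import get_scheduler

    # Stand-ins for the trainable LoRA parameters attached to the U-Net and to the text encoder.
    unet_lora_params = [torch.nn.Parameter(torch.zeros(32, 768))]
    text_encoder_lora_params = [torch.nn.Parameter(torch.zeros(32, 768))]

    optimizer = bnb.optim.AdamW8bit([
        {"params": unet_lora_params, "lr": 1e-4},          # unet_learningRate
        {"params": text_encoder_lora_params, "lr": 5e-5},  # text_encoder_learningRate
    ])
    lr_scheduler = get_scheduler("constant", optimizer=optimizer,
                                 num_warmup_steps=0, num_training_steps=20 * 50)  # 20 epochs of 50 steps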
5) Training Parameters

Considering all the above settings, it was decided to train the LoRA model for 20 epochs, each consisting of 50 steps. The clip_skip parameter was set to 2, meaning that the conditioning is taken from the second-to-last layer of the CLIP text encoder rather than the last one; this influences the level of detail and stylization in the generated images [28] (a minimal sketch of this conditioning step is given at the end of this subsection).

The batch_size was set to 6, which means that at each training iteration the model processes six images simultaneously.
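As an illustration of the clip_skip = 2 setting, the sketch below takes the penultimate hidden state of the ViT-L/14 text encoder as the conditioning, assuming the transformers package; it mirrors the common Stable Diffusion implementation of clip_skip rather than code published by the authors.

    import torch
    from transformers import CLIPTextModel, CLIPTokenizer

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

    tokens = tokenizer(["an orange cat"], padding="max_length",
                       max_length=tokenizer.model_max_length, return_tensors="pt")
    with torch.no_grad():
        out = text_encoder(tokens.input_ids, output_hidden_states=True)

    clip_skip = 2                                      # use the second-to-last encoder layer
    conditioning = out.hidden_states[-clip_skip]
    conditioning = text_encoder.text_model.final_layer_norm(conditioning)  # shape (1, 77, 768)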
V. RESULTS

Once both small models were trained, we compared them and evaluated the training results.

A. Hypernetwork Model

Since the hypernetwork training algorithm is based on the attempt to recover the original images from generated noise, the images obtained during the training process should resemble, to some extent, the original images in the training dataset. As a result, the following outcomes were achieved (Fig. 3):

Fig. 3. A comparison of the image from the original dataset and the generated image at the 694th epoch of hypernetwork training.

As the comparative analysis shows, the hypernetwork is effectively trained and excels at generating high-quality images that closely resemble the original dataset. Notably, it accurately captures the distinctive features of the resulting images. Furthermore, the analysis revealed a clear relationship between losses and epochs (Fig. 4), with the lowest loss recorded at the 166th epoch and amounting to 0.0043615.

Fig. 4. Relation between losses and epochs for the hypernetwork training process.

B. Low-Rank Adaptation Model

Based on the concept of low-rank adaptation, once such a model is trained it can be freely applied in the image generation process by referencing it as a tag in angle brackets, e.g. <LoRA:orangeCat:1.0>. As a result, the base model understands that it needs to generate an image as close as possible to what was in the original LoRA dataset; in this case, an orange cat (Fig. 5). A programmatic usage sketch is given at the end of this subsection.

Fig. 5. A comparison of the image from the original dataset and the generated image at the 17th training epoch for the low-rank adaptation model.

Summarizing the results in Fig. 6, the implemented LoRA model was successfully trained and is capable of producing good and, most importantly, similar images, capturing the distinctive features of the dataset in the resulting images. Additionally, the relationship between losses and epochs was identified, with the minimum loss recorded at the 17th epoch and amounting to 0.13284, which is a good result for relatively fast training.

Fig. 6. Relation between losses and epochs for the Low-Rank Adaptation model training process.
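The tag syntax above is specific to the generation front end used by the authors. As an illustration only, the same kind of adapter could be applied programmatically with the diffusers library; the checkpoint identifier and the LoRA file name below are hypothetical.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights("./lora", weight_name="orangeCat.safetensors")  # hypothetical trained adapter

    # A cross_attention_kwargs scale of 1.0 mirrors the strength in the tag <LoRA:orangeCat:1.0>.
    image = pipe("an orange cat sitting on a windowsill",
                 cross_attention_kwargs={"scale": 1.0}).images[0]
    image.save("orange_cat.png")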

VI. CONCLUSION

In this paper, we researched two ways to improve image generation from textual descriptions: the hypernetwork model and the low-rank adaptation (LoRA) model. Both methods offer significant benefits when working together with image generation models. The hypernetwork approach keeps the core model stable during training, letting only the attached hypernetwork adapt. These hypernetworks are small and efficient, allowing for quick training without the need for high-performance computers. Additionally, they create smaller model files, making them practical and efficient. Similarly, LoRA models enhance base models, but they do it in a different way. They decompose weight updates into smaller low-rank matrices, reducing the overall number of stored parameters in the model. This optimization improves model efficiency and resource utilization. The results of these modifications are impressive. The hypernetwork model successfully generates images similar to the original dataset while preserving its essential features. Likewise, the LoRA model demonstrates its ability to produce comparable images while preserving their distinct characteristics. In summary, the use of hypernetworks and LoRA models in image generation systems significantly improves the process of generating images from textual descriptions. These modifications speed up training, reduce resource requirements, and result in high-quality images while preserving important features.

REFERENCES

[1] X. P. Ooi and C. Seng Chan, "LLDE: Enhancing Low-Light Images with Diffusion Model," in 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 2023, pp. 1305-1309, doi: 10.1109/ICIP49359.2023.10222446.
[2] L. X. Nguyen, P. Sone Aung, H. Q. Le, S.-B. Park and C. S. Hong, "A New Chapter for Medical Image Generation: The Stable Diffusion Method," in 2023 International Conference on Information Networking (ICOIN), Bangkok, Thailand, 2023, pp. 483-486, doi: 10.1109/ICOIN56518.2023.10049010.
[3] S. Zhang, "Dreambooth-based Image Generation Methods for Improving the Performance of CNN," in 2023 IEEE 3rd International Conference on Electronic Technology, Communication and Information (ICETCI), Changchun, China, 2023, pp. 1181-1184, doi: 10.1109/ICETCI57876.2023.10176568.
[4] M. Park and D.-o. Kang, "Urban Scene Editing with Diffusion Model using Segmentation Mask," in 23rd International Conference on Control, Automation and Systems (ICCAS), Yeosu, Korea, Republic of, 2023, pp. 1881-1884, doi: 10.23919/ICCAS59377.2023.10316952.
[5] Y. Chen, N. A. H. Haldar, N. Akhtar and A. Mian, "Text-image guided Diffusion Model for generating Deepfake celebrity interactions," in 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Port Macquarie, Australia, 2023, pp. 348-355, doi: 10.1109/DICTA60407.2023.00055.
[6] Q. Bammey, "Synthbuster: Towards Detection of Diffusion Model Generated Images," IEEE Open Journal of Signal Processing, vol. 5, pp. 1-9, 2024, doi: 10.1109/OJSP.2023.3337714.
[7] A. Rauniyar, A. Raj, A. Kumar, A. K. Kandu, A. Singh and A. Gupta, "Text to Image Generator with Latent Diffusion Models," in 2023 International Conference on Computational Intelligence, Communication Technology and Networking (CICTN), Ghaziabad, India, 2023, pp. 144-148, doi: 10.1109/CICTN57981.2023.10140348.
[8] J.-U. Kim and D.-J. Kang, "Enhancing Denoising Models Performance Through Timestep-Aware Predictor for Latent Diffusion-Based Image Generation," in 23rd International Conference on Control, Automation and Systems (ICCAS), Yeosu, Korea, Republic of, 2023, pp. 1937-1940, doi: 10.23919/ICCAS59377.2023.10316977.
[9] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with Latent Diffusion Models," in 2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 10674-10685, doi: 10.1109/cvpr52688.2022.01042.
[10] Y. Lv, J. Xiang, J. Zhang, W. Yang, X. Han, and W. Yang, "Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression," in Proceedings of the 31st ACM International Conference on Multimedia (MM '23), New York, NY, USA, 2023, pp. 632-642, doi: 10.1145/3581783.3612187.
[11] E. J. Hu et al., "LoRA: Low-Rank adaptation of Large Language Models," 2021, arXiv:2106.09685.
[12] T. M. Dinh, A. T. Tran, R. Nguyen, and B.-S. Hua, "HyperInverter: Improving stylegan inversion via Hypernetwork," in 2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 11379-11388, doi: 10.1109/cvpr52688.2022.01110.
[13] N. Ruiz, Y. Li, V. Jampani, W. Wei, T. Hou, Y. Pritch, N. Wadhwa, M. Rubinstein, and K. Aberman, "HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models," 2023, arXiv:2307.06949.
[14] S. Gu et al., "Vector quantized diffusion model for text-to-image synthesis," in 2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 10686-10696, doi: 10.1109/cvpr52688.2022.01043.
[15] S. Frolov, T. Hinz, F. Raue, J. Hees, and A. Dengel, "Adversarial Text-to-Image Synthesis: A Review," 2021, arXiv:2101.09983.
[16] C. Zhang and Y. Peng, "Stacking VAE and GAN for context-aware text-to-image generation," in 2018 IEEE Fourth Int. Conf. Multimedia Big Data (BigMM), 2018, pp. 1-5, doi: 10.1109/bigmm.2018.8499439.
[17] E. Jeon, K. Kim, and D. Kim, "FA-GAN: Feature-aware GAN for text to image synthesis," in 2021 IEEE Int. Conf. Image Process. (ICIP), 2021, pp. 2443-2447, doi: 10.1109/icip42928.2021.9506172.
[18] R. Yanagi, R. Togo, T. Ogawa, and M. Haseyama, "Scene retrieval using text-to-image GAN-based visual similarities and image-to-text model-based textual similarities," in 2019 IEEE 8th Global Conf. Consum. Electron. (GCCE), 2019, pp. 13-14, doi: 10.1109/gcce46687.2019.9015366.
[19] W. Liao, K. Hu, M. Y. Yang, and B. Rosenhahn, "Text to image generation with semantic-spatial aware GAN," in 2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 18166-18175, doi: 10.1109/cvpr52688.2022.01765.
[20] A. Levin and Y. Belov, "Application of a Low-rank adaptation model for text to image generation using diffusion models," E-Scio, vol. 6, no. 81, pp. 352-360, 2023.
[21] H. Zhang et al., "StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks," in 2017 IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 5908-5916, doi: 10.1109/iccv.2017.629.
[22] H. Dong, J. Zhang, D. McIlwraith, and Y. Guo, "I2T2I: Learning text to image synthesis with textual data augmentation," in 2017 IEEE Int. Conf. Image Process. (ICIP), 2017, pp. 2015-2019, doi: 10.1109/icip.2017.8296635.
[23] H. Zhang et al., "StackGAN++: Realistic image synthesis with stacked generative adversarial networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 8, pp. 1947-1962, 2019, doi: 10.1109/tpami.2018.2856256.
[24] T. D. Sokolov, N. A. Askerova, and A. A. Askerova, "Modeling pseudo-random sequences," Politechnical Student Journal, vol. 2, no. 67, p. 771, 2022, doi: 10.18698/2541-8009-2022-2-771.
[25] R. A. Serbiev and D. G. Berezan, "Evaluation of the quality of object recognition on thermal imaging images using neural networks," Politechnical Student Journal, vol. 4, no. 81, 2023, doi: 10.18698/2541-8009-2023-4-881.
[26] I. Rudakov, M. Filippov, and M. Kudryavtsev, "Image generation method using neural networks based on recoverable byte sequence," Bulletin of the Moscow State Technical University of Civil Aviation, Instrument Engineering Series, vol. 1, no. 142, pp. 83-97, 2023.
[27] K. Gavrilov and Y. Lavrenkov, "Investigation of the application of convolutional neural networks for image processing and object recognition," Electronic Journal: Science, Technology, and Education, vol. 2, no. 33, pp. 25-30, 2021.
[28] D. Petrin and Y. Belov, "Improving the quality of machine learning models in image classification tasks using feature extraction and fine-tuning approaches," Electronic Journal: Science, Technology, and Education, vol. 1, no. 28, pp. 104-111, 2020.
