A study on the application of using Hypernetwork and Low Rank Adaptation for text-to-image generation based on diffusion models

Artyom O. Levin¹, Yuri S. Belov²
Computer Science, Information Technology Department
Bauman Moscow State Technical University, Kaluga branch
Kaluga, Russian Federation

¹levinao@student.bmstu.ru, ²ysbelov@bmstu.ru

Abstract—Recent advances in the field of image generation have attracted attention due to the growing number of diverse data sources and test samples. A primary driver of this evolution is the application of neural networks, particularly for generating high-quality images from textual prompts. Despite the potential of diffusion models in this sector, they typically face computational challenges associated with vast datasets. This paper describes research on two existing solutions, Hypernetworks and Low-Rank Adaptation (LoRA), both aiming to streamline and optimize the image generation process. While hypernetworks dynamically adjust model parameters based on the input text, increasing flexibility and performance, LoRA efficiently adapts the primary model style without requiring it to be trained from scratch. Using the Stable Diffusion 1.5 model as a benchmark, this research evaluates the influence of hypernetwork and LoRA modifications. The results indicate that both approaches provide efficient and highly accurate image generation, confirming their efficacy in contemporary image generation tasks.

Keywords—image generation, text-to-image generation, diffusion models, hypernetwork, low-rank adaptation

I. INTRODUCTION

Recently, the field of image generation, a subfield of artificial intelligence and computer vision, has experienced significant growth due to the availability of a large number of valuable data sources and test examples. This growth in resources has not only helped developers gather valuable insights from ordinary users to analyse and improve their systems and technologies, but has also made the technology conveniently accessible to a wide audience.

This development is primarily related to the rapid progress of neural networks, which are extending their influence to various research areas. One of the important applications of neural networks is image generation, which consists of creating high-quality images from textual descriptions, utilizing neural networks and diffusion algorithms.

The sphere of image generation and diffusion models demonstrates considerable versatility, extending from the enhancement of low-light images, exemplified by the work of Ooi and Chan [1], to pioneering applications in medical imaging, as evidenced by the studies of Nguyen et al. [2] and Zhang [3]. Moreover, these models find application in urban scene editing, as elucidated by Park and Kang [4]. However, a notable ethical concern arises with the potential misuse of diffusion models in the creation of deepfake content, as exemplified by the investigation of Chen et al. [5]. In response to such concerns, Bammey introduces "Synthbuster," a significant advancement in the detection of diffusion model-generated images [6].

Diffusion models have emerged as powerful tools in the realm of image generation, leveraging latent diffusion techniques to transform textual descriptions into visually coherent images. Rauniyar et al. presented a Text to Image Generator with Latent Diffusion Models [7]. Their model utilizes diffusion processes to generate images from textual inputs, showing a novel approach to bridging the gap between text and visual representation. Building upon this, Kim and Kang proposed an enhancement to denoising models by introducing a Timestep-Aware Predictor for Latent Diffusion-Based Image Generation [8]. This innovation contributes to refining the performance of diffusion models, ensuring more accurate and high-quality image synthesis. In addition, Rombach et al. delved into high-resolution image synthesis with latent diffusion models [9]. Their work focuses on achieving impressive results in generating high-resolution images through the application of latent diffusion models. These advancements collectively underline the efficacy of latent diffusion models in the domain of image generation.

Among the cutting-edge technologies in this area, diffusion models hold promise; however, they face a major challenge when dealing with large datasets, specifically the issue of computational complexity.

To address this challenge, one potential solution is the utilization of Low-Rank Adaptation (LoRA), a technique that effectively reduces the number of trainable model parameters and significantly shortens the training time. This approach has the potential to enhance the efficiency of diffusion models in the context of image generation.

Low-rank adaptation plays a pivotal role in optimizing the efficiency of image generation models. Lv et al. introduced Dynamic Low-Rank Instance Adaptation [10], a method designed to adapt the model to instances dynamically, resulting in universal neural image compression. In a similar vein, Hu et al. proposed LoRA: Low-Rank Adaptation of Large Language Models [11], emphasizing the need for efficient adaptation of language models using low-rank techniques. These approaches underscore the importance of low-rank adaptation in streamlining large models, with potential applications in the field of image generation.

Another noteworthy approach that has gained popularity in recent research is the application of hypernetworks. Hypernetworks enable the generation of images based on textual descriptions, providing the possibility of dynamically adapting the model parameters depending on the input text, as mentioned by T. M. Dinh et al. in their paper [12].

In addition, the paper by Ruiz et al. introduced HyperDreamBooth [13], a model using hypernetworks for fast personalization of text-to-image models. Hypernetworks provide a dynamic framework for adapting models based on textual descriptions, offering a swift and personalized generation process. These studies collectively illuminate the significance of hypernetworks in pushing the boundaries of image synthesis with latent diffusion models, emphasizing adaptability and personalization and increasing the overall flexibility and performance of the image generation system, particularly when dealing with varied and complex textual descriptions.

Building on the information gained from these studies, the current research focuses on the application of hypernetworks and low-rank adaptation for text-to-image generation using diffusion models. In particular, we explore the integration of hypernetwork and low-rank adaptation models as complementary enhancements to a diffusion-based text-to-image generation model.

II. PROBLEM STATEMENT

The aim of this study is to investigate the impact of two different modifications of an image generation model driven by textual descriptions, namely the hypernetwork model and the low-rank adaptation model. To achieve this, it is necessary to explain the working principle of diffusion models and describe how these modifications affect the original model, demonstrating this through practical implementation by training the aforementioned models.

In general, diffusion models offer a way to create images based on textual descriptions by gradually evolving noise in the pixel space [14]. This evolution of noise passes through all spatial dimensions, ultimately forming an image endowed with certain characteristics. However, it is important to note that when working with extensive datasets, training diffusion models can be a time-intensive and resource-demanding endeavor.

Therefore, in order to introduce changes into the behavior of an already trained model, two approaches were chosen: a small Hypernetwork model and a Low-Rank Adaptation (LoRA) model. Neither of these methods requires retraining the original model from scratch, and both provide the opportunity to incorporate specific changes into the primary model [15]. This allows the primary model to be modified without affecting its checkpoint, significantly reducing the training time by working only with the small Hypernetwork or LoRA models.

The primary model chosen for this research is the Stable Diffusion 1.5 model, trained on the LAION Aesthetics v2 5+ dataset with various image sizes, but predominantly 512 x 512 pixels. Each image in the dataset is associated with a label reflecting its aesthetic evaluation.

The data is provided in JPEG format, and the research aims to demonstrate the influence of hypernetwork and low-rank adaptation models on the base model, highlighting their effectiveness in generating images from textual descriptions.

III. BACKGROUND

A. Diffusion Models

Diffusion models are probabilistic models used to explore the distribution of a given dataset. These models aim to remove noise from normally distributed variables, which reflects the reverse process of a fixed-length Markov chain [16]. In the field of image synthesis, diffusion models use a reweighted variational lower bound that mirrors denoising objectives. This complex process involves a sequence of denoising autoencoders, each equipped with identical weights [17]. Each autoencoder is trained to predict the original version of the input data, given a noised representation of the input at each step in the sequence.

Like other generative models, diffusion models have the ability to model conditional distributions [18]. This is achieved through the application of conditional denoising autoencoders, which control the synthesis process in alignment with the corresponding conditioning input. This input can encompass a wide variety of forms, ranging from text and semantic maps to diverse data types converted into images or text.

B. Hypernetwork Models

Hypernetworks are specialized lightweight tools used to fine-tune large models such as Stable Diffusion, allowing their style to be adjusted. They are typically small neural networks, often resembling simple linear networks with some additional elements such as dropout and activation functions.

The key area of hypernetwork integration in the Stable Diffusion model is the cross-attention module of the U-Net noise predictor. Here, hypernetworks play a crucial role by introducing two networks that transform the key and query vectors [19]. This alteration modifies the original model architecture.

During training, the Stable Diffusion model itself remains unchanged, while the attached hypernetwork adapts. Since hypernetworks are small and efficient, their training is fast and does not require extensive computing resources. This makes it feasible to train them on standard computers.

The main advantages of hypernetworks are their fast training process and the small model files they produce. LoRA models likewise fine-tune diffusion models, but they do it in a different way.

C. Low-Rank Adaptation Models

Low-Rank Adaptation is a specialized training technique designed for modifying diffusion models. It is crucial to emphasize that LoRA cannot function on its own; it requires a base model checkpoint file. LoRA's primary role is to make subtle style adjustments to the base model [20].

The focus of LoRA refinements is on the critical cross-attention layers of diffusion models. Researchers have established that fine-tuning these specific layers is sufficient to achieve training efficiency [10].

Cross-attention layers are essentially matrices composed of weight values arranged in rows and columns [21]. LoRA achieves fine-tuning by introducing its own weight parameters into these matrices.

A notable aspect of LoRA operation involves decomposing a weight-matrix update into two small low-rank matrices. This approach reduces the total number of stored numerical parameters in the model, enhancing its efficiency and reducing its resource requirements.

IV. EXPERIMENT

The model training process involves the following sequence of steps.

During the training process, images are encoded using an autoencoder, which transforms the images into latent representations. The autoencoder uses a relative downsampling factor f = 8, mapping images of shape Width x Height x 3 to latent images of shape Width/f x Height/f x 4.

Textual prompts are encoded by the ViT-L/14 text encoder.
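To make the shapes concrete, the following sketch encodes an image and a prompt in the way just described. It assumes the Hugging Face diffusers and transformers packages and the publicly available runwayml/stable-diffusion-v1-5 checkpoint; the paper does not specify its tooling, so this is an illustrative sketch rather than the authors' exact pipeline.

    import torch
    from diffusers import AutoencoderKL
    from transformers import CLIPTokenizer, CLIPTextModel

    vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
    tokenizer = CLIPTokenizer.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="text_encoder")

    image = torch.randn(1, 3, 512, 512)  # stand-in for a preprocessed 512 x 512 image (values in [-1, 1])
    with torch.no_grad():
        latents = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor
    print(latents.shape)  # torch.Size([1, 4, 64, 64]), i.e. Width/8 x Height/8 x 4

    tokens = tokenizer(["an orange cat"], padding="max_length",
                       max_length=tokenizer.model_max_length, return_tensors="pt")
    with torch.no_grad():
        text_embeddings = text_encoder(tokens.input_ids)[0]  # (1, 77, 768), fed to U-Net cross-attention

The printed latent shape shows the factor-8 spatial reduction described above, and the 77 x 768 text embedding is the conditioning tensor consumed by the cross-attention layers.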

The output of the text encoder is fed into the U-Net of the latent diffusion model via cross-attention mechanisms.

In both cases, it was decided to conduct additional model training based on the Stable Diffusion checkpoint 1.5-pruned-emaonly.ckpt. For that task, a dataset consisting of 15 images was created (Fig. 1).

Fig. 1. Dataset for the training process.

Prior to training, these images were preprocessed using the built-in Stable Diffusion preprocessing algorithm, resulting in 15 processed images with dimensions of 512 x 512 pixels.

A. Hypernetwork Model

1) Hypernetwork Formation

After creating the dataset, a hypernetwork based on Stable Diffusion 1.5 was formed, with a compact architecture optimized for training on a small dataset. Specifically, it consists of two fully connected layers with an intermediate size multiplier of 2. No dropout layers were used, as they were unnecessary in this context. A linear activation function was selected because hypernetworks generate parameters or weights for other models. They act as meta-networks that control or customize the primary model. The primary goal of hypernetworks is to generate parameters that perform optimally for a task while remaining independent of specific data or input examples.
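As an illustration of this architecture, the sketch below builds one such module in PyTorch: two fully connected layers with an intermediate size multiplier of 2, a linear (identity) activation, no dropout, and weights drawn from a zero-mean normal distribution as described in the weight-initialization step below. The residual form and the attachment to the cross-attention inputs follow the common Stable Diffusion hypernetwork implementation and are assumptions, not details stated in the paper.

    import torch
    import torch.nn as nn

    class HypernetworkModule(nn.Module):
        # Two fully connected layers (dim -> 2*dim -> dim), linear activation, no dropout.
        def __init__(self, dim: int, mult: int = 2, init_std: float = 0.01):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, dim * mult), nn.Linear(dim * mult, dim))
            for layer in self.net:
                nn.init.normal_(layer.weight, mean=0.0, std=init_std)  # zero-mean normal initialization
                nn.init.zeros_(layer.bias)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x + self.net(x)  # output is added to the original vector, so the base model stays intact

    # One module per transformed stream of a cross-attention block (the paper transforms keys and queries).
    dim = 768                          # context dimension of the SD 1.5 cross-attention layers
    hyper_k, hyper_q = HypernetworkModule(dim), HypernetworkModule(dim)
    context = torch.randn(1, 77, dim)  # text-encoder output entering cross-attention
    k_in, q_in = hyper_k(context), hyper_q(context)

Only the parameters of hyper_k and hyper_q would be trained; the surrounding Stable Diffusion weights remain frozen, which is what keeps the resulting files small.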
Using a linear activation function, such as an identity function, ensures a linear relationship between the input and output of the hypernetwork. This is crucial because linear functions possess the following properties:

• Linearity: Using a linear activation function allows the hypernetwork to interact linearly with the primary model parameters. This can be valuable, as linear combinations of parameters can represent specific structures or relationships within the model [22].

• Unconstrained Representation: Linear activation functions do not impose restrictions on the range of output values. This allows hypernetworks to freely generate parameters that can take positive and negative values without scaling limitations.

• Simplicity: Linear activation functions are straightforward and computationally efficient. They do not introduce nonlinearity into the hypernetwork, which can be useful if the primary model already contains a sufficient number of nonlinear layers or activation functions [23].

2) Weight Initialization

The normal weight initialization algorithm was used to initialize the layer weights. This method draws the initial weights at random from a normal distribution with zero mean and a specified standard deviation [24].

3) Training Parameters

With all necessary conditions for the hypernetwork training process in place, the following training parameters were selected and configured:

• The maximum number of training steps was set to 10,000, with intermediate results, including images and checkpoints, saved every 100 steps.

• The learning rate was made adaptive to prevent overfitting on the selected dataset. For the first 800 steps the learning rate was set to 2e-5, for the next 1,600 steps it was 8e-6, and for the remaining 7,600 steps a learning rate of 5e-6 was used, as experiments demonstrated that this schedule was the most efficient for training.

• The batch size was set to 1, and the gradient accumulation steps were also set to 1.

B. Low-Rank Adaptation Model

In order to train a low-rank adaptation model, it was necessary to create a dataset in the following format (Fig. 2):

Fig. 2. Example of the dataset format.

1) Dataset Formation

In order to train a small model using low-rank adaptation (LoRA), a dataset was created. This dataset consists of image-text description pairs. Given that LoRA does not require a large dataset (typically 12 to 24 images are sufficient), the same dataset used to train the hypernetwork was reused, but with an additional text-description file created for each image.

2) Model Configuration for LoRA

Similar to the previously trained hypernetwork, the LoRA model was initialized from the same Stable Diffusion checkpoint, 1.5-pruned-emaonly.ckpt. Several crucial parameters related to low-rank adaptation were considered (a minimal sketch of the resulting weight update follows the list):

• network_dim was set to 32, denoting the dimensionality (rank) of the low-rank matrices inserted into each adapted layer and thereby the number of trainable parameters LoRA adds.

• network_alpha was set to 1, determining the magnitude of the weight-coefficient changes during training. Higher values of alpha can lead to faster convergence but may increase the risk of overflow or divergence, whereas lower values of alpha can extend training time but provide more stable results [25].
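The sketch below shows how such a low-rank update is typically attached to a single cross-attention projection: the base weight is frozen, and a trainable pair of matrices of rank network_dim = 32, scaled by network_alpha / network_dim, is added on top. The specific layer shape and the wrapper class are illustrative assumptions; the paper does not list the individual layers that were adapted.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Wraps a frozen Linear layer with a trainable low-rank update: y = W x + (alpha / r) * B(A(x)).
        def __init__(self, base: nn.Linear, rank: int = 32, alpha: float = 1.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                               # the base checkpoint stays frozen
            self.down = nn.Linear(base.in_features, rank, bias=False)  # A: d_in -> r
            self.up = nn.Linear(rank, base.out_features, bias=False)   # B: r -> d_out
            nn.init.normal_(self.down.weight, std=1.0 / rank)
            nn.init.zeros_(self.up.weight)                            # the update starts at zero
            self.scale = alpha / rank                                 # network_alpha / network_dim

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scale * self.up(self.down(x))

    # Example: one cross-attention projection of SD 1.5 (768 -> 320) with network_dim=32, network_alpha=1.
    proj = LoRALinear(nn.Linear(768, 320), rank=32, alpha=1.0)
    out = proj(torch.randn(1, 77, 768))
    # The full 768 x 320 matrix holds 245,760 weights; the rank-32 pair holds 768*32 + 32*320 = 34,816.

The final comment illustrates the parameter reduction mentioned in Section III-C: only the two small matrices are stored and trained, roughly a sevenfold reduction for this particular layer shape.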

3) Optimizer

The AdamW8bit optimizer was used. This optimizer is a variation of the AdamW optimization algorithm and is designed to train models with low-precision numbers, e.g. 8-bit numbers [26]. This reduces the demands on computational resources and speeds up training.

4) Inner U-Net and Text Encoder Network Parameters

To ensure the correct operation of the obtained LoRA model, additional training of the inner U-Net and the text encoding network was conducted [27]. This significantly enhances the accuracy of the LoRA model. The following parameters were used (a minimal sketch of this setup follows the list):

• unet_learningRate was set to 1e-4.

• text_encoder_learningRate was set to 5e-5.

• The scheduler algorithm was set to "constant", because the main purpose of this model is to fine-tune weights for the main model.
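A minimal sketch of this optimizer and scheduler setup is shown below, assuming the bitsandbytes and diffusers packages; the parameter lists standing in for the LoRA weights of the U-Net and the text encoder are hypothetical placeholders, since the paper does not name its training code.

    import torch
    import bitsandbytes as bnb
    from diffusers.optimization import get_scheduler

    # Stand-ins for the trainable LoRA parameters attached to the U-Net and to the text encoder.
    unet_lora_params = [torch.nn.Parameter(torch.zeros(32, 768))]
    text_encoder_lora_params = [torch.nn.Parameter(torch.zeros(32, 768))]

    optimizer = bnb.optim.AdamW8bit([
        {"params": unet_lora_params, "lr": 1e-4},          # unet_learningRate
        {"params": text_encoder_lora_params, "lr": 5e-5},  # text_encoder_learningRate
    ])
    lr_scheduler = get_scheduler("constant", optimizer=optimizer,
                                 num_warmup_steps=0, num_training_steps=20 * 50)  # 20 epochs of 50 steps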
5) Training Parameters

Considering all the above settings, it was decided to train the LoRA model for 20 epochs, each consisting of 50 steps. The clip_skip parameter was set to 2, meaning that the conditioning is taken from the second-to-last layer of the CLIP text encoder rather than the last one; this influences the level of detail and stylization in the generated images [28] (a minimal sketch of this conditioning step is given at the end of this subsection).

The batch_size was set to 6, which means that at each training iteration the model processes six images simultaneously.
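As an illustration of the clip_skip = 2 setting, the sketch below takes the penultimate hidden state of the ViT-L/14 text encoder as the conditioning, assuming the transformers package; it mirrors the common Stable Diffusion implementation of clip_skip rather than code published by the authors.

    import torch
    from transformers import CLIPTextModel, CLIPTokenizer

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

    tokens = tokenizer(["an orange cat"], padding="max_length",
                       max_length=tokenizer.model_max_length, return_tensors="pt")
    with torch.no_grad():
        out = text_encoder(tokens.input_ids, output_hidden_states=True)

    clip_skip = 2                                      # use the second-to-last encoder layer
    conditioning = out.hidden_states[-clip_skip]
    conditioning = text_encoder.text_model.final_layer_norm(conditioning)  # shape (1, 77, 768)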
V. RESULTS

Once both small models were trained, we compared them and evaluated the training results.

A. Hypernetwork Model

Since the hypernetwork training algorithm is based on the attempt to recover the original images from generated noise, the images obtained during the training process should resemble, to some extent, the original images in the training dataset. As a result, the following outcomes were achieved (Fig. 3):

Fig. 3. A comparison of the image from the original dataset and the generated image at the 694th epoch of hypernetwork training.

As the comparative analysis shows, the hypernetwork is effectively trained and excels at generating high-quality images that closely resemble the original dataset. Notably, it accurately captures the distinctive features of the resulting images. Furthermore, the analysis revealed a clear relationship between losses and epochs (Fig. 4), with the lowest loss recorded at the 166th epoch and amounting to 0.0043615.

Fig. 4. Relation between losses and epochs for the hypernetwork training process.

B. Low-Rank Adaptation Model

Based on the concept of low-rank adaptation, once such a model is trained it can be freely applied in the image generation process by referencing it as a tag in angle brackets, e.g. <LoRA:orangeCat:1.0>. As a result, the base model understands that it needs to generate an image as close as possible to what was in the original LoRA dataset; in this case, an orange cat (Fig. 5). A programmatic usage sketch is given at the end of this subsection.

Fig. 5. A comparison of the image from the original dataset and the generated image at the 17th training epoch for the low-rank adaptation model.

Summarizing the results in Fig. 6, the implemented LoRA model was successfully trained and is capable of producing good and, most importantly, similar images, capturing the distinctive features of the dataset in the resulting images. Additionally, the relationship between losses and epochs was identified, with the minimum loss recorded at the 17th epoch and amounting to 0.13284, which is a good result for relatively fast training.

Fig. 6. Relation between losses and epochs for the Low-Rank Adaptation model training process.
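The tag syntax above is specific to the generation front end used by the authors. As an illustration only, the same kind of adapter could be applied programmatically with the diffusers library; the checkpoint identifier and the LoRA file name below are hypothetical.

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights("./lora", weight_name="orangeCat.safetensors")  # hypothetical trained adapter

    # A cross_attention_kwargs scale of 1.0 mirrors the strength in the tag <LoRA:orangeCat:1.0>.
    image = pipe("an orange cat sitting on a windowsill",
                 cross_attention_kwargs={"scale": 1.0}).images[0]
    image.save("orange_cat.png")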

VI. CONCLUSION

In this paper, we researched two ways to improve image generation from textual descriptions: the hypernetwork model and the low-rank adaptation (LoRA) model. Both methods offer significant benefits when working together with image generation models. The hypernetwork approach keeps the core model stable during training, letting only the attached hypernetwork adapt. These hypernetworks are small and efficient, allowing for quick training without the need for high-performance computers. Additionally, they create smaller model files, making them practical and efficient. Similarly, LoRA models enhance base models, but they do it in a different way. They decompose weight updates into smaller low-rank matrices, reducing the overall number of stored parameters in the model. This optimization improves model efficiency and resource utilization. The results of these modifications are impressive. The hypernetwork model successfully generates images similar to the original dataset while preserving its essential features. Likewise, the LoRA model demonstrates its ability to produce comparable images while preserving their distinct characteristics. In summary, the use of hypernetworks and LoRA models in image generation systems significantly improves the process of generating images from textual descriptions. These modifications speed up training, reduce resource requirements, and result in high-quality images while preserving important features.

REFERENCES

[1] X. P. Ooi and C. Seng Chan, "LLDE: Enhancing Low-Light Images with Diffusion Model," in 2023 IEEE International Conference on Image Processing (ICIP), Kuala Lumpur, Malaysia, 2023, pp. 1305-1309, doi: 10.1109/ICIP49359.2023.10222446.
[2] L. X. Nguyen, P. Sone Aung, H. Q. Le, S.-B. Park and C. S. Hong, "A New Chapter for Medical Image Generation: The Stable Diffusion Method," in 2023 International Conference on Information Networking (ICOIN), Bangkok, Thailand, 2023, pp. 483-486, doi: 10.1109/ICOIN56518.2023.10049010.
[3] S. Zhang, "Dreambooth-based Image Generation Methods for Improving the Performance of CNN," in 2023 IEEE 3rd International Conference on Electronic Technology, Communication and Information (ICETCI), Changchun, China, 2023, pp. 1181-1184, doi: 10.1109/ICETCI57876.2023.10176568.
[4] M. Park and D.-o. Kang, "Urban Scene Editing with Diffusion Model using Segmentation Mask," in 23rd International Conference on Control, Automation and Systems (ICCAS), Yeosu, Korea, Republic of, 2023, pp. 1881-1884, doi: 10.23919/ICCAS59377.2023.10316952.
[5] Y. Chen, N. A. H. Haldar, N. Akhtar and A. Mian, "Text-image guided Diffusion Model for generating Deepfake celebrity interactions," in 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Port Macquarie, Australia, 2023, pp. 348-355, doi: 10.1109/DICTA60407.2023.00055.
[6] Q. Bammey, "Synthbuster: Towards Detection of Diffusion Model Generated Images," IEEE Open Journal of Signal Processing, vol. 5, pp. 1-9, 2024, doi: 10.1109/OJSP.2023.3337714.
[7] A. Rauniyar, A. Raj, A. Kumar, A. K. Kandu, A. Singh and A. Gupta, "Text to Image Generator with Latent Diffusion Models," in 2023 International Conference on Computational Intelligence, Communication Technology and Networking (CICTN), Ghaziabad, India, 2023, pp. 144-148, doi: 10.1109/CICTN57981.2023.10140348.
[8] J.-U. Kim and D.-J. Kang, "Enhancing Denoising Models Performance Through Timestep-Aware Predictor for Latent Diffusion-Based Image Generation," in 23rd International Conference on Control, Automation and Systems (ICCAS), Yeosu, Korea, Republic of, 2023, pp. 1937-1940, doi: 10.23919/ICCAS59377.2023.10316977.
[9] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with Latent Diffusion Models," in 2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 10674-10685, doi: 10.1109/cvpr52688.2022.01042.
[10] Y. Lv, J. Xiang, J. Zhang, W. Yang, X. Han, and W. Yang, "Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression," in Proceedings of the 31st ACM International Conference on Multimedia (MM '23), New York, NY, USA, 2023, pp. 632-642, doi: 10.1145/3581783.3612187.
[11] E. J. Hu et al., "LoRA: Low-Rank adaptation of Large Language Models," 2021, arXiv:2106.09685.
[12] T. M. Dinh, A. T. Tran, R. Nguyen, and B.-S. Hua, "HyperInverter: Improving stylegan inversion via Hypernetwork," in 2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 11379-11388, doi: 10.1109/cvpr52688.2022.01110.
[13] N. Ruiz, Y. Li, V. Jampani, W. Wei, T. Hou, Y. Pritch, N. Wadhwa, M. Rubinstein, and K. Aberman, "HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models," 2023, arXiv:2307.06949.
[14] S. Gu et al., "Vector quantized diffusion model for text-to-image synthesis," in 2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 10686-10696, doi: 10.1109/cvpr52688.2022.01043.
[15] S. Frolov, T. Hinz, F. Raue, J. Hees, and A. Dengel, "Adversarial Text-to-Image Synthesis: A Review," 2021, arXiv:2101.09983.
[16] C. Zhang and Y. Peng, "Stacking VAE and GAN for context-aware text-to-image generation," in 2018 IEEE Fourth Int. Conf. Multimedia Big Data (BigMM), 2018, pp. 1-5, doi: 10.1109/bigmm.2018.8499439.
[17] E. Jeon, K. Kim, and D. Kim, "FA-GAN: Feature-aware GAN for text to image synthesis," in 2021 IEEE Int. Conf. Image Process. (ICIP), 2021, pp. 2443-2447, doi: 10.1109/icip42928.2021.9506172.
[18] R. Yanagi, R. Togo, T. Ogawa, and M. Haseyama, "Scene retrieval using text-to-image GAN-based visual similarities and image-to-text model-based textual similarities," in 2019 IEEE 8th Global Conf. Consum. Electron. (GCCE), 2019, pp. 13-14, doi: 10.1109/gcce46687.2019.9015366.
[19] W. Liao, K. Hu, M. Y. Yang, and B. Rosenhahn, "Text to image generation with semantic-spatial aware GAN," in 2022 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 18166-18175, doi: 10.1109/cvpr52688.2022.01765.
[20] A. Levin and Y. Belov, "Application of a Low-rank adaptation model for text to image generation using diffusion models," E-Scio, vol. 6, no. 81, pp. 352-360, 2023.
[21] H. Zhang et al., "StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks," in 2017 IEEE Int. Conf. Comput. Vis. (ICCV), 2017, pp. 5908-5916, doi: 10.1109/iccv.2017.629.
[22] H. Dong, J. Zhang, D. McIlwraith, and Y. Guo, "I2T2I: Learning text to image synthesis with textual data augmentation," in 2017 IEEE Int. Conf. Image Process. (ICIP), 2017, pp. 2015-2019, doi: 10.1109/icip.2017.8296635.
[23] H. Zhang et al., "StackGAN++: Realistic image synthesis with stacked generative adversarial networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 8, pp. 1947-1962, 2019, doi: 10.1109/tpami.2018.2856256.
[24] T. D. Sokolov, N. A. Askerova, and A. A. Askerova, "Modeling pseudo-random sequences," Politechnical Student Journal, vol. 2, no. 67, p. 771, 2022, doi: 10.18698/2541-8009-2022-2-771.
[25] R. A. Serbiev and D. G. Berezan, "Evaluation of the quality of object recognition on thermal imaging images using neural networks," Politechnical Student Journal, vol. 4, no. 81, 2023, doi: 10.18698/2541-8009-2023-4-881.
[26] I. Rudakov, M. Filippov, and M. Kudryavtsev, "Image generation method using neural networks based on recoverable byte sequence," Bulletin of the Moscow State Technical University of Civil Aviation, Instrument Engineering Series, vol. 1, no. 142, pp. 83-97, 2023.
[27] K. Gavrilov and Y. Lavrenkov, "Investigation of the application of convolutional neural networks for image processing and object recognition," Electronic Journal: Science, Technology, and Education, vol. 2, no. 33, pp. 25-30, 2021.
[28] D. Petrin and Y. Belov, "Improving the quality of machine learning models in image classification tasks using feature extraction and fine-tuning approaches," Electronic Journal: Science, Technology, and Education, vol. 1, no. 28, pp. 104-111, 2020.
