
International Journal of Information and Electronics Engineering, Vol. 15, No. 6, JUN 2025
DOI: 10.48047/ijiee.2025.15.6.16

An Efficient Deep Learning Framework for Fusing Multimodal Images in Computer Vision Tasks

G. Venkata Kavya 1, Sk. Munwar Ali 2
1 PG Scholar, Dept. of E.C.E, Eswar College of Engineering, Narasaraopet, Palnadu Dt.
2 Asst. Professor, Dept. of E.C.E, Eswar College of Engineering, Narasaraopet, Palnadu Dt.
In medical imaging, techniques like CT, MRI, PET, and SPECT offer complementary structural and functional information. Image fusion aims to integrate this diverse data to enhance contrast, clarity, and clinical usability. An effective fusion method should preserve all relevant source image details without introducing artefacts or misalignment. The proposed approach uses a Siamese CNN model with identical weight branches for consistent feature extraction, improving training efficiency and fusion quality. It integrates Gaussian pyramid decomposition and multi-scale transformation so that the fusion process is more in line with human visual perception, while a localized similarity-based fusion strategy adaptively adjusts the decomposed coefficients. Experimental results show that the proposed framework outperforms existing multi-scale transform and CNN-driven approaches, delivering superior fused images for medical diagnosis.
Key Words: Deep Learning, Fusion Method, Neural Network, Medical Imaging, Convolutional Network, Image Quality
1. Introduction
With the rapid development of sensor and computer technology, medical imaging has emerged as an irreplaceable component in various clinical applications including diagnosis, treatment planning and surgical navigation. To provide medical practitioners sufficient information for clinical purposes, medical images obtained with multiple modalities are usually required, such as X-ray, computed tomography (CT), magnetic resonance (MR), positron emission tomography (PET), single photon emission computed tomography (SPECT), etc. Due to the difference in imaging mechanism, medical images with different modalities focus on different categories of organ/tissue information. For instance, the CT images are commonly used for the precise localization of dense structures like bones and implants, the MR images can provide excellent soft-tissue details with high-resolution anatomical information, while the functional information on blood flow and metabolic changes can be offered by PET and SPECT images, but with low spatial resolution. Multi-modal medical image fusion aims at combining the complementary information contained in different source images by generating a composite image for visualization, which can help physicians make easier and better decisions for various purposes.

Recent advancements in medical image fusion have focused on multi-scale transform (MST) techniques to handle the intensity variations caused by differing imaging mechanisms. Traditional MST-based fusion involves decomposition, fusion, and reconstruction steps using transforms like pyramids, wavelets, contourlets, and shearlets. A key challenge in this process is creating accurate weight maps through activity level measurement and weight assignment, which often suffer from issues like noise, misregistration, and intensity differences. To address these limitations, this study introduces a convolutional neural network (CNN) that learns an end-to-end mapping from source images to weight maps, integrating both steps optimally. The method also uses image pyramids for perceptual alignment and a local similarity-based strategy to adaptively refine fusion, improving robustness and visual quality.

2. Related Work
In our recent work [8], a CNN-based multi-focus image fusion method which can obtain state-of-the-art results was proposed. In that method, two source images are fed to the two branches of a Siamese convolutional network in which the two branches share the same architecture and weights. Each branch contains three convolutional layers, and the obtained feature maps essentially act as activity level measures. In addition, a local similarity-based fusion strategy is adopted to determine the fusion mode for the decomposed coefficients [12]. When the contents of the source images have high similarity, the “weighted-average” fusion mode is applied to avoid losing useful information. In this situation, the weights obtained by the CNN are more reliable than the coefficient-based measure, so they are employed as the merging weights. When the similarity of image contents is low, the “choose-max” or “selection” fusion mode is preferred to mostly preserve the salient details from the source images. In this situation, the CNN output is not reliable, and the pixel activity is directly measured by the absolute values of the decomposed coefficients.
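To make the two fusion modes concrete, below is a minimal NumPy sketch of one plausible form of this local similarity test; the window size, the threshold, and the function names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fuse_coefficients(cA, cB, w, win=7, thresh=0.6):
    """Fuse two decomposed coefficient maps cA, cB using the CNN weight map w.

    Where the local (windowed) similarity of cA and cB is high, a weighted
    average driven by w is used; where it is low, the coefficient with the
    larger absolute value is selected ("choose-max").
    Window size `win` and threshold `thresh` are illustrative choices.
    """
    # Local correlation-style similarity over a sliding window.
    mean_a = uniform_filter(cA, win)
    mean_b = uniform_filter(cB, win)
    cov = uniform_filter(cA * cB, win) - mean_a * mean_b
    var_a = uniform_filter(cA * cA, win) - mean_a ** 2
    var_b = uniform_filter(cB * cB, win) - mean_b ** 2
    sim = 2 * cov / (var_a + var_b + 1e-12)   # roughly in [-1, 1]; high = similar

    weighted = w * cA + (1 - w) * cB                          # "weighted-average" mode
    choose_max = np.where(np.abs(cA) >= np.abs(cB), cA, cB)   # "choose-max" mode
    return np.where(sim > thresh, weighted, choose_max)
```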

3. Existing System
The current image fusion methods are divided into two categories. One is to directly fuse the source images in the spatial domain; however, this kind of method is not good at dealing with edges. The other is to integrate the source images in the transform domain; this type of approach can remove the block effect and produce a more consistent fusion result. Image fusion methods based on MSD have drawn researchers' attention in recent years, for example the discrete wavelet transform based method [8, 9], the stationary wavelet transform based method [10], and the non-subsampled contourlet transform based method. The DWT, as a popular MSD tool, was proposed first; it provides a richer scale-space analysis for an image compared to other MSD tools because it can decompose the image into magnitude and phase information. The magnitude of the DWT is nearly shift invariant, so it has better texture representation than the wavelet and complex wavelet, and the phase of the DWT contains richer geometric information.

3.1 Standard Image Fusion Methods
Image fusion methods can be broadly classified into two categories: spatial domain fusion and transform domain fusion. One important spatial domain fusion method is the high pass filtering based technique, in which the high frequency details are injected into an up-sampled version of the MS images. The disadvantage of spatial domain approaches is that they produce spatial distortion in the fused image. Spectral distortion becomes a negative factor when we go for further processing, such as classification; such distortion can be handled very well by transform domain approaches to image fusion. Multi-resolution analysis has become a very useful tool for analysing remote sensing images.

Fig 1: Standard Image Fusion

The Discrete Wavelet Transform (DWT) is widely used in image fusion due to its ability to separate images into different frequency components. While low-pass filtering reduces the image resolution by removing high-frequency details, it does not change the image scale; sub-sampling afterward doubles the scale by discarding redundant samples without losing information.
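As a concrete reference point for the DWT-based category described above, the following is a minimal single-level DWT fusion baseline using the PyWavelets library; the choice of wavelet ('db1') and the average/choose-max rules are common defaults assumed for illustration, not settings reported in this paper.

```python
import numpy as np
import pywt

def dwt_fuse(img_a, img_b, wavelet="db1"):
    """Baseline single-level DWT fusion of two registered grayscale images.

    Low-frequency (approximation) bands are averaged; high-frequency
    (detail) bands are fused by choosing the larger-magnitude coefficient.
    """
    cA_a, (cH_a, cV_a, cD_a) = pywt.dwt2(img_a.astype(np.float64), wavelet)
    cA_b, (cH_b, cV_b, cD_b) = pywt.dwt2(img_b.astype(np.float64), wavelet)

    cA = 0.5 * (cA_a + cA_b)   # average the approximation bands
    fuse_max = lambda x, y: np.where(np.abs(x) >= np.abs(y), x, y)
    details = tuple(fuse_max(x, y) for x, y in
                    [(cH_a, cH_b), (cV_a, cV_b), (cD_a, cD_b)])

    return pywt.idwt2((cA, details), wavelet)   # reconstruct the fused image
```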

Most of the image's energy resides in the low-frequency part, which forms the approximation of the fused image and strongly influences its overall perception. This paper proposes a weighted average fusion rule for the low-frequency sub-bands, using a comprehensive feature that combines the phase, magnitude, and spatial variance of the low-frequency coefficients. The low-frequency DWT components are represented by one magnitude matrix and three phase matrices, capturing local shifts and texture information to improve fusion quality. The contrast measure is defined as C(i, j) = H(i, j) / L(i, j)^α, where α is a visual constant determined by a physiological visual test, in the range of 0.6 to 0.7, and L(i, j) and H(i, j) are the low frequency coefficient and the high frequency coefficient, respectively. The relationship between contrast and background intensity is non-linear, which makes the human visual system highly sensitive to contrast variations. To provide further quantitative comparison of various fusion methods, objective evaluations of the fused results are given in Section 5. The spatial frequency (SF), MSE, and PSNR are adopted as evaluation indexes; larger SF and PSNR values and a smaller MSE indicate a better fusion result.
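The exact form of the comprehensive feature is not spelled out above, but a minimal sketch of a variance-driven weighted-average rule for the low-frequency sub-bands, of the kind this paragraph describes, could look as follows; using local variance alone (omitting the phase and magnitude terms) and the window size are simplifying assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_variance(x, win=5):
    """Windowed variance of a coefficient map (a simple salience feature)."""
    mean = uniform_filter(x, win)
    return uniform_filter(x * x, win) - mean ** 2

def fuse_lowfreq(lA, lB, win=5):
    """Weighted-average fusion of two low-frequency sub-bands.

    Each pixel's weight is proportional to the local variance of its
    source band, so the more textured source dominates locally.
    """
    vA = local_variance(lA, win)
    vB = local_variance(lB, win)
    w = vA / (vA + vB + 1e-12)        # normalized salience weight in [0, 1]
    return w * lA + (1 - w) * lB
```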
4. Proposed Method
The convolutional network used in the proposed fusion algorithm is a Siamese network in which the weights of the two branches are constrained to be the same. Each branch consists of three convolutional layers and one max-pooling layer, which is the same as the network used in [24]. To reduce the memory consumption as well as increase the computational efficiency, we adopt a much lighter model in this work by removing a fully connected layer from the network used in [24]. The 512 feature maps after concatenation are directly connected to a 2-dimensional vector. It can be calculated that the light model only takes up about 1.66 MB of physical memory in single precision, which is significantly less than the 33.6 MB model employed in [14]. Finally, this 2-dimensional vector is fed to a 2-way softmax layer (not shown in Fig. 2), which produces a probability distribution over two classes. The two classes correspond to two kinds of normalized weight assignment results, namely, “first patch 1 and second patch 0” and “first patch 0 and second patch 1”, respectively. The probability of each class indicates the possibility of each weight assignment. In this situation, also considering that the sum of the two output probabilities is 1, the probability of each class just indicates the weight assigned to its corresponding input patch.
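A PyTorch sketch of a branch matching this description (three convolutional layers, one max-pooling layer, 512 concatenated feature maps feeding a 2-way softmax) is given below. The 16x16 patch size and the 64/128/256 channel widths are assumptions borrowed from typical Siamese fusion networks, since the text does not list them; with these assumed sizes the model has about 435,000 parameters, roughly 1.66 MB in single precision, consistent with the figure quoted above.

```python
import torch
import torch.nn as nn

class SiameseFusionNet(nn.Module):
    """Weight-sharing two-branch network; a sketch of the model described above.

    Assumed layout (not given explicitly in the text): 16x16 grayscale input
    patches, three 3x3 conv layers of 64/128/256 channels per branch, one
    2x2 max-pool, and a fully connected 2-way output followed by softmax.
    """
    def __init__(self):
        super().__init__()
        # One branch; reused for both inputs, so the weights are shared.
        self.branch = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                       # 16x16 -> 8x8
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
        )
        # 2 x 256 = 512 concatenated maps of size 8x8 -> 2-dimensional vector.
        self.fc = nn.Linear(512 * 8 * 8, 2)

    def forward(self, patch_a, patch_b):
        feat = torch.cat([self.branch(patch_a), self.branch(patch_b)], dim=1)
        logits = self.fc(feat.flatten(1))
        # Softmax over the two classes; since the probabilities sum to 1,
        # prob[:, 0] can be read directly as the weight of the first patch.
        return torch.softmax(logits, dim=1)
```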

Fig.2: Architecture for CNN training

4.1 Pyramid Generation
There are two main types of pyramids: low pass and band pass. A low pass pyramid is made by smoothing the image with an appropriate smoothing filter and then subsampling the smoothed image, usually by a factor of 2 along each coordinate direction. The resulting image is then subjected to the same procedure, and the cycle is repeated multiple times. Each cycle of this process results in a smaller image with increased smoothing, but with decreased spatial sampling density (that is, decreased image resolution). If illustrated graphically, the entire multi-scale representation will look like a pyramid, with the original image on the bottom and each cycle's resulting smaller image stacked one atop the other. A band pass pyramid is made by forming the difference between images at adjacent levels in the pyramid and performing image interpolation between adjacent levels of resolution, to enable computation of pixel-wise differences.

The Gaussian pyramid is computed as follows. The original image is convolved with a Gaussian kernel. As described above, the resulting image is a low pass filtered version of the original image, and the cut-off frequency can be controlled using the parameter σ. The Laplacian is then computed as the difference between the original image and the low pass filtered image. This process is continued to obtain a set of band-pass filtered images (since each is the difference between two levels of the Gaussian pyramid). Thus, the Laplacian pyramid is a set of band pass filters. A sketch of this construction is given below.
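The following is a minimal NumPy/OpenCV sketch of the Gaussian and Laplacian pyramid construction just described, including the maximum decomposition level ⌊log2 min(H, W)⌋ used later in Step 2; cv2.pyrDown and cv2.pyrUp stand in for the Gaussian smoothing-and-resampling steps.

```python
import math
import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    """Repeatedly blur and downsample by 2 (cv2.pyrDown does both)."""
    pyr = [img.astype(np.float32)]
    for _ in range(levels):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    """Band-pass pyramid: each level is a Gaussian level minus the
    upsampled next-coarser level; the last entry is the low-pass residual."""
    g = gaussian_pyramid(img, levels)
    lap = []
    for i in range(levels):
        up = cv2.pyrUp(g[i + 1], dstsize=(g[i].shape[1], g[i].shape[0]))
        lap.append(g[i] - up)
    lap.append(g[-1])        # low-pass residual gN
    return lap

def max_levels(img):
    """Highest possible decomposition level, floor(log2(min(H, W)))."""
    h, w = img.shape[:2]
    return int(math.floor(math.log2(min(h, w))))
```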

Fig 3: The filtered images stacked one on top of the other form a tapering pyramid structure, hence the name

The complete fusion scheme proceeds as follows.
Step 1: CNN-based weight map generation. Feed the two source images A and B to the two branches of the convolutional network, respectively. The weight map W is generated using the approach described above.
Step 2: Pyramid decomposition. Decompose each source image into a Laplacian pyramid. Let {A_l} and {B_l} respectively denote the pyramids of A and B, where l indicates the l-th decomposition level. Decompose the weight map W into a Gaussian pyramid {W_l}. The total decomposition level of each pyramid is set to the highest possible value ⌊log2 min(H, W)⌋, where H × W is the spatial size of the source images and ⌊⋅⌋ denotes the flooring operation.

Fig.4: Proposed medical image fusion

The inverse transform to obtain the original image g0 from the N detail images L0, L1, …, LN and the low pass image gN is as follows:

1. gN is upsampled by inserting zeros between the sample values and interpolating the missing values by convolving with the filter w, to obtain the image g′.
2. The image g′ is added to the lowest level detail image LN to obtain the approximation image at the next upper level: gN−1 = LN + g′.
3. Steps 1 and 2 are repeated on the detail images L0, L1, …, LN−1 to obtain the original image g0.
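Putting Steps 1 and 2 and this inverse transform together, a minimal sketch of the whole pyramid fusion pipeline might look as follows, reusing the gaussian_pyramid and laplacian_pyramid helpers sketched earlier. The simple per-level weighted average stands in for the adaptive similarity-based rule, which the earlier fuse_coefficients sketch would replace in practice.

```python
import cv2
import numpy as np
# gaussian_pyramid / laplacian_pyramid as defined in the earlier sketch.

def fuse_pyramids(img_a, img_b, weight_map, levels):
    """Laplacian-pyramid fusion guided by a CNN weight map.

    img_a, img_b : registered source images (float32, same size)
    weight_map   : per-pixel weight W in [0, 1] produced by the CNN
    """
    lap_a = laplacian_pyramid(img_a, levels)      # {A_l}
    lap_b = laplacian_pyramid(img_b, levels)      # {B_l}
    gw = gaussian_pyramid(weight_map, levels)     # {W_l}

    # Per-level weighted average; an adaptive similarity-based rule
    # (see fuse_coefficients above) would be substituted here.
    fused = [w * a + (1.0 - w) * b for a, b, w in zip(lap_a, lap_b, gw)]

    # Inverse transform: upsample the low-pass residual and add each
    # detail level back in, from coarsest to finest (steps 1-3 above).
    out = fused[-1]                               # gN
    for detail in reversed(fused[:-1]):
        out = cv2.pyrUp(out, dstsize=(detail.shape[1], detail.shape[0]))
        out = out + detail                        # gN-1 = LN + g'
    return out
```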
The multimodal image fusion method based on deep learning utilizes a pyramid-based CNN architecture to effectively combine information from different medical imaging modalities, such as CT and MRI. This approach decomposes the source images into multi-scale representations using image pyramids, capturing features at various levels of detail. A convolutional neural network (CNN) is then trained to generate weight maps that guide the fusion process, allowing it to jointly perform activity level measurement and weight assignment in an optimized manner. By integrating local similarity strategies and multi-scale decomposition, the method enhances contrast, reduces artifacts, and ensures that the fused image retains complementary information from all input sources, resulting in improved diagnostic quality.

5. Results and Discussion
The experiments were conducted using MATLAB 2016b on a high-speed CPU to ensure faster execution, using the test images shown in Fig 5. The goal of any image fusion algorithm is to effectively combine the essential information from both source images into a single output image. The quality of the fused image is assessed both visually and quantitatively to provide a comprehensive evaluation of the proposed and existing methods: a fused image cannot be judged exclusively by viewing the output image or by measuring fusion metrics alone, so it should be judged qualitatively using visual display and quantitatively using fusion metrics. In this section, we present both the visual quality and the quantitative analysis of the proposed and existing algorithms.

Fig 5: Test images (a) dataset (MR) & (b) dataset (CT)
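For reference, the evaluation indexes named above (spatial frequency, MSE, and PSNR) can be computed as in the short NumPy sketch below; although the paper ran its experiments in MATLAB, Python is used here for illustration.

```python
import numpy as np

def spatial_frequency(img):
    """SF: overall activity level from row and column gradients (larger is better)."""
    img = img.astype(np.float64)
    rf = np.mean((img[:, 1:] - img[:, :-1]) ** 2)   # row frequency (horizontal diffs)
    cf = np.mean((img[1:, :] - img[:-1, :]) ** 2)   # column frequency (vertical diffs)
    return np.sqrt(rf + cf)

def mse(ref, img):
    """Mean squared error against a reference image (smaller is better)."""
    return np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB (larger is better)."""
    return 10.0 * np.log10(peak ** 2 / mse(ref, img))
```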
6. Conclusion & Future work
In this paper, a medical image fusion
method based on convolutional neural
networks is proposed. We employ a
Siamese network to generate a direct
mapping from source images to a weight
map which contains the integrated pixel
activity information. The main novelty of
this approach is it can jointly implement
To achieve perceptually good results, some popular techniques in image fusion such as multi-scale processing and adaptive fusion mode selection are appropriately adopted. Experimental results demonstrate that the proposed method can obtain high-quality results in terms of visual quality and objective metrics. In addition to the proposed algorithm itself, another contribution of this work is that it exhibits the great potential of deep learning techniques for image fusion, which will be further studied in the future.

REFERENCES
[1] A. James and B. Dasarathy, “Medical image fusion: a survey of the state of the art,” Information Fusion, vol. 19, pp. 4–19, 2014.
[2] L. Yang, B. Guo, and W. Ni, “Multimodality medical image fusion based on multiscale geometric analysis of contourlet transform,” Neurocomputing, vol. 72, pp. 203–211, 2008.
[3] Z. Wang and Y. Ma, “Medical image fusion using m-PCNN,” Information Fusion, vol. 9, pp. 176–185, 2008.
[4] B. Yang and S. Li, “Pixel-level image fusion with simultaneous orthogonal matching pursuit,” Information Fusion, vol. 13, pp. 10–19, 2012.
[5] S. Li, H. Yin, and L. Fang, “Group-sparse representation with dictionary learning for medical image denoising and fusion,” IEEE Transactions on Biomedical Engineering, vol. 59, pp. 3450–3459, 2012.
[6] G. Bhatnagar, Q. Wu, and Z. Liu, “Directive contrast based multimodal medical image fusion in NSCT domain,” IEEE Transactions on Multimedia, vol. 15, pp. 1014–1024, 2013.
[7] S. Li, X. Kang, and J. Hu, “Image fusion with guided filtering,” IEEE Transactions on Image Processing, vol. 22, no. 7, pp. 2864–2875, 2013.
[8] R. Shen, I. Cheng, and A. Basu, “Cross-scale coefficient selection for volumetric medical image fusion,” IEEE Transactions on Biomedical Engineering, vol. 60, pp. 1069–1079, 2013.
[9] R. Singh and A. Khare, “Fusion of multimodal medical images using Daubechies complex wavelet transform - a multiresolution approach,” Information Fusion, vol. 19, pp. 49–60, 2014.
[10] L. Wang, B. Li, and L. Tan, “Multimodal medical volumetric data fusion using 3-D discrete shearlet transform and global-to-local rule,” IEEE Transactions on Biomedical Engineering, vol. 61, pp. 197–206, 2014.

[11] Z. Liu, H. Yin, Y. Chai, and S. Yang, “A novel approach for multimodal medical image fusion,” Expert Systems with Applications, vol. 41, pp. 7425–7435, 2014.
[12] G. Bhatnagar, Q. Wu, and Z. Liu, “A new contrast based multimodal medical image fusion framework,” Neurocomputing, vol. 157, pp. 143–152, 2015.
[13] Y. Liu, S. Liu, and Z. Wang, “A general framework for image fusion based on multi-scale transform and sparse representation,” Information Fusion, vol. 24, no. 1, pp. 147–164, 2015.
[14] Q. Wang, S. Li, H. Qin, and A. Hao, “Robust multi-modal medical image fusion via anisotropic heat diffusion guided low-rank structural analysis,” Information Fusion, vol. 26, pp. 103–121, 2015.
[15] Y. Liu and Z. Wang, “Simultaneous image fusion and denoising with adaptive sparse representation,” IET Image Processing, vol. 9, no. 5, pp. 347–357, 2015.
[16] Y. Yang, Y. Que, S. Huang, and P. Lin, “Multimodal sensor medical image fusion based on type-2 fuzzy logic in NSCT domain,” IEEE Sensors Journal, vol. 16, pp. 3735–3745, 2016.
[17] J. Du, W. Li, B. Xiao, and Q. Nawaz, “Union Laplacian pyramid with multiple features for medical image fusion,” Neurocomputing, vol. 194, pp. 326–339, 2016.
