
Knowledge-Based Systems 285 (2024) 111343


Lightweight image super-resolution for IoT devices using deep residual feature distillation network
Sevara Mardieva, Shabir Ahmad, Sabina Umirzakova, M.J. Aashik Rasool, Taeg Keun Whangbo ∗
Department of IT Convergence Engineering, Gachon University, Seongnam, South Korea

ARTICLE INFO

Keywords:
Lightweight image super-resolution
Internet of Things
Deep residual feature distillation network
Multi-kernel depthwise-separable convolution block

ABSTRACT

The 5th industrial revolution is characterized by an extensive interconnection of embedded devices, which offer a range of services, including the monitoring of their environments. Images captured from remote cameras require enhancement for effective analysis. Despite recent progress in single-image super-resolution techniques, which yield impressive results through deep convolutional neural networks, the complexity of these advanced models renders them impractical for use on miniaturized Internet of Things (IoT) devices, primarily due to their limited computational capabilities and memory constraints. Furthermore, the rapid evolution of IoT devices necessitates efficient image super-resolution techniques, while existing advanced methods, based on deep convolutional neural networks, are too resource-intensive for these devices; this gap highlights the need for a more suitable solution. In this study, we introduce a lightweight, efficient super-resolution model specifically designed for IoT devices. The model incorporates a novel deep residual feature distillation block (DRFDB), which leverages a depthwise-separable convolution block (DCB) for effective feature extraction. The focus is on reducing computational and memory demands without compromising image quality. The proposed DCB extracts coarse features from the given input features as calculation units, using two operations, depthwise and pointwise convolution. These two operations significantly reduce the number of parameters and floating-point operations while maintaining a PSNR value above the 90% threshold. We further modify the proposed DCB into a multi-kernel depthwise-separable convolution block (MKDCB) to fine-tune the model. Experiments conducted on standard datasets such as DIV2K, Set5, Set14, Urban100, and Manga109 demonstrate that our model significantly outperforms existing methods in terms of both image quality and computational efficiency. The model shows improved performance metrics such as PSNR while requiring fewer parameters and less memory, making it highly suitable for IoT applications. This study presents a breakthrough in super-resolution for IoT devices, balancing high-quality image reconstruction with the limited resources of these devices.

1. Introduction

Single-image super-resolution (SISR) is a fundamental low-level task in computer vision that aims to reconstruct visually appealing high-resolution (HR) images from their corresponding low-resolution (LR) images. Over the last decade, deep-learning techniques have become dominant, surpassing model-based solutions. Some of the pioneering works on SISR focused on reducing the model size to enhance the inference speed on low-powered devices. For instance, an attempt to reconstruct LR into HR was made by Chao Dong et al. [1] with their super-resolution convolutional neural network (SRCNN). This method involved training a three-layer deep CNN [2] to establish an end-to-end mapping between low- and high-resolution images; while operating on a CPU, the SRCNN achieved a high processing speed, rendering it suitable for practical, real-time applications. The study conducted by Wang et al. [3] implemented a neural network that further enhanced the traditional sparse coding model. The study demonstrated that incorporating domain expertise along with a large learning capacity (e.g., ×4) could complement each other to further enhance SR performance.

The motivation behind SISR is driven by the necessity to improve image resolution across various applications. Despite numerous attempts to develop lightweight super-resolution networks, most methods still have a significant number of parameters and substantial computational complexity. These factors can limit the practical applications of

∗ Corresponding author.
E-mail addresses: sevara1998@gachon.ac.kr (S. Mardieva), shabir@gachon.ac.kr (S. Ahmad), sabinatuit@gachon.ac.kr (S. Umirzakova),
aashikrasool@gachon.ac.kr (M.J.A. Rasool), tkwhangbo@gachon.ac.kr (T.K. Whangbo).

https://doi.org/10.1016/j.knosys.2023.111343
Received 10 October 2023; Received in revised form 22 December 2023; Accepted 25 December 2023
Available online 28 December 2023
0950-7051/© 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

the networks, particularly in situations where resources are constrained or real-time processing is required. Moreover, these problems, coupled with others such as mobility management, have been addressed in numerous studies related to IoT [4].

In these cases, the participating IoT nodes are embedded resources with limited computational abilities; thus, it is not wise to overload them with machine learning models with parameters in the order of millions and memory in the order of gigabytes. For instance, advanced deep learning models like EDSR [5] and SRGAN [6] require considerable memory and processing power, often exceeding several million parameters and gigaflops of computation, which are impractical for IoT devices with limited capabilities. A study by Dong et al. [2] showcased that a three-layer SRCNN could accelerate processing on a CPU enough to make it suitable for real-time applications. However, it is notable that SRCNN still requires tens of milliseconds to process a single image patch, highlighting the need for further efficiency. Moreover, the analysis by Hinton et al. [7] emphasizes that deep neural networks tend to be parameter-inefficient, suggesting a high redundancy of parameters that do not significantly contribute to the performance of the network. To address these persistent challenges, researchers have continued to investigate innovative techniques and optimizations to curb the size and complexity of super-resolution methods without sacrificing their performance. Numerous deep learning models have made remarkable strides by drawing inspiration from previous SR methods [3,5,6,8]. One notable example is the information multi-distillation network (IMDN) [9], which enhanced the information distillation network (IDN) [10] by incorporating contrast-aware channel attention (CCA) and introducing cascaded information multi-distillation blocks (IMDB). IMDN managed to reduce model complexity while maintaining performance. The model was designed to use only 0.7M parameters, a significant reduction compared to previous models. The blocks of the IMDN were composed of distillation and selective fusion components, retaining crucial information while reducing the complexity of the architecture, thereby enabling high-performance SR networks with fewer resources and computational demands. In addition, IMDN [9] won the AIM2019 [11] efficient super-resolution challenge. Furthermore, a rethinking of IMDN introduced the residual feature distillation network for lightweight image super-resolution (RFDN) [12]. Despite these improvements, the requirement for high-fidelity image resolution in real-time scenarios remains a concern. According to the studies conducted by Shi et al. [3], the computational load for super-resolution tasks should be kept under 10^8 FLOPs to be deemed efficient for real-time applications on edge devices.

As the winner of AIM2020 [13], RFDN [12] comprises multiple feature distillation connections to capture and learn more distinctive feature representations. A shallow residual block (SRB) was proposed as the primary building block of RFDN [12], enabling the network to maximize the benefits of residual learning while maintaining a lightweight structure. Extensive experimental results indicated that RFDN [12] achieves a superior balance between performance and model complexity compared to contemporary state-of-the-art methods in the domain. As mentioned earlier, although various methods have attempted to create lightweight super-resolution networks, most models still possess numerous parameters and substantial computational complexity. There is a pressing demand for efficient super-resolution models that better meet practical and commercial requirements, prioritizing faster inference speed through the reduction of parameters or FLOPs. To address these challenges, as described in [14], common recommendations include the implementation of techniques such as depthwise convolution [15], feature splitting [16], and pixel shuffling [17]. In this paper, we introduce our deep residual feature distillation network (DRFDN), which is based on the RFDN [12] methodology, the winning entry in the AIM 2020 [13] efficient super-resolution challenge.

Our primary objective is to enhance the efficiency of the network by reducing the runtime and number of parameters and minimizing FLOPs while concurrently maintaining a high level of performance. In crafting the rationale for using the DRFDB in our study, we are driven by its inherent qualities, which cater to the stringent demands of super-resolution within TinyML. The DRFDB is fundamentally engineered to distill and refine feature representations through a process that accentuates the crucial image details necessary for high-quality super-resolution. By integrating the DRFDB into the proposed model, we enable it to learn and convey rich image features with remarkable efficiency, an essential attribute given the resource constraints of IoT devices. Unlike other models, the proposed model uses depthwise-separable convolution layers, which address our primary objective and the major limitation of SISR models. In contrast with conventional convolution layers, depthwise-separable convolution divides the learning process into channel-wise and point-wise operations, drastically reducing the computational complexity while ensuring that the critical spatial and feature-channel information is retained and refined (a short parameter-count sketch at the end of this section makes the savings concrete).

To achieve enhanced efficiency in the RFDB, we reevaluate its various components. Our observations reveal that using a normal convolution layer alone is insufficient to reduce runtime, parameters, and FLOPs. Instead, we propose the depthwise-separable convolution block (DCB), comprising depthwise-separable convolution layers, as a more effective alternative to the SRB. Although a convolution layer is effective in feature extraction, its computational demand is too high. To address this, we utilize a depthwise-separable convolution layer, which overcomes most of the aforementioned issues. The depthwise-separable convolution layer is divided into two layers, a channel-attention (depthwise) layer and a spatial-attention (pointwise) layer, where the kernel is broken into two smaller kernels that are applied sequentially to the input to obtain the same effect as the full kernel. Furthermore, we modify our DRFDB with a multi-kernel depthwise-separable convolution block (MKDCB). In our second proposed block, we use depthwise-separable convolution layers that are distinct from the DCB in their input and middle channels. In addition, each DRFDB consists of four MKDCBs to extract additional information. Furthermore, the residual nature of the DRFDB harnesses the strength of residual learning to address the vanishing gradient problem often encountered in deep neural networks, thereby fortifying the training process. In essence, this approach ensures that the network can delve deeper into the image data, learning intricate patterns without becoming overwhelmed by complexity. Fig. 2 delineates the architecture of the proposed model, showcasing an assortment of kernel sizes. This assortment is critical, as it ensures the comprehensive capture of information without compromising the convolution process. Table 1 presents a comparative analysis between the DRFDN model and extant lightweight SISR models. Although other models show impressive performance measures, DRFDN stands out by surpassing them in both operational efficiency and computational simplicity.

This research addresses these challenges by proposing the DRFDN, which operates with an order of magnitude fewer parameters, FLOPs, and inference time without sacrificing image quality. Moreover, the proposed model demonstrates a 40% reduction in parameters and a 35% reduction in computational complexity compared to the standard RFDN model while maintaining a Peak Signal-to-Noise Ratio (PSNR) within 0.5 dB of the reference models. The motivation behind the proposed method is grounded in the urgent need for efficient image processing techniques compatible with the limited resources of IoT devices. Traditional SR methods are often unsuitable for such applications due to their high computational costs and large memory requirements. By addressing these challenges, this study contributes not only to the field of image super-resolution but also to the broader context of IoT applications, where efficient data processing is increasingly critical. The remainder of the paper is organized as follows: Section 2 summarizes the relevant literature on super-resolution and model optimization for IoT devices. Section 3 discusses the formulation and baseline model and illustrates the proposed modifications. Section 4 offers extensive experimental results, including benchmark comparisons of model size, speed, and efficiency, which are presented in Table 3. Section 5 concludes the paper.
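As referenced above, the parameter savings of the depthwise-separable factorization can be made concrete with a short count. The following Python sketch is our own illustration, with arbitrary example channel sizes rather than the paper's configuration:

# Illustrative parameter count: standard vs. depthwise-separable convolution.
# Channel sizes are arbitrary example values, not the paper's configuration.
def conv_params(c_in, c_out, k):
    # Standard convolution: one k x k filter per (input, output) channel pair.
    return c_in * c_out * k * k

def ds_conv_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel;
    # pointwise: a 1 x 1 convolution that mixes channels.
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 50, 50, 3
standard = conv_params(c_in, c_out, k)       # 22,500 weights
separable = ds_conv_params(c_in, c_out, k)   # 2,950 weights
print(f"standard: {standard}, separable: {separable}, "
      f"reduction: {standard / separable:.1f}x")   # roughly 7.6x fewer weights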

Table 1
Quantitative comparison with state-of-the-art methods on benchmark datasets. The best and second-best efficacy and attribute values are shown in blue; the lowest results are shown in red.
Method Val PSNR Val Time [ms] Params [M] FLOPs [G] Mem [M] Efficacy
DIV2K x4
Baseline 29.04 112.14 0.433 27.1 788.13 99.00
RLFN 29.00 93.73 0.317 19.7 376.46 163.44
LWFANet 29.12 794.93 0.832 135.3 3403.69 28.43
EFDN 29.00 100.99 0.272 16.86 662.51 177.18
FMEN 29.00 94.68 0.341 22.28 306.74 151.69
TransSR 29.02 1508.87 3.142 204.4 515.24 13.87
Virtual Reality Team 27.35 2231.32 0.423 423.16 3336.88 10.41
RFDN(L) 28.38 141.9 0.312 37.67 1461.57 109.17
PRRN 29.05 98.18 0.414 24.8 1462.66 134.47
DRFDN(ours) 29.06 93.47 0.267 14.99 503.99 189.67

For this paper, the main contributions are as follows:

1. We introduce a novel Deep Residual Feature Distillation Network (DRFDN) specifically designed for IoT devices. This addresses the gap in current super-resolution methods, which are generally too resource-intensive for such devices. The proposed model significantly reduces computational and memory demands without compromising image quality.
2. The core innovation of the model lies in the implementation of the deep residual feature distillation block (DRFDB), which integrates the depthwise-separable convolution block (DCB) and the multi-kernel depthwise-separable convolution block (MKDCB). The MKDCB is a novel modified DCB that enhances the ability of the model to process and distill features effectively, crucial for achieving high-quality SR in resource-limited settings.
3. The proposed model has been rigorously tested across various standard datasets, including DIV2K, Set5, Set14, Urban100, and Manga109. These tests have consistently demonstrated that the model is superior to existing SR methods, with a focus on metrics such as PSNR, model parameters, computational efficiency, and memory complexity. We demonstrate the formulation of embedded device profiling; using this formulation, we compare the efficacy ψ of the methods for IoT devices.

Abbreviation    Definition
SISR            Single-Image Super-Resolution
IoT             Internet of Things
SR              Super-Resolution
PSNR            Peak Signal-to-Noise Ratio
AIM             Advances in Image Manipulation
DRFDN           Deep Residual Feature Distillation Network
DCB             Depthwise-separable Convolution Block
MKDCB           Multi-kernel Depthwise-separable Convolution Block

2. Related works

In recent years, SISR has seen considerable advancements, primarily due to the integration of deep learning-based methods [5,6,18]. Attention mechanisms [19] are widely used in advanced visual tasks to enhance the performance of SR methodologies [20]. In this section, we offer a concise overview of recent deep-learning-based methods and their applications in the IoT.

One of the earliest SISR methods was the SRCNN by Dong et al. [1], which utilized a three-layer end-to-end network [21] to enhance SR performance, drawing significant inspiration from super-resolution methods rooted in sparse coding. This approach aimed to learn the mapping between low- and high-resolution images, which leads to the need for more mapping layers. To overcome this issue and achieve better performance than [1], a new network termed the deep-recursive convolutional network (DRCN) [6] was demonstrated. The DRCN recursively utilized the feature extraction layer, reducing the computational complexity of the SR network by a factor of up to 16. Furthermore, Kim et al. introduced very deep super-resolution (VDSR) [22], which represented a notable complexity within SR methodologies. A very lightweight and efficient image super-resolution network, optimized for resource-constrained devices, was also proposed. Its key innovations include the frequency grouping fusion block (FGFB) [23] for effective feature fusion, the multi-way attention block (MWAB) [24] for diverse feature utilization, the lightweight residual concatenation block (LRCB) [25], and the lightweight convolutional block (LConv) [26] to minimize parameters. The network also employed progressive interactive group convolution (PIGC) [27], enhancing efficiency over traditional methods. Extensive testing showed this network outperformed existing solutions, striking a superior balance between performance and complexity. The focus on reducing the model size for resource-constrained devices limited the network's ability to achieve the absolute best possible results in terms of image quality compared to larger, more resource-intensive models. However, it substantially improved SR performance by harnessing a sophisticated topology and a remarkably large quantity of learnable parameters. Because of the increasing network depth, such approaches inevitably incur escalating computational costs and memory consumption. This can render them inappropriate for resource-limited environments such as embedded systems and mobile devices deployed in IoT applications.

Zhang et al. [28] proposed a generalized singular value decomposition method for asymmetric reconstruction, aimed at accelerating the process and enhancing its running speed. To further increase the operational speed of SR networks, Shi et al. [3] developed an efficient sub-pixel convolution method that upscales the resolution of feature maps at the end of the SR pipeline, thereby enabling many calculations to be performed in a low-dimensional feature space. Despite their significant performance improvements, these approaches often incurred high computational costs. This has motivated researchers to explore and develop more efficient methods for super-resolution tasks that balance performance with resource demands and computational complexity. As indicated in the aforementioned studies, optimizing methods to maintain performance has kept pace with advancements in TinyML and IoT devices. Lightweight super-resolution methods have been comprehensively reviewed in a recent study [29]. In the above studies, various approaches were used to optimize the model. N-Gram in Swin transformers for efficient lightweight image super-resolution introduced NGswin [30], an efficient SISR network that employed an innovative N-Gram context within Swin transformers. This approach, unique in its application of N-Grams to low-level vision tasks, enhances the capability of Swin Transformers by enabling them to consider neighboring local windows, thus expanding the receptive field. The NGswin network, equipped with an SCDP bottleneck and multi-scale output processing, demonstrated competitive performance while maintaining efficiency. Additionally, the paper extended the N-Gram context to other Swin-based SR models, notably improving SwinIR-NG, which surpassed existing lightweight SR approaches and set new state-of-the-art results. However, achieving the optimal balance between reducing complexity and maintaining high performance was challenging and required further fine-tuning.

MFFN: image super-resolution via multi-level features fusion network [31] introduced a lightweight SISR network optimized for devices with limited resources. It uses two-level nested residual blocks with asymmetric structures for efficient feature extraction and parameter reduction. An autocorrelation weight unit was added for better feature fusion, resulting in superior image reconstruction quality and performance, especially at higher magnification factors. While the asymmetric structure of the residual blocks was designed for efficiency, it introduces complexities into the learning dynamics of the model, potentially affecting training stability and convergence speed. Hybrid pixel-unshuffled network for lightweight image super-resolution [32] presented a hybrid pixel-unshuffled network (HPUN) for image SR, utilizing efficient downsampling and self-residual depthwise separable convolutions. This novel approach reduced parameters and computational costs while achieving superior single-image super-resolution performance, surpassing existing methods. The introduction of novel components like autocorrelation weight units added complexity to the model, potentially impacting its interpretability and ease of optimization. Single image super-resolution based on directional variance attention network [33] presented DiVANet, an efficient network for SISR featuring a novel directional variance attention mechanism for enhanced feature representation and a residual attention feature group for efficient computation. This approach offered state-of-the-art performance with lower computational and memory requirements. The directional variance attention mechanism, while beneficial for feature representation, added complexity to the model, potentially affecting its interpretability and ease of optimization. Single image super-resolution based on progressive fusion of orientation-aware features [34] introduced the SISR-PF-OA model for SISR, combining orientation-aware feature extraction with progressive feature fusion. The model used mixed convolutional kernels and channel attention mechanisms for enhanced feature selection, achieving higher accuracy and efficiency in image restoration compared to existing methods. The progressive fusion of multi-scale features, although designed for accuracy, introduces challenges in balancing the depth of feature integration with computational efficiency. Image super-resolution reconstruction based on instance spatial feature modulation and feedback mechanism [35] introduced a novel image SR reconstruction method that incorporates instance spatial feature modulation and a feedback mechanism. Unlike traditional deep learning-based methods that upscale images without considering categories and instances, this approach used instance spatial features from low-resolution images to modulate the reconstruction process. The method iteratively optimized features through a feedback loop, enabling instance-level reconstruction. Achieving instance-level reconstruction precision consistently across all types of images is challenging, especially in images with complex or overlapping instances. Another approach for compressing model memory is quantization [36]. The student-teacher model [37] is a knowledge distillation model that trains a full-fledged deep model alongside its lighter counterpart and minimizes a distillation loss. Nonetheless, all the above-mentioned efforts focused on generic model training without considering device profiles. Context-aware and device-aware adaptive methods can solve challenges such as task management and latency handling [4]; however, despite their massive usefulness, there is still a research gap on device-aware models. To the best of the authors' knowledge, this study is the first approach towards its realization.

3. Material and methods

This section delineates the comprehensive materials and methodologies employed in our study. Initially, we provide an in-depth description of the baseline model, laying the groundwork for understanding the enhancements introduced by the proposed DRFDN and its main blocks, the DCB and the modified MKDCB. Following this, we establish a detailed profile for embedded IoT devices, taking into account their unique computational and memory constraints.

3.1. Residual feature distillation network

In Section 3.1 we present the baseline of the proposed method, RFDN [12]. Section 3.2 explains the proposed modifications, which extract global features better than previous models. Moreover, we outline a new multi-phase warm-start training version of the DCB that can significantly enhance the performance of compact SR models. Fig. 1a illustrates the building block of RFDN [12], which is based on IMDN [9]. The RFDN [12] model uses three SRBs to extract features and refine the input image. One branch of the information distillation process employs a single convolution layer for channel reduction, whereas the other branch significantly improves the input features using a straightforward SRB with a kernel size of 3 × 3 and a residual connection. At each stage of the distillation process, some features are enhanced by the ESA layer, a crucial component of the RFDN [12] architecture whose purpose is to identify and focus on the most important spatial regions of the input image, improving the ability of the network to capture salient features. First, the input features are decoupled by two convolution layers: a distilled layer (DL) and a refinement layer (RL) [9,10,12,38]. In this process, the DL produces distilled features, while the RL refines the antecedent coarse features. The building procedure is described as follows:

Fe_distilled1, Fe_coarse1 = DL_1(Fe_in), RL_1(Fe_in)
Fe_distilled2, Fe_coarse2 = DL_2(Fe_coarse1), RL_2(Fe_coarse1)    (1)
Fe_distilled3, Fe_coarse3 = DL_3(Fe_coarse2), RL_3(Fe_coarse2)
Fe_distilled4 = DL_4(Fe_coarse3)

where Fe_distilledn represents the nth distilled features and Fe_coarsen is the nth coarse feature, processed further across successive layers [9,10,20,39]. All the features from the distilled layers are concatenated, as shown below:

Fe_distilled = Concat(Fe_distilled1, Fe_distilled2, Fe_distilled3, Fe_distilled4)    (2)

Algorithm 1 DRFDN Training Procedure
1: Input: Low-resolution images;
2: Output: Super-resolved images;
3: for number of training iterations do
4:    Sample a batch of n low-resolution images {I_lr1, I_lr2, ..., I_lrn} from dataset D;
5:    Pass I_lrn through the initial convolutional layer to extract coarse features F_0;
6:    for each DRFDB in the network do
7:       In each iteration the MKDCB takes one kernel size from the list {1, 3, 5, 7};
8:       Refine features with the MKDCB;
9:    end for
10:   Concatenate the refined features from the MKDCB with features from the parallel convolutional layer;
11:   Flatten the concatenated features and pass them through fully connected layers;
12:   Generate the super-resolved image output I_srn;
13:   Calculate the loss between I_srn and the ground-truth high-resolution images;
14:   Update the model weights using backpropagation and an optimization algorithm;
15: end for
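To make the distillation pipeline of Eqs. (1)-(2) concrete, it can be sketched in PyTorch as follows. This is a minimal illustration under our own naming and channel assumptions, not the released implementation:

import torch
import torch.nn as nn

class DistillationStage(nn.Module):
    # One distill/refine stage of Eq. (1): a 1x1 distilled layer (DL) and a
    # 3x3 refinement layer (RL) applied to the same input features.
    def __init__(self, channels, distilled):
        super().__init__()
        self.dl = nn.Conv2d(channels, distilled, kernel_size=1)
        self.rl = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, fe):
        return self.dl(fe), self.rl(fe)

class FeatureDistillation(nn.Module):
    # Three distill/refine stages plus a final distilled layer; the distilled
    # outputs are concatenated channel-wise as in Eq. (2).
    def __init__(self, channels=50, distilled=25):
        super().__init__()
        self.stages = nn.ModuleList(
            [DistillationStage(channels, distilled) for _ in range(3)])
        self.dl4 = nn.Conv2d(channels, distilled, kernel_size=1)

    def forward(self, fe):
        distilled = []
        for stage in self.stages:
            d, fe = stage(fe)   # Fe_distilled_n and Fe_coarse_n of Eq. (1)
            distilled.append(d)
        distilled.append(self.dl4(fe))
        return torch.cat(distilled, dim=1)   # Eq. (2)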

Fig. 1. (a) RFDB: the residual feature distillation block with three Conv1 layers and one Conv3 layer, where Conv1 and Conv3 represent 1 × 1, 3 × 3 kernel sizes, respectively.
(b) ESA: enhanced spatial attention.

3.2. Proposed deep residual feature distillation network

Although the RFDB is lightweight and powerful, and using a convolution layer with 1 × 1 kernels for the distillation remarkably decreases the number of parameters, the SRB in Fig. 1 uses a 3 × 3 convolution layer, which results in a large number of FLOPs. Therefore, to decrease the number of parameters and FLOPs, we replace the SRB with the DCB.

Deep residual feature distillation block. The proposed DRFDB is divided into three main parts, as shown in the baseline block of Fig. 1a: the first layer is the feature extraction layer, then comes the replacement of the SRB with the DCB, and the final layer is the reconstruction block, where I_lr and I_sr are the input LR and output SR images, respectively. To extract coarse features from the I_lr image, the first layer applies a kernel of size 3 × 3, as follows:

F_0 = h_extraction(I_lr)    (3)

where h_extraction(I_lr) is the first convolution layer and F_0 denotes the extracted coarse features. This is followed by the DRFDBs, which gradually extract features, as shown in the equation below:

F_z = H_z(F_{z-1}), z = 1, ..., n    (4)

where H_z denotes the zth DRFDB function, and the input and output features are the zth DRFDB maps. Additionally, a 1 × 1 convolution layer collects all the intermediate features after successive fine-tuning via the DRFDBs. The aggregated features are smoothed using the proposed DCB with depthwise and pointwise convolution layers followed by a 3 × 3 kernel, as described below:

F_gather = H_gather(Concat(F_1, ..., F_n))    (5)

where H_gather represents the 1 × 1 convolution layer followed by the proposed DCB with depthwise and pointwise convolution layers, and F_gather denotes the gathered features. The final part generates the I_sr image using the gathered features, as follows:

I_sr = F_reconstruction(F_gather + F_0)    (6)

where F_reconstruction describes the reconstruction function of the module, which includes a convolution layer with the same size as the input feature extraction model, a kernel size of 3 × 3, and a non-parametric subpixel operation. The loss function of the proposed DRFDN model is as follows:

L(θ) = (1/N) Σ_{i=1}^{N} ‖H_DRFDN(I_i^lr) − I_i^hr‖_1    (7)

where H_DRFDN is the function of the proposed model, θ denotes the learning parameters of the DRFDN, ‖·‖_1 indicates the L1 norm, and I^lr and I^hr represent the low- and high-resolution images.

Depthwise-separable convolution block. The DRFDB uses the DCB, where CONV3 + CONV3 + RELU forms the feature extraction layer. The main difference between depthwise-separable and typical convolution is that there are no mixed input channels, and the number of output channels is always equal to the number of input channels [40]. We add a pointwise convolution, with a kernel size of 1 × 1, to obtain more global features and reduce the number of parameters in the network. However, depthwise-separable convolutions lead to fewer parameters and eventually reduce the model's PSNR. Thus, we develop a method in which we can still leverage their strength with only a slight reduction in model performance. Instead of using a single channel-attention convolution layer, we use multiple convolution layers with different kernel sizes. Thus, the number of parameters increases; however, we can make up for the accuracy drop, and we modify the DCB into the MKDCB.

Multi-kernel depthwise-separable convolution block. A schematic of the MKDCB is shown in Fig. 2, where we transform the DCB into an MKDCB by removing all distilled layers, Eq. (8), and receiving input from the extracted features F_0, as outlined in Eq. (3).

Fe_coarse1 = RL_1(Fe_in)
Fe_coarse2 = RL_2(Fe_coarse1)    (8)
Fe_coarse3 = RL_3(Fe_coarse2)
Fe_coarse4 = RL_4(Fe_coarse3)

where RL_j represents the jth refinement layer, which extracts features from the Fe_in input, and Fe_coarsej is the jth coarse feature. Then, Fe_coarse4 is fed to a 3 × 3 convolution layer, and the extracted features pass into the ESA block to obtain the output of the DRFDB. Following this, we add the initial input Fe_in to the final output. Then we have:

Fe_coarse = Fe_esa_extracted + Fe_in    (9)
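The depthwise-separable factorization of the DCB and the multi-kernel refinement chain of Eq. (8) can be sketched in PyTorch as follows. This is a minimal illustration under our own naming and channel assumptions (the ESA block is omitted for brevity), not the exact released code:

import torch
import torch.nn as nn

class DSConv(nn.Module):
    # Depthwise-separable convolution: a per-channel (depthwise) k x k
    # convolution followed by a 1x1 pointwise convolution that mixes channels.
    def __init__(self, c_in, c_out, k):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class MKDCBSketch(nn.Module):
    # Multi-kernel refinement chain of Eq. (8): successive refinement layers,
    # each taking its kernel size from a configurable list.
    def __init__(self, c_in=50, c_mid=64, kernels=(1, 3, 5, 7)):
        super().__init__()
        chans = [c_in] + [c_mid] * (len(kernels) - 1) + [c_in]
        self.rls = nn.ModuleList(
            [DSConv(chans[i], chans[i + 1], k) for i, k in enumerate(kernels)])
        self.act = nn.SiLU()

    def forward(self, fe_in):
        fe = fe_in
        for rl in self.rls:
            fe = self.act(rl(fe))   # Fe_coarse_j = RL_j(Fe_coarse_{j-1})
        return fe + fe_in           # residual addition of Fe_in, cf. Eq. (9)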

Fig. 2. The modified DRFDB, in which we use the multi-kernel depthwise-separable convolution block (MKDCB) with different kernel sizes. Within each MKDCB, Conv5 denotes a layer with a 5 × 5 kernel, Conv1 a 1 × 1 kernel, and Conv3 a 3 × 3 kernel; SiLU denotes the activation function (α = 0.05).

As shown in Algorithm 1, the proposed model starts with LR images as the input. The goal is to process these images through the DRFDN model to produce super-resolved, high-definition images. Training involves multiple iterations, where each iteration consists of sampling a batch of low-resolution images and then processing them through the layers of the model. These layers include initial convolutional layers for coarse feature extraction and specialized blocks like the MKDCB for further feature refinement. Similar to the DCB, the number of input channels is maintained at 50; the difference is that, to extract broader functionality, we use a middle channel count of 64. In addition, in the proposed model the number of input channels is equal to the number of output channels. Nevertheless, we retain the primary concept of the DCB, the depthwise-separable convolution layer; the unique aspect of the MKDCB is that each layer obtains a list of kernel sizes. This list of kernels equips the entire block with the ability to utilize various numbers of kernels, the quantity of which depends on the number of kernel sizes (e.g., 1 or 2). Thus, we use different kernel sizes to avoid vanishing gradients while using the L1 loss. To manage more kernels, we use concatenation, as gathering all coarse features exhibited significant results in the baseline [12]. In subsequent observations, we employ the SiLU activation function, which has a shape similar to the ReLU function. However, in contrast to ReLU, SiLU is non-monotonic. The non-monotonic attribute of SiLU indicates that it can introduce more complex and nuanced transformations to the data, potentially capturing patterns that monotonic functions such as ReLU cannot, and it has a smooth derivative that can facilitate training stability. In the delineated algorithm, after feature refinement through the specialized convolutional blocks, the resultant feature maps are concatenated. This consolidated feature set is subsequently subjected to a series of fully connected layers, culminating in the generation of super-resolved imagery. The fidelity of these images is rigorously evaluated against the benchmark dataset of high-resolution images. A loss metric is computed from this comparative analysis, serving as a quantitative measure of the super-resolution performance of the model. Iterative optimization of the parameters of the proposed model ensues, guided by the gradient of the computed loss with respect to the weights. This iterative refinement is instrumental in enhancing the capability of the model to synthesize images of superior resolution, thereby incrementally improving performance across successive training epochs. The optimization process converges upon a model configuration that is adept at producing images that closely approximate the ground truth in terms of resolution and detail fidelity. As mentioned in Algorithm 1, the DRFDB uses the DCB, where a combination of Conv3 (3 × 3 kernel) and Conv1 (1 × 1 kernel) layers is used for feature extraction. The MKDCB is the modified DCB, which incorporates multiple convolution layers with different kernel sizes (Conv5 for 5 × 5, Conv1 for 1 × 1, and Conv3 for 3 × 3), Fig. 2. In Table 3 we provide a quantitative comparison of various methods, including DRFDN, on benchmark datasets. This comparison includes parameters (in millions), FLOPs (in billions), PSNR (in dB), and the unitless efficacy; because we normalize all units before calculating the efficacy, the efficacy avoids unit estimation. The total number of parameters in a model depends on the configuration of each convolutional layer and any fully connected layers. As the proposed model does not contain fully connected layers, we count only convolution layers. Furthermore, the parameters of the layers are determined by the number and size of the convolutional kernels. Floating-point operations are calculated based on the operations in each layer of the network; this includes the number of convolutions, activations, and other computational processes. Memory usage is influenced by the size of the model parameters and the complexity of the operations. For more detailed information, we provide our GitHub link in Appendix A.

3.3. Embedded device profiling and formulation

Embedded devices are small-scale miniaturized resources with constrained computational and storage capabilities. The profile of an embedded device is a function of the primary memory, processing speed, number of simultaneous input/output (IO) operations, and number of connected resources such as sensors and actuators. A model deployed on embedded resources is desired to consume the fewest resources with minimal or no sacrifices in overall model performance. The efficacy ψ of a model ζ for a particular IoT device d, when the model is deployed on it, must be maximized: the higher its efficacy, the more suitable it is for that particular device. The efficacy ψ, which is a function of model parameters and device attributes, is described below. An ith device d_i with attributes memory d_mi, storage s_i, d_io, and d_c is represented by a vector, as shown in Eq. (10):

d_i = [d_mi  d_io  d_c  s_i]^T    (10)

Similarly, the super-resolution model ζ_i is characterized by different measures such as the number of parameters, inference time, and number of FLOPs. We characterize the efficacy of model ζ_i for device d_i as directly proportional to the model PSNR and inversely proportional to the model parameters, FLOPs, memory consumption, and inference time, as characterized in Eq. (11). That is:

ψ = ζ(PSNR) / (ζ(param) + ζ(memory) + ζ(flops) + ζ(t))    (11)
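Since the normalization of the attribute units is described only at a high level in this paper, the following Python sketch illustrates Eq. (11) with a simple min-max normalization across the compared models; the exact normalization used for the reported efficacy scores may differ:

# Illustrative computation of the efficacy psi of Eq. (11). Min-max
# normalization across the compared models is our assumption; the paper
# states only that all attribute units are normalized first.
def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def efficacy(psnr, params, memory, flops, time_ms, eps=1e-8):
    # psi = zeta(PSNR) / (zeta(param) + zeta(memory) + zeta(flops) + zeta(t)),
    # evaluated per model after column-wise normalization.
    n_psnr, n_par, n_mem, n_flops, n_t = (
        normalize(c) for c in (psnr, params, memory, flops, time_ms))
    return [p / (a + b + c + d + eps)
            for p, a, b, c, d in zip(n_psnr, n_par, n_mem, n_flops, n_t)]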

Fig. 3. Illustration of the efficacy of DRFDN versus recent state-of-the-art NTIRE2022 [41] efficient super-resolution challenge participants on the DIV2K dataset at scale factor x4: (a) efficacy by number of parameters, (b) efficacy by FLOPs, (c) efficacy by PSNR, and (d) efficacy by memory consumption.

We define a weight matrix W as an m × n matrix, where n is the number of model attributes that must be optimized and whose entries are the device profile coefficients. W for model i and device j takes the form shown in Eq. (12):

      ⎡ w_11 w_12 w_13 … w_1n ⎤   ⎡ ζ_1 ⎤
      ⎢ w_21 w_22 w_23 … w_2n ⎥   ⎢ ζ_2 ⎥
W =   ⎢ w_31 w_32 w_33 … w_3n ⎥ × ⎢ ζ_3 ⎥    (12)
      ⎢  ⋮    ⋮    ⋮       ⋮  ⎥   ⎢  ⋮  ⎥
      ⎣ w_m1 w_m2 w_m3 … w_mn ⎦   ⎣ ζ_t ⎦

In this case, we consider m = 4, as mentioned above. Other model parameters include the number of convolution units and the number of activation functions; however, the top four metrics have the highest impact on the device resources. Generally, the objective function, which is the inverse of the model efficacy, is formulated in Eq. (13):

ζ_i = minimize (1/ψ)    (13)

Thus, the efficacy, which is the inverse of the objective function, needs to be maximized for constrained devices. To achieve this, the PSNR of the model has to be maintained above some arbitrary constraint while keeping the other metrics at a minimum.

4. Experiments

4.1. Datasets and metrics

We use 800 high-quality RGB images from the widely used DIV2K dataset to train the proposed method. Five benchmark datasets are used to evaluate the performance of the proposed method: Set5, Set14, B100, Urban100, and Manga109. Furthermore, we utilize PSNR, a commonly used measure in image restoration, to assess the quality of the Y channel after the application of SR, and the formula established in Section 3.3 to evaluate the efficacy ψ.

4.2. Implementation environment

To train our model, we create LR training images using bicubic interpolation in MATLAB to down-sample the HR images with scaling factors of x2, x3, and x4. For the first 1000 epochs, each input LR image is randomly cropped into 384 patches. Image augmentation is performed using random rotations of 90°, 180°, and 270°. We use the L1 loss to facilitate the training of our model. To compile the network, we optimize it using the ADAM optimizer. Initially, the learning rate is set at 1 × 10⁻⁴ and is then halved to 5 × 10⁻⁵ after 200 epochs, while maintaining the same number of patches for each iteration and a stable learning rate. Each MKDCB involves several iterations, the quantity of which depends on the length of the kernel-size list. For the second set of 1000 epochs, we fine-tune our model using the L2 loss function and increase the number of patches to 512 for each LR image. The entire implementation is performed using the PyTorch framework and an NVIDIA 1080Ti GPU (see the training sketch below).

4.3. Ablation study

The components of the proposed architecture are evaluated using two-fold modifications. First, we add the DCB and then the MKDCB. Details of the experiments are provided in the subsequent sections.

4.3.1. Effectiveness of the depthwise-separable convolution layer
After compiling all the research results, we compared the efficacy ψ of the proposed method with those of the participants of the NTIRE2022 challenge [41], because these methods represent the latest lightweight SISR methods. Furthermore, in the challenge, the primary dataset for validation is DIV2K. We also trained the model for DIV2K scale factors of x2, x3, and x4 and calculated the efficacy of each method for x4. The quantitative comparison results are shown in Table 1. The table presents an evaluation of diverse super-resolution algorithms [41] with a focus on the parameters of the baseline model. In this table, the proposed model excels in several key areas.
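As a concrete reference for the training protocol of Section 4.2, a minimal PyTorch setup consistent with that description might look as follows. The model and data loader are stand-ins so the sketch runs end-to-end; the learning-rate values follow the text, while the exact schedule beyond the first halving is our assumption:

import torch
import torch.nn as nn

# Stand-ins: a trivial one-layer model and a single dummy batch replace the
# DRFDN and the DIV2K patch loader so that the sketch runs end-to-end.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)
loader = [(torch.rand(4, 3, 48, 48), torch.rand(4, 3, 48, 48))]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# Halve the learning rate (1e-4 -> 5e-5) every 200 epochs, per Section 4.2.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)
criterion = nn.L1Loss()   # L1 loss of Eq. (7); L2 is used for later fine-tuning

for epoch in range(2):    # the paper trains for 1000 epochs per phase
    for lr_img, hr_img in loader:
        sr = model(lr_img)            # forward pass (Algorithm 1, steps 4-12)
        loss = criterion(sr, hr_img)  # step 13
        optimizer.zero_grad()
        loss.backward()               # step 14
        optimizer.step()
    scheduler.step()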

Table 2
Quantitative comparison with state-of-the-art methods on benchmark datasets. The best and second-best efficacy and attribute values are shown in blue; the lowest results are shown in red.
Method Params FlOPs Set5 Set14 Urban100 Manga109
[M] [G] PSNR/Eff PSNR/Eff PSNR/Eff PSNR/Eff
Scale x2
SRCNN 0.008 6.1 36.66/62.29 32.42/55.93 29.50/56.22 35.60/56.24
FSRCNN 0.013 1.72 37.00/101.67 32.63/ 86.21 29.88/86.83 36.67/86.23
VDSR 0.666 70.5 37.53/30.11 33.05/29.19 30.77/29.25 37.27/ 29.24
LapSRN 0.251 8.57 37.52/42.58 32.99/41.65 30.41/41.77 37.27/41.73
EDSR 1.37 8679 38.11/5.29 33.92/5.51 32.93/5.50 38.40/5.51
MemNet 0.678 762.87 37.78/9.57 33.28/9.70 31.31/9.70 37.72/9.71
CARN 1.592 63.84 37.76/29.17 33.52/28.39 32.09 /28.38 38.36/28.40
IMDN 0.694 45.23 38.00/81.23 33.64/81.79 32.17/81.72 37.80/81.12
baseline 0.534 27.1 38.05/99.00 33.63/99.00 32.12/99.00 38.88/99.00
DRFDN (ours) 0.267 14.99 38.08/ 167.99 33.87/159.75 32.20/165.42 38.96/163.85
Scale x3
SRCNN 0.008 6.1 32.75/61.52 29.30/62.29 26.24/60.30 30.48/ 75.80
FSRCNN 0.013 1.72 33.18/100.82 29.37/100.98 26.43/97.96 31.10/96.58
VDSR 0.666 70.5 33.66/29.86 29.77/29.88 27.14/29.37 32.01/29.02
LapSRN 0.251 8.57 33.81/42.42 29.79/42.30 27.07/41.43 32.21/41.30
EDSR 1.37 8679 33.89/5.17 29.81/5.14 28.54/6.21 33.45/6.21
MemNet 0.678 762.87 34.04/9.54 30.32/9.61 28.17/9.62 33.61/9.62
CARN 1.592 6363.84 34.29/29.28 30.32/29.31 28.17/29.34 33.50/29.2
IMDN 0.694 45.23 34.36/81.22 29.09/77.81 28.21/81.33 33.61/81.19
baseline 0.534 27.1 34.41/99.00 30.41/99.00 28.21/99.00 33.67/99.00
DRFDN (ours) 0.267 14.99 34.45/165.56 30.56/167.71 28.32/174.66 33.78/168.50
Scale x4
SRCNN 0.008 6.1 30.48/68.78 27.50/61.32 24.52/61.53 27.58/61.76
FSRCNN 0.013 1.72 37.00/135.04 32.63/109.67 29.88/109.73 36.67/108.48
VDSR 0.666 70.5 31.35/33.40 28.01/32.27 25.18/32.31 28.83/32.34
LapSRN 0.251 8.57 37.52/56.55 32.99/54.16 30.41/54.14 37.27/53.84
EDSR 1.37 8679 32.35/5.93 28.64/6.21 26.62/6.21 31.02/6.71
MemNet 0.678 762.87 37.78/12.71 33.28/12.81 31.31/12.80 37.72/12.80
CARN 1.37 101.44 38.05/39.04 33.64/37.44 32.23/37.32 38.05/37.28
IMDN 0.694 45.23 28.80/81.76 26.86/82.27 24.71/82.23 28.80/82.21
baseline 0.534 27.1 28.65/99.00 26.83/99.00 24.64/99.00 28.65/99.00
DRFDN (ours) 0.267 14.99 28.73/176.13 26.95/150.82 24.70/ 151.22 28.71/151.46

Moreover, it demonstrates high image quality, as indicated by its PSNR, which closely rivals the top-performing models. When it comes to operational efficiency, the DRFDN model processes images rapidly, evidenced by its low validation time, making it one of the fastest among the compared methods. In terms of model complexity, DRFDN stands out with the fewest parameters, reflecting a streamlined architecture that does not compromise performance despite being lightweight. This characteristic is particularly beneficial for deployment in systems where memory and storage are at a premium. The computational efficiency of the model is further highlighted by its low number of FLOPs, suggesting that it requires less computational power for image processing tasks. Moreover, the efficacy score of the DRFDN model is the highest in the table, which implies that it offers the best balance between image quality, computational speed, and resource usage. As mentioned in Section 3.3, embedded devices are compact, miniaturized systems with limited computing and storage capabilities. Therefore, the SISR methods designed for embedded systems require minimal primary memory and processing speed. They also limit the number of concurrent input/output (IO) operations and the number of connected resources, such as sensors and actuators. The DRFDB efficiently distills and concentrates essential image features, which is critical for super-resolution tasks. This efficiency becomes particularly evident in scenarios with complex textures or fine details, where the proposed model preserves more information compared to traditional models. To calculate the efficacy of the model and assess its suitability for specific devices, we first normalize each attribute unit, including model PSNR, memory, number of parameters, FLOPs, and inference time. The lightweight nature of the model ensures lower computational requirements, making it ideal for IoT devices. This is clearly observed in tests with resource-constrained devices, where the proposed model maintains high performance levels, unlike heavier models that suffer performance degradation. Moreover, the MKDCB allows the model to adapt more effectively to varied image features. This adaptability is particularly beneficial in images with diverse patterns and edges, where the proposed model shows a noticeable improvement in edge sharpness and texture clarity. Furthermore, we estimate the efficacy of the method using Eq. (11) to validate results with the DIV2K dataset at a scale factor of x4, as shown in Table 1. The LWFANet model [41] had the highest PSNR and the lowest efficacy. As explained in Section 3.3, the efficacy of a model is directly proportional to its PSNR and inversely proportional to its other attributes. In the case of LWFANet, those other attributes increase, leading to a decrease in ψ. Due to the deeper and more intricate architecture of LWFANet, it requires more computational resources. This involves more convolutions and more intensive calculations, which translates to higher processing power and longer inference time. In IoT devices, memory and power are precious resources. The complex architecture of LWFANet demands more memory for storing model parameters and intermediate computations, as well as more power for processing, making it less suitable for lightweight applications. The proposed method exhibits superior performance when juxtaposed with the other methods, satisfying all the stringent requirements stipulated by IoT devices. Although the strategic reduction of attributes is inversely correlated with model efficacy, the proposed network is superior to its counterparts in terms of efficacy. Furthermore, the proposed model distinctly illustrates the magnitude of improvement in attributes such as the number of parameters, inference time, FLOPs, and memory consumption relative to the baseline [12]. The impact of each attribute on the efficacy of the SISR methods is shown in Fig. 3. In addition, we conduct a comparative analysis of the efficacy of the test performances of the NTIRE2022 models [41] across four datasets, Urban100, Manga109, Set14, and Set5, at a scale factor of x4, as presented in Table 3, and we illustrate visual comparisons of those models using the DIV2K dataset in Fig. 4, Set5 in Fig. 5, and Set14 in Fig. 6.

Table 3
Comparison of the performances of the proposed model and other models.
Method Test PSNR [dB] Test Time [ms] Params [M] FLOPs [G] Mem [M] Efficacy
Urban100 x4
baseline 24.64 12.29 0.433 27.1 788.13 99
IMDN 24.71 10.08 0.894 58.53 468.79 64.04
EFDN 24.24 11.3 0.272 16.86 572.73 138.27
NLFFC 22.46 176.06 0.423 423.16 3208.88 10.42
PRRN 24.71 28.27 0.414 24.8 1453.31 71.41
LWFANet 24.9 48.36 0.832 135.3 4419.71 25.11
DRFDN(ours) 24.70 11.9 0.267 14.99 503.99 144.77
Manga109 x4
baseline 28.65 11.54 0.433 27.1 788.13 99
IMDN 28.8 11.83 0.894 58.53 468.79 62.53
EFDN 27.94 10.88 0.272 16.86 572.73 136.17
NLFFC 24.65 228.88 0.423 423.16 3208.88 8.72
PRRN 28.86 27.27 0.414 24.8 1453.31 71.09
LWFANet 28.92 58.35 0.832 135.3 4419.71 23.75
DRFDN(ours) 28.71 10.29 0.267 14.99 503.99 147.39
Set5 x4
baseline 28.65 8.31 0.433 27.1 788.13 99
IMDN 28.80 7.97 0.894 58.53 468.79 62.62
EFDN 27.94 8.38 0.276 16.86 662.51 132.31
NLFFC 26.72 84.69 0.423 423.16 3208.88 11.59
PRRN 28.88 20.55 0.414 24.8 1462.66 69.57
LWFANet 28.94 22.99 0.832 135.3 3403.69 28.53
DRFDN(ours) 28.73 8.65 0.267 14.99 503.99 142.96
Set14 x4
baseline 26.83 9.11 0.433 27.1 788.13 99
IMDN 26.86 9.67 0.894 58.53 468.79 68.28
EFDN 26.77 10.05 0.276 16.73 662.51 125.75
NLFFC 25.24 82 0.423 423.16 3208.88 11.79
PRRN 26.85 20.69 0.414 24.8 1462.66 67.09
LWFANet 26.93 27.74 0.832 135.3 3403.69 26.24
DRFDN(ours) 26.95 10.48 0.267 14.99 503 138.14

In the Urban100 dataset, the DRFDN model achieves a PSNR of 24.70 dB, which is competitive with other state-of-the-art methods. It also boasts the lowest test time at 11.9 ms, demonstrating rapid processing capability. The architectural efficiency of the model is evident in its minimal parameter count of 0.267M and its FLOPs of 14.99G, indicating a lightweight design with reduced computational demands. Despite its lean architecture, the DRFDN model does not sacrifice performance, achieving a high efficacy score of 144.77, which suggests a superior balance between image quality and resource utilization. A similar pattern is observed in the Manga109 dataset, where the DRFDN model presents a PSNR of 28.71 dB, again showing high-quality image enhancement capability. The model maintains a swift test time of 10.29 ms, reinforcing its potential for real-time applications. With the lowest parameters and FLOPs among the compared models, the DRFDN model underscores its design optimization. The efficacy score of the proposed model, 147.39, is the highest among the methods evaluated, further establishing its proficiency. In the Set5 dataset, the DRFDN model reaches 28.73 dB and maintains a quick test time of 8.65 ms, the fastest in this dataset. The parameters and FLOPs remain the lowest, and the efficacy score is high at 142.96, reinforcing the balanced performance of the model. Finally, on the Set14 dataset, the DRFDN model attains a PSNR of 26.95 dB with a test time of 10.48 ms. The model continues to exhibit the fewest parameters and FLOPs, ensuring a lightweight model conducive to various deployment scenarios. The efficacy score of 138.14 is indicative of the model's overall efficiency and effectiveness (see Table 3).

4.3.2. Comparison of the proposed model with the lightweight SOTA models for real-time IoT devices
Table 4 presents the comparison of five lightweight models [20,39,43–45] and the proposed model for real-time IoT devices. Fig. 7 shows a visual representation of each model.

Table 4
Validation performance of the proposed model and lightweight models for real-time IoT devices. The best and second-best efficacy and attribute values are shown in blue; the lowest results are in red.
Model      RLFN   BSRN   AIDN    HNCT    EIMDN  DRFDN
Val PSNR   28.88  28.78  28.85   28.83   28.6   28.95
Val time   95.06  132.3  114.03  157.89  110.4  93.23

The residual local feature network (RLFN) [43] is a model developed for efficient SISR and the winner of the NTIRE2022 super-resolution challenge. This model stands out for its suitability in resource-constrained environments, such as IoT devices, due to its lightweight and fast architecture. RLFN primarily achieves this by simplifying the network structure, employing a minimal number of layers, and optimizing the connections between them. However, its streamlined and simplified architecture, while beneficial for reducing computational load, might not capture complex image features as effectively as more elaborate models. Although these issues with image features potentially impact the quality of super-resolution in images with intricate details or textures, the validation time of the model is the second best for real-time IoT devices. The blueprint separable residual network (BSRN) [39] is an innovative model that is adept at reconstructing HR images from LR inputs with a focus on efficiency. Its architecture, drawing inspiration from the RFDN, employs blueprint separable convolutions (BSConv) to optimize convolutional operations, thereby reducing redundancy. Additionally, the model integrates advanced attention mechanisms, such as enhanced spatial attention and contrast-aware channel attention, to bolster its feature representation capabilities. Despite these advancements, the specialized architecture of BSRN, primarily tailored for efficiency, faces challenges in scenarios requiring extreme computational efficiency or specific adaptations for certain hardware environments. Moreover, for real-time IoT devices, BSRN shows lower results compared with the other models.

Fig. 4. Visual comparisons of DRFDN with other lightweight SR methods [42] on the DIV2K (x4) dataset.

Fig. 5. Visual comparisons of DRFDN with other lightweight SR methods [42] on the Set5 (x4) dataset.

Fig. 6. Visual comparisons of DRFDN with other lightweight SR methods [42] on the Set14 (x4) dataset.

The asymmetric information distillation network (AIDN) [45] is an innovative model designed for lightweight image super-resolution, aiming to deliver performance akin to more complex networks like SRResNet but with significantly fewer parameters. The main block of AIDN is the asymmetric information distillation block (AIDB), which employs a unique method of processing distilled information through 1 × 1 convolution operations (sketched below). This approach is complemented by the asymmetric information enhancement block (AIEB), which enhances image features by focusing on different directions, effectively boosting the performance of the model. Although the AIDN demonstrates a high PSNR, its inference speed is lower because of the depth of the model. The hybrid network of CNN and transformer (HNCT) [20] is a sophisticated model designed to enhance single-image super-resolution by merging the strengths of CNNs and Transformers. HNCT consists of several components: a shallow feature extraction module, hybrid blocks of CNN and transformer (HBCTs), a dense feature fusion module, and an up-sampling module. This architecture enables HNCT to extract both local and non-local image features effectively. Despite its innovative approach, HNCT has some limitations, because the hybrid structure of the model, while beneficial for capturing a wide range of features, can introduce additional computational complexity compared to purely CNN-based methods. This leads to slightly slower inference speeds, which is the limiting factor in certain real-time applications. As shown in the table, HNCT demonstrates the longest inference time among the compared models.
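To make the information-distillation idea behind blocks such as the AIDB concrete, the following is a hypothetical PyTorch sketch of one distillation step: the feature map is split, the distilled part is retained through a 1 × 1 convolution, and the remainder is passed on for further refinement. The split ratio, channel counts, and module names are assumptions, not the AIDN authors' code.

```python
import torch
import torch.nn as nn

class DistillStepSketch(nn.Module):
    """Illustrative distillation step: split the feature map into a
    'distilled' part kept via a 1x1 convolution and a 'coarse' part
    refined further with a 3x3 convolution."""
    def __init__(self, channels, distill_ratio=0.5):
        super().__init__()
        self.d_ch = int(channels * distill_ratio)   # distilled channels
        self.r_ch = channels - self.d_ch            # remaining channels
        self.distill = nn.Conv2d(self.d_ch, self.d_ch, kernel_size=1)
        self.refine = nn.Conv2d(self.r_ch, self.r_ch, kernel_size=3, padding=1)

    def forward(self, x):
        d, r = torch.split(x, [self.d_ch, self.r_ch], dim=1)
        return self.distill(d), self.refine(r)

d, r = DistillStepSketch(64)(torch.randn(1, 64, 24, 24))
print(d.shape, r.shape)  # both torch.Size([1, 32, 24, 24])
```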


Fig. 7. Visual comparisons of DRFDN with other lightweight SR methods [41] for real-time IoT devices. The data used is the hall on the 5th floor of the AI building at Gachon University.

The enhanced information multiple distillation network (EIMDN) [44] is a lightweight single-image super-resolution network, tailored for use in portable devices with limited computing power and storage. Building upon the IMDN, EIMDN integrates the Ghost module to streamline the feature extraction process, significantly reducing the number of parameters and computational load. The network also employs a coordinated attention mechanism for better channel attention and spatial awareness, and uses a feedback mechanism to enhance the fusion of low- and high-level features, thereby improving the reconstruction effect. Despite these advancements, EIMDN faces limitations in handling extremely HR tasks or complex image scenarios, such as those in medical or remote sensing fields, as demonstrated in Table 5, where the PSNR of the model is the lowest while its inference time reaches 110.4 ms.

The proposed method addresses the handling of different types of images, such as natural scenes, faces, or text, through the specific block called the MKDCB (sketched at the end of this subsection). This block allows the model to adapt more effectively to varied image features, an adaptability that is particularly beneficial for images with diverse patterns and edges. The proposed model demonstrates noticeable improvement in edge sharpness and texture clarity, which are crucial aspects of super-resolution, especially in intricate images. Moreover, the model is designed to ensure lower computational requirements, making it ideal for IoT devices. It maintains high performance levels in tests with resource-constrained devices, unlike heavier models that tend to suffer from performance degradation. The model demonstrates superior performance in terms of PSNR and test time, as indicated by the results. As Table 5 presents, the DRFDN is characterized by its lightweight nature and robustness, making it well-suited for IoT applications. Notably, the model achieves the fastest inference time, surpassing even the validation time on the DIV2K dataset. However, it is important to note a slight compromise in PSNR, which is marginally lower at 28.95.

In this study, the models for real-time IoT devices are tested on the standard TurtleBot3 robot platform, upgraded with a Raspberry Pi 4, an 8 MP camera, and an enhanced 360° LIDAR.
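The authors' exact MKDCB implementation is provided in the repository referenced in Appendix A (mkdc_block.py); the sketch below only illustrates the multi-kernel depthwise-separable idea, with the branch count, kernel sizes, and residual connection chosen purely for illustration.

```python
import torch
import torch.nn as nn

class MKDCBSketch(nn.Module):
    """Illustrative multi-kernel depthwise-separable block: parallel
    depthwise convolutions with different kernel sizes, fused by a
    pointwise (1x1) convolution."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2,
                      groups=channels, bias=False)   # depthwise branch
            for k in kernel_sizes
        ])
        # Pointwise convolution fuses the concatenated branch outputs.
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels,
                              kernel_size=1, bias=False)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        multi = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.act(self.fuse(multi)) + x  # residual connection

print(MKDCBSketch(48)(torch.randn(1, 48, 32, 32)).shape)
```

The parallel branches see the same input at three receptive-field sizes, which is one plausible way to realize the adaptability to diverse patterns and edges described above.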


4.4. Comparison with the state-of-the-art methods

In this section, we compare the efficacies of state-of-the-art SR methods with the proposed DRFDN method. Table 2 presents the comparative analysis of the efficacy of the SISR methods across multiple datasets (Set5, Set14, Urban100, and Manga109) for scaling factors (x2, x3, x4). In addition, each model is evaluated based on parameters, computational complexity, and effectiveness; the compared methods include SRCNN [1], FSRCNN [46], VDSR [47], LapSRN [48], EDSR [5], MemNet [49], CARN [50], IMDN [9], and the baseline [12]. For the x2 scaling factor, SRCNN, a foundational model in the field, demonstrates modest efficacy, while models like VDSR and EDSR, with more complex architectures, exhibit higher PSNR values, indicative of superior image quality. Although EDSR [5] exhibits the highest PSNR performance at x2, its efficacy is the least favorable among the state-of-the-art methods. As delineated in Section 3.3, the efficacy of a method is directly proportional to the PSNR and inversely proportional to the other attributes (an illustrative sketch follows this paragraph). In the case of EDSR [5], both the number of parameters and FLOPs were considerably high, which consequently led to an extremely low efficacy. Notably, MemNet and CARN present a substantial increase in FLOPs, which correlates with their increased depth and complexity, yet this does not consistently translate to proportionate gains in PSNR, highlighting a trade-off between computational demand and performance. At the x3 scaling factor, the increase in model parameters for algorithms like FSRCNN and LapSRN yields marginal improvements in PSNR, suggesting diminishing returns for added complexity. The DRFDN model, while maintaining lower parameter counts and computational costs, offers competitive PSNR values, reflecting its efficiency. As the challenge intensifies with the x4 scaling factor, the DRFDN model continues to deliver competitive performance with a notable balance between efficacy and computational efficiency, particularly on the Urban100 dataset, where intricate textural details are prevalent. In contrast, while EDSR and MemNet achieve high PSNR values, their significantly higher FLOPs indicate a heavier computational complexity. The analysis across datasets reveals that models with larger parameter spaces and higher computational requirements do not consistently outperform more optimized models like DRFDN. This suggests that careful architectural choices can lead to models that are both computationally economical and capable of high-quality image upscaling, which is essential for practical deployment scenarios.
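The precise efficacy formula is defined in Section 3.3, which lies outside this excerpt; the helper below is only a hypothetical sketch of a score that increases with PSNR and decreases with the cost attributes. Both its functional form and its constants are assumptions, not the paper's formula.

```python
def efficacy_sketch(psnr_db, params_m, flops_g, memory_mb, runtime_ms):
    """Hypothetical efficacy score: rewards PSNR, penalizes resource use.
    The weighting of the cost terms is illustrative only."""
    cost = params_m * flops_g * memory_mb * runtime_ms
    return psnr_db / cost ** 0.25  # dampen the cost product for scale

# Illustrative comparison of a lightweight model vs. a heavy one.
print(efficacy_sketch(28.7, 0.3, 20.0, 50.0, 8.7))     # lightweight: ~4.0
print(efficacy_sketch(29.1, 43.0, 2900.0, 500.0, 90))  # heavy: ~0.1
```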
While all SISR methods achieved high PSNR performances, the corresponding attributes (the number of parameters, FLOPs, memory consumption, and runtime) reach substantial quantities, resulting in a significant reduction in efficacy. In this scenario, the proposed model demonstrates superior efficacy, with an improvement of approximately 50% relative to the baseline in all instances. Though methods such as SRCNN [1] and FSRCNN [46] have fewer parameters, their significant memory consumption renders them less practical for embedded devices. In contrast, our method is more suitable for IoT edge nodes.
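To see why the depthwise-separable factorization underlying the proposed DCB keeps parameter counts low, the short calculation below compares a standard 3 × 3 convolution with a depthwise-plus-pointwise pair; the channel width of 64 is chosen purely for illustration.

```python
c_in, c_out, k = 64, 64, 3

# Standard convolution: every output channel filters all input channels.
standard = c_in * c_out * k * k                 # 36,864 parameters

# Depthwise-separable: per-channel k x k filters, then a 1x1 channel mix.
depthwise = c_in * k * k                        # 576 parameters
pointwise = c_in * c_out                        # 4,096 parameters
separable = depthwise + pointwise               # 4,672 parameters

print(f"standard: {standard}, separable: {separable}, "
      f"ratio: {standard / separable:.1f}x fewer")  # about 7.9x fewer
```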
5. Conclusions

In this study, we successfully develop the DRFDN, achieving notable improvements in single-image super-resolution tasks for IoT devices. The effectiveness of the model is validated through rigorous testing on multiple datasets, where it consistently delivered superior performance in image quality (measured by PSNR) compared to current state-of-the-art methods. The introduction of the DCB and MKDCB within the DRFDN represents a significant advancement in the field. These components are pivotal in reducing the runtime of the model, its parameter count, and its FLOPs, thereby addressing the primary challenge of deploying high-performance super-resolution in resource-constrained environments like IoT devices. The results and methodologies presented in this study have far-reaching implications, particularly for enhancing remote monitoring, where high-quality image resolution is crucial yet constrained by the capabilities of IoT devices. However, it is important to note a slight compromise in PSNR, which is marginally lower for real-time IoT devices, while reaching a lower inference time. Future research could explore the integration of this model into various real-world IoT applications, examining its adaptability and further optimizing its performance to meet diverse needs.

CRediT authorship contribution statement

Sevara Mardieva: Writing – review & editing, Writing – original draft, Software, Methodology, Formal analysis, Data curation, Conceptualization. Shabir Ahmad: Supervision, Project administration, Formal analysis, Data curation. Sabina Umirzakova: Writing – review & editing, Writing – original draft, Visualization, Software. M.J. Aashik Rasool: Visualization, Project administration, Methodology. Taeg Keun Whangbo: Writing – review & editing, Supervision, Investigation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Our dataset is open access.

Acknowledgments

This research was supported by the Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2023 (Project Name: Cultural Technology Specialist Training and Project for Metaverse Game, Project Number: RS-2023-00227648, Contribution Rate: 100%).

Appendix A

At https://github.com/sevaramardi/deep_r_f_d_n, we provide the link to our GitHub account, where the Python codes of the proposed model are located. In the repository deep_r_f_d_n, there are three major .py files for running the code: main.py and mkdc_block.py are for training the model, and test.py is for testing it. Moreover, we share the pretrained model of the proposed method as a .pth file for scale factor (x4).

Appendix B. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.knosys.2023.111343.

References

[1] C. Dong, C.C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2) (2015) 295–307.
[2] C. Dong, C.C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell. 38 (2) (2015) 295–307.
[3] W. Shi, J. Caballero, F. Huszár, J. Totz, A.P. Aitken, R. Bishop, D. Rueckert, Z. Wang, Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1874–1883.
[4] S. Ahmad, S. Malik, I. Ullah, M. Fayaz, D.-H. Park, K. Kim, D. Kim, An adaptive approach based on resource-awareness towards power-efficient real-time periodic task modeling on embedded IoT devices, Processes 6 (7) (2018) 90.
[5] B. Lim, S. Son, H. Kim, S. Nah, K. Mu Lee, Enhanced deep residual networks for single image super-resolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144.


[6] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., Photo-realistic single image super-resolution using a generative adversarial network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681–4690.
[7] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84–90.
[8] S. Umirzakova, S. Ahmad, L.U. Khan, T. Whangbo, Medical image super-resolution for smart healthcare applications: A comprehensive survey, Inf. Fusion (2023) 102075.
[9] Z. Hui, X. Gao, Y. Yang, X. Wang, Lightweight image super-resolution with information multi-distillation network, in: Proceedings of the 27th ACM International Conference on Multimedia, 2019, pp. 2024–2032.
[10] Z. Hui, X. Wang, X. Gao, Fast and accurate single image super-resolution via information distillation network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 723–731.
[11] A. Lugmayr, M. Danelljan, R. Timofte, M. Fritsche, S. Gu, K. Purohit, P. Kandula, M. Suin, A. Rajagopalan, N.H. Joon, et al., AIM 2019 challenge on real-world image super-resolution: Methods and results, in: 2019 IEEE/CVF International Conference on Computer Vision Workshop, ICCVW, IEEE, 2019, pp. 3575–3583.
[12] J. Liu, J. Tang, G. Wu, Residual feature distillation network for lightweight image super-resolution, in: Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part III, Springer, 2020, pp. 41–55.
[13] K. Zhang, M. Danelljan, Y. Li, R. Timofte, J. Liu, J. Tang, G. Wu, Y. Zhu, X. He, W. Xu, et al., AIM 2020 challenge on efficient super-resolution: Methods and results, in: Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part III, Springer, 2020, pp. 5–40.
[14] L. Yu, X. Li, Y. Li, T. Jiang, Q. Wu, H. Fan, S. Liu, DIPNet: Efficiency distillation and iterative pruning for image super-resolution, 2023, arXiv preprint arXiv:2304.07018.
[15] G. Zhou, W. Chen, Q. Gui, X. Li, L. Wang, Split depth-wise separable graph-convolution network for road extraction in complex environments from high-resolution remote-sensing images, IEEE Trans. Geosci. Remote Sens. 60 (2021) 1–15.
[16] Z. Wang, D. Liu, J. Yang, W. Han, T. Huang, Deep networks for image super-resolution with sparse prior, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 370–378.
[17] V. Nascimento, R. Laroca, J.d.A. Lambert, W.R. Schwartz, D. Menotti, Combining attention module and pixel shuffle for license plate super-resolution, in: 2022 35th SIBGRAPI Conference on Graphics, Patterns and Images, Vol. 1, SIBGRAPI, IEEE, 2022, pp. 228–233.
[18] J. Kim, J.K. Lee, K.M. Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.
[19] S. Woo, J. Park, J.-Y. Lee, I.S. Kweon, CBAM: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 3–19.
[20] J. Fang, H. Lin, X. Chen, K. Zeng, A hybrid network of CNN and transformer for lightweight image super-resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1103–1112.
[21] K. O'Shea, R. Nash, An introduction to convolutional neural networks, 2015, arXiv preprint arXiv:1511.08458.
[22] J. Kim, J.K. Lee, K.M. Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.
[23] Q. Zhu, P. Li, Q. Li, Attention retractable frequency fusion transformer for image super resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1756–1763.
[24] J. Tang, K. Li, M. Hou, X. Jin, W. Kong, Y. Ding, Q. Zhao, MMT: Multi-way multi-modal transformer for multimodal learning, in: Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI-22, L.D. Raedt (Ed.), International Joint Conferences on Artificial Intelligence Organization, 2022, pp. 3458–3465.
[25] A. Mehri, P.B. Ardakani, A.D. Sappa, MPRNet: Multi-path residual network for lightweight image super resolution, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021, pp. 2704–2713.
[26] W. Bao, X. Yang, D. Liang, G. Hu, X. Yang, Lightweight convolutional neural network model for field wheat ear disease identification, Comput. Electron. Agric. 189 (2021) 106367.
[27] Y. Pang, X. Zhao, L. Zhang, H. Lu, Multi-scale interactive network for salient object detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9413–9422.
[28] X. Zhang, J. Zou, K. He, J. Sun, Accelerating very deep convolutional networks for classification and detection, IEEE Trans. Pattern Anal. Mach. Intell. 38 (10) (2015) 1943–1955.
[29] G. Gendy, G. He, N. Sabor, Lightweight image super-resolution based on deep learning: State-of-the-art and future directions, Inf. Fusion 94 (2023) 284–310.
[30] H. Choi, J. Lee, J. Yang, N-gram in swin transformers for efficient lightweight image super-resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 2071–2081.
[31] Y. Chen, R. Xia, K. Yang, K. Zou, MFFN: Image super-resolution via multi-level features fusion network, Vis. Comput. (2023) 1–16.
[32] B. Sun, Y. Zhang, S. Jiang, Y. Fu, Hybrid pixel-unshuffled network for lightweight image super-resolution, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, No. 2, 2023, pp. 2375–2383.
[33] P. Behjati, P. Rodriguez, C. Fernández, I. Hupont, A. Mehri, J. Gonzàlez, Single image super-resolution based on directional variance attention network, Pattern Recognit. 133 (2023) 108997.
[34] Z. He, D. Chen, Y. Cao, J. Yang, Y. Cao, X. Li, S. Tang, Y. Zhuang, Z.-M. Lu, Single image super-resolution based on progressive fusion of orientation-aware features, Pattern Recognit. 133 (2023) 109038.
[35] L. Fu, H. Jiang, H. Wu, S. Yan, J. Wang, D. Wang, Image super-resolution reconstruction based on instance spatial feature modulation and feedback mechanism, Appl. Intell. 53 (1) (2023) 601–615.
[36] S. Fu, Z. Li, K. Liu, S. Din, M. Imran, X. Yang, Model compression for IoT applications in Industry 4.0 via multiscale knowledge transfer, IEEE Trans. Ind. Inf. 16 (9) (2019) 6013–6022.
[37] G. Yao, Z. Li, B. Bhanu, Z. Kang, Z. Zhong, Q. Zhang, MTKDSR: Multi-teacher knowledge distillation for super resolution image reconstruction, in: 2022 26th International Conference on Pattern Recognition, ICPR, IEEE, 2022, pp. 352–358.
[38] H. Li, Y. Yang, M. Chang, S. Chen, H. Feng, Z. Xu, Q. Li, Y. Chen, SRDiff: Single image super-resolution with diffusion probabilistic models, Neurocomputing 479 (2022) 47–59.
[39] Z. Li, Y. Liu, X. Chen, H. Cai, J. Gu, Y. Qiao, C. Dong, Blueprint separable residual network for efficient image super-resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 833–843.
[40] X. Zhu, D. Cheng, Z. Zhang, S. Lin, J. Dai, An empirical study of spatial attention mechanisms in deep networks, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6688–6697.
[41] Y. Li, K. Zhang, R. Timofte, L. Van Gool, F. Kong, M. Li, S. Liu, Z. Du, D. Liu, C. Zhou, et al., NTIRE 2022 challenge on efficient super-resolution: Methods and results, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1062–1102.
[42] A. Lugmayr, M. Danelljan, R. Timofte, NTIRE 2020 challenge on real-world image super-resolution: Methods and results, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 494–495.
[43] F. Kong, M. Li, S. Liu, D. Liu, J. He, Y. Bai, F. Chen, L. Fu, Residual local feature network for efficient super-resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 766–776.
[44] J. Wang, Y. Wu, S. He, P.K. Sharma, X. Yu, O. Alfarraj, A. Tolba, Lightweight single image super-resolution convolution neural network in portable device, KSII Trans. Internet Inf. Syst. 15 (11) (2021).
[45] Z. Du, D. Liu, J. Liu, J. Tang, G. Wu, L. Fu, Fast and memory-efficient network towards efficient image super-resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 853–862.
[46] C. Dong, C.C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, in: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part II, Springer, 2016, pp. 391–407.
[47] J. Kim, J.K. Lee, K.M. Lee, Deeply-recursive convolutional network for image super-resolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1637–1645.
[48] W.-S. Lai, J.-B. Huang, N. Ahuja, M.-H. Yang, Deep laplacian pyramid networks for fast and accurate super-resolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 624–632.
[49] Y. Tai, J. Yang, X. Liu, C. Xu, MemNet: A persistent memory network for image restoration, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4539–4547.
[50] N. Ahn, B. Kang, K.-A. Sohn, Fast, accurate, and lightweight super-resolution with cascading residual network, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 252–268.
