Article
Loop Closure Detection with CNN in RGB-D SLAM for Intelligent Agricultural Equipment
Haixia Qi 1,2,*, Chaohai Wang 1, Jianwen Li 1 and Linlin Shi 1

1 College of Engineering, South China Agricultural University, Guangzhou 510642, China; charwang@stu.scau.edu.cn (C.W.); ljw1@stu.scau.edu (J.L.); lynnshi@scau.edu.cn (L.S.)
2 Guangdong Laboratory for Lingnan Modern Agriculture, Guangzhou 510642, China
* Correspondence: qihaixia@scau.edu.cn
Abstract: Loop closure detection plays an important role in the construction of reliable maps for intelligent agricultural machinery. When combined with convolutional neural networks (CNNs), its accuracy and real-time performance exceed those of methods based on traditional manual features. However, because agricultural machinery relies on small embedded devices that must handle multiple tasks simultaneously, achieving adequate response speeds is challenging, especially when running large networks. This motivates an in-depth study of which lightweight CNN loop closure detection algorithms are best suited to intelligent agricultural machinery. This paper compares a variety of loop closure detection methods based on lightweight CNN features. Specifically, we show that GhostNet, with its feature reuse, can extract image features carrying both high-dimensional semantic information and low-dimensional geometric information, which significantly improves loop closure detection accuracy and real-time performance. To further increase detection speed, we implement Multi-Probe Random-Hyperplane Locality-Sensitive Hashing (LSH) algorithms. We evaluate our approach on both a public dataset and a proprietary greenhouse dataset, using an incremental data processing method. The results demonstrate that GhostNet and the linear-scanning Multi-Probe LSH algorithm together meet the precision and real-time requirements of loop closure detection in agriculture.
Keywords: intelligent agricultural equipment; RGB-D SLAM; loop closure detection; lightweight convolutional neural networks; multi-probe random-hyperplane locality-sensitive hashing

Citation: Qi, H.; Wang, C.; Li, J.; Shi, L. Loop Closure Detection with CNN in RGB-D SLAM for Intelligent Agricultural Equipment. Agriculture 2024, 14, 949. https://doi.org/10.3390/agriculture14060949

Academic Editor: Francesco Marinello

Received: 13 April 2024; Revised: 4 June 2024; Accepted: 11 June 2024; Published: 18 June 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

In autonomous robotic systems, simultaneous localization and mapping (SLAM) has been a focal point of research for decades [1–3]. Its primary aim is to map unknown environments while concurrently localizing the robot within them, which is a critical function for agricultural robots performing tasks like navigation [4,5], path planning [6], and manipulation [7,8]. The classic SLAM process comprises four primary tasks: visual odometry, optimization, loop closure detection, and mapping [9,10]. Loop closure detection serves to recognize previously visited locations. If a loop closure has occurred, the robot's estimated trajectory can be adjusted based on the error between the current estimated map point and the same map point as last visited. These adjustments correct for inaccuracies stemming from imprecise sensor measurements, uncertain environmental conditions, and errors in odometry estimation. Therefore, loop closure detection is crucial for correcting errors and optimizing the local map [11–13]. Common loop closure detection is divided into two steps: feature extraction and feature matching. Feature extraction extracts the feature information of the current image, and feature matching uses that information to match against previously obtained image features and determine whether the current location has been visited before. However, traditional manual features used for feature extraction are
not effective in greenhouse scenes, so this paper uses convolutional neural networks (CNNs) to extract image features. However, because the features extracted by a CNN are high-dimensional, the bag-of-words matching method is difficult to apply, so a hashing algorithm is adopted for feature matching.
With advancements in neural networks, researchers have noticed parallels between loop closure detection in visual SLAM and the image recognition and classification problems addressed by neural networks: both boil down to associating image data accurately [14]. Chen et al. introduced a loop closure detection algorithm based on a CNN and spatial-sequential filters, which improved the recall rate by 75% on their dataset [15]. It has been demonstrated that image features extracted by neural networks outperform manually designed ones, thereby enhancing the accuracy of loop closure detection.
1.1. Review

Current research on image appearance can be categorized into two main directions: unsupervised learning of image features using autoencoders, and extraction of image features through off-the-shelf CNNs for loop closure detection. Gao et al. proposed a stacked denoising autoencoder model for unsupervised learning, achieving satisfactory precision; however, detection across all frames takes about 2.2 s, which is not suitable for real-time loop closure detection [16]. Jia Xuewei introduced PCANet-LDA, combining unsupervised neural networks with linear discriminant analysis, and achieved a 60.2% reduction in time cost compared with the GoogLeNet network by extracting features based on network class differentiation [17].
Some approaches to loop closure detection rely on CNNs. Hou et al. demonstrated that AlexNet extracts descriptors three times faster than traditional manual descriptors such as SIFT features and Gist descriptors under significant illumination changes [18]. Xia et al. proposed using the cascaded neural network model PCANet for better loop closure detection; it takes more than 19% less time than SIFT features and Gist and guarantees a minimum average precision of around 75% [19]. They also found that AlexNet features trained twice using an SVM yield optimal results in loop closure detection experiments, showing more robustness than manually designed features [20]. Retraining the network model is a promising approach to improving loop closure detection accuracy. Wang applied PCA-based feature compression and sparsity constraints to the feature vectors before similarity comparison, expressing the features of an image with a 500-dimensional vector [21]. Lopez-Antequera observed enhanced loop closure detection accuracy after retraining AlexNet with the Places dataset [22]. Similarly, Sünderhauf proposed an integrated hashing algorithm and semantic search-space partitioning technique that accelerated loop closure detection by using the Hamming distance, resulting in a 99.6% speed increase [23]. Shahid fine-tuned a pre-trained AlexNet with different distance metrics, concluding that cosine distance works more accurately than Euclidean distance [24]. Overall, either retraining an off-the-shelf CNN model with a more targeted dataset or directly using a pre-trained CNN limits the loop closure detection process to feature extraction and matching, allowing loop closure detection to be performed online in real time, which is beneficial for practical engineering applications [25,26].
However, few existing studies have examined lightweight CNN-based loop closure detection with hashing acceleration in a greenhouse scenario. Therefore, the main contributions of this paper are the innovative applications of existing algorithms, as detailed below:
(1) During the image feature extraction phase of loop closure detection, pre-existing CNN models are employed to replace traditional manual methods, such as SIFT, for extracting image features. Taking the accuracy and detection time of the detection algorithm as the evaluation criteria, we compare the VGG19 CNN with three lightweight CNNs, i.e., the GhostNet, ShuffleNet V2, and EfficientNet-B0 models, on an open dataset. Meanwhile, we establish the GreenHouse dataset to verify that the most suitable model for loop closure detection is GhostNet.
(2) We use the Random-Hyperplane Locality-Sensitive Hashing (RHLSH) algorithm to reduce the dimensionality of, and match, the features extracted by the CNN models. To further accelerate loop closure detection, two multi-probe random-hyperplane locality-sensitive hashing algorithms are selected to speed up the detection algorithm. On the proposed GreenHouse dataset, the experiments show that step-wise probing random-hyperplane locality-sensitive hashing using linear scanning can significantly reduce the feature matching time with little accuracy loss.
2. Methods

Feature extraction and feature matching are the two key steps in loop closure detection, and they largely determine the accuracy and running time of the algorithm. Accordingly, in this section we first select lightweight CNN models appropriate for intelligent agricultural equipment. These models are pre-trained, and we use them only for image feature extraction, so no further training is required. The image features extracted by the CNN models are then matched by a hashing algorithm, and two improved algorithms based on it are used to accelerate the matching and compare performance. Meanwhile, we establish the GreenHouse dataset to demonstrate its performance. Precision-recall (PR) curves, average accuracy, and average time are used as the performance evaluation metrics for loop closure detection.
2.1. Feature Extraction in Loop Closure Detection with CNNs

The CNN models used in this paper are depicted in Figure 1, which illustrates their feature extraction process and their use in loop closure detection. Each model, including GhostNet, ShuffleNet v2, and EfficientNet-B0, employs distinct feature reuse strategies to achieve efficiency and effectiveness in loop closure detection. The solid arrows in the figure represent the data flow within the parts of the CNN models used in this paper, while the dashed arrows indicate the original framework of the CNN models.
[Figure 1 depicts the four model pipelines and their output feature map sizes: GhostNet 1×1×1280; ShuffleNet V2 7×7×1024; EfficientNet-B0 7×7×1280; VGG19 7×7×512.]

Figure 1. Structures of the four deep CNN models. (a) The framework of GhostNet; (b) the framework of ShuffleNet V2; (c) the framework of EfficientNet-B0; (d) the framework of VGG19.
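For concreteness, the following is a minimal sketch of using a pre-trained lightweight CNN as a frozen feature extractor, as described above. The specific model variants and preprocessing are assumptions (timm's "ghostnet_100" and standard ImageNet normalization), since the paper does not name exact configurations; a sketch, not the authors' implementation.

```python
# Sketch: a pre-trained CNN as a frozen feature extractor for loop closure
# detection. Model name ("ghostnet_100" from timm) and preprocessing are
# assumptions; torchvision equivalents (shufflenet_v2_x1_0, efficientnet_b0,
# vgg19) could be swapped in the same way.
import torch
import timm  # provides a GhostNet implementation
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# num_classes=0 makes the model return pooled features instead of logits.
ghostnet = timm.create_model("ghostnet_100", pretrained=True, num_classes=0)
ghostnet.eval()

@torch.no_grad()
def extract_feature(pil_image, model=ghostnet):
    """Return a 1-D feature vector for one RGB image (1280-D for GhostNet)."""
    x = preprocess(pil_image).unsqueeze(0)  # shape (1, 3, 224, 224)
    return model(x).squeeze(0)
```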
2.2. Feature Matching in Loop Closure Detection with CNNs
A visual bag-of-words (BoW) model based on manually designed features is the most commonly used solution for loop closure detection [32–39]. This method extracts feature points from images using algorithms such as SIFT, SURF, or ORB, then clusters these points and their descriptors into multiple visual words. This allows related feature vectors to be retrieved for an image through the BoW mapping. Here, we adopt a BoW model based on SIFT feature points and use cosine similarity to measure image similarity.
In agricultural settings, the abundance of local feature points and the scene’s element
similarity render traditional methods less practical compared to those based on CNNs.
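As a reference point, here is a minimal sketch of such a SIFT-based BoW baseline, assuming OpenCV's SIFT and scikit-learn's k-means; the vocabulary size (k = 500) is an illustrative choice, not the paper's setting.

```python
# Sketch of a SIFT bag-of-words baseline: cluster SIFT descriptors into a
# visual vocabulary, describe each image as a word histogram, and compare
# histograms by cosine similarity. k=500 is an illustrative assumption.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def sift_descriptors(gray_images):
    sift = cv2.SIFT_create()
    all_desc = [d for img in gray_images
                for _, d in [sift.detectAndCompute(img, None)] if d is not None]
    return np.vstack(all_desc)

def build_vocabulary(gray_images, k=500):
    return KMeans(n_clusters=k, n_init=10).fit(sift_descriptors(gray_images))

def bow_vector(gray_image, vocab):
    """L2-normalized histogram of visual words for one image."""
    _, desc = cv2.SIFT_create().detectAndCompute(gray_image, None)
    hist = np.zeros(vocab.n_clusters)
    if desc is not None:  # images with no keypoints yield a zero vector
        words, counts = np.unique(vocab.predict(desc), return_counts=True)
        hist[words] = counts
    return hist / (np.linalg.norm(hist) + 1e-12)

def cosine_similarity(a, b):
    return float(np.dot(a, b))  # vectors are already L2-normalized
```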
However, CNN-extracted feature vectors often suffer from high dimensionality, necessitating methods like the RHLSH algorithm for downsizing and initial retrieval of image feature vectors. RHLSH partitions high-dimensional space using random hyperplanes and organizes vectors based on their positions [40]. As illustrated in Figure 2, the CNN-extracted feature map is reshaped into a feature vector and projected onto randomly generated hyperplanes via hash function families, with the result represented as a Hamming code. This approach effectively represents the high-dimensional feature map using hash codes on randomly generated, relatively low-dimensional hyperplanes.
[Figure 2: the feature map is reshaped into a feature vector and projected onto random hyperplanes to produce a hash code.]
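A minimal sketch of this hashing step follows: each bit of the code is the sign of the feature vector's projection onto a random hyperplane normal drawn from N(0, I). The code length (k = 16) and dimensionality are illustrative assumptions.

```python
# Sketch of random-hyperplane LSH: each hash bit is the sign of the feature
# vector's projection onto a random hyperplane normal drawn from N(0, I).
import numpy as np

rng = np.random.default_rng(0)

def make_hyperplanes(dim, k):
    """k random hyperplane normals in a dim-dimensional space."""
    return rng.standard_normal((k, dim))

def hash_code(feature_vector, hyperplanes):
    """k-bit Hamming code: 1 where the projection is positive, else 0."""
    return tuple((hyperplanes @ feature_vector > 0).astype(int))

# Example: a 1280-D CNN feature vector mapped to a 16-bit hash code.
planes = make_hyperplanes(1280, 16)
code = hash_code(rng.standard_normal(1280), planes)
```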
In high-dimensional space, any randomly sampled normal vector following the standard multivariate normal distribution N(0, I) has an equal probability of occurrence in all directions, ensuring uniform sampling [41]. Consequently, projecting onto multiple hyperplanes and calculating matching scores can enhance the matching accuracy of feature maps. The workflow is as follows: after an image is processed by the CNN and its hash code is generated, the feature maps in the corresponding hash bucket are tallied. Each hash code under a different hyperplane set corresponds to a distinct hash bucket. Occasionally, multiple feature maps may reside in one hash bucket, meaning that a hash code may correspond to several feature maps; therefore, a statistical tally of the feature maps within the bucket is necessary. Ultimately, the feature map with the highest score surpassing the preset threshold is deemed successfully matched. Otherwise, if the score falls below the threshold, the matching fails, indicating that no loop closure has occurred. This entire process is depicted in Figure 3.
[Figure 3: the feature maps stored in the hash buckets indexed by each hyperplane set's hash code receive +1 votes, and the scores of Feature Map0 through Feature Map4 are accumulated across tables.]
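The voting workflow of Figure 3 can be sketched as follows, reusing hash_code and make_hyperplanes from the sketch above; the vote threshold is an illustrative parameter, not the paper's setting.

```python
# Sketch of the Figure 3 workflow: each hash table casts +1 votes for the
# stored feature maps sharing the query's hash bucket; the best scorer above
# a preset threshold is declared a loop closure candidate.
from collections import defaultdict

class MultiTableIndex:
    def __init__(self, plane_sets):           # one hyperplane set per table
        self.plane_sets = plane_sets
        self.tables = [defaultdict(list) for _ in plane_sets]

    def add(self, idx, vec):
        """Store feature vector `vec` (frame index `idx`) in every table."""
        for table, planes in zip(self.tables, self.plane_sets):
            table[hash_code(vec, planes)].append(idx)

    def query(self, vec, threshold):
        """Tally +1 per table for bucket-mates; return best index or None."""
        votes = defaultdict(int)
        for table, planes in zip(self.tables, self.plane_sets):
            for idx in table.get(hash_code(vec, planes), []):
                votes[idx] += 1
        if not votes:
            return None
        best, score = max(votes.items(), key=lambda kv: kv[1])
        return best if score >= threshold else None
```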
Increasing the number of hash function families and hash tables can enhance search accuracy and recall, but it also escalates memory usage. To mitigate this, expanding the search range within the same hash table can be beneficial. Multi-probe Random-Hyperplane Locality-Sensitive Hashing (RHLSH) is an exploration method that improves search recall to some extent. Key strategies for expanding the search range include Step-Wise Probing RHLSH (SWP-RHLSH) and Query-Directed Probing RHLSH (QDP-RHLSH). For SWP-RHLSH, the Boolean hash value of the feature vector allows the search range to be expanded gradually according to the number of differing bits in the hash value. As feature vectors accumulate dynamically during loop closure detection, a linear scan is employed to enumerate hash bucket perturbations within a specified range, expediting the search.
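A minimal sketch of this step-wise probing follows: buckets whose codes differ from the query's code in at most a given number of bits are enumerated by a linear scan over bit-flip combinations. The probing radius is an illustrative parameter.

```python
# Sketch of step-wise probing (SWP-RHLSH): probe all buckets whose codes
# differ from the query's code in at most `radius` bits, enumerated by a
# linear scan over bit-flip combinations of increasing Hamming distance.
from itertools import combinations

def swp_probe_codes(code, radius):
    """Yield the query code, then codes at Hamming distance 1..radius."""
    yield code
    k = len(code)
    for r in range(1, radius + 1):
        for positions in combinations(range(k), r):
            probed = list(code)
            for p in positions:
                probed[p] ^= 1          # flip this hash bit
            yield tuple(probed)

# Usage: gather candidates from every probed bucket of one hash table.
# candidates = [i for c in swp_probe_codes(code, 2) for i in table.get(c, [])]
```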
For QDP-RHLSH, the random hyperplanes within the same hash table further refine the probing probability. Hash buckets with a higher likelihood of containing nearest-neighbor feature vectors are prioritized, reducing the exploration of incorrect feature vectors. For a given sequence of perturbation vectors Vp = [δ1, δ2, · · · , δk]T with δl ∈ [0, 1], an evaluation probability function with respect to the random hyperplanes can be defined as in (2). When δl = 0, indicating no perturbation, the probability of collision is given by (1) [41].
$$P\{H(v_1) = H(v_2) \mid \theta(v_1, v_2)\} = \prod_{j=1}^{k} \Phi\!\left(\frac{p_j^{T} \cdot v_2}{\|v_2\| \cdot \tan\theta(v_1, v_2)}\right) \quad (1)$$

$$W_{i,j}(1) = \prod_{j=1}^{k} \frac{P\{H(v_1) = H(v_2) \mid \theta(v_1, v_2)\}}{1 - P\{H(v_1) = H(v_2) \mid \theta(v_1, v_2)\}} \quad (2)$$
where Φ is the standard normal distribution function; pj is the normal vector of the j-th random hyperplane; θ(v1, v2) is the angle between the two nearest-neighbor feature vectors, usually with θ(v1, v2) ∈ (0, π/2); and Wi,j(1) is the evaluation probability function of the hash code corresponding to the j-th hash function for the i-th feature vector to be matched after adding perturbations.
Together with the shift transform (3) and the expand transform (4) [42], a maximum heap can be constructed with Wi,j(1) as the weights to obtain the perturbation vectors with the top M weights, where each perturbation vector is transformed from a set of perturbation positions. Taking k = 4 as an example, assume that sorting Wi,j(1) in descending order yields {Wi,3(1), Wi,1(1), Wi,4(1), Wi,2(1)}; for the perturbation set S = {1, 4}, the first and fourth positions of the sorted Wi,j(1) are chosen as the perturbation positions, and the perturbation vector is Vp = [0, 1, 1, 0]T.
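The heap-driven generation of perturbation sets can be sketched as follows. The paper's method orders bits by the evaluation probability Wi,j(1); this sketch substitutes the normalized projection margin as a per-bit cost (an assumption for brevity) and generates the top-M perturbation sets with the shift/expand transforms of [42], so it is an illustration of the mechanism rather than the authors' exact implementation.

```python
# Sketch of query-directed probing (QDP-RHLSH): rank hash bits by a per-bit
# score (the normalized projection margin stands in for the evaluation
# probability W of Equation (2) -- an assumption), then generate the top-M
# perturbation sets with the shift/expand transforms over a heap.
import heapq
import numpy as np

def qdp_perturbations(feature_vector, hyperplanes, M):
    # A smaller margin means the bit is more likely to differ for a true
    # nearest neighbor, so it should be flipped first.
    margins = np.abs(hyperplanes @ feature_vector) / np.linalg.norm(feature_vector)
    order = np.argsort(margins)          # bit indices, cheapest flips first
    cost = margins[order]

    # Min-heap over perturbation sets, keyed by the total cost of flipped bits.
    heap = [(cost[0], (0,))]             # start from the single cheapest bit
    results = []
    while heap and len(results) < M:
        total, pset = heapq.heappop(heap)
        results.append([int(order[i]) for i in pset])   # actual bits to flip
        last = pset[-1]
        if last + 1 < len(cost):
            # shift: replace the last position with the next-cheapest one
            heapq.heappush(heap, (total - cost[last] + cost[last + 1],
                                  pset[:-1] + (last + 1,)))
            # expand: additionally include the next-cheapest position
            heapq.heappush(heap, (total + cost[last + 1],
                                  pset + (last + 1,)))
    return results    # each entry lists hash-bit positions to perturb
```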
2.3. Datasets

The TUM dataset contains a variety of objects, such as office desks, chairs, computer
equipment, and robotic arm models, providing abundant texture and structure for image
feature extraction. Additionally, the camera trajectory in this dataset forms a large circular
closed trajectory with overlap at the initial and final points. This setup mirrors conditions
often found in agricultural scenes, characterized by rich texture structures (as depicted in
Figure 4a).
Figure 4. Example images from the datasets. (a) TUM; (b) GreenHouse.
However, the TUM dataset lacks ground truth information for evaluating loop closure
detection algorithms. Instead, it offers camera motion trajectory coordinate files detected by
high-precision sensors. To establish correlations between pose coordinate files and image
data, the scripting tool provided by TUM was utilized. Matches were defined between pose
coordinates and image data with a time difference within 0.02 s. The occurrence of loop
closure was determined by calculating the pose error between any two frames within the
matched camera pose coordinates. Given the relatively minor positional changes between
adjacent images, the pose errors between each image and its 150 neighboring images are disregarded. The pose error is calculated as in Equation (5):

$$e = \left\| T_i^{-1} T_j - I \right\| \quad (5)$$

where T is the camera pose; the subscripts i, j are image serial numbers, with i = {1, 2, · · · , n} and j = {1, 2, · · · , (i − 150)}; and I is the identity matrix.
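A minimal sketch of this test, assuming 4×4 homogeneous camera-pose matrices and the Frobenius norm (the paper does not state which matrix norm it uses):

```python
# Sketch of the pose-error test of Equation (5): e = ||T_i^{-1} T_j - I||.
# Poses are 4x4 homogeneous matrices; the Frobenius norm is an assumption.
import numpy as np

def pose_error(T_i, T_j):
    """Equation (5): deviation of the relative pose from the identity."""
    return np.linalg.norm(np.linalg.inv(T_i) @ T_j - np.eye(4))
```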
The greenhouse scene dataset was collected at 10:00 a.m., when the light intensity is high. The dataset includes a variety of green vegetables planted on cultivators, blank cultivators that have not been planted, automated agricultural equipment, and other common elements of agricultural production environments. A large circular closed trajectory is also ensured in the dataset (as shown in Figure 4b). The GreenHouse dataset captures authentic greenhouse agricultural scenes using the D435i depth camera. Cameras are typically categorized as monocular, binocular, and RGB-D. Monocular and binocular cameras require depth estimation through algorithms, while RGB-D cameras can measure depth directly; consequently, RGB-D cameras exhibit the highest average depth accuracy among the three types. Therefore, the ORB-SLAM2 system is employed to compute the D435i camera's motion trajectory in the greenhouse scene dataset, serving as the reference trajectory, and Formula (5) is applied to derive the ground truth for loop closure detection.
The loop closure detection ground truth is saved as a matrix. If image i and image j constitute a loop closure, the value of the ground-truth matrix at (i, j) is 1; otherwise, it is 0.
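Putting the two steps together, the ground-truth matrix can be built as sketched below, reusing pose_error from the sketch above; the error threshold is an illustrative value, not the paper's setting.

```python
# Sketch of ground-truth construction: mark (i, j) as a loop closure when
# the pose error of Equation (5) falls below a threshold, skipping the 150
# neighboring frames. The threshold value is an illustrative assumption.
import numpy as np

def ground_truth_matrix(poses, threshold=0.1, skip=150):
    n = len(poses)
    gt = np.zeros((n, n), dtype=np.uint8)
    for i in range(n):
        for j in range(i - skip):               # skip the 150 nearest frames
            if pose_error(poses[i], poses[j]) < threshold:
                gt[i, j] = 1
    return gt
```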
3. Results

3.1. Feature Extraction Comparative Experiment

To enable a concise comparison of the performance of the three lightweight CNN models (GhostNet, ShuffleNet v2, and EfficientNet-B0) in extracting features from RGB-D images, and to further explore the influence of lightweight CNN models on loop closure detection in agricultural scenes, we conducted three sets of experiments:
In the first experiment, RGB image features were extracted from the TUM dataset using a visual bag-of-words method based on SIFT features, a pre-trained VGG19, and the three pre-trained lightweight CNN models.
The second and third sets of experiments utilized a pre-trained VGG19 model and the
three pre-trained lightweight CNN models to extract RGB-D image features from the TUM
and GreenHouse datasets, respectively.
In the latter two sets of experiments, the depth image was replicated across channels so that its channel count matched that of the RGB image. The RGB and depth image features were then concatenated to form the image feature vector, preserving the integrity of the extracted feature information. Through these combined approaches, a comprehensive analysis of the performance disparities among the lightweight CNN models and their impact on loop closure detection in agricultural scenes can be conducted, fostering a deeper understanding of their applicability and effectiveness. The accuracy of the various feature extraction algorithms for loop closure detection was measured by calculating the cosine similarity between image features.
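This RGB-D scheme can be sketched as follows, reusing extract_feature from the earlier sketch; the depth handling is an assumption (TUM depth PNGs are 16-bit and would need scaling to 8-bit before this step).

```python
# Sketch of the RGB-D scheme described above: the single-channel depth image
# is replicated to three channels, features are extracted from the RGB and
# depth images separately, and the concatenated vector is compared by cosine
# similarity. Assumes an 8-bit depth rendering (16-bit PNGs need scaling).
import torch
import torch.nn.functional as F
from PIL import Image

def rgbd_feature(rgb_path, depth_path):
    rgb = Image.open(rgb_path).convert("RGB")
    depth = Image.open(depth_path).convert("L").convert("RGB")  # replicate channel
    return torch.cat([extract_feature(rgb), extract_feature(depth)])

def cosine_sim(f1, f2):
    return F.cosine_similarity(f1.unsqueeze(0), f2.unsqueeze(0)).item()
```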
In the result analysis, Formulas (6) and (7) are used to calculate the optimization rates of accuracy and time cost:

$$R_a = \frac{A_f - A_r}{A_r} \quad (6)$$

$$R_t = -\frac{T_f - T_r}{T_r} \quad (7)$$

where Ra is the accuracy optimization rate; Af is the accuracy after optimization; Ar is the reference accuracy; Rt is the time cost optimization rate; Tf is the time cost after optimization; and Tr is the reference time cost.
Figure 5. PR curve of extracted RGB image features from the TUM dataset.
Table 2. Comparison of extracted RGB image features from the TUM dataset.
Figure 6. PR curves of extracted RGB-D image features from the TUM/GreenHouse datasets. (a) TUM dataset; (b) GreenHouse dataset.
As Figure 6 shows, the lightweight CNN models maintain high accuracy rates, whereas the VGG19 model lags behind. Additionally, the GhostNet model delivers superior accuracy at a 75% recall rate.
In conjunction with Table 3, the GhostNet-based algorithm demonstrates the highest average accuracy among the four algorithms, reaching 59.4% when extracting RGB-D image features for loop closure detection, while its feature extraction time for a single image is only 2 ms longer than that of the ShuffleNet v2-based algorithm. In summary, the GhostNet-based algorithm outperforms the other three CNN-based algorithms in extracting features for loop closure detection on RGB-D images from the TUM dataset. The results shown in Table 4 are similar to those in Table 3: the GhostNet-based algorithm is the best in terms of overall performance.
Table 3. Comparison of extracted RGB-D image features from the TUM dataset.
Table 4. Comparison of extracted RGB-D image features from the GreenHouse dataset.
3.2. Feature Matching Comparative Experiment

As depicted in Figure 7, the PR curves derived from SWP-RHLSH and those without a matching strategy are nearly indistinguishable up to an 80% recall rate, while the PR curves from QDP-RHLSH overlap significantly only up to a 50% recall rate. With increasing recall rates, the QDP-RHLSH-based algorithm exhibits more misclassifications than the SWP-RHLSH-based algorithm.
4. Physical Experiment

We integrated the loop closure detection algorithm into a feature-based visual odometry system to optimize the trajectory it generates. We assembled a bespoke platform featuring a D435i camera, which we affixed to the physical mobile robot 'Thunder' for our experiments. 'Thunder' is a mobile robot produced by Chaowenda Robot Technology, located in Shenzhen, China. We conducted tests in both a standard orchard and a greenhouse, as depicted in Figure 8.
We carried out four experiments under various conditions, detailed as follows:
For visual observation, we drove the robot around the field in a rectangular path.
The camera trajectory generated by testing in the greenhouse is illustrated in Figure 9a.
The outdoor orchard test, conducted on a sunny day, yielded the camera trajectory
shown in Figure 9b.
On an overcast day, the resulting camera trajectory from the outdoor orchard test is
presented in Figure 9c.
The camera trajectory obtained from the outdoor orchard test on a rainy day, with a
precipitation level of 2.81 mm, is also displayed in Figure 9d.
In Figure 9a,b, the blue curve represents the camera trajectory as estimated by the
SLAM system, while the red circle highlights the location of the loop closure detection event.
Agriculture 2024, 13 of 19
14, 949
Figure 8. The physical mobile robot, ‘Thunder’, collects data in a standard orchard.
Figure 9. Camera tracks produced by the SLAM system under various conditions. (a) In greenhouse; (b) sunny orchard; (c) cloudy orchard; (d) rainy orchard.
The experimental results reveal that the algorithm examined in this study functions effectively on both sunny and overcast days in greenhouses and outdoor orchards. However, its loop closure detection capability is compromised in rainy conditions. We attribute this failure to the significant interference of raindrops, which not only obscure the visual distinctions between different locations within the orchard but also exacerbate image blurriness, hindering the algorithm's performance. Moreover, Figure 9c,d suggest that reduced light intensity profoundly affects the visual odometry's accuracy. Specifically, the trajectory in Figure 9c shows a substantial increase in camera trajectory jitter during turns made under low-light conditions, and Figure 9d reveals noticeable camera trajectory distortion. Figure 9d also shows that, when loop closure detection fails, the cumulative error of the SLAM system becomes considerably large in an agricultural setting. Given that agricultural tasks frequently involve returning to previously visited locations, and that the visual similarity within farms or fruit orchards is relatively high, loop closure detection is more vital in agricultural scenarios than in other contexts.
5. Discussion

The experiments in Section 3.1 show that image features are typically extracted faster by CNN models than by manual methods. We attribute this to the TUM dataset's rich image textures, which yield numerous corner points and in turn increase the computation time required for SIFT descriptors. Conversely, in scenarios where image textures and corner points are scarce, the accuracy of descriptor-based feature matching may fall short of expectations [43]. It can therefore be considered that manual methods are less adept at capturing redundant information than CNN models. Among the four CNN models discussed in this article, we identified GhostNet as the most suitable for loop closure detection in intelligent agricultural equipment. Figures 5 and 6a illustrate that the PR curves of the algorithms based on the EfficientNet-B0 model exhibit a similar trend. This similarity is primarily attributed to the Squeeze-and-Excitation (SE) module, which effectively utilizes crucial deep feature information while disregarding less important details [44]. Given the resemblance between depth images and shallow feature information, the SE module tends to overlook more information, resulting in a similar trend in the curves.
Moreover, the overall trend of the PR curves of the ShuffleNet v2 model-based algorithm in these plots differs considerably due to the channel mixing operation. The feature information of the depth image retains the original shallow information, improving the reuse rate of the feature information. The GhostNet model-based algorithm significantly enhances the accuracy and real-time performance of loop closure detection, mainly due to the cheap linear operation inside the Ghost module and the Batch Normalization (BN) operation outside it [45]. The cheap linear operation enables deeper features to incorporate shallow feature information, ensuring comprehensive data description. Compared to traditional algorithms, CNNs offer better accuracy and stability in feature extraction. In the GreenHouse dataset, scene elements and details are richer than in the TUM dataset, demanding higher feature extraction ability from the model algorithms. Thus, for agricultural scenes, loop closure detection algorithms based on lightweight CNN models require models that can retain more redundant feature information with fewer parameters, thereby facilitating adaptation to the farming environment.
Regarding Section 3.2, the SWP-RHLSH-based algorithm considers all elements in nearby hash buckets, improving the real-time performance of RGB-D SLAM loop closure detection through a linear scan. Conversely, the QDP-RHLSH-based algorithm's real-time performance improvement is less evident due to the resource consumption caused by its floating-point operations [40,41]. Based on the comparison experiments, the GhostNet-based algorithm for image feature extraction combined with SWP-RHLSH for search range filtering is deemed most suitable for loop closure detection in intelligent agricultural equipment among the four CNN models discussed in this article. The physical experiment outcomes demonstrate that the algorithm presented in this paper is robust in agricultural settings: it operates effectively both in the greenhouse environment and across the various weather conditions encountered in outdoor orchards.
6. Conclusions

With the growing need for precision and intelligence in agricultural machinery, loop closure detection in visual SLAM must not only determine spatial congruence but also be adaptable to the various small embedded devices on smart agricultural equipment. This study compares multiple loop closure detection methods based on lightweight CNN
features. It is observed that features extracted from GhostNet significantly enhance both
the accuracy and real-time performance of loop closure detection. This is attributed to
the Ghost module in GhostNet, which preserves redundant features, enabling deeper
feature information to encompass shallow features through feature reuse. To further
expedite loop closure detection, two multi-probe random-hyperplane locality-sensitive
hashing (RHLSH) algorithms are compared experimentally. SWP-RHLSH, employing
linear scanning, markedly reduces feature matching time with minimal accuracy loss,
making it more suitable for use in intelligent agricultural equipment detection algorithms.
This is due to the smaller number of hash buckets screened by SWP-RHLSH in small to
medium-sized agricultural settings, eliminating the need for floating-point operations to
evaluate probabilities.
However, this study still has limitations. It utilized pre-trained CNN models for image feature extraction and did not investigate how training the CNN models on the TUM/GreenHouse datasets would affect image feature extraction. Furthermore, it is
important to note that this is a preliminary, phased outcome. A comprehensive SLAM
system is currently absent, which is essential for fully realizing the algorithm’s
potential. Future research will progressively refine a SLAM system tailored for
agricultural settings and deploy it into devices for further testing. For instance, it will be
applied to non-standard orchard scenarios and dense crop environments. Furthermore,
there is a need to delve deeper into how SLAM technology in agricultural settings can be
integrated with artificial intelligence techniques to enhance the accuracy and
computational speed of positioning in smart agricultural equipment. At the same time,
this would help conserve hardware resources for other tasks and improve operational
efficiency, navigation accuracy, and task planning in real agricultural environments.
Author Contributions: Conceptualization, H.Q.; methodology, H.Q., C.W. and J.L.; software, C.W. and J.L.; validation, H.Q. and L.S.; formal analysis, C.W. and J.L.; investigation, C.W. and J.L.; resources, H.Q.; data curation, J.L.; writing—original draft preparation, J.L. and C.W.; writing—review and editing, C.W. and H.Q.; visualization, C.W.; supervision, H.Q.; project administration, H.Q.; funding acquisition, H.Q. All authors have read and agreed to the published version of the manuscript.

Funding: This work was partially supported by the subject construction projects in specific universities, a subject construction project at South China Agricultural University, under funding number 2023B10564003.

Institutional Review Board Statement: This study did not require ethical approval.

Data Availability Statement: The relevant GreenHouse dataset for this study is available at https://github.com/SCAU-AIUS/SLAM-for-agricultural-equipment (accessed on 30 May 2024).

Acknowledgments: The authors acknowledge the editors and reviewers for their constructive comments and all the support on this work. The authors acknowledge Quanchen Ding for polishing the article.

Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Pan, Z.; Hou, J.; Yu, L. Optimization RGB-D 3-D Reconstruction Algorithm Based on Dynamic SLAM. IEEE Trans. Instrum. Meas. 2023, 72, 1–13. [CrossRef]
2. Nguyen, D.D.; Elouardi, A.; Florez, S.A.; Bouaziz, S. HOOFR SLAM system: An embedded vision SLAM algorithm and its
hardware-software mapping-based intelligent vehicles applications. IEEE Trans. Intell. Transp. Syst. 2018, 20,
4103–4118. [CrossRef]
3. Mur-Artal, R.; Tardos, J.D. ORB-SLAM2: An Open-Source SLAM system for monocular, stereo, and RGB-D cameras. IEEE
Trans. Robot. 2017, 33, 1255–1262. [CrossRef]
4. Zou, Q.; Sun, Q.; Chen, L.; Nie, B.; Li, Q. A comparative analysis of LiDAR SLAM-based indoor navigation for autonomous
vehicles. IEEE Trans. Intell. Transp. Syst. 2021, 23, 6907–6921. [CrossRef]
5. Wang, T.; Chen, B.; Zhang, Z.; Li, H.; Zhang, M. Applications of machine vision in agricultural robot navigation: A review.
Comput. Electron. Agric. 2022, 198, 107085. [CrossRef]
6. Wen, S.; Zhao, Y.; Yuan, X.; Wang, Z.; Zhang, D.; Manfredi, L. Path planning for active SLAM based on deep reinforcement
learning under unknown environments. Intell. Serv. Robot. 2020, 13, 263–272. [CrossRef]
7. Peng, J.; Shi, X.; Wu, J.; Xiong, Z. An object-oriented semantic slam system towards dynamic environments for mobile manipula-
tion. In Proceedings of the 2019 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), Hong Kong,
China, 8–12 July 2019; pp. 199–204.
8. Simon, J. Fuzzy Control of Self-Balancing, Two-Wheel-Driven, SLAM-Based, Unmanned System for Agriculture 4.0 Applications.
Machines 2023, 11, 467. [CrossRef]
9. Zhu, H.; Xu, J.; Chen, J.; Chen, S.; Guan, Y.; Chen, W. BiCR-SLAM: A multi-source fusion SLAM system for biped climbing robots
in truss environments. Robot. Auton. Syst. 2024, 176, 104685. [CrossRef]
10. Song, S.; Yu, F.; Jiang, X.; Zhu, J.; Cheng, W.; Fang, X. Loop closure detection of visual SLAM based on variational autoencoder.
Front. Neurorobot. 2024, 17, 1301785. [CrossRef]
11. Tsintotas, K.A.; Bampis, L.; Gasteratos, A. The Revisiting Problem in Simultaneous Localization and Mapping: A Survey on
Visual Loop Closure Detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19929–19953. [CrossRef]
12. Guclu, O.; Can, A.B. Integrating global and local image features for enhanced loop closure detection in RGB-D SLAM systems.
Vis. Comput. 2020, 36, 1271–1290. [CrossRef]
13. Xu, M.; Lin, S.; Wang, J.; Chen, Z. A LiDAR SLAM System with Geometry Feature Group Based Stable Feature Selection and
Three-Stage Loop Closure Optimization. IEEE Trans. Instrum. Meas. 2023, 72, 1–10. [CrossRef]
14. Angeli, A.; Doncieux, S.; Meyer, J.A.; Filliat, D. Real-time visual loop-closure detection. In Proceedings of the IEEE
International Conference on Robotics & Automation, Pasadena, CA, USA, 19–23 May 2008.
15. Chen, Z.; Lam, O.; Jacobson, A.; Milford, M. Convolutional Neural Network-based Place Recognition. arXiv 2014, arXiv:1411.1509.
16. Gao, X.; Zhang, T. Unsupervised Learning to Detect Loops Using Deep Neural Networks for Visual SLAM System. Auton. Robot. 2017, 41, 1–18. [CrossRef]
17. Jia, X. Research on Loop Closure Detection of Mobile Robots Based on PCANet-LDA; Harbin Institute of
Technology: Harbin, China, 2019. (In Chinese)
18. Hou, Y.; Zhang, H.; Zhou, S. Convolutional Neural Network-based Image Representation for Visual Loop Closure Detection. In
Proceedings of the 2015 IEEE International Conference on Information and Automation, Lijiang, China, 8–10 August 2015.
19. Xia, Y.; Li, J.; Qi, L.; Fan, H. Loop closure detection for visual SLAM using PCANet features. In Proceedings of the 2016
International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016.
20. Xia, Y.; Li, J.; Qi, L.; Yu, H.; Dong, J. An Evaluation of Deep Learning in Loop Closure Detection for Visual SLAM. In Proceedings
of the 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications
(GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), Exeter, UK, 21–23
June 2017.
21. Wang, K. Research on Loop Closure Detection of Visual SLAM Based on Deep Learning; Harbin Engineering University: Harbin, China, 2019. (In Chinese)
22. Lopez-Antequera, M.; Gomez-Ojeda, R.; Petkov, N.; Gonzalez-Jimenez, J. Appearance-invariant Place Recognition by
Discrimina- tively Training a Convolutional Neural Network. Pattern Recognit. Lett. 2017, 92, 89–95. [CrossRef]
23. Sunderhauf, N.; Shirazi, S.; Dayoub, F.; Upcroft, B.; Milford, M. On the Performance of ConvNet Features for Place Recognition.
In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28
September–2 October 2015.
24. Shahid, M.; Naseer, T.; Burgard, W. DTLC: Deeply Trained Loop Closure Detections for Lifelong Visual SLAM. In Proceedings of
the Workshop on Visual Place Recognition, Conference on Robotics: Science and Systems (RSS), Ann Arbor, MI, USA, 18–22 June
2016.
25. Yu, C.; Liu, Z.; Liu, X.-J.; Qiao, F.; Wang, Y.; Xie, F.; Wei, Q.; Yang, Y. A DenseNet feature-based loop closure method for visual
SLAM system. In Proceedings of the 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO), Dali, China, 6–8
December 2019.
26. Zhang, X.; Yan, S.; Zhu, X. Loop Closure Detection for Visual SLAM Systems Using Convolutional Neural Network. In
Proceedings of the 2017 23rd International Conference on Automation and Computing (ICAC), Huddersfield, UK, 7–8 September
2017.
27. Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2015, arXiv:1510.00149.
28. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the
International Conference on Machine Learning 2019, Long Beach, CA, USA, 9–15 June 2019.
29. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv
2017, arXiv:1707.01083v2.
30. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the 2020
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
31. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
32. Sivic, J.; Zisserman, A. Video Google: A Text Retrieval Approach to Object Matching in Videos. In Proceedings of the Ninth IEEE
International Conference on Computer Vision, Nice, France, 14–17 October 2003.
33. Philbin, J.; Chum, O.; Isard, M.; Sivic, J.; Zisserman, A. Object Retrieval with Large Vocabularies and Fast Spatial Matching. In
Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007.
34. Angeli, A.; Filliat, D.; Doncieux, S.; Meyer, J.-A. Fast and Incremental Method for Loop-Closure Detection Using Bags of Visual
Words. IEEE Trans. Robot. 2008, 24, 1027–1037. [CrossRef]
35. Cummins, M.; Newman, P. FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. Int. J. Robot. Res. 2008, 27, 647–665. [CrossRef]
36. Cummins, M.; Newman, P. Appearance-only SLAM at Large Scale with FAB-MAP 2.0. Int. J. Robot. Res. 2011, 30, 1100–1123. [CrossRef]
37. Liang, M.; Min, H.; Luo, R. Graph-based SLAM: A Survey. Robot 2013, 35, 500–512. (In Chinese) [CrossRef]
38. Zhang, G.; Lilly, M.J.; Vela, P.A. Learning Binary Features Online from Motion Dynamics for Incremental Loop-Closure Detection
and Place Recognition. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm,
Sweden, 16–21 May 2016.
39. Labbé, M.; Michaud, F. RTAB-Map as an Open-source Lidar and Visual Simultaneous Localization and Mapping Library for
Large-scale and Long-term Online Operation. J. Field Robot. 2019, 36, 416–446. [CrossRef]
40. Yuan, C.; Liu, M.; Luo, Y.; Chen, C. Recent Advances in Locality-Sensitive Hashing and Its Performance in
Different Applications;
Chengdu University of Technology: Chengdu, China, 2023.
41. Lv, Q.; Josephson, W.; Wang, Z.; Charikar, M.; Li, K. Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search.
In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, 23–27 September 2007.
42. Liu, S.; Sun, J.; Liu, Z.; Peng, X.; Liu, S. Query-Directed Probing LSH for Cosine Similarity. In Proceedings of the 2016 Fifth
International Conference on Network, Communication and Computing (ICNCC 2016), Kyoto, Japan, 17–21 December 2016.
43. Lowe, G.D. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [CrossRef]
44. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 42, 2011–2023. [CrossRef]
45. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In
Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.