Welding Detection Radiography

Measurement

Keywords: Non-destructive testing; Radiographic images; Welding defects; Deep learning; Semantic segmentation

Abstract: In order to remove the limitations of human interpretation, many computer-aided algorithms have been developed to automatically detect defects in radiographic images. Compared with traditional detection algorithms, deep learning algorithms have the advantages of strong generalization ability and automatic feature extraction, and have been applied in welding defect detection. However, these algorithms still need further research in the acquisition and cleaning of welding radiographic image data, the selection and optimization of deep neural networks, and the generalization and interpretation of network models. Therefore, this paper proposes an automatic welding defect detection system based on the semantic segmentation method. Firstly, a dataset of radiographic images of welding defects, called RIWD, is set up, and the corresponding data preprocessing and annotation methods are designed for the training and evaluation of the algorithm. Secondly, an end-to-end defect detection algorithm based on the FPN-ResNet-34 semantic segmentation network is implemented, and the network architecture is experimentally demonstrated to be suitable for defect feature extraction and fusion. Thirdly, to improve the detection performance of the algorithm, an optimization strategy for the network is designed according to the data characteristics of defects, which includes data augmentation based on combined image transformations and class balancing using a hybrid loss function with dice loss and focal loss. Finally, to ensure the reliability of the algorithm, its generalization ability is tested using external validation, and the defect features learned by the network are visualized by a post-interpretation technique. The experimental results show that our method can correctly discriminate defect types and accurately describe defect boundaries, achieving 0.90 mPA, 0.86 mR, 0.77 mF1 and 0.73 mIoU, and can be applied to automatically interpret radiographic images.
1. Introduction

Welding is a widely used joining process in the manufacture of various metal structural parts, such as automobiles, ships, aircraft, pressure vessels and pipelines. In the welding process, due to the complex conditions, it is inevitable that various types of welding defects will be produced, which seriously damage the quality of welding. Therefore, it is necessary to check the quality of welded joints using non-destructive testing (NDT) techniques. The radiographic test (RT) used to inspect the internal defects of the weld is the critical NDT technique for welding. The X-ray or gamma source produces the weld radiographic image by penetrating the weld structure and exposing the photographic film. The weld defect is described by the change in intensity of the radiographic image. These films should be inspected by certified inspectors to assess and interpret the quality of the weld, called human interpretation. However, human interpretation suffers from many drawbacks, such as lack of objectivity and consistency, low detection efficiency, and the possibility of missing small defects, which has driven the development of computer-aided defect detection systems based on radiographic images [1,2].

A considerable amount of literature has been published on automatic welding defect detection algorithms. Traditionally, the defects segmented using image processing algorithms are characterized by human-defined feature vectors and recognized by machine learning classifiers.
Fig. 1. Different levels of defect detection tasks. (a): Classification. (b): Object detection. (c): Segmentation. (d): Semantic segmentation.
Fig. 2. The framework of the defect detection system. The key steps are marked in red.
Fig. 3. Some example images (after extracting the weld zone) of the dataset. (a): R0001. (b): R0002. (c): W0004.
similarity of defects, Dong et al. [43] proposed a pyramid feature fusion and global context attention network. Du et al. [44] improved the detection accuracy of Faster R-CNN using FPN and RoIAlign. The method proposed by Jiang et al. [45] includes an improved pooling strategy and an enhanced feature selection method. Gong et al. [46] proposed a deep transfer learning model to extract defect features.

Although DNNs have shown some advantages in welding defect detection tasks, the following problems still exist in their application:

(1). The levels of defect detection tasks: The algorithms detect defects at different levels. As shown in Fig. 1, from top to bottom, the algorithms provide finer-granularity detection and richer defect information. However, limited by the annotation granularity of the data [47], a majority of studies have only implemented defect classification [39,41,45,46,48-50] (Fig. 1(a)) or object detection [29,35,44] (Fig. 1(b)). A few studies have implemented defect segmentation, but have not recognized the types of defects [36,42] (Fig. 1(c) [51]). The information provided by these tasks is not sufficient for radiographic film interpretation, because the welding quality rating criteria specify the type, number, spacing and size of acceptable defects, which requires the detection algorithm not only to recognize defect types but also to describe defect boundaries, i.e., the semantic segmentation task of defects (Fig. 1(d)).

(2). Generalization and interpretation of detection algorithms: Most studies have only verified the effectiveness of algorithms in a specific project rather than in different scenarios, which ignores the advantage of the transferability of deep learning algorithms. In addition, DNN models are able to learn features automatically, but this hidden learning process also reduces the interpretability of the models, raising concerns about the reliability of applying such black-box models in the field of quality inspection.

Therefore, to achieve high-level defect detection tasks and to solve the problems of difficult labeling and imbalance of data, as well as the poor interpretation and unverified generalization of algorithms, this paper proposes an automatic welding defect detection system for radiographic images based on the semantic segmentation method, as shown in Fig. 2. The main contributions are as follows:

(1). Based on the collected data, a dataset of radiographic images of welding defects, called RIWD, is constructed as the basis for the algorithm study. Moreover, the corresponding data processing method is designed, including image preprocessing, annotation, and sample set setting, for training and evaluation of the semantic segmentation network.

(2). An end-to-end welding defect detection algorithm based on the FPN-ResNet-34 semantic segmentation network is implemented. The model can output complete information, including the defect category, boundary and location, by automatically learning and aggregating the semantic features of the defect, which can be directly used for rating the weld quality.
Fig. 4. Comparison of cropping strategies. (a): Resize and crop the image. (b): Keep the original resolution of the image.
Fig. 5. Image annotation process. (a): The complex defect boundaries in the image are difficult to label manually. (b): The defect regions are automatically labeled by the MSMI algorithm. (c): The types of defect regions are manually labeled.
(3). The experiments verified that the architecture of the FPN-ResNet-34 semantic segmentation network is suitable for defect feature extraction and fusion. Furthermore, according to the data characteristics of defects, the optimization strategy of the network is designed, including data augmentation based on image transformation and class balancing based on the loss function, which effectively improves the detection performance of the network.

Table 1
The statistical information about defects in the sample set.

Types of defects | Number of defects | Pixel number of defects (kpixel) | Pixel ratio of defects (%) | Training set (kpixel) | Validation set (kpixel) | Test set (kpixel)
CR | 623 | 1689.72 | 0.49 | 1351.78 | 236.56 | 101.38
PO | 3707 | 2460.78 | 0.72 | 2042.45 | 123.04 | 295.29
SL | 897 | 777.03 | 0.23 | 696.02 | 16.31 | 85.47
LPF | 625 | 1444.64 | 0.42 | 1227.94 | 130.02 | 75.12

Fig. 6. The framework of data acquisition and processing.

(4). The algorithm is tested on an external validation set to verify its generalization ability to different scenarios. In addition, the
reliability of the algorithm is analyzed by visualizing the features learned by the network.

The remaining part of this paper is organized as follows: we present the data-related work in Section 2. After that, Section 3 briefly describes the network architectures and optimization methods. Section 4 designs experiments to validate our method, and analyzes and discusses the experimental results in detail. Finally, we draw conclusions in Section 5.

2. Data acquisition and processing

In the field of weld RT, the lack of defect data has restricted the development of algorithms. In this section, we acquire weld radiographic images, construct the RIWD dataset, and preprocess and annotate the images as learning samples for DNNs.

2.1. Datasets

2.1.1. RIWD
Based on digitized weld radiographic films, we constructed a dataset of radiographic images of welding defects (RIWD), which consists of two subsets:

(1). R0001: We photograph 57 illustrative plates of typical welding defects, as shown in Fig. 3(a), on which the defects are obvious and numerous, suitable for training the algorithm to learn distinguishing defect features.

(2). R0002: We scan 59 weld radiographic films from realistic tests, as shown in Fig. 3(b), on which the defects are rare, small and unobvious; this subset can be used to verify the algorithm's performance in realistic detection.

2.1.2. GDXray
GDXray [51] is a public dataset that includes a group of X-ray welding images (Welds). The group Welds contains 88 images arranged in 3 series and is used to evaluate the performance of detection algorithms [36,39,49].

We select 38 images from the group Welds, called Series W0004 (Fig. 3(c)), and extend this subset with multi-class ground truth annotations to evaluate the performance of the semantic segmentation algorithm.

2.2. Image preprocessing

The images in our dataset are preprocessed as follows:

(1). Image bit depth rescaling: The original 12-bit data depth is rescaled to 8 bits using the linear color look-up table method, ensuring that all necessary defect information remains in the 8-bit image.

(2). Weld zone extraction: The weld zone is adaptively segmented using the maximum between-cluster variance (Otsu) method, which prevents other zones from interfering with the detection.

(3). Image cropping: The original radiographic image is normally high-resolution and long-strip shaped, and needs to be cropped into low-resolution small patches to fit the model input and reduce computational resources. We propose a cropping strategy for the welds, which resizes the image to the target resolution and then crops small patches in a tiled pattern. In contrast, many studies use a cropping strategy that keeps the original resolution [39,41,49]. Fig. 4 compares these two strategies, where both original images are converted to 224 × 224 resolution patches. Compared with the strategy that keeps the original resolution, our strategy obtains a smaller number of patches but retains more weld context information and provides a larger receptive field, which is helpful for training the model (a code sketch of these three steps is given after Section 2.3).

2.3. Image annotation

Training the semantic segmentation network requires pixel-level ground truth annotations. However, manually annotating these complex defect regions (Fig. 5(a)) is difficult and expensive, so we annotate the images as follows:

(1). Defect region annotation: The MSMI algorithm we proposed can extract the defect region adaptively [14], so this algorithm is applied to the cropped images and accurately describes the defect boundaries, as shown in Fig. 5(b), which greatly simplifies the annotation operation. We also manually exclude defect regions that are incorrectly segmented by this algorithm.

(2). Defect type annotation: According to ISO 6520 and ISO 5817, we annotate these regions as four common defect types: Crack (CR), Porosity (PO), Slag inclusion (SL), and Lack of penetration or Lack of fusion (LPF), as shown in Fig. 5(c).
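For illustration, the three preprocessing steps of Section 2.2 can be sketched with OpenCV and NumPy as follows. This is a minimal sketch, not the exact implementation: the min-max form of the linear look-up table, the bounding-box weld extraction and all function names are our assumptions.

```python
import cv2
import numpy as np

def rescale_to_8bit(img12):
    """Step (1): linear look-up table mapping 12-bit gray levels to 8 bits
    (assumed min-max form), keeping the defect contrast in the 8-bit image."""
    lo, hi = int(img12.min()), int(img12.max())
    lut = np.clip((np.arange(4096) - lo) * 255.0 / max(hi - lo, 1), 0, 255)
    return lut.astype(np.uint8)[img12]          # img12: uint16 array, values 0..4095

def extract_weld_zone(img8):
    """Step (2): adaptive segmentation of the weld zone by the maximum
    between-cluster variance (Otsu) method, cropped to its bounding box."""
    _, mask = cv2.threshold(img8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    ys, xs = np.nonzero(mask)
    return img8[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def crop_patches(weld, size=224):
    """Step (3): resize the long weld strip so its height equals the target
    resolution, then crop 224 x 224 patches in a tiled pattern (Fig. 4(a)).
    Any leftover tail narrower than one patch is dropped in this sketch."""
    h, w = weld.shape
    resized = cv2.resize(weld, (max(size, round(w * size / h)), size))
    return [resized[:, x:x + size] for x in range(0, resized.shape[1] - size + 1, size)]
```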
Fig. 8. Other semantic segmentation network architectures. (a): U-net. (b): PSPNet. (c): Linknet.
Table 2
Combination of image transformations.

Types of transformations | Methods of transformations | Probabilities
Geometric transformation | Horizontal flip | 0.5
Noise addition | Gaussian noise | 0.2
Gray-level transformation | CLAHE; Random brightness; Random Gamma | 0.9
Grayscale transformation | Random Contrast; Random HSV | 0.9
Fuzzy mapping | Sharpening; Random blur; Motion blur | 0.9

Table 3
Weights setting.

Types | CR | PO | SL | LPF | Background
Weights λ_Dice | 2 | 1 | 3 | 2 | 0.5

Table 4
Hyperparameters configuration.

Hyperparameters | Values
Classification categories | 5
Batch size | 8
Learning rate | 0.0001
Training iterations | 30
Activation function | Softmax
Gradient optimum algorithm | Adam
Backbone | ResNet-34
Initialization parameters | Pre-training weights on ImageNet
Loss function | Cross-entropy
Data augmentation | No

weak texture, etc.), the data characteristics of defects also bring challenges for detection:

(1). The data size is relatively small: Although we have expanded the data size as much as possible, it is still small for training DNNs. However, this small number of images contains a large number of defect objects (approximately 6000), reflecting the high annotation complexity of our RIWD dataset.

(2). Defect objects are too small: All types of defects together occupy less than 2% of the pixels in the images. It is a challenge to detect such small defects against the weld background, which can also be regarded as an extremely imbalanced problem between the foreground and background categories.

(3). Category imbalance of defects: Both the number and the pixel number of defects prove this, which may cause poor prediction of defect categories with small samples.

The above challenges need to be taken into account when designing our algorithm.

In summary, the framework in terms of data of this paper is shown in Fig. 6.
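The combined transformations of Table 2 map naturally onto the Albumentations library [63] referenced in Section 4.1; the pipeline below is our sketch of that table, and the grouping of each table row under OneOf is an assumption.

```python
import albumentations as A

# Sketch of Table 2: one probability per row of combined transformations.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),                                    # geometric transformation
    A.GaussNoise(p=0.2),                                        # noise addition
    A.OneOf([A.CLAHE(),                                         # gray-level transformation
             A.RandomBrightnessContrast(contrast_limit=0.0),    # random brightness
             A.RandomGamma()], p=0.9),
    A.OneOf([A.RandomBrightnessContrast(brightness_limit=0.0),  # random contrast
             A.HueSaturationValue()], p=0.9),                   # random HSV
    A.OneOf([A.Sharpen(), A.Blur(), A.MotionBlur()], p=0.9),    # fuzzy mapping
])

# Applying one transform object to image and mask together keeps the
# pixel-level labels aligned with the augmented patch.
out = augment(image=patch, mask=label)          # patch, label: arrays from Section 2
patch_aug, label_aug = out['image'], out['mask']
```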
Fig. 9. Example of data augmentation by combined image transformations. (a): Original samples. (b): Augmented samples.

3. Methodology

In this section, we briefly explain the deep learning methods, including network architectures and optimization strategies.

3.1. FPN-based semantic segmentation network

Feature Pyramid Network (FPN) is a feature extraction network [52], originally proposed for multi-scale object detection, which is constructed from three parts:

(1). Bottom-up pathway: The bottom-up pathway computes a feature hierarchy consisting of feature maps at several scales. This pathway consists of four down-sampling stages (also known as pyramid levels), where the convolutional layers in the same stage produce output feature maps of the same size. And CNNs can be used to construct this pathway.

(2). Top-down pathway: The top-down pathway produces higher-resolution features by up-sampling (nearest interpolation) spatially coarser, but semantically stronger, feature maps from higher pyramid levels.

(3). Lateral connections: The features from the top-down pathway are enhanced by the features from the corresponding bottom-up pathway via lateral connections. The bottom-up feature map is of lower semantics, but its activations are more accurately localized as it is subsampled fewer times, so it can be used to enhance the top-down feature. The feature maps are laterally connected by element-wise addition.
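A single fusion step of the top-down pathway and its lateral connection (parts (2) and (3) above) can be sketched in Keras; the 256-channel 1 x 1 projection follows the original FPN design [52], and the function name is ours.

```python
from tensorflow.keras import layers

def fpn_merge(top_down, bottom_up, channels=256):
    """One FPN fusion step: up-sample the coarser, semantically stronger map
    by nearest interpolation, project the better-localized bottom-up map with
    a 1 x 1 convolution, and combine them by element-wise addition."""
    upsampled = layers.UpSampling2D(size=2, interpolation='nearest')(top_down)
    lateral = layers.Conv2D(channels, kernel_size=1)(bottom_up)
    return layers.Add()([upsampled, lateral])
```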
A branch is then added to generate the semantic segmentation output from the features of FPN [53]. The features of each level after the connection are up-sampled to feature maps of the same resolution by bilinear interpolation, then fused into one feature map by channel concatenation; finally, the size and channels of this feature map are adjusted to match the input resolution and the number of classes, yielding the pixel-wise prediction.

Fig. 10. Comparison of defect segmentation results. (a): Origin image. (b): Ground truth. (c): FPN. (d): U-net. (e): PSPNet. (f): Linknet.

Table 5
Comparisons of performances and efficiencies of semantic segmentation networks.

Models | mIoU | mF1 | mPA | mR | Training time(s) | Inference time(s)
FPN | 0.68 ± 0.00 | 0.68 ± 0.00 | 0.80 ± 0.01 | 0.86 ± 0.01 | 296.8 ± 3.6 | 0.040 ± 0.001
U-net | 0.47 ± 0.01 | 0.50 ± 0.01 | 0.64 ± 0.01 | 0.79 ± 0.01 | 192.9 ± 4.5 | 0.038 ± 0.002
PSPNet | 0.62 ± 0.01 | 0.63 ± 0.01 | 0.80 ± 0.01 | 0.76 ± 0.00 | 390.0 ± 9.4 | 0.040 ± 0.001
Linknet | 0.55 ± 0.06 | 0.57 ± 0.06 | 0.73 ± 0.07 | 0.79 ± 0.01 | 199.8 ± 7.2 | 0.039 ± 0.001

Table 6
Comparisons of performances and efficiencies between ResNet backbones with different depths.

ResNets | mIoU | mF1 | mPA | mR | Para. | Training time(s) | Inference time(s)

Table 7
Comparisons of performances and efficiencies between some advanced backbones.

Backbones | mIoU | mF1 | mPA | mR | Para. | Training time(s) | Inference time(s)
Fig. 11. Training curves. (a): Loss function curves. (b): mIoU curves. (c): mF1 curves.
Fig. 12. Obvious defects. (The top figure is original, and the middle figure is annotation, while the bottom figure is prediction result, same below).

L_{Dice} = C - \sum_{c=1}^{C} \frac{2TP_c}{2TP_c + FN_c + FP_c}    (2)

where gt_c and pr_c represent the one-hot ground truth and predicted probability of category c, respectively, and C is the number of categories, set to 5 (4 defect categories plus 1 background category).

L_{Focal} = - \sum_{c=1}^{C} gt_c \cdot \alpha \cdot (1 - pr_c)^{\gamma} \cdot \log(pr_c)    (3)

where α is the weighting factor for balancing positive and negative samples, while γ is the moderating factor for hard and easy samples; they are set to 0.25 and 2 here, respectively. In addition, the focal loss also stabilizes the training process.

Furthermore, to address the extremely imbalanced data distribution problem, Zhu et al. [61] use a hybrid loss function combining dice loss and focal loss, and the total loss can be formulated as:

L_{Hybrid} = L_{Dice} + \lambda \cdot L_{Focal}    (4)

where λ is the tradeoff between the dice loss L_{Dice} and the focal loss L_{Focal}, and is set to 1.

However, we noticed that in our defect detection task, the imbalance problem exists not only between the foreground and the background, but also between the various categories of defects. Therefore, we add a weighting factor λ_{Dice} for balancing the categories to the dice loss based on Eq. (4), and the new loss function can be written as:

L_{Hybrid} = \lambda_{Dice} \cdot L_{Dice} + \lambda \cdot L_{Focal} = C - \sum_{c=1}^{C} \frac{\lambda_{Dice}^{c}}{\sum_{c=1}^{C} \lambda_{Dice}^{c}} \cdot \frac{2TP_c}{2TP_c + FN_c + FP_c} - \lambda \cdot \sum_{c=1}^{C} gt_c \cdot \alpha \cdot (1 - pr_c)^{\gamma} \cdot \log(pr_c)    (5)

where λ_{Dice}^{c} represents the weight of category c, which is set according to the data distribution in the sample set (Table 1), as shown in Table 3.
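A TensorFlow sketch of this weighted hybrid loss, written by us under the assumption of one-hot (batch, H, W, 5) tensors and the weights of Table 3, could read:

```python
import tensorflow as tf

# Category weights lambda_Dice^c from Table 3: CR, PO, SL, LPF, background.
LAMBDA_DICE = tf.constant([2.0, 1.0, 3.0, 2.0, 0.5])
ALPHA, GAMMA, LAM = 0.25, 2.0, 1.0      # Eq. (3) factors and the Eq. (4) tradeoff

def hybrid_loss(gt, pr, eps=1e-7):
    """Category-weighted dice loss plus focal loss, following Eq. (5).
    gt, pr: one-hot ground truth and softmax probabilities, shape (B, H, W, C)."""
    num_c = tf.cast(tf.shape(pr)[-1], tf.float32)
    w = LAMBDA_DICE / tf.reduce_sum(LAMBDA_DICE)               # normalized weights

    # Soft dice per category: 2TP_c / (2TP_c + FN_c + FP_c), summed over pixels.
    tp = tf.reduce_sum(gt * pr, axis=[0, 1, 2])
    fn_fp = tf.reduce_sum(gt + pr, axis=[0, 1, 2]) - 2.0 * tp
    dice_loss = num_c - tf.reduce_sum(w * 2.0 * tp / (2.0 * tp + fn_fp + eps))

    # Focal loss of Eq. (3), averaged over all pixels.
    pr = tf.clip_by_value(pr, eps, 1.0)
    focal = -gt * ALPHA * (1.0 - pr) ** GAMMA * tf.math.log(pr)
    focal_loss = tf.reduce_mean(tf.reduce_sum(focal, axis=-1))

    return dice_loss + LAM * focal_loss
```

The model can then be compiled with loss=hybrid_loss in place of the cross-entropy baseline of Table 4.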
4. Experimental results and discussion

In this section, we test and discuss the detection performances of the above methods on weld radiographic images through detailed experiments and comparisons.

4.1. Experimental setup

Based on the above sample set, we compare the detection performances of several networks and evaluate the improvement effect of the optimization strategies. All experiments are conducted in the TensorFlow 2.4 framework on a PC with 32 GB RAM, an Intel i7 processor, an NVIDIA RTX 3070 GPU, and a 64-bit Windows 10 operating system, using Python 3.8 and CUDA 11.2. These experiments share a hyperparametric configuration, listed in Table 4. We refer to open-source libraries [62-64] in implementing the algorithms.

4.2. Evaluation indicators

4.2.1. Performance evaluation indicators
We introduce some evaluation indicators to verify the algorithm's defect detection performance:

(1). Mean Class Pixel Accuracy (mPA): Accuracy reflects the proportion of correctly predicted pixels in the object class and can be used to measure false alarms:

mPA = \frac{1}{C-1} \sum_{c=1}^{C-1} \frac{TP_c}{TP_c + FP_c}    (6)

where the background class is not considered, so there are C-1 classes.

(2). Mean Recall (mR): This indicator represents how many pixels of the object class in the sample are correctly predicted, and it can be used to measure missed detection:

mR = \frac{1}{C-1} \sum_{c=1}^{C-1} \frac{TP_c}{TP_c + FN_c}    (7)

(3). Macro F1 score (mF1): It balances accuracy and recall:

mF1 = \frac{1}{C-1} \sum_{c=1}^{C-1} \frac{2 \cdot PA_c \cdot R_c}{PA_c + R_c}    (8)

where PA_c and R_c represent the accuracy and recall of class c, respectively.

(4). Mean intersection over union ratio (mIoU): It is the ratio of the intersection to the union of the predicted and true values of the object class, and is used to indicate the similarity between the prediction and the ground truth:

mIoU = \frac{1}{C-1} \sum_{c=1}^{C-1} \frac{TP_c}{TP_c + FN_c + FP_c}    (9)
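The four indicators of Eqs. (6)-(9) reduce to per-class confusion counts; a NumPy sketch, assuming integer label maps with the background encoded as class 0, might be:

```python
import numpy as np

def mean_indicators(gt, pred, num_classes=5):
    """mPA, mR, mF1 and mIoU as in Eqs. (6)-(9), averaged over the C-1
    defect classes; the background class 0 is not considered."""
    pa, r, f1, iou = [], [], [], []
    for c in range(1, num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        pa.append(tp / max(tp + fp, 1))
        r.append(tp / max(tp + fn, 1))
        f1.append(2 * pa[-1] * r[-1] / max(pa[-1] + r[-1], 1e-7))
        iou.append(tp / max(tp + fn + fp, 1))
    return tuple(float(np.mean(v)) for v in (pa, r, f1, iou))
```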
Fig. 13. Weak defects. (a): Crack. (b): Lack of penetration. (c): Lack of fusion.
4.2.2. Efficiency evaluation indicators
To meet the efficiency requirements of detection, we also evaluate the algorithm's efficiency using the following indicators:

(1). Training time: The training time of the model can reflect its computational cost to some extent.

(2). Inference time: The execution speed can directly reflect the efficiency of the algorithm, usually expressed by the inference time of the model. The time for the model to predict one sample image is recorded.

(3). Trainable parameters (Para.): The number of trainable parameters is also provided as a reference to quantify the size of the model.

4.3. Network architecture

Appropriate feature extraction and aggregation is crucial for defect classification [53], and the network's architecture is the key factor influencing defect feature extraction and fusion. Therefore, in this part, we select the appropriate network architecture for the welding defect detection task and demonstrate the superiority of the selected model through comparative experiments.

4.3.1. FPN semantic segmentation network for defect feature fusion
Based on the above sample set and training parameters, we train four semantic segmentation networks: FPN, U-net, PSPNet and Linknet. Their evaluation results on the validation set are presented in Table 5 (averaged over 5 runs; the same applies to Tables 9 and 10), from which we can see that the FPN semantic segmentation network outperforms the other networks on all evaluation indicators, especially mIoU and mF1. In terms of computational cost and prediction efficiency, U-net and Linknet perform better because they are structured with fewer computational parameters.

Fig. 10 shows some prediction results, from which we can observe the following: First, when predicting defect regions, except that PSPNet misses many weak defects (columns 2 and 3 of Fig. 10), the rest of the networks can segment the defect regions. Second, when predicting defect types, FPN predicts more accurately, while the other networks have semantic confusion problems, specifically:

(1). PO have the larger sample size and greater feature differentiation, so the models performed well in both their region
segmentation and category judgement, but also missed some too-small PO (column 3 of Fig. 10).

(2). The similarity of features between CR and LPF has led to confusion when the models predict their types, with some models predicting the same defect region as both defect types (columns 1, 5 and 6 of Fig. 10).

(3). SL have the fewest samples, so they are often misclassified as PO, which have similar features (column 4 of Fig. 10).

The difference in the architecture of the networks is undoubtedly an important reason for the above problems. From the perspective of feature learning, features at different levels of a CNN have different sensitivities to objects. Low-level features have higher resolution, so they generate clear and detailed boundaries and are sensitive to positional deviations, but carry less contextual semantic information. High-level features have more abstract semantic information, which is the main basis for classification, but weaker shape and location information [65].

Thus, the accurate prediction of defect boundaries and types by FPN proves that it has learned both low-level and high-level features of defects. However, the defect prediction results of the other networks are not as good as those of FPN, even though they extract the same defect features based on the same backbone (ResNet-34), which illustrates that the key to the problem lies in the difference of the feature fusion modules.

In the feature fusion architecture, the biggest difference between FPN and the other networks is that FPN concatenates the multi-layer feature maps obtained from the pyramid structure to predict the semantic segmentation results. In this way, the extracted high-level semantic features are directly used for prediction, cleverly avoiding the loss of these deep features during the transfer between the models' different layers, and the contribution of these semantic features to the prediction results guarantees the correct classification of defects. Although U-net and Linknet also concatenate high-level semantic features, these features contribute less to the prediction results, which leads to misclassification of defect types. And PSPNet pools the extracted feature maps to too small a size, which leads to missed detection of weak defects.

In summary, the experiments and results analysis demonstrate that the FPN semantic segmentation network is very suitable for defect detection tasks due to its feature fusion architecture, and it shows good detection performance on the sample set.

4.3.2. ResNet backbone for defect feature extraction
Without degradation, deeper networks extract semantic features at deeper levels, which are also better characterized. To explore the impact of defect feature extraction on the detection results, we train FPNs with ResNets of different depths as backbones, and their performances on the validation set are presented in Table 6.

We can see from the experimental results that FPN-ResNet-34 has the best detection performance, which means that the features extracted by the ResNet-34 backbone are sufficient to characterize the defects. Since defect detection is a lower semantic level task (the weld structures are fixed and the defect features are simple in the sample images), it is not necessary to extract the defect features with a deeper network. Moreover, training a DNN with many parameters on few samples may also lead to overfitting, reducing detection performance (and efficiency). Therefore, the feature extraction network with the best detection performance is not the deepest one, but the one best suited to the semantic level of the defect features.

In addition, the number of parameters and the time to predict a sample for ResNets of different depths are also given in Table 6. Assuming that a weld radiograph is cropped into 10 sample images (usually fewer), it takes only about 0.4 s for FPN-ResNet-34 to interpret it. Therefore, this end-to-end model can basically meet the standard of real-time detection.

The computational overhead of the network is largely controllable. As listed in Table 5, the training time and inference time of the model are relatively stable, because the size of the input images and the number of trainable network parameters are both fixed. Moreover, original images are converted to 224 × 224 resolution patches by the preprocessing methods, and the shallower ResNet-34 is selected as the backbone network, all of which reduce the computational burden to some extent.

In summary, we select the FPN-ResNet-34 semantic segmentation network to build our defect detection system.

4.3.3. Other advanced backbones and hyperparameters setting
In addition to ResNet, we apply some other advanced networks as backbones of FPN, including ResNeXt [66], Se-ResNeXt [67], Inception-v3 [68], and DenseNet [69]. As given in Table 7, replacing these backbone networks did not significantly improve the detection performance, but rather increased the computational cost. We believe the reason is that these advanced models are designed for generic tasks and not for the specific challenges of defect detection.

When initializing the model parameters, we utilize the pre-training weights of the backbone on ImageNet. Specifically, the encoder parameters pre-trained on ImageNet are frozen and copied to the network, after which the unfrozen encoder and the randomly initialized decoder are trained together to fine-tune the model.
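One plausible reading of this two-phase initialization can be sketched with the Segmentation Models library [62]; the epoch split and the dataset names are placeholders, not values from the paper.

```python
import tensorflow as tf
import segmentation_models as sm

sm.set_framework('tf.keras')

# Phase 1: encoder frozen with ImageNet weights; only the randomly
# initialized decoder is trained.
model = sm.FPN('resnet34', classes=5, activation='softmax',
               encoder_weights='imagenet', encoder_freeze=True)
model.compile(tf.keras.optimizers.Adam(1e-4), loss='categorical_crossentropy')
model.fit(train_ds, validation_data=val_ds, epochs=5)       # placeholder datasets

# Phase 2: unfreeze the encoder and fine-tune the whole network.
for layer in model.layers:
    layer.trainable = True
model.compile(tf.keras.optimizers.Adam(1e-4), loss='categorical_crossentropy')
model.fit(train_ds, validation_data=val_ds, epochs=25)
```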
Fig. 18. Visualization of the middle activation layers. From left to right: input image, layer from stage 1, stage 2, stage 3 of the network. (a): Porosity. (b): Lack of penetration. (c): Multiple defects.
4.4.2. Loss function-based category balancing
Another challenge of our defect detection task is caused by the imbalanced data distribution, which is reflected in two aspects:

(1). Imbalance between foreground and background: Since the loss function of image semantic segmentation is based on pixel-wise labeling, and small defects with few pixels contribute less to the loss, a model trained to minimize the loss may actually have poor detection performance.

(2). Imbalance among defect categories: The difference in the contribution to the loss by various defect types may lead to the model's poor prediction performance on the defect types with few samples.

This problem is also reflected in the detection results in Fig. 10, so we solve it by using the loss function of Eq. (5). To validate the effect of different loss functions on the model performance, we train FPN-ResNet-34 using different loss functions. We try five loss functions: cross-entropy loss, dice loss, focal loss, the hybrid loss of dice loss and focal loss, and our proposed hybrid loss with added category weights. The performances of the model trained with the five loss functions described above are shown in Table 10, and the training curves are shown in Fig. 11.

We note some observations from this experiment: First, dice loss and focal loss outperform cross-entropy loss on our model, which shows that balancing the categories through the loss function can improve the performance of the model (see Fig. 11 (b) and (c)). Second, the hybrid loss does not improve the model performance by itself, while it significantly improves the network after adding the weights for balancing defect categories, which proves the superiority of the loss function we designed for the defect data distribution. Third, if the model is not trained with focal loss, the prediction results may contain no defects (all background) and the model needs to be retrained, which demonstrates the role of the focal loss function in stabilizing training.

4.4.3. Visualization of test results
It is also worth noting that although our method is not outstanding in the evaluation indicators, this does not mean that it has poor detection performance, because these indicators are evaluated based on defects' pixels rather than defects' individual objects. This leads the evaluation indicators to underestimate the detection performance of the model. It is possible that just a few pixel errors between the defect boundaries predicted by the model and the labeled boundaries can cause a significant degradation of the indicators (especially for IoU, which is very sensitive to object position deviations), even though these defects can be considered as detected.

Therefore, in this part, we show some defect detection results of the optimized FPN-ResNet-34 model on the test set (as shown in Figs. 12-16; from top to bottom: original images, ground truth annotations, and prediction results) to reflect its detection performance more
intuitively. Firstly, the model can accurately predict the boundaries of obvious defects (Fig. 12). Secondly, for some images with detection challenges, whether CR and LPF with low contrast (Fig. 13) or small-object PO and SL (Fig. 14), the model is able to detect these defects. Finally, due to the excellent feature learning capability of the DNN, the model can also be adapted to detect various types of defects (Fig. 15) and multiple welding scenarios (Fig. 16) in the sample set. These visualization results demonstrate the good defect detection performance of our method.

4.5. Network verification

4.5.1. External validation-based network generalization
Although the optimized model performs well on the sample set, this does not mean that it is also reliable in detecting weld defects in other scenarios. Therefore, to evaluate the generalization ability of the network in other detection scenarios, we use the optimized FPN-ResNet-34 to predict the images in the external validation set (R0002), achieving 0.63 mIoU and 0.64 mF1.

Fig. 17 shows some of the predicted results; although these films from realistic projects are more difficult to detect, the network also detects their defects relatively accurately. Based on the good generalization performance of the DNN, our method shows potential for application in practical inspection.

4.5.2. Feature visualization-based network interpretation
The reason why DNNs have strong generalization ability is that they learn generic semantic features. To better understand these features, we visualize the middle activation layers of FPN-ResNet-34 using the post-interpretation technique, and Fig. 18 shows the transformation patterns of these different layers for the input image.
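This post-hoc inspection follows the activation-visualization recipe of [64]: wrap the trained model in a sub-model that exposes intermediate layers. The sketch below is ours, and the layer names are placeholders that depend on the chosen backbone.

```python
import tensorflow as tf

def middle_activations(model, image, layer_names):
    """Return the feature maps of selected intermediate layers for one image,
    by building a multi-output probe model over the trained network."""
    outputs = [model.get_layer(name).output for name in layer_names]
    probe = tf.keras.Model(inputs=model.input, outputs=outputs)
    return probe(image[tf.newaxis, ...])    # list of (1, H_i, W_i, C_i) tensors

# e.g. stage outputs of the ResNet-34 encoder (hypothetical layer names):
# maps = middle_activations(model, patch, ['stage1_unit1_relu1', 'stage3_unit1_relu1'])
```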
We can clearly observe from the figure that the shallow feature maps are only simple transformations of the input images, which almost contain the complete original information; i.e., the low-level features retain more defect location and boundary information. As the network layers get deeper, the feature maps become more abstract, representing higher-level features. In the last layer, it is obvious that some feature maps activate only the defect regions, i.e., high-level semantic features, which are the main basis for defect prediction.

The visualized middle activation layers intuitively show the defect features learned by the network. FPN-ResNet-34 extracts the high-level defect features in its third stage, consistent with the viewpoint about defect feature extraction and fusion that we obtained when discussing the network architecture earlier.

4.6. Limitations and future work

In summary, we propose an automatic detection method for welding defects in radiographic images based on semantic segmentation, and the overall framework of this method is shown in Fig. 19.

Although our method has shown good detection performance, it still has some limitations that need to be improved in our future work:

(1). Image transformation provides only limited augmentation of our data, so we will probably try some new data augmentation methods, such as manual defect simulation, GAN-based defect generation or other reconstruction methods [70].

(2). Although the network exhibits a certain generalization ability, it does not perform excellently on the external validation set, as shown in Fig. 20, with false alarms. In the next step, we will design a transfer learning strategy for the algorithm, so that the trained model can be fine-tuned on external data to improve its transfer performance and give the algorithm an accumulated learning ability when facing new data.

(3). In this paper, we use the MSMI algorithm to automatically label the defect regions, which greatly reduces the workload of data annotation, but the labeling of defect types is still done manually. Compared with supervised learning that requires labeled data, unsupervised learning is certainly a new idea to solve this problem. Therefore, in the future, we may build a better defect detection system based on unsupervised learning.

5. Conclusions

In this paper, we construct an automatic welding defect detection system for radiographic images based on the semantic segmentation method. The main research results of this paper are summarized as follows:

(1). Based on the collected data, a dataset of radiographic images of welding defects, called RIWD, is constructed as the basis for the algorithm study. Furthermore, in view of the characteristics of weld radiographic images, an image preprocessing and annotation method is designed, which preserves the original information of the data while enabling the images to be used for training and evaluation of the semantic segmentation networks.

(2). Based on the FPN-ResNet-34 semantic segmentation network, an end-to-end welding defect detection algorithm is implemented. The network receives weld radiographic images as input, automatically extracts and fuses semantic features of defects, and outputs complete information, including defect category, boundary, and location, which can be directly used for weld quality rating. Compared with other networks, FPN-ResNet-34 exhibits better detection performance due to its architectural features, achieving 0.66 mIoU, 0.67 mF1, 0.88 mPA, and 0.76 mR on the validation set. Moreover, the network predicts a single sample image in only 0.04 s, which can meet the efficiency requirement of real-time detection.

(3). According to the data characteristics of welding defects, the optimization strategy of FPN-ResNet-34 is proposed, including image transformation-based data augmentation and hybrid loss function-based data distribution balancing. The improvement of network performance by the optimization strategy is demonstrated experimentally, with the optimized network achieving 0.73 mIoU, 0.77 mF1, 0.90 mPA and 0.86 mR.

(4). To verify our method's generalization ability to different detection scenarios, the network is tested on an external validation set and achieves 0.63 mIoU and 0.64 mF1, indicating its potential for application to practical welding inspection engineering. The defect features learned by the network are shown by visualizing the middle activation layers, which illustrates the robustness and generalizability of our method from the perspective of feature learning.

In addition, the idea behind constructing the system in this paper can also be extended to other NDT fields.

CRediT authorship contribution statement

H. Xu: Conceptualization, Methodology, Software, Formal analysis, Data curation, Writing – original draft, Writing – review & editing, Visualization. Z.H. Yan: Conceptualization, Methodology, Software, Validation, Data curation, Writing – review & editing, Funding acquisition, Supervision. B.W. Ji: Software, Investigation, Resources, Data curation. P.F. Huang: Investigation, Resources. J.P. Cheng: Investigation. X.D. Wu: Resources.
Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

The authors wish to thank the editor and the reviewers for their helpful suggestions.

Funding

The work in this research is financially supported by the National Natural Science Foundation of China (Grant No. 51975015).

References

[1] T.W. Liao, Improving the accuracy of computer-aided radiographic weld inspection by feature selection, NDT & E Int. 42 (4) (2009) 229–239.
[2] W. Hou, D. Zhang, Y. Wei, J. Guo, X. Zhang, Review on Computer Aided Weld Defect Detection from Radiography Images, Appl. Sci. 10 (5) (2020) 1878, https://doi.org/10.3390/app10051878.
[3] Y. Zou, D. Du, B. Chang, L. Ji, J. Pan, Automatic weld defect detection method based on Kalman filtering for real-time radiographic inspection of spiral pipe, NDT & E Int. 72 (2015) 1–9.
[4] M. Malarvel, G. Sethumadhavan, P.C. Rao Bhagi, S. Kar, T. Saravanan, A. Krishnan, Anisotropic diffusion based denoising on X-radiography images to detect weld defects, Digital Signal Process. 68 (2017) 112–126.
[5] X. Wang, B.S. Wong, Radiographic Image Segmentation for Weld Inspection Using a Robust Algorithm, Res. Nondestr. Eval. 16 (3) (2005) 131–142.
[6] Z. Lin, Z. Yingjie, D. Bochao, C. Bo, L. Yangfan, Welding defect detection based on local image enhancement, IET Image Process. 13 (2019) 2647–2658.
[7] O. Zahran, H. Kasban, M. El-Kordy, F.E.A. El-Samie, Automatic weld defect identification from radiographic images, NDT & E Int. 57 (2013) 26–35.
[8] Y. Wang, Y. Sun, P. Lv, H. Wang, Detection of line weld defects based on multiple thresholds and support vector machine, NDT & E Int. 41 (7) (2008) 517–524.
[9] R. Vilar, J. Zapata, R. Ruiz, An automatic system of classification of weld defects in radiographic images, NDT & E Int. 42 (5) (2009) 467–476.
[10] G. Wang, T.W. Liao, Automatic identification of different types of welding defects in radiographic images, NDT & E Int. 35 (2002) 519–528.
[11] J. Shao, D. Du, B. Chang, H. Shi, Automatic weld defect detection based on potential defect tracking in real-time radiographic image sequence, NDT & E Int. 46 (2012) 14–21.
[12] Alaknanda, R.S. Anand, P. Kumar, Flaw detection in radiographic weldment images using morphological watershed segmentation technique, NDT & E Int. 42 (1) (2009) 2–8.
[13] Alaknanda, R.S. Anand, P. Kumar, Flaw detection in radiographic weld images using morphological approach, NDT & E Int. 39 (1) (2006) 29–33.
[14] Z.H. Yan, H. Xu, P.F. Huang, Multi-scale multi-intensity defect detection in ray image of weld bead, NDT & E Int. 116 (2020) 102342, https://doi.org/10.1016/j.ndteint.2020.102342.
[15] N. Nacereddine, A.B. Goumeidane, D. Ziou, Unsupervised weld defect classification in radiographic images using multivariate generalized Gaussian mixture model with exact computation of mean and shape parameters, Comput. Ind. 108 (2019) 132–149.
[16] I. Valavanis, D. Kosmopoulos, Multiclass defect detection and classification in weld radiographic images using geometric and texture features, Expert Syst. Appl. 37 (12) (2010) 7606–7614.
[17] H. Kasban, O. Zahran, H. Arafa, M. El-Kordy, S.M.S. Elaraby, F.E. Abd El-Samie, Welding defect detection from radiography images with a cepstral approach, NDT & E Int. 44 (2) (2011) 226–231.
[18] N. Nacereddine, D. Ziou, L. Hamami, Fusion-based shape descriptor for weld defect radiographic image retrieval, Int. J. Adv. Manuf. Technol. 68 (9-12) (2013) 2815–2832.
[19] L. Yang, H. Jiang, Weld defect classification in radiographic images using unified deep neural network with multi-level features, J. Intell. Manuf. 32 (2) (2021) 459–469.
[20] Z. Yan, B. Shi, L. Sun, J. Xiao, Surface defect detection of aluminum alloy welds with 3D depth image and 2D gray image, Int. J. Adv. Manuf. Technol. 110 (3-4) (2020) 741–752.
[21] R.R. da Silva, L.P. Calôba, M.H.S. Siqueira, J.M.A. Rebello, Pattern recognition of weld defects detected by radiographic test, NDT & E Int. 37 (6) (2004) 461–470.
[22] J. Sun, C. Li, X.-J. Wu, V. Palade, W. Fang, An Effective Method of Weld Defect Detection and Classification Based on Machine Vision, IEEE Trans. Ind. Inf. 15 (12) (2019) 6322–6333.
[23] N. Boaretto, T.M. Centeno, Automated detection of welding defects in pipelines from radiographic images DWDI, NDT & E Int. 86 (2017) 7–13.
[24] F. Duan, S. Yin, P. Song, W. Zhang, C. Zhu, H. Yokoi, Automatic Welding Defect Detection of X-Ray Images by Using Cascade AdaBoost With Penalty Term, IEEE Access 7 (2019) 125929–125938.
[25] B. Chen, Z. Fang, Y. Xia, L. Zhang, Y. Huang, L. Wang, Accurate defect detection via sparsity reconstruction for weld radiographs, NDT & E Int. 94 (2018) 62–69.
[26] F.M. Suyama, M.R. Delgado, R. Dutra da Silva, T.M. Centeno, Deep neural networks based approach for welded joint detection of oil pipelines in radiographic images with Double Wall Double Image exposure, NDT & E Int. 105 (2019) 46–55.
[27] P. Sassi, P. Tripicchio, C.A. Avizzano, A Smart Monitoring System for Automatic Welding Defect Detection, IEEE Trans. Ind. Electron. 66 (12) (2019) 9641–9650.
[28] J.P. Yun, W.C. Shin, G. Koo, M.S. Kim, C. Lee, S.J. Lee, Automated defect inspection system for metal surfaces based on deep learning and data augmentation, J. Manuf. Syst. 55 (2020) 317–324.
[29] J.-K. Park, W.-H. An, D.-J. Kang, Convolutional Neural Network Based Surface Inspection System for Non-patterned Welding Defects, Int. J. Precis. Eng. Manuf. 20 (3) (2019) 363–374.
[30] J. Lin, Y. Yao, L. Ma, Y. Wang, Detection of a casting defect tracked by deep convolution neural network, Int. J. Adv. Manuf. Technol. 97 (1-4) (2018) 573–581.
[31] K. Zhang, H. Shen, Solder Joint Defect Detection in the Connectors Using Improved Faster-RCNN Algorithm, Appl. Sci. 11 (2) (2021) 576, https://doi.org/10.3390/app11020576.
[32] Y. Yang, R. Yang, L. Pan, J. Ma, Y. Zhu, T. Diao, L. Zhang, A lightweight deep learning algorithm for inspection of laser welding defects on safety vent of power battery, Comput. Ind. 123 (2020) 103306, https://doi.org/10.1016/j.compind.2020.103306.
[33] X. Zhang, Y. Hao, H. Shangguan, P. Zhang, A. Wang, Detection of surface defects on solar cells by fusing multi-channel convolution neural networks, Infrared Phys. Technol. 108 (2020) 103334, https://doi.org/10.1016/j.infrared.2020.103334.
[34] Y. Wang, F. Shi, X. Tong, A Welding Defect Identification Approach in X-ray Images Based on Deep Convolutional Neural Networks, Springer International Publishing, Cham, 2019, pp. 53–64.
[35] S.-J. Oh, M.-J. Jung, C. Lim, S.-C. Shin, Automatic Detection of Welding Defects Using Faster R-CNN, Appl. Sci. 10 (2020).
[36] L. Yang, H. Wang, B. Huo, F. Li, Y. Liu, An automatic welding defect location algorithm based on deep learning, NDT & E Int. 120 (2021) 102435, https://doi.org/10.1016/j.ndteint.2021.102435.
[37] R. Guo, H. Liu, G. Xie, Y. Zhang, Weld Defect Detection From Imbalanced Radiographic Images Based on Contrast Enhancement Conditional Generative Adversarial Network and Transfer Learning, IEEE Sens. J. 21 (9) (2021) 10844–10853.
[38] L. Yang, Y. Liu, J. Peng, An Automatic Detection and Identification Method of Welded Joints Based on Deep Neural Network, IEEE Access 7 (2019) 164952–164961.
[39] W. Hou, Y. Wei, Y. Jin, C. Zhu, Deep features based on a DCNN model for classifying imbalanced weld flaw types, Measurement 131 (2019) 482–489.
[40] T.W. Liao, Classification of weld flaws with imbalanced class data, Expert Syst. Appl. 35 (3) (2008) 1041–1052.
[41] X. Le, J. Mei, H. Zhang, B. Zhou, J. Xi, A learning-based approach for surface defect detection using small image datasets, Neurocomputing 408 (2020) 112–120.
[42] X. Dong, C.J. Taylor, T.F. Cootes, Automatic aerospace weld inspection using unsupervised local deep feature learning, Knowl.-Based Syst. 221 (2021) 106892, https://doi.org/10.1016/j.knosys.2021.106892.
[43] H. Dong, K. Song, Y. He, J. Xu, Y. Yan, Q. Meng, PGA-Net: Pyramid Feature Fusion and Global Context Attention Network for Automated Surface Defect Detection, IEEE Trans. Ind. Inf. 16 (12) (2020) 7448–7458.
[44] W. Du, H. Shen, J. Fu, G. Zhang, Q. He, Approaches for improvement of the X-ray image defect detection of automobile casting aluminum parts based on deep learning, NDT & E Int. 107 (2019) 102144, https://doi.org/10.1016/j.ndteint.2019.102144.
[45] H. Jiang, Q. Hu, Z. Zhi, J. Gao, Z. Gao, R. Wang, S. He, H. Li, Convolution neural network model with improved pooling strategy and feature selection for weld defect recognition, Welding World 65 (4) (2021) 731–744.
[46] Y. Gong, H. Shao, J. Luo, Z. Li, A deep transfer learning model for inclusion defect detection of aeronautics composite materials, Compos. Struct. 252 (2020) 112681, https://doi.org/10.1016/j.compstruct.2020.112681.
[47] Q. Sun, Z. Ge, A Survey on Deep Learning for Data-Driven Soft Sensors, IEEE Trans. Ind. Inf. 17 (9) (2021) 5853–5866.
[48] W. Hou, Y. Wei, J. Guo, Y. Jin, C. Zhu, Automatic Detection of Welding Defects using Deep Neural Network, J. Phys. Conf. Ser. 933 (2018) 012006, https://doi.org/10.1088/1742-6596/933/1/012006.
[49] C. Ajmi, J. Zapata, S. Elferchichi, A. Zaafouri, K. Laabidi, Deep Learning Technology for Weld Defects Classification Based on Transfer Learning and Activation Features, Adv. Mater. Sci. Eng. 2020 (2020) 1–16.
[50] H. Jiang, R. Wang, Z. Gao, J. Gao, H. Wang, Classification of weld defects based on the analytical hierarchy process and Dempster-Shafer evidence theory, J. Intell. Manuf. 30 (4) (2019) 2013–2024.
[51] D. Mery, V. Riffo, U. Zscherpel, G. Mondragón, I. Lillo, I. Zuccar, H. Lobel, M. Carrasco, GDXray: The Database of X-ray Images for Nondestructive Testing, J. Nondestruct. Eval. 34 (4) (2015), https://doi.org/10.1007/s10921-015-0315-7.
[52] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125.
[53] A. Kirillov, R. Girshick, K. He, P. Dollár, Panoptic feature pyramid networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 6399–6408.
[54] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.
[55] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid scene parsing network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.
[56] A. Chaurasia, E. Culurciello, LinkNet: Exploiting encoder representations for efficient semantic segmentation, in: 2017 IEEE Visual Communications and Image Processing (VCIP), IEEE, 2017, pp. 1–4.
[57] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[58] H. Yu, X. Li, K. Song, E. Shang, H. Liu, Y. Yan, Adaptive depth and receptive field selection network for defect semantic segmentation on castings X-rays, NDT & E Int. 116 (2020) 102345, https://doi.org/10.1016/j.ndteint.2020.102345.
[59] C.H. Sudre, W. Li, T. Vercauteren, S. Ourselin, M.J. Cardoso, Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations, in: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Springer, 2017, pp. 240–248.
[60] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
[61] W. Zhu, Y. Huang, L. Zeng, X. Chen, Y. Liu, Z. Qian, N. Du, W. Fan, X. Xie, AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy, Med. Phys. 46 (2) (2019) 576–589.
[62] P. Yakubovskiy, Segmentation Models, GitHub repository, 2019.
[63] A. Buslaev, V.I. Iglovikov, E. Khvedchenya, A. Parinov, M. Druzhinin, A.A. Kalinin, Albumentations: fast and flexible image augmentations, Information 11 (2) (2020) 125, https://doi.org/10.3390/info11020125.
[64] F. Chollet, Deep Learning with Python, Simon and Schuster, 2017.
[65] A. Mahendran, A. Vedaldi, Understanding deep image representations by inverting them, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5188–5196.
[66] A.E. Orhan, Robustness properties of Facebook's ResNeXt WSL models, arXiv preprint arXiv:1907.07640, 2019.
[67] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[68] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[69] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[70] X. Liu, S. Chen, L. Song, M. Woźniak, S. Liu, Self-attention Negative Feedback Network for Real-time Image Super-Resolution, J. King Saud Univ. Comput. Inf. Sci. (2021).