Research Article
News Text Classification Method Based on the GRU_CNN Model
Lujuan Deng,¹ Qingxia Ge,¹ Jiaxue Zhang,¹ Zuhe Li,¹,² Zeqi Yu,¹ Tiantian Yin,¹ and Hanxue Zhu¹

¹School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, Henan, China
²Henan Key Laboratory of Food Safety Data Intelligence, Zhengzhou University of Light Industry, Zhengzhou 450002, Henan, China
Received 25 June 2022; Revised 25 July 2022; Accepted 16 August 2022; Published 31 August 2022
           Copyright © 2022 Lujuan Deng et al. This is an open access article distributed under the Creative Commons Attribution License,
           which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The convolutional neural network (CNN) can extract local features of text but cannot capture structural information or the semantic relationships between words, so a single CNN model's classification performance is limited; the gated recurrent unit (GRU), by contrast, can effectively extract the semantic information and global structural relationships of text. To address this problem, this paper proposes a news text classification method based on the GRU_CNN model, which combines the advantages of CNN and GRU. The model first trains word vectors with the Word2vec model to serve as the embedding layer and then extracts semantic information from text sentences with the GRU model. Following that, the model employs the CNN component to extract crucial semantic features and finally completes the classification through the Softmax layer. The experimental results reveal that the GRU_CNN hybrid model outperforms the single CNN, LSTM, and GRU models in terms of classification effect and accuracy.
categorization in recent years. Deep neural network models have superior feature extraction ability to typical machine learning algorithms, making them ideally suited for applications in the area of text categorization [10].

The primary goal of this study is to address the problem of Chinese news text classification using deep learning models. GRU can efficiently extract the semantic information and global structural information of text, whereas CNN cannot effectively extract the structural information of text or the semantic relationships between sentences, so the classification accuracy of a single CNN model is low. Therefore, this paper combines the benefits of the CNN model and the GRU model to propose the GRU_CNN model for the classification of news text.

2. Related Work

Among deep learning classification models, there are two main categories, namely, convolutional neural networks and recurrent neural networks, both of which have many applications in text categorization.

Kim [11] first proposed a text categorization model based on CNN in 2014, which converts words into fixed-length word vectors through Word2vec as the input of CNN, convolves the word vectors with convolution kernels of multiple sizes, and finally performs pooling and classification. The key advantage of CNN is that it trains quickly and can efficiently extract local text features, but the pooling layer loses a great deal of vital information and overlooks the association between the local and the whole.

After Kim proposed the model, Zhang and Wallace [12] also proposed a text categorization method based on CNN and performed numerous comparative experiments under various hyperparameter settings, offering practical advice on hyperparameter tuning.

Johnson and Zhang proposed the deep pyramid convolutional neural network (DPCNN) [13], which has low complexity and excellent categorization power. This model primarily investigates word-level CNNs and improves accuracy by deepening the network. DPCNN can extract long-distance text dependencies by constantly deepening the network, but the computational complexity increases as well, which makes practical application problematic. Yao et al. proposed a graph-based convolutional neural network (GCN) [14], which is more effective for small-dataset categorization. GCN's limited flexibility and scalability are its main drawbacks.

Mikolov et al. [15] employed the recurrent neural network model to achieve text categorization. RNN can accommodate inputs of arbitrary length, and the model size does not grow with the input length. However, the RNN model must learn dependencies over many time steps, which makes it prone to the problem of gradient dispersion.

Hochreiter and Schmidhuber proposed the long short-term memory network (LSTM) [16] based on RNN, which makes up for the classic RNN's poor learning performance on long-distance sentences as well as the problem of gradient vanishing or gradient explosion. However, LSTM requires a lot of computation during training due to its numerous parameters and the complexity of the calculations between each gate.

Chung et al. proposed the GRU model [17] on this basis, which simplified the structure and training parameters of LSTM and improved training efficiency. However, GRU cannot compute in parallel and still cannot entirely resolve the gradient vanishing problem.

Graves and Schmidhuber [18] applied the bidirectional LSTM model for the first time to solve the categorization problem and achieved better classification results than the unidirectional LSTM model. However, compared with LSTM, BiLSTM has more parameters and is more expensive to compute. Cao et al. [19] utilized the BiGRU model to categorize Chinese text by synthesizing the context of the article. The model is simple, with few parameters and fast convergence.

Li and Dong [20] employed CNN to extract text's local features and BiLSTM to extract text's global features in order to fully capitalize on the benefits of the features obtained by the two models and enhance the model's classification ability.

In conclusion, a single neural network model often suffers from low classification accuracy. To obtain better classification performance, this research proposes a news text categorization strategy based on the GRU_CNN model. This model extracts more accurate text features, and its effectiveness has been demonstrated through experiments.

3. GRU_CNN Model

The whole process of the GRU_CNN hybrid model is as follows. Firstly, the news text data are input and preprocessed, and the text is trained with the Word2vec model to produce word vectors comprising the text's overall information. Secondly, to extract contextual semantic information from the text, the vector representation of words obtained by Word2vec is fed into the GRU model for semantic information extraction. Then, the output of the GRU model is input into the CNN model for further semantic feature extraction. Finally, the resulting feature vector is input to the Softmax layer for text classification. Figure 1 depicts the GRU_CNN model's structure.

3.1. Word2vec Embedding Layer. This paper uses the classic Word2vec [21] model for word embedding, and the Word2vec implementation of the Gensim toolkit is used to train the word vectors, which is simple and fast. The Word2vec model is divided into three layers: input, hidden, and output. The model's input is a one-hot vector; that is, the textual information is represented by x_1, x_2, ..., x_V. Assuming the size of the vocabulary is V and the dimension of the hidden layer is N, the mapping from the input to the hidden layer is represented by a matrix W of size V × N. Similarly, word vectors can be obtained by connecting the hidden and output layers via an N × V matrix W′. Figure 2 depicts the Word2vec model's structure.
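Since the paper names the Gensim toolkit but does not print its preprocessing code, the following is a minimal sketch of training word vectors with Gensim's Word2vec; the toy corpus, tokenization, and file name are assumptions for illustration.

```python
# Minimal sketch: training word vectors with Gensim's Word2vec.
# The 100-dimension setting matches the value tuned later in Section 4.
from gensim.models import Word2Vec

# Each training sample is a tokenized sentence (for Chinese news text,
# tokens would typically come from a word segmenter).
sentences = [
    ["体育", "比赛", "结果"],
    ["财经", "股票", "市场"],
]

model = Word2Vec(
    sentences,
    vector_size=100,  # word vector dimension N
    window=5,         # context window size
    min_count=1,      # keep rare words in this toy corpus
    sg=0,             # 0 = CBOW (predict w(t) from its context), 1 = skip-gram
)

vector = model.wv["比赛"]      # 100-dimensional vector for one word
model.save("word2vec.model")  # assumed file name, reused in later sketches
```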
Figure 1: The GRU_CNN model's structure (inputs x1, x2, …, xn pass through GRU units and then, via convolution kernels, feature maps, max-pooling, and a fully connected layer, to a softmax output of the text category).

Figure 2: The Word2vec model's structure (context words w(t−2), w(t−1), w(t+1), and w(t+2) are summed to predict the center word w(t)).
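To make the Figure 1 pipeline concrete, below is a minimal Keras sketch of the GRU_CNN architecture (embedding → GRU → convolution → max-pooling → fully connected Softmax layer). The layer sizes, sequence length, and class count are illustrative assumptions rather than the settings of Table 2, and the random `embedding_matrix` merely stands in for the trained Word2vec vectors.

```python
# Minimal sketch of the GRU_CNN pipeline, assuming TensorFlow/Keras.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 50000   # V: vocabulary size (assumed)
EMBED_DIM = 100      # N: word vector dimension (Section 4 selects 100)
SEQ_LEN = 600        # padded sequence length (assumed)
NUM_CLASSES = 10     # the Cnews dataset has 10 news categories

# Stand-in for the matrix of trained Word2vec vectors.
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

embedding_layer = layers.Embedding(VOCAB_SIZE, EMBED_DIM)

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    # Word2vec embedding layer: maps word ids to pretrained vectors.
    embedding_layer,
    # GRU layer: extracts contextual semantic information; the full
    # sequence is returned so the CNN can convolve over it.
    layers.GRU(128, return_sequences=True),
    # CNN layer: extracts crucial local semantic features from GRU output.
    layers.Conv1D(filters=256, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.5),
    # Fully connected Softmax layer for classification.
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Load the pretrained vectors into the built embedding layer.
embedding_layer.set_weights([embedding_matrix])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```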
    (1) LR: logistic regression, a machine learning classification method that estimates the likelihood of an outcome.
    (2) NB: Naive Bayes, a machine learning classification algorithm that leverages Bayes' theorem.
    (3) CNN: the most basic convolutional neural network for text categorization.
    (4) LSTM: the most basic long short-term memory network for text categorization.
    (5) GRU: the most basic gated recurrent unit network for text categorization.
    (6) LSTM + CNN: firstly, the LSTM model is utilized to extract the contextual semantic information from the input text, and then the CNN model is used to extract features from that semantic information. Finally, classification is completed through the Softmax layer.

4.3. Parameters. The experiment presented in this paper is based on the TensorFlow framework, and the model's parameter settings determine how well the model trains. Table 2 displays the GRU_CNN model's specific parameter settings.

4.4. Evaluation Index. The experiment employs the accuracy of text categorization as its evaluation index in order to examine the effectiveness of the model put forth in this study. The accuracy rate indicates the proportion of correctly predicted samples to the total samples, and the calculation formula is shown in

\text{accuracy} = \frac{1}{K} \sum_{i=1}^{K} \left| y_i = \hat{y}_i \right|. \qquad (5)

In formula (5), K represents the sample capacity of the sample set, y_i represents the label of the sample x_i, and ŷ_i denotes the prediction result. When y_i and ŷ_i are consistent, the value of the indicator |y_i = ŷ_i| is 1; otherwise, the value is 0.

4.5. Analysis of Experimental Results. The error and accuracy of the GRU_CNN model obtained by training on the training set are shown in Figure 6. During the experiment, the accuracy and loss values are output on the training set and validation set every 100 batches, then recorded and saved. The model is trained continuously, and the best-performing model is saved for testing on the test set.
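Read concretely, formula (5) is simply the mean of exact label matches over the sample set; a minimal sketch with toy arrays (the values are assumptions):

```python
# Minimal sketch of formula (5): accuracy = (1/K) * count of exact matches.
import numpy as np

y_true = np.array([2, 0, 1, 1, 3])  # labels y_i (toy values)
y_pred = np.array([2, 0, 1, 2, 3])  # predictions ŷ_i (toy values)

K = len(y_true)                           # sample capacity of the set
accuracy = np.sum(y_true == y_pred) / K   # |y_i = ŷ_i| is 1 on a match, else 0
print(accuracy)                           # 0.8 for this toy set
```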
Figure 6: Error and accuracy results of model training (accuracy and loss recorded every 100 steps, from step 100 to step 4500).
We can see that the model's accuracy on the training set has achieved good results.

In the experiment, the number of epochs can affect the model's performance. This study examines the classification effect under different epoch numbers through experiments. Table 3 and Figure 7 illustrate the experimental outcomes. The experimental results show that the number of epochs required to achieve the optimal effect varies across models. The GRU_CNN model reaches its highest accuracy when the epoch number is 6. The numbers of epochs at which CNN, LSTM, GRU, and LSTM + CNN achieve their best results are 3, 2, 4, and 5, respectively. Figure 7 shows that each model's performance first rises and then falls as the number of epochs increases: before the model reaches its best effect, its ability to learn features keeps improving as the number of epochs increases; after the best result, the model overfits, so its performance decreases.

During the training process, the dropout method is introduced into the model. The dropout value is also an important parameter: an appropriate value enables the model to converge better, avoids overfitting, and improves performance. Therefore, we train with different dropout values, set in this experiment to [0.2, 0.3, 0.4, 0.5, 0.6, 0.7], and select the optimal value based on the training results. Table 4 and Figure 8 illustrate the experimental outcomes. The findings show that only the LSTM model performs best when the dropout value is 0.7; the other models perform best when the dropout value is 0.5. At this value, dropout effectively prevents the model from overfitting while preserving accuracy. Therefore, the dropout value of the model in this paper is set to 0.5.

For updating parameters by gradient backpropagation in the neural network, the optimizer used in this experiment is Adam. The Adam optimization algorithm has high computational efficiency and fast convergence. To make the algorithm more effective, different learning rate values are tested. Table 5 and Figure 9 illustrate the experimental outcomes. As shown by the results, the model's accuracy is highest when Adam's learning rate is 0.001. Therefore, the learning rate of the Adam optimizer is set to 0.001.

The word vector dimension is also an important parameter. The larger the dimension, the more feature information the model can learn, but the greater the risk of overfitting; conversely, the lower the dimension, the greater the risk of underfitting. Therefore, an appropriate word vector dimension is also important for the model's training effect. This paper trains with word vectors of different dimensions. Table 6 and Figure 10 illustrate the experimental outcomes. Since all models reach their highest accuracy at dimension 100, this paper uses 100-dimensional word vectors.

Figure 11 illustrates the experimental results of all models on the Cnews dataset. To prove the performance of the GRU_CNN model, several classical models are selected for comparative experiments. Among classification models based on machine learning, logistic regression (LR) and Naive Bayes (NB) are selected; among classification models based on deep learning, this paper selects the single CNN, LSTM, and GRU models as well as the hybrid LSTM_CNN model. The experimental results compare the best effect of each model.

As can be seen from the outcomes of the experiments, the GRU_CNN model has the best classification effect and the highest accuracy on the Cnews dataset.
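Putting the tuned values from this section together (100-dimensional vectors, dropout 0.5, Adam at learning rate 0.001, and 6 epochs for GRU_CNN), a hedged training sketch might look as follows; the toy arrays, batch size, layer widths, and checkpoint file name are assumptions standing in for the preprocessed Cnews data and the Table 2 settings.

```python
# Sketch: wiring the tuned hyperparameters from Section 4.5 into training.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Toy stand-ins for the preprocessed Cnews id sequences and labels.
x_train = np.random.randint(0, 5000, size=(256, 600))
y_train = np.random.randint(0, 10, size=(256,))
x_val = np.random.randint(0, 5000, size=(64, 600))
y_val = np.random.randint(0, 10, size=(64,))

# Condensed GRU_CNN network (see the earlier architecture sketch).
model = tf.keras.Sequential([
    layers.Input(shape=(600,)),
    layers.Embedding(5000, 100),               # tuned dimension: 100 (Table 6)
    layers.GRU(128, return_sequences=True),
    layers.Conv1D(256, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.5),                       # tuned dropout: 0.5 (Table 4)
    layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # tuned LR (Table 5)
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Keep the best-performing weights for final testing, as in Section 4.5.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "gru_cnn_best.keras", monitor="val_accuracy", save_best_only=True)

model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=6, batch_size=64, callbacks=[checkpoint])  # 6 epochs best
```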
Figure 7: The accuracy curve of each model under different epoch numbers (accuracy in %, epochs 1–10; CNN, LSTM, GRU, LSTM+CNN, GRU+CNN).
Among the traditional machine learning classification algorithms, the classification accuracy of logistic regression (LR) on the Cnews dataset is superior to that of the Naive Bayes model (NB), and the classification accuracy of all deep learning models is higher than that of the traditional machine learning models, which is also why deep learning models are so popular nowadays. Among the deep learning models, the hybrid neural network models achieve greater categorization accuracy than the single deep learning models, and the CNN model has a better categorization effect than the LSTM and GRU models, as the results show. Although the classification accuracy of the GRU_CNN model and the LSTM_CNN model is not significantly different, the GRU model has fewer parameters and is cheaper to compute than the LSTM model. As a result, when everything is taken into account, the GRU_CNN model this study proposes performs better.

4.6. Ablation Experiment. In order to verify the effectiveness of using Word2vec-trained word vectors as the embedding layer, this paper first selects CNN, LSTM, and GRU as the comparative experimental models and compares them against Word2vec_CNN, Word2vec_LSTM, and Word2vec_GRU. The experimental results are shown in Table 7.

It can be seen that using Word2vec to train the word vectors used as the embedding layer effectively improves the vector representation of the input text, yielding a better training effect.
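The contrast in this ablation is between randomly initialized embeddings (the plain CNN/LSTM/GRU runs) and embeddings initialized from Word2vec (the Word2vec_* variants). A minimal sketch of the two embedding setups, with a toy corpus standing in for the segmented news text:

```python
# Sketch of the ablation's two embedding setups. Sizes and the toy corpus
# are assumptions; the real experiment uses the Word2vec model trained on Cnews.
import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec

EMBED_DIM = 100
sentences = [["体育", "比赛"], ["财经", "股票"], ["比赛", "股票"]]
w2v = Word2Vec(sentences, vector_size=EMBED_DIM, min_count=1)

vocab = ["<pad>"] + list(w2v.wv.key_to_index)   # id 0 reserved for padding
word_index = {w: i for i, w in enumerate(vocab)}
VOCAB_SIZE = len(vocab)

# Baseline setup: embeddings initialized randomly, learned from scratch.
random_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)

# Word2vec setup: copy the pretrained vectors into the embedding matrix.
matrix = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype="float32")
for word, idx in word_index.items():
    if word in w2v.wv:
        matrix[idx] = w2v.wv[word]

w2v_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)
w2v_embedding.build((None,))            # build so weights can be assigned
w2v_embedding.set_weights([matrix])
```

Training the same downstream network on top of each embedding layer then reproduces the style of comparison reported in Table 7.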
Figure 8: The accuracy curves of each model under different dropout values (accuracy in %, dropout 0.2–0.7).
Figure 9: The accuracy curves of each model under different learning rates (accuracy in %; learning rates 0.1, 0.01, 0.001, 0.0001, 0.00001).
Compared with CNN, the accuracy of Word2vec_CNN increased by 1.23%; compared with LSTM, the accuracy of Word2vec_LSTM increased by 3.32%; and compared with GRU, the accuracy of Word2vec_GRU increased by 2.33%.

Next, in order to verify the effectiveness of the Word2vec_GRU_CNN model proposed in this paper on the text classification task, the Word2vec_CNN and Word2vec_GRU models that performed relatively well in the above experiments were selected for comparison.
Figure 10: The accuracy curves of each model under different word vector dimensions (accuracy in %; dimensions 50–300).
Figure 11: Classification accuracy of each model (LR: 95.18%, NB: 92%, CNN: 96.92%, LSTM: 96.78%, GRU: 95.92%, LSTM+CNN: 97.78%, GRU+CNN: 97.86%).
The experiments were run with the same parameter settings, and the experimental results are shown in Table 8.

From the experimental results in Table 8, it can be seen that, compared with Word2vec word embedding followed by only a single-layer training network, the GRU_CNN model