Hand Written Recognition
ABSTRACT Named entity recognition (NER) is the task of categorizing named entities in a given text, and it has long suffered from a lack of labeled corpora. Deep neural networks have been successfully applied to NER tasks; however, they require large amounts of annotated data. Regardless of how much data is made available, annotation requires significant human effort, which is expensive and time-consuming. Moreover, collecting labeled data that reflect current, real-world circumstances requires exhaustive follow-up and incurs correspondingly higher costs. Current NER systems typically rely on supervised learning over hand-crafted data. The best-known dataset for NER shared tasks, released at the 2003 Conference on Natural Language Learning, is used for basic training and evaluation. Although the data are of high quality, the dataset has low coverage of timely material. In this paper, we present methods for swiftly labeling up-to-date data via distant supervision. To tackle the difficulty of annotating contemporary written texts, we generate labeled data from articles that reflect the latest issues. We evaluated the proposed methods with a bidirectional long short-term memory conditional random-field architecture using static and contextualized embedding methods. Our proposed models outperform state-of-the-art methods, with average F1-scores that are 3.09% better with weakly labeled Wikipedia data and 3.47% better with Cable News Network data. When using the NER model with Flair embedding, our method shows 1.50% and 3.26% higher F1-scores with weakly labeled Wikipedia and news data, respectively. Qualitatively, the proposed model also performs better when extracting contemporary keywords.
INDEX TERMS Computational and artificial intelligence, named entity recognition, natural language
processing, neural networks, transfer learning, weakly supervised learning.
also important. It is a well-known fact that performance improves when the training and test dataset distributions are similar [6]. In fields of natural language processing (e.g., document processing [7]–[10]), extant studies have primarily sought to overcome the differences in the domain distributions of training and test datasets. In this study, for NER, we focused not only on the domain distribution, but also on the changes in writing styles over time.

It is evident that the usage of words and styles changes over time [11]. Hence, NER is more affected by time than other natural language tasks. First, new NEs are consistently being produced, because new institutions, organizations, and sometimes locations are created and then named, and the uses and meanings of words continue to change. For example, "Apple" was just a fruit before the 1970s, but now it also refers to a prominent company. Second, the frequency of specific NEs changes with time. For example, the name "Dora" was the 51st most popular female name in the 1880s; by the 1990s, it had fallen to 972nd. If NER training data had been created in 1880, it would have included Dora as a named entity of type person, but likely not if it had been created in 1990. Therefore, when we apply NER to contemporary written texts, it is beneficial to use the latest texts as training data to ensure high performance.

Most recent studies trained and evaluated their NER models using the Conference on Natural Language Learning (CoNLL) 2003 benchmark dataset [1], which consists of Reuters news collected between August 1996 and August 1997. The distribution of the CoNLL 2003 dataset differs from contemporary written text. Thus, training with CoNLL 2003 can degrade the performance of an NER on current real-world issues. The best practice would be to annotate written text data manually and consistently update them over time as NEs emerge. However, this is infeasible, because manual annotation is costly and time-consuming. Furthermore, new text data, including news and user-generated texts, are continuously being generated.

A previous work by Kim et al. [12] proposed an NER that performed well on the latest generated texts by generating weakly labeled data and training on them. They manually constructed relations wherein both the subject and object implied an NE, using the Freebase [13] database. Two entities connected via a collected relation in Freebase were then used to automatically annotate the unlabeled data based on their co-occurrence in a sentence. Our study is based on this work, and we propose an advanced method.

The main differences between the method of Kim et al. and our method can be summarized in two points. The first is that we use Wikipedia as the knowledge source. Wikipedia is more useful than Freebase, because real-time information is updated quickly and there is a larger volume of information for the knowledge graph. Thus, it enables more accurate automatic labeling from the enlarged knowledge base. Second, our method does not require any human labeling. In the work of Kim et al., relations were collected via human labeling to obtain pairs of potential NEs. Alternatively, we devise a method that does not require this process. These changes allow the proposed method to be adopted more flexibly for newly generated texts.

Using weakly supervised data is advantageous in that human labeling is not required as long as there is a knowledge source and plenty of unlabeled data. However, this approach suffers from noisy labels that come from automatic alignment and incomplete knowledge. Therefore, we adopt a transfer-learning approach to utilize the weakly supervised data. Transfer learning is a method that transfers information from a source task or dataset to solve the data-shortage problem of a target task. Kim et al. [12] showed that transfer learning could be used to reduce noise while providing abundant information from weakly labeled data for the target task.

In our experiments, we generate weakly labeled data using unlabeled texts from two domains: Wikipedia and Cable News Network (CNN) news articles. The generated weakly labeled data are used to train a bidirectional (BI) long short-term memory (LSTM) conditional random-field (CRF) NER model using static and contextualized embedding methods. The proposed NER is evaluated on the CoNLL 2003 benchmark dataset and on the latest Wikipedia and CNN news texts. Experimental results show that our method is useful and flexible for recognizing NEs in up-to-date texts.

The remainder of this paper is organized as follows. Section II deals with related work on distant supervision and transfer learning. Section III describes the proposed method for constructing the NER, which is designed to perform well on contemporary written texts. The experimental settings and results are discussed in Section IV. Finally, in Section V, we present our conclusion and outline future work.

II. RELATED WORK
Traditional NER approaches were formulated as sequence-labeling tasks using an inside–outside–beginning tagging scheme. In early machine-learning studies, support vector machines [14], hidden Markov models [15], maximum entropy models [16], and CRFs [17], [18] required many handcrafted features, including part-of-speech tags, dependency parses, and external lexical resources known as gazetteers (e.g., WordNet [19]). The performance of these models was heavily dependent on hand-engineered features.

In recent years, BI-LSTM-CRF, which is based on a deep neural network (DNN), has exhibited outstanding performance. DNN-based models have been applied to NER without requiring handcrafted features [20]–[22]. LSTM is the most typically used architecture for sequence tagging. Huang et al. [3] proposed a BI-LSTM-CRF architecture that passes information in both directions and uses a CRF layer atop the BI-LSTM. BI-LSTM-CRF showed a remarkable improvement on the NER task, and it was the basis for many contemporary studies. Another modified RNN-based model is the stacked LSTM [23], which learns character-level features by concatenating the vector representations of a BI-LSTM over the characters of the input word. Chiu and Nichols [5]
and Ma and Hovy [24] applied a CNN architecture to the BI-LSTM-based NER model to extract more features from input words and characters.

Distant supervision [25], proposed by Mintz et al., is a type of weak supervision used to automatically obtain labeled data. In their work, textual features were extracted to train a relation classifier by exploiting the large Freebase knowledge base under the following assumption: if two entities are related in Freebase and a sentence contains both entities, the sentence is likely to express that relationship. The sentence can then be used as a training instance or for feature design. Our proposed method begins by applying similar assumptions; details are presented in Section III-A.

Many prior NER studies proposed methods for generating weakly supervised data based on NE dictionary matching. Yang et al. [26] constructed weakly supervised data by automatically mapping an entity dictionary containing entities extracted from a small existing NER dataset. Shang et al. [27] proposed neural models that used distant supervision from a manually constructed dictionary. However, simply matching terms from an NE dictionary against raw sentences generates significant amounts of noise. Recalling the "Apple" example, Disney's "Snow White" story can quickly mislabel "apple" as the global corporation. Therefore, we need a method for generating weakly labeled data that reflects the relationship between potential NEs and the current knowledge source in context, similar to the method of Mintz et al. [25].

Transfer learning enables the domains used for training and testing to be different. Studies on transfer learning have succeeded because the method allows users to apply previously learned knowledge to solve new problems faster, resulting in better solutions. The fundamental motivation for transfer learning in machine learning was first discussed at the NIPS-95 workshop "Learning to Learn" [28], which focused on the need for lifelong machine-learning methods that retain and reuse previously learned knowledge [6]. Since then, transfer learning has been used widely in image processing.

There are mainly two types of transfer learning: shallow and deep. Shallow transfer-learning methods connect the source and target domains without using target labels by learning an invariant representation or estimating instance importance [29]–[31]. Deep transfer-learning methods incorporate domain adaptation into the deep-learning pipeline to learn more transferable representations by leveraging deep networks, which can simultaneously disentangle the explanatory factors of variation behind the data and match the marginal distributions across domains [32]–[36].

Typical transfer-learning NLP methods incorporate pre-training and fine-tuning through deep transfer learning. During pre-training, the model learns from the training dataset. Then, the information is transferred to the target task. Fine-tuning is the process of tuning the parameters of the model to improve performance on the target task. Modern machine-learning models, especially DNNs, benefit significantly from transfer learning [37]–[39].

III. PROPOSED METHOD
Figure 1 presents an overview of our proposed method, which consists of two parts: weakly labeled data generation and model training via transfer learning. Weakly labeled data are generated from unlabeled data and knowledge gathered from Wikipedia by applying a distant supervision method. The training is based on transfer learning, which involves pre-training and fine-tuning.

A. WEAKLY LABELED DATA GENERATION BASED ON DISTANT SUPERVISION
Our proposed method first carries out automatic weak labeling of training data based on unlabeled data and a knowledge source. For weak labeling, we use the link and category information of Wikipedia as the knowledge source and automatically annotate the unlabeled data. We describe the characteristics of Wikipedia in the next subsection (Section III-A1) and present the gathering of potential NEs and a method for automatic labeling and constructing the weakly labeled data in Section III-A2.

1) CHARACTERISTICS OF WIKIPEDIA AS A KNOWLEDGE SOURCE
We use Wikipedia as the knowledge source for automatic labeling in the proposed distant supervision approach. Wikipedia is a freely available encyclopedia that hosts accessible and editable content. There are currently more than six million articles in the English version. An article is identified using a universal resource identifier (URI) to discriminate between
Figure 4 shows an example Wikipedia article, "Alice's Adventures in Wonderland," in which four entities carry hyperlinks: "novel," "Lewis Carroll," "Alice," and "anthropomorphic." "Novel" belongs to only one category, "History of literature in the United Kingdom," whose highest-level category is "Culture." Although the entities "Alice" and "Lewis Carroll" have subcategories such as "Teenage characters in film" and "British surrealist writers" (both subcategories of "People"), we cannot add < ("Alice," "Person") & ("novel," "Culture") >. However, < ("Alice," "Person") & ("Lewis Carroll," "Person") > can be added to the potential NE pairs that are used for weak labeling.
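As an illustration of this pair-gathering step, the following Python sketch shows one way potential NE pairs could be collected from an article's hyperlinks. The pre-resolved top-level categories, the CATEGORY_TO_NE mapping, and the potential_ne_pairs helper are assumptions for illustration, not the exact implementation used in this work.

```python
from itertools import combinations

# Assumed mapping from top-level Wikipedia categories to NE types; the
# PER/LOC/ORG tag set mirrors the labels used elsewhere in the paper.
CATEGORY_TO_NE = {"People": "PER", "Geography": "LOC", "Organizations": "ORG"}

def potential_ne_pairs(article_links):
    """Build potential NE pairs from one article's hyperlinked entities.

    `article_links` maps each hyperlinked entity to the top-level Wikipedia
    category it resolves to (a hypothetical, pre-computed structure), e.g.
        {"Alice": "People", "Lewis Carroll": "People", "novel": "Culture"}
    """
    typed = {e: CATEGORY_TO_NE[c] for e, c in article_links.items()
             if c in CATEGORY_TO_NE}          # "novel" (Culture) is dropped
    # Only pairs in which *both* entities received an NE type are kept,
    # e.g. <("Alice", PER), ("Lewis Carroll", PER)>.
    return [((e1, t1), (e2, t2))
            for (e1, t1), (e2, t2) in combinations(typed.items(), 2)]

# Example from the "Alice's Adventures in Wonderland" article:
pairs = potential_ne_pairs({"Alice": "People", "Lewis Carroll": "People",
                            "novel": "Culture"})
# -> [(("Alice", "PER"), ("Lewis Carroll", "PER"))]
```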
3) WEAK-LABELING STRATEGY
To obtain weakly labeled sentences, we automatically annotate using the following heuristic rule: if the two entities of a potential NE pair appear together in one unlabeled sentence, we automatically annotate them in that sentence. Among the automatically labeled results, we keep only positive samples as weakly labeled data. The aim is to transfer more specific information through the pre-training step to the target task. Therefore, negative samples are not used, because these samples do not contain NEs. Additionally, this approach reduces training costs.
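A minimal sketch of this heuristic, assuming whitespace-tokenized sentences and the inside–outside–beginning (BIO) convention mentioned in Section II, could look as follows. The helper names and the substring-based matching are illustrative assumptions rather than the exact pipeline used here.

```python
def tag_span(tokens, tags, entity, ne_type):
    """Mark every occurrence of `entity` in `tokens` with B-/I- tags; return True if found."""
    ent_tokens = entity.split()
    found = False
    for i in range(len(tokens) - len(ent_tokens) + 1):
        if tokens[i:i + len(ent_tokens)] == ent_tokens:
            tags[i] = f"B-{ne_type}"
            for j in range(i + 1, i + len(ent_tokens)):
                tags[j] = f"I-{ne_type}"
            found = True
    return found

def weak_label(sentences, potential_pairs):
    """Annotate sentences in which both entities of a potential NE pair co-occur.

    Sentences where no pair matches (negative samples) are discarded, as in
    the weak-labeling strategy described above.
    """
    labeled = []
    for sent in sentences:
        tokens = sent.split()
        tags = ["O"] * len(tokens)
        positive = False
        for (e1, t1), (e2, t2) in potential_pairs:
            if e1 in sent and e2 in sent:      # both entities appear together
                hit1 = tag_span(tokens, tags, e1, t1)
                hit2 = tag_span(tokens, tags, e2, t2)
                positive |= hit1 and hit2
        if positive:
            labeled.append(list(zip(tokens, tags)))
    return labeled

# Usage with the pair gathered earlier (hypothetical sentence):
weak = weak_label(["Lewis Carroll wrote about Alice ."],
                  [(("Alice", "PER"), ("Lewis Carroll", "PER"))])
# -> [[("Lewis","B-PER"), ("Carroll","I-PER"), ("wrote","O"),
#      ("about","O"), ("Alice","B-PER"), (".","O")]]
```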
B. LEARNING METHOD USING WEAKLY LABELED DATA
In our transfer-learning approach, we use weakly labeled data built from the latest generated texts for pre-training and manually labeled data consisting of outdated texts for fine-tuning. With general transfer learning, the knowledge gained in the source domain is used to pre-train the model, and data related to the target domain are used for fine-tuning, because pre-training is an auxiliary task for the target task. In contrast, we use target-related data that contain contemporary written text for pre-training, and we then use outdated data for fine-tuning. Considering that our purpose is to obtain an NER that works well on contemporary written texts, this approach differs from the original transfer learning in that we utilize data related to the target domain for the auxiliary training.

The reason for this learning approach is that the weakly labeled data contain noise; using them for fine-tuning would increase the chance of the noise being learned. Therefore, we intend for the parameters learned from the noise of the weakly labeled data to be readjusted in the right direction for NER via the manually labeled data. Because we assume that only outdated data are available for manual labeling, we use them for fine-tuning. This approach enables our model to utilize the abundant information of the weakly labeled data while correcting errors from the weak labels using the manually labeled data.
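The two-stage schedule can be summarized by the following sketch. The function name, the dataset objects, the training callback, and the epoch counts are hypothetical placeholders; the mini-batch sizes (128 for pre-training, 32 for fine-tuning) are the values reported later in Section IV-B.

```python
def pretrain_then_finetune(model, weak_data, clean_data, train_one_epoch,
                           pretrain_epochs=5, finetune_epochs=50):
    """Two-stage schedule: pre-train on weakly labeled data, then fine-tune.

    `model`, the two datasets, `train_one_epoch(model, data, batch_size)`,
    and the epoch counts are hypothetical placeholders; only the batch sizes
    follow the paper (Section IV-B).
    """
    # Stage 1: noisy but up-to-date supervision (weakly labeled Wikipedia/CNN news).
    for _ in range(pretrain_epochs):
        train_one_epoch(model, weak_data, batch_size=128)
    # Stage 2: clean but outdated supervision (CoNLL 2003) readjusts parameters
    # that were pulled in the wrong direction by label noise.
    for _ in range(finetune_epochs):
        train_one_epoch(model, clean_data, batch_size=32)
    return model
```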
C. NER MODELS
The proposed method is verified using NER models based on BI-LSTM-CRF with static and contextualized embeddings. The BI-LSTM-CRF model is the most widely used model for sequence-labeling problems, because it is specialized for capturing sequential information between input tokens. We adopt the model proposed by Lample et al. [4], which composes a BI-LSTM-CRF model with additional LSTM-based character-level embeddings. In the embedding layer, the input words are mapped to their vector representations, which are initialized with Glove [41]. The embeddings of the input tokens are fed into the LSTM network, which extracts the features. To predict the label sequence, the extracted features are passed to the CRF layer, which predicts the label of the current input token by considering the information of the surrounding labels.
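A compressed PyTorch sketch of such an encoder is given below, assuming the dimensions reported in Section IV-B. For brevity it stops at the per-token emission scores that a CRF layer would consume, and it is not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Word + character-level BI-LSTM encoder producing per-token emission scores.

    Dimensions follow Section IV-B (300-d word LSTM states, 100-d character
    LSTM states, dropout 0.5); word embeddings would be initialized from
    pre-trained Glove vectors in practice. A CRF layer (not shown) would take
    the emission scores and predict the label sequence jointly.
    """

    def __init__(self, vocab_size, char_vocab_size, num_tags,
                 word_dim=300, char_dim=25, char_hidden=100, word_hidden=300):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)   # init from Glove in practice
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden, bidirectional=True,
                                 batch_first=True)
        self.word_lstm = nn.LSTM(word_dim + 2 * char_hidden, word_hidden,
                                 bidirectional=True, batch_first=True)
        self.dropout = nn.Dropout(0.5)
        self.emissions = nn.Linear(2 * word_hidden, num_tags)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_word_len)
        batch, seq_len, max_len = char_ids.shape
        chars = self.char_emb(char_ids.view(batch * seq_len, max_len))
        _, (h, _) = self.char_lstm(chars)                     # final fwd/bwd states
        char_repr = torch.cat([h[0], h[1]], dim=-1).view(batch, seq_len, -1)
        tokens = torch.cat([self.word_emb(word_ids), char_repr], dim=-1)
        features, _ = self.word_lstm(self.dropout(tokens))
        return self.emissions(features)                       # scores for the CRF layer
```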
Glove is a static word embedding, so every occurrence of a word is mapped to the same embedding vector. In addition, we utilize the contextualized embedding method of Flair [42] to show the effectiveness of our method across different models. Contextualized embedding methods generate different embedding vectors for the same word according to its surrounding words (i.e., its context). Flair embedding provides state-of-the-art performance on the CoNLL 2003 dataset by exploiting the contextualized information between input tokens. This model comprises a contextual embedding layer with the BI-LSTM-CRF model on top to predict the NEs.

IV. EXPERIMENTS
This section presents the detailed experimental settings and results with analysis.

A. EXPERIMENTAL SETTINGS
The statistics of the manually labeled, unlabeled, and weakly labeled data are presented in Table 1.

TABLE 1. The statistics of the used datasets.

• Manually labeled data: We utilized manually labeled data for training and evaluation. The training data comprised 14,987 sentences and were derived from the CoNLL 2003 dataset [1]. This dataset contains one training file and two testing files. Among the three, we used the training data to train the baseline model and to fine-tune our proposed model. To verify our proposed method, we used three types of datasets for evaluation: CoNLL, Wiki, and CNNnews. CoNLL covers four types of NE tags: "persons" (PER), "organizations" (ORG), "locations" (LOC), and "miscellaneous names" (MISC). The Wiki set comprises randomly selected sentences from Wikipedia articles in a January 2018 dump, and it contains three NE labels: PER, ORG, and LOC. The CNNnews dataset was manually labeled using news articles from the CNN website, and it covers the same NE labels as the Wiki test dataset. Note that the CoNLL test set is generated from outdated texts, whereas both Wiki and CNNnews are based on contemporary written texts.
• Unlabeled data: Two types of unlabeled datasets were used for automatic labeling. As before, the first comprised Wikipedia data dumped in January 2018. The second comprised CNN news articles crawled from August to September 2018.
• Weakly labeled data: The weakly labeled data were generated using the proposed method described in Section III and were used to pre-train the NER model. Note that we took only positive samples; thus, the number of weakly labeled sentences was smaller than the number of unlabeled sentences.

B. EXPERIMENTAL DETAILS
When adopting the BI-LSTM-CRF model, we used an LSTM cell [43] with 300-dimensional vectors for the forward/backward hidden states and a dropout value of 0.5. It was trained using the Adam optimizer [44] with a learning rate of 0.001 and an epsilon of 1e-08. As described in Section III-C, we added character-level representations using another BI-LSTM, whose forward and backward hidden-state dimensions were both 100. The hyperparameters for pre-training and fine-tuning were the same, except for a mini-batch size of 128 for pre-training and 32 for fine-tuning. We used pre-trained Glove word embeddings trained on a Common Crawl corpus (840B tokens, 2.2M vocabulary, cased, 300-dimensional vectors).

To train the Flair-based NER model, we adopted most of the hyperparameters from the study of Akbik et al. [42], including the LSTM hidden sizes, the dropouts, and all values for the character-level language model and the sequence-tagging model. This NER model was optimized using stochastic gradient descent with a learning rate of 0.1. The mini-batch size was the same as that of our BI-LSTM-CRF model.
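As a rough illustration, a Flair-based tagger with these settings could be set up as follows, assuming the flair library's API around the time of Akbik et al. [42] (it may differ in later versions). The corpus paths, column format, embedding names, hidden size, and epoch count are assumptions; only the SGD learning rate of 0.1 and mini-batch size of 32 follow the values above.

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# CoNLL-style column files (token, NER tag); folder and file names are hypothetical.
corpus = ColumnCorpus("data/weak_wiki", {0: "text", 1: "ner"},
                      train_file="train.txt", dev_file="dev.txt", test_file="test.txt")

embeddings = StackedEmbeddings([
    WordEmbeddings("glove"),                  # static word embeddings
    FlairEmbeddings("news-forward"),          # contextual character-level LM
    FlairEmbeddings("news-backward"),
])

tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=corpus.make_tag_dictionary("ner"),
                        tag_type="ner", use_crf=True)

# SGD with learning rate 0.1 and mini-batch size 32, as in Section IV-B.
ModelTrainer(tagger, corpus).train("models/flair_ner",
                                   learning_rate=0.1, mini_batch_size=32,
                                   max_epochs=100)
```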
C. EXPERIMENTAL RESULTS AND DISCUSSION
We conducted various experiments to demonstrate the impact of our proposed method, as described below.

1) EFFECTIVENESS OF WEAKLY LABELED DATA FOR TRAINING
The performance changes gained by adopting the proposed method for the BI-LSTM-CRF models using Glove and Flair are presented in Table 2. The hyperparameters for the baseline models were the same as the values described in Section IV-B. We marked the highest performance for each test dataset in bold. The results of the baseline models, which used only the manually labeled outdated data, are shown in rows 1-3 and 10-12. Additionally, we present the performance when using weakly labeled data from two different domains: Wikipedia (rows 4-6 and 13-15) and CNN news (rows 7-9 and 16-18).

TABLE 2. The performance of NER using CNN news articles for unlabeled data.

The results of the models trained using the proposed weakly labeled datasets, Wikipedia and CNN news, via transfer learning were better than those of the model trained with only manually labeled data. When Glove embedding was used, pre-training with weakly labeled Wikipedia data showed performance improvements of 0.18, 3.84, and 5.25 points on the CoNLL, Wiki, and CNNnews datasets, respectively. When weakly labeled data of the CNNnews domain were used, performance improvements of 0.38, 3.77, and 6.28 points were obtained. For the CoNLL and CNNnews test data, using weakly labeled data of the CNNnews domain was more helpful in improving performance. Experiments using Flair embedding showed similar results. Our proposed method improved performance on all three datasets.

Interestingly, the performance improved even on the outdated CoNLL test dataset. Although the improvements were not large, we can see that our method provided additional information to the NER. The results on the Wiki and CNNnews test datasets experimentally demonstrate that our method works well on contemporary written text. Additionally, we verified the performance improvement in both NER models, using static (e.g., Glove) and contextualized (e.g., Flair) embeddings. These results imply that the proposed method can be effective even when applied to various other models.

2) RELATIONSHIP BETWEEN DOMAINS OF UNLABELED AND TARGET DATA
We found that the relationship between the domain of the unlabeled data used for weak labeling and that of the test data affected performance. As expected, the performance improvement was greater when the domain of the unlabeled data was the same as that of the test data. When using the weakly labeled CNNnews dataset for pre-training, the BI-LSTM-CRF model performed 6.28 points better on that test set, which was 1.03 points higher than when using weakly labeled Wiki data for pre-training. In experiments using the Flair embedding, the performance change was even more pronounced: the difference in the domains of the weakly labeled data produced a performance difference of 2.63 points on the CNNnews test set. Using weakly labeled Wiki data also showed improved performance on the Wiki test set. Therefore, we empirically showed that it is better to utilize unlabeled data from the same domain as the target.

3) PERFORMANCE CHANGES BY WEAK LABELING
To verify the effects of our proposed method for generating weakly labeled data, we compared it with the previous work of Kim et al. [12] using the same BI-LSTM-CRF model and the Wikipedia unlabeled data. The results on the three test sets are summarized in Table 3. The main distinction between the work of Kim et al. and this work is the method used to generate the weakly labeled data. The previous work proposed an automatic labeling method that constructs a set of relevant relations in Freebase and then matches the candidates against plain texts; that work relied on manual labeling to construct the relevant relations. We used Wikipedia links and category information to generate weakly labeled data without the need for human labeling.

TABLE 3. The comparison of the existing method and the proposed method.

From the experimental results, our proposed method outperformed the work of Kim et al. on all test datasets. Additionally, our method improved performance on the CoNLL 2003 dataset, whereas the method of Kim et al. did not. We conclude that this result derives from the growth in the quantity and quality of the information contained in the weakly labeled data. That is, our proposed weak-labeling method is better and more useful for the target data. Additionally, our weak-labeling method improves over the method of Kim et al. in that ours does not require manual labeling at all.
4) PERFORMANCE DIFFERENCES FOR EACH NE TAG
Table 4 presents the NER performance by NE tag type, evaluated on the CoNLL 2003 dataset. The baseline models were trained with only the CoNLL 2003 training data, and the proposed models were pre-trained using weakly labeled CNNnews data and fine-tuned with the CoNLL 2003 training data. Note that the NE tags of the CoNLL 2003 dataset include PER, LOC, ORG, and MISC, whereas the weakly labeled data contain PER, LOC, and ORG (no MISC). As a result, performance increased for the three NE tags PER, LOC, and ORG, while the MISC tag performance decreased slightly.

TABLE 4. The comparison of the existing method and the proposed method by NE tag types.
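Entity-level, per-tag scores of the kind reported in Table 4 are commonly computed with a chunk-based scorer. The following sketch uses the seqeval package (an assumption, since the paper does not name its scoring tool) on toy BIO sequences.

```python
from seqeval.metrics import classification_report, f1_score

# Gold and predicted BIO sequences for two toy sentences (hypothetical data).
y_true = [["B-PER", "I-PER", "O", "B-ORG"], ["B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O"],     ["B-LOC", "O"]]

print(f1_score(y_true, y_pred))               # micro-averaged entity-level F1
print(classification_report(y_true, y_pred))  # per-tag (PER/LOC/ORG/...) breakdown
```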
We attribute the performance increase on the three tags to the NE information acquired from the weakly labeled data being effectively transferred through pre-training. More specifically, the proposed pre-training process provided more parameter-tuning opportunities for the three NE tags, and that tuning helped the model recognize NEs. If the information transferred from the weakly labeled data were inaccurate, performance might have declined; since it improved, we can see that the data contain useful information for recognizing NEs and that transfer learning is a useful way to reflect that information.

In this experiment, we confirmed that, even when the distributions of the training and test data were both those of outdated texts, the information in the weakly labeled data was transferred, thereby improving performance. However, the purpose of our study was not to improve performance on test data having the same distribution as the training data, but to work well on contemporary written texts. Therefore, we next present a qualitative evaluation to examine the effect of the proposed model.

5) QUALITATIVE ANALYSIS OF EFFECTS ON CONTEMPORARY WRITTEN TEXTS
Based on the results of our proposed method, we examined how the model adaptively learned from contemporary written texts. The CoNLL 2003 dataset is widely used as training data, but a model trained with it alone cannot broadly cope with neologisms. Because our weakly labeled data provide up-to-date text for the model to learn from, the model adapted easily to newly used expressions. For instance, "Project Jupyter" is a software organization founded in 2014. The baseline model could not extract it as an organization, whereas the proposed model predicted it to be an organization. "Parasite" was a movie released in 2019, premiering at the 2019 Cannes Film Festival. Our proposed model predicted
"Song Kang-ho," its main actor, to be a person, whereas the baseline model could not. Similarly, the keyword "COVID-19" has been prevalent this year, and our model predicted it as "miscellaneous," because an appropriate category does not yet exist.

V. CONCLUSION
In this paper, we proposed an NER method that works efficiently on contemporary written texts. Our main contribution is that we improved NER performance by automatically generating weakly labeled training data without the use of manual labeling. Distant supervision was applied to generate the weakly labeled data. With our distant supervision approach, we first collected potential NE pairs using links and category information from Wikipedia. We then automatically labeled unlabeled sentences in which two potential NEs appeared together. The weakly supervised data were used for pre-training. For fine-tuning, we used the CoNLL 2003 dataset, which is a manually labeled dataset.

We evaluated the proposed model on contemporary written texts using datasets collected from Wikipedia, CNN news, and CoNLL. The experimental results indicated that our proposed method improved NER performance on all datasets. Furthermore, we showed that our method outperformed the model that utilized other weakly labeled data. We further verified the effectiveness of our method through an example analysis of up-to-date text content. In future work, we plan to develop a method for excluding noisy examples from the weakly labeled training data.

REFERENCES
[1] E. F. Sang and F. De Meulder, "Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition," 2003, arXiv:cs/0306050. [Online]. Available: https://arxiv.org/abs/cs/0306050
[2] R. Weischedel, S. Pradhan, L. Ramshaw, M. Palmer, N. Xue, M. Marcus, A. Taylor, C. Greenberg, E. Hovy, R. Belvin, and A. Houston, "OntoNotes release 4.0," Linguistic Data Consortium, Philadelphia, PA, USA, Tech. Rep. LDC2011T03, 2011.
[3] Z. Huang, W. Xu, and K. Yu, "Bidirectional LSTM-CRF models for sequence tagging," 2015, arXiv:1508.01991. [Online]. Available: http://arxiv.org/abs/1508.01991
[4] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, "Neural architectures for named entity recognition," 2016, arXiv:1603.01360. [Online]. Available: http://arxiv.org/abs/1603.01360
[5] J. P. C. Chiu and E. Nichols, "Named entity recognition with bidirectional LSTM-CNNs," Trans. Assoc. Comput. Linguistics, vol. 4, pp. 357–370, Jul. 2016. [Online]. Available: https://transacl.org/ojs/index.php/tacl/article/view/792
[6] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010, doi: 10.1109/TKDE.2009.191.
[7] J. Blitzer, M. Dredze, and F. Pereira, "Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification," in Proc. 45th Annu. Meeting Assoc. Comput. Linguistics (ACL), Prague, Czech Republic, Jun. 2007, pp. 1–8. [Online]. Available: https://www.aclweb.org/anthology/P07-1056/
[8] B. Myagmar, J. Li, and S. Kimura, "Cross-domain sentiment classification with bidirectional contextualized transformer language models," IEEE Access, vol. 7, pp. 163219–163230, 2019, doi: 10.1109/ACCESS.2019.2952360.
[9] C. Pan, J. Huang, J. Gong, and X. Yuan, "Few-shot transfer learning for text classification with lightweight word embedding based models," IEEE Access, vol. 7, pp. 53296–53304, 2019, doi: 10.1109/ACCESS.2019.2911850.
[10] X. Ma, P. Xu, Z. Wang, R. Nallapati, and B. Xiang, "Domain adaptation with BERT-based domain classification and data selection," in Proc. 2nd Workshop Deep Learn. Approaches Low-Resource NLP (DeepLo@EMNLP-IJCNLP), Hong Kong, Nov. 2019, pp. 76–83. [Online]. Available: https://doi.org/10.18653/v1/D19-6109
[11] J. Aitchison, Language Change: Progress or Decay? Cambridge, U.K.: Cambridge Univ. Press, 2001.
[12] J. Kim, S. Kang, Y. Park, and J. Seo, "Transfer learning from automatically annotated data for recognizing named entities in recent generated texts," in Proc. IEEE Int. Conf. Big Data Smart Comput. (BigComp), Feb. 2019, pp. 1–5.
[13] K. D. Bollacker, R. P. Cook, and P. Tufts, "Freebase: A shared database of structured general human knowledge," in Proc. 22nd AAAI Conf. Artif. Intell., Vancouver, BC, Canada, Jul. 2007, pp. 1962–1963. [Online]. Available: http://www.aaai.org/Library/AAAI/2007/aaai07-355.php
[14] H. Isozaki and H. Kazawa, "Efficient support vector classifiers for named entity recognition," in Proc. 19th Int. Conf. Comput. Linguistics (COLING), 2002, pp. 1–7. [Online]. Available: https://www.aclweb.org/anthology/C02-1054
[15] G. Zhou and J. Su, "Named entity recognition using an HMM-based chunk tagger," in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, Philadelphia, PA, USA, Jul. 2002, pp. 473–480. [Online]. Available: https://www.aclweb.org/anthology/P02-1060
[16] O. Bender, F. J. Och, and H. Ney, "Maximum entropy models for named entity recognition," in Proc. 7th Conf. Natural Lang. Learn. (HLT-NAACL), 2003, pp. 148–151. [Online]. Available: https://www.aclweb.org/anthology/W03-0420
[17] R. Klinger, "Automatically selected skip edges in conditional random fields for named entity recognition," in Proc. Int. Conf. Recent Adv. Natural Lang. Process., Hissar, Bulgaria, Sep. 2011, pp. 580–585. [Online]. Available: https://www.aclweb.org/anthology/R11-1082
[18] M. Marcińczuk, "Automatic construction of complex features in conditional random fields for named entities recognition," in Proc. Int. Conf. Recent Adv. Natural Lang. Process., Hissar, Bulgaria, Sep. 2015, pp. 413–419. [Online]. Available: https://www.aclweb.org/anthology/R15-1054
[19] I. Feinerer and K. Hornik, WordNet: WordNet Interface, R package version 0.1-15, 2020. [Online]. Available: https://CRAN.R-project.org/package=wordnet
[20] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," J. Mach. Learn. Res., vol. 12, pp. 2493–2537, Aug. 2011.
[21] H. Wei, M. Gao, A. Zhou, F. Chen, W. Qu, C. Wang, and M. Lu, "Named entity recognition from biomedical texts using a fusion attention-based BiLSTM-CRF," IEEE Access, vol. 7, pp. 73627–73636, 2019, doi: 10.1109/ACCESS.2019.2920734.
[22] D. Zhang, C. Xia, C. Xu, Q. Jia, S. Yang, X. Luo, and Y. Xie, "Improving distantly-supervised named entity recognition for traditional Chinese medicine text via a novel back-labeling approach," IEEE Access, vol. 8, pp. 145413–145421, 2020, doi: 10.1109/ACCESS.2020.3015056.
[23] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, "Neural architectures for named entity recognition," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol. (NAACL-HLT), San Diego, CA, USA, Jun. 2016, pp. 260–270. [Online]. Available: https://www.aclweb.org/anthology/N16-1030
[24] X. Ma and E. Hovy, "End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, vol. 1, Berlin, Germany, Aug. 2016, pp. 1064–1074. [Online]. Available: https://www.aclweb.org/anthology/P16-1101
[25] M. Mintz, S. Bills, R. Snow, and D. Jurafsky, "Distant supervision for relation extraction without labeled data," in Proc. Joint Conf. 47th Annu. Meeting ACL 4th Int. Joint Conf. Natural Lang. Process. (AFNLP), Singapore, Aug. 2009, pp. 1003–1011. [Online]. Available: https://www.aclweb.org/anthology/P09-1113/
[26] Y. Yang, W. Chen, Z. Li, Z. He, and M. Zhang, "Distantly supervised NER with partial annotation learning and reinforcement learning," in Proc. 27th Int. Conf. Comput. Linguistics (COLING), Santa Fe, NM, USA, Aug. 2018, pp. 2159–2169. [Online]. Available: https://www.aclweb.org/anthology/C18-1183/
[27] J. Shang, L. Liu, X. Gu, X. Ren, T. Ren, and J. Han, "Learning named entity tagger using domain-specific dictionary," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), Brussels, Belgium, Oct./Nov. 2018, pp. 2054–2064. [Online]. Available: https://doi.org/10.18653/v1/d18-1230
[28] S. Thrun and L. Y. Pratt, "Learning to learn: Introduction and overview," in Learning to Learn, S. Thrun and L. Y. Pratt, Eds. Boston, MA, USA: Springer, 1998, pp. 3–17, doi: 10.1007/978-1-4615-5529-2_1.
[29] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proc. 20th Annu. Conf. Neural Inf. Process. Syst. (NIPS), Vancouver, BC, Canada, Dec. 2006, pp. 601–608. [Online]. Available: http://papers.nips.cc/paper/3075-correcting-sample-selection-bias-by-unlabeled-data
[30] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 199–210, Feb. 2011, doi: 10.1109/TNN.2010.2091281.
[31] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Columbus, OH, USA, Jun. 2014, pp. 1717–1724, doi: 10.1109/CVPR.2014.222.
[32] E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell, "Deep domain confusion: Maximizing for domain invariance," 2014, arXiv:1412.3474. [Online]. Available: http://arxiv.org/abs/1412.3474
[33] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, "Simultaneous deep transfer across domains and tasks," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, Dec. 2015, pp. 4068–4076, doi: 10.1109/ICCV.2015.463.
[34] M. Long, H. Zhu, J. Wang, and M. I. Jordan, "Unsupervised domain adaptation with residual transfer networks," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Barcelona, Spain, Dec. 2016, pp. 136–144. [Online]. Available: http://papers.nips.cc/paper/6110-unsupervised-domain-adaptation-with-residual-transfer-networks
[35] Y. Ganin and V. S. Lempitsky, "Unsupervised domain adaptation by backpropagation," in Proc. 32nd Int. Conf. Mach. Learn. (ICML), vol. 37, Lille, France, Jul. 2015, pp. 1180–1189. [Online]. Available: http://proceedings.mlr.press/v37/ganin15.html
[36] N. Rupasinghe, A. S. Ibrahim, and I. Guvenc, "Optimum hovering locations with angular domain user separation for cooperative UAV networks," in Proc. IEEE Global Commun. Conf. (GLOBECOM), Washington, DC, USA, Dec. 2016, pp. 1–6, doi: 10.1109/GLOCOM.2016.7842113.
[37] B. Zoph, D. Yuret, J. May, and K. Knight, "Transfer learning for low-resource neural machine translation," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), Austin, TX, USA, Nov. 2016, pp. 1568–1575. [Online]. Available: https://www.aclweb.org/anthology/D16-1163
[38] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol. (NAACL-HLT), vol. 1, Minneapolis, MN, USA, Jun. 2019, pp. 4171–4186. [Online]. Available: https://www.aclweb.org/anthology/N19-1423
[39] H. Salman, A. Ilyas, L. Engstrom, A. Kapoor, and A. Madry, "Do adversarially robust ImageNet models transfer better?" in Proc. Adv. Neural Inf. Process. Syst. (NIPS), Dec. 2020. [Online]. Available: https://papers.nips.cc/
[40] A. Gupta, R. Lebret, H. Harkous, and K. Aberer, "Taxonomy induction using hypernym subsequences," in Proc. ACM Conf. Inf. Knowl. Manage. (CIKM), Singapore, Nov. 2017, pp. 1329–1338, doi: 10.1145/3132847.3133041.
[41] J. Pennington, R. Socher, and C. Manning, "Glove: Global vectors for word representation," in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 1532–1543.
[42] A. Akbik, T. Bergmann, and R. Vollgraf, "Pooled contextualized embeddings for named entity recognition," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol. (NAACL-HLT), vol. 1, Minneapolis, MN, USA, Jun. 2019, pp. 724–728. [Online]. Available: https://doi.org/10.18653/v1/n19-1078
[43] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735
[44] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), San Diego, CA, USA, 2015. [Online]. Available: http://arxiv.org/abs/1412.6980

JUAE KIM was born in Seoul, South Korea, in 1993. She received the B.S., M.S., and Ph.D. degrees in computer engineering from Sogang University, Seoul, in 2015, 2017, and 2021, respectively. Since February 2021, she has been a Senior Research Engineer with the Hyundai Motor Group, AIRS Company. Her research interests include natural language processing, question-answering systems, information extraction, and machine learning (including deep learning).

YEJIN KIM received the B.S. and M.S. degrees in computer science from Sogang University, Seoul, South Korea, in 2013 and 2015, respectively. She has been a Researcher with the Artificial Intelligence Laboratory, LG Electronics, since 2015. She is currently working on data generation for sequence-tagging tasks. Her primary research interest includes natural language processing, particularly information extraction in low-resource conditions, such as with user-generated text from social-networking services.

SANGWOO KANG received the Ph.D. degree in computer science from Sogang University. He was a Research Fellow Professor with Sogang University. He has been an Assistant Professor with the School of Computing, Gachon University, since September 2016. He is currently leading the Natural Language Processing Laboratory, Gachon University. His specialty is natural language processing, and he is interested in spoken dialogue interfaces, information retrieval, text mining, opinion mining, big data, and UI/UX. His recent research interest includes applying deep-learning techniques.

JUNGYUN SEO received the B.S. degree in mathematics and the M.S. and Ph.D. degrees in computer science from The University of Texas at Austin, in 1981, 1985, and 1990, respectively. In 1991, he returned to join the Korea Advanced Institute of Science and Technology, Daejeon, as a Faculty Member, where he headed the Natural Language Processing Laboratory, Department of Computer Science. In 1995, he moved to Sogang University, Seoul, and became a Full Professor, in 2001. He served as the President of the Korea Information Science Society, in 2013. His research interests include multi-modal dialogues, statistical methods for NLP, machine translation, and information retrieval.