Cyber2 Namedentity

This paper proposes a cybersecurity named entity recognition model called RDF-CRF that uses a combination of rule-based expressions, a known-entity dictionary, and conditional random fields with four feature templates. The model aims to improve upon existing methods by leveraging different techniques to recognize entities. Experiments on a security text dataset show the proposed method achieves better performance than state-of-the-art methods.


This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.2984582, IEEE Access

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2020.DOI

Cybersecurity Named Entity Recognition using Multi-modal Ensemble Learning

FENG YI¹, BO JIANG², LU WANG², and JIANJUN WU³
¹ School of Computer Science, University of Electronic Science and Technology of China, Zhongshan Institute, Zhongshan 528402, China
² Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
³ Beijing College of Politics and Law, Beijing 100024, China
Corresponding author: Lu Wang (wanglu@iie.ac.cn)
This work was supported in part by the National Natural Science Foundation of China under Grant 61702508 and Grant 61802404, in part by the National Social Science Foundation of China under Grant 19BSH022, and in part by the National Key Research and Development Program of China under Grant 2019QY1303.

ABSTRACT Cybersecurity named entity recognition is an important part of threat information extraction from large-scale unstructured text collections in many cybersecurity applications. Most existing security entity recognition studies and systems use regular-expression matching strategies or machine learning algorithms. Given the peculiarity and complexity of security named entities, these models overlook the characteristics of security data and the correlations among entities. Therefore, through an in-depth study of security entity characteristics, we propose a novel security named entity recognition model based on regular expressions and a known-entity dictionary as well as conditional random fields (CRF) combined with four feature templates. This model is named RDF-CRF. The rule-based expressions can match security entities with good accuracy in simpler situations, the known-entity dictionary can extract common and specific security entities, and the CRF-based extractor leverages the entities identified by the rule-based and dictionary-based extractors to further improve the recognition performance. To demonstrate the effectiveness of the proposed model, extensive experiments are performed on a security text dataset collected from public security websites. The experimental results show that the proposed model can achieve better performance than state-of-the-art methods.

INDEX TERMS Cybersecurity, named entity recognition, regular expression, known-entity dictionary,
conditional random fields.

I. INTRODUCTION
Recent years have witnessed the growing importance of cybersecurity, as threats such as application attacks, malware, ransomware, phishing, and exploit kits draw more and more attention. A large amount of cybersecurity data has been published on various network platforms, such as security blogs, forums, software vendors' bulletin boards, official news, and social networks. These unstructured security texts contain high-value, up-to-date security information and events, such as software vulnerabilities [1], attack detection [2], and threat actions [3]. It has become a trend to establish a security knowledge graph with open interconnection and semantic processing capabilities, which can help security analysts retrieve and collate large-scale threat data more quickly. The basic task in establishing such a knowledge graph is information extraction. Therefore, automatically extracting security knowledge from a collection of unstructured text documents is a critical and fundamental task in the field of cybersecurity.

Named entity recognition (NER) is the most basic step of information extraction; it seeks to locate named entities in text and classify them into pre-defined categories [4]. NER systems are often used as the first step in question answering, information retrieval, co-reference resolution, topic modeling, etc. The main task is to identify named entities such as persons, locations, organizations, times, quantities, monetary values, and percentages in unstructured texts [5]–[7]. In recent years, many named entity recognition models have been proposed to help users find valuable information, in areas including recommendation systems [8], [9], question answering [10], [11], and biomedicine [12], [13]. In the domain of cybersecurity, security information extraction has attracted many research efforts from different perspectives. For example, some researchers have reported results of security entity recognition from the viewpoint of the data source, including Twitter [14], the National Vulnerability Database [15], hacker forums [16], and technical blogs [17]. On the other hand, there are

VOLUME 4, 2020 1

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.

F. Yi et al.: Cybersecurity Named Entity Recognition using Multi-modal Ensemble Learning

also a variety of efforts studying different methods for the task, which can be divided into two classes: rule-based and machine learning-based.

The rule-based methods can extract named entities with good accuracy in a simple manner when the information to be extracted follows regular patterns, such as email addresses, host IPs, and Common Vulnerabilities and Exposures (CVE) identifiers [17], [18]. However, these methods are not suitable for complex situations in which the entity to be extracted has many variations or comes from irregularly structured text, which is closer to the actual situation on the network. These methods also have difficulty identifying new named entities. Moreover, designing rule-based systems is very time-consuming and requires expert domain knowledge. Therefore, rule-based methods lead to unsatisfactory results for cybersecurity named entity identification in complex situations. Taking into consideration the good performance and simplicity of rule-based methods and the regular patterns of some security entities such as IP and CVE, in this paper we also introduce rule-based templates to extract cybersecurity named entities.

In these more complex situations, machine learning-based methods outperform rule-based ones by tuning general algorithms with existing data. They can also identify new entities from a training corpus and are suitable for widespread applications. In recent years, many approaches for security-relevant named entity recognition (NER) from unstructured text documents have been proposed from different perspectives, including conditional random fields (CRF) [19], [20], support vector machines (SVM) [16], expectation regularization [14], bootstrapping algorithms [21], the maximum entropy model (ME) [22], and long short-term memory (LSTM) [23], [24]. However, all of the above machine learning methods fail to yield satisfactory results for identifying cybersecurity-related concepts and entities from unstructured cybersecurity text collections. By analyzing these texts, we find that existing entity recognition techniques are not suitable for the task. Although named entity recognition technology has gradually matured in the general domain, it usually fails to produce satisfactory results when it is applied directly to a professional domain. For example, in the field of biomedicine, Dongliang et al. [25] show that, although the traditional method is easy to use, the assumptions it relies on do not fully reflect the actual situation of a large number of complex biological texts, so the accuracy is relatively poor. The same problem also occurs in the field of cybersecurity. This is because cybersecurity texts contain a lot of security vocabulary, such as file names, hash values, and even attack tools. In addition, these models need to manually explore a wide range of features and ignore the correlation of entities, which is not amenable to large-scale applications. The rules and dictionaries constructed in this paper, as well as the features extracted for training the model, are obtained through observation of and training on a corpus in the security field, so they are generally applicable to tasks in this field. The experimental results of this article suggest that, with the expansion and improvement of the corpus in follow-up work, the accuracy of the recognized professional vocabulary will also improve significantly.

In this paper, we propose a novel security entity recognition model, named RDF-CRF, based on conditional random fields combined with four feature templates and incorporating regular expressions and a known-entity dictionary for preprocessing. Specifically, the rule-based approach first extracts named entities with good accuracy in simpler situations, and then the dictionary-based method matches common and specific security entities. After matching by the rule-based and dictionary-based methods, the word sequence is matched more accurately to the feature templates by considering contextual information, so that the CRF-based model can further improve the recognition performance. To demonstrate the effectiveness of our proposed model, extensive experiments are performed on a security dataset collected from security websites. The experimental results show that the proposed method can achieve better performance than state-of-the-art methods.

The contributions of this paper are summarized as follows.
• We propose a novel security named entity recognition model by using a combination of regular expressions, a known-entity dictionary, and conditional random fields. In the proposed model, the entities identified by the rule-based and dictionary-based approaches further assist the CRF-based model in improving the performance of cybersecurity entity recognition.
• We also design four feature templates for unstructured security entity recognition, including atomic features, combination features, marker features, and semantic features, to filter the feature vectors of the current word for conditional random fields.
• Various experiments are conducted on a real-world cybersecurity dataset, and the results demonstrate that our proposed model can achieve better prediction performance than the state-of-the-art methods.

The remainder of this paper is organized as follows: Section II reviews related work. Section III describes the proposed model and provides an efficient optimization method for the solution. We empirically evaluate our method on a real-world dataset in Section IV, including a comparison to competing methods. We conclude the paper in Section V.

II. RELATED WORK
Studies on security named entity recognition fall into two categories: rule-based and machine learning-based approaches. Next, we briefly review these works.

A. RULE-BASED ENTITY EXTRACTION METHODS
The rule-based matching methods locate and extract information by constructing regular expressions or other heuristic rules. For example, Liao et al. [17] propose a fully automated Indicators of Compromise (IOC) [26] extraction tool, named iACE. iACE uses a set of regular expressions and common context terms extracted from iocterms to identify

the IOC tokens, such as IP and MD5 strings. Balduccini [18] designs a set of regular expressions for matching each entity contained in the file of cyber assets. However, due to the unstructured characteristics and diversity of many security entities, it is very difficult to construct rules for all these types of entity. As a result, the heuristic strategy is expensive and impractical in large-scale applications.

B. MACHINE LEARNING-BASED ENTITY EXTRACTION METHODS
The machine learning-based approaches use a training corpus to construct statistical learning models, which can realize automatic information extraction. Many efforts have been made in the task of cybersecurity named entity recognition. For instance, Lal et al. [20] utilize the conditional random fields algorithm to extract cybersecurity-related concepts and entities by using a set of features from manually annotated security texts. Joshi et al. [19] use conditional random fields to identify cybersecurity-related entities, concepts, and relations from the National Vulnerability Database and from text sources. Deliu et al. [16] extract cyber threat intelligence from hacker forums based on support vector machines and convolutional neural networks. Jones et al. [21] implement a bootstrapping algorithm for extracting security entities and their relationships from security texts. Ritter et al. [14] propose a weakly supervised seed-based approach to event extraction from Twitter. Mittal et al. [1] analyze tweets about cybersecurity and issue timely threat alerts to security analysts. Weerawardhana et al. [27] present machine learning-based and part-of-speech tagging approaches to information extraction from online vulnerability databases. Bridges et al. [22] propose a Maximum Entropy Model trained with a large security corpus and achieve high performance in the identification and classification of appropriate entities. Gasmi et al. [23] combine the advantages of Long Short-Term Memory (LSTM) and Conditional Random Field (CRF) methods to improve the accuracy of NER extraction compared with the traditional purely statistical CRF method. Furthermore, Qin et al. [24] propose a combined model of neural networks called FT-CNN-BiLSTM-CRF. When training the models, they use feature templates to extract context features, as we do, and achieve an F-score of 0.86 on their network security dataset.

In conclusion, although the above-mentioned methods work well to some extent by incorporating one or two of the three components (i.e., the rule-based method, the dictionary-based method, and the machine learning-based method), none of them integrates all the information from these three components into a unified learning framework for cybersecurity named entity recognition, which leads to unsatisfactory results. To the best of our knowledge, there is still a lack of cybersecurity named entity recognition methods that extract entities from security texts at a high level of precision.

III. THE PROPOSED MODEL
In this section, we present a novel ensemble learning approach for security entity extraction from documents. The proposed model consists of a rule-based extractor, a dictionary-based extractor, and a CRF-based extractor. The rule-based extractor is designed based on regular expressions, the dictionary-based extractor includes known-entity lists, and the CRF-based extractor leverages the entities identified by the rule-based and dictionary-based extractors to improve the recognition performance. The overall architecture of the model is illustrated in Figure 1.

A. RULE-BASED EXTRACTOR
Many entities in the cybersecurity domain follow certain patterns. Through a large number of observations of unstructured security texts, we find that a URL starts with the string http or https, an email address contains the symbol @ in the middle of a string, and a CVE identifier follows a specific naming format. Hence, these security entities can be extracted by regular expression matching. According to the naming rules of specific security entities, we design the template of regular expression rules shown in Table 1. The rule-based extractor has the properties of high precision and high recall as well as scalability.

TABLE 1: Examples of regular expressions
Entity Type   Regular Expression
Filename      [A-Za-z0-9-_\.]+\.(txt|php|exe|dll|bat|sys|htm|html|js|jar|jpg|png|vb|scr|pif|chm|zip|rar|cab|pdf|doc|docx|ppt|pptx|xls|xlsx|swf|gif)
Filepath      [a-zA-Z]:(\\([0-9a-zA-Z]+)
Email         [a-z][_a-z0-9-.]+@[a-z0-9-]+\.[a-z]+
SHA1          [a-f0-9]{40}|[A-F0-9]{40}
SHA256        [a-f0-9]{64}|[A-F0-9]{64}
CVE           CVE-[0-9]{4}-[0-9]{4,6}
URL           (https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%?=~_|]
IPv4          (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(/([0-2][0-9]|3[0-2]|[0-9]))?

B. DICTIONARY-BASED EXTRACTOR
As far as we know, many existing named entities are well-known concepts in the cybersecurity domain, including large security companies (e.g., Cisco, FireEye, and IBM), software products (e.g., operating systems, firewalls, and anti-virus software), and hacker groups (e.g., OurMine, Anonymous, and DCLeaks). Based on these observations, we also design a known-entity dictionary covering various entities. The entities can be grouped into the following categories: company, hardware, software, attack means, operating system, protocol, hacker groups, and so on.

C. CONDITIONAL RANDOM FIELDS-BASED EXTRACTOR
The CRF model can further extract undiscovered entities on the basis of the entities identified by the rule-based extractor and

[Figure 1 diagram. Model training: 1. POS tagging; 2. rule and dictionary matching; 3. semi-manual addition of BMESO tags; 4. feature template selection; 5. conversion of corpus form; 6. parameter training, which yields the CRF model through iterative training of $P(I \mid O) = \frac{1}{Z(O)} \exp\big(\sum_t \sum_k \lambda_k f_k(O_t, I_{t-1}, I_t, t)\big)$. Model prediction: POS tagging, rule-based and dictionary-based extractors, feature extraction, CRF-based extractor, predicted BMESO labels, and tag combination to get new entity labels. Training example: "360 Threat Intelligence disclosed Sauron, also known as Strider, an APT organization that …" is tagged 360/B-ORG, Threat/M-ORG, Intelligence/E-ORG, Sauron/S-HACK, Strider/S-HACK, with the remaining tokens tagged O. Prediction example: "Sky Eye Laboratory analyzed recent activities of OceanLotus…" yields Sky Eye Laboratory/nORG and OceanLotus/nHACK.]

FIGURE 1: Overall architecture of security entity recognition model. Our proposed framework consists of three components: (1) rule-based extractor, (2) dictionary-based extractor and (3) CRF-based extractor.
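The rule and dictionary matching step of Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses only three of the Table 1 patterns (the optional prefix-length suffix of the IPv4 pattern is left out), and the names `RULES`, `KNOWN_ENTITIES`, and `pre_tag`, the label strings, and the tiny dictionary built from the example names in Section III-B are all assumptions made for illustration.

```python
import re

# Three patterns transcribed from Table 1. Key names are labels chosen for
# this sketch; the IPv4 pattern omits the optional "/prefix" suffix.
RULES = {
    "CVE": re.compile(r"CVE-[0-9]{4}-[0-9]{4,6}"),
    "IPv4": re.compile(
        r"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
        r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
    ),
    "URL": re.compile(
        r"(?:https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]+"
        r"[-A-Za-z0-9+&@#/%?=~_|]"
    ),
}

# Hypothetical known-entity dictionary (single-token entries only).
KNOWN_ENTITIES = {
    "Cisco": "ORG", "FireEye": "ORG", "IBM": "ORG",
    "OurMine": "HACK", "Anonymous": "HACK", "DCLeaks": "HACK",
}


def pre_tag(text):
    """Return (surface, label, source) triples: rule hits first, then dictionary hits."""
    found = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            found.append((match.group(0), label, "rule"))
    # Naive alphanumeric tokenization; a real system would reuse its own tokenizer.
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if token in KNOWN_ENTITIES:
            found.append((token, KNOWN_ENTITIES[token], "dictionary"))
    return found


if __name__ == "__main__":
    sample = ("FireEye linked CVE-2017-0144 to traffic from 192.168.1.10 "
              "and http://example.com/a.exe")
    for hit in pre_tag(sample):
        print(hit)
```

In the paper's pipeline, entities found this way are handed to the CRF-based extractor as pre-identified labels; the sketch stops at the matching step and does not convert the hits into BMESO tags.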

dictionary-based extractor. We propose four feature templates to filter the feature vectors of the current word for the CRF model.

1) Atomic Features Template
A simple but powerful method is to use tokenization and a Part-Of-Speech (POS) tagger for named entity recognition. Because they cannot be decomposed further, we treat the part of speech and the morphology of words as atomic features. Table 2 summarizes the detailed information of the atomic features.

TABLE 2: The template of atomic features
Atomic Feature   Description
Word(0)          Current word
Word(-1)         The first word on the left of current word
Word(-2)         The second word on the left of current word
Word(1)          The first word on the right of current word
Word(2)          The second word on the right of current word
POS(0)           The part of speech of current word
POS(-1)          The part of speech of the first word on the left of current word
POS(-2)          The part of speech of the second word on the left of current word
POS(1)           The part of speech of the first word on the right of current word
POS(2)           The part of speech of the second word on the right of current word

According to Table 2, when the current word is "Google", which is a standalone organization word, the corresponding feature function can be generated as follows:

$$
f(x, y) =
\begin{cases}
1 & \text{if } \mathrm{Word}(0) = \text{"Google"} \text{ and } y = \mathrm{Org} \\
0 & \text{otherwise}
\end{cases}
\tag{1}
$$

where the variable y represents the label of the current word. The template describes the individual morphology or part of speech of each word in the current word and its context window, but it cannot adequately describe the complex phenomena of language.
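As a rough illustration of Table 2 and Eq. (1), the following Python sketch collects the ten atomic features for one position in a tagged sentence and evaluates the indicator function for the "Google" case. The function names, the `<PAD>` placeholder for positions outside the sentence, and the POS tag strings in the usage example are assumptions of this sketch, not part of the paper.

```python
PAD = "<PAD>"  # assumed placeholder for positions outside the sentence


def atomic_features(words, pos_tags, i):
    """Atomic features of Table 2 for position i: Word(k) and POS(k) for k in -2..2."""
    feats = {}
    for k in (-2, -1, 0, 1, 2):
        j = i + k
        inside = 0 <= j < len(words)
        feats[f"Word({k})"] = words[j] if inside else PAD
        feats[f"POS({k})"] = pos_tags[j] if inside else PAD
    return feats


def f_google_org(feats, y):
    """Indicator feature of Eq. (1): fires when Word(0) is "Google" and the label is Org."""
    return 1 if feats["Word(0)"] == "Google" and y == "Org" else 0


if __name__ == "__main__":
    words = ["Google", "released", "a", "patch"]
    pos_tags = ["nnp", "vbd", "dt", "nn"]  # illustrative tags only
    feats = atomic_features(words, pos_tags, 0)
    print(feats)
    print(f_google_org(feats, "Org"), f_google_org(feats, "O"))
```

Each (feature value, label) pair observed in the corpus gives rise to one such binary function, which is why the number of generated features grows quickly with vocabulary size.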

TABLE 3: The template of combination features
Combination Feature   Description
Word(0)+POS(0)        Current word and part of speech of current word
Word(0)+Word(-1)      Current word and the first word on the left of current word
Word(0)+Word(1)       Current word and the first word on the right of current word
Word(-1)+POS(0)       The first word on the left of current word and part of speech of current word
Word(0)+POS(1)        Current word and part of speech of the first word on the right of current word
Word(-1)+POS(-1)      The first word on the left of current word and its part of speech
Word(-1)+Word(-2)     The first word and the second word on the left of current word
Word(-2)+POS(-2)      The second word on the left of current word and its part of speech
Word(1)+Word(2)       The first word and the second word on the right of current word
Word(-1)+Word(1)      The first word on the left of current word and the first word on the right of current word
Word(1)+POS(0)        The first word and part of speech on the right of current word
POS(-2)+POS(-1)       The parts of speech of the second word and the first word on the left of current word
POS(-2)+POS(0)        The part of speech of current word and the part of speech of the second word on the left of current word
POS(-1)+POS(0)        The part of speech of the first word on the left of current word and the part of speech of current word
POS(-1)+POS(1)        The part of speech of the first word on the left of current word and the part of speech of the first word on the right
POS(0)+POS(1)         The part of speech of current word and the part of speech of the first word on the right
POS(0)+POS(2)         The part of speech of current word and the part of speech of the second word on the right of current word
POS(1)+POS(2)         The parts of speech of the first word and the second word on the right of current word

TABLE 4: The template of marker features
Marker Feature            Description
Tag(-1)                   Entity tag of first word on the left of current word
Tag(-2)                   Entity tag of second word on the left of current word
Tag(-1)+Tag(-2)           Entity tags of the first word and the second word on the left of current word
POS(0)+Tag(-1)            The part of speech of current word and entity tag of the first word on the left of current word
POS(0)+Tag(-2)            The part of speech of current word and entity tag of second word on the left of current word
POS(0)+Tag(1)             The part of speech of current word and entity tag of first word on the right of current word
Word(0)+Tag(-1)           Current word and entity tag of first word on the left of current word
Word(0)+Tag(-2)           Current word and entity tag of second word on the left of current word
Word(0)+Tag(1)            Current word and entity tag of first word on the right of current word
POS(0)+Tag(-1)+Tag(-2)    The part of speech of current word and entity tags of first word and second word on the left of current word
Tag(-1)+POS(0)+POS(1)     Entity tag of first word on the left of current word, part of speech of current word, and part of speech of first word on the right of current word
Tag(-1)+POS(-1)+POS(0)    Entity tag of first word on the left of current word, part of speech of first word on the left of current word, and part of speech of current word
Tag(-1)+POS(0)+Word(0)    Entity tag of first word on the left of current word, part of speech of current word, and current word
Tag(-2)+Tag(-1)+POS(0)    Entity tags of first word and second word on the left of current word and part of speech of current word
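To make the two templates more concrete, here is a small Python sketch that builds a few of the Table 3 and Table 4 features for one token. It is an illustration only: the helper names, the `<PAD>` placeholder, and the choice of `;` as a joiner (borrowed from the "ls;nn" style shown in Figure 1) are assumptions, and only left-context tags are used for the marker features.

```python
PAD = "<PAD>"  # assumed placeholder for positions outside the sequence


def at(seq, i, k):
    """Item at offset k from position i, or PAD when outside the sequence."""
    j = i + k
    return seq[j] if 0 <= j < len(seq) else PAD


def combination_features(words, pos, i):
    """A subset of the Table 3 combination features for position i."""
    return {
        "Word(0)+POS(0)": f"{at(words, i, 0)};{at(pos, i, 0)}",
        "Word(0)+Word(-1)": f"{at(words, i, 0)};{at(words, i, -1)}",
        "Word(0)+Word(1)": f"{at(words, i, 0)};{at(words, i, 1)}",
        "Word(-1)+POS(0)": f"{at(words, i, -1)};{at(pos, i, 0)}",
        "POS(-1)+POS(0)": f"{at(pos, i, -1)};{at(pos, i, 0)}",
    }


def marker_features(words, pos, left_tags, i):
    """A subset of the Table 4 marker features; left_tags holds tags for positions before i."""
    return {
        "Tag(-1)": at(left_tags, i, -1),
        "Tag(-1)+Tag(-2)": f"{at(left_tags, i, -1)};{at(left_tags, i, -2)}",
        "POS(0)+Tag(-1)": f"{at(pos, i, 0)};{at(left_tags, i, -1)}",
        "Word(0)+Tag(-1)": f"{at(words, i, 0)};{at(left_tags, i, -1)}",
    }


if __name__ == "__main__":
    words = ["360", "Threat", "Intelligence"]
    pos = ["ls", "nn", "nt"]
    print(combination_features(words, pos, 1))
    print(marker_features(words, pos, ["B-ORG"], 1))
```

The marker features depend on tags assigned to neighboring tokens, which is how constraints such as "an M-ORG tag should follow a B-ORG tag" can be expressed as features.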

2) Combination Features Template
Simple morphological and part-of-speech features contain only limited context information. Combination features can make use of long-distance constraints and rich context information. As shown in Table 3, we construct combination features based on the template of atomic features to form new rule features.

Based on these features, given a sentence "Google Released...", when the current word is "Google", we can define the binary function as follows:

$$
f(x, y) =
\begin{cases}
1 & \text{if } \mathrm{Word}(0) = \text{"Google"} \text{ and } \mathrm{POS}(1) = \text{"verb"} \text{ and } y = \mathrm{Org} \\
0 & \text{otherwise}
\end{cases}
\tag{2}
$$

As the number of atomic features combined in a template grows, the complexity of the model increases greatly. Related studies show that a combination template with two atomic features works better, whereas a combination template composed of more than three atomic features incurs a high computational cost.

3) Marker Features Template
The template of marker features infers the tag of the current word from predicted tag information and describes the mutual constraints between entities, to prevent situations such as "two adjacent B-tags". The template is constructed from the rules of the internal indicators and context indicators. The marker feature template is shown in Table 4.

For example, in the phrase "hacker organization equation", when the current word is "equation", we can get the binary function as follows:

$$
f(x, y) =
\begin{cases}
1 & \text{if } \mathrm{Tag}(-1) = \text{"B-nhack"} \text{ and } \mathrm{Word}(0) = \text{"Org"} \text{ and } y = \text{"E-nhack"} \\
0 & \text{otherwise}
\end{cases}
\tag{3}
$$

4) Semantic Features Template
Many words such as "teacher" and "chairman" often indicate the appearance of names, and name recognition is a very important task. The semantic features template makes up for the difficulty of expressing the relationship between adjacent words. The basic idea is to recognize indicator words and suffixes from dictionaries on the basis of word segmentation. These words need to be added manually on a continuing basis. Semantic templates are defined in Table 5.

For example, when identifying the organization name "sky eye laboratory", assuming that the current word is "sky eye", such a specific feature can be represented by the binary

feature function as follows:

$$
f(x, y) =
\begin{cases}
1 & \text{if } \mathrm{Word}(1) = \text{"sky eye"} \text{ and } \mathrm{ORG\_SUFFIX} = \text{"true"} \text{ and } y = \text{B-norg} \\
0 & \text{otherwise}
\end{cases}
\tag{4}
$$

TABLE 5: The template of semantic features
Semantic Feature                Description
CUR_PER_FRIST                   Whether the current word is a name
CUR_ORG_SUF                     Whether the current word is an organization name suffix
NEXT_ORG_SUF                    Whether the two words on the right side of current word contain an organization suffix
LOC_INDICATION                  Whether the left or right words of current word contain place indicators
PER_INDICATION                  Whether the left or right words of current word contain a name indicator
ORG_INDICATION                  Whether the left or right words of current word contain an organization indicator
CUR_LOC                         Whether the current word is a common place name
CUR_ORG                         Whether the current word is a common organization name
CUR_PER_NAME                    Whether the current word is a common name
CUR_LOC+LOC_INDICATION          Whether the current word is a common place name and whether the two words around the current word contain place name indicators
CUR_PER_FRIST+PER_INDICATION    Whether the current word is a Chinese surname and the left and right words contain a person name
Tag(-1)+CUR_ORG_SUF             The first word on the left side of current word is a named entity and the current word is an institutional feature suffix
Tag(-1)+CUR_LOC                 The first word on the left side of current word is an entity and the current word is a place name

5) Feature Selection
Feature sets are generated by matching the above feature templates. We traverse all words in the corpus in turn and match the words and their contexts against all feature templates. All successfully matched features are added to the feature set. The details of the generation process of feature sets are described in Algorithm 1.

Algorithm 1 The learning algorithm for feature selection
Require: cybersecurity text corpus D, the library $\mathcal{T}$ of the above four feature templates
Ensure: feature set F
choose a template T from the template library $\mathcal{T}$;
read a word w from the vocabulary V generated from the cybersecurity text corpus D;
while T ∈ $\mathcal{T}$ do
    while w ∈ V do
        match current template T and current word w, and then generate a feature f
        if f ∈ F then
            increment count for f
        else
            add f to F
        end if
    end while
end while
return F

Because of the large number of words and the wide variety of feature templates, the number of generated features becomes very large, and some features have little effect on entity recognition. These redundant features seriously reduce the efficiency of our proposed model, so it is necessary to screen the generated features.

Common feature selection methods are the incremental method and the threshold method. The former calculates the information gain of all features and retains those that bring a large gain in system performance, deleting the rest. The latter counts the frequency of each feature: if the frequency of a feature is less than a set threshold, the feature is deleted; otherwise it is retained. The incremental method works well but is computationally expensive. The threshold method is simple to operate, but not intelligent. For simplicity and computational efficiency, we use the threshold method, and the threshold is set to 2.

6) Conditional Random Fields Modeling
CRFs are a type of discriminative probabilistic graphical model that is often applied to sequence prediction and named entity recognition. A CRF can take into account contextual information from previous labels, which helps it achieve good prediction performance.

In a CRF, given the set of input vectors X, let $y_{i-1}$ and $y_i$ denote the labels of the previous word and the current word in X, respectively. We define the feature function as $f_j(X, i, y_{i-1}, y_i)$. Each feature function is either 0 or 1 based on the labels of the previous word and the current word. To build the conditional random field, we assign each feature function $f_j$ a weight $\lambda_j$ as follows

$$
P(y \mid X, \lambda) = \frac{1}{Z(X)} \exp\Big\{ \sum_{i=1}^{n} \sum_{j} \lambda_j f_j(X, i, y_{i-1}, y_i) \Big\}
\tag{5}
$$

where $Z(X) = \sum_{y' \in y} \exp\big\{ \sum_{i=1}^{n} \sum_{j} \lambda_j f_j(X, i, y'_{i-1}, y'_i) \big\}$. To estimate the parameters $\lambda$, we use Maximum Likelihood Estimation and take the negative log of the distribution as

$$
L = -\log\Big\{ \prod_{k=1}^{m} P(y^k \mid x^k, \lambda) \Big\}
= -\sum_{k=1}^{m} \log\Bigg[ \frac{\exp\big\{ \sum_{i=1}^{n} \sum_{j} \lambda_j f_j(x^k, i, y^k_{i-1}, y^k_i) \big\}}{Z(x^k)} \Bigg]
\tag{6}
$$

Maximizing the log-posterior distribution in Eq. (6) is equivalent to minimizing a sum-of-squares error function. The local minimum of the objective function given by Eq. (6) can

be found by using gradient descent on the parameters $\lambda$ as

$$\frac{\partial L}{\partial \lambda_j} = -\sum_{k=1}^{m} \sum_{i=1}^{n} f_j(x^k, i, y^k_{i-1}, y^k_i) + \sum_{k=1}^{m} \sum_{y'} P(y' \mid x^k, \lambda) \sum_{i=1}^{n} f_j(x^k, i, y'_{i-1}, y'_i) \tag{7}$$

The CRF estimates a global probability and establishes a unified probability model over all states. Hence, the CRF is a relatively good model for named entity recognition.

IV. EXPERIMENTS
A. DATA PREPARATION
Unlike named entity recognition in the general domain, cybersecurity lacks large-scale publicly available datasets and annotation methods. Therefore, we construct a standard ground truth dataset through the following process. First, we collect a large security text corpus from official security forums ¹, software vendors' bulletin boards ², and various blog articles. Second, we choose the collaborative text annotation system brat ³, an open source Web annotation tool that can annotate a large number of texts online. Third, the members of this collaborative annotation effort are domain experts with rich knowledge of cybersecurity. Each document is annotated by at least three users in turn, and the ground-truth class labels are selected by majority vote. Finally, about 14,000 unstructured texts from the cybersecurity domain have been annotated; the training set consists of 70% of the documents and the remaining 30% form the test set. We use the constructed dataset in the following experiments. The statistics of the dataset are summarized in Table 6.

¹ http://www.cert.org.cn/
² https://www.anquanke.com/
³ http://brat.nlplab.org/

TABLE 6: Statistics of the constructed dataset
Class       Number    Class          Number
CVE         68        Product        1402
AS          8         Organization   3047
Cert        10        Person         1372
Host        14        Place          518
Domain      25        Threat         21
Email       17        Hacker_Group   62
MD5         31        Attack         19
Registry    22        Software       427
SHA1        15        Protocol       25
SHA256      18        Conference     14
URL         42        Report         80
IP          24        File_Path      43
File_Name   71        Event          18

B. BASELINE METHODS
In order to select a model that balances accuracy and performance, we analyze the following models after applying the same rule and dictionary matching preprocessing to our security samples.

• HMM: Hidden Markov Model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobservable (i.e. hidden) states. The hidden Markov model can be represented as the simplest dynamic Bayesian network [28].
• MEMM: Maximum Entropy Markov Model (MEMM) uses the HMM framework to predict sequence labels given an observation sequence, but incorporates multinomial logistic regression (also known as Maximum Entropy), which gives freedom in the type and number of features one can extract from the observation sequence [22].
• CRF: Conditional Random Fields (CRF) is a discriminative probabilistic graphical model. It uses contextual information from previous labels, thus increasing the amount of information available to the model for making a good prediction [20].

Neural network methods have recently become a major topic in natural language processing (NLP), but their training complexity is often high, and they are generally used for complex, high-level tasks such as machine translation and text understanding. At the expense of some complexity and computational speed, some researchers also use long short-term memory (LSTM) networks and their variants to extract cybersecurity entities, such as LSTM-CRF [23] and FT-CNN-BiLSTM-CRF [24], and the results show that such models have a certain degree of recognition ability on their datasets.

We therefore also compare the effectiveness of our proposed model with the following state-of-the-art baseline methods on the same dataset.

• LSTM-CRF: LSTM is a special recurrent neural network. The advantage of LSTM is that it captures relationships between samples over a long time sequence, and BiLSTM can more effectively acquire the features before and after the input sentence. This model extracts features with the LSTM and predicts entity types with the CRF [23].
• FT-CNN-BiLSTM-CRF: In this model, a convolutional neural network (CNN) extracts character-level features and the BiLSTM captures long-term contextual features. A CRF is then applied for learning and inference. Furthermore, the model adds feature templates and extracts contextual features of the security entity through them [24].

For the HMM, MEMM, and CRF models, we use the default recommended settings. For the LSTM-CRF and FT-CNN-BiLSTM-CRF models, we set the word embedding layers to 64 and the word embedding dimensions to 100. Meanwhile, for the CNN and LSTM models, we set batch_size to 32, Dropout to 0.5, learning rate to 0.01, and gradient to 5 in the following comparison experiments.
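To make the CRF baseline and the role of the normalizer Z(X) in Eqs. (5) and (6) concrete, the following is a minimal sketch of a toy linear-chain CRF. It is not the authors' implementation: the two feature functions, their weights, the label set, the START marker and the example tokens are all hypothetical and chosen only for illustration. The sketch computes Z(X) both by enumerating every label sequence and by the forward recursion, and then evaluates the negative log-likelihood of one labeled sequence.

```python
import itertools
import math

# Hypothetical toy label set and start marker (assumptions, not from the paper).
LABELS = ["O", "B-ORG"]
START = "<s>"


def f_suffix(X, i, y_prev, y_cur):
    # Hypothetical feature: current word ends with "Lab" and is tagged B-ORG.
    return 1 if X[i].endswith("Lab") and y_cur == "B-ORG" else 0


def f_transition(X, i, y_prev, y_cur):
    # Hypothetical feature: an O -> O label transition.
    return 1 if y_prev == "O" and y_cur == "O" else 0


FEATURES = [f_suffix, f_transition]
WEIGHTS = [2.0, 0.5]  # lambda_j, picked arbitrarily for the example


def local_score(X, i, y_prev, y_cur):
    # sum_j lambda_j * f_j(X, i, y_prev, y_cur) at a single position i
    return sum(w * f(X, i, y_prev, y_cur) for w, f in zip(WEIGHTS, FEATURES))


def sequence_score(X, y):
    # sum_i sum_j lambda_j * f_j(X, i, y_{i-1}, y_i) for a whole label sequence
    total, prev = 0.0, START
    for i, cur in enumerate(y):
        total += local_score(X, i, prev, cur)
        prev = cur
    return total


def partition_brute_force(X):
    # Z(X): sum of exp(score) over every possible label sequence
    return sum(
        math.exp(sequence_score(X, y))
        for y in itertools.product(LABELS, repeat=len(X))
    )


def partition_forward(X):
    # Same Z(X) via the forward recursion; X must be non-empty.
    alpha = {c: math.exp(local_score(X, 0, START, c)) for c in LABELS}
    for i in range(1, len(X)):
        alpha = {
            c: sum(alpha[p] * math.exp(local_score(X, i, p, c)) for p in LABELS)
            for c in LABELS
        }
    return sum(alpha.values())


def neg_log_likelihood(X, y):
    # One term of Eq. (6): -log P(y | X, lambda)
    return -(sequence_score(X, y) - math.log(partition_forward(X)))


X = ["visit", "AcmeLab", "today"]  # hypothetical tokens
y = ["O", "B-ORG", "O"]
print("Z (brute force):", partition_brute_force(X))
print("Z (forward):    ", partition_forward(X))
print("NLL of y:       ", neg_log_likelihood(X, y))
```

The forward recursion gives the same Z(X) as brute-force enumeration but in time linear in the sequence length, which is what makes the global normalization of a CRF tractable in practice.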


C. EVALUATION METRICS
In this paper, we use three representative metrics to evaluate performance: Precision, Recall, and F1-measure (F1). Greater Precision, Recall, and F1 values mean better performance. Without loss of generality, we split the data randomly, with 80% as the training set and 20% as the testing set. We repeat each experiment 5 times and report the average performance.

D. PERFORMANCE AND ANALYSIS
1) The performance of cybersecurity entity recognition
The task of entity recognition is divided into two categories: (1) whether or not a token is an entity, which is a binary classification task; (2) which entity class it belongs to, which is a multi-class classification task. To this end, we conduct extensive experiments with these two tasks on the cybersecurity dataset. The experimental results are shown in Figure 2 and Table 8.

FIGURE 2: Performance of our proposed model on the tasks of average classification and overall classification. (Bar chart of Precision, Recall and F1 for Average Accuracy and Overall Accuracy.)

From Figure 2, we can see that the overall accuracy of deciding whether there is an entity is higher than the average accuracy of entity class recognition. We argue that this phenomenon may be caused by confusion in the process of entity classification, such as Person being classified as Organization or Threat being classified as Hacker_Group. We can also see that the binary classification accuracy is only 6% higher than that of multi-class classification, which shows that our proposed model has good robustness.

On the other hand, from Table 8 we can observe that (1) our proposed model has relatively high performance on most entity classes; (2) regular-expression-based entities like CVE and Email can be extracted with the highest accuracy, which demonstrates that the regular-expression-based extractor is a good strategy; (3) dictionary-based entities such as Product and Organization have relatively high accuracy, and sometimes the improvements are not statistically significant due to the lack of specific entities; (4) the CRF-based extractor obtains poor precision and recall because our dataset contains only a small number of these instances. This problem can be solved given a larger cybersecurity corpus. Hence, by incorporating regular expressions, a known-entity dictionary and a CRF model, our proposed model indeed performs well on the cybersecurity entity recognition task.

2) Comparisons with the state-of-the-art methods
In order to evaluate and compare effectiveness, we conduct an experiment on the same dataset comparing our method with the latest cybersecurity entity extraction methods from papers of the last two years. The first is LSTM-CRF [23], and the second is FT-CNN-BiLSTM-CRF [24]. The comparative experiment results are shown in Table 7.

TABLE 7: Performance comparison of different deep recognition models, evaluated by Precision, Recall and F1
Method               Precision   Recall   F1
LSTM-CRF             0.7945      0.7079   0.7487
FT-CNN-BiLSTM-CRF    0.8157      0.7642   0.7891
RDF-CRF              0.8578      0.7837   0.8191

As we can see, the performance metrics show that the results for RDF-CRF are better than those of the other state-of-the-art methods. Even though the recall score of FT-CNN-BiLSTM-CRF is close to ours, its precision still has some room for improvement. One of the reasons is that there are a large number of simple but regular entities in cybersecurity texts, such as IP and domain, and using complex models for these entities reduces precision. At the same time, using a neural network for feature extraction greatly increases the computational complexity of the model. The final results show that, when entities are pre-matched using rules and dictionaries, the CRF model with feature templates can obtain better recognition results at lower complexity.

3) Comparisons of different recognition models
In this section, we mainly compare the performance of cybersecurity entity recognition under the Hidden Markov Model (HMM), Maximum Entropy Markov Model (MEMM) and Conditional Random Fields (CRF). The entity classes compared are those recognized only by the statistical model, including Organization, Person, Report, Threat, Event, Conference and Hacker_Group. The experimental results are shown in Figure 3.

From the figure, the experimental results show that the CRF model always outperforms the other comparison methods on all metrics. The major reason is that the CRF model can make better use of the sequential state of sentences and its dependence on features, and it has the best effect on named entity recognition in unstructured cybersecurity texts. Further analysis shows that, for named entity recognition in unstructured cybersecurity texts, each observation value has abundant interacting context features and dependencies. The HMM can choose the best path in the range of its inference sequence, but its independence assumption and no-aftereffect (Markov) property restrict the selection of features.

TABLE 8: Performance of our proposed model with Precision, Recall, F1 on different entity classes
Class Precision Recall F1 Class Precision Recall F1
CVE 1.0000 1.0000 1.0000 Product 0.7579 0.7066 0.7314
AS 1.0000 1.0000 1.0000 Organization 0.8989 0.7366 0.8097
Cert 1.0000 1.0000 1.0000 Person 0.8399 0.7633 0.7998
Host 0.7800 0.8500 0.8135 Place 0.9028 0.8824 0.8925
Domain 0.8225 0.7433 0.7809 Threat 0.8729 0.7536 0.8089
Email 0.8895 0.7965 0.8404 Hacker_Group 0.7500 0.5742 0.6504
MD5 1.0000 1.0000 1.0000 Attack 0.6600 0.5400 0.5940
Registry 0.8901 0.8628 0.8762 Software 0.3396 0.3005 0.3189
SHA1 1.0000 1.0000 1.0000 Protocol 0.8200 0.7800 0.7995
SHA256 1.0000 1.0000 1.0000 Conference 0.6842 0.6023 0.6406
URL 0.9255 0.8700 0.8969 Report 0.6472 0.4821 0.5526
IP 0.9900 0.9900 0.9900 File_Path 0.8936 0.6200 0.7496
File_Name 0.8842 0.8925 0.8883 Event 0.6233 0.3900 0.4798
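The F1 column of Table 8 can be related to the Precision and Recall columns with a short calculation. The sketch below assumes the standard definition of F1 as the harmonic mean of Precision and Recall (the text does not spell out the formula); the function name f1_score and the choice of rows are illustrative only. The Precision, Recall and reported F1 values are copied from the Host, Place and Software rows of Table 8.

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from three rows of Table 8
rows = {
    "Host": (0.7800, 0.8500, 0.8135),
    "Place": (0.9028, 0.8824, 0.8925),
    "Software": (0.3396, 0.3005, 0.3189),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.4f}, reported F1 = {reported:.4f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, which is why classes with low recall also show low F1 even when precision is moderate.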

FIGURE 3: Precision, Recall and F1 with different entity classes. Panels: (a) Precision vs. Entity Classes, (b) Recall vs. Entity Classes, (c) F1 vs. Entity Classes. Each panel compares HMM, MEMM and CRF on Organization, Person, Report, Threat, Event, Conference and Hacker_Group.

FIGURE 4: Performance of our proposed model with combination of different feature templates. (Precision, Recall and F1 for the template combinations A, A+C, A+C+S and A+C+S+M.)

FIGURE 5: Performance of our proposed model under different dataset size. (Precision, Recall and F1 against size of dataset, from 0 to 14000.)

The MEMM alleviates this problem, but it normalizes only locally and easily falls into a local optimum, which leads to the label bias problem. Building on the MEMM, the CRF model normalizes over all features globally, which solves the label bias problem. It can also express long-distance dependencies and overlapping features among elements, and can accommodate arbitrary context information.

4) Combination of different feature templates
The combinations of different feature templates have a great impact on the performance of cybersecurity entity recognition. Therefore, we also implement different configurations of our proposed model to test the effectiveness of combining different feature templates. In this paper, we denote atomic features as A, combination features as C, semantic features as S and marker features as M. We give the performance of the different variants of our proposed model in Figure 4. From the results, it is

clear that (1) as the number of combined templates increases, the performance of our proposed model improves, and the model achieves its best performance when using all feature templates; (2) among the variants of our proposed model, the improvements are statistically significant when using marker feature templates; (3) the degree of improvement differs considerably among these variants in some cases. From this view, we conclude that our proposed model is a proper choice for improving the performance of cybersecurity entity recognition.

5) Impact of dataset size
Figure 5 shows the impact of different dataset sizes on our proposed model. From the figure, we can observe that the size of the dataset affects the results of entity recognition significantly. As the amount of cybersecurity data increases, the recognition accuracy greatly improves, but once the data surpasses a certain threshold, the recognition accuracy becomes stable with further increases in dataset size. This phenomenon coincides with the intuition that our proposed model can efficiently handle different dataset sizes with a significant improvement in recognition power.

V. CONCLUSION
In this paper, we propose a novel security named entity recognition method that incorporates regular expressions, a known-entity dictionary and conditional random fields. The proposed model consists of a rule-based extractor, a dictionary-based extractor and a CRF-based extractor. In particular, the rule-based extractor is designed to locate specific entities, the dictionary-based extractor includes known-entity lists, and the CRF-based extractor leverages the entities identified by the rule-based and dictionary-based extractors to further improve recognition performance. In order to verify the effectiveness of our proposed method, we construct a standard ground truth dataset through manual collaborative annotation and perform extensive experiments. The experimental results show that our proposed method can outperform the state-of-the-art baseline methods. In future work, we will focus on exploring neural network methods to deal with the problems of label imbalance and automatic feature extraction. The results of our work will have a positive effect on the extraction of security knowledge and the construction of knowledge graphs.

REFERENCES
[1] S. Mittal, P. K. Das, V. Mulwad, A. Joshi, and T. Finin, “Cybertwitter: Using twitter to generate alerts for cybersecurity threats and vulnerabilities,” in Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE Press, 2016, pp. 860–867.
[2] R. P. Khandpur, T. Ji, S. Jan, G. Wang, C.-T. Lu, and N. Ramakrishnan, “Crowdsourcing cybersecurity: Cyber attack detection using social media,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 2017, pp. 1049–1057.
[3] G. Husari, X. Niu, B. Chu, and E. Al-Shaer, “Using entropy and mutual information to extract threat actions from cyber threat intelligence,” in 2018 IEEE International Conference on Intelligence and Security Informatics (ISI). IEEE, 2018, pp. 1–6.
[4] E. F. Tjong Kim Sang and F. De Meulder, “Introduction to the conll-2003 shared task: Language-independent named entity recognition,” in Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4. ACL, 2003, pp. 142–147.
[5] A. Ritter, S. Clark, O. Etzioni et al., “Named entity recognition in tweets: an experimental study,” in Proceedings of the conference on empirical methods in natural language processing. ACL, 2011, pp. 1524–1534.
[6] C. N. d. Santos and V. Guimaraes, “Boosting named entity recognition with neural character embeddings,” arXiv preprint:1505.05008, 2015.
[7] T.-H. Pham and P. Le-Hong, “End-to-end recurrent neural network models for vietnamese named entity recognition: Word-level vs. character-level,” in International Conference of the Pacific Association for Computational Linguistics. Springer, 2017, pp. 219–232.
[8] S. M. Yimam, C. Biemann, L. Majnaric, Š. Šabanović, and A. Holzinger, “An adaptive annotation approach for biomedical entity and relation recognition,” Brain informatics, vol. 3, no. 3, p. 157, 2016.
[9] T. Eftimov, B. K. Seljak, and P. Korošec, “A rule-based named-entity recognition method for knowledge extraction of evidence-based dietary recommendations,” PloS one, vol. 12, no. 6, p. e0179488, 2017.
[10] C. Lee, Y.-G. Hwang, H.-J. Oh, S. Lim, J. Heo, C.-H. Lee, H.-J. Kim, J.-H. Wang, and M.-G. Jang, “Fine-grained named entity recognition using conditional random fields for question answering,” in Asia Information Retrieval Symposium. Springer, 2006, pp. 581–587.
[11] M. A. Khalid, V. Jijkoun, and M. De Rijke, “The impact of named entity normalization on information retrieval for question answering,” in European Conference on Information Retrieval. Springer, 2008, pp. 705–710.
[12] Ö. Uzuner, Y. Luo, and P. Szolovits, “Evaluating the state-of-the-art in automatic de-identification,” Journal of the American Medical Informatics Association, vol. 14, no. 5, pp. 550–563, 2007.
[13] M. Krallinger, O. Rabal, F. Leitner, M. Vazquez, D. Salgado, Z. Lu, R. Leaman, Y. Lu, D. Ji, D. M. Lowe et al., “The chemdner corpus of chemicals and drugs and its annotation principles,” Journal of cheminformatics, vol. 7, no. 1, p. S2, 2015.
[14] A. Ritter, E. Wright, W. Casey, and T. Mitchell, “Weakly supervised extraction of computer security events from twitter,” in Proceedings of the 24th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 2015, pp. 896–905.
[15] V. Mulwad, W. Li, A. Joshi, T. Finin, and K. Viswanathan, “Extracting information about security vulnerabilities from web text,” in Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology-Volume 03. IEEE Computer Society, 2011, pp. 257–260.
[16] I. Deliu, C. Leichter, and K. Franke, “Extracting cyber threat intelligence from: Support vector machines versus convolutional neural networks,” in 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017, pp. 3648–3656.
[17] X. Liao, K. Yuan, X. Wang, Z. Li, L. Xing, and R. Beyah, “Acing the ioc game: Toward automatic discovery and analysis of open-source cyber threat intelligence,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 755–766.
[18] M. Balduccini, S. Kushner, and J. Speck, “Ontology-driven data semantics discovery for cyber-security,” in International Symposium on Practical Aspects of Declarative Languages. Springer, 2015, pp. 1–16.
[19] A. Joshi, R. Lal, T. Finin, and A. Joshi, “Extracting cybersecurity related linked data from text,” in Semantic Computing (ICSC), 2013 IEEE Seventh International Conference on. IEEE, 2013, pp. 252–259.
[20] R. Lal et al., “Information extraction of security related entities and concepts from unstructured text,” 2013.
[21] C. L. Jones, R. A. Bridges, K. M. Huffer, and J. R. Goodall, “Towards a relation extraction framework for cyber-security concepts,” in Proceedings of the 10th Annual Cyber and Information Security Research Conference. ACM, 2015, p. 11.
[22] R. A. Bridges, C. L. Jones, M. D. Iannacone, K. M. Testa, and J. R. Goodall, “Automatic labeling for entity extraction in cyber security,” arXiv preprint arXiv:1308.4941, 2013.
[23] H. Gasmi, A. Bouras, and J. Laval, “Lstm recurrent neural networks for cybersecurity named entity recognition,” in Proceedings of the Thirteenth International Conference on Software Engineering Advances. IARIA XPS Press, 2018, pp. 1–6.
[24] Ya, G. Qin, W. Shen, Y. Zhao, M. Chen, X. Yu, and Jin, “A network security entity recognition method based on feature template and cnn-bilstm-crf,” Frontiers of IT & EE, vol. 20, no. 6, pp. 872–884, 2019.


[25] Dongliang, J. Xu, B. Pan, and Wang, “Multiple kernels learning-based biological entity relationship extraction method,” Journal of Biomedical Semantics, vol. 8, no. 1, pp. 38:1–38:8, 2017.
[26] L. Obrst, P. Chase, and R. Markeloff, “Developing an ontology of the cyber security domain,” in Proceedings of the Seventh International Conference on Semantic Technologies for Intelligence, Defense, and Security, 2012, pp. 49–56.
[27] S. Weerawardhana, S. Mukherjee, I. Ray, and A. Howe, “Automated extraction of vulnerability information for home computer security,” in International Symposium on Foundations and Practice of Security. Springer, 2014, pp. 356–366.
[28] D. R. Miller, T. Leek, and R. M. Schwartz, “A hidden markov model information retrieval system,” in Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 99, 1999, pp. 214–221.
