Cyber Threat Intelligence Modeling Based on Heterogeneous Graph Convolutional Network

Jun Zhao1,2 , Qiben Yan3,* , Xudong Liu1,2,* , Bo Li1,2,* , Guangsheng Zuo1,2


1School of Computer Science and Engineering, Beihang University, Beijing, China
2 Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, China
3 Computer Science and Engineering, Michigan State University, East Lansing, Michigan, USA

Abstract

Cyber Threat Intelligence (CTI), as a collection of threat information, has been widely used in industry to defend against prevalent cyber attacks. CTI is commonly represented as Indicator of Compromise (IOC) for formalizing threat actors. However, current CTI studies pose three major limitations: first, the accuracy of IOC extraction is low; second, isolated IOC hardly depicts the comprehensive landscape of threat events; third, the interdependent relationships among heterogeneous IOCs, which can be leveraged to mine deep security insights, are unexplored. In this paper, we propose a novel CTI framework, HINTI, to model the interdependent relationships among heterogeneous IOCs to quantify their relevance. Specifically, we first propose multi-granular attention based IOC recognition method to boost the accuracy of IOC extraction. We then model the interdependent relationships among IOCs using a newly constructed heterogeneous information network (HIN). To explore intricate security knowledge, we propose a threat intelligence computing framework based on graph convolutional networks for effective knowledge discovery. Experimental results demonstrate that our proposed IOC extraction approach outperforms existing state-of-the-art methods, and HINTI can model and quantify the underlying relationships among heterogeneous IOCs, shedding new light on the evolving threat landscape.

1 Introduction

Nowadays, we are witnessing a rapid growth of sophisticated cyber attacks (e.g., zero-day attack, advanced persistent threat) [34]. Such attacks can effortlessly bypass traditional defenses such as firewalls and intrusion detection systems (IDS), breach critical infrastructures, and cause devastating catastrophes [7, 20, 39]. To combat these emerging threats, security experts proposed Cyber Threat Intelligence (CTI) that consists of a collection of Indicators of Compromise (IOCs). Different from the well-known security databases (e.g., CVE¹, ExploitDB²), CTI can facilitate organizations to proactively release more comprehensive and valuable threat warnings (e.g., malicious IPs, malicious DNS, malware and attack patterns, etc.) when a system encounters suspicious outsider or insider threats [23].

In recent years, CTI has been increasingly adopted by security researchers and industries to share and capitalize on their discoveries, as well as by security firms to analyze the threat landscape using the deluge of data [5, 30]. The original CTI extraction and analysis require extensive manual inspection of the attack event descriptions, which becomes rather time-consuming given the enormous volume of threat description data. Recent studies have proposed automated methods to extract CTI in the form of Indicator of Compromise (IOC) from unstructured security-related texts [4, 22]. Most of existing IOC extraction methods, such as CleanMX³, PhishTank⁴, IOC Finder⁵, and Gartner peer insight⁶, follow the OpenIOC [10] standard and extract particular types of IOCs (e.g., malicious IP, malware, file Hash, etc) by leveraging a set of regular expressions. However, such extraction approaches face three major limitations. First, the accuracy of IOC extraction is low, which inevitably leads to the omission of critical threat objects [22]. Second, isolated IOC hardly depicts the comprehensive landscape of threat events, making it virtually impossible for CTI subscribers to gain a complete picture into the incoming threat. Third, there is a lack of an effective computing framework to efficiently measure the interactive relationships among heterogeneous IOCs.

To combat these limitations, HINTI, a cyber threat intelligence framework based on heterogeneous information network (HIN), is proposed to model and analyze CTIs. Specifically, HINTI proposes a multi-granular attention based IOC recognition approach to boost the accuracy of IOC extraction.

¹ http://cve.mitre.org/
² https://www.exploit-db.com/
³ http://list.clean-mx.com
⁴ https://www.phishtank.com
⁵ https://www.fireeye.com/services/freeware/ioc-finder.html
⁶ https://www.gartner.com/reviews/market/security-threat-intelligence-services
USENIX Association 23rd International Symposium on Research in Attacks, Intrusions and Defenses 241
Then, HINTI leverages HIN to model the interdependent relationships among heterogeneous IOCs, which can depict a more comprehensive picture of threat events. Moreover, we propose a novel CTI computing framework to quantify the interdependent relationships among IOCs, which helps uncover novel security insights. In short, the main contributions of this paper are summarized as follows:

• Multi-granular Attention based IOC Recognition. We propose multi-granular attention based IOC recognition approach to automatically extract cyber threat objects from multi-source threat texts, which can learn the significance of features with different scales. Our model outperforms the state-of-the-art methods in terms of IOC recognition accuracy and recall. In total, we extract over 397,730 IOCs from the unstructured threat descriptions.

• Heterogeneous Threat Intelligence Modeling. We model different types of IOCs using heterogeneous information network (HIN), which introduces various meta-paths to capture the interdependent relationships among heterogeneous IOCs while depicting a more comprehensive landscape of cyber threat events.

• Threat Intelligence Computing Framework. We are the first to present the concept of cyber threat intelligence computing, and design a general computing framework, as shown in Figure 5. The framework first utilizes a weight-learning based node similarity measure to quantify the interdependent relationships between heterogeneous IOCs, and then leverages attention mechanism based heterogeneous graph convolutional networks to embed the IOCs and their interactive relations.

• Threat Intelligence Prototype System. To evaluate the effectiveness of HINTI, we implement a CTI prototype system. Our system has identified 1,262,258 relationships among 6 types of IOCs including attackers, vulnerabilities, malicious files, attack types, devices and platforms, based on which we further assess the real-world applicability of HINTI using three real-world applications: IOC significance ranking, attack preference modeling, and vulnerability similarity analysis.

2 Background

2.1 Cyber Threat Intelligence

Cyber Threat Intelligence (CTI) extracted from security-related data is structured information used to proactively resist cyber attacks. CTI consists of reasoning, context, mechanism, indicators, implications, and actionable advice about an existing or evolving cyber attack that can be used to create preventive measures in advance [30]. CTI allows subscribers to expand their visibility into the fast-growing threat landscape, and enable early identification and prevention of a cyber threat. Take WannaCry virus as an example, if security guards can timely capture the threat intelligence that indicates “Wannacry permeates port 445 to attack systems", the malicious intrusion can be easily blocked by locking down port 445, which is the most direct and effective way of combating WannaCry virus [7].

Meanwhile, social media (e.g., Blog, Twitter) has increasingly become an effective medium for exchanging and spreading cyber security information, on which security experts and organizations often post their discoveries to reach a wider audience promptly [32]. These posts usually include a magnitude of valuable security-related information [25, 26], such as the warnings regarding latest vulnerabilities, hacking tools, data breaches, and existing or upcoming software patches, providing one of the main raw materials for extracting CTIs.

Early CTI extraction requires extensive manual inspection of the threat descriptions, which becomes rather time-consuming given the enormous volume of such descriptions. To facilitate the automatic generation and sharing of CTI, a large volume of methods and frameworks are established, such as IODEF [13], STIX [4], TAXII [36], OpenIOC [10], and CyBox [28], CleanMX, PhishTank, IOC Finder and [2, 22, 31, 46]. The majority of existing methods and frameworks leverage regular expressions to extract IOCs, which may suffer from a low accuracy due to their inability in predefining a comprehensive set of the IOC rules.

2.2 Motivation

The main goal of this research is to address the limitations of the existing CTI analytics frameworks by modeling the interdependent relationships among heterogeneous IOCs. As a motivating example, given a security-related post: “Last week, Lotus exploited CVE-2017-0143 vulnerability to affect a larger number of Vista SP2 and Win7 SP devices in Iran. CVE-2017-0143 is a remote code execution vulnerability including a malicious file SMB.bat”. Most of the existing CTI frameworks can extract specific IOCs but neglect the relationships among them, as shown in Figure 1. It is obvious that such IOCs could not draw a comprehensive picture of the threat landscape, let alone quantifying their interactive relationships for in-depth security investigation.

Different from the existing CTI frameworks, HINTI aims to implement a computational CTI framework, which can not only extract IOCs efficiently but also model and quantify the relationships between them. Here, we use the motivating example to illustrate how HINTI works step-by-step in practice as follows.

(i) First, the security-related post is annotated by the B-I-O sequence tagging method [43] as shown in Figure 2, where B-X indicates that the element of type X is located at the beginning of the fragment, I-X means that the element belonging to type X is located in the middle of the fragment, and O represents a non-essential element of other types.

In this research, we annotated 30,000 such training samples from 5,000 threat description texts, which are the raw materials used to build our IOC extraction model.

Figure 1: An example of extracted IOCs without any relations among them.

Figure 2: An annotation example with the B-I-O tagging method.

(ii) The labeled training samples are then fed into the proposed neural network architecture as shown in Figure 6 to train our proposed IOC extraction model. As a result, HINTI has the ability to accurately identify and extract IOCs (e.g., Lotus, SMB.bat) using the proposed multi-granular attention based IOC extraction method (see Section 4.1 for details).

(iii) HINTI then utilizes the syntactic dependency parser [6] (e.g., subject-predicate-object, attributive clause, etc.) to extract associated relationships between IOCs, each of which is represented as a triple ($IOC_i$, relation, $IOC_j$). In this motivating example, HINTI extracts the relationship triples involving (Lotus, exploit, CVE-2017-0143), (CVE-2017-0143, affect, Vista SP2), etc. Note that the extracted relational triples can be incrementally pooled into an HIN to model the interactions among IOCs for depicting a more comprehensive threat landscape. Figure 3 shows a miniature graphic representation describing interactive relations among IOCs extracted from the example. Compared with Figure 1, it is obvious that HINTI can depict a more intuitive and comprehensive threat landscape than the previous approaches. In this paper, we mainly consider 9 relationships (R1∼R9) among 6 different types of IOCs (see Section 4.2 for details).

Figure 3: A miniature of a constructed CTI includes attacker, vulnerability, malicious file, attack type, device, and platform, which describes a particular threat: an attacker utilizes CVE-2017-0143 vulnerability to invade Vista SP2 and Win7 SP1 devices. CVE-2017-0143 is a remote code execution vulnerability that involves a malicious file “SMB.bat".

(iv) Finally, HINTI integrates a CTI computing framework based on heterogeneous graph convolutional networks (see Section 4.3) to effectively quantify the relationships among IOCs for knowledge discovery. Particularly, our proposed CTI computing framework characterizes IOCs and their relationships in a low-dimensional embedding space, based on which CTI subscribers can use any classification (e.g., SVM, Naive Bayes) or clustering algorithms (K-Means, DBSCAN) to gain new threat insights, such as predicting which attackers are likely to intrude their systems, and identifying which vulnerabilities belong to the same category without the expert knowledge. In this work, we mainly explore three real-world applications to verify the effectiveness and efficiency of the CTI computing framework: IOC significance ranking (see Section 6.1), attack preference modeling (see Section 6.2), and vulnerability similarity analysis (see Section 6.3).

2.3 Preliminaries

In this paper, we use heterogeneous information network (HIN) to model the relationships among IOCs. Here, we first introduce the preliminary knowledge about HIN.

Definition 1 Heterogeneous Information Network of Threat Intelligence (HINTI) is defined as a directed graph G = (V, E, T) with an object type mapping function ϕ : V → M and a link type mapping function Ψ : E → R. Each object v ∈ V belongs to one particular object type in the object type set M: ϕ(v) ∈ M, and each link e ∈ E belongs to a particular relation type in the relation type set R: Ψ(e) ∈ R. T denotes the types of nodes and relationships.

In this paper, we focus on 6 common types of IOCs: attacker (A), vulnerability (V), device (D), platform (P), malicious file (F), and attack type (T), and the links connecting different objects represent different semantic relationships. To better understand the object types and relationship types in HINTI, it is imperative to provide the meta-level (i.e., schema-level) description of the network.

Consequently, we introduce the network schema [37] for describing the meta-level relationships.

Definition 2 Network Schema. The network schema of HINTI, denoted as HS = (A, R), is a meta template for G = (V, E, T) with the object type mapping ϕ : V → M and the link type mapping Ψ : E → R. It is a directed graph of object types M with edges representing relations from R.

The schema of HINTI specifies type constraints on the sets of IOCs and their relationships. Figure 4 (a) shows the network schema of HINTI, which defines the relationship templates among IOCs to effectively guide the semantic exploration in HINTI. For example, for a relationship that describes: “attackers invade devices", the semantic schema can be written as: $\text{attacker} \xrightarrow{\text{invade}} \text{device}$. Figure 4 (b) presents a concrete instance of the network schema.

(a) Network schema. (b) Network instance.

Figure 4: Network schema and instance of HIN containing 6 types of IOCs. (a): The network schema of HIN, which depicts the relationship template among different types of IOCs, such as $\text{Device} \xrightarrow{\text{belong}} \text{Platform}$. (b): An instance of network schema, which describes the concrete relationships between IOCs by following a network schema, e.g., $\text{Office 2012} \xrightarrow{\text{belong}} \text{Windows}$.

Definition 3 Meta-path. A meta-path [37] P is a path sequence defined on a network schema S = (N, R), and is represented in the form of $N_1 \xrightarrow{R_1} N_2 \xrightarrow{R_2} \cdots \xrightarrow{R_i} N_{i+1}$, which defines a composite relation $R = R_1 \circ R_2 \circ \cdots \circ R_i$, where $\circ$ denotes the composition operator on relations. A meta-path P is a symmetric path when the relation R defined by the path is symmetric (i.e., P is equal to $P^{-1}$).

Table 1 displays the meta-paths considered in HINTI. For example, the relationship “the attackers (A) exploit the same vulnerability (V)" can be described by a length-2 meta-path $\text{attacker} \xrightarrow{\text{exploit}} \text{vulnerability} \xrightarrow{\text{exploit}^{-1}} \text{attacker}$, denoted as $AVA^T$ ($P_4$), which means that the two attackers exploit the same vulnerability. Similarly, $AVDPD^TV^TA^T$ ($P_{17}$) portrays a close relationship between IOCs that “two attackers who leverage the same vulnerability invade the same type of device and ultimately destroy the same type of platform".

Table 1: Meta-paths used in HINTI.

ID    Meta-path
P1    Attacker-Attacker
P2    Device-Device
P3    Vulnerability-Vulnerability
P4    Attacker-Vulnerability-Attacker
P5    Attacker-Device-Attacker
P6    Device-File-Device
P7    Device-Platform-Device
P8    Vulnerability-File-Vulnerability
P9    Vulnerability-Type-Vulnerability
P10   Vulnerability-Device-Vulnerability
P11   Vulnerability-Platform-Vulnerability
P12   Attacker-Device-Platform-Device-Attacker
P13   Attacker-Vul-Device-Vul-Attacker
P14   Attacker-Vul-Platform-Vul-Attacker
P15   Attacker-Vul-Type-Vul-Attacker
P16   Vul-Device-Platform-Device-Vul
P17   Attacker-Vul-Device-Platform-Device-Vul-Attacker

3 Architecture Overview of HINTI

HINTI, as a cyber threat intelligence extraction and computing framework, is capable of effectively extracting IOCs from threat-related descriptions and formalizing the relationships among heterogeneous IOCs to demystify new threat insights. As shown in Figure 5, HINTI consists of four major components, including:

• Data Collection and IOC Recognition. We first develop a data collection system to automatically capture security-related data from blogs, hacker forum posts, security news, and security bulletins. The system utilizes a breadth-first search to collect the HTML source code, and then leverages Xpath (XML Path language) to extract threat-related descriptions. After that, we utilize a multi-granular attention based IOC recognition method to extract IOC from the collected threat-related descriptions (see Section 4.1 for details).

• Relation Extraction and IOC modeling. HINTI addresses the challenge of CTI modeling by leveraging heterogeneous information networks, which can naturally depict the interdependent relationships between heterogeneous IOCs. As an example, Figure 4 shows a model that captures the interactive relationships among attacker, vulnerability, malicious file, attack type, platform, and device (see Section 4.2 for details).

• Meta-path Design and Similarity Measure. Meta-path is an effective tool to express the semantic relations among IOCs in constructed HIN. For instance, $\text{attacker} \xrightarrow{\text{exploit}} \text{vulnerability} \xrightarrow{\text{exploit}^{-1}} \text{attacker}$ indicates that two attackers are related by exploiting the same vulnerability. We design 17 types of meta-paths (See Table 1) to describe the interdependent relationships between IOCs. With these meta-paths, we present a weight-learning based node similarity computing approach to quantify and embed the relationships as the premise for threat intelligence computing.

• Threat Computing and Knowledge Mining. In this component, an effective threat intelligence computing framework is proposed, which can quantify and measure the relevance among IOCs by leveraging graph convolutional network (GCN). Our proposed threat intelligence computing framework could reveal richer security knowledge within a more comprehensive threat landscape.

Figure 5: The overall architecture of HINTI. HINTI consists of four major components: (a) collecting security-related data and extracting threat objects (i.e., IOCs); (b) modeling interdependent relationships among IOCs into a heterogeneous information network; (c) embedding nodes into a low-dimensional vector space using weight-learning based similarity measure; and (d) computing threat intelligence based on graph convolutional networks and knowledge mining.

4 Methodology

4.1 Multi-granular Attention Based IOC Extraction

Extracting IOCs from multi-source threat texts is one of the major tasks of threat intelligence analytics, and the quality of the extracted IOCs significantly influences the analysis results of cyber threats. Recently, Bidirectional Long Short-Term Memory+Conditional Random Fields (BiLSTM+CRF) model [15] has demonstrated excellent performance in text chunking and Named-entity Recognition (NER). However, directly applying this model to IOC extraction is unlikely to succeed, since threat texts usually contain a large number of threat objects with different grams and irregular structures. Consequently, we need an efficient method to learn the discriminative characteristics of IOCs with different sizes. In this paper, we propose a multi-granular attention based IOC extraction method, which can extract threat objects with different granularity. Particularly, Figure 6 presents the proposed IOC extraction framework, which leverages the multi-granular attention mechanism to characterize IOCs. Different from the traditional BiLSTM+CRF model, we introduce new word-embedding features with different granularities to capture the characteristics of IOCs with different sizes. Furthermore, we utilize a self-attention mechanism to learn the importance of the features to improve the accuracy of IOC extraction.

Our proposed method takes a threat description sentence $X = (x_1, x_2, \cdots, x_i)$ as input, where $x_i$ represents i-th word

in X. We first chunk the sentence into n-gram components including char-level, 1-gram, 2-gram, and 3-gram, which are the inputs of our trained model, written as follows:

$$e^{j}_{x_i} = V^{j}_{\text{embedding}}(x_i), \tag{1}$$

where $V^{j}_{\text{embedding}}$ transforms the chunk with granularity $j$ into a vector space and $x_i$ is the i-th word in a sentence X. Thus, the threat description sentence $X_i$ can be vectorized as follows:

$$\begin{aligned}
\overrightarrow{h}^{j}_{i} &= \mathrm{LSTM}_{\text{forward}}([e^{j}_{x_0}, e^{j}_{x_1}, \cdots, e^{j}_{x_i}]) \\
\overleftarrow{h}^{j}_{i} &= \mathrm{LSTM}_{\text{backward}}([e^{j}_{x_0}, e^{j}_{x_1}, \cdots, e^{j}_{x_i}])
\end{aligned} \tag{2}$$

where $\overrightarrow{h}^{j}_{i}$ and $\overleftarrow{h}^{j}_{i}$ are the embedded features learned by forward LSTM and backward LSTM, respectively. Let O be the output of Bi-LSTM, which is a weighted sum of embedded features with weights corresponding to the importance of different features:

$$O = H \cdot W^{T} \tag{3}$$

where $H = \sum_{j} \vec{\beta}^{j}_{i}\, \sigma(h^{j}_{1}, h^{j}_{2}, \cdots, h^{j}_{i})$, $h^{j}_{i} = (\overrightarrow{h}^{j}_{i} + \overleftarrow{h}^{j}_{i})$, $\vec{\beta}^{j}_{i}$ is the weight vector to represent the importance of $h^{j}_{i}$, in which $j$ and $i$ are the segmentation granularity of sentences and the corresponding index of the chunk. W is the parameter matrix.

Given a security-related sentence $X = (x_1, x_2, \cdots, x_i)$, its corresponding threat object sequence $Y = (\hat{y}_1, \hat{y}_2, \cdots, \hat{y}_i)$, and its output of Bi-LSTM O, we can compute the overall label score of X and Y as follows:

$$S(X,Y) = \sum_{i=0}^{n} \left(A_{\hat{y}_i,\hat{y}_{i+1}} + O_{i,\hat{y}_i}\right) \tag{4}$$

where $A_{\hat{y}_i,\hat{y}_{i+1}}$ is the state transition matrix in CRF model, and $O_{i,\hat{y}_i}$, as the output of Bi-LSTM hidden layer (calculated by Eq. (3)), represents the label score of i-th word corresponding to the type $\hat{y}_i$. Next, we utilize softmax function to normalize the overall label score:

$$p(Y|X) = \frac{e^{S(X,Y)}}{\sum_{\tilde{y} \in Y_X} e^{S(X,\tilde{y})}} \tag{5}$$

We design an objective function to maximize the probability $p(Y|X)$ to achieve the highest label score for different IOCs, which can be written as follows:

$$\operatorname{argmax} \log(p(Y|X)) = \operatorname{argmax} \Big(S(X,Y) - \log\Big(\sum_{\tilde{y} \in Y_X} e^{S(X,\tilde{y})}\Big)\Big) \tag{6}$$

By solving the objective function above, we assign correct labels to the n-gram components, according to which we can identify the IOCs with different lengths. Our multi-granular attention based IOC extraction method is capable of identifying different types of IOCs, and its evaluation is presented in Section 5.

Figure 6: The framework of multi-granular IOC extraction.

4.2 Cyber Threat Intelligence Modeling

CTI modeling is an important step to explore the intricate relationship between heterogeneous IOCs. In our work, HIN is introduced to group different types of IOCs into a graph to explore their interactive relationships. In this section, we portray the main principle of threat intelligence modeling.

To model the intricate interdependent relationships among IOCs, we define the following 9 relationships among 6 types of IOCs.

• R1: To depict the relation of an attacker and the exploited vulnerability, we construct the attacker-exploit-vulnerability matrix A. For each element $A_{i,j} \in \{0, 1\}$, $A_{i,j} = 1$ indicates attacker i exploits vulnerability j.

• R2: To depict the relation of an attacker and a device, we build the attacker-invade-device matrix D. For each element $D_{i,j} \in \{0, 1\}$, $D_{i,j} = 1$ indicates attacker i invades device j.

• R3: Two attackers can cooperate to attack a target. To study the relationship of attacker-attacker, we construct the attacker-cooperate-attacker matrix C. For each element $C_{i,j} \in \{0, 1\}$, $C_{i,j} = 1$ indicates there exists a cooperative relationship between attacker i and j.

• R4: To describe the relation of a vulnerability and the affected device, we build the vulnerability-affect-device matrix M. For each element $M_{i,j} \in \{0, 1\}$, $M_{i,j} = 1$ indicates vulnerability i affects device j.

• R5: A vulnerability is often labeled as a specific attack type by Common Vulnerabilities and Exposures (CVE)

system7 . To explore the relation of vulnerability-attack framework based on heterogeneous graph convolutional net-
type, we build the vulnerability-belong-attack type ma- works, which quantifies and measures the relevance between
trix G, where each element Gi, j ∈ {0, 1} denotes if vul- IOCs by analyzing meta-path based semantic similarity. Here,
nerability i belongs to an attack type j. we first provide a formal definition of threat intelligence com-
puting based on heterogeneous graph convolutional networks.
• R6: A vulnerability often involves one or more malicious
files. To describe the relation of vulnerability-file, we Definition 4 Threat Intelligence Computing Based on
build the vulnerability-include-file matrix F. For each Heterogeneous Graph Convolutional Networks. Given the
element Fi, j ∈ {0, 1}, Fi, j =1 denotes that vulnerability i threat intelligence graph G = (V, E), the meta-path set M =
includes malicious file j. {P1 , P2 , · · · , Pi }. The threat intelligence computing: i) com-
putes the similarity between IOCs based on meta-path Pi to
• R7: A malicious file often targets a specific device. We
generate corresponding adjacency matrix Ai ; ii) constructs
establish the file-target-device matrix T to explore the
the feature matrix of nodes Xi by embedding attribute in-
relation of file-device. For each element Ti, j ∈ {0, 1},
formation of IOCs into a latent vector space; iii) conducts
Ti, j =1 indicates malicious file i targets device j.
graph convolution GCN(Ai , Xi ) to quantify the interdependent
• R8: Oftentimes, a vulnerability evolves from another. relationships between IOCs by following meta-path Pi , and
To study the relationship of vulnerability-vulnerability, embeds them into a low-dimensional space.
we build the vulnerability-evolve-vulnerability matrix
The threat intelligence computing aims to model the seman-
E, where each element Ei, j ∈ {0, 1} indicates if vulnera-
tic relationships between IOCs and measure their similarity
bility i evolves from vulnerability j.
based on meta-paths, which can be used for advanced secu-
• R9: To depict the relation device-platform that a de- rity knowledge discovery, such as threat object classification,
vice belongs to a platform, we build the device-belong- threat type matching, threat evolution analysis, etc. Intuitively,
platform matrix P where each element Pi, j ∈ {0, 1} il- the objects connected by the most significant meta-paths tend
lustrates if device i belongs to platform j. to bear more similarity [37]. In this paper, we propose a
weight-learning based threat intelligence similarity measure,
Based on the above 9 types of relationships, HINTI leverages a syntactic dependency parser [6] (e.g., subject-predicate-object, attributive clause, etc.) to automatically extract the 9 relationships among IOCs from threat descriptions, each of which is represented as a triple ($IOC_i$, relation, $IOC_j$). For instance, given a security-related description: “On May 12, 2017, WannaCry exploited the MS17-010 vulnerability to affect a large number of Windows devices, which is a ransomware attack via encrypted disks”. Using the syntactic dependency parser, we can extract the following triples: (WannaCry, exploit, MS17-010), (MS17-010, affect, Windows device), (WannaCry, is, ransomware). Such triples extracted from various data sources can be incrementally assembled into HINTI to model the relationships among IOCs, which could offer a more comprehensive threat landscape that describes the threat context. Particularly, we further define 17 types of meta-paths, shown in Table 1, to probe into the interdependent relationships over attackers, vulnerabilities, malicious files, attack types, devices, and platforms. HINTI is able to convey a richer context of threat events by scrutinizing the 17 types of meta-paths and to reveal the in-depth threat insights behind the heterogeneous IOCs (see Section 6 for details).

7 http://cve.mitre.org/

4.3 Threat Intelligence Computing

In this section, we illustrate the concept of threat intelligence computing, and design a general threat intelligence computing

which uses self-attention to improve the performance of similarity measurement between any two IOCs. This method can be formalized as below:

Definition 5 Weight-learning based Node Similarity Measure. Given a set of symmetric meta-paths $P = [P_m]_{m=1}^{M'}$, the similarity $S(h_i, h_j)$ between any two IOCs $h_i$ and $h_j$ is defined as:

$$S(h_i, h_j) = \sum_{m=1}^{M'} w_m \cdot \frac{2 \cdot |\{h_{i \to j} \in P_m\}|}{|\{h_{i \to i} \in P_m\}| + |\{h_{j \to j} \in P_m\}|} \tag{7}$$

where $h_{i \to j} \in P_m$ is a path instance between IOC $h_i$ and $h_j$ following meta-path $P_m$, $h_{i \to i} \in P_m$ is that between IOC instance $h_i$ and $h_i$, and $h_{j \to j} \in P_m$ is that between IOC instance $h_j$ and $h_j$, where $|\{h_{i \to j} \in P_m\}| = C_{P_m}(i, j)$, $|\{h_{i \to i} \in P_m\}| = C_{P_m}(i, i)$, $|\{h_{j \to j} \in P_m\}| = C_{P_m}(j, j)$, and $C_{P_m}$ is the commuting matrix based on meta-path $P_m$ defined below. $\vec{w} = [w_1, \ldots, w_m, \ldots, w_{M'}]$ denotes the meta-path weights, where $w_m$ is the weight of meta-path $P_m$, and $M'$ is the number of meta-paths.

$S(h_i, h_j)$ is defined in two parts: (1) the semantic overlap in the numerator, which describes the number of meta-paths between IOC instances $h_i$ and $h_j$; and (2) the semantic broadness in the denominator, which depicts the total number of meta-paths between each instance and itself. The larger the number of meta-paths between IOC instances $h_i$ and $h_j$, the more similar the two IOCs are, which is normalized by the semantic broadness
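As a rough illustration of Eq. (7), the sketch below evaluates the weighted similarity from precomputed commuting matrices. It assumes that each $C_{P_m}$ is already available as a nested list, that the weights are given rather than learned, and that a meta-path with an empty denominator contributes nothing; the function name `weighted_similarity` and the toy matrices are hypothetical and are not taken from the paper.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def weighted_similarity(commuting: Sequence[Matrix],
                        weights: Sequence[float],
                        i: int, j: int) -> float:
    """Eq. (7): sum over meta-paths of w_m * 2*C_m[i][j] / (C_m[i][i] + C_m[j][j])."""
    if len(commuting) != len(weights):
        raise ValueError("one weight per meta-path is required")
    total = 0.0
    for c_m, w_m in zip(commuting, weights):
        broadness = c_m[i][i] + c_m[j][j]  # denominator: semantic broadness
        if broadness == 0:
            # Assumption: no self-paths under this meta-path means no contribution.
            continue
        overlap = c_m[i][j]  # numerator: semantic overlap
        total += w_m * (2.0 * overlap) / broadness
    return total


# Toy commuting matrices for two hypothetical symmetric meta-paths over two IOCs.
C1 = [[2, 1], [1, 2]]
C2 = [[4, 4], [4, 4]]
weights = [0.25, 0.75]  # fixed here; the paper learns them during training

score_01 = weighted_similarity([C1, C2], weights, 0, 1)
score_00 = weighted_similarity([C1, C2], weights, 0, 0)
print(score_01, score_00)
```

With weights that sum to 1, an IOC compared with itself scores 1, which matches the role of the denominator as a normalizer.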

USENIX Association 23rd International Symposium on Research in Attacks, Intrusions and Defenses 247
in the denominator. Moreover, different from existing similarity measures [37], we propose an attention mechanism based similarity measure by introducing the weight vector $\vec{w} = [w_1, \ldots, w_m, \ldots, w_{M'}]$, which is a trainable coefficient vector that learns the importance of different meta-paths for characterizing IOCs.

Obviously, it is computationally expensive to measure the similarity among IOCs in the constructed heterogeneous graph, as it usually requires random walks over a large number of nodes in the graph. Fortunately, in our work, it is unnecessary to walk through the entire graph, as we prescribe a limit by introducing predefined meta-paths, and we only focus on the symmetric meta-paths presented in Table 1. To calculate the similarity between IOCs under different meta-path instances, we need to compute the corresponding commuting matrices [37] following the meta-paths.

Given a meta-path set $P = \sum_{m}^{M'} \{A_1, A_2, \cdots, A_{l+1}\}$, the meta-path based commuting matrix can be defined as $C_P = U_{A_1 A_2} \circ U_{A_2 A_3} \cdots \circ U_{A_l A_{l+1}}$, where $C_P(i, j)$ represents the probability of object $i \in A_1$ reaching object $j \in A_{l+1}$ under the path $P$, and $\circ$ is a connection operation. These symmetric meta-paths not only efficiently reduce the complexity of walking, but also ensure that the commuting matrix can be easily decomposed, which greatly reduces the computational costs. In addition, the symmetric meta-paths in the graph $G$ allow us to leverage the pairwise random-walk [37] to further accelerate the computation.

With Eq. (7) and pairwise random-walk, we can obtain the similarity embedding between any two IOCs $h_i$ and $h_j$ under a meta-path set $P$. Based on the low-dimensional similarity embedding, we derive a weighted adjacency matrix between IOCs, denoted as $A_i \in \mathbb{R}^{N \times N}$, where $N$ is the number of IOCs of a specific type in $G$. Meanwhile, to utilize the attribute information of nodes, we train a word2vec model [24] to embed the attribute information of nodes into a feature matrix $X_i \in \mathbb{R}^{N \times d}$, where $N$ is the number of IOCs in $A_i$, and $d$ is the dimension of the node features. With the learned adjacency matrix $A_i$ and its feature matrix $X_i$, we can leverage the classical GCN [18] to characterize the relationship between IOCs $h_i$ and $h_j$. Particularly, the layer-wise propagation rule of GCN can be defined as below:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{8}$$

where $\tilde{A} = A + I_N$ is the adjacency matrix of IOCs with self-connections, $I_N$ is the identity matrix, $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, and $W^{(l)}$ is the trainable weight matrix of the $l$-th layer. $\sigma(\cdot)$ denotes an activation function, such as relu. $H^{(l)} \in \mathbb{R}^{N \times d}$ is the matrix of activations in the $l$-th layer. We perform graph convolution [18] on $A_i$ and $X_i$ to generate the embedding $Z$ between IOCs belonging to type $i$:

$$Z = f(X_i, A_i) = \sigma\left(\hat{A}_i \cdot \mathrm{relu}\left(\hat{A}_i X_i W_i^{(0)}\right) W_i^{(1)}\right) \tag{9}$$

where $W_i^{(0)} \in \mathbb{R}^{d \times H}$ is an input-to-hidden weight matrix for a hidden layer with $H$ feature maps, $W_i^{(1)} \in \mathbb{R}^{H \times F}$ is a hidden-to-output weight matrix with $F$ feature maps in the output layer, $X_i \in \mathbb{R}^{N \times d}$, $N$ is the number of IOCs of a specific type, $d$ is the dimension of their corresponding features, and $\sigma$ is another activation function, such as sigmoid. $\hat{A}_i = \tilde{D}^{-\frac{1}{2}} \tilde{A}_i \tilde{D}^{-\frac{1}{2}}$ can be calculated offline. Here, we leverage the cross-entropy loss to optimize the performance of our proposed threat intelligence framework, written as follows:

$$Loss(Y_{lf}, Z_{lf}) = -\sum_{l \in \mathcal{Y}_l} \sum_{f=1}^{F} Y_{lf} \cdot \ln Z_{lf} \tag{10}$$

where $\mathcal{Y}_l$ is the set of node indices that have labels, $Y_{lf}$ is the real label, and $Z_{lf}$ is the corresponding label that our model predicts. Based on Eq. (10), we conduct stochastic gradient descent to continuously optimize the neural network weights $W_i^{(0)}$, $W_i^{(1)}$, and $\vec{w}$ to reduce the loss, and build a general threat intelligence computing framework. Using this framework, security organizations are able to mine richer security knowledge hidden in the interdependent relationships among IOCs.

5 Experimental Evaluation

5.1 Dataset and Settings

We develop a threat data collector to automatically collect cyber threat data from a set of sources, including 73 international security blogs (e.g., FireEye, Cloudflare), hacker forum posts (e.g., Blackhat, Hack5), security bulletins (e.g., Microsoft, Cisco), CVE detail descriptions, and ExploitDB. A complete list of data sources is presented in the Baidu cloud8. We set up a daemon to collect the newly generated security events every day. So far, more than 245,786 security-related data items describing threat events have been collected. For training and evaluating our proposed IOC extraction method, 30,000 samples from 5,000 texts are annotated using the B-I-O sequence tagging method (see Section 2.2 for the example), and an annotation example is shown in Figure 2.

For the 30,000 labeled samples, we randomly select 60% of the samples as a training set, 20% of the samples as a validation set, and the rest of the samples as our test set. Based on these data sets, we comprehensively evaluate the performance of HINTI for extracting IOCs and for threat intelligence computing. We run all of the experiments on 16 cores Intel(R) Core(TM) i7-6700 CPU @3.40GHz with 64GB RAM and 4× NVIDIA Tesla K80 GPU. The software programs are executed on the TensorFlow-GPU framework on Ubuntu 16.04.

8 https://pan.baidu.com/s/1J631WMYY_T_awa8aY5xy3A

5.2 Evaluation of IOC Extraction

A set of experiments is conducted to evaluate the sensitivity of different parameters in the multi-granular based IOC

extraction model. We mainly consider 8 hyper-parameters that strongly impact the performance of the model, as shown in Table 2. More specifically, Embedding_dim is one of the most important factors that determine the generalization capability of the model. Here, we fix the other parameters while tuning the embedding size over the values (50, 100, 150, 200, 250, 300, 350, 400). Experimental results show that the accuracy of IOC extraction is highest when Embedding_dim=300. Learning_rate is another major factor: it determines the step size of gradient descent in minimizing the loss function, which affects whether the model can find a global optimal solution. We fix the other parameters and tune Learning_rate over the values (0.001, 0.005, 0.01, 0.05, 0.1, 0.5), and the performance is best when Learning_rate=0.001. Similarly, we tune the other hyper-parameters with 5,000 epochs, and the hyper-parameters that allow our model to perform optimally are recorded in Table 2.

Figure 7: Performance of IOC extraction using embedding features with different granularity.
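The one-parameter-at-a-time tuning described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `tune_one_at_a_time` and `toy_score` are hypothetical names, `toy_score` is a synthetic stand-in for validation accuracy (its peak is placed at the reported settings purely for demonstration and is not a measurement from the paper), and the sketch carries the best value found so far into later sweeps, which the text does not specify.

```python
from typing import Any, Callable, Dict, List


def tune_one_at_a_time(defaults: Dict[str, Any],
                       grid: Dict[str, List[Any]],
                       evaluate: Callable[[Dict[str, Any]], float]) -> Dict[str, Any]:
    """Sweep each hyper-parameter in turn while holding the others fixed."""
    best = dict(defaults)
    for name, candidates in grid.items():
        scored = []
        for value in candidates:
            trial = dict(best)   # keep every other parameter fixed
            trial[name] = value  # vary only the parameter under study
            scored.append((evaluate(trial), value))
        best[name] = max(scored, key=lambda pair: pair[0])[1]
    return best


def toy_score(params: Dict[str, Any]) -> float:
    """Synthetic stand-in for validation accuracy; NOT a result from the paper."""
    return (-abs(params["Embedding_dim"] - 300) / 300
            - abs(params["Learning_rate"] - 0.001))


defaults = {"Embedding_dim": 50, "Learning_rate": 0.5}
grid = {
    "Embedding_dim": [50, 100, 150, 200, 250, 300, 350, 400],
    "Learning_rate": [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
}
best = tune_one_at_a_time(defaults, grid, toy_score)
print(best)
```

In a real run, `evaluate` would train the extraction model with the trial settings and return its score on the validation set, which is far more expensive than this toy function.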
Table 2: Hyperparameter settings in the multi-granular based IOC extraction method.

Parameter          Value      Parameter      Value
Embedding_dim      300        Hidden_dim     128
Sequence_length    500        Epoch_num      5,000
Learning_rate      0.001      Batch_size     64
Dropout_rate       0.5        Optimizer      Adam

Table 3: Performance of IOC extraction w.r.t. IOC types.

IOC Type         Precision   Recall   Micro-F1
IP               99.56       99.52    99.54
File             94.36       96.88    95.60
Type             99.86       99.81    99.83
Email            99.32       99.87    99.49
Device           93.26       92.78    93.02
Vendor           93.07       94.45    94.24
Version          96.98       97.99    97.48
Domain           96.58       95.89    96.23
Software         88.25       89.31    88.78
Function         95.03       95.59    95.31
Platform         94.31       92.57    93.43
Malware          89.76       91.23    90.49
Vulnerability    99.25       98.73    98.99
Other            98.29       98.42    98.35

In this paper, we extract 13 major types of IOCs, and the performance is presented in Table 3. Overall, our IOC extraction method demonstrates excellent performance in terms of precision, recall, and Micro-F1 (i.e., micro-averaged F1-score) for most types of IOCs, such as function, malicious IP, and device. However, we observe a performance degradation when recognizing software and malware. This can be attributed to the fact that most software and malware are named by random strings such as MD5 hashes. Moreover, we find that the number of training samples impacts the performance of the model. Specifically, the performance becomes unsatisfactory (e.g., Software, Malware) when the number of training samples of a certain type is insufficient (i.e., less than 5,000).

In order to verify the effectiveness of multi-granular embedding features, we assess the performance of IOC extraction with features of different granularity, including char-level, 1-gram, 2-gram, 3-gram, and multi-granular features. The experimental results are shown in Figure 7, from which we can observe that the proposed multi-granular embedding feature outperforms the others, since it leverages the attention mechanism to simultaneously learn multi-granular features that characterize different patterns of IOCs.

Next, to verify the effectiveness of the proposed IOC extraction method, we compare it with state-of-the-art entity recognition approaches, including the general NER tools NLTK NER9 and Stanford NER10, the professional IOC extraction methods Stucco [16] and iACE [22], and the popular entity recognition approaches CRF [21], BiLSTM, and BiLSTM+CRF [15]. The experimental results of the different methods on real-world data are shown in Table 4.

9 http://www.nltk.org/book/ch07.html
10 https://stanfordnlp.github.io/CoreNLP/ner.html

The results indicate that our proposed IOC extraction outperforms the state-of-the-art entity recognition methods and tools in terms of precision, recall, and Micro-F1, and its improvement can be attributed to the following factors. First, compared with Stanford NER and NLTK NER, the NLP tools

trained with general news corpora, our method uses a security-related training corpus collected and labeled by ourselves as a data source for training our model. Second, different from the rule-based extraction approaches (e.g., iACE and Stucco), our proposed deep learning based method provides an end-to-end system with more advanced features to represent various IOCs. Third, compared to RNN-based methods (e.g., BiLSTM and BiLSTM+CRF), our proposed method brings in multi-granular embedding sizes (char-level, 1-gram, 2-gram, and 3-gram) to simultaneously learn the characteristics of IOCs of various sizes and types, which can identify more complex and irregular IOCs. Last but not least, our method implements an attention mechanism to learn the weights of features with various scales to effectively characterize different types of IOCs, further enhancing the IOC recognition accuracy.

Table 4: Performance of threat entity recognition using different methods.

Method            Accuracy   Precision   Micro-F1
NLTK NER          69.45      68.51       67.49
Stanford NER      68.35      66.74       68.58
iACE              92.14      91.26       92.25
Stucco            91.16      92.21       91.47
CRF               92.64      91.80       92.65
BiLSTM            94.78      95.21       94.35
BiLSTM+CRF        96.38      96.42       96.27
Multi-granular    98.59      98.72       98.69

6 Application of Threat Intelligence Computing

Our proposed threat intelligence computing framework based on heterogeneous graph convolutional networks can be used to mine novel security knowledge behind heterogeneous IOCs. In this section, we evaluate its effectiveness and applicability using three real-world applications: profiling and ranking for CTIs, attack preference modeling, and vulnerability similarity analysis.

6.1 Threat Profiling and Significance Ranking of IOCs

Due to the disparity in the significance of threats, it is important to derive the threat profile and rank the significance of IOCs for demystifying the landscape of threats. However, most of the existing CTIs are incapable of modeling the associated relationships between heterogeneous IOCs.

Different from isolated CTIs, HINTI leverages HIN to model the interdependent relationships among IOCs with two characteristics. First, the isolated IOCs can be integrated into a graph-based HIN to clearly display the associated relationships among IOCs, which is capable of directly depicting the basic threat profile. For example, Figure 3 depicts a threat profiling sample: an attacker utilizes the CVE-2017-0143 vulnerability to invade Vista SP2 and Win7 SP1 devices belonging to the Microsoft platform, and CVE-2017-0143 is a remote code execution vulnerability that uses a “SMB.bat” malicious file. Second, the significance of IOCs in HINTI can be naturally ranked based on the proposed threat intelligence computing framework.

Table 5 shows the top 5 authoritative ranking scores [35] of vulnerability, attacker, attack type, and platform, from which security experts can gain a clear insight into the impact of each IOC. Degree centrality [33], which describes the number of links incident upon a node, is widely used in evaluating the importance of a node in a graph. It can be used to quantify the immediate risk of a node that connects with other nodes for delivering network flows, such as virus spreading. Here, degree centrality can be utilized in verifying the effectiveness of the proposed threat intelligence computing framework in ranking the importance of IOCs. It is worth noting that both our ranking method and degree centrality work regardless of the time of attacks. We compute the degree centrality ranking of IOCs based on the fact that a node with a higher degree centrality is more important than a node with a lower one. For instance, if the degree centrality of a vulnerability is higher, it indicates that this vulnerability is exploited by more attackers or that it affects more devices. The ranking result of degree centrality shown in Table 5 is consistent with the ranking result based on the proposed threat intelligence computing framework, demonstrating the capability of the CTI computing framework in ranking the importance of different types of IOCs.

6.2 Attack Preference Modeling

Attack preference modeling is meaningful for security organizations to gain insight into the attack intention of attackers, build attack portraits, and develop personalized defense strategies. Here, we leverage HINTI to integrate different types of IOCs and their interdependent relationships to comprehensively depict the picture of attack events, which helps model the attack preferences. With the proposed threat intelligence computing framework, we model attack preferences by clustering the embedded attackers’ vectors.

In this task, each malicious IP address is treated as an intruder, and its attack preferences are mainly reflected in three features including the platforms it destroys (including Windows, Linux, Unix, ASP, Android, Apache, etc), the industries it invades (e.g., education, finance, government, Internet of Things, and Industrial control system, etc), and the exploit types it employs (e.g., DOS, Buffer overflow, Execute code,

Table 5: The significance ranking of different types of IOCs. (CVE1: CVE-2017-0146, CVE2: CVE-2006-5911, CVE3: CVE-2008-6543, CVE4: CVE-2012-1199, CVE5: CVE-2006-4985; AR: Authoritative Ranking, DC: Degree Centrality value.)

Vulnerability             Attacker                     Platform                     Attack Type
No.     AR       DC       Moniker      AR       DC     Category   AR       DC       Exploit_type   AR       DC
CVE1    0.2713   7,643    Meatsploit   0.2764   549    PHP        0.4562   17,865   Webapps        0.5494   11,648
CVE2    0.2431   7,124    GSR team     0.1391   327    Windows    0.2242   13,793   DOS            0.1772   8,741
CVE3    0.2132   6,833    Ihsan        0.0698   279    Linux      0.0736   8,792    Overflow       0.1533   7,652
CVE4    0.1826   6,145    Techsa       0.0695   247    Linux86    0.0623   8,147    CSRF           0.0966   5,433
CVE5    0.1739   5,637    Aurimma      0.0622   204    ASP        0.0382   5,027    SQL            0.0251   2,171
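The agreement between the AR and DC columns of Table 5 can be checked mechanically. The sketch below is a minimal illustration: it counts degree centrality as the number of links incident upon a node, then compares the two orderings for the vulnerability columns of Table 5. The helper names `degree_centrality` and `ranking` and the small toy edge list are hypothetical; only the AR and DC values come from the table.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count the links incident upon each node of an undirected edge list."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def ranking(scores: Dict[str, float]) -> List[str]:
    """Return node names ordered from the highest score to the lowest."""
    return sorted(scores, key=scores.get, reverse=True)


# Vulnerability columns of Table 5: authoritative ranking (AR) and degree centrality (DC).
ar = {"CVE1": 0.2713, "CVE2": 0.2431, "CVE3": 0.2132, "CVE4": 0.1826, "CVE5": 0.1739}
dc = {"CVE1": 7643, "CVE2": 7124, "CVE3": 6833, "CVE4": 6145, "CVE5": 5637}
consistent = ranking(ar) == ranking(dc)

# Hypothetical toy graph (not from the paper) showing how DC itself is counted.
toy_edges = [("attacker_a", "vuln_x"), ("attacker_b", "vuln_x"),
             ("vuln_x", "device_1"), ("attacker_a", "vuln_y")]
toy_dc = degree_centrality(toy_edges)
print(consistent, ranking(toy_dc)[0])
```

Comparing orderings rather than raw values is deliberate here, since AR scores and DC counts live on different scales.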

Figure 8: The performance of attack preference modeling with different meta-paths, in which the preference of attacker $i$ is reduced to a two-dimensional space $(x_i, y_i)$ and each cluster represents a group with a specific attack preference. Panels: (a) $AVDV^TA^T$ ($P_{13}$); (b) $AVDPD^TV^TA^T$ ($P_{17}$); (c) $AVPV^TA^T$ ($P_{14}$).
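The clustering step behind Figure 8 (Section 6.2 applies DBSCAN to the embedded attacker vectors) can be sketched in a few lines of plain Python. This is an illustrative re-implementation of the standard DBSCAN procedure, not the authors' code: the points, `eps`, and `min_pts` below are hypothetical values chosen for the toy example, and the paper does not report the parameters it used.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        labels[i] = cluster
        queue = [n for n in neighbors if n != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return [label if label is not None else NOISE for label in labels]


# Hypothetical 2-D embeddings: two tight groups of attackers plus one outlier.
embedded = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (50, 50)]
labels = dbscan(embedded, eps=1.5, min_pts=3)
print(labels)
```

DBSCAN does not need the number of clusters in advance and marks isolated points as noise, which suits a setting where the number of preference groups is unknown.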

SQL injection, XSS, Gain information, etc).

Specifically, we first utilize our proposed threat intelligence computing framework to embed each attacker into a low-dimensional vector space, and then perform the DBSCAN algorithm on the embedded vectors to cluster attackers with the same preferences into corresponding groups. Figure 8 shows the top 3 clustering results under different types of meta-paths, in which the meta-path $AVDPD^TV^TA^T$ ($P_{17}$) performs best, with compact and well-separated clusters, indicating that it contains richer semantic relationships for characterizing attack preferences than other meta-paths.

Table 6: Performance of modeling attack preference with different meta-paths.

Metapath   Accuracy   Precision   Micro-F1
P1         74.31      76.22       75.25
P4         71.16      73.27       72.16
P5         69.15      71.43       70.27
P12        72.14      76.46       74.24
P13        79.65      81.31       80.47
P14        77.48      79.34       78.40
P15        80.17      79.76       79.96
P17        81.39      81.72       81.55

To verify the effectiveness of attack preference modeling, we identify 5,297 distinct attackers (each unique IP address is treated as an attacker) who have submitted at least 10 cyber attacks. For these attackers, five cybersecurity researchers, consisting of three doctoral and two master students, spent about a fortnight manually annotating their attack preferences from three perspectives: the platforms they destroyed, the industries they attacked, and the attack types they exploited. To ensure the accuracy of data labeling, we test the consistency of the tags for the 5,297 attackers and remove the samples with ambiguous tags. As a result, we obtain 3,000 samples with consistent tags. Based on the labeled samples, we further evaluate the performance of different meta-paths on modeling attack preferences. In the attack modeling scenario, we only focus on the meta-paths in which both the start node and the end node are attackers. The experimental results are shown in Table 6. Obviously, different meta-paths display different abilities in characterizing the attack preferences of cyber intruders. The performance when using $P_{17}$ is

superior to that with other meta-paths, which indicates that $P_{17}$ holds more valuable information that characterizes the attack preferences of cybercriminals, since $P_{17}$ includes the semantic information of $P_1$, $P_4$, $P_5$ and $P_{12} \sim P_{15}$.

In addition, we compare the capabilities of our proposed computing framework with those of other state-of-the-art embedding methods in terms of attack preference modeling. Our analysis result shows that the accuracy of attack preference modeling reaches 0.81, which outperforms the existing popular models Node2vec (with precision of 0.71) [1], metapath2vec (with precision of 0.73) [11] and HAN (with precision of 0.76) [42]. The performance improvement can be attributed to the following characteristics. First, our computing framework utilizes weight-learning to learn the significance of different meta-paths for evaluating the similarity between attackers. Second, the proposed computing framework leverages GCN to learn the structural information between attackers to obtain more discriminative structural features that improve the performance of attack preference modeling.

6.3 Vulnerability Similarity Analysis

Vulnerability classification or clustering is crucial for conducting vulnerability trend analysis, the correlation analysis of incidents and exploits, and the evaluation of countermeasures. Traditional vulnerability analysis relies on the manual investigation of source code, which requires expertise and consumes considerable effort. In this section, we propose an unsupervised vulnerability similarity analysis method based on the proposed threat intelligence computing framework, which can automatically group similar vulnerabilities into corresponding communities. Particularly, the vulnerability-related IOCs are first embedded into a low-dimensional vector space using the CTI computing framework. Then, the DBSCAN algorithm is performed on the embedded vector space to cluster vulnerabilities into corresponding communities. The clustering results are presented in Figure 9. Figure 9 (c) shows that all vulnerabilities are clustered into 12 clusters using meta-path $VDPD^TV^T$ ($P_{16}$), which is very close to the classification standard (i.e., 13) recommended by CVE Details, an authoritative database that publishes vulnerability information. By manually analyzing the training samples, we find that the HTTP Response Splitting vulnerability does not appear in our dataset. Therefore, our cluster number (i.e., 12) is consistent with CVE Details11. To further validate the effectiveness of the threat intelligence computing framework for vulnerability clustering, we randomly select 100 vulnerabilities from each cluster for manual inspection to measure the consistency of the vulnerability types in each cluster, and the results are presented in Table 7. Obviously, the clustering performance of cluster 8 (i.e., File Inclusion) and cluster 10 (i.e., Directory Traversal) is remarkably worse than that of other clusters. To explain the reason, we examine our training data and the computing framework. We found that the proportion of these two types of vulnerabilities is too small (cluster 8 is 3.4% and cluster 10 is 4.2%), making our computing framework very likely to underfit with insufficient data. However, the proposed computing framework performs well on most types of vulnerabilities in an unsupervised manner, especially given sufficient samples (e.g., cluster 7 is 17.6% and cluster 5 is 15.7%).

11 https://www.cvedetails.com/

Table 7: Accuracy of vulnerability clustering.

Cluster ID   Vulnerability type    Accuracy
cluster1     Denial of Service     80.12
cluster2     XSS                   83.53
cluster3     Execute Code          81.50
cluster4     Overflow              76.50
cluster5     Gain Privilege        91.56
cluster6     Bypass Something      71.74
cluster7     CSRF                  93.27
cluster8     File Inclusion        61.72
cluster9     Gain Information      70.42
cluster10    Directory Traversal   69.49
cluster11    Memory Corruption     81.56
cluster12    SQL Injection         80.67
average      #                     78.51

Figure 9: Illustration of the vulnerability similarity analysis based on different meta-paths, in which vulnerability $i$ can be reduced into a two-dimensional space $(x_i, y_i)$ and each cluster indicates a particular type of vulnerability. Panels: (a) $AVDV^TA^T$ ($P_{13}$); (b) $VDV^T$ ($P_{10}$); (c) $VDPD^TV^T$ ($P_{16}$).

In addition, by examining the clustering results, we observe that the vulnerabilities in the same cluster are likely to have evolutionary relationships. For instance, CVE-2018-0802, an Office zero-day vulnerability, evolved from CVE-2017-11882. They both involve the EQNEDT32.exe file used to edit formulas in Office software, which allows remote attackers to execute arbitrary code by constructing a malformed font name. The modeling and computation of interdependent relationships among IOCs in HINTI facilitate the discovery of such intricate connections between vulnerabilities.

In summary, HINTI is capable of depicting a more comprehensive threat landscape, and the proposed CTI computing framework has the ability to bring novel security insights toward different real-world security applications. However, there are still numerous opportunities for enhancing these security applications. Specifically, for attack preference modeling, although each individual IP address is treated as an attacker, we cannot determine whether it belongs to a real attacker or is disguised by a proxy. Fortunately, even if the real attack address cannot be captured, understanding the attack preferences of these IP proxies, which are widely used in cybercrime, is also meaningful for gaining insight into the cyber threats. For vulnerability similarity analysis, the data imbalance issue affects the performance of the model, and inadequate training samples often result in model underfitting, as shown in the case of cluster 8 and cluster 10.

7 Related Work

Cyber Threat Intelligence. An increasing number of security vendors and researchers have started exploring CTI for protecting system security and defending against new threat vectors [28]. Existing CTI extraction tools such as IBM X-Force12, ThreatCrowd13, Opencti.io14, AlienVault15, CleanMX16 and PhishTank17 use regular expressions to synthesize IOCs from the descriptive texts. However, these methods often produce a high false positive rate by misjudging legitimate entities as IOCs [22].

Recently, Balzarotti et al. [2] develop a system to extract IOCs from web pages and identify malicious URLs from JavaScript code. Sabottke et al. [31] propose to detect potential vulnerability exploits by extracting and analyzing the tweets that contain the “CVE” keyword. Liao et al. [22] present a tool, iACE, for automatically extracting IOCs, which excels at processing technology articles. Nevertheless, iACE identifies IOCs from a single article, which does not consider the rich semantic information from multi-source texts. Zhao et al. [46] define different ontologies to describe the relationship between entities based on expert knowledge. Numerous popular CTI platforms including IODEF [9], STIX [3], TAXII [40], OpenIOC [13], and CyBox [19] focus on extracting and sharing CTI. Yet, none of the existing approaches could uncover the interdependent relations among CTIs extracted from multi-source texts, let alone quantify CTIs’ relevance and mine valuable threat intelligence hidden behind the isolated CTIs.

12 https://exchange.xforce.ibmcloud.com/
13 https://www.threatcrowd.org/
14 https://demo.opencti.io/
15 https://otx.alienvault.com/
16 http://list.clean-mx.com
17 https://www.phishtank.com

Heterogeneous Information Network. Real-world systems often contain a large number of interacting, multi-typed objects, which can naturally be expressed as a heterogeneous information network (HIN). HIN, as a conceptual graph representation, can effectively fuse information and exploit richer semantics in interacting objects and links [37]. HIN has been applied to network traffic analysis [38], public social media data analysis [45], and large-scale document analysis [41]. Recent applications of HIN include mobile malware detection [14] and opioid user identification [12]. In this paper, for the first time, we use HIN for CTI modeling.

Graph Convolutional Network. Graph convolutional networks (GCN) [17] have become an effective tool for addressing machine learning tasks on graphs, such as semi-supervised node classification [17], event classification [29], clustering [8], link prediction [27], and recommender systems [44]. Given a graph, GCN can directly conduct the convolutional operation on the graph to learn the nonlinear embedding of nodes. In our work, to discern and reveal the interactive relationships between IOCs, we utilize GCN to learn a more discriminative representation from attributes and graph structure simultaneously, which is the premise for threat intelligence computing.

8 Discussion

Data Availability. The proposed framework assumes that sufficient threat description data can be obtained for generating comprehensive and up-to-date CTIs. Fortunately, with the growing prosperity of social media, an increasing amount of security-related data (e.g., blogs, posts, news and open security databases) can be collected effortlessly. To automatically collect security-related data, we develop a crawler system to collect threat description data from 73 international security sources (e.g., blogs, hacker forum posts, security bulletins, etc), providing sufficient raw materials for generating cyber threat intelligence.

Model Extensibility. In this paper, 6 types of IOCs and 9

types of relationships are modeled in HINTI. However, our proposed framework is extensible, in which more types of IOCs and relationships can be enrolled to represent richer and more comprehensive threat information, such as malicious domains, phishing emails, attack tools, their interactions, etc.

High-level Semantic Relations. In view of the computational complexity of the model, our threat intelligence computing method focuses on utilizing the meta-paths to quantify the similarity between IOCs while ignoring the influence of the meta-graph on it, which inevitably misses some high-level semantic information. Nevertheless, the proposed computing framework introduces an attention mechanism to learn the significance of different meta-paths to characterize IOCs and their interactive relationships, which effectively compensates for the performance degradation caused by ignoring the meta-graphs.

Security Knowledge Reasoning. Although our proposed framework exhibits promising results in CTI extraction and modeling computing, how to implement advanced security knowledge reasoning and prediction is still an open problem, e.g., it remains challenging to predict whether a vulnerability could potentially affect a particular type of device in the future. We will investigate this problem in the future.

9 Conclusion

This work explores a new direction of threat intelligence computing, which aims to uncover new knowledge in the relationships among different threat vectors. We propose HINTI, a cyber threat intelligence framework, to model and quantify the interdependent relationships among different types of IOCs by leveraging heterogeneous graph convolutional networks. We develop a multi-granular attention mechanism to learn the importance of different features, and model the interdependent relationships among IOCs using HIN. We propose the concept of threat intelligence computing and present a general intelligence computing framework based on graph convolutional networks. Experimental results demonstrate that the proposed multi-granular attention based IOC extraction method outperforms the existing state-of-the-art methods. The proposed threat intelligence computing framework can effectively mine security knowledge hidden in the interdependent relationships among IOCs, which enables crucial threat intelligence applications such as threat profiling and ranking, attack preference modeling, and vulnerability similarity analysis. We would like to emphasize that knowledge discovery among interdependent CTIs is a new field that calls for a collaborative effort from security experts and data scientists.

In the future, we plan to develop a predictive and reasoning model based on HINTI and explore preventative countermeasures to protect cyber infrastructure from future threats. We also plan to add more types of IOCs and relations to depict a more comprehensive threat landscape. Moreover, we will leverage both meta-paths and meta-graphs to characterize the IOCs and their interactions to further improve the embedding performance, and to strike a balance between the accuracy and computational complexity of the model. We will also investigate the feasibility of security knowledge prediction based on HINTI to infer the potential future relationships between the vulnerabilities and devices.

Acknowledgement

We would like to thank our shepherd Tobias Fiebig, and the anonymous reviewers for providing valuable feedback on our work. We also thank Hao Peng and Lichao Sun for their feedback on the early version of this work. This work was supported in part by National Science Foundation grants CNS-1950171, CNS-1949753. It was also supported by the NSFC for Innovative Research Group Science Fund Project (62141003), National Key R&D Program of China (2018YFB0803503), the 2018 joint Research Foundation of Ministry of Education, China Mobile (MCM20180507) and the Opening Project of Shanghai Trusted Industrial Control Platform (TICPSH202003020-ZC). Any opinions, findings, and conclusions or recommendations expressed in this material do not necessarily reflect the views of any funding agencies.

References

[1] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[2] Marco Balduzzi, Marco Balduzzi, and Davide Balzarotti. Automatic extraction of indicators of compromise for web applications. In WWW, 2016.

[3] Sean Barnum. Standardizing cyber threat intelligence information with the structured threat information expression (STIX). Mitre Corporation, 11:1–22, 2012.

[4] Eric W Burger, Michael D Goodman, Panos Kampanakis, and Kevin A Zhu. Taxonomy model for cyber threat intelligence information exchange technologies. In Proceedings of the 2014 ACM Workshop on Information Sharing & Collaborative Security, pages 51–60, 2014.

[5] Onur Catakoglu, Marco Balduzzi, and Davide Balzarotti. Automatic extraction of indicators of compromise for web applications. The web conference, pages 333–343, 2016.

[6] Danqi Chen and Christopher D Manning. A fast and accurate dependency parser using neural networks. pages 740–750, 2014.

254 23rd International Symposium on Research in Attacks, Intrusions and Defenses USENIX Association
[7] Qian Chen and Robert A. Bridges. Automated behav- [19] Tero Kokkonen. Architecture for the cyber security situ-
ioral analysis of malware a case study of wannacry ran- ational awareness system. In Internet of Things, Smart
somware. In 16th IEEE ICMLA, pages 454–460, 2017. Spaces, and Next Generation Networks and Systems,
pages 294–302. Springer, 2016.
[8] Weilin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy
Bengio, and Chojui Hsieh. Cluster-gcn: An efficient al- [20] Mehmet Necip Kurt, Yasin Yılmaz, and Xiaodong Wang.
gorithm for training deep and large graph convolutional Distributed quickest detection of cyber-attacks in smart
networks. knowledge discovery and data mining, pages grid. IEEE Transactions on Information Forensics and
257–266, 2019. Security, 13(8):2015–2030, 2018.
[9] Roman Danyliw, Jan Meijer, and Yuri Demchenko. The [21] John Lafferty, Andrew Mccallum, and Fernando Pereira.
incident object description exchange format. Interna- Conditional random fields: Probabilistic models for seg-
tional Journal of High Performance Computing Appli- menting and labeling sequence data. ICML, pages 282–
cations, 5070:1–92, 2007. 289, 2001.
[10] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. [22] Xiaojing Liao, Yuan Kan, Xiao Feng Wang, Li Zhou,
metapath2vec: Scalable representation learning for het- and Raheem Beyah. Acing the ioc game: Toward auto-
erogeneous networks. In 23rd ACM SIGKDD, pages matic discovery and analysis of open-source cyber threat
135–144, 2017. intelligence. In ACM Sigsac Conference on Computer
[11] Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. Communications Security, 2016.
metapath2vec: Scalable representation learning for het-
[23] Rob Mcmillan. Open threat intelligence.
erogeneous networks. In 23rd ACM SIGKDD, pages
http://www.gartner.com/doc/2487216/
135–144. ACM, 2017.
definition-threat-intelligence. Accessed
[12] Yujie Fan, Yiming Zhang, Yanfang Ye, and Xin Li. Au- January 20, 2020.
tomatic opioid user detection from twitter: Transductive
[24] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey
ensemble built on different meta-graph based similari-
Dean. Efficient estimation of word representations in
ties over heterogeneous information network. In IJCAI,
vector space. arXiv preprint arXiv:1301.3781, 2013.
pages 3357–3363, 2018.
[13] Fireeye. Openioc. https://www.fireeye.com/blog/ [25] Sudip Mittal, Prajit Kumar Das, Varish Mulwad, Anu-
threat-research/2013/10/openioc-basics. pam Joshi, and Tim Finin. Cybertwitter: Using twitter
html. Accessed January 20, 2020. to generate alerts for cybersecurity threats and vulnera-
bilities. In Proceedings of the 2016 IEEE Advances in
[14] Shifu Hou, Yanfang Ye, Yangqiu Song, and Melih Ab- Social Networks Analysis and Mining, pages 860–867,
dulhayoglu. Hindroid: An intelligent android malware 2016.
detection system based on structured heterogeneous in-
formation network. In Proceedings of the 23rd ACM [26] Eric Nunes, Ahmad Diab, Andrew Gunn, Ericsson
SIGKDD International Conference on Knowledge Dis- Marin, Vineet Mishra, Vivin Paliath, John Robertson,
covery and Data Mining, pages 1507–1515, 2017. Jana Shakarian, Amanda Thart, and Paulo Shakarian.
Darknet and deepnet mining for proactive cybersecurity
[15] Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional lstm- threat intelligence. In 2016 IEEE ISI, pages 7–12, 2016.
crf models for sequence tagging. arXiv:1508.01991,
2015. [27] Shirui Pan, Ruiqi Hu, Guodong Long, Jing Jiang, Lina
Yao, and Chengqi Zhang. Adversarially regularized
[16] Michael D Iannacone, Shawn Bohn, Grant Nakamura, graph autoencoder for graph embedding. pages 2609–
John Gerth, Kelly MT Huffer, Robert A Bridges, Erik M 2615, 2018.
Ferragut, and John R Goodall. Developing an ontology
for cyber security knowledge graphs. CISR, 15:12, 2015. [28] P Pawlinski, P Jaroszewski, P Kijewski, L Siewierski,
P Jacewicz, P Zielony, and R Zuber. Actionable infor-
[17] Thomas Kipf and Max Welling. Semi-supervised clas-
mation for security incident response. European Union
sification with graph convolutional networks. arXiv:
Agency for Network and Information Security, Herak-
Learning, 2016.
lion, Greece, 2014.
[18] Thomas N Kipf and Max Welling. Semi-supervised
classification with graph convolutional networks. arXiv
preprint arXiv:1609.02907, 2016.

USENIX Association 23rd International Symposium on Research in Attacks, Intrusions and Defenses 255
[29] Hao Peng, Jianxin Li, Qiran Gong, Yangqiu Song, [38] Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan,
Yuanxing Ning, Kunfeng Lai, and Philip S Yu. Fine- Philip S Yu, and Xiao Yu. Pathselclus: Integrating meta-
grained event categorization with heterogeneous graph path selection with user-guided object clustering in het-
convolutional networks. In Proceedings of the 28th In- erogeneous information networks. ACM Transactions
ternational Joint Conference on Artificial Intelligence, on TKDD, 7(3):11, 2013.
pages 3238–3245. AAAI Press, 2019.
[39] Wiem Tounsi and Helmi Rais. A survey on technical
[30] Sara Qamar, Zahid Anwar, Mohammad Ashiqur Rah- threat intelligence in the age of sophisticated cyber at-
man, Ehab Al-Shaer, and Bei-Tseng Chu. Data-driven tacks. Computers & Security, 72:212–233, 2018.
analytics for cyber-threat intelligence and information
sharing. Computers & Security, 67:35–58, 2017. [40] Thomas D Wagner, Esther Palomar, Khaled Mahbub,
and Ali E Abdallah. Towards an anonymity supported
[31] Carl Sabottke, Octavian Suciu, and Tudor Dumitras. Vul- platform for shared cyber threat intelligence. risks and
nerability disclosure in the age of social media: exploit- security of internet and systems, pages 175–183, 2017.
ing twitter for predicting real-world exploits. In USENIX
[41] Chenguang Wang, Yangqiu Song, Haoran Li, Ming
Security, 2015.
Zhang, and Jiawei Han. Text classification with het-
[32] Carl Sabottke, Octavian Suciu, and Tudor Dumitras, . Vul- erogeneous information network kernels. In 13th AAAI,
nerability disclosure in the age of social media: exploit- 2016.
ing twitter for predicting real-world exploits. In 24th
[42] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Peng Cui,
USENIX Security, pages 1041–1056, 2015.
P. Yu, and Yanfang Ye. Heterogeneous graph attention
[33] Deepak Sharma and Avadhesha Surolia. Degree Cen- network. 2019.
trality. Springer New York, 2013.
[43] Jie Yang, Shuailong Liang, and Yue Zhang. Design chal-
[34] Saurabh Singh, Pradip Kumar Sharma, Seo Yeon Moon, lenges and misconceptions in neural sequence labeling.
Daesung Moon, and Jong Hyuk Park. A comprehensive arXiv: Computation and Language, 2018.
study on apt attacks and countermeasures for future
[44] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombat-
networks and communications: challenges and solutions.
chai, William L Hamilton, and Jure Leskovec. Graph
Journal of Supercomputing, pages 1–32, 2016.
convolutional neural networks for web-scale recom-
[35] Yizhou Sun and Jiawei Han. Mining heterogeneous mender systems. In Proceedings of the 24th ACM
information networks: Principles and methodologies. SIGKDD International Conference on Knowledge Dis-
Synthesis Lectures on Data Mining and Knowledge Dis- covery & Data Mining, pages 974–983. ACM, 2018.
covery, 3(2):1–159, 2012.
[45] Jiawei Zhang, Xiangnan Kong, and Philip S Yu. Trans-
[36] Yizhou Sun and Jiawei Han. Mining heterogeneous ferring heterogeneous links across location-based social
information networks: a structural analysis approach. networks. In The 7th ACM international conference on
ACM SIGKDD, 14(2):20–28, 2013. Web search and data mining, pages 303–312, 2014.

[37] Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S Yu, and [46] Yishuai Zhao, Bo Lang, and Ming Liu. Ontology-based
Tianyi Wu. Pathsim: Meta path-based top-k similarity unified model for heterogeneous threat intelligence inte-
search in heterogeneous information networks. Proceed- gration and sharing. In 2017 11th IEEE International
ings of the VLDB Endowment, 4(11):992–1003, 2011. Conference on Anti-counterfeiting, Security, and Identi-
fication (ASID), pages 11–15, 2017.
