
INFO 554x

Data Analytics for Security


Spring ’24
Introduction
Cybersecurity refers to securing valuable electronic assets, and physical assets with electronic access, against unauthorized access. These assets may include personal devices, networked devices, information assets, and infrastructural assets, among others. Cybersecurity risks arise from many sources, including:
• Applications with several dependencies,
• Logical errors in software code (such as Heartbleed),
• Organizational risks involving multiple partners (such as in the cyber-attacks at Target and the Pacific Northwest National Laboratory [PNNL]),
• Lack of user awareness of cybersecurity risks (such as in social engineering and phishing),
• Personality traits of the individuals using the systems (phishing), and
• Inherent issues in the Internet protocols being used.
Multi-dimensional View of Threats
• Computer networks evolve over time, and communication patterns change over time.
• Attacks may have a spatial pattern: sources and destinations in certain key geographic locations are more important for monitoring and preventing an attack.
• Any type of attack has common underpinnings of how it is carried out.
• Events become relevant when they occur together; they become relevant through proximities rather than causation.
• Two events are in close proximity based on:
  • Source proximity
  • Destination proximity
  • Spatial distance
  • Temporal proximity or delay

[Figure: events at nodes N1-N12 plotted against time (0-16), illustrating temporal and source/destination proximity between events]
Goal : to identify potential “collusions” among the entities responsible for these two events

Logical View

[Figure: User ↔ Internet ↔ Business Application, with requests flowing toward the application and responses flowing back]

The logical view of a user requesting access to a business application appears fairly straightforward. Within this pipeline, however, the request and response can pass through several points, leading to several opportunities in the end-to-end process for data collection that helps determine when a cyber threat may occur.
Physical View

[Figure: the physical path of a request and response between a user and a business application. The request passes from the user through an Internet service provider router, Internet routers with routing tables, an Internet exchange, a Domain Name System lookup, and the enterprise firewall. Inside the enterprise, switches, firewalls, and intrusion detection systems (IDS) sit in front of public-facing systems such as the web server, and internal systems such as the enterprise server, databases, and file stores.]
Advanced Persistent Threat

Computational Model: Password Theft

Data Analytics Methods for Persistent Threats
Data discretization:
Segregates the data into time-segment bins representing multiple time periods, which makes it easier to evaluate the persistence of threats across the time periods represented by the bins.
Suspicious network traffic is captured by an intrusion detection system (IDS) in an alert log recording the source and destination IPs, timestamps, and priority.
High-priority threats can be obvious suspicious activities, while low-priority threats can be potential alerts that may or may not be real. High-priority alerts, when looked at in combination over a long period of time, may turn out to be persistent.
Frequent patterns:
Analyzing threats individually may not reveal the overall persistence of a threat. It is important to study the data from an overall perspective and identify frequent patterns in a given timeframe to discover persistent threats. Thus, the data can be mined using association rule mining techniques to isolate Persistent High-Priority Threats (PHPT) from the individual high-priority threats. PHPTs are consistent and may indicate APT behavior.
Data Analytics Methods for Persistent Threats
Persistent threats are high-priority, unusual threats that stay consistent over a long period of time.
The binned datasets and their corresponding frequent patterns can be intersected with each other to isolate the non-intersecting high-priority threats and detect the potential persistent threats.
Persistent threat patterns have the following key characteristics:
• Consistent over time (occur repeatedly)
• Single source (same key players)
• Individually each is a threat
• Unusual (the pattern may be repeated at an unusual time of day, or a single source may access different sources repeatedly at the same time of day)
• Non-obvious (non-overlapping)
Definition: Association Rule

Example transaction database:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

• Association rule: an implication expression of the form X → Y, where X and Y are itemsets.
  Example: {Milk, Diaper} → {Beer}
• Rule evaluation metrics:
  • Support (s): the fraction of transactions that contain both X and Y.
  • Confidence (c): measures how often items in Y appear in transactions that contain X.
• Example, for {Milk, Diaper} → {Beer}:
  s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
  c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
Association Rule Mining Task
Given a set of transactions T, the goal of association rule mining is to find all rules having:
• support ≥ minsup threshold
• confidence ≥ minconf threshold
Brute-force approach:
• List all possible association rules
• Compute the support and confidence for each rule
• Prune rules that fail the minsup and minconf thresholds
⇒ Computationally prohibitive!

Two-step approach:
1. Frequent Itemset Generation: generate all itemsets whose support ≥ minsup
2. Rule Generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
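To make the metrics concrete, here is a minimal Python sketch that computes support and confidence for the example rule {Milk, Diaper} → {Beer} over the transaction table above (the function names are our own):

```python
# Minimal sketch: support and confidence over the example transaction table.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = support(X | Y)                 # 2/5 = 0.4
c = support(X | Y) / support(X)    # 2/3 ~= 0.67
print(f"support={s:.2f}, confidence={c:.2f}")
```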
Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation: generate all itemsets whose support ≥ minsup
2. Rule Generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive.
Reducing Number of Candidates
Apriori principle:
If an itemset is frequent, then all of its subsets must also be frequent.

The Apriori principle holds due to the following property of the support measure:

∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

The support of an itemset never exceeds the support of its subsets. This is known as the anti-monotone property of support.
Illustrating Apriori Principle

[Figure: the itemset lattice from the null set through A, B, C, D, E up to ABCDE. If AB is found to be infrequent, all of its supersets (ABC, ABD, ABE, ABCD, ABCE, ABDE, ABCDE) are pruned from the search.]
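A compact sketch of level-wise frequent-itemset generation with Apriori pruning follows; this is our own illustrative implementation, not a library call:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise frequent-itemset generation with Apriori pruning."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # L1: frequent 1-itemsets
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= minsup]
    frequent = list(current)
    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets...
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # ...and prune any candidate that has an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Keep only candidates meeting the minsup threshold
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) / n >= minsup]
        frequent += current
        k += 1
    return frequent
```

For example, with minsup = 0.6 on the table above, {Bread, Beer} is infrequent, so the candidate {Bread, Diaper, Beer} is pruned without ever counting its support.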
Graph theory
Network Monitoring
Essential for network management:
• Router and firewall policy, including filtering
• Detecting abnormal conditions and errors in the network
• Access control
Security management:
• Detecting abnormal traffic
• Traffic logs for future forensic analysis

Tools:
• Tcpdump: Unix-based command-line tool used to intercept packets, including filtering to just the packets of interest. Reads "live traffic" from an interface specified using the -i option, or from a previously recorded trace file specified using the -r option (you create these when capturing live traffic using the -w option).
• Tshark: tcpdump-like capture program that comes with Wireshark; very similar behavior and flags to tcpdump.
• Wireshark: GUI for displaying tcpdump/tshark packet traces.
Network Layered Structure
What is the Internet? A stack of layers:
• Application: Web, Email, VOIP
• Transport: TCP, UDP
• Network: IP
• Data Link: Ethernet, cellular
• Physical: link
Capture filters
A capture filter takes the form of a series of primitive expressions connected by conjunctions (and/or) and optionally preceded by not:
[not] primitive [and|or [not] primitive ...]
Examples (see the sketch below):
• tcp port 23 and host 10.0.0.5 (Telnet traffic to or from host 10.0.0.5)
• tcp port 23 and not src host 10.0.0.5 (Telnet traffic not originating from 10.0.0.5)
Primitives:
• [src|dst] host <host>
• ether [src|dst] host <ehost>
• gateway host <host>
• [src|dst] net <net> [{mask <mask>}|{/<len>}]
• [tcp|udp] [src|dst] port <port>
• less|greater <length>
• ip|ether proto <protocol>
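The same BPF filter syntax can be driven from Python; a minimal sketch using the third-party scapy library (an assumption: scapy is installed and the script runs with capture privileges):

```python
# Sketch: applying the capture-filter syntax above from Python via scapy.
from scapy.all import sniff

# Same BPF primitives as tcpdump: Telnet traffic to/from 10.0.0.5
packets = sniff(filter="tcp port 23 and host 10.0.0.5",
                prn=lambda pkt: pkt.summary(),  # print a one-line summary per packet
                count=10)                       # stop after 10 packets
```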
Filtering While Viewing

Other features:
• Follow stream
• Flow Graph
• Conversations
Preprocessing techniques
Discretization is a process of mapping continuous attributes into nominal attributes. The main objective of the discretization process is to discover a set of cut points that divide the range into a small number of intervals.
Normalization
•Standardization: Normalize numerical features to have zero mean and unit variance.
•Min-Max Scaling: Scale features to a specific range (e.g., 0 to 1) for consistent comparisons.
Timestamp Alignment:
•Synchronization of Time Stamps: Ensure that time stamps across different devices or sources
are synchronized for accurate temporal analysis.
Flow Aggregation:
•Flow-level Aggregation: Aggregate packet-level data into flows (source-destination pairs) to
simplify analysis and reduce data volume.
Sessionization:
•Session Identification: Group packets into sessions to analyze interactions between hosts over a
defined time window.
Traffic Segmentation:
•Segmentation by Protocol: Analyze different protocols separately to understand their specific
characteristics and behaviors.
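A sketch of the discretization and normalization steps above using scikit-learn (an assumed dependency); flows is a stand-in numeric feature matrix:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler, MinMaxScaler

flows = np.array([[120, 0.2], [4800, 0.9], [310, 0.4], [9700, 0.7]])  # e.g. bytes, duration

# Discretization: map continuous attributes into a small number of intervals
bins = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
discretized = bins.fit_transform(flows)

# Standardization: zero mean, unit variance per feature
standardized = StandardScaler().fit_transform(flows)

# Min-max scaling: squeeze each feature into [0, 1]
scaled = MinMaxScaler().fit_transform(flows)
```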
Feature selection and feature extraction
• Both reduce dimensionality.
• Feature selection removes features that are irrelevant or redundant for subsequent data representation and classification.
• Feature extraction projects the original data set into a new space in which the linear dependence between features (the axes or variables of the new space) is minimized, thereby reducing the number of required features.
• Feature selection methods:
  • Wrappers, filters, hybrids
  • Forward/backward elimination, variance-based, mutual information, information gain
• Feature extraction methods:
  • PCA, ICA, SVD
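A brief feature-extraction sketch using scikit-learn's PCA (an assumed dependency); X stands in for any numeric feature matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 samples, 10 features

pca = PCA(n_components=2)               # keep the 2 directions of largest variance
X_reduced = pca.fit_transform(X)        # projection onto top covariance eigenvectors
print(pca.explained_variance_ratio_)    # variance captured by each component
```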
Eigenvalues & eigenvectors
• Vectors x having the same direction as Ax are called eigenvectors of A (where A is an n × n matrix).
• In the equation Ax = λx, λ is called an eigenvalue of A.
Example:

$$\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4\begin{pmatrix} 3 \\ 2 \end{pmatrix}$$

• Of particular interest: the eigenvalues and eigenvectors of the covariance matrix.
SVD - Basics
THEOREM: It is always possible to decompose a matrix A into A = UΣV^T, where:
• U, Σ, V are unique (if the singular values are unique)
• U, V are column-orthonormal (i.e., the columns are unit vectors, orthogonal to each other): U^T U = I; V^T V = I (I: identity matrix)
• Σ: the singular values are positive, and sorted in decreasing order

The SVD of an m-by-n matrix A is given by the formula A = UΣV^T, where:
• U is an m-by-n matrix of the orthonormal eigenvectors of AA^T
• V^T is the transpose of an n-by-n matrix containing the orthonormal eigenvectors of A^T A
• Σ is an n-by-n diagonal matrix of the singular values, which are the square roots of the eigenvalues of A^T A
How to compute SVD (by hand)
• Range of A: the span of the columns of U corresponding to non-zero singular values.
• Rank of A: the number of non-zero singular values.
• Null space of A: the span of the columns of V corresponding to zero singular values.
• Span: the span of a set of vectors is the set of all possible linear combinations of those vectors.

$$A = U\Sigma V^T \;\Rightarrow\; AA^T = (U\Sigma V^T)(V\Sigma U^T) = U\Sigma^2 U^T \;\Rightarrow\; \sigma_i(A) = \sqrt{\lambda_i(AA^T)}$$
$$A = U\Sigma V^T \;\Rightarrow\; A^T A = (V\Sigma U^T)(U\Sigma V^T) = V\Sigma^2 V^T \;\Rightarrow\; \sigma_i(A) = \sqrt{\lambda_i(A^T A)}$$
Example:

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad W = A^T A = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}$$

Solving det(W − λI) = 0 gives λ₁ = 4, λ₂ = 0, so the singular values are σ₁ = 2, σ₂ = 0, with eigenvectors

$$v_1 = \tfrac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad v_2 = \tfrac{1}{\sqrt{2}}\begin{pmatrix} -1 \\ 1 \end{pmatrix}$$

The first left singular vector is

$$u_1 = \tfrac{1}{\sigma_1} A v_1 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}$$

and u₂, u₃ form an orthonormal basis for Null(AA^T), where AA^T has rows (2, 2, 0), (2, 2, 0), (0, 0, 0):

$$u_2 = \begin{pmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{pmatrix}, \qquad u_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$

Here Range(A) = span{u₁}, Rank(A) = 1, and Null(A) = span{v₂}. Putting it together:

$$A = U\Sigma V^T = \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}$$
Fuzzy Set and Fuzzy Cluster
• The clustering methods we discussed last semester assign every data object to exactly one cluster.
• Some applications may need fuzzy or soft cluster assignment. Ex.: an e-game could belong to both entertainment and software.
• Methods: fuzzy clusters and probabilistic model-based clusters.
• Fuzzy cluster: a fuzzy set S is a mapping FS : X → [0, 1] (each membership value lies between 0 and 1).
• Fuzzy C-Means (FCM) is a data clustering technique wherein each data point belongs to a cluster to some degree specified by a membership grade (see the sketch below).
• Example: an attack category can be defined as a fuzzy mapping.
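A minimal numpy sketch of the FCM update equations (illustrative, not production code; the function and parameter names are our own):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Each row of the returned U holds one point's membership grades (sum to 1)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # membership-weighted centers
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
    return centers, U
```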
Linear Separators
Which of the linear separators is optimal?

Maximum Margin Classification


Linear SVM Mathematically
Let the training set {(xᵢ, yᵢ)}ᵢ₌₁..ₙ, xᵢ ∈ Rᵈ, yᵢ ∈ {−1, 1}, be separated by a hyperplane with margin ρ. Then for each training example (xᵢ, yᵢ):
  wᵀxᵢ + b ≤ −ρ/2 if yᵢ = −1
  wᵀxᵢ + b ≥ ρ/2 if yᵢ = 1
which is equivalent to yᵢ(wᵀxᵢ + b) ≥ ρ/2.

For every support vector xₛ the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each xₛ and the hyperplane is

  r = yₛ(wᵀxₛ + b) / ‖w‖ = 1 / ‖w‖

Then the margin can be expressed through the (rescaled) w and b as ρ = 2r = 2/‖w‖.
Linear SVMs Mathematically (cont.)
Then we can formulate the quadratic optimization problem:

  Find w and b such that ρ = 2/‖w‖ is maximized, and for all (xᵢ, yᵢ), i = 1..n: yᵢ(wᵀxᵢ + b) ≥ 1

Which can be reformulated as:

  Find w and b such that Φ(w) = ‖w‖² = wᵀw is minimized, and for all (xᵢ, yᵢ), i = 1..n: yᵢ(wᵀxᵢ + b) ≥ 1
Soft Margin Classification
What if the training set is not linearly separable? Slack variables ξᵢ can be added to allow misclassification of difficult or noisy examples; the resulting margin is called soft.
Non-linear SVMs
• Datasets that are linearly separable with some noise work out great.
• But what are we going to do if the dataset is just too hard?
• How about mapping the data to a higher-dimensional space?

[Figure: 1-D data on the x axis that is not linearly separable becomes separable after mapping each point x to (x, x²)]
The "Kernel Trick"
• The linear classifier relies on the inner product between vectors: K(xᵢ, xⱼ) = xᵢᵀxⱼ
• If every datapoint is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ)
• A kernel function is a function that is equivalent to an inner product in some feature space.
• Example: 2-dimensional vectors x = [x₁ x₂]; let K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)². We need to show that K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ):
  K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)² = 1 + xᵢ₁²xⱼ₁² + 2xᵢ₁xⱼ₁xᵢ₂xⱼ₂ + xᵢ₂²xⱼ₂² + 2xᵢ₁xⱼ₁ + 2xᵢ₂xⱼ₂
  = [1, xᵢ₁², √2·xᵢ₁xᵢ₂, xᵢ₂², √2·xᵢ₁, √2·xᵢ₂]ᵀ [1, xⱼ₁², √2·xⱼ₁xⱼ₂, xⱼ₂², √2·xⱼ₁, √2·xⱼ₂]
  = φ(xᵢ)ᵀφ(xⱼ), where φ(x) = [1, x₁², √2·x₁x₂, x₂², √2·x₁, √2·x₂]
• Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each φ(x) explicitly).
Examples of Kernel Functions
• Linear: K(xᵢ, xⱼ) = xᵢᵀxⱼ. Mapping Φ: x → φ(x), where φ(x) is x itself.
• Polynomial of power p: K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)ᵖ. Mapping Φ: x → φ(x), where φ(x) has C(d + p, p) dimensions.
• Gaussian (radial-basis function): K(xᵢ, xⱼ) = exp(−‖xᵢ − xⱼ‖² / 2σ²). Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional: every point is mapped to a function (a Gaussian); a combination of such functions for the support vectors is the separator.
• The higher-dimensional space still has intrinsic dimensionality d (the mapping is not onto), but linear separators in it correspond to non-linear separators in the original space.
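A sketch of these kernels as exposed through scikit-learn's SVC (an assumed dependency). Note sklearn's polynomial kernel is (γ·xᵢᵀxⱼ + coef0)^degree, so degree=2, coef0=1 approximates the example above; the ring-shaped toy data is not linearly separable:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)   # ring vs. core: not linearly separable

linear = SVC(kernel="linear").fit(X, y)
poly   = SVC(kernel="poly", degree=2, coef0=1).fit(X, y)   # roughly (1 + x_i . x_j)^2
rbf    = SVC(kernel="rbf", gamma=0.5).fit(X, y)            # Gaussian kernel

for name, clf in [("linear", linear), ("poly", poly), ("rbf", rbf)]:
    print(name, clf.score(X, y))    # the non-linear kernels should fit far better
```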
Graph Algorithms: Community
• A community in a network is a structure in which vertices are densely connected internally.
Girvan-Newman Algorithm
• Detects communities by progressively removing edges from the original network.
• The algorithm's steps for community detection are summarized below (see the sketch that follows):
1. Calculate the betweenness of all existing edges in the network
2. Remove the edge(s) with the highest betweenness
3. Recalculate the betweenness of all edges affected by the removal
4. Repeat steps 2 and 3 until no edges remain
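A usage sketch with networkx's built-in implementation (an assumed dependency), run on the classic karate-club benchmark graph:

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()          # classic 2-community benchmark graph
communities = girvan_newman(G)      # generator: one partition per level of edge removal
first_split = next(communities)     # the communities after the first full split
print([sorted(c) for c in first_split])
```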
Log Data
Log data refers to records of events that occur within a system or network, such as user logins, file accesses, network connections, system errors, and administrative actions. Its uses include:
• Detecting Security Incidents
• Facilitating Incident Response
• Supporting Compliance
• Enabling Threat Hunting
• Improving Security Posture
Challenges in Log Data Analysis:
Despite its benefits, log data analysis poses several challenges, including:
Volume and Variety: Organizations generate vast amounts of log data from diverse sources, making it
challenging to manage, process, and analyze effectively.
Complexity and Noise: Log data often contains a mix of legitimate events, false positives, and irrelevant
information, requiring sophisticated analysis techniques to distinguish between normal and suspicious
activities.
Timeliness and Scalability: Analyzing log data in real-time and at scale requires efficient log collection,
processing, and storage mechanisms to handle large volumes of data and meet the demands of
dynamic computing environments.
Integration and Correlation: Integrating log data from disparate sources and correlating events across
different systems can be complex, requiring standardized log formats, interoperable tools, and robust
correlation techniques.
Privacy and Data Protection: Log data may contain sensitive information, such as user credentials, IP addresses, and personally identifiable information (PII), raising privacy concerns and compliance challenges related to data protection laws and regulations.
Types of Logs
Access Logs:
Examples: Web server access logs, database access logs, application access logs.
Use Cases: Access logs are crucial for monitoring user activity, tracking resource
usage, and identifying suspicious access attempts or unauthorized activities.
Event Logs:
Examples: Windows Event Logs, syslog messages, application event logs.
Use Cases: Event logs are essential for troubleshooting system issues, diagnosing errors,
and monitoring the health and performance of IT infrastructure.
Audit Logs:
Examples: Authentication logs, authorization logs, configuration change logs.
Use Cases: Audit logs play a critical role in compliance monitoring, regulatory compliance, and forensic
investigations. They help organizations detect and investigate security incidents, maintain audit trails,
and demonstrate compliance with industry regulations and standards.

Logs may also be classified as structured vs. unstructured.
Log Collection and Management
Log collection involves gathering log data from various sources, including
systems, applications, network devices, and security appliances. Several
methods are commonly used for log collection:
•Agents: Software agents installed on individual systems or devices to collect
and forward log data to a centralized log management system.
•Syslog: Syslog is a standard protocol used for sending log messages over a
network. Devices such as routers, switches, firewalls, and servers can generate
syslog messages and transmit them to a syslog server for centralized storage and
analysis.
•APIs (Application Programming Interfaces): Many applications and services
provide APIs for programmatic access to log data. Organizations can use APIs to
extract log data from applications, cloud services, and third-party platforms for
centralized storage and analysis.
ELK (Elasticsearch, Logstash, and
Kibana)
•The ELK Stack, consisting of Elasticsearch, Logstash,
and Kibana, is a powerful combination of open-source
tools used for log management, analytics, and
visualization.
Logstash
Elasticsearch Elements
• A node is simply a running instance of Elasticsearch. Node roles include data nodes and master nodes.
• A cluster is a collection of nodes running Elasticsearch.
• Index (noun): a collection of documents that share similar characteristics.
• Index (verb): the action of adding documents to an Elasticsearch index.
• Each shard is technically a standalone search engine. There are two types of shards: primary and replica.
• In Elasticsearch, data is stored in the form of documents, which are JSON objects. Each document represents a single data entity and is stored in an index.
CRUD operations and queries
Create, Read, Update, Delete. Example query clauses:

A full-text match query:
{
  "match": {
    "description": "high-performance smartphone"
  }
}

A wildcard query:
{
  "wildcard": {
    "name": "smart*"
  }
}
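As a rough sketch of how these operations look from Python, using the official elasticsearch client (8.x-style API; the host, index name, ids, and documents are placeholders):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Create: index a document
es.index(index="products", id="1",
         document={"name": "smartwatch", "description": "high-performance companion"})

# Read: fetch it back by id
doc = es.get(index="products", id="1")

# Update: change a field
es.update(index="products", id="1", doc={"name": "smartwatch v2"})

# Query: the same match / wildcard bodies shown above
hits = es.search(index="products", query={"wildcard": {"name": "smart*"}})

# Delete
es.delete(index="products", id="1")
```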
Text Analysis / Text Analyzers
Text analysis enables Elasticsearch to perform full-text search, where the search returns all relevant results rather than just exact matches.
• An analyzer, whether built-in or custom, is just a package which contains three lower-level building blocks: character filters, tokenizers, and token filters.
• Character filters preprocess the stream of characters before it is passed to the tokenizer.
• A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.
• Token filters accept a stream of tokens from a tokenizer and can modify, delete, or add tokens.
Inverted index
• The inverted index is a data structure used in information retrieval systems, including search engines like Elasticsearch, to efficiently store and retrieve documents based on their textual content.
• It is called "inverted" because it reverses the relationship between terms and documents compared to the original documents: it maps each term to the documents containing it.
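A toy sketch of the structure: a map from each term to the set of document ids containing it (the documents and the trivial tokenizer are illustrative):

```python
from collections import defaultdict

docs = {
    1: "failed login from external host",
    2: "login succeeded for admin",
    3: "external host scanned ports",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.lower().split():      # trivial "analyzer": lowercase + whitespace
        index[token].add(doc_id)

# Term lookup is now a set operation instead of a scan over all documents
print(index["login"])                        # {1, 2}
print(index["external"] & index["host"])     # {1, 3}
```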
The Growth of a Threat
• Self-replicating program: Creeper
• Large-scale pandemics: Morris worm (infected 10% of the Internet)
• Email virus + social engineering: Xmas Exec
• Malware mass email campaigns: Love Letter, Melissa
• Multiple vectors of infection, attacks against AV software, combined infection vectors, dangerous payloads: Code Red, Nimda
• Sophisticated engineering, use of crypto: Conficker
• Social network/cell phone worms, Stuxnet, …
Types of Malware
Backdoor, botnet, downloader, information-stealing malware, launcher, rootkit, scareware.
• Spam-sending malware: the attacker rents the machine out to spammers
• Worms or viruses: malicious code that can copy itself and infect additional computers
• Mass malware: intended to infect as many machines as possible; the most common type
• Targeted malware: tailored to a specific target; very difficult to detect, prevent, and remove; requires advanced analysis (ex.: Stuxnet)
Viruses
• A virus propagates by infecting other programs: it automatically creates copies of itself, but to propagate, a human has to run an infected program. Self-propagating viruses are often called worms.
• Encrypted viruses: a constant decryptor followed by the encrypted virus body.
• Polymorphic viruses: each copy creates a new random encryption of the same virus body. The decryptor code remains constant and can be detected.
• Historical note: the "Crypto" virus decrypted its body by brute-force key search to avoid explicit decryptor code.
Virus Detection
• Simple anti-virus scanners:
  • Look for signatures (fragments of known virus code)
  • Heuristics for recognizing code associated with viruses (example: polymorphic viruses often use decryption loops)
  • Integrity checking to detect file modifications: keep track of file sizes, checksums, keyed HMACs of contents
• Generic decryption and emulation:
  • Emulate CPU execution for a few hundred instructions, then recognize the known virus body after it has been decrypted
  • Does not work very well against viruses with mutating bodies, or viruses not located near the beginning of the infected executable
Mutation Techniques
• Real Permutating Engine/RPME, ADMutate, etc.
• Large arsenal of obfuscation techniques:
  • Instructions reordered, branch conditions reversed, different register names, different subroutine order
  • Jumps and NOPs inserted in random places
  • Garbage opcodes inserted in unreachable code areas
  • Instruction sequences replaced with other instructions that have the same effect but different opcodes (e.g., mutate SUB EAX, EAX into XOR EAX, EAX, or MOV EBP, ESP into PUSH ESP; POP EBP)
• The result: there is no constant, recognizable virus body.
Malware Analysis
• Dissecting malware to understand
• How it works
• How to identify it
• How to defeat or eliminate it
• A critical part of incident response
• Exactly what happened
• Ensure you’ve located all infected machines and files
• How to measure and contain the damage
• Find signatures for intrusion detection systems
Static vs. Dynamic Analysis
Static analysis examines malware without running it:
• Basic static analysis: view the malware without looking at instructions (antivirus scanning, hashes, the file's strings, functions, and headers)
• Advanced static analysis: reverse-engineering with a disassembler; complex, requires an understanding of assembly code
Dynamic analysis runs the malware and monitors its effects:
• Basic dynamic analysis: easy, but requires a safe test environment; not effective on all malware
• Advanced dynamic analysis: run the code in a debugger; examines the internal state of a running malicious executable
PE Header
Information about the code:
• Type of application
• Required library functions
• Space requirements

The PE header lists every library and function that will be loaded; their names can reveal what the program does. Normal programs import many DLLs; malware often imports very few.
Important PE Sections
• .text: instructions for the CPU to execute
• .rdata: imports & exports
• .data: global data
• .rsrc: strings, icons, images, menus

Levels of abstraction:
• Hardware
• Microcode
• Machine code
• Low-level languages
• High-level languages
• Interpreted languages
Disassembly
• Malware on disk is in binary form, at the machine-code level
• Disassembly converts the binary form to assembly language
• IDA Pro is the most popular disassembler:
  • Graph and text modes
  • Highlighting; function, data-call, and cross-reference views; imports; exports
Why Perform Dynamic Analysis?
Dynamic analysis is efficient and will show you exactly what the malware does: running malware deliberately while monitoring the results. It requires a safe environment:
• Real machine vs. virtual machine
• Snapshots
• Monitoring of system state
Boosting
• The term 'boosting' refers to a family of algorithms which convert weak learners into strong learners.
• To find a weak rule, we apply a base learning algorithm with a different distribution each time. Each time the base learning algorithm is applied, it generates a new weak prediction rule. This is an iterative process; after many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule.
Gradient Boosting
• Gradient boosting trains many models sequentially. Each new model gradually minimizes the loss function of the whole system (y = ax + b + e, where the error term 'e' needs special attention) using the gradient descent method.
• The learning procedure consecutively fits new models to provide a more accurate estimate of the response variable.
• The principal idea behind this algorithm is to construct each new base learner to be maximally correlated with the negative gradient of the loss function of the whole ensemble:
  o Y = M(x) + error
  o error = G(x) + error2
  o error2 = H(x) + error3
  o Y = M(x) + G(x) + H(x) + error3
  o Y = alpha * M(x) + beta * G(x) + gamma * H(x) + error4
• For regression with square loss (see the sketch below):
  residual ⇔ negative gradient
  fit h to residual ⇔ fit h to negative gradient
  update F based on residual ⇔ update F based on negative gradient
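A minimal sketch of the residual-fitting loop for square loss, using scikit-learn regression trees as the base learners (an assumed dependency; the data and constants are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=200)

lr, n_rounds = 0.1, 100
F = np.full_like(y, y.mean())               # initial model: a constant
trees = []
for _ in range(n_rounds):
    residual = y - F                         # = negative gradient of square loss
    h = DecisionTreeRegressor(max_depth=2).fit(x, residual)  # fit h to residual
    F += lr * h.predict(x)                   # update F based on the residual
    trees.append(h)
```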
What Algorithm Does XGBoost Use?
• The XGBoost library implements the gradient boosting decision tree algorithm.
• In the XGBoost package, at the t-th step we are tasked with finding the tree Ft that minimizes the objective function

  Obj(Ft) = L(Ft) + Ω(Ft)

  where L(Ft) is our loss function and Ω(Ft) is our regularization function.
• Regularization: XGBoost has an option to penalize complex models through both L1 and L2 regularization, which helps prevent overfitting (see the sketch below).
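A usage sketch with the xgboost package (an assumed dependency); reg_alpha and reg_lambda switch on the L1/L2 penalties, and the data is synthetic:

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                      reg_alpha=0.1,    # L1 penalty on leaf weights
                      reg_lambda=1.0)   # L2 penalty on leaf weights
model.fit(X, y)
print(model.score(X, y))
```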
Bagging vs Boosting
Bagging (Bootstrap Aggregating):
•Trains multiple models independently on different
subsets of data (with replacement).
Boosting:
•Trains models sequentially, focusing on examples
that previous models misclassified. Adjusts weights
of misclassified examples.
Basics of Probability
Conditional probability:
  Formula: P(A|B) = P(A∩B) / P(B)
  Meaning: the probability of event A occurring given that event B has occurred.
Joint probability:
  Formula: P(A∩B)
  Meaning: the probability of both events A and B occurring simultaneously.
Bayes' theorem:
  Formula: P(A|B) = P(B|A)·P(A) / P(B)
Markov Models
• Set of states: {s1, s2, …, sN}
• The process moves from one state to another, generating a sequence of states si1, si2, …, sik, …
• Markov chain property: the probability of each subsequent state depends only on the previous state:
  P(sik | si1, si2, …, sik−1) = P(sik | sik−1)
• To define a Markov model, the following probabilities have to be specified: the transition probabilities aij = P(si | sj) and the initial probabilities πi = P(si)
• For hidden Markov models, also a matrix of observation probabilities B = (bi(vm)), bi(vm) = P(vm | si)

[Figure: a weather example with hidden states Low and High (pressure) and observations Rain and Dry; transition probabilities such as 0.3/0.7 and 0.2/0.8 and observation probabilities such as 0.6/0.4 label the arcs. A simulation sketch follows.]
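A minimal simulation sketch of a two-state Rain/Dry chain; the transition values are taken from the example figure, and the initial distribution is an assumption:

```python
import numpy as np

states = ["Rain", "Dry"]
A = np.array([[0.3, 0.7],    # row i: P(next state | current state i)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial probabilities (assumed uniform)

rng = np.random.default_rng(0)
s = rng.choice(2, p=pi)
seq = [states[s]]
for _ in range(9):           # Markov property: next state depends only on current one
    s = rng.choice(2, p=A[s])
    seq.append(states[s])
print(" -> ".join(seq))
```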
Using HMMs
• Evaluation problem. Given the HMM M = (A, B, π) and the observation sequence O = o1 o2 … oK, calculate the probability that model M has generated sequence O. (Forward-backward algorithm; see the sketch below.)
• Decoding problem. Given the HMM M = (A, B, π) and the observation sequence O = o1 o2 … oK, calculate the most likely sequence of hidden states si that produced this observation sequence O. (Viterbi algorithm.)
• Learning problem. Given some training observation sequences O = o1 o2 … oK and the general structure of the HMM (the numbers of hidden and visible states), determine the HMM parameters M = (A, B, π) that best fit the training data. (Baum-Welch algorithm.)
• O = o1 … oK denotes a sequence of observations, with each ok ∈ {v1, …, vM}.
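As an illustration of the evaluation problem, a minimal numpy sketch of the forward algorithm; the transition and observation numbers follow the weather example figure, and the initial distribution is an assumption:

```python
import numpy as np

A  = np.array([[0.3, 0.7],   # hidden-state transitions: rows = Low, High
               [0.2, 0.8]])
B  = np.array([[0.6, 0.4],   # observation probs: P(Rain|state), P(Dry|state)
               [0.4, 0.6]])
pi = np.array([0.5, 0.5])    # initial distribution (assumed)

obs = [0, 0, 1]              # Rain, Rain, Dry

alpha = pi * B[:, obs[0]]            # alpha_1(i) = pi_i * b_i(o_1)
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]    # alpha_t(j) = sum_i alpha_{t-1}(i) a_ij * b_j(o_t)
print(alpha.sum())                   # P(O | M)
```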
User Behavior Monitoring
User behavior monitoring is a cybersecurity practice focused on observing,
analyzing, and understanding the actions and activities of individuals
within a computer system or network.
Significance in Cyber security
• Early Threat Detection
• Prevention of Insider Threats
• Enhanced Incident Response
• Compliance and Governance
Key Components of User Behavior Monitoring
User Activity Logs:
User Authentication Patterns
Privilege Escalation Monitoring
Access Patterns and Permissions
Endpoint Behavior Analysis
Density-Connectivity
• Density-reachability is not symmetric, so it is not good enough to describe clusters.
• Density-connected: a pair of points p and q are density-connected if they are commonly density-reachable from a point o.
• Density-connectivity is symmetric.

[Figure: points p and q, each density-reachable from a common point o, are density-connected]
DBSCAN Algorithm
Input: the data set D
Parameters: ε, MinPts
For each object p in D:
  if p is a core object and not processed then
    C = retrieve all objects density-reachable from p
    mark all objects in C as processed
    report C as a cluster
  else mark p as outlier
  end if
End for
Core, Border & Outlier
Given ε and MinPts, categorize the objects into three exclusive groups:
• A point is a core point if it has more than a specified number of points (MinPts) within Eps. These are points that are in the interior of a cluster.
• A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point.
• A noise point is any point that is neither a core point nor a border point.

[Figure: example with ε = 1 unit and MinPts = 5, showing core, border, and outlier points]
DBSCAN: The Algorithm
• Arbitrarily select a point p.
• Retrieve all points density-reachable from p w.r.t. Eps and MinPts.
• If p is a core point, a cluster is formed.
• If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
• Continue the process until all of the points have been processed (a usage sketch follows).

[Figure: resulting clusters]
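A usage sketch with scikit-learn's DBSCAN (an assumed dependency), mapping ε and MinPts to the eps and min_samples parameters:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
blob1 = rng.normal(loc=0.0, scale=0.3, size=(50, 2))
blob2 = rng.normal(loc=4.0, scale=0.3, size=(50, 2))
noise = rng.uniform(-2, 6, size=(5, 2))
X = np.vstack([blob1, blob2, noise])

labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
print(set(labels))    # cluster ids; -1 marks noise/outlier points
```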
Neural Networks
Multi-layer networks were designed to overcome the computational (expressivity) limitation of a single threshold element (the linear threshold unit).

[Figure: a feed-forward network with input, hidden, and output layers]
Neural Networks
Neural networks are functions NN: X → Y, providing a robust approach to approximating real-valued, discrete-valued, and vector-valued target functions.

[Figure: an example network with trainable parameters (weights) feeding threshold units T3, T4, T5]
The Backpropagation Algorithm
Create a fully connected three-layer network. Initialize the weights.
Until all examples produce the correct output within ε (or another stopping criterion is met), for each example in the training set do:
1. Compute the network output for this example
2. Compute the error between the output and the target value
3. For each output unit k, compute its error term
4. For each hidden unit, compute its error term
5. Update the network weights
End epoch

(A minimal numpy sketch of these steps follows.)
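A minimal numpy sketch of these steps for one sigmoid hidden layer with square loss; the architecture, data, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)   # XOR-like target

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
W1, b1 = rng.normal(scale=0.5, size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
lr = 0.5

for epoch in range(2000):
    # Steps 1-2: forward pass and error
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Step 3: output-unit error term, delta_k = (o - t) * o * (1 - o)
    d_out = (out - y) * out * (1 - out)
    # Step 4: hidden-unit error term, delta_h = (sum_k delta_k w_kh) * h * (1 - h)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Step 5: weight updates (averaged over the batch)
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X);  b1 -= lr * d_h.mean(axis=0)
```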
Recurrent Neural Networks (RNNs)
Definition of the RNN:

[Figure: an unrolled RNN. At each step t, the hidden state ht is computed from the input xt and the previous hidden state ht−1, and produces the output yt.]
Types of RNN

LSTM Network Architecture
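A minimal sketch of an LSTM classifier in Keras (assuming TensorFlow is available; the sequence length, feature count, layer sizes, and random data are placeholders):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

T, F = 20, 8                                # timesteps, features per step
X = np.random.rand(100, T, F)               # placeholder sequence data
y = np.random.randint(0, 2, size=100)       # placeholder binary labels

model = keras.Sequential([
    layers.LSTM(32, input_shape=(T, F)),    # hidden state carries info across timesteps
    layers.Dense(1, activation="sigmoid"),  # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```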
