
VulHierGGNN: A Hierarchical Deep Learning Framework for Multi-Type Software Vulnerability Classification Using Description and Source Code

Harsh Vardhan (workwithvardhan@gmail.com), Jatin (jatin17092003@gmail.com), Kalash Kumar (kumar.kalash2022@gmail.com), Kanishk Teotia (kanishkteotia5077@gmail.com), Birendra Kumar Verma (birendraverma@jssaten.ac.in)
Information Technology, JSS Academy of Technical Education, Noida, India

ABSTRACT: Ensuring software security necessitates the precise identification of vulnerabilities within code. Traditional detection methods often emphasize semantic representations while overlooking crucial syntactic structures and contextual insights from vulnerability descriptions. This research introduces VulHierGGNN, an innovative framework that combines Code Property Graphs (CPGs) with a Gated Graph Neural Network (GGNN) encoder, trained using supervised contrastive learning to effectively capture both semantic and structural information from source code. In parallel, textual descriptions of vulnerabilities are encoded using BERT to incorporate contextual understanding. These embeddings are then fused to enhance classification capabilities. The proposed two-stage approach, initial contrastive pretraining followed by hierarchical fine-tuning, demonstrates notable improvements in accuracy and robustness when evaluated on the CVEfixes dataset. Empirical results show that the BERT-based description model achieves an F1-score and accuracy of 98%, the CodeBERT-based code model attains 91%, and the integrated model delivers an overall accuracy of 93%, outperforming models that rely solely on semantic contrastive learning.

Keywords: Software Vulnerability, BERT, CodeBERT, Deep Learning, Multiclass Classification, Source Code Analysis, Hierarchical Framework

1. INTRODUCTION
In today’s interconnected digital environment, software vulnerabilities pose critical risks to system
integrity, data confidentiality, and overall security. These weaknesses, if left unidentified, can be exploited
by malicious actors, leading to severe consequences including data breaches, financial losses, and system
disruptions. Traditional vulnerability detection tools often rely on static rules or manual analysis, which are
time-consuming and fail to scale with the increasing complexity of modern software.

Recent advancements in deep learning, particularly transformer-based models, have opened new avenues
for understanding both natural language and programming languages. BERT, a pre-trained language model,
has demonstrated significant success in a variety of NLP tasks due to its ability to learn deep contextual
relationships in text. Similarly, CodeBERT extends this paradigm to source code, enabling the extraction
of meaningful representations from code syntax and semantics.

This research introduces VulHierGGNN, a hierarchical framework that combines the power of BERT for
analyzing vulnerability descriptions with CodeBERT for modeling the corresponding source code. The
objective is to enhance the classification of software vulnerabilities across multiple categories, thereby
supporting more intelligent and automated security assessment. By integrating both natural language and
code-based features, this approach aims to achieve high precision and recall in vulnerability classification,
offering an effective step toward robust software security solutions.

2. LITERATURE REVIEW

2.1 Traditional Systems for Vulnerability Detection


Conventional techniques for identifying software vulnerabilities have largely depended on rule-based
static analysis and heuristic methods. These tools inspect source code without executing it, attempting to
detect flaws such as buffer overflows, SQL injections, and memory leaks using predefined patterns. Static
analyzers like Fortify and Flawfinder are commonly used but often generate a high number of false
positives due to their limited contextual understanding. Moreover, they struggle to adapt to evolving
coding styles and emerging vulnerability types. While useful for known vulnerability signatures, these
systems lack the flexibility to generalize beyond their hardcoded rules [3].

2.2 Deep Learning and Modern Transformer-Based Systems

The introduction of deep learning brought a paradigm shift in vulnerability detection by enabling models
to automatically learn features from large datasets. Recurrent neural networks (RNNs) and convolutional
neural networks (CNNs) were initially applied to source code analysis but were limited by their inability
to capture long-range dependencies. The advent of transformer-based models, particularly BERT, marked
a significant breakthrough. Pre-trained on vast text corpora, BERT has shown remarkable performance in
natural language processing tasks, including the analysis of vulnerability descriptions [1].

In the realm of code, CodeBERT extended the transformer architecture to programming languages.
Trained on paired code and natural language, it can model the semantics and syntax of code more
effectively than earlier models, enhancing the accuracy of code classification and retrieval tasks [2].

2.3 Hybrid and Hierarchical Frameworks

Recent studies have explored the benefits of combining natural language understanding with code
semantics for more robust vulnerability classification. Hybrid models aim to utilize both textual
descriptions and code representations to capture the full context of a vulnerability. Hierarchical approaches
build on this concept by organizing the input data into layers, often using one model (e.g., BERT) for text
and another (e.g., CodeBERT) for code, then integrating their outputs to make a unified prediction. Such
frameworks have demonstrated improved precision and recall by leveraging complementary data
modalities [4].

The proposed VulHierGGNN aligns with this direction, introducing a structured method that incorporates
both BERT and CodeBERT to classify software vulnerabilities across multiple categories.

2.4 Advancement over Existing Work

The landscape of software vulnerability detection has evolved through the adoption of various
machine learning and deep learning approaches. Early systems emphasized code-level analysis,
while modern frameworks incorporate natural language understanding. Despite these advancements, a
gap remains in combining both vulnerability descriptions and source code representations for precise
multi-type classification. The proposed VulHierGGNN addresses this by unifying BERT and CodeBERT
to extract complementary semantic and structural features, resulting in enhanced performance.

Table 1. Existing work on different methods of vulnerability classification

1. Devlin et al. [1] (2019), "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". Objective: learn deep bidirectional representations from unlabeled text for NLP tasks. Findings: achieved state-of-the-art results on multiple NLP benchmarks. Limitation: not applicable to code or structured software inputs.

2. Feng et al. [2] (2020), "CodeBERT: A Pre-Trained Model for Programming and Natural Languages". Objective: pre-train a model on code and natural language for tasks like code search and classification. Findings: outperforms prior models on code-language tasks such as NL-code retrieval. Limitation: needs task-specific fine-tuning and lacks graph-level structure awareness.

3. Li et al. [4] (2021), "SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities". Objective: extract code gadgets and use CNNs to detect vulnerabilities. Findings: improved binary classification accuracy on standard datasets. Limitation: ignores textual vulnerability descriptions; focused only on code.

4. Z. Li et al. [5] (2018), "VulDeePecker: A Deep Learning-Based System for Vulnerability Detection". Objective: use BLSTM networks on manually extracted code gadgets. Findings: first to apply deep learning to vulnerability detection using code patterns. Limitation: lacks support for multi-type classification; requires manual preprocessing.

5. Alshahwan et al. [6] (2022), "DeepVD: Deep Learning-Based Vulnerability Detection with Word2Vec". Objective: apply CNNs on tokenized code using Word2Vec embeddings. Findings: achieved reasonable binary detection accuracy. Limitation: ignores semantic context from vulnerability reports.

6. Kumar and Tripathi [7] (2023), "DeKeDVer: A Deep Learning-Based Multi-Type Software Vulnerability Classification". Objective: use Text-RCNN and RGAT on vulnerability descriptions and code. Findings: achieved up to 91.4% accuracy in multi-class classification. Limitation: uses outdated architectures; lacks transformer-based contextual learning.

7. Xu et al. [8] (2025), "Vul-LMGNN: Multi-view Graph Neural Network for Software Vulnerability Detection". Objective: use multiple code views (AST, CFG, DFG) within a GNN framework. Findings: demonstrated improved detection performance using multi-view graphs. Limitation: does not incorporate natural language descriptions of vulnerabilities.

8. Liu et al. [9] (2024), "SCL-CVD". Objective: leverage supervised contrastive learning with structural features. Findings: applies supervised contrastive learning with GraphCodeBERT. Limitation: captures some structure but underutilizes CPGs.

9. Zhou et al. [10] (2019), "Devign". Objective: detect vulnerabilities using graph-based learning with a GGNN. Findings: applies GGNNs on code graphs for vulnerability detection. Limitation: no pretraining; lacks multi-modal fusion.

10. Zhang et al. [11] (2022), "HSVC". Objective: use the CWE hierarchy for structured multi-class vulnerability classification. Findings: leverages the hierarchical structure of CWE-IDs for multi-class classification. Limitation: does not use a two-step (binary then type) classification approach.
3. PROPOSED WORK
3.1 System overview and architecture
This research introduces VulHierGGNN, a dual-path architecture for hierarchical vulnerability detection
in source code. The system leverages two independent branches: one processes source code into Code
Property Graphs (CPGs) and generates structural embeddings using a Gated Graph Neural Network
(GGNN), while the other encodes textual vulnerability descriptions using BERT. Unlike previous multi-
modal approaches that merge embeddings at the feature level, our architecture maintains these two
representations separately and instead fuses their individual classification outputs through an additional
decision-level classifier. This final classifier integrates insights from both structural code analysis and
vulnerability semantics to improve prediction reliability. Furthermore, we employ a hierarchical
classification framework, beginning with a binary classifier to identify whether a function is vulnerable,
followed by a CWE-type classifier applied only to positively identified samples. To strengthen the model's
representation learning, we pretrain the GGNN encoder using a supervised contrastive loss, encouraging
the separation of vulnerability classes based on structural patterns. We also implement a robust
preprocessing pipeline that standardizes code identifiers and removes noisy tokens, enhancing feature
consistency. Overall, VulHierGGNN combines decision-level multi-modal fusion with graph-based
learning and hierarchical classification to address challenges such as data imbalance and lack of semantic-
structural alignment in vulnerability detection. The system architecture of VulHierGGNN is shown in Fig 3.1.1.

Fig 3.1.1 System Architecture of VulHierGGNN
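As noted above, the GGNN encoder is pretrained with a supervised contrastive loss that pulls graph embeddings of the same vulnerability class together and pushes embeddings of different classes apart. The following is a minimal PyTorch sketch of such a loss, following the standard supervised contrastive (SupCon) formulation; the temperature value and batching details are assumptions, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z: (N, d) graph-level embeddings from the GGNN; labels: (N,) class ids."""
    z = F.normalize(z, dim=1)                        # work in cosine-similarity space
    sim = (z @ z.t()) / tau                          # (N, N) pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, -1e9)           # exclude self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_per_anchor = pos_mask.sum(dim=1)
    # mean log-probability of same-class (positive) pairs, per anchor
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_per_anchor.clamp(min=1)
    return loss[pos_per_anchor > 0].mean()           # skip anchors with no positives
```

Minimizing this loss encourages the structural separation of vulnerability classes before the hierarchical fine-tuning stage begins.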

3.2 Methodology

Our system, VulHierGGNN, is a two-branch pipeline with a fusion module (see Fig 3.1.1). The first branch processes source code into CPGs and uses a GGNN encoder to generate graph-level embeddings. The second branch encodes vulnerability descriptions using BERT. These embeddings are fused and fed into a hierarchical classification module consisting of:

• A binary classifier to predict whether a function is vulnerable.

• A CWE classifier to predict the specific CWE type, applied during inference only to functions classified as vulnerable (a sketch of this fused, hierarchical head follows this list).
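For illustration, the following is a minimal sketch of the fusion and hierarchical classification head, assuming the two 768-dimensional branch embeddings are fused by simple concatenation and classified by linear layers; the layer sizes and fusion operator are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    """Fuses the 768-dim code and description embeddings, then classifies."""
    def __init__(self, dim: int = 768, num_cwe_types: int = 30):
        super().__init__()
        self.binary = nn.Linear(2 * dim, 2)           # stage 1: vulnerable or not
        self.cwe = nn.Linear(2 * dim, num_cwe_types)  # stage 2: specific CWE type

    def forward(self, code_emb: torch.Tensor, desc_emb: torch.Tensor):
        fused = torch.cat([code_emb, desc_emb], dim=-1)
        vuln_logits = self.binary(fused)
        # At inference, the CWE logits are consulted only for samples the
        # binary stage predicts as vulnerable.
        cwe_logits = self.cwe(fused)
        return vuln_logits, cwe_logits
```

During training both heads can be optimized jointly, while at inference the CWE prediction is only consulted when the binary head flags a function as vulnerable.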
3.2.1 Source Code Pathway

Code is parsed into a CPG, combining the AST, CFG, and PDG. Each node is embedded using CodeBERT [2] (768-dimensional). The GGNN encoder processes the CPG as follows (a code sketch appears after this list):

• GatedGraphConv: Performs message passing over CPG edges for 3 layers.

• Global Mean Pooling: Aggregates node embeddings into a 768-dimensional graph-level embedding.

• MLP: Refines the embedding (768 → 256 → 768).
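The following is a minimal PyTorch Geometric sketch of this encoder under the settings above (3 message-passing layers, global mean pooling, a 768 → 256 → 768 MLP); the class and variable names are illustrative, not the paper's implementation.

```python
import torch.nn as nn
from torch_geometric.nn import GatedGraphConv, global_mean_pool

class GGNNEncoder(nn.Module):
    """Encodes a CPG whose nodes carry 768-dim CodeBERT embeddings."""
    def __init__(self, dim: int = 768, hidden: int = 256, steps: int = 3):
        super().__init__()
        # Message passing over CPG edges for `steps` iterations.
        self.ggnn = GatedGraphConv(out_channels=dim, num_layers=steps)
        # MLP that refines the pooled embedding (768 -> 256 -> 768).
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim)
        )

    def forward(self, x, edge_index, batch):
        h = self.ggnn(x, edge_index)      # node-level message passing
        g = global_mean_pool(h, batch)    # 768-dim graph-level embedding
        return self.mlp(g)                # refined graph embedding
```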

3.2.2 Vulnerability Description Pathway

Vulnerability descriptions from CWE records are encoded using BERT [1]. The [CLS] token
embedding (768-dimensional) represents the description.
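A minimal sketch of this step with the Hugging Face Transformers library is shown below; the checkpoint name (bert-base-uncased) and the truncation length are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def encode_description(text: str) -> torch.Tensor:
    """Returns the 768-dim [CLS] embedding of a vulnerability description."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    # Position 0 of the last hidden state is the [CLS] token embedding.
    return outputs.last_hidden_state[:, 0, :]  # shape: (1, 768)
```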

3.3 Data Preprocessing

To prepare source code for CPG generation, we implement a preprocessing pipeline that enhances the
quality of node features:

• Removing comments: Eliminates irrelevant text that does not contribute to code execution or
structure, aligning with standard practices [16].

• Handling non-ASCII characters: Removes or normalizes non-ASCII characters to ensure compatibility with CodeBERT, reducing noise from Unicode symbols.

• Standardizing variable and function names: Renames user-defined variables and functions to
generic identifiers (e.g., Var1, FUN1). This reduces variability in the input to CodeBERT, ensuring
consistent node features and mitigating issues with out-of-vocabulary tokens. Unlike regex-based
methods, which may struggle with diverse naming conventions [17], our approach leverages
CodeBERT’s semantic understanding for robust feature extraction.

This preprocessing strategy is particularly effective for graph-based models, as it minimizes noise in node features, allowing the GGNN to focus on structural patterns captured by the CPG. A simplified sketch of these steps follows.
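The following is a simplified, regex-based illustration of the three steps above for C-like source. Note that the paper's actual renaming step relies on CodeBERT-aware processing rather than pure regex, so this sketch only conveys the idea; the name lists passed to the renamer are assumed to come from the CPG's symbol information.

```python
import re

def preprocess(code: str) -> str:
    """Removes comments and non-ASCII characters from C-like source."""
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", "", code)                    # line comments
    # Strip non-ASCII characters to reduce Unicode noise.
    return code.encode("ascii", errors="ignore").decode("ascii")

def rename_identifiers(code: str, variables: list, functions: list) -> str:
    """Maps user-defined names to generic identifiers (Var1, FUN1, ...)."""
    mapping = {name: f"Var{i + 1}" for i, name in enumerate(variables)}
    mapping.update({name: f"FUN{i + 1}" for i, name in enumerate(functions)})
    for name, generic in mapping.items():
        code = re.sub(rf"\b{re.escape(name)}\b", generic, code)
    return code
```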

4. EXPERIMENTAL RESULTS

To assess the effectiveness of the proposed VulHierGGNN framework, we conducted extensive experiments on the CVEfixes dataset. The evaluation focused on three primary models: the standalone BERT model for vulnerability descriptions, the CodeBERT model for source code, and the fused architecture combining both modalities.

The training results of the standalone BERT model for vulnerability descriptions and the CodeBERT model for source code are shown in Fig 4.1 and Fig 4.2, respectively.

Fig 4.1 Training result of standalone BERT model for Description


Fig 4.2 Training result for CodeBERT model for Source Code

The training versus validation loss curves for the BERT description model and the CodeBERT source code model are shown in Fig 4.3 and Fig 4.4, respectively.

Fig 4.3 Training vs Validation loss for BERT model for Description

Fig 4.4 Training vs Validation loss for CodeBERT for Source code
The confusion matrix for VulHierGGNN over 30 labels (CWE types) is shown in Fig 4.5.

Fig 4.5 Confusion matrix

5. DISCUSSION
The complete implementation of the proposed model, which combines BERT for processing vulnerability
descriptions and CodeBERT for analyzing source code, has demonstrated significant improvements in multi-
type vulnerability classification. By leveraging both natural language understanding and code semantics, the
system effectively captures contextual and structural cues essential for precise classification.

BERT proved highly effective in extracting semantic features from textual vulnerability reports, achieving an
F1-score and accuracy of 98%, indicating that most categories of vulnerabilities are well-represented in their
descriptions. Meanwhile, the integration of CodeBERT added an additional layer of granularity by enabling
the model to understand source-level syntax and logical flow. This dual representation enhanced the system’s
ability to handle complex patterns where textual cues alone were insufficient.

The hierarchical framework adopted in the model allowed each component—description and code—to
contribute distinct yet complementary insights. This fusion improved the model’s ability to differentiate
between closely related vulnerability types, which is often a limitation in single-modality systems.

The model achieved strong generalization and accuracy across multiple vulnerability categories, offering a
robust framework for automated security analysis in real-world scenarios.
6. CONCLUSION

This study presents a hybrid deep learning framework for multi-type software vulnerability classification,
integrating BERT for textual descriptions and CodeBERT for source code analysis. The results affirm that
combining linguistic and structural features leads to significant improvements in classification accuracy
and robustness. The model effectively captures the semantic depth of vulnerability descriptions while
simultaneously extracting meaningful code patterns, enabling a more comprehensive and precise
identification of diverse vulnerability types.

By bridging the gap between natural language understanding and code semantics, the proposed system
advances the state of automated vulnerability detection. The high performance achieved across evaluation
metrics demonstrates its potential for real-world applications, particularly in environments where manual
vulnerability triage is time-consuming or error-prone.

Our model uniquely leverages both natural language vulnerability descriptions and source code features,
utilizing the powerful capabilities of BERT for textual analysis and CodeBERT integrated with Code
Property Graphs (CPGs) for code structure representation. This multimodal and hierarchical design allows
the model to understand and relate abstract semantic cues from descriptions with the precise syntactic and
structural patterns in source code, ultimately improving the classification performance.
Unlike traditional approaches that treat source code and textual descriptions separately or rely on flat
representations, VulHierGGNN provides a unified and hierarchical architecture that integrates contrastive
learning, graph-based reasoning, and cross-modal feature fusion. By modeling the inherent hierarchical
relationships between CWE (Common Weakness Enumeration) classes and their subtypes, our approach
enables finer-grained classification that better mirrors real-world vulnerability taxonomies.

7. FUTURE SCOPE
While the current model achieves strong performance, there remains scope for further enhancement.
Future research can explore incorporating graph-based code representations such as Abstract Syntax Trees
(ASTs) or Control Flow Graphs (CFGs) alongside CodeBERT, providing deeper structural insights.
Moreover, adapting the framework to support multilingual codebases and evolving vulnerability databases
can make it more versatile.

Another potential direction is the integration of real-time detection in CI/CD pipelines, enabling
proactive security measures during software development. Additionally, attention mechanisms or
explainability modules could be embedded to offer interpretable insights for security analysts.
Expanding this system into a fully deployable tool with visualization and feedback components may also
bridge the gap between research and industry use.

However, several avenues remain for future improvement and exploration.

1. Extension to Zero-Day and Unseen Vulnerabilities

While VulHierGGNN performs well on known vulnerabilities, its application can be extended to
detecting zero-day or unseen vulnerabilities. This would require adapting the model using techniques
such as semi-supervised learning or anomaly detection, enabling it to flag previously unrecorded or
ambiguous threats with minimal labeled data.

2. Real-Time Integration in Development Pipelines

A potential enhancement involves embedding VulHierGGNN into integrated development
environments (IDEs) or DevOps pipelines. This would allow for real-time vulnerability analysis as
developers write code, promoting secure programming practices and immediate threat feedback before
deployment.

3. Expansion to Multi-Language and Cross-Platform Support

The current implementation primarily focuses on a single programming language dataset. Future
iterations could incorporate multilingual code analysis by training on diverse datasets, supporting
languages like C++, Java, Python, and Go. This would broaden the system’s applicability to varied
software ecosystems.
REFERENCES

[1] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.

[2] Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., ... & Zhou, M. (2020). CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

[3] Chess, B., & McGraw, G. (2004). Static Analysis for Security.

[4] Li, Z., et al. (2021). SySeVR: A Framework for Using Deep Learning to Detect Software Vulnerabilities.

[5] Li, Z., et al. (2018). VulDeePecker: A Deep Learning-Based System for Vulnerability Detection.

[6] Alshahwan et al. (2022). DeepVD: Deep Learning-Based Vulnerability Detection with Word2Vec.

[7] Kumar & Tripathi (2023). DeKeDVer: A Deep Learning-Based Multi-Type Software Vulnerability Classification.

[8] Xu et al. (2025). Vul-LMGNN: Multi-view Graph Neural Network for Software Vulnerability Detection.

[9] Liu et al. (2024). SCL-CVD: Supervised Contrastive Learning for Code Vulnerability Detection via GraphCodeBERT.

[10] Zhou et al. (2019). Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks.

[11] Zhang et al. (2022). HSVC: Transformer-Based Hierarchical Distillation for Software Vulnerability Classification.