PS-1: Development of a software tool using LLMs or similar AI based
techniques for Detection of Vulnerabilities (Malicious Code) in Source Code of
Software (especially Open-Source software) and suggest Mitigation Measures
I. Short Summary
1. In the fast-evolving world of software development, one thing remains constant: the need for security. As the use of open-source software continues to soar, vulnerabilities lurking in codebases pose significant risks that could be exploited by malicious actors.
2. Open-source software, while offering tremendous innovation, is also a double-edged sword. Developers worldwide collaborate to build these projects, but they don't always have the resources to identify and fix security flaws quickly. As more organizations rely on these systems, the threat becomes even more critical.
3. This is where the challenge lies: traditional methods of vulnerability detection rely heavily on manual reviews and outdated tools, and we need a faster, more efficient way to detect these vulnerabilities.
4. Through the AI Grand Challenge, we aim to have a cutting-edge software tool developed that harnesses the power of Large Language Models (LLMs) and similar AI techniques to detect vulnerabilities in open-source software and suggest mitigation measures to address these risks.
5. With this tool, we hope to empower developers to create secure, reliable, and resilient open-source software at scale.
6. The AI Grand Challenge is just the beginning. Together, we can create the next
generation of security tools for open-source software. Join us in this mission,
and let’s build safer, more secure open-source software for everyone.
"Together, we can make open-source software safer, one line of code at a time."
II. Detailed Description
1. A tool based on large language models (LLMs) or similar AI techniques for detecting vulnerabilities (malicious code) in software source code (especially open-source software) and suggesting mitigation measures could provide a range of capabilities that assist developers in identifying, mitigating, and preventing security vulnerabilities and malicious behaviour. Here's a breakdown of what such a product should be able to do:
2. Malicious Code Detection
a. Identify Malicious Patterns: The tool should be able to analyse source
code and identify patterns indicative of malicious behaviour. This could include
detecting backdoors, trojans, spyware, or any type of code that could be used
for unauthorized access or control over the system.
b. Suspicious Code Snippets: It should flag suspicious or non-idiomatic code that might indicate attempts to conceal malicious actions, such as encoded payloads, obfuscated code, or unusual use of libraries (a minimal pattern-scanning sketch follows this list).
c. Code Behaviour Analysis: Instead of just static pattern recognition, the
tool should preferably employ dynamic analysis to evaluate how certain parts
of the code behave when executed, identifying whether they perform any
suspicious actions like network communication, privilege escalation, or data
exfiltration.
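As a minimal sketch of the static side of this capability, a scanner might flag constructs commonly used to hide malicious behaviour. The pattern list and the `scan_source` helper below are illustrative assumptions, not a prescribed rule set; a real tool would combine many such signals with LLM-based and dynamic analysis.

```python
import re

# Hypothetical, illustrative patterns for potentially suspicious constructs in
# Python source; a real tool would combine many such signals with LLM-based
# and dynamic analysis rather than rely on regexes alone.
SUSPICIOUS_PATTERNS = {
    "dynamic code execution": re.compile(r"\b(eval|exec)\s*\("),
    "encoded payload decode": re.compile(r"base64\.b64decode\s*\("),
    "shell command execution": re.compile(r"\b(os\.system|subprocess\.Popen)\s*\("),
    "raw socket usage": re.compile(r"\bsocket\.socket\s*\("),
}

def scan_source(path: str) -> list[tuple[int, str, str]]:
    """Return (line number, pattern name, stripped line) for each suspicious hit."""
    findings = []
    with open(path, encoding="utf-8", errors="ignore") as handle:
        for lineno, line in enumerate(handle, start=1):
            for name, pattern in SUSPICIOUS_PATTERNS.items():
                if pattern.search(line):
                    findings.append((lineno, name, line.strip()))
    return findings

if __name__ == "__main__":
    for lineno, name, text in scan_source("example.py"):
        print(f"line {lineno}: {name}: {text}")
```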
3. Vulnerability Detection
a. Common Vulnerabilities and Exposures (CVEs): The tool should be able to detect and list known CVEs (such as buffer overflows, SQL injection, cross-site scripting (XSS), etc.) by scanning the code for patterns that match or resemble vulnerable code structures.
b. Unknown / Zero-Day Vulnerabilities: The tool should also attempt to detect and list potential unknown or zero-day vulnerabilities that are not yet publicly known but may be exposed by certain code patterns or misconfigurations.
c. Dependency Vulnerabilities: The tool should have the functionality to check the open-source dependencies in the code for known vulnerabilities, helping to secure the entire software ecosystem, not just the user's custom code (see the dependency-lookup sketch after this list).
d. Severity Ranking: The tool should allow the user to filter and prioritize
vulnerabilities based on severity, so they can focus on the most critical issues
first.
e. Risk Assessment: Based on the detected issues, the tool should be
able to provide a risk assessment that helps prioritize which vulnerabilities
should be addressed first, taking into account factors like exploitability and
potential impact.
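A minimal sketch of the dependency-checking capability is shown below, querying the public OSV.dev vulnerability database for a single package version. The endpoint and payload shape follow the OSV v1 query API; batching, caching, and error handling are omitted, and the example package and version are arbitrary.

```python
import json
import urllib.request

# Queries the public OSV.dev vulnerability database for one dependency
# version; the endpoint and payload shape follow the OSV v1 query API.
OSV_QUERY_URL = "https://api.osv.dev/v1/query"

def known_vulnerabilities(name: str, version: str, ecosystem: str = "PyPI") -> list[dict]:
    """Return OSV records (CVE aliases included, where available) for a package version."""
    payload = json.dumps({
        "version": version,
        "package": {"name": name, "ecosystem": ecosystem},
    }).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read()).get("vulns", [])

if __name__ == "__main__":
    for vuln in known_vulnerabilities("requests", "2.19.0"):
        aliases = ", ".join(vuln.get("aliases", [])) or "no CVE alias listed"
        print(f"{vuln['id']} ({aliases}): {vuln.get('summary', '')}")
```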
4. Code Quality and Best Practices Enforcement
a. Code Review and Improvement Suggestions: In addition to detecting malicious code and vulnerabilities, the tool should enforce best practices by suggesting code quality improvements and safer alternatives, such as avoiding deprecated functions or ensuring proper validation of inputs.
b. Automated Code Audits: The tool should help automate code audits, providing developers with reports on potential security flaws and suggestions on how to mitigate them.
c. Compliance Assistance: With further fine-tuning, the tool should be able to assist organizations in ensuring compliance with relevant regulations and standards (e.g., OWASP, PCI DSS) by flagging violations and encouraging adherence to standard best practices.
5. Mitigation Measures and Recommendations
a. Automated Patches and Fixes: Upon identifying vulnerabilities or malicious code, the tool should suggest, or even attempt to automate, fixes based on established mitigation patterns, for example replacing unsafe functions with more secure alternatives or patching known vulnerabilities (a minimal fix-suggestion sketch follows this list).
b. Code Hardening: It should be able to suggest security hardening techniques (e.g., input validation, encryption, the least-privilege principle) that would mitigate the risk of vulnerabilities being exploited.
c. Customizable Security Rules: Developers could customize the tool to
enforce specific security policies or coding standards relevant to their project or
organization.
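As a minimal illustration of rule-based fix suggestions, the mapping below pairs a few risky Python calls with hardened alternatives. The table and the `suggest_fixes` helper are hypothetical and deliberately small; a production tool would derive such fixes from curated mitigation patterns and LLM-generated patches.

```python
# Hypothetical mapping from risky Python calls to hardened alternatives; a
# production tool would derive such fixes from curated mitigation patterns
# and LLM-generated patches rather than a fixed table.
SAFER_ALTERNATIVES = {
    "yaml.load(": "yaml.safe_load(",        # avoids arbitrary object construction
    "hashlib.md5(": "hashlib.sha256(",      # avoids a weak hash for integrity checks
    "random.random(": "secrets.token_hex(", # avoids non-cryptographic randomness for secrets
}

def suggest_fixes(source: str) -> list[str]:
    """Return human-readable mitigation suggestions for one file's source text."""
    suggestions = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for risky, safer in SAFER_ALTERNATIVES.items():
            if risky in line:
                suggestions.append(
                    f"line {lineno}: consider replacing {risky.rstrip('(')} "
                    f"with {safer.rstrip('(')}"
                )
    return suggestions
```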
6. Reporting and Documentation
a. Detailed Security Reports: The tool should be able to generate detailed reports that explain the vulnerabilities or malicious code identified, their severity, and how to fix them. This would be helpful both for developers and for security auditors.
7. User-friendly Interface
a. Real-time Analysis and Feedback: The tool should be able to integrate
with popular IDEs (e.g., Visual Studio Code, IntelliJ IDEA) to provide
developers with real-time feedback as they write or review code, if required.
b. Collaboration Tools: The tool should allow user teams to collaborate
on security issues, leaving comments on detected vulnerabilities and tracking
their resolution progress when used in collaborative testing mode, if required.
8. Adaptive Learning and Customisation
a. Learning from False Positives/Negatives: Over time, the tool should
learn from its false positives and false negatives, becoming more accurate as
it receives feedback and adapts to specific projects, environments, and
development patterns.
b. Customisable Sensitivity: It should allow users to adjust the sensitivity of vulnerability detection based on project needs, for example tightening detection in high-risk applications (e.g., banking software) while relaxing the rules elsewhere (a minimal threshold sketch follows).
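One simple way to realise such adjustable sensitivity is a per-project confidence threshold applied to model findings; the profile names and threshold values below are illustrative assumptions only.

```python
from dataclasses import dataclass

# Illustrative sensitivity profiles; the threshold values are assumptions and
# would normally be tuned per project or organisation.
PROFILES = {"high-risk": 0.30, "default": 0.60, "relaxed": 0.80}

@dataclass
class Finding:
    rule: str
    confidence: float  # model-assigned probability that the finding is real

def filter_findings(findings: list[Finding], profile: str = "default") -> list[Finding]:
    """Keep only findings at or above the chosen profile's confidence threshold."""
    threshold = PROFILES[profile]
    return [f for f in findings if f.confidence >= threshold]
```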
9. Ultimately, the developed tool would combine the power of LLMs or similar AI-based techniques for natural language processing and machine learning with traditional vulnerability detection and mitigation techniques, making it highly valuable for open-source software development teams.
III. Evaluation Parameters and Criteria:
STAGE – I [Shortlisting 15-20 from all entries]

| Ser No. | Evaluation Parameters | Remarks | Weightage (%) |
|---|---|---|---|
| 1 | Languages Supported (for standalone software, web applications)* | How many out of Java, Python, C/C++/C#, PHP. | 30 |
| 2 | No. of Vulnerabilities detected by the tool and whether it is able to map the vulnerabilities with their CVEs, CWEs (if available). | Like OWASP Top 10, CWE Top 25, memory safety, injection, misconfiguration, etc. | 40 |
| 3 | Detection Accuracy | This will be evaluated based on the F1 score. | 30 |

STAGE – I [Physical Evaluation of Shortlisted Participants to Select Top 6 at the end of Stage I]

| Ser No. | Evaluation Parameters | Remarks | Weightage (%) |
|---|---|---|---|
| 1 | Languages Supported (for standalone software, web applications)* | How many out of Java, Python, C/C++/C#, PHP. | 20 |
| 2 | No. of Vulnerabilities detected by the tool and whether it is able to map the vulnerabilities with their CVEs, CWEs (if available). | Like OWASP Top 10, CWE Top 25, memory safety, injection, misconfiguration, etc. | 30 |
| 3 | Detection Accuracy | This will be evaluated based on the F1 score. | 20 |
| 4 | Approach | Start-ups need to present their solution based on: (a) Methodologies used, (b) Architecture, (c) Scalability, (d) Resource Utilisation. | 30 |
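For reference, the Detection Accuracy rows above are scored on the F1 score. The short worked example below uses illustrative counts only, not challenge data.

```python
# Worked example of the F1 score used for Detection Accuracy: suppose a tool
# reports 40 true positives and 10 false positives, and misses 10 real
# vulnerabilities (false negatives).
true_positives, false_positives, false_negatives = 40, 10, 10

precision = true_positives / (true_positives + false_positives)  # 0.80
recall = true_positives / (true_positives + false_negatives)     # 0.80
f1_score = 2 * precision * recall / (precision + recall)         # 0.80

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1_score:.2f}")
```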
STAGE – II

| Ser No. | Evaluation Parameters | Remarks | Weightage (%) |
|---|---|---|---|
| 1 | Languages Supported (for mobile applications, standalone software, web applications)* | How many out of Ruby, Rust, Kotlin, Swift, HTML, JavaScript, Go (Golang) and the languages mentioned in Stage I are covered. | 20 |
| 2(a) | No. of Vulnerabilities detected by the tool and whether it is able to map the vulnerabilities with their CVEs, CWEs (if available). | Like OWASP Top 10, CWE Top 25, memory safety, injection, misconfiguration, etc. | 50 (combined for 2(a)-(c)) |
| 2(b) | Mitigation measures suggested | Yes or No. If yes, whether they can be implemented without hampering the functionality and security of the software itself. | |
| 2(c) | Granularity of Detection | Whether the tool is able to locate and mark the exact code segment or line. | |
| 3 | Performance of the tool based on the Processing Time for scanning and analyzing the software code | This would be measured as average processing time per line of code or per KB/MB of code. | 10 |
| 4 | Explainability of Decisions taken by the tool (proof to verify the output) | Can the model explain why a piece of code is vulnerable or how the fix helps? Whether the tool supports or provides security annotations or traceability to CVE references. | 10 |
| 5 | Approach | Start-ups need to present their solution based on: (a) Methodologies used, (b) Architecture, (c) Scalability, (d) Resource Utilisation. | 10 |
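The processing-time criterion in the Stage II table is expressed as average analysis time per line or per KB/MB of code. The sketch below shows one way to compute the per-KB figure; `analyze_file` is a hypothetical stand-in for a participant's own scanning routine.

```python
import os
import time

# Minimal sketch of the Stage II throughput metric: average analysis time per
# KB of source code. `analyze_file` is a hypothetical stand-in for the
# participant's own scanning routine.
def seconds_per_kb(paths: list[str], analyze_file) -> float:
    total_seconds, total_kb = 0.0, 0.0
    for path in paths:
        start = time.perf_counter()
        analyze_file(path)
        total_seconds += time.perf_counter() - start
        total_kb += os.path.getsize(path) / 1024
    return total_seconds / total_kb
```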
STAGE – III

| Ser No. | Evaluation Parameters | Remarks | Weightage (%) |
|---|---|---|---|
| 1 | No. of vulnerabilities detected | Three categories of applications: Standalone, Mobile and Web applications. | 40 |
| 2 | Languages supported (for standalone software, web applications)* | Whether the solution supports all languages mentioned in Stage I and Stage II. | 25 |
| 3 | Mitigation Measures Suggested and functionality of automated code correction | Yes or No. If yes, whether they can be implemented without hampering the functionality and security of the software itself. | 20 |
| 4(a) | Scalability of the tool | Whether the tool is scalable in terms of deployment across enterprises or use on larger open-source software, or whether it is just a prototype that cannot be scaled up. | 15 (combined for 4(a)-(c)) |
| 4(b) | Documentation of the tool | What is the quality of the tool documentation in terms of its user manual, etc.? | |
| 4(c) | Usability of the tool | Whether the tool has a user-friendly UI so that the learning curve to use the tool is minimal and it can be easily installed and run to achieve the end objective. | |
*The tool should meet the minimum threshold set for the particular language.
Submission Format

1. The participants are required to submit the findings of the first stage in an Excel sheet in the following format (a minimal generation sketch follows below):
| Ser | Name of Application Tested | Language | Vulnerability Found | CVE | File Name of Code | Line | Detection Accuracy |
|---|---|---|---|---|---|---|---|
2. The file name should be GC_PS_01_Startup_name.
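A minimal sketch for generating the submission sheet in the required column order is shown below; it assumes pandas with an Excel writer (such as openpyxl) is installed, and the single row shown is purely illustrative, to be replaced by a participant's own findings.

```python
import pandas as pd

# Sketch of producing the Stage I submission sheet with the required columns
# and file name; assumes pandas with an Excel writer (e.g., openpyxl) is
# installed, and the single row shown is purely illustrative.
columns = [
    "Ser", "Name of Application Tested", "Language", "Vulnerability Found",
    "CVE", "File Name of Code", "Line", "Detection Accuracy",
]
findings = [
    [1, "sample-webapp", "Python", "SQL Injection", "CVE-2021-XXXX",
     "views.py", 42, 0.91],  # hypothetical example row
]
pd.DataFrame(findings, columns=columns).to_excel(
    "GC_PS_01_Startup_name.xlsx", index=False
)
```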
IV. Indicative Datasets for Training and Testing and Evaluation
1. It may be noted that, since this problem statement focuses on the detection of vulnerabilities (malicious code) in the source code of software (especially open source), the participating teams are free to choose datasets available in the open domain for training their respective models and improving the functionality of the tool.
2. Three types of software code (standalone software, web applications and mobile applications) are envisaged to be tested using the tool, and the performance of the tool would be evaluated based on the output generated by the tool and fulfilment of the stage-wise evaluation parameters.
3. The common programming languages on which the performance of the tool would be evaluated are mentioned in the evaluation parameters matrix. The categories of applications have also been listed, viz. Mobile Applications, Standalone Software Applications and Web-based Applications.
4. To help participating startups train their models and test their performance, some illustrative datasets available in the open domain for all categories of applications are listed below:
| Ser No. | Dataset Name | Description/Link/Remarks |
|---|---|---|
| 1 | Software Assurance Reference Dataset (SARD) | Real and synthetic vulnerable code samples. https://samate.nist.gov/SARD/ |
| 2 | Devign | GitHub-based dataset with vulnerable/non-vulnerable labels. https://github.com/epicosy/devign |
| 3 | CodeXGLUE (Defect) | Dataset for defect prediction and repair. https://github.com/microsoft/CodeXGLUE |
| 4 | Multi-language dataset (Oct 2024) | Supports C, C++, Java, JS, Go, PHP, Ruby, Python with CWE/CVE labels and patches (Zenodo). https://zenodo.org/records/13870382 |
| 5 | MegaVul (C/C++) | Vulnerabilities from repositories, CVE-linked, JSON format, ideal for detection and severity tasks. https://github.com/Icyrockton/MegaVul |
| 6 | DiverseVul | Vulnerable functions across CWE types plus non-vulnerable examples, excellent for fine-grained CWE classification. https://github.com/wagner-group/diversevul |
| 7 | GITHUB Vulnerability Dataset Open Source | https://github.com/CAE-Vuldataset/CAE-Vuldataset |
| 8 | GitHub – Vulnerability Dataset: A dataset for vulnerability detection and program analysis | https://github.com/ppakshad/VulnerabilityDataset |
| 9 | Real CVE Patches | From GitHub projects or the National Vulnerability Database (NVD). |
5. For evaluation of the tools submitted by the startups, datasets would be
selected either from the above collections or similar collections of open-source
software source codes. The evaluation across the three stages would be limited
to the parameters and languages mentioned in the evaluation parameters and
criteria.
6. For shortlisting, the startups would be given the datasets 4 days prior to the Stage I submission deadline (datasets released on 28 Oct 2025 at 10:00 AM; deadline 31 Oct 2025, midnight), and the startups are required to submit the results of their respective tools based on these datasets. A private leaderboard will be maintained, and at most the top 15-20 startups would be selected for the final evaluation of Stage I. The list of shortlisted participants will be published along with the cutoff score as per the evaluation criteria; participants' individual scores will be shared over email. The number may vary based on the overall performance, at the discretion of the Jury for this Problem Statement.
7. The shortlisted startups would be called to demonstrate the practical capabilities of their tools, either in person or through VC, and the performance of the respective tools would be evaluated using 'Holdout' datasets to select the final 6 winners (max) of Stage I. The Holdout datasets would be released at the end of Stage I, just before this evaluation.
8. The demonstration and evaluation of Stage II and Stage III would be in physical mode only, in accordance with the Evaluation Matrix of Stages II and III. The test datasets for these would be released during evaluation.
Note - The startups using these datasets are required to adhere to the terms and
conditions of usage of these datasets as mentioned on their websites.