Sbom dsn24

This paper conducts a large-scale differential analysis of four popular Software Bill of Materials (SBOM) generators to assess their correctness, revealing significant inconsistencies and omissions in the generated SBOMs. The authors introduce a parser confusion attack that highlights vulnerabilities in these tools and propose best practices and a benchmark for improving SBOM generation. The findings emphasize the need for accurate SBOMs to enhance vulnerability detection and compliance in software supply chains.


On the Correctness of Metadata-based SBOM Generation: A Differential Analysis Approach

Sheng Yu∗†, Wei Song†, Xunchao Hu†, Heng Yin∗†
∗University of California, Riverside
†Deepbits Technology Inc.
syu061@ucr.edu, wei@deepbits.com, xchu@deepbits.com, heng@cs.ucr.edu

Abstract—Amidst rising concerns of software supply chain attacks, the Software Bill of Materials (SBOM) has emerged as a pivotal tool, offering a detailed listing of software components to manage vulnerabilities, dependencies, and licensing. While many SBOM generation tools are extensively used in both commercial and open-source realms, the correctness of these tools remains largely unscrutinized. To date, there has not been a systematic study addressing the correctness of contemporary SBOM generation solutions. In this paper, we conduct a large-scale differential analysis of the correctness of four popular SBOM generators. Surprisingly, our evaluation reveals all four SBOM generators exhibit inconsistent SBOMs and dependency omissions, leading to incomplete and potentially inaccurate SBOMs. Moreover, we construct a parser confusion attack against these tools, introducing a new attack vector to conceal malicious, vulnerable, or illegal packages within the software supply chain. Drawing from our analysis, we propose best practices for SBOM generation and introduce a benchmark to steer the development of more robust SBOM generators.

I. INTRODUCTION

Software Supply Chain Attacks (e.g., SolarWinds [18], PyTorch dependency confusion attack [9]) have increased by 742% between 2019 and 2022 [16]. In 2022 alone, 185,572 software packages were affected by these attacks [1]. The lack of visibility and transparency in the software supply chain makes defending against such attacks challenging. Recently, the Software Bill of Materials (SBOM) [10], a list of "ingredients" used to build software, has demonstrated its efficacy in protecting the software supply chain by enhancing visibility from software development to consumption. Driven by regulations, such as Biden's executive order [3] and the National Cybersecurity Implementation Plan [7], the industry is adopting SBOM-based solutions to safeguard the software supply chain.

An essential step in adopting SBOM is to generate accurate SBOMs. While SBOMs have the potential to enhance vulnerability detection and facilitate license compliance, these benefits can only be realized if the SBOMs themselves are precise and correct. Discrepancies or omissions in the SBOM can lead to false assurances of security or compliance, exposing systems to potential risks. Many SBOM generation tools [4], [6], [12], [13] are extensively used in both commercial and open-source realms. However, the correctness of these tools remains largely unscrutinized. To date, there has not been a systematic study addressing the correctness of contemporary SBOM generation solutions.

Given the diversity of programming languages, build tools, and development practices, constructing a ground truth for SBOM generation evaluation is inherently challenging. In this paper, we adopt a differential analysis approach: we analyze the discrepancies in SBOMs produced by different tools for the same software to assess both their correctness and weaknesses in SBOM generation. More specifically, we 1) select four popular SBOM generators: Trivy [13], Syft [12], Microsoft SBOM Tool [6], and GitHub Dependency Graph [4]; 2) collect 7,876 open-source projects written in Python, Ruby, PHP, Java, Swift, C#, Rust, Golang and JavaScript; 3) evaluate the correctness of the SBOMs by conducting a differential analysis on the outputs from these four tools.

Surprisingly, our evaluation reveals all four SBOM generators exhibit inconsistent SBOMs and dependency omissions, leading to incomplete and potentially inaccurate SBOMs. Moreover, we construct a parser confusion attack against these tools, introducing a new attack vector to conceal malicious, vulnerable, or illegal packages within the software supply chain. To assist in creating more effective SBOM generators, we have developed best practices for SBOM generation and a benchmark to facilitate their development based on our evaluation findings.

In summary, we make the following contributions in this paper:
• We are the first to conduct a large-scale differential analysis to examine the correctness of SBOM generation solutions.
• Our evaluation reveals significant deficiencies in current SBOM generators. We also conduct a comprehensive case study to uncover how each SBOM tool detects dependencies during the generation process.
• We construct a parser confusion attack against SBOM generators, introducing a new attack vector to inject malicious, vulnerable, or illegal software packages into the software supply chain.
• We develop best practices for developing SBOM generators and a benchmark to facilitate their development.

Acknowledgments: We thank the anonymous reviewers and our shepherd Yuchen (Dennis) Zhang for their valuable feedback. This work is supported, in part, by the Department of Homeland Security under OTA#70RSAT23T00000013. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

II. BACKGROUND

A. Software Bill of Materials

An SBOM [10] is a formal, machine-readable inventory of software components and dependencies that includes information about those components and their hierarchical relationships. It can be shared and exchanged automatically among stakeholders (e.g., software vendors and consumers) to enhance software development, software supply chain management, vulnerability management, asset management, and procurement. This results in reduced costs, security risks, license risks, and compliance risks.

SBOM Types: Based on the stages of the software lifecycle at which SBOMs are generated, they can be categorized into six types [14]: Design, Source, Build, Analyzed, Deployed, and Runtime. Depending on what information is available in each stage, these types of SBOMs focus on different aspects. In this paper, we evaluate Source SBOM, a type of SBOM derived from the development environment. It mainly contains dependencies used for development and compilation, and is widely supported by SBOM tools. Also, our survey suggests that, owing to its simplicity and precision, metadata parsing is the industry's leading SBOM generation technique. Thus, this paper focuses on the Source SBOM generated using the metadata-based approach.

SBOM Applications: The increasing complexity and interdependence in software development have amplified the importance of SBOMs. SBOMs provide clarity by listing software components, facilitating swift vulnerability tracking and identification for developers and security professionals. Their compatibility with Vulnerability Exploitability eXchange (VEX) [15], a structured database detailing product vulnerabilities, is noteworthy. Additionally, the comprehensive dependency information aids in license assessment, ensuring compliance and mitigating legal exposures. SBOMs enable quality assessment of closed-source software through component reputation checks, and their transparency fortifies the software supply chain by thwarting the introduction of potential backdoors and vulnerabilities via third-party components.

B. Metadata

At the heart of Source SBOM generation lies metadata, an important element in modern software development. Metadata files encapsulate parameters, settings, dependencies, and version constraints, all of which are indispensable for reproducibility and consistent and reliable deployment, and offer support for package management, version control, and even automated build processes. Nowadays, almost every programming language comes with at least one package manager, and each package manager defines its own metadata.

At a high level, there are two kinds of metadata. One is "raw" metadata, where only direct dependencies are specified and their versions are often given as a range or a constraint instead of a specific (pinned) one. Raw metadata, such as requirements.txt for Python and package.json for Node.js, are mainly for dependency declaration while ensuring a degree of flexibility and future compatibility. The other type is the lockfile, such as package-lock.json for Node.js. Lockfiles focus on providing a precise and deterministic snapshot of the exact dependency tree including transitive dependencies. Locking prevents unexpected updates or changes in the dependencies when installing the project across different environments, ensuring reproducibility and avoiding compatibility issues. Although lockfiles contain the richest information for SBOM generation, they are not always available. Library developers are discouraged from sharing lockfiles, which could otherwise lead to version conflicts. Some package managers lack a native locking mechanism. Without lockfiles, the missing transitive dependencies and pinned versions make it very difficult for SBOM tools to generate accurate and complete SBOM files.

III. METHODOLOGY

Despite the growing significance and adoption of SBOMs, a notable gap exists in systematically assessing the quality of the SBOM files generated. The reliability of security-centric applications, including vulnerability detection and license compliance, highly depends on the correctness of SBOM data, which raises concerns regarding the trustworthiness of such information.

This work aims to investigate the correctness and completeness of the dependency information present in generated SBOMs. The objective is to not only measure the correctness but also to unravel the underlying factors contributing to high-quality SBOMs. Due to the lack of ground truth, we adopt a differential analysis approach to obtain insights into the performance of SBOM generators.

A. SBOM Generators

In this work, we evaluate four SBOM tools: Trivy 0.43.0, Syft 0.84.1, Microsoft SBOM Tool (sbom-tool) 1.1.6, and GitHub Dependency Graph (GitHub DG). Notably, the first three are popular open-source projects and offer cross-platform support for Linux, Windows, and Mac operating systems. Conversely, the GitHub Dependency Graph is intricately integrated with GitHub repositories. We choose Trivy and Syft because they are the de facto SBOM generators used by industries and open-source communities. We pick the Microsoft SBOM Tool because it is developed by Microsoft. Similarly, the GitHub Dependency Graph is chosen because it is provided by the most widely used Git platform. All the evaluated SBOM tools implement metadata-based approaches, meaning they read metadata files and extract dependency information declared in the metadata files.

B. Setup

The evaluation was conducted by downloading popular GitHub repositories associated with each programming language onto the local file system and subsequently scanning the repository directories using the SBOM tools. Each tool generates an SBOM report in either CycloneDX [8] or SPDX [5] format, depending on which format the tool supports. Dependencies in these reports are then extracted and compared against each other.

Dataset: GitHub repositories were sourced from the well-regarded awesome-LANGUAGE repositories, which are uniquely tailored to the respective programming languages. Our dataset contains 535 Python, 819 Ruby, 384 PHP, 398 Java, 1,019 Swift, 700 C#, 994 Rust, 2,367 Golang, and 660 JavaScript repositories. We do not evaluate C/C++ projects due to the absence of an "official" build toolset and extremely limited support provided by the SBOM tools. C/C++ projects can be configured and built via various tools such as Bazel, Makefile, CMake, Visual Studio project files, and more. Consequently, Trivy and Syft only analyze conan.lock, while GitHub Dependency Graph exclusively focuses on *.vcxproj files.

Metrics: For our large-scale evaluation, given the absence of ground truth, we adopt a differential analysis approach. First, we compare the number of dependencies reported by each SBOM tool. We then use Jaccard similarity to measure the reported dependency names. This tells us the degree of overlap and commonality among the dependencies reported by different tools. In addition, we identify duplicate packages reported by the SBOM tools. While these metrics may not provide a direct ranking, they do shed light on the performance of these tools.

IV. LARGE-SCALE SBOM COMPARISON

After analyzing 7,876 high-quality repositories, we made the following major findings. The reasons behind such discrepancies will be discussed in Section V.

A. Discrepancies in Package Counts within SBOM Reports Generated by Different Tools

The SBOM tools exhibited notable differences in the number of packages they identified. Figure 1 clearly depicts this variation. The x-axis is the repository ID sorted by the number of dependencies detected by the GitHub Dependency Graph. For Python, PHP, Ruby, and Rust, the GitHub Dependency Graph discovers the most packages. For .NET repositories, Microsoft SBOM Tool identified the most packages, which is unsurprising as it is tailored to Microsoft's own projects. For the Go and Swift languages, Trivy and Microsoft SBOM Tool proved to be the frontrunners, consistently identifying the most packages in the majority of cases. Syft detects the highest number of packages in JavaScript repositories. The disparities presented in this figure underscore that different tools possess varying capabilities and strategies in identifying dependency packages across different programming languages. It is important to note, however, that identifying more packages is not necessarily better, because false positives may also be included.

B. Low Package Jaccard Similarities

To measure whether the SBOM tools detect similar dependencies for each repository, we compute a Jaccard similarity for each SBOM tool pair for each repository, as Equation 1 shows. A and B are two sets of dependencies generated by two different SBOM tools. Each set contains dependency (name, version) pairs.

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{1}$$

Our evaluation result is illustrated in Figure 2. The majority of these pairs show significant dissimilarity, with only a very small portion being similar. As shown in Figure 2(a), the GitHub Dependency Graph and Syft have the most similarities among them, although the majority of SBOM reports still exhibit substantial differences.

C. Duplicate Packages in SBOMs

During our analysis of the generated SBOMs, we identified instances of duplicate packages: the same package appearing in different entries with varying or the same version requirements. To ensure accurate calculations, we excluded repositories in which tools could not find any packages.

Table I presents the rate of duplicate packages for each SBOM tool. The problem is widespread across all four tools. However, it is important to note that having duplicate packages is expected in some cases. For example, a repository may contain multiple independent projects that happen to have a common subset of dependencies.

TABLE I: Rate of Duplicate Packages in SBOMs

Language     Syft     Trivy    GitHub DG  sbom-tool
Python       14.05%   12.56%   13.54%     13.71%
Java         12.76%   15.01%   19.93%     18.89%
JavaScript   17.46%   17.34%   18.89%     19.42%
Go            9.97%    6.69%   11.03%      6.58%
.NET         17.38%   12.43%   18.01%     20.94%
PHP          13.76%   11.77%   14.53%     23.76%
Ruby         13.56%    9.1%    15.84%     12.39%
Rust         13.19%   11.37%   19.18%     13.83%
Swift         1.37%    2.28%    6.98%      3.39%

V. SBOM GENERATION ANALYSIS

To uncover the root causes behind the large disparities in SBOM outputs, we conducted an in-depth analysis of the source code of the SBOM tools. Our examination revealed several critical issues in SBOM generation, which are summarized below.

A. Limited Support for Metadata

All the evaluated tools employ a metadata-based approach where they analyze metadata to identify the components used in the project. The supported metadata file types for each tool are detailed in Table II. It is important to note that the table indicates the tools' actual capability to extract dependencies from metadata, which may differ from their claims.
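
As a rough illustration of what a metadata-based approach has to work with, the sketch below labels the files of a repository as raw metadata or lockfiles. It is a minimal sketch under assumptions: the file names come from this paper, the raw/lockfile grouping follows the description in Section II-B, and `classify`, `summarize`, and the example repository listings are hypothetical names introduced here rather than part of any evaluated tool.

```python
from pathlib import PurePosixPath

# File names taken from this paper; the grouping into "raw" metadata and
# lockfiles follows Section II-B and is an assumption of this sketch.
RAW_METADATA = {"requirements.txt", "package.json", "Gemfile", "Cargo.toml"}
LOCKFILES = {"poetry.lock", "package-lock.json", "yarn.lock",
             "Gemfile.lock", "Cargo.lock"}


def classify(path):
    """Return "raw", "lockfile", or None for a repository-relative path."""
    name = PurePosixPath(path).name
    if name in LOCKFILES:
        return "lockfile"
    if name in RAW_METADATA:
        return "raw"
    return None


def summarize(paths):
    """Summarize which kinds of metadata a repository contains."""
    kinds = {kind for kind in map(classify, paths) if kind is not None}
    if not kinds:
        return "no metadata"
    if kinds == {"raw"}:
        return "raw metadata only"
    return "lockfile available"


# Hypothetical repository listings, used only to exercise the functions.
python_repo = ["README.md", "requirements.txt", "src/app.py"]
node_repo = ["package.json", "package-lock.json", "web/package.json"]
print(summarize(python_repo))
print(summarize(node_repo))
```

A repository in the "raw metadata only" group offers a tool neither pinned versions nor transitive dependencies, which is why support for raw metadata matters for the comparison that follows.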

[Figure: nine log-scale panels, (a) Python, (b) Java, (c) JavaScript, (d) Go, (e) .Net, (f) PHP, (g) Ruby, (h) Rust, (i) Swift, each plotting Package Count (y-axis) against Repositories (x-axis) for Syft, Trivy, GitHub, and sbom-tool.]
Fig. 1: Comparison of Package Counts Across Languages Using Various SBOM Generators

Table II illustrates that each tool supports only a subset of commonly used metadata files. Overall, the SBOM tools have good support for lockfiles, in which transitive dependencies and pinned versions are available, but they struggle with raw metadata. The GitHub Dependency Graph has the best support for raw metadata such as Gemfile and Cargo.toml, while other tools show limited or no support for raw metadata. Despite claims by Trivy and Syft to support package.json, they do not extract dependencies from the JSON file. In our evaluation, we found that 93% of Python repositories, 47% of JavaScript repositories, and 56% of Rust repositories contain raw metadata only.

B. Incomplete Metadata Parsing

Our evaluation shows that all the evaluated SBOM tools implement custom parsers for metadata. However, certain metadata, like requirements.txt, whose dependency specifiers are defined in PEP 508, pose challenges due to their complex syntax. The self-implemented parsers only support common syntaxes, leading to false negatives. For instance, the lack of support for the backslash "\" as a line continuation in all the SBOM tools causes parsing errors, resulting in incorrect versions or missed dependencies. About 1.8% of Python repositories are affected by this.

C. Transitive Dependency

The offline nature of SBOM tools (except Microsoft SBOM Tool) implies a lack of attempts to resolve transitive dependencies. In the case where lockfiles are not present, the absence of transitive dependencies will adversely affect SBOM applications. Microsoft SBOM Tool attempts to resolve transitive dependencies by querying package managers for each detected dependency, but this functionality is not well-implemented and often fails to retrieve dependency information from package managers. About 74% of Python dependencies are transitive dependencies.
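
To make the gap concrete, the toy example below computes what a tool misses when it reports only declared dependencies. It is a sketch under stated assumptions: `REGISTRY` and its package names are hypothetical stand-ins for the information a package manager would return, and `transitive_closure` is a name introduced here, not a function of any evaluated tool.

```python
from collections import deque

# Hypothetical registry data: package -> packages it directly depends on.
REGISTRY = {
    "web": ["http", "templating"],
    "orm": ["db-driver"],
    "http": [],
    "templating": ["markup"],
    "db-driver": [],
    "markup": [],
}


def transitive_closure(declared, registry):
    """Return the declared packages plus everything reachable from them."""
    seen = set(declared)
    queue = deque(declared)
    while queue:
        package = queue.popleft()
        for dependency in registry.get(package, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


# Dependencies as they would appear in raw metadata (direct ones only).
declared = ["web", "orm"]
resolved = transitive_closure(declared, REGISTRY)
transitive_only = resolved - set(declared)

print(sorted(transitive_only))
print(f"{len(transitive_only)} of {len(resolved)} packages are transitive")
```

A tool that stops at `declared` never sees the packages in `transitive_only`, so any vulnerability or license issue in them stays invisible to downstream SBOM consumers.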

[Figure: six histograms titled "Distribution of Jaccard Similarity", each plotting Frequency (y-axis) against Jaccard Similarity (x-axis, 0.0 to 1.0): (a) GitHub vs. Syft, (b) GitHub vs. Trivy, (c) Syft vs. Trivy, (d) GitHub vs. sbom-tool, (e) Trivy vs. sbom-tool, (f) Syft vs. sbom-tool.]
Fig. 2: Distribution of Jaccard Similarity among Various Tools

TABLE II: Supported File Types

Language  File               Trivy  Syft  sbom-tool  GitHub DG
Go        go.mod             ✓      ✓     ✓          ✓
Go        Go executable      ✓      ✓     ✗          ✗
Java      pom.xml            ✓      ✓     ✓          ✓
Java      gradle.lockfile    ✓      ✓     ✓          ✓
Java      MANIFEST.MF        ✓      ✓     ✗          ✗
Java      pom.properties     ✓      ✓     ✗          ✗
JS        package.json       ✗      ✗     ✗          ✓
JS        package-lock.json  ✓      ✓     ✓          ✓
JS        yarn.lock          ✗      ✓     ✓          ✓
JS        pnpm-lock.yaml     ✗      ✓     ✓          ✗
PHP       composer.json      ✗      ✗     ✗          ✓
PHP       composer.lock      ✓      ✓     ✗          ✓
Python    requirements.txt   ✓      ✓     ✓          ✓
Python    poetry.lock        ✓      ✓     ✓          ✓
Python    pipfile.lock       ✓      ✓     ✓          ✓
Python    setup.py           ✗      ✗     ✗          ✓
Ruby      Gemfile            ✗      ✗     ✗          ✓
Ruby      Gemfile.lock       ✓      ✓     ✓          ✓
Ruby      .gemspec           ✓      ✓     ✓          ✓
Rust      Cargo.toml         ✗      ✗     ✗          ✓
Rust      Cargo.lock         ✓      ✓     ✓          ✓
Rust      Rust executable    ✓      ✓     ✗          ✗

D. Limited Support for Version Constraints

Raw metadata often contains version ranges or constraints instead of pinned versions; for example, developers use >=1.2.3 <2.0.0 to get the latest version while ensuring backward compatibility. Trivy and Syft handle version constraints by silently discarding dependencies without pinned versions, resulting in false negatives. The GitHub Dependency Graph reports version ranges as they appear in the metadata, introducing additional parsing challenges for SBOM management. In our evaluation, only 46% of dependencies declared in requirements.txt have pinned versions, indicating that Trivy and Syft may miss more than half of the dependencies even when transitive dependencies are not considered. Microsoft SBOM Tool addresses this by pinning a version after querying the corresponding package manager for the latest version within the specified range.

E. Inconsistent Package Naming Convention

When dealing with packages having compound names, SBOM tools name them differently. For Java, a package is located using the group ID and artifact ID. Syft uses the artifact ID as the package name, Microsoft SBOM Tool concatenates the group and artifact ID with a dot "." as the package name, while Trivy and the GitHub Dependency Graph use a colon ":" for this purpose. Similarly, the Swift package manager CocoaPods supports subspecs when declaring a dependency. Subspecs are a way of chopping up the functionality of a library, allowing people to install a subset of the library. Syft and Trivy report the subspecs, while Microsoft SBOM Tool reports their main dependency names. Additionally, Golang uses a leading letter "v" when specifying versions (e.g., v1.0.0). Syft and Microsoft SBOM Tool adhere to this convention, while Trivy and the GitHub Dependency Graph omit this leading letter. Such inconsistencies can potentially compromise the accuracy of vulnerability detection.

F. Different Dependency Definition

SBOM tools employ different strategies regarding whether to include development dependencies (e.g., test suites, linters, etc.) in SBOM files. Trivy focuses solely on production dependencies and ignores development dependencies, whereas Syft and GitHub Dependency Graph include both types. Our evaluation reveals that in JavaScript, 76% of dependencies declared in package.json are development dependencies. It is crucial to note that there is no definitive answer regarding which approach is better. Including development dependencies in the SBOM report offers several advantages, such as more comprehensive vulnerability assessments and license violation checks, but it may also introduce false alarms as the code of development dependencies rarely goes into the final product. The root problem lies in the absence of a field in SBOM formats representing the dependency scope. While most metadata have distinct fields for this purpose, such as the scope field in pom.xml and the devDependencies in package.json, the current SBOM formats lack this support and may cause confusion in downstream applications.

G. Multiple Projects and Metadata

Our evaluation indicates that, on average, over 10% of the detected dependencies appear more than once in a repository, causing duplicate entries in SBOM files. This is primarily due to multiple metadata files present in a repository, either because the repository has multiple subprojects or submodules, or because both raw metadata and lockfiles are present. The SBOM tools analyze metadata individually without merging dependencies in the same project. Duplicate entries in SBOMs can lead to confusion and potentially inflate the apparent package count. Our evaluation shows that there are 5.7 metadata files in a Python repository and 12.8 metadata files in a JavaScript repository on average.

H. Accuracy on Ground Truth

Our large-scale evaluation employed a differential analysis due to the lack of ground truth. In this section, we quantify the accuracy of each SBOM tool on requirements.txt using our manually crafted ground truth. The ground truth is obtained by dry-running pip install (Python 3.11, pip 23.1.2), and we count a reported dependency as correct only when its (name, version) pair matches. Dry run simulates the installation process, and the dependencies reported by pip install are those that will be installed in our environment. This evaluation aims to highlight the differences between the reported libraries and the ones actually installed.

The evaluation result is presented in Table III. Most SBOM tools fail to detect over 90% of the dependencies in requirements.txt due to incomplete syntax support and the lack of transitive dependency resolution. The Microsoft SBOM Tool excels in this test because it attempts to resolve transitive dependencies, but it ignores the extras field, and OS and Python requirements. The low recall suggests that relying solely on these SBOM tools in practice may have serious negative impacts on downstream applications, such as vulnerability detection and license violation checks.

TABLE III: SBOM Accuracy on requirements.txt

            Trivy  Syft  sbom-tool  GitHub DG
Precision   0.25   0.25  0.74       0.13
Recall      0.10   0.10  0.73       0.08

VI. PARSER CONFUSION ATTACK

Motivated by the findings in Section V-H, we present a parser confusion attack [20] to illustrate how adversaries can obscure malicious dependencies. A parser confusion attack exploits inconsistencies among different parsers processing the same input, enabling malicious actors to craft input that is benign for one parser but harmful for another. Our case study shows that SBOM tools, employing custom metadata parsers, introduce a new attack vector for constructing parser confusion attacks within the SBOM ecosystem. In this study, we use Python's requirements.txt as an illustrative example.

Constructing the attack: Given that requirements.txt lacks a locking mechanism and exhibits a rich syntax, it becomes a suitable candidate for this type of attack. For instance, none of the SBOM tools support the backslash as a line continuation; Trivy and Syft rely on the double-equal sign to separate package names and versions; installations from wheel packages are not universally supported; and many more. Table IV provides some input patterns that can be used to bypass detections based on our manual analysis and benchmark (discussed in Section VII). It shows how attackers can leverage different syntax elements to either conceal specific dependencies or confuse SBOM tools, leading to inaccurate results. In the table, a dash ("-") signifies that the corresponding SBOM tool cannot detect anything from the given dependency declaration.

TABLE IV: requirements.txt Attack Samples

Input                        Trivy  Syft  sbom-tool     GitHub DG
requests [security]>=2.8.1   -      -     -             -
numpy \                      -      -     numpy 1.25.2  -
==\
1.19.2
-r SOME REQS.txt             -      -     -             -
./path/to/local pkg.whl      -      -     -             -
https://remote pkg.whl       -      -     -             -
urlib3 @ git link@hash       -      -     -             -

Achieving Damage: When the SBOM tools encounter unsupported syntax, the default behavior is to silently ignore the associated dependency. Adversaries can exploit this and inject malicious or vulnerable dependencies in metadata using unsupported syntax, effectively evading the tools' detection entirely. In our dataset, the two most common patterns are installing from other requirement files (-r) and installing from version control systems, each appearing in over 50 requirements.txt files.
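
The mechanism can be sketched with two small parsers run over the same input. This is only an illustration under assumptions: `naive_parse` is a simplified stand-in modeled on the behavior described above (split on the double-equal sign, silently skip anything else) and is not the actual code of Trivy or Syft; `continuation_aware_parse` handles just two syntax features (backslash continuations and extras) and is far from a complete requirements.txt parser; `flask` is an arbitrary package chosen for the example.

```python
import re

# Example input: one plainly pinned package, one pin hidden behind
# backslash line continuations, and one dependency with extras and a range.
REQUIREMENTS = "\n".join([
    "flask==2.3.2",
    "numpy \\",
    "==\\",
    "1.19.2",
    "requests[security]>=2.8.1",
])


def naive_parse(text):
    """Line-by-line parser that only understands `name==version`."""
    found = set()
    for line in text.splitlines():
        name, separator, version = line.partition("==")
        if not separator or not name.strip() or not version.strip():
            continue  # unsupported syntax is silently ignored
        found.add((name.strip(), version.strip()))
    return found


def logical_lines(text):
    """Join physical lines that end with a backslash into one logical line."""
    buffer = ""
    for raw in text.splitlines():
        stripped = raw.rstrip()
        if stripped.endswith("\\"):
            buffer += stripped[:-1]
            continue
        buffer += stripped
        if buffer.strip():
            yield buffer.strip()
        buffer = ""
    if buffer.strip():
        yield buffer.strip()


REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*)$")


def continuation_aware_parse(text):
    """Return (name, version specifier) pairs after joining continuations."""
    found = set()
    for line in logical_lines(text):
        match = REQUIREMENT.match(line)
        if match:
            found.add((match.group(1), match.group(3).strip()))
    return found


naive = naive_parse(REQUIREMENTS)
aware = continuation_aware_parse(REQUIREMENTS)
hidden = {name for name, _ in aware} - {name for name, _ in naive}
print("naive parser sees:", sorted(naive))
print("hidden from naive parser:", sorted(hidden))
```

Both hidden packages would still be installed by a package manager that understands the full syntax, which is exactly the disagreement between parsers that the attack relies on.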

VII. BEST PRACTICE AND BENCHMARK

Drawing from our evaluation, we present what we believe are the best solutions to address the identified issues and minimize the attack surface. We propose the following best practices for metadata-based approaches:

Package Manager Dry Run for Lockfile Generation: The root cause of the large discrepancies lies in the limitations of self-implemented parsers, particularly in their support for metadata and metadata syntax. Instead of relying on these parsers, we recommend employing a package manager dry run to generate lockfiles. This simulates the dependency installation process, providing both transitive dependencies and accurate version information for each package. Adopting this approach ensures the creation of a precise and reliable SBOM file, thereby enhancing resilience against confusion attacks.

PURL and CPE Support: Each dependency should include a PURL (Package URL) entry and a CPE (Common Platform Enumeration) entry for a consistent package naming convention, maximum compatibility with vulnerability databases, and easier software identification.

Our evaluation benchmark is available on GitHub at https://github.com/DeepBitsTechnology/sbom-benchmark. This benchmark includes manually crafted metadata files and ground truth datasets for common languages. These metadata files try to cover all supported syntaxes for each language, and can be used to evaluate the SBOM tools' capability to handle corner cases. This initiative aims to guide the development of SBOM tools, emphasizing completeness and accuracy. We are working on adding support for more programming languages.

VIII. DISCUSSION

This study aims to assess the quality of SBOMs produced by widely used SBOM tools. Our analysis exposes deficiencies in the SBOM generation process employed by these tools. Trivy, Syft, and GitHub Dependency Graph do not identify transitive dependencies or determine an appropriate version when no pinned version is provided. In contrast, the Microsoft SBOM Tool reaches out to package managers to validate package

It is worth mentioning that our evaluation was specifically limited to a subset of SBOM tools, namely Trivy, Syft, Microsoft SBOM Tool, and GitHub Dependency Graph. Despite our careful selection of these prominent tools, the dynamic and ever-evolving landscape of SBOM generation solutions implies that our findings may not cover the entirety of available options. There is a possibility that subtle variations presented by other tools might have been inadvertently overlooked.

While metadata-based SBOM generation is relatively simple to implement, this approach has inherent limitations. First, declared dependencies may only be partially built into the final product or not be used at all, potentially leading to false alarms. Transitive dependencies are not well-captured, causing false negatives. Moreover, developers might add code directly to the project for experiments or testing, and metadata-based approaches are unable to detect such cases. We recommend implementing def-use analysis to determine whether each library within the project has been used or not. Additionally, code clone detection [26], [27], [33] can identify libraries introduced via copy & paste. Employing these techniques helps eliminate false positives and false negatives, enhancing the overall correctness of the SBOM.

IX. RELATED WORK

Software Supply Chain Attacks: Malicious [30] or vulnerable packages [21] have resulted in increasing [28] software supply chain attacks (SolarWinds [18], NotPetya [24], etc.). Various approaches have been proposed [31], [32]. SBOM [10] demonstrates its efficiency in managing risks in the software supply chain and has been advocated by both the industry and government stakeholders [3], [7].

SBOM & Vulnerability Exploitability eXchange (VEX): VEX, as defined by NTIA, is a "companion artifact" to an SBOM [15], allowing manufacturers to share product vulnerability exploitability in a standardized, automatable format. Ahmed et al. [17] applied SBOM tools to assess how code debloating reduces vulnerabilities in Docker images. Numerous tools (DependencyTrack [2], DeepSCA [11], Nadgowa [29], Girdha [23], etc.) have been developed to support SBOM generation and consumption. In particular, DeepSCA is a
names and ascertain a suitable version. complimentary online service that generates different types of
While conducting our evaluation, we encountered a sig- SBOMs and conducts risk analysis for most popular languages
nificant challenge stemming from the absence of a well- and platforms with or without the source code.
defined benchmark for accurately assessing the quality of the Software Composition Analysis (SCA) Apart from metadata-
generated SBOMs. Currently, the industry lacks a standardized based parsing, SCA is also a promising technique for generat-
dataset and uniform statistical methods for conducting evalu- ing SBOMs. When source code is available, SCA solutions
ations in this area. In response to this issue, we created our such as CENTRIS [33] and Tamer [26] can be combined
own dataset. with program analysis to identify components that are actively
Our experiment focuses on metadata-based Source SBOM invoked in the software, yielding more accurate SBOMs. When
generation on file system. It is important to note that certain the source code is not available, binary-focused SCA tools like
SBOM tools, such as Trivy, may exhibit different behaviors BAT [25], OSSPolice [22], B2SFinder [34], and LibScout [19]
depending on the specific targets of their scans. For example, utilize string literals and other language-specific features to
scanning metadata files is enabled for both file system and discern components in the examined binaries. Though their
git repository scans, while the activation of wheel packages is accuracy might not be optimal, they still enhance transparency
restricted to Docker image and Rootfs scans. to a certain degree.
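As an illustration of the package-manager dry-run and PURL best practices proposed earlier, the following minimal Python sketch turns a pip resolution report into pinned SBOM components. It is not the implementation used by any of the evaluated tools. It assumes a recent pip release that supports the `--dry-run` and `--report` options; the function names and the sample report are hypothetical and hand-written for illustration, and CPE entries are omitted because deriving them needs vendor information that the report does not carry.

```python
from urllib.parse import quote


def pip_dry_run_command(requirements_path, report_path):
    """Build (but do not run) a pip command that resolves dependencies
    without installing them and writes the result as a JSON report.
    Assumes a pip version that supports --dry-run and --report."""
    return [
        "python", "-m", "pip", "install",
        "--dry-run", "--ignore-installed",
        "--report", report_path,
        "-r", requirements_path,
    ]


def pypi_purl(name, version):
    """Format a Package URL for a PyPI package. Per the PURL convention
    for PyPI, the name is lowercased and '_' is replaced with '-'."""
    normalized = name.lower().replace("_", "-")
    return f"pkg:pypi/{quote(normalized, safe='')}@{quote(version, safe='')}"


def components_from_report(report):
    """Convert a parsed pip report (a dict) into SBOM-style components.
    Every resolved package is listed, including transitive ones."""
    components = []
    for item in report.get("install", []):
        meta = item["metadata"]
        components.append({
            "name": meta["name"],
            "version": meta["version"],
            "purl": pypi_purl(meta["name"], meta["version"]),
            # "requested" is True for packages named by the user,
            # False for dependencies pulled in transitively.
            "direct": bool(item.get("requested", False)),
        })
    return sorted(components, key=lambda c: c["purl"])


# Hand-written sample in the shape of a pip report (illustrative values only).
SAMPLE_REPORT = {
    "version": "1",
    "install": [
        {"requested": True,
         "metadata": {"name": "requests", "version": "2.31.0"}},
        {"requested": False,
         "metadata": {"name": "urllib3", "version": "2.0.7"}},
        {"requested": False,
         "metadata": {"name": "Typing_Extensions", "version": "4.8.0"}},
    ],
}

if __name__ == "__main__":
    print(" ".join(pip_dry_run_command("requirements.txt", "report.json")))
    for component in components_from_report(SAMPLE_REPORT):
        print(component["purl"], "direct" if component["direct"] else "transitive")
```

In practice the report would be produced by running the generated command and loading the resulting JSON file. Because the package manager itself resolves the versions, the generator does not need its own parser for each metadata syntax, which is the property the dry-run recommendation relies on.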

X. CONCLUSION AND FUTURE WORK

In this paper, we conducted the first large-scale differential analysis to examine the correctness of SBOM generation solutions. We generated SBOMs using four popular SBOM generators for 7,876 open-source projects and systematically studied the correctness of these SBOMs. Our evaluation uncovered significant deficiencies in current SBOM generators. Additionally, we identified the design flaws in each SBOM generator, and devised a parser confusion attack against these generators, introducing a new path for injecting malicious, vulnerable, or illegal packages. Finally, based on our findings, we established best practices for creating SBOM generators and introduced a benchmark to aid their development.

In the future, we plan to extend our benchmark to support languages beyond just Python. Additionally, we aim to establish a ranking system to qualitatively measure the quality of SBOM generators in the market, allowing security professionals to select the most suitable tools and SBOM generator vendors to evaluate and improve their offerings.

REFERENCES

[1] Annual number of software packages impacted by supply chain cyber attacks worldwide from 2019 to 2023 YTD. https://www.statista.com/statistics/1375128/supply-chain-attacks-software-packages-affected-global/.
[2] Dependency-Track. https://dependencytrack.org/.
[3] Executive order on improving the nation's cybersecurity. https://www.whitehouse.gov/briefing-room/presidential-actions/2021/05/12/executive-order-on-improving-the-nations-cybersecurity/.
[4] GitHub dependency graph. https://docs.github.com/en/code-security/supply-chain-security/understanding-your-software-supply-chain/about-the-dependency-graph.
[5] International open standard (ISO/IEC 5962:2021) - Software Package Data Exchange (SPDX). https://spdx.dev/.
[6] Microsoft SBOM tool. https://github.com/microsoft/sbom-tool.
[7] National cybersecurity strategy implementation plan. https://www.whitehouse.gov/wp-content/uploads/2023/07/National-Cybersecurity-Strategy-Implementation-Plan-WH.gov_.pdf.
[8] OWASP CycloneDX software bill of materials (SBOM) standard. https://cyclonedx.org/.
[9] PyTorch machine learning framework compromised with malicious dependency. https://thehackernews.com/2023/01/pytorch-machine-learning-framework.html.
[10] Software bill of materials. https://www.ntia.gov/page/software-bill-materials.
[11] Software supply chain arsenal. https://tools.deepbits.com/.
[12] Syft. https://github.com/anchore/syft.
[13] Trivy. https://trivy.dev/.
[14] Types of software bill of materials. https://www.cisa.gov/resources-tools/resources/types-software-bill-materials-sbom.
[15] Vulnerability-exploitability exchange (VEX) – an overview. https://www.ntia.gov/files/ntia/publications/vex_one-page_summary.pdf.
[16] Why 2023 is the year for software supply chain attacks. https://hadrian.io/blog/why-2023-is-the-year-for-software-supply-chain-attacks.
[17] F. A. Ahmed and D. Fatih. Security analysis of code bloat in machine learning systems. 2022.
[18] R. Alkhadra, J. Abuzaid, M. AlShammari, and N. Mohammad. Solar winds hack: In-depth analysis and countermeasures. In 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pages 1–7. IEEE, 2021.
[19] M. Backes, S. Bugiel, and E. Derr. Reliable third-party library detection in Android and its security applications. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 356–367, 2016.
[20] C. Carmony, M. Zhang, X. Hu, A. V. Bhaskar, and H. Yin. Extract me if you can: Abusing PDF parsers in malware detectors. In Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS'16), Feb. 2016.
[21] A. Decan, T. Mens, and E. Constantinou. On the impact of security vulnerabilities in the npm package dependency network. In Proceedings of the 15th International Conference on Mining Software Repositories, pages 181–191, 2018.
[22] R. Duan, A. Bijlani, M. Xu, T. Kim, and W. Lee. Identifying open-source license violation and 1-day security risk at large scale. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 2169–2185, 2017.
[23] S. Girdhar. Frankfurt University of Applied Sciences.
[24] A. Greenberg. The untold story of NotPetya, the most devastating cyberattack in history. Wired, August 22, 2018.
[25] A. Hemel, K. T. Kalleberg, R. Vermaas, and E. Dolstra. Finding software license violations through binary code clone detection. In Proceedings of the 8th Working Conference on Mining Software Repositories, pages 63–72, 2011.
[26] T. Hu, Z. Xu, Y. Fang, Y. Wu, B. Yuan, D. Zou, and H. Jin. Fine-grained code clone detection with block-based splitting of abstract syntax tree. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 89–100, 2023.
[27] S. Kim, S. Woo, H. Lee, and H. Oh. Vuddy: A scalable approach for vulnerable code clone discovery. In 2017 IEEE Symposium on Security and Privacy (SP), pages 595–614. IEEE, 2017.
[28] J. Martínez and J. M. Durán. Software supply chain attacks, a threat to global cybersecurity: SolarWinds' case study. International Journal of Safety and Security Engineering, 11(5):537–545, 2021.
[29] S. Nadgowda. Engram: the one security platform for modern software supply chain risks. In Proceedings of the Eighth International Workshop on Container Technologies and Container Clouds, pages 7–12, 2022.
[30] M. Ohm, H. Plate, A. Sykosch, and M. Meier. Backstabber's knife collection: A review of open source software supply chain attacks. In Detection of Intrusions and Malware, and Vulnerability Assessment: 17th International Conference, DIMVA 2020, Lisbon, Portugal, June 24–26, 2020, Proceedings 17, pages 23–43. Springer, 2020.
[31] M. Ohm and C. Stuke. SoK: Practical detection of software supply chain attacks. In Proceedings of the 18th International Conference on Availability, Reliability and Security, pages 1–11, 2023.
[32] M. Ohm, A. Sykosch, and M. Meier. Towards detection of software supply chain attacks by forensic artifacts. In Proceedings of the 15th International Conference on Availability, Reliability and Security, pages 1–6, 2020.
[33] S. Woo, S. Park, S. Kim, H. Lee, and H. Oh. Centris: A precise and scalable approach for identifying modified open-source software reuse. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pages 860–872. IEEE, 2021.
[34] Z. Yuan, M. Feng, F. Li, G. Ban, Y. Xiao, S. Wang, Q. Tang, H. Su, C. Yu, J. Xu, et al. B2SFinder: Detecting open-source software reuse in COTS software. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1038–1049. IEEE, 2019.
