Customization Guide
TABLE OF CONTENTS
Chapter 1. Introduction
Chapter 2. Sections
    2.1. Section Files
    2.2. Section Definition
    2.3. Metric Options
    2.4. Missing Sections
    2.5. Derived Metrics
Chapter 3. Rule System
    3.1. Writing Rules
    3.2. Integration
    3.3. Rule System Architecture
    3.4. NvRules API
    3.5. Rule File API
    3.6. Rule Examples
Chapter 4. Python Report Interface
    4.1. Basic Usage
    4.2. NVTX Support
    4.3. Sample Script
Chapter 5. Source Counters
Chapter 6. Report File Format
    6.1. Version 7 Format
www.nvidia.com
Nsight Compute v2022.3.0 | ii
Chapter 1.
INTRODUCTION
NVIDIA Nsight Compute is designed as a profiling tool that expert users can easily
extend and customize. While we provide useful defaults, this lets you adapt the
reports to a specific use case or design new ways to investigate collected data.
Everything described below is data driven and does not require the tools to be recompiled.
When working with section files or rule files, it is recommended to open the
Sections/Rules Info tool window from the Profile menu item. This tool window lists all
sections and rules that were loaded. Rules are grouped as children of their associated
section or under the [Independent Rules] entry. For files that failed to load, the table
shows the error message. Use the Reload button to reload rule files from disk.
Chapter 2.
SECTIONS
The Details page consists of sections that each focus on a specific part of the kernel
analysis. Every section is defined by a corresponding section file that specifies the data
to be collected as well as the visualization used in the UI to present this data. Simply
modify a section file to add or modify what is collected.
Identifier: "SampleSection"
DisplayName: "Sample Section"
Description: "This sample section shows information on active warps and cycles."
Header {
  Metrics {
    Label: "Active Warps"
    Name: "smsp__active_warps_avg"
  }
  Metrics {
    Label: "Active Cycles"
    Name: "smsp__active_cycles_avg"
  }
}
On data collection, this section will cause the two PerfWorks metrics
smsp__active_warps_avg and smsp__active_cycles_avg to be collected.
More advanced elements can be used in the body of a section. Currently, NVIDIA Nsight
Compute supports tables and various bar charts. The following, slightly more complex
example shows how to use them. Regexes are allowed in tables and charts in the section
Body only. Such an entry consists of the prefix regex: followed by the actual regex to
match PerfWorks metric names.
The list of metrics supported in sections can be queried using NVIDIA Nsight Compute
CLI with the option --query-metrics. Each of these metrics can be used in any section
and will automatically be collected if it appears in any enabled section. Look at the
shipping sections to see how they are implemented.
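As a rough illustration of how a regex: entry selects metrics, the following Python sketch expands a section-file Name value against a list of available metric names. The helper expand_metric_name, the unit_a/unit_b metric names, and the choice to require a full match of the metric name are assumptions made for this example, not documented tool behavior.

```python
import re

REGEX_PREFIX = "regex:"


def expand_metric_name(name, available_metrics):
    """Return the metric names selected by one section-file Name entry.

    Hypothetical helper: it only models the idea of a regex: entry.
    """
    if not name.startswith(REGEX_PREFIX):
        # A plain entry names exactly one metric.
        return [name]
    pattern = re.compile(name[len(REGEX_PREFIX):])
    # Assumption: the pattern has to match the whole metric name.
    return [m for m in available_metrics if pattern.fullmatch(m)]


# Metric names below are placeholders, except the first one, which is
# taken from the sample section.
available = [
    "smsp__active_warps_avg",
    "unit_a__elapsed_cycles_sum",
    "unit_b__elapsed_cycles_sum",
    "unit_b__elapsed_cycles_max",
]

selected = expand_metric_name("regex:.*__elapsed_cycles_sum", available)
print(selected)
```

With the placeholder list above, only the two names ending in __elapsed_cycles_sum are selected.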
Identifier: "SampleSection"
DisplayName: "Sample Section"
Description: "This sample section shows various metrics."
Header {
  Metrics {
    Label: "Active Warps"
    Name: "smsp__active_warps_avg"
  }
  Metrics {
    Label: "Active Cycles"
    Name: "smsp__active_cycles_avg"
  }
}
Body {
  Items {
    Table {
      Label: "Example Table"
      Rows: 2
      Columns: 1
      Metrics {
        Label: "Avg. Issued Instructions Per Scheduler"
        Name: "smsp__inst_issued_avg"
      }
      Metrics {
        Label: "Avg. Executed Instructions Per Scheduler"
        Name: "smsp__inst_executed_avg"
      }
    }
  }
  Items {
    Table {
      Label: "Metrics Table"
      Columns: 2
      Order: ColumnMajor
      Metrics {
        Name: "regex:.*__elapsed_cycles_sum"
      }
    }
  }
  Items {
    BarChart {
      Label: "Metrics Chart"
      CategoryAxis {
        Label: "Units"
      }
      ValueAxis {
        Label: "Cycles"
      }
      Metrics {
        Name: "regex:.*__elapsed_cycles_sum"
      }
    }
  }
}
options. In NVIDIA Nsight Compute, the search path can be configured in the Profile
options.
Syntax errors: If the file is found but has syntax errors, it will not be available for metric
collection. However, error messages are reported for easier debugging. In NVIDIA
Nsight Compute CLI, use the --list-sections option to get a list of error messages, if
any. In NVIDIA Nsight Compute, error messages are reported in the Sections/Rules Info
tool window.
MetricDefinitions {
  MetricDefinitions {
    Name: "derived_metric_name"
    Expression: "derived_metric_expr"
  }
  MetricDefinitions {
    ...
  }
  ...
}
Since metrics can contain regular values and/or instanced values, elements are combined
as below. Constants are treated as metrics with only a regular value.
2. If both metrics have no correlation ids, the first N values are operator-
combined, where N is the minimum of the number of elements in both metrics.
a1 + b1
a2 + b2
a3
a4
3. Else if both metrics have correlation ids, the sets of correlation ids from
both metrics are joined and then operator-combined as applicable.
a1 + b1
a2
b3
a4 + b4
b5
4. Else if only the left-hand side metric has correlation ids, the right-hand
side regular metric value is operator-combined with every element of the left-
hand side metric.
a1 + b
a2 + b
a3 + b
5. Else if only the right-hand side metric has correlation ids, the right-hand
side element values are operator-combined with the regular metric value of the
left-hand side metric.
a + b1 + b2 + b3
In all operations, the value kind of the left-hand side operand is used. If the right-hand
side operand has a different value kind, it is converted. If the left-hand side operand is a
string-kind, it is returned unchanged.
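The combination rules 2 to 5 above can be sketched in Python as follows. This is a simplified model, not the tool's implementation: instanced values are represented as plain lists (no correlation ids) or as dicts keyed by correlation id, all function names are made up for the example, and value-kind conversion is not modeled.

```python
import operator


def combine_uncorrelated(lhs, rhs, op):
    """Rule 2: neither side has correlation ids."""
    # zip() stops at the shorter list, so N = min(len(lhs), len(rhs)).
    return [op(a, b) for a, b in zip(lhs, rhs)]


def combine_correlated(lhs, rhs, op):
    """Rule 3: both sides are {correlation id: value} dicts."""
    result = {}
    for cid in sorted(lhs.keys() | rhs.keys()):
        if cid in lhs and cid in rhs:
            result[cid] = op(lhs[cid], rhs[cid])
        elif cid in lhs:
            # Id only present on the left-hand side: value is kept.
            result[cid] = lhs[cid]
        else:
            # Id only present on the right-hand side: value is kept.
            result[cid] = rhs[cid]
    return result


def combine_lhs_correlated(lhs, rhs_regular, op):
    """Rule 4: only the left-hand side has correlation ids."""
    return {cid: op(value, rhs_regular) for cid, value in lhs.items()}


def combine_rhs_correlated(lhs_regular, rhs, op):
    """Rule 5: only the right-hand side has correlation ids."""
    result = lhs_regular
    for cid in sorted(rhs):
        result = op(result, rhs[cid])
    return result


add = operator.add
print(combine_uncorrelated([1, 2, 3, 4], [10, 20], add))
print(combine_correlated({1: 1, 2: 2, 4: 4}, {1: 10, 3: 30, 4: 40, 5: 50}, add))
print(combine_lhs_correlated({1: 1, 2: 2, 3: 3}, 100, add))
print(combine_rhs_correlated(1, {1: 10, 2: 20, 3: 30}, add))
```

The four calls mirror the four illustrations in the list, using addition as the operator.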
Examples for derived metrics are derived__avg_thread_executed, which
provides a hint on the number of threads executed on average at each instruction, and
derived__uncoalesced_l2_transactions_global, which indicates the ratio of
actual L2 transactions vs. ideal L2 transactions at each applicable instruction.
MetricDefinitions {
  MetricDefinitions {
    Name: "derived__avg_thread_executed"
    Expression: "thread_inst_executed_true / inst_executed"
  }
  MetricDefinitions {
    Name: "derived__uncoalesced_l2_transactions_global"
    Expression: "memory_l2_transactions_global / memory_ideal_l2_transactions_global"
  }
  MetricDefinitions {
    Name: "sm__sass_thread_inst_executed_op_ffma_pred_on_x2"
    Expression: "sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained * 2"
  }
}
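To make the first expression above concrete, here is a small worked example in Python. The per-instruction counts are invented for illustration, and the dict keys merely stand in for correlation ids (for example instruction addresses).

```python
# Hypothetical per-instruction values; keys stand in for correlation ids.
thread_inst_executed_true = {0x10: 128, 0x20: 96, 0x30: 32}
inst_executed = {0x10: 4, 0x20: 4, 0x30: 2}

# derived__avg_thread_executed = thread_inst_executed_true / inst_executed,
# evaluated separately for every instruction.
avg_thread_executed = {
    address: thread_inst_executed_true[address] / inst_executed[address]
    for address in inst_executed
}

for address, value in avg_thread_executed.items():
    print(hex(address), value)
```

In this made-up data set, the instruction at 0x10 was executed by 32 threads on average each time it was issued, while the one at 0x30 averaged only 16.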
Chapter 3.
RULE SYSTEM
NVIDIA Nsight Compute features a new Python-based rule system. It is designed as the
successor to the Expert System (un)guided analysis in NVIDIA Visual Profiler, but meant
to be more flexible and more easily extensible to different use cases and APIs.
3.2. Integration
The rule system is integrated into NVIDIA Nsight Compute as part of the profile report
view. When you profile a kernel, available rules will be shown in the report's Details
page. You can either select to apply all available rules at once by clicking Apply Rules at
the top of the page, or apply rules individually. Once applied, the rule results will be
added to the current report. By default, all rules are applied automatically.
API. In addition, some functionality is provided directly by the NvRules module, e.g.
for global error reporting. Finally, since rules are valid Python code, they can use regular
libraries and language functionality that ship with Python as well.
From the rule Context, multiple further objects can be accessed, e.g. the Frontend,
Ranges and Actions. Note that these are only interfaces, i.e. the actual
implementation can vary between the tools that implement this functionality.
Naming of these interfaces is chosen to be as API-independent as possible, i.e. not to
imply CUDA-specific semantics. However, since many compute and graphics APIs
map to similar concepts, the interfaces can easily be mapped to CUDA terminology, too.
A Range refers to a CUDA stream, and an Action refers to a single CUDA kernel instance.
Each Action references several Metrics that have been collected during profiling (e.g.
instructions executed) or are statically available (e.g. the launch configuration).
Metrics are accessed via their names from the Action.
Each CUDA stream can contain any number of kernel (or other device activity) instances
and so each Range can reference one or more Actions. However, currently only a single
Action per Range will be available, as only a single CUDA kernel can be profiled at once.
The Frontend provides an interface to manipulate the tool UI by adding messages or
graphical elements such as line and bar charts or tables. The most common use case
is for a rule to show at least one message, stating the result to the user. This could be
as simple as "No issues have been detected," or contain direct hints as to how the user
could improve the code, e.g. "Memory is more heavily utilized than Compute. Consider
whether it is possible for the kernel to do more compute work."
import NvRules

def get_identifier():
    return "GpuArch"

def apply(handle):
    ctx = NvRules.get_context(handle)
    action = ctx.range_by_idx(0).action_by_idx(0)
    ccMajor = action.metric_by_name("device__attribute_compute_capability_major").as_uint64()
    ctx.frontend().message("Running on major compute capability " + str(ccMajor))
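Building on the GpuArch example, the following sketch shows how a rule might turn two metric values into one of the user-facing messages mentioned earlier. It only uses the calls that appear in the example above plus as_double(). The two metric names are placeholders and would need to be replaced with metrics that are actually collected, and the comparison itself is a deliberately naive assumption.

```python
# Placeholder metric names, not real Nsight Compute metrics.
MEMORY_METRIC = "hypothetical__memory_utilization_pct"
COMPUTE_METRIC = "hypothetical__compute_utilization_pct"


def build_message(memory_pct, compute_pct):
    """Pick the message to show for two utilization percentages."""
    if memory_pct > compute_pct:
        return ("Memory is more heavily utilized than Compute. Consider "
                "whether it is possible for the kernel to do more compute work.")
    return "No issues have been detected."


def get_identifier():
    return "UtilizationSketch"


def apply(handle):
    # NvRules is only importable when the rule runs inside the tool,
    # so the import is kept local to apply().
    import NvRules

    ctx = NvRules.get_context(handle)
    action = ctx.range_by_idx(0).action_by_idx(0)
    memory_pct = action.metric_by_name(MEMORY_METRIC).as_double()
    compute_pct = action.metric_by_name(COMPUTE_METRIC).as_double()
    ctx.frontend().message(build_message(memory_pct, compute_pct))
```

Keeping the decision logic in a plain function such as build_message makes it possible to exercise it outside of the tool, while apply() stays a thin wrapper around the NvRules calls.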
Chapter 4.
PYTHON REPORT INTERFACE
Importing a report
Once the module is imported, you can load a report file by calling the load_report
function with the path to the file. This function returns an object of type IContext
which holds all the information concerning that report.
Querying ranges
Footnote 1: On Linux machines you will also need a GNU-compatible libc and libgcc_s.so.
When inspected through the Python module, kernel profiling results are grouped in
ranges represented by an IRange object. You can inspect the number of ranges contained
in the loaded report by calling the num_ranges() member function of an IContext
object and retrieve a range by its index using range_by_idx(index).
>>> my_context.num_ranges()
1
>>> my_range = my_context.range_by_idx(0)
Querying actions
Inside a range, kernel profiling results are called actions. You can query the number of
actions by calling the num_actions() method of an IRange object.
>>> my_range.num_actions()
2
In the same way ranges can be obtained using their indices, individual actions can be
obtained using the action_by_idx(index) method of the IRange object and are
represented by the IAction class.
>>> my_action = my_range.action_by_idx(0)
>>> my_action.name()
MyKernel
Querying metrics
To get a tuple of all the metric names available within that action, use the
metric_names() method. This is meant to be combined with the metric_by_name()
method, which returns an IMetric object. The metric names are the same as the ones
you can use with the --metrics flag of Nsight Compute. Once you have
extracted a metric from an action, you can obtain its value by using one of three methods:
‣ as_string() to obtain its value as a Python str
‣ as_uint64() to obtain its value as a Python int
‣ as_double() to obtain its value as a Python float
For example, to print the display name of the GPU the kernel was profiled on you can
query the device__attribute_display_name metric.
>>> display_name_metric = my_action.metric_by_name('device__attribute_display_name')
>>> display_name_metric.as_string()
'NVIDIA GeForce RTX 3060 Ti'
Note that accessing a metric with the wrong type can lead to unexpected (conversion)
results.
>>> display_name_metric.as_double()
0.0
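The queries described in this chapter can be combined into a single traversal. The sketch below walks every range and action of a loaded report and collects one metric as a string. It relies only on the methods named above (num_ranges, range_by_idx, num_actions, action_by_idx, name, metric_names, metric_by_name, as_string). The function names collect_metric and main are hypothetical, and main is defined but not called here.

```python
def collect_metric(context, metric_name):
    """Return (kernel name, metric value as str) for every action in a report."""
    results = []
    for range_idx in range(context.num_ranges()):
        current_range = context.range_by_idx(range_idx)
        for action_idx in range(current_range.num_actions()):
            action = current_range.action_by_idx(action_idx)
            # Skip actions for which the metric is not available.
            if metric_name not in action.metric_names():
                continue
            metric = action.metric_by_name(metric_name)
            results.append((action.name(), metric.as_string()))
    return results


def main(report_path):
    # ncu_report is imported lazily because it is only available when the
    # module shipped with Nsight Compute is on the Python path.
    import ncu_report

    context = ncu_report.load_report(report_path)
    for kernel, value in collect_metric(context, "device__attribute_display_name"):
        print(kernel, value)
```

Using as_string() for every metric sidesteps the conversion issue noted above, at the cost of returning numbers as text.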
#!/usr/bin/env python3
import sys
import ncu_report

if len(sys.argv) != 2:
    print("usage: {} report_file".format(sys.argv[0]), file=sys.stderr)
    sys.exit(1)

report = ncu_report.load_report(sys.argv[1])
Chapter 5.
SOURCE COUNTERS
The Source page provides correlation of various metrics with CUDA-C, PTX and SASS
source of the application, depending on availability.
Which Source Counter metrics are collected and the order in which they are displayed
on this page is controlled using section files, specifically using the
ProfilerSectionMetrics message type. Each ProfilerSectionMetrics defines one ordered
group of metrics and can be assigned an optional Order value. This value defines the
ordering among those groups on the Source page. This allows you, for example, to define
a group of memory-related source counters in one section file and a group of
instruction-related counters in another.
Identifier: "SourceMetrics"
DisplayName: "Custom Source Metrics"
Metrics {
  Order: 2
  Metrics {
    Label: "Instructions Executed"
    Name: "inst_executed"
  }
  Metrics {
    Label: ""
    Name: "collected_but_not_shown"
  }
}
If a Source Counter metric is given an empty label attribute in the section file, it will be
collected but not shown on the page.
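The interplay of Order values and empty labels can be modeled with a few lines of Python. This is only a conceptual sketch: the classes and functions are invented for the example, the memory_counter metric is a placeholder, and sorting groups by ascending Order is an assumption about how the ordering is applied.

```python
from dataclasses import dataclass, field


@dataclass
class SourceMetric:
    name: str
    label: str = ""


@dataclass
class MetricGroup:
    order: int
    metrics: list = field(default_factory=list)


def collected_names(groups):
    """Every listed metric is collected, regardless of its label."""
    return [m.name for g in groups for m in g.metrics]


def displayed_labels(groups):
    """Only metrics with a non-empty label are shown on the page."""
    # Assumption: groups are shown in ascending Order.
    ordered = sorted(groups, key=lambda g: g.order)
    return [m.label for g in ordered for m in g.metrics if m.label]


groups = [
    MetricGroup(order=2, metrics=[
        SourceMetric("inst_executed", "Instructions Executed"),
        SourceMetric("collected_but_not_shown", ""),
    ]),
    # Placeholder group, standing in for a second section file.
    MetricGroup(order=1, metrics=[
        SourceMetric("memory_counter", "Memory Counter"),
    ]),
]

print(collected_names(groups))
print(displayed_labels(groups))
```

In this model, collected_but_not_shown appears in the collected list but not among the displayed labels.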
Chapter 6.
REPORT FILE FORMAT
This section documents the internals of the profiler report files (reports in the following)
as created by NVIDIA Nsight Compute. The file format is subject to change in future
releases without prior notice.
Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS,
DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY,
"MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES,
EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE
MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR
PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA
Corporation assumes no responsibility for the consequences of use of such
information or for any infringement of patents or other rights of third parties
that may result from its use. No license is granted by implication or otherwise
under any patent rights of NVIDIA Corporation. Specifications mentioned in this
publication are subject to change without notice. This publication supersedes and
replaces all other information previously supplied. NVIDIA Corporation products
are not authorized as critical components in life support devices or systems
without express written approval of NVIDIA Corporation.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA
Corporation in the U.S. and other countries. Other company and product names
may be trademarks of the respective companies with which they are associated.
Copyright
© 2018-2022 NVIDIA Corporation and affiliates. All rights reserved.
This product includes software developed by the Syncro Soft SRL (http://www.sync.ro/).