Skip to content

bombless/ptx-vm-leetcode

 
 

Repository files navigation

NVIDIA PTX Virtual Machine

A virtual machine implementation for executing NVIDIA PTX (Parallel Thread Execution) intermediate code. This project provides a complete VM architecture with advanced execution features, memory optimizations, and comprehensive tooling.

Authors: Han-Zhenzhong, TongyiLingma, GitHub Copilot

The project is hosted on gitee at: https://gitee.com/hanzhenzhong/ptx-vm

Features

Core Execution Engine

  • Full SIMT (Single Instruction Multiple Threads) execution model
  • Warp scheduling with dynamic thread mask management
  • Predicated execution support for conditional operations
  • Comprehensive divergence handling with multiple reconvergence algorithms
  • Performance counters for detailed execution statistics

Memory System

  • Hierarchical memory model with separate spaces
  • Data cache simulation with configurable parameters
  • Shared memory bank conflict detection
  • Memory access pattern analysis and optimization
  • TLB and page fault handling for virtual memory

Optimization Features

  • Dynamic register allocation framework
  • Instruction scheduling optimizations
  • Memory coalescing optimizations
  • Cache configuration flexibility

Integration Layer

  • Host API design for easy integration
  • CLI interface for manual execution and debugging
  • CUDA binary loading infrastructure
  • Enhanced debugging interface with watchpoints

Testing and Validation

  • Comprehensive unit test suite
  • Integration tests for system-level behavior
  • Performance benchmarks
  • Example programs for demonstration

Documentation

主要文档:

Building the Project

Prerequisites

  • CMake 3.14+
  • C++20 compatible compiler:
    • GCC 10+
    • Clang 12+
    • MSVC VS2019 16.10+
  • Google Test (for unit testing)

Build Instructions

# Clone the repository
git clone https://gitee.com/hanzhenzhong/ptx-vm.git
cd ptx_vm

# Create build directory
mkdir build && cd build

# Configure with CMake
cmake ..

# Build the project
make

Build Options

  • BUILD_TESTS=ON/OFF - Enable/disable unit tests
  • BUILD_EXAMPLES=ON/OFF - Enable/disable example programs
  • BUILD_DOCUMENTATION=ON/OFF - Enable/disable documentation building

Usage

The PTX VM provides three ways to use the virtual machine:

1. 🚀 Quick Start - Direct Execution Mode

Run PTX programs directly from the command line:

# Basic execution (with default INFO log level)
./ptx_vm examples/simple_math_example.ptx

# With debug logging to see detailed execution
./ptx_vm --log-level debug examples/control_flow_example.ptx

# Run example programs
cd build
./execution_result_demo
./parameter_passing_example

Command-line options:

  • -h, --help - Display help message
  • -l, --log-level LEVEL - Set log level: debug, info (default), warning, error

2. 💻 Interactive CLI Mode

For debugging, learning, and experimentation:

# Start interactive mode
./ptx_vm

# Interactive commands
> load examples/control_flow_example.ptx  # Load PTX program
> alloc 1024                              # Allocate memory
> launch myKernel 0x10000                 # Launch kernel with parameters
> memory 0x10000 256                      # View memory contents
> dump                                    # Show execution statistics
> loglevel debug                          # Change log level
> exit                                    # Exit the VM

3. 🔧 API Programming Mode

Integrate PTX VM into your application:

#include "host_api.hpp"

int main() {
    // Initialize VM
    HostAPI hostAPI;
    hostAPI.initialize();
    
    // Allocate device memory
    CUdeviceptr devicePtr;
    hostAPI.cuMemAlloc(&devicePtr, 1024 * sizeof(int));
    
    // Prepare and copy data
    std::vector<int> data(1024, 42);
    hostAPI.cuMemcpyHtoD(devicePtr, data.data(), 1024 * sizeof(int));
    
    // Load and execute PTX program
    hostAPI.loadProgram("my_kernel.ptx");
    
    // Launch kernel with parameters
    void* params[] = { &devicePtr, &size };
    hostAPI.cuLaunchKernel(kernel, 1,1,1, 32,1,1, 0, 0, params, nullptr);
    
    // Copy results back
    std::vector<int> results(1024);
    hostAPI.cuMemcpyDtoH(results.data(), devicePtr, 1024 * sizeof(int));
    
    // Cleanup
    hostAPI.cuMemFree(devicePtr);
    return 0;
}

Log Level Control

Control the verbosity of VM output:

# Command-line mode
./ptx_vm --log-level debug program.ptx    # Detailed debug info
./ptx_vm --log-level info program.ptx     # General info (default)
./ptx_vm --log-level warning program.ptx  # Warnings and errors
./ptx_vm --log-level error program.ptx    # Errors only

# Interactive mode
> loglevel debug     # Enable all logs
> loglevel info      # Default level
> loglevel warning   # Warnings and errors only
> loglevel error     # Errors only
> loglevel           # Display current level

Log levels:

  • debug - Shows detailed execution info, register values, memory operations
  • info - Shows program loading, kernel launches, general info (default)
  • warning - Shows warnings and errors only
  • error - Shows errors only

For more details, see:

Command Reference

load

Load a PTX or CUDA binary file into the VM.

> load <filename>

run

Execute the loaded program.

> run

step

Execute one instruction at a time.

> step [number_of_instructions]

break

Set a breakpoint at a specific address.

> break <address>

watch

Set a watchpoint at a specific memory address.

> watch <address>

register

Display register information.

> register [all|predicate|pc]

memory

Display memory contents.

> memory <address> [size]

alloc

Allocate memory in the VM.

> alloc <size>

memcpy

Copy memory within the VM.

> memcpy <dest> <src> <size>

write

Write a single byte value to a specific memory address.

> write <address> <value>

fill

Fill memory with multiple byte values starting at a specific address.

> fill <address> <count> <value1> [value2] ...

loadfile

Load data from a file into VM memory at a specific address.

> loadfile <address> <file> <size>

launch

Launch a kernel with parameters.

> launch <kernel_name> [param1] [param2] ...

profile

Start profiling session.

> profile <output_file.csv>

dump

Output execution statistics.

> dump

list

List loaded program disassembly.

> list

visualize

Display visualization of execution state.

> visualize <type>

Where <type> can be:

  • warp - Warp execution visualization
  • memory - Memory access visualization
  • performance - Performance counter display

quit

Exit the virtual machine.

> quit

Documentation

Comprehensive documentation is available via DOCS_INDEX.md and organized into three directories:

  • docs_user/ - End-user and API usage documentation
  • docs_dev/ - Contributor/developer documentation and technical reports
  • docs_spec/ - PTX/SIMT fundamentals and specification notes

Recommended starting points:

The documentation covers architecture, code structure, contribution guidelines, and technical details of implementation.

Release Information

Release Notes

See RELEASE_NOTES.md for information about this release, including:

  • Key features
  • Installation instructions
  • Usage examples
  • Known issues
  • Future enhancements

Contributors

See CONTRIBUTORS.md for a list of contributors and information about how to contribute to the project.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

  • Zhenzhong Han - Lead Developer and Architect

About

leetgpu remake

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • C++ 90.0%
  • Python 3.2%
  • CMake 1.9%
  • CSS 1.3%
  • JavaScript 1.2%
  • C 0.7%
  • Other 1.7%