Dosatsu C++ is a tool that scans C++ projects using Clang and builds a graph database in Kuzu containing parts or all of the Abstract Syntax Tree (AST) of the project. The resulting database assists AI tools in navigating large C++ codebases by providing structured access to code analysis data.
The resulting graph database contains:
- AST Analysis: Abstract Syntax Tree representation, in part or in full
- Code Structure: Classes, functions, templates, and their relationships
- Dependency Mapping: Include relationships and symbol dependencies
- Type Information: Detailed type analysis and template instantiations
Requirements:
- Python 3.8+ for build orchestration
- CMake 3.24+ for build configuration
- Ninja for fast parallel builds
- C++20 compatible compiler
- Git for dependency management
Dosatsu supports the following platform and toolchain combinations:
- Windows with MSVC (Microsoft Visual C++)
- Linux with GCC or Clang
- macOS with Clang (Apple Clang or LLVM Clang)
Note: Other platform/toolchain combinations are not currently supported and will result in build errors. Please ensure you are using one of the supported combinations above.
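The support matrix above can be expressed as a small lookup table that a build script checks before configuring. This is a hypothetical sketch: `SUPPORTED` and `is_supported` are not part of please.py, and the compiler labels (`msvc`, `gcc`, `clang`, `appleclang`) are assumed names rather than what the real build detects. The OS keys are the values Python's `platform.system()` returns (`Darwin` on macOS).

```python
import platform

# Hypothetical support matrix mirroring the list above. Keys are values
# returned by platform.system(); compiler names are assumed labels.
SUPPORTED = {
    "Windows": {"msvc"},
    "Linux": {"gcc", "clang"},
    "Darwin": {"clang", "appleclang"},  # macOS: LLVM Clang or Apple Clang
}


def is_supported(system: str, compiler: str) -> bool:
    """Return True if the (OS, compiler) pair is in the support matrix."""
    return compiler.lower() in SUPPORTED.get(system, set())


if __name__ == "__main__":
    system = platform.system()
    for compiler in ("msvc", "gcc", "clang", "appleclang"):
        status = "supported" if is_supported(system, compiler) else "unsupported"
        print(f"{system} + {compiler}: {status}")
```

Failing early on an unsupported pair gives a clearer message than the build errors the note warns about.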
All dependencies are automatically managed through the build system:
- LLVM/Clang 19.1.7: Automatically downloaded and built via CMake FetchContent
- Kuzu: Graph database integration (future component)
- doctest: Unit testing framework (included)
# Clone the repository
git clone https://github.com/your-org/Dosatsu.git
cd Dosatsu
# Initial environment setup (creates artifact directories, validates tools)
please setup
# Display build environment information
please info
# Configure and build (debug mode)
please configure --debug
please build --debug
# Or do everything in one step
please rebuild --debug
# Run all tests
please test
# Run tests with detailed reporting
please test --verbose --report-format html
Dosatsu includes comprehensive C++ examples and analysis tools to demonstrate its capabilities and verify correct operation.
The Examples/cpp/ directory contains well-documented C++ code demonstrating various language features:
- Basic Examples (Examples/cpp/basic/): Focused examples of specific C++ features
  - inheritance.cpp - Class inheritance and virtual functions
  - templates.cpp - Template classes, functions, and specializations
  - namespaces.cpp - Namespace usage and scope resolution
  - control_flow.cpp - Control flow statements and exception handling
  - expressions.cpp - Operators, literals, and type conversions
  - preprocessor.cpp - Preprocessor directives and macros
- Comprehensive Examples (Examples/cpp/comprehensive/): Complex multi-feature examples
  - complete_example.cpp - Integration of all major C++ features
  - no_std_example.cpp - Examples without standard library dependencies
The Examples/queries/ directory provides Python tools for verifying Dosatsu output using a functional approach:
# Easy way: Use the examples runner
python Examples/run_examples.py --all
# Or run verification queries directly
python Examples/queries/run_queries.py
The verification suite:
- Builds a Kuzu database from the C++ examples
- Runs verification queries to ensure correct parsing
- Validates that all major C++ constructs are properly captured
- Reports detailed results and statistics
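The verification idea can be pictured as count queries checked against minimum thresholds. This is a minimal sketch, not the code in Examples/queries/: it assumes the Kuzu Python API (`kuzu.Database`, `kuzu.Connection`, `Connection.execute`), uses node labels listed in SCHEMA.md (Declaration, Type, Statement, Expression), and the threshold values and database path are hypothetical.

```python
# Hypothetical minimum node counts per label; real thresholds would be
# derived from what the Examples/cpp/ sources are known to contain.
EXPECTED_MINIMUMS = {
    "Declaration": 1,
    "Type": 1,
    "Statement": 1,
    "Expression": 1,
}


def count_nodes(conn, label: str) -> int:
    """Count nodes with the given label using a Cypher query."""
    # Labels cannot be bound as query parameters, so they are interpolated;
    # they come from the fixed dictionary above, not from user input.
    result = conn.execute(f"MATCH (n:{label}) RETURN count(n)")
    row = result.get_next()  # a count query returns a single row: [count]
    return int(row[0])


def verify(conn, expected=EXPECTED_MINIMUMS) -> dict:
    """Return {label: (count, passed)} for each expected label."""
    report = {}
    for label, minimum in expected.items():
        n = count_nodes(conn, label)
        report[label] = (n, n >= minimum)
    return report


if __name__ == "__main__":
    import kuzu  # requires the `kuzu` Python package

    db = kuzu.Database("path/to/dosatsu_db")  # hypothetical database path
    conn = kuzu.Connection(db)
    for label, (n, ok) in verify(conn).items():
        print(f"{'PASS' if ok else 'FAIL'} {label}: {n} nodes")
```

Passing the connection in, rather than opening it inside `verify`, keeps the checks testable without a real database.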
# Use the examples runner
python Examples/run_examples.py --list # List all examples
python Examples/run_examples.py --index comprehensive_no_std_compile_commands.json # Index examples
python Examples/run_examples.py --verify # Run verification queries
python Examples/run_examples.py --all # Run complete workflow
See Examples/README.md for detailed documentation.
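The `--index` step is given a file named like a JSON compilation database (`comprehensive_no_std_compile_commands.json`). Assuming it follows the standard Clang compile_commands.json format (a list of entries with `directory`, `file`, and either `command` or `arguments`), the translation units that will be indexed can be listed as sketched below. `list_translation_units` is a hypothetical helper, not part of run_examples.py.

```python
import json
import os
from pathlib import Path


def list_translation_units(db_path):
    """Return normalized paths of the source files in a compilation database."""
    entries = json.loads(Path(db_path).read_text(encoding="utf-8"))
    files = []
    for entry in entries:
        # "file" may be relative to the entry's "directory"; os.path.join
        # returns "file" unchanged when it is already absolute.
        path = os.path.join(entry["directory"], entry["file"])
        files.append(os.path.normpath(path))
    return files


if __name__ == "__main__":
    import sys

    for tu in list_translation_units(sys.argv[1]):
        print(tu)
```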
Dosatsu generates a comprehensive graph database that captures the complete structure of C++ codebases. The database schema is designed to support advanced querying capabilities for AI tools and code analysis.
For detailed information about the database structure, node types, relationships, and query examples, see SCHEMA.md. This comprehensive document covers:
- Node Types: ASTNode, Declaration, Type, Statement, Expression, and specialized nodes
- Relationships: Inheritance, template specialization, control flow, and semantic connections
- Query Examples: Ready-to-use Cypher queries for common code analysis tasks
- C++ Mappings: How C++ language constructs map to database entities
The database schema supports querying for:
- Code Navigation: Find declarations, definitions, usages, and dependencies
- Architecture: Understand inheritance hierarchies and relationships
- Templates: Track template instantiations and specializations
- Control Flow: Analyze function control flow graphs and execution paths
- Documentation: Access comments and documentation associated with code elements
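As an illustration of code navigation, the sketch below looks up declarations by name through the Kuzu Python API with a parameterized Cypher query. The property names `name` and `kind` are assumptions made for this example; SCHEMA.md documents the actual properties of the `Declaration` node. The database path and the class name `Shape` are hypothetical.

```python
# Parameterized Cypher: Kuzu binds $name at execution time, so the
# searched name is never spliced into the query text.
FIND_DECLS = (
    "MATCH (d:Declaration) "
    "WHERE d.name = $name "
    "RETURN d.name, d.kind"  # `name` and `kind` are assumed property names
)


def find_declarations(conn, name: str):
    """Return (name, kind) tuples for declarations matching `name`."""
    result = conn.execute(FIND_DECLS, {"name": name})
    rows = []
    while result.has_next():
        rows.append(tuple(result.get_next()))
    return rows


if __name__ == "__main__":
    import kuzu  # requires the `kuzu` Python package

    db = kuzu.Database("path/to/dosatsu_db")  # hypothetical database path
    conn = kuzu.Connection(db)
    for decl_name, kind in find_declarations(conn, "Shape"):  # hypothetical name
        print(decl_name, kind)
```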
The please script provides a unified interface for all development operations:
# Configuration
please configure [--debug|--release] [--clean]
please reconfigure # Clean configure from scratch
# Building
please build [--debug|--release] [--parallel N]
please rebuild # Clean + configure + build + test
please clean # Clean build artifacts
# Testing
please test [--verbose] [--parallel N]
please test --target specific_test
please test --ci-mode --coverage # CI-friendly with coverage
# Code Quality
please format [--check-only] # Format code with clang-format
please lint [--summary-only] # Two-phase lint: auto-fix then report
# Git Operations (with pre/post checks)
please git-status # Enhanced git status
please git-pull [--rebase] [--check-clean]
please git-push [--set-upstream]
please git-commit -m "message" # With pre-commit checks
please git-clean [--force] [--include-build-artifacts]
# Performance Analysis
please build-stats # Build performance metrics
please cache-mgmt [--clean-deps] [--clean-cmake]
please info # Environment information
# Development Tools
please install-git-hooks # Install formatting pre-commit hooks
please compile-db [--copy-to-root] # Manage compilation database
Project layout:
Dosatsu/
├── please.py # Main build orchestrator
├── please.bat # Windows wrapper script
├── please # Unix/Linux/macOS wrapper script
├── CMakeLists.txt # Root CMake configuration
├── .clang-format # Code formatting rules
├── .clang-tidy # Static analysis configuration
├── .gitignore # Updated for new artifact structure
│
├── source/ # Main source code
│   ├── CMakeLists.txt # Target-specific CMake config
│   ├── Dosatsu.cpp # Main application
│   ├── KuzuDump.cpp # Database operations
│   ├── KuzuDump.h # Database interface
│   └── NoWarningScope_*.h # Utility headers
│
├── third_party/ # Dependency management and included dependencies
│   ├── dependencies.cmake # LLVM FetchContent configuration
│   └── include/doctest/ # Testing framework
│
├── scripts/ # Build helper scripts
│   ├── setup_deps.cmake # Dependency setup helpers
│   ├── format_config.py # Formatting configuration
│   ├── validate-ci-quick.py # Quick CI validation
│   └── validate-ci.py # Full CI validation
│
├── artifacts/ # ALL BUILD OUTPUTS (git-ignored)
│   ├── debug/ # Debug build artifacts
│   │   ├── build/ # CMake/Ninja files
│   │   ├── bin/ # Debug executables
│   │   ├── lib/ # Debug libraries
│   │   └── logs/ # Build logs
│   ├── release/ # Release build artifacts
│   ├── test/ # Test results & reports
│   ├── lint/ # Linting results
│   └── format/ # Formatting logs
│
└── .github/workflows/ # CI/CD pipeline
    └── ci.yml # Multi-platform build & test
The project uses doctest for unit testing with comprehensive reporting:
# Basic test execution
please test
# Advanced test options
please test --verbose --parallel auto
please test --target Dosatsu_SelfTest
please test --ci-mode --historical
please test --coverage --report-format html
Tests generate comprehensive reports in artifacts/test/:
- results.xml - JUnit format for CI integration
- test-report.html - Rich HTML report with statistics
- test-report.json - Machine-readable results
- test-history.json - Historical test tracking
- test-trends.txt - Performance trend analysis
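Because results.xml is JUnit XML, standard tooling can consume it outside CI as well. A minimal sketch using Python's `xml.etree.ElementTree`, assuming the common JUnit layout of `<testcase>` elements with optional `<failure>`, `<error>`, or `<skipped>` children; the exact structure doctest's JUnit reporter emits is assumed to follow that layout. `summarize_junit` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_junit(path="artifacts/test/results.xml"):
    """Return (total, failed, skipped) counts from a JUnit XML report."""
    root = ET.parse(Path(path)).getroot()
    total = failed = skipped = 0
    # iter() walks the whole tree, so this works whether the root element
    # is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        total += 1
        if case.find("failure") is not None or case.find("error") is not None:
            failed += 1
        elif case.find("skipped") is not None:
            skipped += 1
    return total, failed, skipped


if __name__ == "__main__":
    total, failed, skipped = summarize_junit()
    print(f"{total} tests, {failed} failed, {skipped} skipped")
```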
Typical development workflow:
# 1. Confirm you are starting from a clean working tree
please git-status
# 2. Pull latest changes
please git-pull --rebase
# 3. Make your changes...
# 4. Pre-commit workflow
please format # Auto-format code
please lint # Two-phase: auto-fix then report remaining issues
please rebuild # Full rebuild + test
# 5. Commit with validation
please git-commit -m "Your changes"
# 6. Push changes
please git-push
The build system enforces consistent code quality:
- Formatting: Automatic clang-format integration with project-specific style
- Linting: Two-phase clang-tidy analysis with automatic fixes followed by remaining issue reports
- Testing: Mandatory test execution before commits
- Pre-commit Hooks: Optional git hooks for automatic validation
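The pre-commit checks can be pictured as a small Git hook that runs the format and lint checks and aborts the commit on the first failure. This is a hypothetical sketch, not the hook that `please install-git-hooks` installs; it assumes `please` is on the PATH (use please.bat on Windows) and that each command exits with a non-zero status when it finds problems.

```python
#!/usr/bin/env python3
"""Hypothetical .git/hooks/pre-commit script."""
import subprocess
import sys

# Checks reuse the CI-style commands; they run in this order.
CHECKS = [
    ["please", "format", "--check-only"],
    ["please", "lint", "--summary-only"],
]


def run_checks(checks=CHECKS, run=subprocess.call):
    """Run each check in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        code = run(cmd)
        if code != 0:
            print(f"pre-commit: '{' '.join(cmd)}' failed (exit {code})", file=sys.stderr)
            return code
    return 0


if __name__ == "__main__":
    sys.exit(run_checks())
```

Git treats a non-zero exit from the hook as a rejected commit. If `please lint` still applies automatic fixes in its first phase, any files it modifies would need to be re-staged before committing again.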
The please lint command runs in two phases for optimal developer experience:
- Phase 1 (Auto-fix): Runs clang-tidy with --fix to automatically correct common issues
- Phase 2 (Report): Runs clang-tidy again to report issues requiring manual attention
This approach reduces developer friction by handling routine fixes automatically while clearly highlighting issues that need thoughtful resolution.
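The two phases map onto two clang-tidy invocations over the same files. A minimal sketch, assuming the compilation database lives in the CMake build directory from the project layout (artifacts/debug/build) and passing it with clang-tidy's `-p` option; `lint_commands` and `two_phase_lint` are hypothetical helpers, not please.py internals.

```python
import subprocess


def lint_commands(files, build_dir="artifacts/debug/build"):
    """Build the clang-tidy command lines for both lint phases."""
    # -p points clang-tidy at the directory holding compile_commands.json
    # (assumed to be the CMake build directory here).
    base = ["clang-tidy", "-p", build_dir]
    fix_phase = base + ["--fix"] + list(files)  # Phase 1: apply automatic fixes
    report_phase = base + list(files)  # Phase 2: report what remains
    return fix_phase, report_phase


def two_phase_lint(files, build_dir="artifacts/debug/build"):
    """Run both phases; return the report phase's captured output."""
    fix_phase, report_phase = lint_commands(files, build_dir)
    subprocess.run(fix_phase, check=False)
    result = subprocess.run(report_phase, capture_output=True, text=True, check=False)
    return result.stdout
```

Running the report phase after the fixes means its output only contains diagnostics that clang-tidy could not resolve on its own.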
Monitor and optimize build performance:
# Analyze build performance
please build-stats
# Manage caches (LLVM dependencies can be ~36GB)
please cache-mgmt
# Clean specific caches
please cache-mgmt --clean-cmake --clean-deps
The project includes a comprehensive GitHub Actions workflow:
- Platforms: Windows, Linux, macOS
- Build Types: Debug and Release
- Parallel Jobs: Optimized for fast feedback
- Build Validation: All platforms and configurations
- Test Execution: Comprehensive test suite with reporting
- Code Quality: Formatting and linting checks
- Security Scanning: CodeQL static analysis
- Artifact Collection: Build outputs and test results
- Dependency Caching: LLVM dependencies cached for faster builds
# Simulate CI locally
please rebuild --debug --skip-tests # Quick build check
please test --ci-mode # CI-style testing
please format --check-only # Format validation
please lint --summary-only # Quick two-phase lint check
# Debug (default) - fast builds, debug symbols
please configure --debug
# Release - optimized builds
please configure --release
# Custom parallel jobs
please build --parallel 8
LLVM 19.1.7 is automatically managed via CMake FetchContent:
- First Build: Downloads and builds LLVM (~30+ minutes)
- Subsequent Builds: Uses cached LLVM build
- Cache Location: artifacts/debug/build/_deps/llvm-*
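To see how much disk space the cached LLVM build occupies (the cache can reach roughly 36GB), the directories matching the cache location can be totalled with pathlib. This sketch assumes release builds use the same `_deps` layout under artifacts/release/, which the project layout does not show explicitly; `llvm_cache_size` is a hypothetical helper, not the implementation of `please cache-mgmt`.

```python
from pathlib import Path


def llvm_cache_size(root=".", mode="debug"):
    """Return total bytes under artifacts/<mode>/build/_deps/llvm-*."""
    deps = Path(root) / "artifacts" / mode / "build" / "_deps"
    if not deps.is_dir():
        return 0  # no build for this mode yet
    total = 0
    for cache_dir in deps.glob("llvm-*"):
        for f in cache_dir.rglob("*"):
            # Skip symlinks so linked files are not counted twice.
            if f.is_file() and not f.is_symlink():
                total += f.stat().st_size
    return total


if __name__ == "__main__":
    for mode in ("debug", "release"):
        size_gib = llvm_cache_size(mode=mode) / 1024**3
        print(f"{mode}: {size_gib:.1f} GiB of LLVM cache")
```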
# Generate compilation database for IDEs
please compile-db --copy-to-root
# Install git pre-commit hooks
please install-git-hooks
# Environment validation
please info
Issue: Build fails with "linker out of heap space"
# Solution: Ensure 64-bit compiler environment
# On Windows, use: vcvars64.bat or VS 2022 x64 Native Tools Command Prompt
please info # Check compiler detection
Issue: LLVM build takes too long
# Solution: Use cached builds and parallel jobs
please build --parallel 4 # Adjust for your CPU
please cache-mgmt # Check cache status
Issue: Git operations fail
# Solution: Check repository status
please git-status # Comprehensive status
please git-clean --force # Clean untracked files
Issue: Tests fail
# Solution: Check test output and rebuild
please test --verbose # Detailed test output
please rebuild # Clean rebuild
# Comprehensive environment information
please info
# Command-specific help
please --help
please <command> --help
# Build performance analysis
please build-stats
This project is written in C++20 and built with modern development practices:
- Standards Compliance: Full C++20 support
- Cross-Platform: Windows, Linux, macOS
- Modern Dependencies: LLVM 19.1.7 with FetchContent
- Quality Assurance: Comprehensive linting and formatting
This project is licensed under the MIT License. See the LICENSE file for details.
Status: Active Development - Modern Build System Complete!
- Completed: Modern build system migration with Python + CMake + Ninja
- Completed: Multi-platform CI/CD pipeline
- Completed: Git integration and development workflows
- Completed: Performance optimization and caching
- In Progress: Core Dosatsu functionality expansion
Contributing:
- Fork the repository
- Set up the development environment:
  please setup
  please install-git-hooks # Optional but recommended
- Create a feature branch:
  git checkout -b feature/amazing-feature
- Develop with quality checks:
  # Make your changes...
  please format # Auto-format
  please lint # Two-phase: auto-fix then report
  please rebuild # Build + test
- Commit with validation:
  please git-commit -m "Add amazing feature"
- Push and create a Pull Request:
  please git-push --set-upstream
Guidelines:
- Code Style: Enforced via clang-format (LLVM style with customizations)
- Quality: All code must pass clang-tidy analysis
- Testing: Maintain or improve test coverage
- Documentation: Update documentation for user-facing changes
Support:
- Issues: Report bugs and request features on GitHub Issues
- Discussions: General questions and discussions on GitHub Discussions
- CI Status: Check build status on GitHub Actions
Goal: This project aims to bridge the gap between large C++ codebases and AI tools by providing a graph-based representation of code structure that can be queried using natural language.
Performance: The build system provides parallel Ninja builds, cached LLVM dependencies, and automated development workflows.