tm2hsl is a compiler that transforms TextMate syntax grammars into optimized HSL bytecode for use by code editors.
Modern editors (VS Code, Sublime Text, etc.) interpret TextMate grammars at runtime, which can add latency and memory overhead and makes it harder to scale to many languages. tm2hsl instead compiles each grammar once into deterministic bytecode that an editor can execute directly.
- Ahead-of-time compilation: Transforms TextMate grammars into optimized HSL bytecode
- Deterministic execution: The same input always produces the same bytecode
- Massive scalability: Efficient support for hundreds of languages
- Layer separation: Compiled languages vs. execution engine
- Binary format: Memory-mappable, versioned, and compact
- Complete CLI: Tools for development and testing
Download binaries from the releases page.
# Clone the repository
git clone https://github.com/ferchd/tm2hsl.git
cd tm2hsl
# Setup development environment
make dev-setup
# Build the project
make build
# Install globally
make install
- Go 1.21 or higher
- Git
# Compile a TextMate grammar
tm2hsl compile --config language.toml --output output.hsl
# Validate without generating bytecode
tm2hsl compile --config language.toml --validate-only
Create a language.toml:
name = "MyLanguage"
scope = "source.mylanguage"
grammar = "grammars/mylanguage.json"
[metadata]
version = "1.0.0"
description = "Support for MyLanguage"
tm2hsl
├── cmd/tm2hsl/ # Main CLI
├── internal/ # Private code
│ ├── cli/ # Command interface
│ ├── compiler/ # Compilation logic
│ ├── parser/ # TextMate parsing
│ ├── ir/ # Intermediate representation
│ ├── normalizer/ # Grammar normalization
│ ├── optimizer/ # Optimizations
│ ├── codegen/ # Bytecode generation
│ ├── serializer/ # HSL serialization
│ └── config/ # Configuration handling
├── pkg/ # Public packages
│ ├── hsl/ # HSL bytecode format
│ └── textmate/ # TextMate types
└── docs/ # Documentation
- Parsing: Load and validate TextMate grammar (JSON/plist)
- Normalization: Convert to deterministic state machine
- IR: Generate optimized intermediate representation
- Optimization: Apply structural transformations
- Bytecode: Generate HSL binary bytecode
- Serialization: Write final .hsl file
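The stages above can be sketched as a chain of functions that each take a grammar and return a transformed one. This is only an illustration of the staged design, not the project's actual code: the `Grammar`, `Stage`, `RunPipeline`, and `DefaultStages` names are hypothetical, and the two stage bodies are simple placeholders for the parsing and normalization steps.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Grammar is a hypothetical stand-in for the data passed between stages.
type Grammar struct {
	Name  string
	Rules []string
	Trace []string // names of the stages that have run, in order
}

// Stage is one pipeline step: it receives a grammar and returns a transformed one.
type Stage struct {
	Name string
	Run  func(Grammar) (Grammar, error)
}

// RunPipeline applies the stages in order and stops at the first error,
// returning the grammar as it was before the failing stage.
func RunPipeline(g Grammar, stages []Stage) (Grammar, error) {
	for _, s := range stages {
		out, err := s.Run(g)
		if err != nil {
			return g, fmt.Errorf("%s: %w", s.Name, err)
		}
		out.Trace = append(out.Trace, s.Name)
		g = out
	}
	return g, nil
}

// validate rejects grammars with no rules (placeholder for the parsing stage).
func validate(g Grammar) (Grammar, error) {
	if len(g.Rules) == 0 {
		return g, errors.New("grammar has no rules")
	}
	return g, nil
}

// dedupe drops duplicate rules while preserving order (placeholder for normalization).
func dedupe(g Grammar) (Grammar, error) {
	seen := make(map[string]bool)
	var rules []string
	for _, r := range g.Rules {
		if !seen[r] {
			seen[r] = true
			rules = append(rules, r)
		}
	}
	g.Rules = rules
	return g, nil
}

// DefaultStages returns the two placeholder stages used in this sketch.
func DefaultStages() []Stage {
	return []Stage{
		{Name: "parsing", Run: validate},
		{Name: "normalization", Run: dedupe},
	}
}

func main() {
	in := Grammar{Name: "MyLanguage", Rules: []string{"a", "b", "a"}}
	g, err := RunPipeline(in, DefaultStages())
	fmt.Println(strings.Join(g.Trace, " -> "), g.Rules, err)
}
```

Keeping every stage behind the same function signature means stages can be tested in isolation and an error can be attributed to the stage that produced it.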
HSL bytecode is a binary format designed for:
- Sequential execution: Efficient disk reading
- Memory-mapping: Zero-copy loading
- Versioning: Forward compatibility
- Compression: Optimized and deduplicated tables
HSL Header (64 bytes)
├── Magic: "HSL1"
├── Version: uint16
├── Checksum: uint32
└── Offset table...
String Table
Regex Table
State Table
Rule Table
Scope Table
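As an illustration of the layout above, here is a minimal sketch of encoding and parsing the fixed-size header. Only the 64-byte size, the "HSL1" magic, and the uint16 and uint32 field types come from the layout. The byte offsets (4 and 6) and the little-endian byte order are assumptions made for the example, the offset table is left out, and the `Header`, `EncodeHeader`, and `ParseHeader` names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

const headerSize = 64 // header size from the layout above

var magic = []byte("HSL1")

// Header holds the fixed fields named in the layout.
type Header struct {
	Version  uint16
	Checksum uint32
}

// EncodeHeader writes h into a zero-padded 64-byte buffer.
// Offsets and byte order are assumptions, not part of any published spec.
func EncodeHeader(h Header) []byte {
	buf := make([]byte, headerSize)
	copy(buf[0:4], magic)
	binary.LittleEndian.PutUint16(buf[4:6], h.Version)
	binary.LittleEndian.PutUint32(buf[6:10], h.Checksum)
	return buf
}

// ParseHeader validates the size and magic, then reads the fixed fields.
func ParseHeader(buf []byte) (Header, error) {
	if len(buf) < headerSize {
		return Header{}, errors.New("hsl: header too short")
	}
	if !bytes.Equal(buf[0:4], magic) {
		return Header{}, errors.New("hsl: bad magic")
	}
	return Header{
		Version:  binary.LittleEndian.Uint16(buf[4:6]),
		Checksum: binary.LittleEndian.Uint32(buf[6:10]),
	}, nil
}

func main() {
	raw := EncodeHeader(Header{Version: 1, Checksum: 0xDEADBEEF})
	h, err := ParseHeader(raw)
	fmt.Println(h.Version, h.Checksum, err)
}
```

Checking the magic and length before reading any field lets a loader reject a truncated or foreign file early, before it touches the tables that follow the header.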
Contributions are welcome! See CONTRIBUTING.md for detailed guides.
# Setup environment
./scripts/setup-dev.sh
# Iterative development
make # build + test + lint
make build # build only
make test # tests only
We use Conventional Commits for messages:
feat: add support for recursive includes
fix: fix regex lookbehind parsing
docs: update HSL specification
- HSL Specification - Detailed bytecode format and execution model
- Migration Guide - From TextMate to HSL
- Internal API - Developer reference
- Examples - Sample language compilations
Current version: 0.x (active development)
- match rules with basic regex
- begin/end rules with content
- contentName for internal scopes
- captures with simple names
- Includes: $self, $base
- Line and block comments
- Repository with #name references
- Captures in begin/end
- while rules
- Complex back-references
- Advanced lookahead/lookbehind
This project is licensed under Apache License 2.0 - see LICENSE for details.
- TextMate for the grammar format
- VSCode for popularizing TextMate
- Open source community for inspiration and tools
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: fernando@example.com
tm2hsl: Compiling languages, accelerating editors.