
ferchd/tm2hsl


tm2hsl


tm2hsl is a compiler that transforms TextMate syntax grammars into optimized HSL bytecode for efficient language support in code editors.

Vision

Editors such as VS Code and Sublime Text interpret TextMate grammars at runtime, which adds latency and memory overhead and scales poorly as the number of supported languages grows. tm2hsl changes this model: each grammar is compiled once, ahead of time, into deterministic bytecode that the editor executes directly.

Features

  • Ahead-of-time compilation: Compiles TextMate grammars into optimized HSL bytecode before the editor loads them
  • Deterministic compilation: The same input always produces the same bytecode
  • Scalability: Designed to support hundreds of languages efficiently
  • Layer separation: Compiled language definitions are kept separate from the execution engine
  • Binary format: Memory-mappable, versioned, and compact
  • Complete CLI: Tools for development and testing

Installation

Pre-built Binaries

Download binaries from the releases page.

From Source

# Clone the repository
git clone https://github.com/ferchd/tm2hsl.git
cd tm2hsl

# Setup development environment
make dev-setup

# Build the project
make build

# Install globally
make install

Requirements

  • Go 1.21 or higher
  • Git

Usage

Basic Compilation

# Compile a TextMate grammar
tm2hsl compile --config language.toml --output output.hsl

# Validate without generating bytecode
tm2hsl compile --config language.toml --validate-only

Configuration File

Create a language.toml:

name = "MyLanguage"
scope = "source.mylanguage"
grammar = "grammars/mylanguage.json"

[metadata]
version = "1.0.0"
description = "Support for MyLanguage"
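The fields above could map onto a Go struct along these lines. This is a sketch: `LanguageConfig`, `Metadata`, `Validate`, and its rules (non-empty `name` and `scope`, a grammar path ending in `.json`, `.plist`, or `.tmLanguage`) are assumptions for illustration, not tm2hsl's actual config package. Decoding the TOML itself would need a third-party decoder such as `github.com/BurntSushi/toml`, which is left out so the example stays dependency-free.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// LanguageConfig mirrors the language.toml fields shown above (hypothetical type).
// The toml tags assume a decoder such as github.com/BurntSushi/toml.
type LanguageConfig struct {
	Name     string   `toml:"name"`
	Scope    string   `toml:"scope"`
	Grammar  string   `toml:"grammar"`
	Metadata Metadata `toml:"metadata"`
}

// Metadata holds the optional [metadata] table.
type Metadata struct {
	Version     string `toml:"version"`
	Description string `toml:"description"`
}

// Validate applies illustrative checks and reports every problem at once;
// the real tool's rules may differ.
func (c LanguageConfig) Validate() error {
	var errs []error
	if c.Name == "" {
		errs = append(errs, errors.New("name is required"))
	}
	if c.Scope == "" {
		errs = append(errs, errors.New("scope is required"))
	}
	switch strings.ToLower(filepath.Ext(c.Grammar)) {
	case ".json", ".plist", ".tmlanguage":
		// accepted grammar formats (assumed)
	default:
		errs = append(errs, fmt.Errorf("grammar %q: expected a JSON or plist file", c.Grammar))
	}
	return errors.Join(errs...) // nil when errs is empty (Go 1.20+)
}

func main() {
	cfg := LanguageConfig{
		Name:     "MyLanguage",
		Scope:    "source.mylanguage",
		Grammar:  "grammars/mylanguage.json",
		Metadata: Metadata{Version: "1.0.0", Description: "Support for MyLanguage"},
	}
	fmt.Println(cfg.Validate()) // <nil>
}
```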

Architecture

tm2hsl
├── cmd/tm2hsl/          # Main CLI
├── internal/            # Private code
│   ├── cli/             # Command interface
│   ├── compiler/        # Compilation logic
│   ├── parser/          # TextMate parsing
│   ├── ir/              # Intermediate representation
│   ├── normalizer/      # Grammar normalization
│   ├── optimizer/       # Optimizations
│   ├── codegen/         # Bytecode generation
│   ├── serializer/      # HSL serialization
│   └── config/          # Configuration handling
├── pkg/                 # Public packages
│   ├── hsl/             # HSL bytecode format
│   └── textmate/        # TextMate types
└── docs/                # Documentation

Compilation Flow

  1. Parsing: Load and validate TextMate grammar (JSON/plist)
  2. Normalization: Convert to deterministic state machine
  3. IR: Build the intermediate representation
  4. Optimization: Apply structural transformations
  5. Bytecode: Generate HSL binary bytecode
  6. Serialization: Write final .hsl file

HSL Format

HSL bytecode is a binary format designed for:

  • Sequential access: Efficient reading from disk
  • Memory-mapping: Zero-copy loading
  • Versioning: Forward compatibility
  • Compactness: Optimized and deduplicated tables

Bytecode Structure

HSL Header (64 bytes)
├── Magic: "HSL1"
├── Version: uint16
├── Checksum: uint32
└── Offset table...

String Table
Regex Table
State Table
Rule Table
Scope Table

Contributing

Contributions are welcome! See CONTRIBUTING.md for detailed guidelines.

Quick Development

# Setup environment
./scripts/setup-dev.sh

# Iterative development
make          # build + test + lint
make build    # build only
make test     # tests only

Conventional Commits

We use Conventional Commits for messages:

feat: add support for recursive includes
fix: fix regex lookbehind parsing
docs: update HSL specification
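A commit header in this style has the shape `<type>[(scope)][!]: <description>`. A minimal Go check for that shape is sketched below; `ValidCommitHeader` is a hypothetical helper, and it accepts any lowercase type rather than a fixed list, since the project's allowed types are not listed here.

```go
package main

import (
	"fmt"
	"regexp"
)

// headerRE matches "<type>[(scope)][!]: <description>" at the start of a
// message. The scope may not contain parentheses or newlines, and the
// description must start with a non-space character.
var headerRE = regexp.MustCompile(`^[a-z]+(\([^()\n]+\))?!?: \S`)

// ValidCommitHeader reports whether msg begins with a well-formed header.
func ValidCommitHeader(msg string) bool {
	return headerRE.MatchString(msg)
}

func main() {
	for _, msg := range []string{
		"feat: add support for recursive includes",
		"docs(hsl): update HSL specification",
		"Update README",
	} {
		fmt.Println(ValidCommitHeader(msg), msg)
	}
}
```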


Project Status

Current version: 0.x (active development)

Supported (v0)

  • match rules with basic regex
  • begin/end rules with content
  • contentName for internal scopes
  • captures with simple names
  • Includes: $self, $base
  • Line and block comments

Not Supported (future)

  • Repository with #name references
  • Captures in begin/end
  • while rules
  • Complex back-references
  • Advanced lookahead/lookbehind

License

This project is licensed under the Apache License 2.0 - see LICENSE for details.

Acknowledgments

  • TextMate for the grammar format
  • VS Code for popularizing TextMate grammars
  • Open source community for inspiration and tools


tm2hsl: Compiling languages, accelerating editors.
