Skip to content

e-jerk/grep

Repository files navigation

GPU-Accelerated Grep

A high-performance grep replacement that uses GPU acceleration via Metal (macOS) and Vulkan for blazing-fast text searches.

Features

  • GPU-Accelerated Search: Parallel pattern matching on Metal and Vulkan compute shaders
  • SIMD-Optimized CPU: Vectorized Boyer-Moore-Horspool with 16/32-byte SIMD operations
  • Auto-Selection: Intelligent backend selection based on file size, pattern complexity, and hardware tier
  • GNU Compatible: Full support for common grep flags including context lines, recursive search, and color output

Installation

Homebrew formulas are now maintained in e-jerk/utils (pure GPU versions) and e-jerk/utils-gnu (GNU-fallback versions).

# Pure GPU version (no dependencies)
brew tap e-jerk/utils
brew install e-jerk/utils/grep

# GNU-fallback version (with GNU fallback support)
brew tap e-jerk/utils-gnu
brew install e-jerk/utils-gnu/grep

Usage

# Basic usage (auto-selects best backend)
grep "pattern" file.txt

# Case-insensitive search
grep -i "error" log.txt

# Word boundary matching
grep -w "test" source.c

# Invert match (show non-matching lines)
grep -v "debug" output.log

# Context lines (before, after, both)
grep -B 3 "error" log.txt      # 3 lines before
grep -A 3 "error" log.txt      # 3 lines after
grep -C 2 "error" log.txt      # 2 lines before and after

# Recursive search
grep -r "TODO" src/
grep -rn "FIXME" .              # With line numbers

# Color output
grep --color=always "pattern" file.txt

# Force specific backend
grep --gpu "TODO" *.py
grep --metal "FIXME" src/
grep --vulkan "BUG" lib/
grep --cpu "pattern" file.txt

# Verbose output showing timing and backend info
grep -V "pattern" largefile.txt

GNU Feature Compatibility

Feature CPU Metal Vulkan GPU Speedup Notes
Basic pattern matching 17x Full GPU search
-i case insensitive 11x Full GPU search
-w word boundary 8x Full GPU search
-v invert match 8x Full GPU search
-F fixed strings 7x Full GPU search
-E extended regex 5-10x GPU regex engine
-G basic regex 5-10x GPU regex engine
-P Perl regex 5-10x GPU lookahead/lookbehind
-e multiple patterns 5-10x GPU per-pattern
-n line numbers 10x+ GPU-computed line nums
-c count only 10x+ GPU search + CPU count
-l files with matches 10x+ GPU search + CPU filter
-L files without match 10x+ GPU search + CPU filter
-q quiet mode 10x+ GPU search + early exit
-o only matching 10x+ GPU-computed positions
-A/-B/-C context lines 10x+ GPU search + CPU format
-r recursive search 10x+ GPU search per file
-R recursive (follow symlinks) 10x+ -R follows symlinks; -r does not
--color output 10x+ GPU search + ANSI format
-s suppress errors 10x+ Silences read/permission errors
-m N max matches 10x+ Stop after N matches (early exit)
-b byte offset 10x+ Print byte offset of each match
-H show filename 10x+ Always print filename prefix
-h hide filename 10x+ Never print filename prefix
-d ACTION directories 10x+ read/skip/recurse for directories
-Z / --null 10x+ NUL-separated filenames in output
--line-buffered 10x+ Line-buffered output
--label=LABEL 10x+ Label for stdin (replaces -)
--group-separator / --no-group-separator 10x+ Custom separator between context groups
-T / --initial-tab 10x+ Tab-align matching lines after context header
-y alias for -i 10x+ Obsolete case-insensitive alias
-U / --binary 10x+ Treat file as binary (no EOL handling)
--devices=read/skip 10x+ Handle devices, sockets, and FIFOs
--include / --exclude 10x+ Glob filtering with character classes (*.[ch])
--exclude-dir 10x+ Exclude directories from recursion
Auto-decompression 10x+ Transparent .gz, .bz2, .xz decompression
-NUM shorthand 10x+ -3 is shorthand for -C 3

GPU Architecture: The GPU performs all pattern matching and line number computation. CPU handles file I/O and output formatting only.

Test Coverage: 55+/55+ GNU compatibility tests passing

Command Line Reference

Full --help output
Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE.
If no FILE is given, read standard input.
Example: grep -i 'hello world' menu.h main.c

Pattern selection and interpretation:
  -e, --regexp=PATTERN      use PATTERN for matching              [GPU+SIMD]
  -E, --extended-regexp     PATTERN is extended regex (ERE)       [GPU+SIMD]
  -G, --basic-regexp        PATTERN is basic regex (BRE)          [GPU+SIMD]
  -P, --perl-regexp         PATTERN is Perl regex (PCRE)          [GPU+SIMD]
                            supports lookahead (?=), lookbehind (?<=)
  -F, --fixed-strings       PATTERN is a literal string (default) [GPU+SIMD]
  -i, --ignore-case         case-insensitive matching             [GPU+SIMD]
  -w, --word-regexp         match only whole words                [GPU+SIMD]
  -v, --invert-match        select non-matching lines             [GPU+SIMD]

Output control:
  -A NUM, --after-context=NUM   print NUM lines after match       [GPU+SIMD]
  -B NUM, --before-context=NUM  print NUM lines before match      [GPU+SIMD]
  -C NUM, --context=NUM         print NUM lines before and after  [GPU+SIMD]
  -NUM                        shorthand for -C NUM                [GPU+SIMD]
  -b, --byte-offset         print byte offset with output         [GPU+SIMD]
  -c, --count               count matching lines per FILE         [GPU+SIMD]
      --color[=WHEN]        highlight matches (always/never/auto) [GPU+SIMD]
  -H                        print filename with every match       [GPU+SIMD]
  -h, --no-filename         suppress filename prefix              [GPU+SIMD]
      --group-separator=SEP print SEP between context groups      [GPU+SIMD]
      --no-group-separator  disable context group separator       [GPU+SIMD]
      --initial-tab         align matches with initial tab        [GPU+SIMD]
      --line-buffered       flush output after every line         [GPU+SIMD]
      --label=LABEL         use LABEL for stdin filename          [GPU+SIMD]
  -l, --files-with-matches  print filenames with matches          [GPU+SIMD]
  -L, --files-without-match print filenames without matches       [GPU+SIMD]
  -m NUM, --max-count=NUM   stop after NUM matches                [GPU+SIMD]
  -n, --line-number         print line numbers (GPU-computed)     [GPU+SIMD]
  -o, --only-matching       print only matched parts              [GPU+SIMD]
  -q, --quiet, --silent     suppress output (exit status only)    [GPU+SIMD]
  -s, --no-messages         suppress error messages               [GPU+SIMD]
  -T, --initial-tab         tab-align matching lines            [GPU+SIMD]
  -Z, --null                print NUL after filename              [GPU+SIMD]

File and directory selection:
  -r, --recursive           search directories recursively        [GPU+SIMD]
  -R, --dereference-recursive  recursive, follow symlinks           [GPU+SIMD]
  -d ACTION, --directories=ACTION  how to handle directories (read/skip/recurse) [GPU+SIMD]
      --devices=ACTION    handle devices/sockets/fifos (read/skip) [GPU+SIMD]
      --exclude=PATTERN   skip files matching PATTERN           [GPU+SIMD]
      --exclude-from=FILE skip files matching patterns in FILE  [GPU+SIMD]
      --exclude-dir=PATTERN skip directories matching PATTERN   [GPU+SIMD]
      --include=PATTERN   search only files matching PATTERN      [GPU+SIMD]
  -U, --binary              treat files as binary (no EOL)        [GPU+SIMD]
  -y                        alias for -i (obsolete)               [GPU+SIMD]
  -V, --verbose             print backend and timing info

Backend selection:
  --auto                    auto-select optimal backend (default)
  --cpu, --cpu-optimized    force CPU backend (SIMD-optimized)
  --gnu                     force GNU grep backend (GPL, full features)
  --gpu                     force GPU (Metal on macOS, Vulkan on Linux)
  --metal                   force Metal backend (macOS only)
  --vulkan                  force Vulkan backend

GPU tuning (auto mode only):
  --prefer-gpu              bias auto-selection toward GPU
  --prefer-cpu              bias auto-selection toward CPU
  --gpu-bias=NUM            fine-tune GPU preference (-10 to +10)
  --min-gpu-size=SIZE       minimum input size for GPU (e.g., 128K)
  --max-gpu-size=SIZE       maximum input size for GPU (e.g., 16M)

Miscellaneous:
  -h, --help                display this help and exit
      --version             display version information and exit

Optimization legend:
  [GPU+SIMD]  GPU-accelerated (Metal/Vulkan) + SIMD-optimized CPU fallback
  GPU uses parallel compute shaders for pattern matching
  CPU uses Boyer-Moore-Horspool with 16/32-byte SIMD vectors

Exit status: 0 if match found, 1 if no match, 2 if error.

GPU Performance (typical speedups vs SIMD CPU):
  Single char patterns:     ~17x    Case-insensitive (-i):  ~11x
  Word boundary (-w):       ~8x     Fixed strings (-F):     ~7x
  Extended regex (-E):      ~5-10x  Perl regex (-P):        ~5-10x

Examples:
  grep 'error' /var/log/syslog        Search for 'error' in syslog
  grep -i 'warning' *.log             Case-insensitive search
  grep -E 'error|warning' *.log       Extended regex (ERE)
  grep -G 'ab\+c' file.txt            Basic regex (BRE)
  grep -P '(?<=@)\w+' file.txt        Perl regex lookbehind
  grep -P 'foo(?=bar)' file.txt       Perl regex lookahead
  grep -rn 'TODO' src/                Recursive with line numbers
  grep --gpu 'needle' haystack.txt    Force GPU acceleration

Memory Safety (zust)

This project uses zust — zero-cost ownership and memory safety for Zig. The entire source tree has been transpiled to zust-safe Zig with the following transformations applied by zust-transpile:

  • Array initialization: var buf: [N]u8 = undefined → zeroed initialization to eliminate uninitialized-variable warnings
  • Pointer capture loops: for (slice) |*item| rewritten to index-based iteration to avoid raw pointer dereferences
  • Resource cleanup: defer allocator.free(x) statements safely stripped where memory is owned by safe types
  • Out-parameters: *bool scalars rewritten to array indexing syntax for analyzer compatibility
  • Unsafe casts: @ptrCast, @bitCast, @intCast annotated with // safe-transpile review comments
  • Optional unwraps: opt.? in expression contexts preserved inline with review comments

Run the memory-safety analyzer as part of the build:

zig build analyze    # Runs zust-analyzer --strictness=high on src/

The analyzer currently reports zero non-slice errors (all remaining diagnostics are raw []const u8 parameter/return type suggestions which require cross-function refactoring across the codebase).

Build Variants

Variant Description Vulkan on macOS --gnu flag
pure Zig + SIMD + GPU only. No external dependencies. No Not available
gnu Includes GNU grep + Vulkan via MoltenVK. Yes Falls back to GNU grep

The gnu build enables Vulkan on macOS using MoltenVK, allowing both Metal and Vulkan backends on Mac.

Backend Selection

Flag Description
--auto Automatically select optimal backend (default)
--gpu Use GPU (Metal on macOS, Vulkan elsewhere)
--cpu Force CPU backend (SIMD-optimized)
--gnu Force GNU grep backend (gnu build only)
--metal Force Metal backend (macOS only)
--vulkan Force Vulkan backend (macOS+gnu build, or Linux)

Architecture & Optimizations

CPU Implementation (src/cpu_optimized.zig)

The CPU backend uses a SIMD-optimized Boyer-Moore-Horspool algorithm:

SIMD Vector Operations:

  • Vec16 and Vec32 types for 16/32-byte parallel processing
  • @Vector(16, u8) and @Vector(32, u8) Zig vector types
  • Processes pattern matching in 16-byte chunks with matchAtPositionSIMD()

Boyer-Moore-Horspool Skip Table:

  • Pre-computed 256-entry skip table for O(n/m) average case
  • Case-insensitive variant populates both upper/lower entries
  • Skip calculation: pattern_len - 1 - last_occurrence_index

Vectorized Operations:

  • toLowerVec16(): SIMD lowercase conversion using @select
  • findLineStartSIMD(): Backwards 16-byte newline search
  • findNextNewlineSIMD(): Forward 32-byte newline search
  • searchAllLines(): 32-byte chunked newline counting for empty patterns

Context Lines Implementation:

  • outputWithContext(): Builds line index, computes context ranges, merges overlapping groups
  • Outputs -- separator between non-adjacent context groups
  • Supports combined -n with context for numbered output

Recursive Search:

  • processDirectory(): Recursive directory walker with file type filtering
  • Processes files in parallel where beneficial
  • Supports combined flags (-rn, -ri, -rc, -rl)

Color Output:

  • ANSI escape codes: \033[01;31m for match highlighting
  • --color=always|never|auto modes
  • Works with -o (only matching) mode

GPU Implementation

Metal Shader (src/shaders/search.metal):

  • Chunked Processing: Each thread handles chunk_size = text_len / num_threads bytes
  • Boyer-Moore-Horspool: GPU-side skip table with build_skip_table kernel
  • uchar4 SIMD: 4-byte vectorized pattern matching via match_at_position()
  • Atomic Counters: atomic_uint for thread-safe match counting
  • Line Start Tracking: find_line_start() for result metadata

Vulkan Shader (src/shaders/search.comp):

  • uvec4 SIMD: 16-byte vectorized comparison via match_uvec4()
  • Packed Word Access: get_text_word_at() handles unaligned 4-byte reads
  • Workgroup Size: 256 threads per workgroup (local_size_x = 256)
  • Chunked Dispatch: (text_len / 64) / 256 workgroups for efficient parallelism

Auto-Selection Algorithm

The e_jerk_gpu library scores workloads based on:

  • Data Size: Minimum 32-256KB depending on GPU tier
  • Compute Intensity: Pattern length and case-sensitivity increase GPU advantage
  • Hardware Tier: Ultra/High/Mid/Entry classification affects thresholds
  • GPU Bias: +4 for ultra-tier, -2 for entry-tier hardware

Performance

Workload CPU GPU Speedup
Single character patterns 128 MB/s 2.2 GB/s 17.4x
Case-insensitive (-i) 223 MB/s 2.5 GB/s 11.0x
Word boundary (-w) 482 MB/s 3.8 GB/s 8.0x
Common words (the) 335 MB/s 2.4 GB/s 7.2x
Long patterns (8+ chars) 1.3 GB/s 3.7 GB/s 2.7x
Sparse matches 3.9 GB/s 6.3 GB/s 1.6x

Results measured on Apple M1 Max with 50MB test files.

Requirements

  • macOS: Metal support (built-in), optional MoltenVK for Vulkan
  • Linux: Vulkan runtime (libvulkan1)
  • Build: Zig 0.15.2+, glslc (Vulkan shader compiler)

Building from Source

zig build -Doptimize=ReleaseFast

# Run tests
zig build test      # Unit tests
zig build smoke     # Integration tests (GPU verification)
zig build bench     # Benchmarks
zig build analyze   # Memory-safety analyzer (zust)
bash gnu-tests.sh   # GNU compatibility tests (55+ tests)

Recent Changes

  • Error Suppression & Limits: -s suppresses read errors, -m N stops after N matches with GPU early-exit
  • Filename & Offset Control: -b byte offsets, -H / -h show/hide filename, -Z / --null NUL-separated output, --label=LABEL for stdin
  • Directory & Device Handling: -d ACTION / --directories=ACTION (read/skip/recurse), --devices=read/skip for sockets/FIFOs
  • Context & Formatting: --group-separator=SEP / --no-group-separator, -T / --initial-tab, --line-buffered, -NUM shorthand for -C NUM
  • File Filtering: --include / --exclude with character-class glob support (*.[ch]), --exclude-dir
  • Archive Support: Transparent auto-decompression for .gz, .bz2, .xz files
  • Binary & Symlinks: -U / --binary for binary-file handling, -r vs -R symlink-follow distinction
  • Legacy Aliases: -y alias for -i (obsolete case-insensitive flag)
  • GPU PCRE Lookaround: Full GPU support for Perl regex lookahead (?=), (?!) and lookbehind (?<=), (?<!) assertions
  • GPU Regex Support: Native Thompson NFA regex execution on Metal and Vulkan GPUs for -E extended regex patterns
  • Context Lines: Native -A, -B, -C support with proper group separators
  • Recursive Search: Native -r flag with combined options (-rn, -ri, -rc, -rl)
  • Color Output: Native --color support with ANSI highlighting
  • Test Coverage: 55+ GNU compatibility tests passing

License

Source code: Unlicense (public domain) Binaries: GPL-3.0-or-later

About

GPU-accelerated grep with Metal and Vulkan backends

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors