A high-performance grep replacement that uses GPU acceleration via Metal (macOS) and Vulkan for blazing-fast text searches.
- GPU-Accelerated Search: Parallel pattern matching on Metal and Vulkan compute shaders
- SIMD-Optimized CPU: Vectorized Boyer-Moore-Horspool with 16/32-byte SIMD operations
- Auto-Selection: Intelligent backend selection based on file size, pattern complexity, and hardware tier
- GNU Compatible: Full support for common grep flags including context lines, recursive search, and color output
Homebrew formulas are now maintained in e-jerk/utils (pure GPU versions) and e-jerk/utils-gnu (GNU-fallback versions).
# Pure GPU version (no dependencies)
brew tap e-jerk/utils
brew install e-jerk/utils/grep
# GNU-fallback version (with GNU fallback support)
brew tap e-jerk/utils-gnu
brew install e-jerk/utils-gnu/grep# Basic usage (auto-selects best backend)
grep "pattern" file.txt
# Case-insensitive search
grep -i "error" log.txt
# Word boundary matching
grep -w "test" source.c
# Invert match (show non-matching lines)
grep -v "debug" output.log
# Context lines (before, after, both)
grep -B 3 "error" log.txt # 3 lines before
grep -A 3 "error" log.txt # 3 lines after
grep -C 2 "error" log.txt # 2 lines before and after
# Recursive search
grep -r "TODO" src/
grep -rn "FIXME" . # With line numbers
# Color output
grep --color=always "pattern" file.txt
# Force specific backend
grep --gpu "TODO" *.py
grep --metal "FIXME" src/
grep --vulkan "BUG" lib/
grep --cpu "pattern" file.txt
# Verbose output showing timing and backend info
grep -V "pattern" largefile.txt| Feature | CPU | Metal | Vulkan | GPU Speedup | Notes |
|---|---|---|---|---|---|
| Basic pattern matching | ✓ | ✓ | ✓ | 17x | Full GPU search |
-i case insensitive |
✓ | ✓ | ✓ | 11x | Full GPU search |
-w word boundary |
✓ | ✓ | ✓ | 8x | Full GPU search |
-v invert match |
✓ | ✓ | ✓ | 8x | Full GPU search |
-F fixed strings |
✓ | ✓ | ✓ | 7x | Full GPU search |
-E extended regex |
✓ | ✓ | ✓ | 5-10x | GPU regex engine |
-G basic regex |
✓ | ✓ | ✓ | 5-10x | GPU regex engine |
-P Perl regex |
✓ | ✓ | ✓ | 5-10x | GPU lookahead/lookbehind |
-e multiple patterns |
✓ | ✓ | ✓ | 5-10x | GPU per-pattern |
-n line numbers |
✓ | ✓ | ✓ | 10x+ | GPU-computed line nums |
-c count only |
✓ | ✓ | ✓ | 10x+ | GPU search + CPU count |
-l files with matches |
✓ | ✓ | ✓ | 10x+ | GPU search + CPU filter |
-L files without match |
✓ | ✓ | ✓ | 10x+ | GPU search + CPU filter |
-q quiet mode |
✓ | ✓ | ✓ | 10x+ | GPU search + early exit |
-o only matching |
✓ | ✓ | ✓ | 10x+ | GPU-computed positions |
-A/-B/-C context lines |
✓ | ✓ | ✓ | 10x+ | GPU search + CPU format |
-r recursive search |
✓ | ✓ | ✓ | 10x+ | GPU search per file |
-R recursive (follow symlinks) |
✓ | ✓ | ✓ | 10x+ | -R follows symlinks; -r does not |
--color output |
✓ | ✓ | ✓ | 10x+ | GPU search + ANSI format |
-s suppress errors |
✓ | ✓ | ✓ | 10x+ | Silences read/permission errors |
-m N max matches |
✓ | ✓ | ✓ | 10x+ | Stop after N matches (early exit) |
-b byte offset |
✓ | ✓ | ✓ | 10x+ | Print byte offset of each match |
-H show filename |
✓ | ✓ | ✓ | 10x+ | Always print filename prefix |
-h hide filename |
✓ | ✓ | ✓ | 10x+ | Never print filename prefix |
-d ACTION directories |
✓ | ✓ | ✓ | 10x+ | read/skip/recurse for directories |
-Z / --null |
✓ | ✓ | ✓ | 10x+ | NUL-separated filenames in output |
--line-buffered |
✓ | ✓ | ✓ | 10x+ | Line-buffered output |
--label=LABEL |
✓ | ✓ | ✓ | 10x+ | Label for stdin (replaces -) |
--group-separator / --no-group-separator |
✓ | ✓ | ✓ | 10x+ | Custom separator between context groups |
-T / --initial-tab |
✓ | ✓ | ✓ | 10x+ | Tab-align matching lines after context header |
-y alias for -i |
✓ | ✓ | ✓ | 10x+ | Obsolete case-insensitive alias |
-U / --binary |
✓ | ✓ | ✓ | 10x+ | Treat file as binary (no EOL handling) |
--devices=read/skip |
✓ | ✓ | ✓ | 10x+ | Handle devices, sockets, and FIFOs |
--include / --exclude |
✓ | ✓ | ✓ | 10x+ | Glob filtering with character classes (*.[ch]) |
--exclude-dir |
✓ | ✓ | ✓ | 10x+ | Exclude directories from recursion |
| Auto-decompression | ✓ | ✓ | ✓ | 10x+ | Transparent .gz, .bz2, .xz decompression |
-NUM shorthand |
✓ | ✓ | ✓ | 10x+ | -3 is shorthand for -C 3 |
GPU Architecture: The GPU performs all pattern matching and line number computation. CPU handles file I/O and output formatting only.
Test Coverage: 55+/55+ GNU compatibility tests passing
Full --help output
Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE.
If no FILE is given, read standard input.
Example: grep -i 'hello world' menu.h main.c
Pattern selection and interpretation:
-e, --regexp=PATTERN use PATTERN for matching [GPU+SIMD]
-E, --extended-regexp PATTERN is extended regex (ERE) [GPU+SIMD]
-G, --basic-regexp PATTERN is basic regex (BRE) [GPU+SIMD]
-P, --perl-regexp PATTERN is Perl regex (PCRE) [GPU+SIMD]
supports lookahead (?=), lookbehind (?<=)
-F, --fixed-strings PATTERN is a literal string (default) [GPU+SIMD]
-i, --ignore-case case-insensitive matching [GPU+SIMD]
-w, --word-regexp match only whole words [GPU+SIMD]
-v, --invert-match select non-matching lines [GPU+SIMD]
Output control:
-A NUM, --after-context=NUM print NUM lines after match [GPU+SIMD]
-B NUM, --before-context=NUM print NUM lines before match [GPU+SIMD]
-C NUM, --context=NUM print NUM lines before and after [GPU+SIMD]
-NUM shorthand for -C NUM [GPU+SIMD]
-b, --byte-offset print byte offset with output [GPU+SIMD]
-c, --count count matching lines per FILE [GPU+SIMD]
--color[=WHEN] highlight matches (always/never/auto) [GPU+SIMD]
-H print filename with every match [GPU+SIMD]
-h, --no-filename suppress filename prefix [GPU+SIMD]
--group-separator=SEP print SEP between context groups [GPU+SIMD]
--no-group-separator disable context group separator [GPU+SIMD]
--initial-tab align matches with initial tab [GPU+SIMD]
--line-buffered flush output after every line [GPU+SIMD]
--label=LABEL use LABEL for stdin filename [GPU+SIMD]
-l, --files-with-matches print filenames with matches [GPU+SIMD]
-L, --files-without-match print filenames without matches [GPU+SIMD]
-m NUM, --max-count=NUM stop after NUM matches [GPU+SIMD]
-n, --line-number print line numbers (GPU-computed) [GPU+SIMD]
-o, --only-matching print only matched parts [GPU+SIMD]
-q, --quiet, --silent suppress output (exit status only) [GPU+SIMD]
-s, --no-messages suppress error messages [GPU+SIMD]
-T, --initial-tab tab-align matching lines [GPU+SIMD]
-Z, --null print NUL after filename [GPU+SIMD]
File and directory selection:
-r, --recursive search directories recursively [GPU+SIMD]
-R, --dereference-recursive recursive, follow symlinks [GPU+SIMD]
-d ACTION, --directories=ACTION how to handle directories (read/skip/recurse) [GPU+SIMD]
--devices=ACTION handle devices/sockets/fifos (read/skip) [GPU+SIMD]
--exclude=PATTERN skip files matching PATTERN [GPU+SIMD]
--exclude-from=FILE skip files matching patterns in FILE [GPU+SIMD]
--exclude-dir=PATTERN skip directories matching PATTERN [GPU+SIMD]
--include=PATTERN search only files matching PATTERN [GPU+SIMD]
-U, --binary treat files as binary (no EOL) [GPU+SIMD]
-y alias for -i (obsolete) [GPU+SIMD]
-V, --verbose print backend and timing info
Backend selection:
--auto auto-select optimal backend (default)
--cpu, --cpu-optimized force CPU backend (SIMD-optimized)
--gnu force GNU grep backend (GPL, full features)
--gpu force GPU (Metal on macOS, Vulkan on Linux)
--metal force Metal backend (macOS only)
--vulkan force Vulkan backend
GPU tuning (auto mode only):
--prefer-gpu bias auto-selection toward GPU
--prefer-cpu bias auto-selection toward CPU
--gpu-bias=NUM fine-tune GPU preference (-10 to +10)
--min-gpu-size=SIZE minimum input size for GPU (e.g., 128K)
--max-gpu-size=SIZE maximum input size for GPU (e.g., 16M)
Miscellaneous:
-h, --help display this help and exit
--version display version information and exit
Optimization legend:
[GPU+SIMD] GPU-accelerated (Metal/Vulkan) + SIMD-optimized CPU fallback
GPU uses parallel compute shaders for pattern matching
CPU uses Boyer-Moore-Horspool with 16/32-byte SIMD vectors
Exit status: 0 if match found, 1 if no match, 2 if error.
GPU Performance (typical speedups vs SIMD CPU):
Single char patterns: ~17x Case-insensitive (-i): ~11x
Word boundary (-w): ~8x Fixed strings (-F): ~7x
Extended regex (-E): ~5-10x Perl regex (-P): ~5-10x
Examples:
grep 'error' /var/log/syslog Search for 'error' in syslog
grep -i 'warning' *.log Case-insensitive search
grep -E 'error|warning' *.log Extended regex (ERE)
grep -G 'ab\+c' file.txt Basic regex (BRE)
grep -P '(?<=@)\w+' file.txt Perl regex lookbehind
grep -P 'foo(?=bar)' file.txt Perl regex lookahead
grep -rn 'TODO' src/ Recursive with line numbers
grep --gpu 'needle' haystack.txt Force GPU acceleration
This project uses zust — zero-cost ownership and memory safety for Zig. The entire source tree has been transpiled to zust-safe Zig with the following transformations applied by zust-transpile:
- Array initialization:
var buf: [N]u8 = undefined→ zeroed initialization to eliminate uninitialized-variable warnings - Pointer capture loops:
for (slice) |*item|rewritten to index-based iteration to avoid raw pointer dereferences - Resource cleanup:
defer allocator.free(x)statements safely stripped where memory is owned by safe types - Out-parameters:
*boolscalars rewritten to array indexing syntax for analyzer compatibility - Unsafe casts:
@ptrCast,@bitCast,@intCastannotated with// safe-transpilereview comments - Optional unwraps:
opt.?in expression contexts preserved inline with review comments
Run the memory-safety analyzer as part of the build:
zig build analyze # Runs zust-analyzer --strictness=high on src/The analyzer currently reports zero non-slice errors (all remaining diagnostics are raw []const u8 parameter/return type suggestions which require cross-function refactoring across the codebase).
| Variant | Description | Vulkan on macOS | --gnu flag |
|---|---|---|---|
| pure | Zig + SIMD + GPU only. No external dependencies. | No | Not available |
| gnu | Includes GNU grep + Vulkan via MoltenVK. | Yes | Falls back to GNU grep |
The gnu build enables Vulkan on macOS using MoltenVK, allowing both Metal and Vulkan backends on Mac.
| Flag | Description |
|---|---|
--auto |
Automatically select optimal backend (default) |
--gpu |
Use GPU (Metal on macOS, Vulkan elsewhere) |
--cpu |
Force CPU backend (SIMD-optimized) |
--gnu |
Force GNU grep backend (gnu build only) |
--metal |
Force Metal backend (macOS only) |
--vulkan |
Force Vulkan backend (macOS+gnu build, or Linux) |
The CPU backend uses a SIMD-optimized Boyer-Moore-Horspool algorithm:
SIMD Vector Operations:
Vec16andVec32types for 16/32-byte parallel processing@Vector(16, u8)and@Vector(32, u8)Zig vector types- Processes pattern matching in 16-byte chunks with
matchAtPositionSIMD()
Boyer-Moore-Horspool Skip Table:
- Pre-computed 256-entry skip table for O(n/m) average case
- Case-insensitive variant populates both upper/lower entries
- Skip calculation:
pattern_len - 1 - last_occurrence_index
Vectorized Operations:
toLowerVec16(): SIMD lowercase conversion using@selectfindLineStartSIMD(): Backwards 16-byte newline searchfindNextNewlineSIMD(): Forward 32-byte newline searchsearchAllLines(): 32-byte chunked newline counting for empty patterns
Context Lines Implementation:
outputWithContext(): Builds line index, computes context ranges, merges overlapping groups- Outputs
--separator between non-adjacent context groups - Supports combined
-nwith context for numbered output
Recursive Search:
processDirectory(): Recursive directory walker with file type filtering- Processes files in parallel where beneficial
- Supports combined flags (
-rn,-ri,-rc,-rl)
Color Output:
- ANSI escape codes:
\033[01;31mfor match highlighting --color=always|never|automodes- Works with
-o(only matching) mode
Metal Shader (src/shaders/search.metal):
- Chunked Processing: Each thread handles
chunk_size = text_len / num_threadsbytes - Boyer-Moore-Horspool: GPU-side skip table with
build_skip_tablekernel - uchar4 SIMD: 4-byte vectorized pattern matching via
match_at_position() - Atomic Counters:
atomic_uintfor thread-safe match counting - Line Start Tracking:
find_line_start()for result metadata
Vulkan Shader (src/shaders/search.comp):
- uvec4 SIMD: 16-byte vectorized comparison via
match_uvec4() - Packed Word Access:
get_text_word_at()handles unaligned 4-byte reads - Workgroup Size: 256 threads per workgroup (
local_size_x = 256) - Chunked Dispatch:
(text_len / 64) / 256workgroups for efficient parallelism
The e_jerk_gpu library scores workloads based on:
- Data Size: Minimum 32-256KB depending on GPU tier
- Compute Intensity: Pattern length and case-sensitivity increase GPU advantage
- Hardware Tier: Ultra/High/Mid/Entry classification affects thresholds
- GPU Bias: +4 for ultra-tier, -2 for entry-tier hardware
| Workload | CPU | GPU | Speedup |
|---|---|---|---|
| Single character patterns | 128 MB/s | 2.2 GB/s | 17.4x |
Case-insensitive (-i) |
223 MB/s | 2.5 GB/s | 11.0x |
Word boundary (-w) |
482 MB/s | 3.8 GB/s | 8.0x |
Common words (the) |
335 MB/s | 2.4 GB/s | 7.2x |
| Long patterns (8+ chars) | 1.3 GB/s | 3.7 GB/s | 2.7x |
| Sparse matches | 3.9 GB/s | 6.3 GB/s | 1.6x |
Results measured on Apple M1 Max with 50MB test files.
- macOS: Metal support (built-in), optional MoltenVK for Vulkan
- Linux: Vulkan runtime (
libvulkan1) - Build: Zig 0.15.2+, glslc (Vulkan shader compiler)
zig build -Doptimize=ReleaseFast
# Run tests
zig build test # Unit tests
zig build smoke # Integration tests (GPU verification)
zig build bench # Benchmarks
zig build analyze # Memory-safety analyzer (zust)
bash gnu-tests.sh # GNU compatibility tests (55+ tests)- Error Suppression & Limits:
-ssuppresses read errors,-m Nstops after N matches with GPU early-exit - Filename & Offset Control:
-bbyte offsets,-H/-hshow/hide filename,-Z/--nullNUL-separated output,--label=LABELfor stdin - Directory & Device Handling:
-d ACTION/--directories=ACTION(read/skip/recurse),--devices=read/skipfor sockets/FIFOs - Context & Formatting:
--group-separator=SEP/--no-group-separator,-T/--initial-tab,--line-buffered,-NUMshorthand for-C NUM - File Filtering:
--include/--excludewith character-class glob support (*.[ch]),--exclude-dir - Archive Support: Transparent auto-decompression for
.gz,.bz2,.xzfiles - Binary & Symlinks:
-U/--binaryfor binary-file handling,-rvs-Rsymlink-follow distinction - Legacy Aliases:
-yalias for-i(obsolete case-insensitive flag) - GPU PCRE Lookaround: Full GPU support for Perl regex lookahead
(?=),(?!)and lookbehind(?<=),(?<!)assertions - GPU Regex Support: Native Thompson NFA regex execution on Metal and Vulkan GPUs for
-Eextended regex patterns - Context Lines: Native
-A,-B,-Csupport with proper group separators - Recursive Search: Native
-rflag with combined options (-rn,-ri,-rc,-rl) - Color Output: Native
--colorsupport with ANSI highlighting - Test Coverage: 55+ GNU compatibility tests passing
Source code: Unlicense (public domain) Binaries: GPL-3.0-or-later