Drop-in replacement for jq with automatic parallel processing for faster execution on large JSON files.
- Up to 6x faster on large datasets (automatic parallel processing - see the sketch below)
- Drop-in replacement - identical results to regular jq
- Smart detection - automatically chooses parallel or sequential execution based on query complexity
- Handles aggregations - count, sum, and average with parallel processing
- Control options - force parallel or sequential mode when needed
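For intuition, here is roughly what that parallelism looks like hand-rolled with GNU parallel: split the input into line-aligned chunks and run jq on each chunk concurrently. This is a sketch of the general technique, not pjq's actual implementation - pjq automates the chunking and the parallel/sequential decision for you.

```bash
# Hand-rolled parallel filter over JSONL, assuming GNU parallel is installed
# (sketch only - pjq automates the chunking and the parallel/sequential choice)
parallel --pipepart --keep-order --block 10M -a data.jsonl \
  jq 'select(.price > 50)'
```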
I built pjq to speed up my own data-processing workflows on large JSON files, and it gives me consistent 6x performance improvements on my datasets. Your mileage may vary - please test with your own data first. pjq has not been extensively tested on exotic jq features. Contributions welcome!
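A quick sanity check, using a placeholder file and filter - swap in your own:

```bash
# Confirm pjq produces the same output as plain jq before switching
diff <(jq 'select(.price > 50)' data.jsonl) \
     <(pjq 'select(.price > 50)' data.jsonl) \
  && echo "outputs identical"
```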
```bash
# Install dependencies
sudo apt install jq parallel   # Ubuntu/Debian
brew install jq parallel       # macOS

# Download and make executable
curl -O https://raw.githubusercontent.com/jsp123/pjq/main/pjq
chmod +x pjq

# Use with ./pjq
./pjq 'select(.price > 50)' data.jsonl
```

Or install pjq system-wide:

```bash
# Download pjq
curl -O https://raw.githubusercontent.com/jsp123/pjq/main/pjq
chmod +x pjq

# Install to system PATH
sudo cp pjq /usr/local/bin/pjq

# Now available globally
pjq 'select(.price > 50)' data.jsonl
```

Verify the installation:

```bash
# Test basic functionality
pjq --version
pjq --help

# Test with sample data
echo '{"name":"test","price":100}' | pjq 'select(.price > 50)'
```

Everyday use:

```bash
# Basic usage - just replace 'jq' with 'pjq'
pjq 'select(.price > 50)' large_dataset.jsonl

# Automatically goes parallel on large files
pjq 'select(.status == "active")' huge_file.jsonl | wc -l

# Complex queries stay sequential (for correctness)
pjq 'group_by(.category) | map(length)' data.jsonl
```
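pjq also reads from stdin, so it drops into existing pipelines just like jq (illustrative file name):

```bash
# Stdin/pipe processing works the same way as with jq
zcat events.jsonl.gz | pjq 'select(.level == "ERROR")' | wc -l
```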
Based on real-world testing with datasets up to 1M+ records:

```bash
# Small files (< 1000 lines): sequential processing
pjq 'select(.active == true)' small_data.jsonl
# ~0.05s (same as jq)

# Medium files (5k+ lines): automatic parallel processing
pjq 'select(.price > 100)' medium_data.jsonl | wc -l
# ~0.12s vs jq's ~0.50s (4x faster)

# Large files (800k+ lines): maximum parallel benefit
pjq 'select(.amount > 50)' large_data.jsonl | wc -l
# ~0.42s vs jq's ~2.55s (6x faster!)
```
What gets parallelized:

```bash
# Simple filters
pjq 'select(.price > 100)' data.jsonl
pjq 'select(.status != null and .active == true)' data.jsonl

# Basic aggregations
pjq '[select(.price > 50)] | length' data.jsonl              # Count
pjq '[select(.price > 50) | .price] | add' data.jsonl        # Sum
pjq '[select(.price > 50) | .price] | add/length' data.jsonl # Average
```
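Count, sum, and average parallelize because each is a map-reduce: every chunk can be aggregated independently and the partial results combined at the end. A hand-rolled sketch of the idea with GNU parallel (an illustration of the technique, not pjq's actual code):

```bash
# Partial sums per chunk, then one final combine step
parallel --pipepart -a data.jsonl --block 10M \
  "jq -s '[.[] | select(.price > 50) | .price] | add // 0'" \
  | jq -s 'add'
```

An average needs two partials per chunk (sum and count), combined at the end as total sum divided by total count.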
What stays sequential:

```bash
# Complex operations that need global context
pjq 'group_by(.category)' data.jsonl
pjq 'sort_by(.timestamp)' data.jsonl
pjq 'unique_by(.id)' data.jsonl
```
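How might a tool make this call automatically? One plausible heuristic - an assumption for illustration, not pjq's actual code - is to scan the query for constructs that require seeing the whole input:

```bash
# Queries with global operations are unsafe to split across chunks
query='group_by(.category) | map(length)'
if grep -Eq 'group_by|sort_by|unique_by|reduce|foreach' <<< "$query"; then
  echo "sequential: query needs global context"
else
  echo "parallel: query is line-independent"
fi
```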
Override the automatic choice when needed:

```bash
# Force parallel processing (advanced users)
pjq --parallel 'select(.price > 50)' data.jsonl

# Force sequential processing
pjq --sequential 'select(.price > 50)' data.jsonl

# Skip output ordering for maximum I/O performance
pjq --no-order 'select(.price > 50)' data.jsonl

# Debug mode to see processing decisions
pjq --debug 'select(.price > 50)' data.jsonl
```

Typical real-world uses:

```bash
# ETL workflows - 6x faster filtering
pjq 'select(.date >= "2025-01-01")' events.jsonl | process_data.py

# Log analysis - handle large log files efficiently
pjq 'select(.level == "ERROR")' application.logs | alert_system.sh

# Data validation - quick filtering of large datasets
pjq 'select(.email != null and (.email | test("@")))' users.jsonl
```

Common aggregations:

```bash
# Count matching records
pjq '[select(.active == true)] | length' users.jsonl

# Calculate totals
pjq '[select(.status == "paid") | .amount] | add' transactions.jsonl

# Compute averages
pjq '[select(.rating > 0) | .rating] | add/length' reviews.jsonl
```

Built and tested on real datasets:
- Files up to 1M+ records
- Null value handling
- Numeric comparisons
- String operations
- Stdin/pipe processing
- Multiple file processing
- Threshold: files over 1000 lines trigger parallel processing
- CPU usage: all available cores are utilized automatically
- Memory: efficient - data is processed in chunks, so the full file is never loaded
- Ordering: --keep-order adds ~7% processing time for deterministic output
- Scalability: tested on 800k+ record datasets with a consistent 6x speedup
- I/O impact: --keep-order uses disk buffering (use --no-order on I/O-constrained systems)
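To gauge the ordering overhead on your own hardware, time the same query with and without ordered output (illustrative; substitute one of your own large files):

```bash
# Ordered output (default) vs unordered
time pjq 'select(.price > 50)' large_data.jsonl > /dev/null
time pjq --no-order 'select(.price > 50)' large_data.jsonl > /dev/null
```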
Found a bug? Want to add test coverage? Pull requests welcome!
```bash
# Run the test suite
./test_pjq.sh

# Test specific scenarios
pjq --debug 'your_query_here' your_data.jsonl
```

MIT License - see LICENSE file for details.
pjq - Because life's too short for slow JSON processing! ⚡