# logsaw

logsaw is my practical CLI for fast, local log analytics.
I built it for real operational debugging: when you need answers from logs in minutes, without standing up heavy infrastructure like ELK, Loki, or a data warehouse.
Most incidents start the same way: you have a large log file, a hypothesis, and no time.
I do not want to:
- provision infrastructure for one investigation
- ship sensitive logs to external systems
- wait for indexing pipelines
I do want to:
- point to a file (or stdin)
- run one command
- get immediate output for both humans and scripts
That is exactly what logsaw is designed for.
logsaw is aimed at:
- backend engineers debugging production or staging incidents
- SRE / DevOps engineers doing fast drill-downs on access and app logs
- technical leads who prefer decisions based on measurable data
## Design principles

- Streaming-first: process line by line, never load full files into memory.
- Predictable output contract: table to `stderr`, JSON to `stdout`.
- Conservative time semantics: invalid timestamps do not leak into time-based results.
- Tool observability: JSON includes `stats` so you can see what was skipped and why.
- Fail-fast config: an unknown `--ts-format` is an explicit error, never a silent fallback.
## How it works

For each line, the pipeline is:

- Resolve the input format (`auto | jsonl | nginx | plain`).
- Extract:
  - the aggregation value (`--by`)
  - the timestamp (when needed)
- Apply command logic:
  - `top`: frequency aggregation
  - `hist`: frequency by time bucket
  - `grep`: regex + context
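For the `jsonl` path, that per-line loop can be sketched in a few lines of Python (an illustration of the flow, not logsaw's actual Rust implementation; the skip counters mirror the `stats` idea described below):

```python
import json
from collections import Counter

def top(lines, by, limit=5):
    """One streaming pass: parse each line, extract the --by value, count."""
    counts = Counter()
    skipped_unparsed = skipped_no_value = 0
    for line in lines:
        try:
            obj = json.loads(line)      # the jsonl format; others differ only here
        except ValueError:
            skipped_unparsed += 1       # parser could not parse the line
            continue
        value = obj.get(by) if isinstance(obj, dict) else None
        if value is None:
            skipped_no_value += 1       # missing --by field
            continue
        counts[str(value)] += 1
    return counts.most_common(limit), skipped_unparsed, skipped_no_value

logs = ['{"status": 200}', '{"status": 500}', '{"status": 200}', 'not json']
items, bad, missing = top(logs, "status")
print(items, bad, missing)  # [('200', 2), ('500', 1)] 1 0
```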
For `--last`, the time window is robust even when logs are out of timestamp order.
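One way to keep a `--last` window correct under reordering is to measure the cutoff against the newest timestamp seen anywhere in the input, not the timestamp of the last line. A small buffering sketch of that idea (an assumption about the approach; the real tool streams with an ordered window rather than buffering everything):

```python
def last_window(records, last_ms):
    """records: (ts_ms, value) pairs, possibly out of timestamp order.
    Keep everything within last_ms of the newest timestamp seen."""
    records = list(records)
    if not records:
        return []
    newest = max(ts for ts, _ in records)   # robust to out-of-order lines
    cutoff = newest - last_ms
    return [(ts, v) for ts, v in records if ts >= cutoff]

# A late-arriving old line (ts=100) is excluded even though it came last.
events = [(1_000, "a"), (2_000, "b"), (100, "c")]
print(last_window(events, 1_500))  # [(1000, 'a'), (2000, 'b')]
```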
## Install

```
git clone https://github.com/etherinus/logsaw
cd logsaw
cargo build --release
./target/release/logsaw --help
```

## Quick start

```
# top nginx statuses
logsaw top --format nginx --by status -i access.log

# error search with context
logsaw grep --re "panic|timeout|error" --context 2 -i app.log

# status histogram for the last 30 minutes
logsaw hist --format nginx --by status --bucket 1m --last 30m -i access.log
```

## Commands

### top

Returns top-N values for a field.
```
logsaw top --by <field> [--last <duration>] [--limit N] [--out table|json|both] [--format ...] [--fast-json] -i <file>
```

Examples:

```
logsaw top --format nginx --by status -i access.log
logsaw top --format nginx --nginx-preset combined --by request.method -i access.log
logsaw top --format jsonl --by user.id -i app.jsonl
logsaw top --format jsonl --by user_id --fast-json -i app.jsonl
logsaw top --format plain --by a.b -i app.log
```

### hist

Returns frequencies by time bucket and field value.
```
logsaw hist --by <field> --bucket <duration> [--last <duration>] [--top N] [--out table|json|both] [--format ...] [--fast-json] -i <file>
```

Examples:

```
logsaw hist --format nginx --by status --bucket 1m -i access.log
logsaw hist --format nginx --by status --bucket 1m --last 30m -i access.log
logsaw hist --format jsonl --by status --bucket 1m --ts-field @timestamp --ts-format rfc3339 -i app.jsonl
```

### grep

Regex search with before/after context.
```
logsaw grep --re <regex> [--context N | --before N --after N] [--json] -i <file>
```

Examples:

```
logsaw grep --re "timeout|panic" --context 2 -i app.log
logsaw grep --re "error" --before 10 --after 5 -i app.log
```
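Before/after context in a single streaming pass is typically done with a small ring buffer of recent lines plus a countdown of lines still owed after a match. A sketch under that assumption (illustrative Python, not logsaw's code):

```python
import re
from collections import deque

def grep_context(lines, pattern, before=0, after=0):
    rx = re.compile(pattern)
    prev = deque(maxlen=before)   # ring buffer of candidate "before" lines
    owed = 0                      # "after" lines still to emit
    out = []
    for line in lines:
        if rx.search(line):
            out.extend(prev)      # flush pending before-context
            prev.clear()
            out.append(line)
            owed = after
        elif owed > 0:
            out.append(line)
            owed -= 1
        else:
            prev.append(line)
    return out

lines = ["ok", "warn", "panic: boom", "trace1", "trace2", "ok2"]
print(grep_context(lines, "panic", before=1, after=2))
# ['warn', 'panic: boom', 'trace1', 'trace2']
```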
With `--json`, matches are emitted as JSON lines:

```
logsaw grep --re "panic" --context 2 --json -i app.log
```

## Input formats

`--format auto|jsonl|nginx|plain`
- `auto` (default): tries `jsonl -> nginx -> plain`
- `jsonl`: JSON Lines
- `nginx`: built-in parser or your template/preset
- `plain`: `key=value` style logs + timestamp heuristics
## Inputs

Multiple inputs are supported:

```
logsaw top --format nginx --by status -i a.log -i b.log
```

Stdin is supported via `-`:

```
tail -f access.log | logsaw top --format nginx --by status -i -
```

## Field paths (`--by`)

Nested paths are supported:

```
logsaw top --format jsonl --by user.id -i app.jsonl
```

Supported:

- dotted keys: `a.b=c`
- JSON expansion: `a={"b":1}` with `--by a.b`
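Path resolution with JSON expansion can be pictured as walking the dotted path and re-parsing any string value that holds JSON along the way. A hypothetical helper sketching that behavior:

```python
import json

def resolve_path(obj, path):
    """Walk a dotted --by path; expand string values that hold JSON,
    so a field like a={"b":1} still answers --by a.b."""
    cur = obj
    for part in path.split("."):
        if isinstance(cur, str):
            try:
                cur = json.loads(cur)   # JSON expansion of a string field
            except ValueError:
                return None
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

print(resolve_path({"user": {"id": 7}}, "user.id"))   # 7
print(resolve_path({"a": '{"b": 1}'}, "a.b"))         # 1
```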
## Timestamps

Critical for:

- `--last` (sliding time window)
- `hist` (bucketization)
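Bucketization amounts to flooring each timestamp to the start of its bucket. A minimal sketch of `hist`-style counting (illustrative, not logsaw source):

```python
from collections import Counter

def hist(records, bucket_ms):
    """records: (ts_ms, value) pairs -> counts keyed by (bucket_start_ms, value)."""
    counts = Counter()
    for ts, value in records:
        bucket = ts - ts % bucket_ms   # floor to bucket start
        counts[(bucket, value)] += 1
    return counts

events = [(61_000, "200"), (62_500, "200"), (125_000, "500")]
print(hist(events, 60_000))
# Counter({(60000, '200'): 2, (120000, '500'): 1})
```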
### `--ts-field`

Explicit timestamp field.

Examples:

```
logsaw hist --format jsonl --by status --bucket 1m --ts-field @timestamp --ts-format rfc3339 -i app.jsonl
logsaw hist --format plain --by status --bucket 1m --ts-field ts --ts-format rfc3339 -i app.log
```

### `--ts-format`

Supported values:

`auto`, `epoch`, `epoch_ms`, `rfc3339`, `ymd_hms`, `nginx_time_local`, `chrono:<fmt>`

Note: unknown format values fail explicitly.
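The fail-fast behavior is just an explicit dispatch with no default branch. A partial Python sketch covering a few of the listed values (`ymd_hms` and `nginx_time_local` omitted for brevity; Python's `strptime` stands in for chrono's format strings here):

```python
from datetime import datetime, timezone

def parse_ts(raw: str, fmt: str) -> int:
    """Return epoch milliseconds; unknown formats raise instead of guessing."""
    if fmt == "epoch":
        return int(float(raw) * 1000)
    if fmt == "epoch_ms":
        return int(raw)
    if fmt == "rfc3339":
        dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        return int(dt.timestamp() * 1000)
    if fmt.startswith("chrono:"):
        dt = datetime.strptime(raw, fmt[len("chrono:"):])
        return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"unknown --ts-format: {fmt}")   # fail fast, no silent fallback

print(parse_ts("1730000000", "epoch"))              # 1730000000000
print(parse_ts("2024-10-27T03:33:20Z", "rfc3339"))  # 1730000000000
```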
## nginx formats

Presets:

```
logsaw top --format nginx --nginx-preset combined --by status -i access.log
logsaw top --format nginx --nginx-preset common --by remote_addr -i access.log
logsaw top --format nginx --nginx-preset json-ish --by http_user_agent -i access.log
```

Custom template:

```
logsaw top \
  --format nginx \
  --nginx-format '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"' \
  --by status \
  -i access.log
```

Derived request fields: `request.method`, `request.path`, `request.proto`, `request.raw`
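A template like the one above can be compiled into a regex by turning each `$var` into a named capture group, with quoted and bracketed fields allowed to contain spaces. This is a rough sketch of that mechanism (an assumption about the approach, not the actual parser):

```python
import re

def template_to_regex(template: str) -> re.Pattern:
    """Turn an nginx log_format template into a regex with named groups."""
    out, i = [], 0
    for m in re.finditer(r"\$(\w+)", template):
        out.append(re.escape(template[i:m.start()]))   # literal chunk
        name = m.group(1)
        # Fields preceded by a quote or bracket may contain spaces.
        prev = template[m.start() - 1] if m.start() else ""
        out.append(f"(?P<{name}>[^\"\\]]*)" if prev in "\"[" else f"(?P<{name}>\\S+)")
        i = m.end()
    out.append(re.escape(template[i:]))
    return re.compile("".join(out))

tmpl = '$remote_addr - $remote_user [$time_local] "$request" $status'
rx = template_to_regex(tmpl)
line = '127.0.0.1 - alice [27/Oct/2024:03:33:20 +0000] "GET /x HTTP/1.1" 200'
m = rx.match(line)
print(m.group("status"), m.group("request"))  # 200 GET /x HTTP/1.1
```

Splitting the captured `$request` on whitespace then yields `request.method`, `request.path`, and `request.proto`.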
## Output streams

- Table output always goes to `stderr`.
- JSON output always goes to `stdout`.

This keeps terminal output readable for humans and stable for pipelines.
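The contract boils down to choosing the stream per output kind, so `logsaw ... | jq` sees only JSON while the table still reaches your eyes. A sketch of that split (hypothetical `emit` helper, not logsaw's code):

```python
import json
import sys

def emit(items, out="both"):
    """Table -> stderr (for humans), JSON -> stdout (for pipelines)."""
    if out in ("table", "both"):
        for it in items:
            print(f"{it['value']:>8}  {it['count']}", file=sys.stderr)
    if out in ("json", "both"):
        print(json.dumps({"items": items}))   # stdout only

emit([{"value": "200", "count": 10000}, {"value": "500", "count": 234}])
# stdout: {"items": [{"value": "200", "count": 10000}, {"value": "500", "count": 234}]}
```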
## JSON output

### `top`

```json
{
  "command": "top",
  "by": "status",
  "format": "nginx",
  "fast_json": false,
  "ts_field": null,
  "ts_format": "auto",
  "nginx_preset": "combined",
  "nginx_format_used": "...",
  "last_ms": 900000,
  "total": 12345,
  "stats": {
    "lines_read": 13000,
    "parsed_lines": 12900,
    "used_lines": 12345,
    "skipped_unparsed": 100,
    "skipped_no_value": 400,
    "skipped_no_ts": 80,
    "skipped_out_of_window": 75
  },
  "items": [
    {"value": "200", "count": 10000},
    {"value": "500", "count": 234}
  ]
}
```

### `hist`

```json
{
  "command": "hist",
  "by": "status",
  "format": "nginx",
  "fast_json": false,
  "ts_field": null,
  "ts_format": "auto",
  "nginx_preset": "combined",
  "nginx_format_used": "...",
  "bucket_ms": 60000,
  "last_ms": 1800000,
  "stats": {
    "lines_read": 13000,
    "parsed_lines": 12900,
    "used_lines": 7600,
    "skipped_unparsed": 100,
    "skipped_no_value": 3200,
    "skipped_no_ts": 900,
    "skipped_out_of_window": 1100
  },
  "items": [
    {"bucket_ms": 1730000000000, "value": "200", "count": 120},
    {"bucket_ms": 1730000000000, "value": "500", "count": 2}
  ]
}
```

### `grep` (with `--json`)

```json
{"source":"app.log","line":42,"kind":"match","text":"panic: ..."}
```

### `stats` fields

I added `stats` because in incident work the result alone is not enough; sample quality matters.

- `lines_read`: total lines read
- `parsed_lines`: lines successfully parsed in the selected format
- `used_lines`: lines included in aggregation
- `skipped_unparsed`: parser could not parse the line
- `skipped_no_value`: missing `--by` field
- `skipped_no_ts`: missing/invalid timestamp for time-based logic
- `skipped_out_of_window`: line excluded by `--last`
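Assuming the counters partition lines this way (every line is parsed or unparsed, and every parsed line is either used or attributed to one skip reason), they form an accounting identity you can use to sanity-check a run. Checking the `top` example numbers:

```python
# stats block from the top JSON example above
stats = {
    "lines_read": 13000,
    "parsed_lines": 12900,
    "used_lines": 12345,
    "skipped_unparsed": 100,
    "skipped_no_value": 400,
    "skipped_no_ts": 80,
    "skipped_out_of_window": 75,
}

# every line is either parsed or unparsed
assert stats["lines_read"] == stats["parsed_lines"] + stats["skipped_unparsed"]

# every parsed line is used or skipped for exactly one reason
assert stats["used_lines"] == (stats["parsed_lines"]
                               - stats["skipped_no_value"]
                               - stats["skipped_no_ts"]
                               - stats["skipped_out_of_window"])
print("stats are internally consistent")
```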
## Performance

- Streaming I/O, single pass over the input.
- HashMap-based aggregation; memory depends on key cardinality.
- `--last` uses an ordered time window and remains correct for out-of-order logs.
- `--fast-json` accelerates the common case of top-level JSONL fields.
## Limitations

- `--fast-json` supports top-level keys only.
- `hist` and `--last` require parseable timestamps.
- nginx template parsing expects reasonably structured `log_format` lines.
## Tips

- If you know the format, set `--format` explicitly.
- For nginx, start with `--nginx-preset combined`, then move to a custom `--nginx-format`.
- If `hist` output looks off, check `stats` first, then validate `--ts-field` and `--ts-format`.