logsaw

logsaw is my practical CLI for fast, local log analytics.

I built it for real operational debugging: when you need answers from logs in minutes, without standing up heavy infrastructure like ELK, Loki, or a data warehouse.

Why I built this

Most incidents start the same way: you have a large log file, a hypothesis, and no time.

I do not want to:

  • provision infrastructure for one investigation
  • ship sensitive logs to external systems
  • wait for indexing pipelines

I do want to:

  • point to a file (or stdin)
  • run one command
  • get immediate output for both humans and scripts

That is exactly what logsaw is designed for.

Who this is for

logsaw is aimed at:

  • backend engineers debugging production or staging incidents
  • SRE / DevOps engineers doing fast drill-downs on access and app logs
  • technical leads who prefer decisions based on measurable data

Core engineering principles

  • Streaming-first: process line by line, never load full files into memory.
  • Predictable output contract: table to stderr, JSON to stdout.
  • Conservative time semantics: invalid timestamps do not leak into time-based results.
  • Tool observability: JSON includes stats so you can see what was skipped and why.
  • Fail fast config: unknown --ts-format is an explicit error, never a silent fallback.
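
For example, the fail-fast rule means a typo in --ts-format aborts the run instead of silently falling back to auto. A sketch (the value rfc3339x is deliberately invalid; the exact error text may differ):

# misspelled format: logsaw exits with an explicit error instead of guessing
logsaw hist --format jsonl --by status --bucket 1m --ts-field ts --ts-format rfc3339x -i app.jsonl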

How it works internally

For each line, the pipeline is:

  1. Resolve input format (auto | jsonl | nginx | plain).
  2. Extract:
     • aggregation value (--by)
     • timestamp (when needed)
  3. Apply command logic:
     • top: frequency aggregation
     • hist: frequency by time bucket
     • grep: regex + context

For --last, the time window stays correct even when log lines arrive out of timestamp order.
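
A concrete case where this matters is concatenating rotated files, which often interleaves timestamps:

# rotated files interleave out of order; the 30-minute window is still correct
cat access.log.1 access.log | logsaw hist --format nginx --by status --bucket 1m --last 30m -i -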

Quick start

Build

git clone https://github.com/etherinus/logsaw
cd logsaw
cargo build --release
./target/release/logsaw --help

Typical flows

# top nginx statuses
logsaw top --format nginx --by status -i access.log

# error search with context
logsaw grep --re "panic|timeout|error" --context 2 -i app.log

# status histogram for the last 30 minutes
logsaw hist --format nginx --by status --bucket 1m --last 30m -i access.log

Commands

top

Returns top-N values for a field.

logsaw top --by <field> [--last <duration>] [--limit N] [--out table|json|both] [--format ...] [--fast-json] -i <file>

Examples:

logsaw top --format nginx --by status -i access.log
logsaw top --format nginx --nginx-preset combined --by request.method -i access.log
logsaw top --format jsonl --by user.id -i app.jsonl
logsaw top --format jsonl --by user_id --fast-json -i app.jsonl
logsaw top --format plain --by a.b -i app.log

hist

Returns frequencies by time bucket and field value.

logsaw hist --by <field> --bucket <duration> [--last <duration>] [--top N] [--out table|json|both] [--format ...] [--fast-json] -i <file>

Examples:

logsaw hist --format nginx --by status --bucket 1m -i access.log
logsaw hist --format nginx --by status --bucket 1m --last 30m -i access.log
logsaw hist --format jsonl --by status --bucket 1m --ts-field @timestamp --ts-format rfc3339 -i app.jsonl

grep

Regex search with before/after context.

logsaw grep --re <regex> [--context N | --before N --after N] [--json] -i <file>

Examples:

logsaw grep --re "timeout|panic" --context 2 -i app.log
logsaw grep --re "error" --before 10 --after 5 -i app.log
logsaw grep --re "panic" --context 2 --json -i app.log

Input formats

--format auto|jsonl|nginx|plain

  • auto (default): tries jsonl -> nginx -> plain
  • jsonl: JSON Lines
  • nginx: built-in parser or your template/preset
  • plain: key=value style logs + timestamp heuristics

Multiple inputs are supported:

logsaw top --format nginx --by status -i a.log -i b.log

Stdin is supported via -:

tail -f access.log | logsaw top --format nginx --by status -i -

Fields and dotted paths

jsonl

Nested paths are supported:

logsaw top --format jsonl --by user.id -i app.jsonl

plain

Supported:

  • dotted keys: a.b=c
  • JSON expansion: a={"b":1} with --by a.b
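
Both cases are quick to verify from stdin (a minimal sketch using the - input form shown above):

# dotted key
printf 'a.b=c level=info\n' | logsaw top --format plain --by a.b -i -

# embedded JSON expanded into the dotted path
printf 'a={"b":1} level=info\n' | logsaw top --format plain --by a.b -i -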

Working with timestamps

Critical for:

  • --last (sliding time window)
  • hist (bucketization)

--ts-field

Explicit timestamp field.

Examples:

logsaw hist --format jsonl --by status --bucket 1m --ts-field @timestamp --ts-format rfc3339 -i app.jsonl
logsaw hist --format plain --by status --bucket 1m --ts-field ts --ts-format rfc3339 -i app.log

--ts-format

Supported values:

  • auto
  • epoch
  • epoch_ms
  • rfc3339
  • ymd_hms
  • nginx_time_local
  • chrono:<fmt>

Note: unknown format values fail explicitly.
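
chrono:<fmt> covers everything else. Assuming <fmt> takes standard chrono strftime specifiers, a custom pattern would look like:

# timestamps such as "2024-05-01 12:30:00" via a custom chrono pattern
logsaw hist --format plain --by status --bucket 1m \
  --ts-field ts --ts-format 'chrono:%Y-%m-%d %H:%M:%S' -i app.log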

Nginx support

Presets

logsaw top --format nginx --nginx-preset combined --by status -i access.log
logsaw top --format nginx --nginx-preset common --by remote_addr -i access.log
logsaw top --format nginx --nginx-preset json-ish --by http_user_agent -i access.log

Custom template

logsaw top \
  --format nginx \
  --nginx-format '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"' \
  --by status \
  -i access.log

Derived request fields:

  • request.method
  • request.path
  • request.proto
  • request.raw
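
These derived fields work anywhere a --by field is accepted, for example:

# top requested paths
logsaw top --format nginx --nginx-preset combined --by request.path -i access.log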

Output contract

  • Table output always goes to stderr.
  • JSON output always goes to stdout.

This keeps terminal output readable for humans and stable for pipelines.
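
In practice this means the table can stay visible on the terminal while a tool consumes the JSON (jq assumed installed):

# table prints to stderr; jq reads clean JSON from stdout
logsaw top --format nginx --by status --out both -i access.log | jq '.items[0]'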

JSON example: top

{
  "command": "top",
  "by": "status",
  "format": "nginx",
  "fast_json": false,
  "ts_field": null,
  "ts_format": "auto",
  "nginx_preset": "combined",
  "nginx_format_used": "...",
  "last_ms": 900000,
  "total": 12345,
  "stats": {
    "lines_read": 13000,
    "parsed_lines": 12900,
    "used_lines": 12345,
    "skipped_unparsed": 100,
    "skipped_no_value": 400,
    "skipped_no_ts": 80,
    "skipped_out_of_window": 75
  },
  "items": [
    {"value": "200", "count": 10000},
    {"value": "500", "count": 234}
  ]
}

JSON example: hist

{
  "command": "hist",
  "by": "status",
  "format": "nginx",
  "fast_json": false,
  "ts_field": null,
  "ts_format": "auto",
  "nginx_preset": "combined",
  "nginx_format_used": "...",
  "bucket_ms": 60000,
  "last_ms": 1800000,
  "stats": {
    "lines_read": 13000,
    "parsed_lines": 12900,
    "used_lines": 7600,
    "skipped_unparsed": 100,
    "skipped_no_value": 3200,
    "skipped_no_ts": 900,
    "skipped_out_of_window": 1100
  },
  "items": [
    {"bucket_ms": 1730000000000, "value": "200", "count": 120},
    {"bucket_ms": 1730000000000, "value": "500", "count": 2}
  ]
}

JSON example: grep --json

{"source":"app.log","line":42,"kind":"match","text":"panic: ..."}

What stats tells you

I added stats because in incident work the result is not enough; sample quality matters.

  • lines_read: total lines read
  • parsed_lines: lines successfully parsed in the selected format
  • used_lines: lines included in the aggregation
  • skipped_unparsed: the parser could not parse the line
  • skipped_no_value: the --by field was missing
  • skipped_no_ts: the timestamp was missing or invalid for time-based logic
  • skipped_out_of_window: the line was excluded by --last
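
These counters make sample quality scriptable. For instance, a quick parse-rate check (jq assumed installed):

# how much of the file actually informed the result?
logsaw top --format nginx --by status -i access.log \
  | jq '.stats | {parse_rate: (.parsed_lines / .lines_read), used: .used_lines}'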

Performance notes

  • Streaming I/O, single pass over input.
  • HashMap-based aggregation; memory depends on key cardinality.
  • --last uses an ordered time window and remains correct for out-of-order logs.
  • --fast-json accelerates common JSONL top-level field cases.
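
The cardinality point is the one to watch: memory tracks distinct keys, not file size. As a rule of thumb (not a measured benchmark):

# a handful of distinct keys: small footprint
logsaw top --format nginx --by status -i access.log

# potentially one key per unique URL: footprint grows with cardinality
logsaw top --format nginx --by request.raw -i access.log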

Limitations

  • --fast-json supports top-level keys only.
  • hist and --last require parseable timestamps.
  • nginx template parsing expects reasonably structured log_format lines.

Practical recommendations

  • If you know the format, set --format explicitly.
  • For nginx, start with --nginx-preset combined, then move to custom --nginx-format.
  • If hist output looks off, check stats first, then validate --ts-field and --ts-format.
