lookup is a command-line utility inspired by Splunk's powerful lookup command. It enriches JSON data streams by adding fields based on matching values in an external data source (like a CSV or JSON file). It's designed to be a flexible and high-performance tool for data enrichment pipelines.
The tool reads JSON objects (either as a JSON Array or as JSON Lines) from standard input, performs lookups based on sophisticated rules, and outputs the enriched JSON objects to standard output.
- Multiple Data Sources: Use either CSV or JSON files as your lookup table.
- Advanced Matching Methods:
exact: Case-sensitive or insensitive exact string matching.wildcard: Glob-style wildcard matching (e.g.,bot-*).regex: Powerful matching using regular expressions.cidr: Match IP addresses against CIDR blocks (e.g.,10.0.0.0/8).
- Flexible Configuration: A central JSON configuration file separates lookup logic from your data, allowing for complex matching rules.
- Built-in DNS Lookup: Perform forward (
Arecord) or reverse (PTRrecord) DNS lookups as a native feature.- Optionally specify a custom DNS server for queries.
- Flexible Field Mapping: Intuitive syntax (
<config_ref_field> as <input_field> OUTPUT out1 as new1, ...) to control which fields are matched and how new fields are named. - Handles Multiple Input Formats: Automatically detects and processes both JSON Array and JSON Lines (JSONL) from stdin.
- Cross-Platform: Written in Go, it compiles to a single binary with no external dependencies, running on Linux, macOS, and Windows.
Pre-compiled binaries for macOS, Windows, and Linux are available on the Releases page.
The configuration file defines where your data is and how to match against it.
{
"data_source": "./path/to/your/data.csv",
"matchers": [
{
"input_field": "field_from_stdin",
"lookup_field": "column_in_data_source",
"method": "exact",
"case_sensitive": false
},
{
"input_field": "another_field_from_stdin",
"lookup_field": "another_column",
"method": "regex"
}
]
}data_source: (string) The relative or absolute path to your lookup data file (CSV or JSON).matchers: (array) A list of objects, where each object defines a specific matching rule.input_field: A name for this lookup rule. This is what you refer to in the-mflag.lookup_field: The column/key name in yourdata_sourcefile to match against.method: The matching algorithm to use. Supported values:"exact"(default)"wildcard""regex""cidr"
case_sensitive: (boolean, optional) Iftrue, the match will be case-sensitive. Defaults tofalse. This applies toexact,wildcard, andregexmethods.
To make setup easier, lookup provides a helper command to generate a configuration template from your data file. It scans your CSV, JSON, or JSONL file and creates a valid config.json structure based on the headers or keys it finds.
./lookup generate-config -file <path_to_your_data_file>-file <path>: The path to your data source file (e.g.,users.csvordata.jsonl).
Example: Given a users.csv file with columns username, department, and role:
./lookup generate-config -file users.csvOutput (save as config.json and edit as needed):
{
"data_source": "users.csv",
"matchers": [
{
"input_field": "username",
"lookup_field": "username",
"method": "exact",
"case_sensitive": false
},
{
"input_field": "department",
"lookup_field": "department",
"method": "exact",
"case_sensitive": false
},
{
"input_field": "role",
"lookup_field": "role",
"method": "exact",
"case_sensitive": false
}
]
}The basic command structure is:
cat input.json | ./lookup -c <config.json> -m "<mapping_rule>"| Flag | Description | Required |
|---|---|---|
-c <path> |
Path to the JSON configuration file that defines the data source and matching rules. | Yes |
-m <string> |
The mapping rule that specifies how to link input data to the lookup table. (See Mapping Syntax below). | Yes |
--dns |
Enables DNS lookup mode. When used, the -c flag is ignored. |
No |
--dns-server |
(Optional) Specifies a custom DNS server for DNS lookups (e.g., 8.8.8.8 or 1.1.1.1:53). If not set, the system's default resolver is used. |
No |
The -m flag defines the link between the input stream and the lookup table, and controls the output.
"CONFIG_REF_FIELD as INPUT_FIELD [OUTPUT original_name1 as new_name1, original_name2 as new_name2]"
CONFIG_REF_FIELD as INPUT_FIELD: (Required)CONFIG_REF_FIELDmust match aninput_fieldin one of your matchers inconfig.json.INPUT_FIELDis the name of the field in the incoming JSON object to use for the lookup.
OUTPUT ...: (Optional)- Controls which fields from the lookup file are added to the output and allows you to rename them.
- If omitted, all columns from the matched row are added with their original names.
Let's use the following files for our examples.
users.csv (Data Source)
username,department,role,building,ip_range
jdoe,Sales,Manager,A,192.168.1.10
asmith,Engineering,Developer,B,192.168.1.25
b-*,Engineering,QA,B,10.0.0.0/8
^scanner-.*$,IT,Service,A,lookup_config.json (Configuration)
{
"data_source": "./users.csv",
"matchers": [
{
"input_field": "user_lookup",
"lookup_field": "username",
"method": "exact",
"case_sensitive": false
},
{
"input_field": "host_lookup",
"lookup_field": "username",
"method": "wildcard"
},
{
"input_field": "process_lookup",
"lookup_field": "username",
"method": "regex"
},
{
"input_field": "ip_lookup",
"lookup_field": "ip_range",
"method": "cidr"
}
]
}input.jsonl (Input Data)
{"timestamp": "2023-10-28T11:00:00Z", "user": "JDOE", "event": "login"}
{"timestamp": "2023-10-28T11:01:00Z", "hostname": "b-jones", "event": "connect"}
{"timestamp": "2023-10-28T11:02:00Z", "process": "scanner-01", "event": "scan"}
{"timestamp": "2023-10-28T11:03:00Z", "client_ip": "10.20.30.40", "event": "access"}
{"timestamp": "2023-10-28T11:04:00Z", "client_ip": "8.8.8.8", "event": "external_access"}cat input.jsonl | ./lookup \
-c lookup_config.json \
-m "user_lookup as user OUTPUT department as dept, role"Output (for the first line):
{"dept":"Sales","event":"login","role":"Manager","timestamp":"2023-10-28T11:00:00Z","user":"JDOE"}cat input.jsonl | ./lookup \
-c lookup_config.json \
-m "ip_lookup as client_ip"Output (for the fourth line):
{"building":"B","client_ip":"10.20.30.40","department":"Engineering","event":"access","ip_range":"10.0.0.0/8","role":"QA","timestamp":"2023-10-28T11:03:00Z","username":"b-*"}Command (using system resolver):
cat input.jsonl | ./lookup \
--dns \
-m "dns_reverse_lookup as client_ip OUTPUT hostname as resolved_host"Command (using a custom DNS server):
cat input.jsonl | ./lookup \
--dns \
--dns-server "8.8.8.8" \
-m "dns_reverse_lookup as client_ip OUTPUT hostname as resolved_host"Output (for the last line, may vary):
{"client_ip":"8.8.8.8","event":"external_access","resolved_host":"dns.google","timestamp":"2023-10-28T11:04:00Z"}To build from source, you need Go and Make installed.
# Build for the current platform
make build
# Cross-compile for all platforms (Linux amd64/arm64, macOS amd64/arm64, Windows amd64)
make build-all
# Build all binaries and create .zip archives in dist/
make packageThe build artifacts will be placed in the dist/ directory.
- Architecture — Module structure, data flow, testing strategy
- アーキテクチャ — Japanese version
- 日本語ドキュメント
This project is licensed under the MIT License.