Nadi Shipper

Nadi Shipper is a lightweight log shipper for the Nadi app. It transports Nadi logs from local storage to the Nadi API.

Features

  • Parallel file processing with configurable worker pool
  • Exponential backoff with jitter for retries
  • Graceful shutdown with state preservation
  • TLS support with custom CA certificates
  • Structured logging with log levels
  • Atomic file writes for data integrity

Quick Start

Install Shipper on all the servers you want to monitor.

To download and install Shipper, use the commands that work with your system:

Install via bash script (Linux & Mac)

Linux & Mac users can install it directly to /usr/local/bin/shipper with:

sudo bash < <(curl -sL https://raw.githubusercontent.com/nadi-pro/shipper/master/install)

Download static binary (Windows, Linux and Mac)

On Windows, run the following command to download the latest version and set up the default configuration:

powershell -command "(New-Object Net.WebClient).DownloadFile('https://raw.githubusercontent.com/nadi-pro/shipper/master/install.ps1', '%TEMP%\install.ps1') && %TEMP%\install.ps1 && del %TEMP%\install.ps1"

Building from Source

go build -ldflags "-X main.Version=1.0.0 -X main.BuildTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)" -o shipper .

Configuration

Copy nadi.reference.yaml to nadi.yaml and update the following values:

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| endpoint | Yes | - | Nadi API endpoint URL |
| apiKey | Yes | - | Your Nadi API token |
| token | Yes | - | Your application token |
| storage | Yes | - | Path to Nadi logs directory |
| accept | No | application/vnd.nadi.v1+json | Accept header value |
| trackerFile | No | tracker.json | Path to tracker file |
| persistent | No | false | Keep log files after sending |
| maxTries | No | 3 | Maximum retry attempts |
| timeout | No | 1m | HTTP request timeout |
| checkInterval | No | 5s | Directory polling interval |
| filePattern | No | *.json | Glob pattern for files to process |
| deadLetterDir | No | - | Directory for failed files (dead letter queue) |
| compress | No | false | Enable gzip compression for requests |
| workers | No | 4 | Number of parallel workers |
| tlsCACert | No | - | Path to custom CA certificate |
| tlsSkipVerify | No | false | Skip TLS verification (insecure) |
| healthCheckAddr | No | - | Address for health check server (e.g., :8080) |
| metricsEnabled | No | false | Enable OpenTelemetry metrics at /metrics endpoint |

Example Configuration

nadi:
  endpoint: https://nadi.pro/api/
  accept: application/vnd.nadi.v1+json
  apiKey: your-api-key-here
  token: your-app-token-here
  storage: /var/log/nadi
  trackerFile: tracker.json
  persistent: false
  maxTries: 3
  timeout: 1m
  checkInterval: 5s
  workers: 4

Usage

Command Line Flags

| Flag | Description |
| --- | --- |
| --config | Path to configuration file (default: nadi.yaml) |
| --test | Test connection to Nadi Collector |
| --verify | Verify shipper configuration |
| --record | Start shipping logs to Nadi Collector |
| --dry-run | Process files without sending to API (use with --record) |
| --retry-failed | Reset failed files to pending status for retry |
| --version | Show version information |

Testing Connection

shipper --test

Running the Shipper

shipper --config=/path/to/nadi.yaml --record

Checking Version

shipper --version

Health Checks

When healthCheckAddr is configured, the shipper exposes HTTP endpoints for health monitoring:

| Endpoint | Description |
| --- | --- |
| GET /healthz | Liveness probe - returns 200 if service is running |
| GET /readyz | Readiness probe - returns 200 if storage is accessible |
| GET /status | Detailed status with stats, uptime, and config |
| GET /metrics | Prometheus metrics (requires metricsEnabled: true) |

Kubernetes Example

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

Status Response Example

{
  "status": "ok",
  "version": "1.0.0",
  "uptime": "2h30m15s",
  "stats": {
    "startTime": "2024-01-15T10:00:00Z",
    "filesSent": 1250,
    "filesFailed": 3,
    "filesRetried": 15,
    "lastActivity": "2024-01-15T12:30:00Z"
  },
  "config": {
    "storage": "/var/log/nadi",
    "workers": 4,
    "checkInterval": "5s",
    "filePattern": "*.json"
  }
}
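A monitoring script could decode this response and derive its own figures from it. The sketch below assumes the field names shown in the example above; the `statusResponse` type and the `failureRatio` helper are hypothetical names for illustration and are not part of the shipper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statusResponse mirrors a subset of the fields in the example /status body.
type statusResponse struct {
	Status  string `json:"status"`
	Version string `json:"version"`
	Uptime  string `json:"uptime"`
	Stats   struct {
		FilesSent    int `json:"filesSent"`
		FilesFailed  int `json:"filesFailed"`
		FilesRetried int `json:"filesRetried"`
	} `json:"stats"`
}

// failureRatio decodes a /status body and returns failed / (sent + failed).
// It returns 0 when no files have been processed yet.
func failureRatio(body []byte) (float64, error) {
	var s statusResponse
	if err := json.Unmarshal(body, &s); err != nil {
		return 0, fmt.Errorf("decode status: %w", err)
	}
	total := s.Stats.FilesSent + s.Stats.FilesFailed
	if total == 0 {
		return 0, nil
	}
	return float64(s.Stats.FilesFailed) / float64(total), nil
}

func main() {
	// In practice the body would come from an HTTP GET against /status.
	body := []byte(`{"status":"ok","stats":{"filesSent":1250,"filesFailed":3,"filesRetried":15}}`)
	ratio, err := failureRatio(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("failure ratio: %.4f\n", ratio)
}
```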

Metrics

When metricsEnabled: true is set, the shipper exposes OpenTelemetry metrics in Prometheus format at the /metrics endpoint. This requires healthCheckAddr to be configured.

Available Metrics

| Metric | Type | Description |
| --- | --- | --- |
| nadi_shipper_files_sent_total | Counter | Total files successfully sent to API |
| nadi_shipper_files_failed_total | Counter | Total files failed after max retries |
| nadi_shipper_files_retried_total | Counter | Total retry attempts |
| nadi_shipper_api_requests_total | Counter | Total API requests (labeled by status) |
| nadi_shipper_api_request_duration_seconds | Histogram | API request duration |
| nadi_shipper_pending_files | Gauge | Current number of pending files |
| nadi_shipper_uptime_seconds | Gauge | Shipper uptime in seconds |
| nadi_shipper_build_info | Gauge | Build information (labeled with version) |

Prometheus Configuration

scrape_configs:
  - job_name: 'nadi-shipper'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: /metrics

Grafana Dashboard Example

The following PromQL expressions can be used as Grafana panel queries or alert conditions:

# Alert when files are failing
rate(nadi_shipper_files_failed_total[5m]) > 0

# Alert when API latency is high
histogram_quantile(0.95, rate(nadi_shipper_api_request_duration_seconds_bucket[5m])) > 5

# Alert when retry rate is high
rate(nadi_shipper_files_retried_total[5m]) > rate(nadi_shipper_files_sent_total[5m])

Exit Codes

| Code | Description |
| --- | --- |
| 0 | Success - operation completed successfully |
| 1 | Error - configuration error, connection failure, or runtime error |

Exit code 1 is returned when:

  • Configuration file is missing or invalid
  • Required fields (apiKey, token, storage) are not set
  • Storage directory does not exist or is inaccessible
  • API endpoint URL is malformed
  • TLS/CA certificate loading fails
  • Connection test (--test) fails
  • Configuration verification (--verify) fails
  • No action flag is provided

Tracker File Format

The tracker.json file tracks the status of each log file. It is a JSON object where keys are filenames and values contain tracking information.

File Status Values

| Status | Value | Description |
| --- | --- | --- |
| Pending | 0 | File is queued for sending |
| Sent | 1 | File was successfully sent to API |
| Failed | 2 | File failed after max retries |

Tracker Entry Fields

| Field | Type | Description |
| --- | --- | --- |
| status | number | Current file status (0, 1, or 2) |
| tries | number | Number of send attempts |
| lastAttempt | string | ISO 8601 timestamp of last attempt |
| nextRetryAfter | string | ISO 8601 timestamp for next retry |

Example tracker.json

{
  "app-2024-01-15-001.json": {
    "status": 1,
    "tries": 1,
    "lastAttempt": "2024-01-15T10:30:00Z"
  },
  "app-2024-01-15-002.json": {
    "status": 0,
    "tries": 2,
    "lastAttempt": "2024-01-15T10:31:00Z",
    "nextRetryAfter": "2024-01-15T10:31:04Z"
  },
  "app-2024-01-15-003.json": {
    "status": 2,
    "tries": 3,
    "lastAttempt": "2024-01-15T10:32:00Z"
  }
}

Running as a Service

Using Supervisord

For monitoring multiple applications on a single server, use Supervisord:

[program:shipper-app1]
process_name=%(program_name)s
command=/usr/local/bin/shipper --config=/path/to/shipper/config/nadi-app1.yaml --record
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/nadi/nadi-app1.log
stopwaitsecs=3600

[program:shipper-app2]
process_name=%(program_name)s
command=/usr/local/bin/shipper --config=/path/to/shipper/config/nadi-app2.yaml --record
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/nadi/nadi-app2.log
stopwaitsecs=3600

Using systemd

Create /etc/systemd/system/nadi-shipper.service:

[Unit]
Description=Nadi Shipper
After=network.target

[Service]
Type=simple
User=nadi
ExecStart=/usr/local/bin/shipper --config=/etc/nadi/nadi.yaml --record
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl enable nadi-shipper
sudo systemctl start nadi-shipper

Reload configuration without restarting:

sudo systemctl reload nadi-shipper
# or
kill -HUP $(pidof shipper)

Troubleshooting

Common Issues

"apiKey is required" / "token is required" / "storage path is required"

Cause: Required configuration fields are missing.

Solution: Ensure your nadi.yaml contains valid values for apiKey, token, and storage.

"storage directory does not exist"

Cause: The configured storage directory doesn't exist.

Solution: Create the directory:

sudo mkdir -p /var/log/nadi
sudo chown $USER:$USER /var/log/nadi

"Endpoint is using http instead of HTTPS"

Cause: Warning that credentials may be transmitted insecurely.

Solution: Use an HTTPS endpoint in production. This is a warning only; the shipper will continue to run.

"TLS certificate verification is disabled"

Cause: tlsSkipVerify: true is set in configuration.

Solution: This is insecure for production. Use a proper CA certificate with tlsCACert instead.

"failed to read CA certificate"

Cause: The CA certificate file specified in tlsCACert cannot be read.

Solution: Verify the file path and permissions:

ls -la /path/to/ca-cert.pem

"failed to parse CA certificate"

Cause: The CA certificate file is not valid PEM format.

Solution: Verify the certificate format:

openssl x509 -in /path/to/ca-cert.pem -text -noout

"API request failed with status code: 401"

Cause: Invalid API key or token.

Solution: Verify your credentials in the Nadi app and update nadi.yaml.

"API request failed with status code: 403"

Cause: The token doesn't have permission to access the endpoint.

Solution: Check your application permissions in the Nadi app.

Files stuck in "pending" status

Cause: API is unreachable or returning errors.

Solution:

  1. Test connectivity: shipper --test
  2. Check network/firewall settings
  3. Review logs for specific error messages
  4. Check tracker.json for retry timing

Files marked as "failed" (status: 2)

Cause: File failed after maximum retry attempts.

Solution:

  1. Check the file contains valid JSON: cat file.json | jq .
  2. Review logs for the specific error
  3. To retry: reset failed files to pending with the --retry-failed flag, or delete the entry from tracker.json and restart shipper

Debug Tips

  1. Check shipper logs:

    journalctl -u nadi-shipper -f  # If using systemd
    tail -f /var/log/nadi/nadi-app1.log  # If using supervisord
  2. Verify configuration:

    shipper --config=/path/to/nadi.yaml --verify
  3. Test API connectivity:

    shipper --config=/path/to/nadi.yaml --test
  4. Inspect tracker state:

    cat tracker.json | jq .
  5. Check file permissions:

    ls -la /var/log/nadi/

Performance Tuning

  • High throughput: Increase workers (e.g., 8-16) for faster processing
  • Rate-limited API: Decrease workers (e.g., 1-2) to avoid rate limits
  • Slow network: Increase timeout (e.g., 2m or 5m)
  • Many small files: Decrease checkInterval for faster detection

License

See LICENSE for details.
