Skip to content

distroaryan/logman

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LogMan

A self-hosted, AI-native event observability platform. Ingest arbitrary JSON events at scale, store them in ClickHouse, and query them via HTTP API or natural language through MCP — all without a predefined schema.


Architecture

Architecture Overview

Kafka Write Path

Data Flow

Client
  │
  ▼
POST /v1/ingest/:dataset    ← HTTP Ingest API (Go)
  │
  ▼
Kafka                        ← Write buffer (absorbs spikes)
  │
  ▼
Batch Worker (Go)            ← Consumer → ClickHouse writer
  │
  ▼
ClickHouse                   ← Columnar storage (MergeTree)
  │
  ▼
GET /v1/query/:dataset       ← Query API (Go)
  │
  ▼
MCP Tools (Go)               ← AI agent interface

Features

  • Schema-less ingest — Send any JSON, no pre-defined schema required
  • Kafka-buffered writes — Absorb traffic spikes, decouple ingest from storage
  • ClickHouse storage — Columnar, partitioned, fast time-range aggregations
  • Query API — Filter by dataset, time range, pagination with limit/offset
  • MCP integration — AI agents (Claude, Cursor, etc.) can query logs in natural language
  • Graceful shutdown — Drains Kafka offsets + ClickHouse buffer on SIGINT/SIGTERM

Tech Stack

Layer Technology
Language Go 1.25+
HTTP Framework Gin
Message Queue Kafka (KRaft mode, no Zookeeper)
Database ClickHouse (MergeTree)
Configuration koanf (env-based)
MCP SDK modelcontextprotocol/go-sdk
Dev Environment Docker Compose

Quick Start

Prerequisites

  • Go 1.25+
  • Docker + Docker Compose

1. Start infrastructure

make dev
# or: docker compose -f docker-compose.dev.yml up -d

This starts Kafka (single broker, KRaft mode) and ClickHouse.

2. Start the API server

go run ./cmd/api/main.go

Override port at runtime:

go run ./cmd/api/main.go -port=9090

3. Start the worker (Kafka → ClickHouse)

go run ./cmd/worker/main.go

4. Verify health

curl -s http://localhost:8080/healthz
# {"status":"ok"}

Usage

Ingest events

curl -X POST http://localhost:8080/v1/ingest/my-app \
  -H "Content-Type: application/json" \
  -d '[
    {"level": "error", "message": "DB connection failed", "service": "auth"},
    {"level": "info",  "message": "User logged in",       "service": "auth"}
  ]'

Response: 202 Accepted on success, 503 Service Unavailable if Kafka is unreachable.

Query events

curl -s "http://localhost:8080/v1/query/my-app?from=2026-01-01T00:00:00Z&limit=10"

Query parameters:

Param Type Default Description
from RFC3339 epoch Earliest time to include
to RFC3339 now Latest time to include
limit int (1–1000) 100 Max results
offset int 0 Pagination offset

API Reference

Health

GET /healthz
→ 200 {"status":"ok"}

Ingest

POST /v1/ingest/:dataset
Body: JSON array of event objects
→ 202 Accepted
→ 503 Service Unavailable (Kafka unavailable)

Query

GET /v1/query/:dataset
Params: from, to, limit, offset
→ 200 JSON array of enriched events

MCP

POST /mcp
Body: MCP JSON-RPC message

The /mcp endpoint serves the Model Context Protocol over streamable HTTP transport. Connect any MCP-compatible client (Claude Desktop, Cursor, etc.) to query logs using natural language.

Available tools:

Tool Description
query_logs Query logs by dataset with optional time range, limit, offset
list_datasets List all unique dataset names

Project Structure

logman/
├── cmd/
│   ├── api/main.go           # HTTP API server entrypoint
│   └── worker/main.go        # Kafka consumer + ClickHouse writer
├── internal/
│   ├── clickhouse/            # ClickHouse writer, batch processor, query
│   ├── config/                # Config loading (koanf, env-based)
│   ├── domain/                # Core domain types (LogEvent, EnrichedEvent)
│   ├── handler/               # HTTP handlers (ingest, query) + MCP tools
│   ├── kafkaclient/           # Kafka consumer + publisher
│   ├── migrations/            # ClickHouse schema migrations
│   └── server/                # Gin HTTP server setup + graceful shutdown
├── assets/
│   ├── architecture.png       # System architecture diagram
│   └── kafka_architecture.png # Kafka write path diagram
├── docker-compose.dev.yml     # Local infrastructure (Kafka + ClickHouse)
├── Makefile                   # Dev workflow shortcuts
└── .env                       # Environment variables

Configuration

All configuration is loaded from environment variables prefixed with APP_. A .env file at the project root is loaded automatically.

Variable Default Description
APP_PORT 8080 HTTP server port
APP_KAFKA_BROKERS localhost:9092 Kafka broker address
APP_KAFKA_TOPIC logs Kafka topic
APP_KAFKA_GROUP_ID logflow-consumer Consumer group ID
APP_CLICKHOUSE_HOST localhost ClickHouse host
APP_CLICKHOUSE_PORT 9000 ClickHouse native port
APP_CLICKHOUSE_DATABASE logman ClickHouse database
APP_CLICKHOUSE_USER default ClickHouse username
APP_CLICKHOUSE_TABLE logs ClickHouse table

Design Decisions

Why Kafka instead of writing directly to ClickHouse? ClickHouse performs best with large batch inserts. Kafka absorbs ingest spikes and lets the worker control insert cadence — prevents write amplification under load.

Why MergeTree and not ReplacingMergeTree? Events are append-only. MergeTree is the fastest and simplest choice. Deduplication would add overhead with no benefit.

Why schema-on-write? Accepting any JSON without a pre-defined schema is the core value prop. Events are stored with their original JSON payload for maximum flexibility.

Why MCP? The Model Context Protocol lets AI agents (Claude, Cursor, GitHub Copilot, etc.) query logs using natural language. This turns LogMan into an AI-native observability tool.

About

an observability and monitoring system build for llm agents,

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors