OctoLLM

OctoLLM is a high-performance LLM (Large Language Model) gateway and development framework designed for high-traffic production environments. It stands out for its flexibility, extensibility, and modular design.

OctoLLM serves two main purposes:

Standalone Gateway: A ready-to-use LLM gateway configured via a YAML file.
Development Framework: A Go-based framework for building custom LLM gateways and plugins with ease.

✨ Features & Roadmap

Implemented Features

Multi-Protocol Support: Supports OpenAI-compatible chat/completions and Claude messages interface forwarding.
Load Balancing: Configurable weighted round-robin load balancing across multiple backends.
Rule Engine: Powerful routing and logic based on expressions (e.g., checking request parameters).
Security: API Key authentication and authorization, integratable with the rule engine for granular control.
Traffic Body Rewrite: Request and response rewriting and transformation capabilities.
Extensible Design: Modular Engine interface allowing arbitrary nesting and composition of features.
Protocol Conversion: Support serving Claude messages protocol from OpenAI chat/completions backend.

Planned Features

Content Moderation: Integration with external services for content safety.
Advanced Rate Limiting: Distributed rate limiting capabilities (e.g., Redis-based).
Comprehensive Unit Tests: Expanding test coverage for stability.
Dynamic Configuration: Loading configuration from relational databases.

🔧 Getting Started

Here is an example of how to use OctoLLM Engines as the building blocks of a custom LLM gateway. If you are looking for a ready-to-use gateway, please refer to the Standalone Gateway section.

package main

import (
	"fmt"
	"net/http"
	"os"

	"github.com/infinigence/octollm/pkg/engines"
	"github.com/infinigence/octollm/pkg/engines/client"
	"github.com/infinigence/octollm/pkg/engines/converter"
	"github.com/infinigence/octollm/pkg/octollm"
)

func main() {
	mux := http.NewServeMux()

	// Create a general endpoint to access an OpenAI-compatible API
	ep := client.NewGeneralEndpoint(client.GeneralEndpointConfig{
		BaseURL: "https://cloud.infini-ai.com/maas",
		Endpoints: map[octollm.APIFormat]string{
			octollm.APIFormatChatCompletions: "/v1/chat/completions",
		},
		APIKey: os.Getenv("OCTOLLM_API_KEY"),
	})
	mux.Handle("/v1/chat/completions", octollm.ChatCompletionsHandler(ep))

	// Create a converter to convert OpenAI-compatible API to Claude messages API
	conv := converter.NewChatCompletionsToClaudeMessages(ep)
	mux.Handle("/v1/messages", octollm.MessagesHandler(conv))

	// Create a rewrite engine to force the model to use kimi-k2-instruct
	rewrite := engines.NewRewriteEngine(conv, &engines.RewritePolicy{
		SetKeys: map[string]any{"stream": true},
	}, nil, nil)
	mux.Handle("/force-stream/v1/messages", octollm.MessagesHandler(rewrite))

	// Start the server
	if err := http.ListenAndServe(":8080", mux); err != nil {
		fmt.Printf("failed to start server: %v", err)
	}
}

The complete example code is available in the examples directory.

🚀 Using the Standalone Gateway

Building the Standalone Gateway

To build the standalone gateway:

go build -o . ./cmd/...

Configuration

The standalone gateway uses a YAML configuration file (config.yaml) to define backends, models, and user access policies.

Example Configuration: See examples/config-rule.yaml for a starter template.
Detailed Documentation: Read the full Configuration Guide for in-depth explanation of all options.

Copy an example configuration file from the examples directory:

cp examples/config-minimal.yaml ./config.yaml
# Edit config.yaml and set an API key for the infini backend

Running the Standalone Gateway

./octollm-server

Using Claude Code with OpenAI-compatible Services

Here is an example of how to use the standalone gateway to serve Claude messages protocol from OpenAI chat/completions backend, so that you can use Claude CLI.

Copy the protocol conversion example config:

cp examples/config-protocol-conversion.yaml ./config.yaml
# Edit config.yaml and set an API key for the infini backend

To run the gateway:

./octollm-server

Config and run Claude CLI to use the OctoLLM gateway:

export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=xxx # any non-empty value works, since octollm auth is disabled in the config
claude --model kimi-k2-instruct # or other models defined in your config.yaml

For persistent configuration of Claude CLI, edit ~/.claude/settings.json.

🏗 Architecture

Core Design Philosophy

Unified Engine Interface: The core is built around a simple Engine interface with a single Process method. This handles both standard and streaming responses (via Go channels).
```
type Engine interface {
    Process(req *Request) (*Response, error)
}
```
Lazy Parsing: Requests and responses use lazy parsing. Content is parsed only when accessed, minimizing memory usage and CPU cycles. Unused content remains as an io.Reader, avoiding unnecessary copying.
Modularity: Engines can be nested arbitrarily. Each Engine implements a specific function (e.g., authentication, logging, routing) without needing to know the details of others.
Lightweight Core: The octollm package is minimal, containing only essential interfaces and structs. Implementations reside in the engines directory.

🔌 Development & Extensions

OctoLLM is designed to be easily extended. You can implement your own Engine to add custom logic.

Import github.com/infinigence/octollm/pkg/octollm

Implement the Engine interface:

type MyCustomEngine struct {
    Next octollm.Engine
}

func (e *MyCustomEngine) Process(req *octollm.Request) (*octollm.Response, error) {
    // Custom logic before request
    resp, err := e.Next.Process(req)
    // Custom logic after response
    return resp, err
}

Build a top-level Engine that chains your custom engine with the existing ones. And use this top-level engine to process HTTP requests.

func main() {
    // Initialize the top engine
    // topEngine := &MyCustomEngine{...}

    // Start HTTP server
    http.HandleFunc("/chat/completions", octollm.ChatCompletionsHandler(topEngine))
    http.ListenAndServe(":8080", nil)
}

Name		Name	Last commit message	Last commit date
Latest commit History 144 Commits
.github/workflows		.github/workflows
cmd		cmd
docs		docs
examples		examples
pkg		pkg
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OctoLLM

✨ Features & Roadmap

Implemented Features

Planned Features

🔧 Getting Started

🚀 Using the Standalone Gateway

Building the Standalone Gateway

Configuration

Running the Standalone Gateway

Using Claude Code with OpenAI-compatible Services

🏗 Architecture

Core Design Philosophy

🔌 Development & Extensions

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OctoLLM

✨ Features & Roadmap

Implemented Features

Planned Features

🔧 Getting Started

🚀 Using the Standalone Gateway

Building the Standalone Gateway

Configuration

Running the Standalone Gateway

Using Claude Code with OpenAI-compatible Services

🏗 Architecture

Core Design Philosophy

🔌 Development & Extensions

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages