OctoLLM is a high-performance LLM (Large Language Model) gateway and development framework designed for high-traffic production environments. It stands out for its flexibility, extensibility, and modular design.
OctoLLM serves two main purposes:
- Standalone Gateway: A ready-to-use LLM gateway configured via a YAML file.
- Development Framework: A Go-based framework for building custom LLM gateways and plugins with ease.
- Multi-Protocol Support: Supports OpenAI-compatible
chat/completionsand Claudemessagesinterface forwarding. - Load Balancing: Configurable weighted round-robin load balancing across multiple backends.
- Rule Engine: Powerful routing and logic based on expressions (e.g., checking request parameters).
- Security: API Key authentication and authorization, integratable with the rule engine for granular control.
- Traffic Body Rewrite: Request and response rewriting and transformation capabilities.
- Extensible Design: Modular
Engineinterface allowing arbitrary nesting and composition of features. - Protocol Conversion: Support serving Claude
messagesprotocol from OpenAIchat/completionsbackend.
- Content Moderation: Integration with external services for content safety.
- Advanced Rate Limiting: Distributed rate limiting capabilities (e.g., Redis-based).
- Comprehensive Unit Tests: Expanding test coverage for stability.
- Dynamic Configuration: Loading configuration from relational databases.
Here is an example of how to use OctoLLM Engines as the building blocks of a custom LLM gateway. If you are looking for a ready-to-use gateway, please refer to the Standalone Gateway section.
package main
import (
"fmt"
"net/http"
"os"
"github.com/infinigence/octollm/pkg/engines"
"github.com/infinigence/octollm/pkg/engines/client"
"github.com/infinigence/octollm/pkg/engines/converter"
"github.com/infinigence/octollm/pkg/octollm"
)
func main() {
mux := http.NewServeMux()
// Create a general endpoint to access an OpenAI-compatible API
ep := client.NewGeneralEndpoint(client.GeneralEndpointConfig{
BaseURL: "https://cloud.infini-ai.com/maas",
Endpoints: map[octollm.APIFormat]string{
octollm.APIFormatChatCompletions: "/v1/chat/completions",
},
APIKey: os.Getenv("OCTOLLM_API_KEY"),
})
mux.Handle("/v1/chat/completions", octollm.ChatCompletionsHandler(ep))
// Create a converter to convert OpenAI-compatible API to Claude messages API
conv := converter.NewChatCompletionsToClaudeMessages(ep)
mux.Handle("/v1/messages", octollm.MessagesHandler(conv))
// Create a rewrite engine to force the model to use kimi-k2-instruct
rewrite := engines.NewRewriteEngine(conv, &engines.RewritePolicy{
SetKeys: map[string]any{"stream": true},
}, nil, nil)
mux.Handle("/force-stream/v1/messages", octollm.MessagesHandler(rewrite))
// Start the server
if err := http.ListenAndServe(":8080", mux); err != nil {
fmt.Printf("failed to start server: %v", err)
}
}The complete example code is available in the examples directory.
To build the standalone gateway:
go build -o . ./cmd/...The standalone gateway uses a YAML configuration file (config.yaml) to define backends, models, and user access policies.
- Example Configuration: See examples/config-rule.yaml for a starter template.
- Detailed Documentation: Read the full Configuration Guide for in-depth explanation of all options.
Copy an example configuration file from the examples directory:
cp examples/config-minimal.yaml ./config.yaml
# Edit config.yaml and set an API key for the infini backend./octollm-serverHere is an example of how to use the standalone gateway to serve Claude messages protocol from OpenAI chat/completions backend, so that you can use Claude CLI.
Copy the protocol conversion example config:
cp examples/config-protocol-conversion.yaml ./config.yaml
# Edit config.yaml and set an API key for the infini backendTo run the gateway:
./octollm-serverConfig and run Claude CLI to use the OctoLLM gateway:
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=xxx # any non-empty value works, since octollm auth is disabled in the config
claude --model kimi-k2-instruct # or other models defined in your config.yamlFor persistent configuration of Claude CLI, edit ~/.claude/settings.json.
- Unified Engine Interface: The core is built around a simple
Engineinterface with a singleProcessmethod. This handles both standard and streaming responses (via Go channels).type Engine interface { Process(req *Request) (*Response, error) }
- Lazy Parsing: Requests and responses use lazy parsing. Content is parsed only when accessed, minimizing memory usage and CPU cycles. Unused content remains as an
io.Reader, avoiding unnecessary copying. - Modularity: Engines can be nested arbitrarily. Each Engine implements a specific function (e.g., authentication, logging, routing) without needing to know the details of others.
- Lightweight Core: The
octollmpackage is minimal, containing only essential interfaces and structs. Implementations reside in theenginesdirectory.
OctoLLM is designed to be easily extended. You can implement your own Engine to add custom logic.
- Import
github.com/infinigence/octollm/pkg/octollm - Implement the
Engineinterface:type MyCustomEngine struct { Next octollm.Engine } func (e *MyCustomEngine) Process(req *octollm.Request) (*octollm.Response, error) { // Custom logic before request resp, err := e.Next.Process(req) // Custom logic after response return resp, err }
- Build a top-level
Enginethat chains your custom engine with the existing ones. And use this top-level engine to process HTTP requests.func main() { // Initialize the top engine // topEngine := &MyCustomEngine{...} // Start HTTP server http.HandleFunc("/chat/completions", octollm.ChatCompletionsHandler(topEngine)) http.ListenAndServe(":8080", nil) }