Arbor - Distributed AI Agent Orchestration System

⚠️ Alpha Software: This project is in alpha (v0.x.x). The API is unstable and may change significantly before the v1.0 release. See PROJECT_STATUS.md for the current implementation status.

Arbor is an ambitious distributed AI agent orchestration system built on the rock-solid foundation of Elixir/OTP. Designed from the ground up for fault-tolerance, scalability, and security, Arbor aims to enable sophisticated multi-agent AI workflows with enterprise-grade reliability.

Current State: The core infrastructure is implemented, including distributed agent supervision, capability-based security, and event sourcing. The CLI and AI integrations are still under development.

🌟 Key Features

🏗️ Distributed Architecture

  • BEAM VM Foundation: Built on Erlang/OTP for legendary fault-tolerance and concurrency
  • Umbrella Project: Modular design with clear separation of concerns
  • Horde Integration: Dynamic process distribution across cluster nodes
  • Defensive Programming: "Let it crash" philosophy with comprehensive supervision trees

🤖 AI Agent Orchestration

  • Multi-Agent Coordination: Orchestrate diverse AI agents with different capabilities
  • Dynamic Agent Spawning: Create and manage agents based on workload demands
  • Inter-Agent Communication: Robust message passing with trace correlation
  • Task Delegation: Intelligent work distribution across agent types

🔒 Capability-Based Security

  • Fine-Grained Permissions: Resource-specific access controls with time-based expiration
  • Zero-Trust Architecture: Every operation requires explicit capability grants
  • Audit Trail: Complete security event logging for compliance
  • Principle of Least Privilege: Minimal permission grants with automatic revocation

💾 State Persistence & Recovery

  • Event Sourcing: Immutable event streams for complete state reconstruction
  • CQRS Pattern: Optimized read/write models for performance
  • Automatic Recovery: Self-healing systems with state restoration
  • Distributed State: Consistent state management across cluster nodes

📊 Production-Ready Observability

  • Three Pillars: Comprehensive metrics, structured logs, and distributed traces
  • OpenTelemetry Integration: Industry-standard telemetry and tracing
  • Real-Time Monitoring: Prometheus metrics with Grafana dashboards
  • Performance Analytics: Detailed insights into agent behavior and system health

🚀 Quick Start

Prerequisites

  • Elixir 1.15.7+ and OTP 26.1+
  • Git for version control
  • Docker & Docker Compose (optional, for observability stack)
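One way to confirm that an installed toolchain meets these minimums is to compare version strings before running setup. The sketch below is not part of the repository: `version_ge` is a hypothetical helper, and it assumes a `sort` that supports `-V` (GNU coreutils and recent BSD/macOS versions). The installed version shown is a placeholder.

```shell
# Hypothetical helper (not part of the Arbor scripts): succeeds when
# version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare an installed version against the documented minimum.
# The value below is a placeholder; substitute the one reported by
# `elixir --version` on your machine.
installed_elixir="1.16.2"
if version_ge "$installed_elixir" "1.15.7"; then
  echo "Elixir $installed_elixir meets the 1.15.7 minimum"
else
  echo "Elixir $installed_elixir is older than 1.15.7"
fi
```

The same helper can be reused for the OTP minimum, for example `version_ge "$installed_otp" "26.1"`.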

Installation

# Clone the repository
git clone https://github.com/azmaveth/arbor.git
cd arbor

# Run one-time setup (installs dependencies, builds PLT files)
./scripts/setup.sh

# Start development server with distributed node capabilities
./scripts/dev.sh

Development Workflow

# Run comprehensive test suite
./scripts/test.sh

# Quick feedback loop (skip slow checks)
./scripts/test.sh --fast

# Generate coverage report
./scripts/test.sh --coverage

# Connect to running development node (in another terminal)
./scripts/console.sh

# Performance benchmarks
./scripts/benchmark.sh

# Run distributed tests (multi-node cluster tests)
./scripts/test-distributed.sh

Testing Infrastructure

Arbor uses a hybrid testing approach that balances fast feedback with comprehensive distributed system verification:

  • Single-node tests (default): Fast unit and integration tests that run in isolation
  • Distributed tests (@tag :distributed): Multi-node tests that verify cluster behavior
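The two tiers can be selected with the tag filters that `mix test` provides (`--only` and `--exclude`). The sketch below is an assumption about how the tiers might be invoked directly; `test_cmd_for` is a hypothetical helper, and the project's own scripts may configure the filtering differently.

```shell
# Hypothetical wrapper (not the project's scripts/test.sh): maps a tier
# name to a `mix test` invocation using ExUnit tag filters.
test_cmd_for() {
  case "$1" in
    single)      echo "mix test --exclude distributed" ;;
    distributed) echo "mix test --only distributed" ;;
    *)           echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Print the command for each tier.
# To run one, use for example: eval "$(test_cmd_for single)"
test_cmd_for single
test_cmd_for distributed
```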

Distributed Testing Capabilities

The distributed test suite validates critical distributed system behaviors:

  • CRDT Synchronization: Ensures distributed data structures converge correctly across nodes
  • Failover Scenarios: Verifies agents migrate properly when nodes crash
  • Race Condition Handling: Tests concurrent operations maintain consistency
  • Split-Brain Recovery: Validates cluster healing after network partitions

Test helpers in test/support/ provide utilities for:

  • Multi-node cluster orchestration
  • Network partition simulation
  • Cascading failure scenarios
  • CRDT convergence verification
  • Race condition detection

Using Docker

# Development environment with full observability stack
docker-compose up -d

# Access services:
# - Arbor: http://localhost:4000
# - Grafana: http://localhost:3000 (admin/admin)
# - Prometheus: http://localhost:9090
# - Jaeger: http://localhost:16686

# Build production image
docker build -t arbor:latest .
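After bringing the stack up, it can help to confirm that each service listed above is answering. The following is a minimal sketch, not a script shipped with the project: the function names are hypothetical, `curl` is assumed to be installed, and any HTTP response (including an error page) is treated as "up".

```shell
# Endpoints as listed above (name and URL pairs).
arbor_services() {
  cat <<'EOF'
Arbor http://localhost:4000
Grafana http://localhost:3000
Prometheus http://localhost:9090
Jaeger http://localhost:16686
EOF
}

# Hypothetical helper: report which endpoints answer an HTTP request
# within two seconds.
check_arbor_services() {
  arbor_services | while read -r name url; do
    if curl -s -o /dev/null --max-time 2 "$url"; then
      echo "$name up ($url)"
    else
      echo "$name DOWN ($url)"
    fi
  done
}

# Usage, after `docker-compose up -d`:
#   check_arbor_services
```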

🏛️ Architecture Overview

Arbor follows a contracts-first, defensive architecture with clear dependency boundaries:

┌──────────────────────────────────────────────────────────────┐
│                         Arbor System                         │
│                                                              │
│  ┌──────────────┐  ┌────────────────┐  ┌──────────────────┐  │
│  │ Arbor Core   │  │     Arbor      │  │      Arbor       │  │
│  │              │◄─┤    Security    │◄─┤   Persistence    │  │
│  │ • Agents     │  │                │  │                  │  │
│  │ • Tasks      │  │ • Capabilities │  │ • Event Store    │  │
│  │ • Sessions   │  │ • Audit        │  │ • State Mgmt     │  │
│  └──────────────┘  └────────────────┘  └──────────────────┘  │
│         │                   │                   │            │
│         └───────────────────┼───────────────────┘            │
│                             │                                │
│                      ┌──────────────┐                        │
│                      │    Arbor     │                        │
│                      │  Contracts   │                        │
│                      │              │                        │
│                      │ • Schemas    │                        │
│                      │ • Types      │                        │
│                      │ • Protocols  │                        │
│                      └──────────────┘                        │
└──────────────────────────────────────────────────────────────┘

📦 Project Structure

arbor/
├── apps/                           # Umbrella applications
│   ├── arbor_contracts/            # 🔗 Zero-dependency contracts
│   │   ├── lib/arbor/contracts/    # Schema definitions and types
│   │   └── test/                   # Contract validation tests
│   ├── arbor_security/             # 🛡️ Capability-based security
│   │   ├── lib/arbor/security/     # Authentication & authorization
│   │   └── test/                   # Security validation tests
│   ├── arbor_persistence/          # 💾 State management
│   │   ├── lib/arbor/persistence/  # Event sourcing & CQRS
│   │   └── test/                   # Persistence tests
│   └── arbor_core/                 # 🧠 Core business logic
│       ├── lib/arbor/core/         # Agent orchestration
│       └── test/                   # Integration tests
├── config/                         # Configuration files
│   ├── config.exs                  # Base configuration
│   ├── dev.exs                     # Development settings
│   ├── test.exs                    # Test environment
│   └── prod.exs                    # Production settings
├── scripts/                        # 🛠️ Development automation
│   ├── setup.sh                    # One-time project setup
│   ├── dev.sh                      # Development server
│   ├── test.sh                     # Test suite runner
│   ├── console.sh                  # Remote console connection
│   ├── release.sh                  # Production builds
│   └── benchmark.sh                # Performance testing
├── .github/                        # 🔄 CI/CD workflows
│   ├── workflows/                  # GitHub Actions
│   │   ├── ci.yml                  # Continuous integration
│   │   ├── nightly.yml             # Comprehensive testing
│   │   └── release.yml             # Automated releases
│   └── README.md                   # CI/CD documentation
├── observability/                  # 📊 Monitoring configuration
│   ├── prometheus.yml              # Metrics collection
│   ├── grafana/                    # Dashboard definitions
│   └── postgres/                   # Database initialization
├── docs/                           # 📚 Documentation
│   ├── development.md              # Development guide
│   └── architecture.md             # System design
├── Dockerfile                      # 🐳 Container definition
├── docker-compose.yml              # Development environment
└── README.md                       # This file

🧠 Core Concepts

Agents

Autonomous AI entities with specific capabilities and responsibilities. Each agent:

  • Runs in its own supervised process
  • Has a unique identity and capability set
  • Can communicate with other agents via message passing
  • Maintains its own state and execution context

Capabilities

Granular permissions that agents must acquire to access resources:

  • Resource-Specific: Access to files, APIs, databases, etc.
  • Time-Limited: Automatic expiration for security
  • Auditable: Complete grant/revoke/usage logging
  • Hierarchical: Capabilities can delegate sub-capabilities

Sessions

Multi-agent coordination contexts that manage:

  • Agent lifecycle and task distribution
  • Shared context and memory
  • Resource allocation and cleanup
  • Performance monitoring and optimization

🔧 Configuration

Environment Variables

# Development
export MIX_ENV=dev
export ARBOR_NODE_NAME=arbor@localhost
export ARBOR_COOKIE=arbor_dev_cookie

# Observability
export PROMETHEUS_ENDPOINT=http://localhost:9090
export JAEGER_ENDPOINT=http://localhost:14250
export GRAFANA_ENDPOINT=http://localhost:3000

# Security
export ARBOR_SECRET_KEY_BASE="your-secret-key-base"
export ARBOR_CAPABILITY_ENCRYPTION_KEY="your-encryption-key"
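A small pre-flight check can catch unset variables before a node starts. The sketch below is illustrative only: `require_env` is a hypothetical helper, and which of the variables above Arbor actually requires at startup is an assumption.

```shell
# Hypothetical pre-flight check (not part of the Arbor scripts): prints
# each unset or empty variable and returns non-zero if any is missing.
# Arguments must be plain variable names, since they are passed to eval.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check the variables from this section before starting a node.
#   require_env ARBOR_NODE_NAME ARBOR_COOKIE ARBOR_SECRET_KEY_BASE \
#     ARBOR_CAPABILITY_ENCRYPTION_KEY || echo "set the variables above first"
```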

Configuration Files

  • config/config.exs - Base configuration
  • config/dev.exs - Development overrides
  • config/prod.exs - Production settings
  • coveralls.json - Test coverage thresholds

🧪 Testing Strategy

Test Categories

  • Unit Tests: Individual module testing with mocks
  • Integration Tests: Component interaction testing
  • Property-Based Tests: Comprehensive input validation
  • Performance Tests: Benchmarking and load testing

Quality Gates

  • Coverage: ≥80% test coverage across all apps
  • Static Analysis: Credo compliance with strict checks
  • Type Safety: Dialyzer verification with success typing
  • Security: Dependency vulnerability scanning

Running Tests

# Full test suite with coverage
./scripts/test.sh --coverage

# Quick feedback loop
./scripts/test.sh --fast

# Specific test files
mix test test/arbor/core/agent_test.exs

# Property-based tests only
mix test --only property

# Integration tests
mix test --only integration

🚀 Deployment

Production Build

# Build optimized release
./scripts/release.sh

# Build with specific version
./scripts/release.sh --version 1.0.0

# Skip tests for faster builds
./scripts/release.sh --skip-tests

Container Deployment

# Build production image
docker build -t arbor:v1.0.0 .

# Run with clustering
docker run -d \
  --name arbor-node-1 \
  -p 4000:4000 \
  -e NODE_NAME=arbor@node1.cluster.local \
  -e ERLANG_COOKIE=secure_cluster_cookie \
  arbor:v1.0.0

Kubernetes Deployment

See docs/deployment/kubernetes.md for detailed Kubernetes configuration.

📊 Monitoring & Observability

Built-in Metrics

  • Agent Lifecycle: Spawn/termination rates, lifetime distributions
  • Performance: Operation latency, throughput, error rates
  • Security: Capability grants/revokes, security violations
  • System Health: Memory usage, process counts, cluster status

Dashboards

Pre-configured Grafana dashboards for:

  • System Overview: Cluster health, resource utilization
  • Agent Performance: Operation metrics, communication patterns
  • Security Monitoring: Capability usage, audit events
  • Distributed Tracing: Request flows across services

Alerting

Production-ready alerts for:

  • Critical: Service outages, security breaches
  • High: Performance degradation, high error rates
  • Medium: Resource constraints, operational issues

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for:

  • Code of Conduct: Community standards and expectations
  • Development Setup: Detailed environment configuration
  • Pull Request Process: Code review and merge requirements
  • Issue Templates: Bug reports and feature requests

Development Process

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes following our conventions
  4. Test thoroughly (./scripts/test.sh)
  5. Commit using conventional commits
  6. Push to your branch (git push origin feature/amazing-feature)
  7. Open a Pull Request
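For step 5, a commit subject can be checked against the Conventional Commits shape `type(scope)!: description` before pushing. This is a sketch rather than a hook the project ships: `is_conventional_commit` is a hypothetical name, and the list of accepted types is an assumption that should be adjusted to the project's conventions.

```shell
# Hypothetical check (not a project hook): succeeds when a commit subject
# looks like "type(scope)!: description". Scope and "!" are optional.
# The list of types below is an assumption.
is_conventional_commit() {
  printf '%s\n' "$1" |
    grep -Eq '^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9_-]+\))?!?: .+'
}

# Example subjects (both invented for illustration).
for subject in "feat(security): add capability expiry" "Fixed stuff"; do
  if is_conventional_commit "$subject"; then
    echo "ok:     $subject"
  else
    echo "reject: $subject"
  fi
done
```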

📚 Documentation

Architecture & Design

API Documentation

Generate comprehensive API docs:

mix docs
open doc/index.html

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Elixir/OTP Community for the incredible platform
  • BEAM Ecosystem for fault-tolerant distributed systems
  • OpenTelemetry Project for observability standards
  • All Contributors who make this project possible

📞 Support


Built with ❤️ using Elixir/OTP - The best platform for distributed, fault-tolerant systems
