busthorne/avtivka

Avtivka

Avtivka implements interruptible action spaces, called cables, on bare-metal hardware. It serves primarily as the workplace environment for intelligent agents. It is similar to FaaS in spirit, but unlike AWS Lambda or Cloud Functions, cables can live forever and bootstrap themselves into whole networks. Avtivka is designed around Cable semantics: a persistent, interruptible computational environment that bridges the gap between machine intuition and concrete action. Agents live only when they need to: sleeping on NVMe, waking in milliseconds, and communicating over a pure IPv6, zero-trust fabric.

The name comes from the Ukrainian word автівка—a passenger car, a vehicle.

The Story

For years, we have treated AI agents as "dreamers in a vacuum." We give them a chat window and expect them to perform complex work. But an agent is not merely a chat participant—it is a computer program in a network. The Clownbot saga proved that even in the mainstream, people already distinguish bots from agents. The labs keep shipping models with new capabilities, and intuition is getting better with every generation, but the instrumentation of agents—the harness, the action space—remains an open problem for everyone.

The standard tools fail us because they inherit the bloat of the cloud. When an agent needs to "act"—to compile code, query a database, spin up a sub-agent—we shouldn't be waiting for a cloud-native scheduler to decide where to place a container. We shouldn't be paying for cold-start latencies measured in seconds. We shouldn't be thinking about context window limits, about whether our state fits in a chat thread.

Avtivka is the clean room implementation of a different vision. One where:

  • Persistence is the default, not the exception. Your cables live forever on NVMe; you can always return to them, to their full context, to the algorithms and data structures they've built.
  • Interruption is cheap, not catastrophic. Snapshotting and restoring a cable is a storage operation, not a re-initialization.
  • The hardware is yours, and you know exactly what it can do. We don't abstract away the metal—we optimize for it.

This is considerably more complicated than an OpenAI API wrapper. It is also the absolute minimum if you want to save 50% on inference for the general case, or 90% on context encoding for the 10% most demanding agents.


Cable Semantics

Intuition Space and Action Space

On the most basic level, artificial intelligence is always the transition from ruminations—machine intuition, the LLM—to concrete actions, where the agent turns out to be not just a "dreamer in a vacuum" but a computer program in a network.

Cable semantics formalizes this split. We talk about two spaces:

  • Intuition Space (Inference): The LLM backend. This is where the thinking happens—where Gemini, Claude, or a self-hosted model generates the next action. We talk about cables instead of talking about the peculiarities of Gemini in Google Cloud, or paged attention in vLLM. We talk about cables instead of talking about "threads" with "a helpful assistant" in a chat window, because we don't want to think about context window lengths, about whether everything fits, about the economics of thinking tokens, and we certainly don't want to think about where to cache and batch and where not to.
  • Action Space (Execution): The long-lived computational environment—the sandbox—where the agent can write, build, and run programs. This is where Avtivka lives.

A cable connects these two spaces. It abstracts the LLM backoffice on one side, and the sandboxed execution environment on the other. The cable doesn't care whether intuitions come from Google Vertex APIs or from a model running on a Mac Studio on 10G copper.

What Is a Cable?

A cable is a persistent, interruptible action space driven from the intuition space, which is eventually powered by an LLM. Cables can do many things, but they should expect to only ever live during inference time, or whenever they're hit over the network. In this respect, they are similar to FaaS functions.

More precisely:

  • Persistence: A cable exists on disk as a Firecracker snapshot—a frozen VM image on NVMe. It retains its full state: processes, filesystem, open file descriptors, network connections, in-progress computations. When the cable is idle, it costs nothing but storage.
  • Wake-on-Inference: The moment an intuition arrives (a code block from the LLM, an MCP request from another cable, an HTTP request from the overlay), the cable is restored from NVMe to RAM. Firecracker's on-demand paging ensures the CPU begins executing before the full memory footprint is loaded.
  • Interruptibility: After completing its work—or at any point during execution—the cable can be checkpointed back to NVMe. The next wake cycle continues exactly where it left off.
  • Service Identity: Semantically, each cable can act as an MCP server to whatever downstream consumer; this could be another cable, or any compliant MCP client. If Avtivka decides to route it a public prefix, the cable becomes addressable from the wider internet.
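
The lifecycle described above can be read as a small state machine. The following is a minimal sketch, not Avtivka's implementation: the state names RUNNING, SLEEPING, and STOPPED are the ones the XDP dataplane tracks later in this document, while the event names (`wake`, `checkpoint`, `stop`) are hypothetical labels chosen for illustration.

```python
from enum import Enum


class CableState(Enum):
    RUNNING = "running"
    SLEEPING = "sleeping"
    STOPPED = "stopped"


# Allowed transitions. Event names are hypothetical, not part of any Avtivka API.
TRANSITIONS = {
    (CableState.SLEEPING, "wake"): CableState.RUNNING,        # an intuition or packet arrives
    (CableState.RUNNING, "checkpoint"): CableState.SLEEPING,  # snapshot written back to NVMe
    (CableState.RUNNING, "stop"): CableState.STOPPED,
    (CableState.SLEEPING, "stop"): CableState.STOPPED,
}


def step(state: CableState, event: str) -> CableState:
    """Return the next state, or raise if the event is not valid in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid while {state.name}") from None


if __name__ == "__main__":
    state = CableState.SLEEPING
    for event in ("wake", "checkpoint", "wake"):
        state = step(state, event)
    print(state.name)
```

Keeping the transitions in one table makes it easy to reject impossible sequences, such as checkpointing a cable that is already asleep.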

The CodeAct Paradigm

Avtivka is designed to implement the CodeAct paradigm. The key insight from the original CodeAct paper (by researchers at UIUC and Apple) is deceptively simple: we have been training models to write perfectly good code, so why make them communicate through JSON function schemas? Instead, let them write code directly.

When we create an agent, we create a container with a Pythonic interface and a hierarchical filesystem, and this becomes the root process. The API providers chose to split function calling and the overarching code interpreter into two distinct interfaces, and traditional function calling pollutes the context with JSON. But if your runtime can translate an OpenAPI function schema into a Python function, then your agents can effectively bootstrap novel tools from it.

graph TD
    subgraph "Intuition Space"
        LLM[LLM Backend<br/>Vertex AI / Anthropic / Self-hosted]
    end

    subgraph "Avtivka Host"
        Supervisor[Avtivka Supervisor]
        XDP[XDP Dataplane]
    end

    subgraph "Cable (Firecracker microVM)"
        Docker[Docker + gVisor runtime]
        Agent[Agent Root Process<br/>Python Environment]
        Tool1[Synthesized Tool: git]
        Tool2[Synthesized Tool: psql]
        Sub[Sub-Agent Process]
    end

    LLM -- "Code Block" --> Supervisor
    Supervisor -- "Wake / Route" --> XDP
    XDP -- "Packet" --> Docker
    Agent --> Tool1
    Agent --> Tool2
    Agent -- "fork / exec" --> Sub
    Agent -- "Result" --> Supervisor
    Supervisor -- "Completion" --> LLM

Toolbelt Time

Before giving an agent its assignment, Avtivka performs what we call Toolbelt Time: a preparation phase where we use the LLM itself to synthesize idiomatic Python interfaces for all available tools, MCP clients, and API endpoints relevant to the task. This includes:

  • Code-generating a Pythonic API from MCP server schemas.
  • Adding tests, debug harnesses, and type hints where possible.
  • Bundling the synthesized toolbelt into the cable's golden snapshot so it's available instantly on wake.

The result is that the agent doesn't waste inference tokens figuring out how to call a tool. It has a clean, tested, idiomatic interface ready from the first instruction.
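
As a rough illustration of the first bullet above, the sketch below renders a Python wrapper from one MCP tool description. It assumes the tool is given as a dictionary with `name`, `description`, and `inputSchema` keys. The `call_tool` transport is a hypothetical callable supplied by the caller, and the whole thing is a deterministic template, whereas Toolbelt Time as described here would use the LLM to produce richer, tested interfaces.

```python
import keyword
from typing import Optional

# Maps JSON Schema primitive types to Python annotations.
JSON_TO_PY = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}

EXAMPLE_TOOL = {
    "name": "read_file",
    "description": "Read a file from the cable filesystem.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}},
        "required": ["path"],
    },
}


def synthesize_wrapper(tool: dict) -> str:
    """Render Python source for a function that wraps one tool description."""
    name = tool["name"]
    schema = tool.get("inputSchema", {})
    props = schema.get("properties", {})
    required = set(schema.get("required", []))
    for ident in (name, *props):
        if not ident.isidentifier() or keyword.iskeyword(ident):
            raise ValueError(f"{ident!r} is not usable as a Python identifier")
    params = []
    # Required parameters first, so the generated signature is valid Python.
    for p in sorted(props, key=lambda p: p not in required):
        py = JSON_TO_PY.get(props[p].get("type"), "object")
        params.append(f"{p}: {py}" if p in required else f"{p}: Optional[{py}] = None")
    arg_items = ", ".join(f"{p!r}: {p}" for p in props)
    lines = [
        f"def {name}({', '.join(params)}):",
        f"    {tool.get('description', '')!r}",
        "    args = {" + arg_items + "}",
        "    args = {k: v for k, v in args.items() if v is not None}",
        f"    return call_tool({name!r}, args)",
    ]
    return "\n".join(lines) + "\n"


def load_wrapper(tool: dict, call_tool):
    """Compile the generated source; call_tool is a hypothetical transport."""
    namespace = {"Optional": Optional, "call_tool": call_tool}
    exec(synthesize_wrapper(tool), namespace)
    return namespace[tool["name"]]


if __name__ == "__main__":
    print(synthesize_wrapper(EXAMPLE_TOOL))
```

With a wrapper like this in the snapshot, the agent can call `read_file("/etc/hosts")` directly instead of composing a JSON payload on every turn.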

TODO: Define the exact Toolbelt Time protocol. How much of this is done at cable creation vs. dynamically during execution? Should the toolbelt be versioned and cached across cable lifetimes? How does this interact with MCP's tools/list capability negotiation?

Sub-Agents as Processes

When we talk about sub-agents, we mean processes that the primary agent can code or conceptualize through the same Pythonic interface. A cable is a microVM with a full process tree—the agent is PID 1, and any sub-agent it spawns is just a child process.

This design has several advantages:

  • No orchestration overhead: Spawning a sub-agent is as cheap as fork().
  • Shared filesystem: Sub-agents inherit the cable's filesystem, including the toolbelt.
  • Natural supervision: The agent can wait, signal, and collect results from sub-agents using standard POSIX semantics.
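
To make the "sub-agent is just a child process" idea concrete, here is a minimal sketch using only the Python standard library. The child program is a trivial stand-in for a sub-agent (it squares a number), and the function name `run_sub_agents` is made up for this example.

```python
import subprocess
import sys

# Stand-in for a sub-agent: a child Python process that squares its argument.
CHILD_SOURCE = "import sys; print(int(sys.argv[1]) ** 2)"


def run_sub_agents(items):
    """Spawn one child process per item, then wait for each and collect its output."""
    children = [
        subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE, str(item)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for item in items
    ]
    results = []
    for child in children:
        out, _ = child.communicate()  # blocks until the child exits
        if child.returncode != 0:
            raise RuntimeError(f"sub-agent exited with status {child.returncode}")
        results.append(int(out))
    return results


if __name__ == "__main__":
    print(run_sub_agents([2, 3, 4]))
```

All children are started before any is waited on, so they run concurrently and the parent collects results in submission order.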

TODO: Explore the boundary between "sub-agent as a process" and "sub-agent as a separate cable." When should an agent spawn a child process vs. request Avtivka to create a new cable? The latter gives stronger isolation and independent lifecycle, but at higher latency cost.

MCP and A2A Service Identity

Every cable is, by default, an MCP (Model Context Protocol) server. This is a deliberate architectural choice: it means that cable-to-cable communication and cable-to-external-client communication use the same protocol, the same discovery mechanism, and the same access policies.

In recent months, the pairing of MCP for function discovery and A2A for agent-to-agent communication has emerged as the de facto standard for tool use over the network. Avtivka embraces both:

  • MCP: For tool discovery and invocation. A cable exposes its capabilities (file operations, shell commands, database queries) as MCP tools.
  • A2A: For structured, multi-turn agent-to-agent conversations. Cable A can delegate a complex task to Cable B, receive progress updates, and collect results.

TODO: Define the exact relationship between MCP server/client roles in the cable lifecycle. Should every cable expose a fixed set of "intrinsic" tools (filesystem, shell, process management) alongside task-specific tools from Toolbelt Time? How does A2A session management interact with cable sleep/wake cycles?


Architecture: Nested Sandboxing

Why Both Firecracker and gVisor?

This is the key architectural decision that sets Avtivka apart. We use both Firecracker and gVisor, not either/or:

  • Firecracker gives us VM-level isolation, IOMMU passthrough for NVMe and NIC VFs, and—critically—instant snapshot/restore via userfaultfd. When a cable sleeps, Firecracker serializes the entire VM state to NVMe. When it wakes, Firecracker restores the VM with on-demand paging: the CPU starts executing before the full memory footprint is loaded from disk.

  • gVisor gives us application-level sandboxing inside the VM. The microVM runs a lightweight Linux image with Docker configured to use runsc (gVisor's OCI runtime). This means the agent gets a standard, familiar Docker API—it can docker pull any compatible container from any allowed registry—but every syscall is intercepted by gVisor's user-space kernel.

The result is a double hull:

graph TB
    subgraph "Host OS (Debian)"
        KVM[Linux KVM]
        IOMMU[IOMMU / VT-d]
        NVMe[NVMe Namespace]
        VF[25G VF via SR-IOV]
    end

    subgraph "Firecracker microVM"
        Guest[Lightweight Guest Linux]
        DockerD[Docker Daemon]
        subgraph "gVisor Sandbox"
            Sentry[gVisor Sentry<br/>User-space Kernel]
            App1[Agent Container]
            App2[Tool Container]
        end
    end

    KVM --> Guest
    IOMMU --> VF
    IOMMU --> NVMe
    Guest --> DockerD
    DockerD --> Sentry
    Sentry --> App1
    Sentry --> App2

Even if a malicious payload escapes the gVisor sentry, it finds itself inside a minimal Firecracker guest with no host access. And even if it somehow escapes the KVM boundary (an extremely high-severity zero-day), IOMMU ensures it cannot DMA to host memory.

The Firecracker Layer (Infrastructure)

Firecracker is Amazon's microVM monitor, originally built for AWS Lambda. We use it because:

  1. Snapshot and restore. Firecracker supports snapshotting the full VM state (CPU registers, memory, device state) to disk. This is a VM-level mechanism, distinct from process-level checkpointing tools such as CRIU. On restore, it can use userfaultfd to demand-page memory: only the pages the CPU actually touches are read from the snapshot file.
  2. Minimal attack surface. Firecracker emulates only a handful of devices (virtio-net, virtio-block, virtio-vsock, a serial console, and a minimal i8042 keyboard controller). There is no PCI bus, no USB, no GPU passthrough by default.
  3. Fast boot and resume. A fresh Firecracker microVM boots in under 125ms. From a snapshot, it resumes in low single-digit milliseconds.
  4. IOMMU integration. On our EPYC platform, each microVM can receive a dedicated NVMe namespace and a 25G NIC virtual function via SR-IOV, passed through directly with IOMMU. This gives the guest near-native I/O performance without any host CPU involvement in the data path.

TODO: Benchmark the actual snapshot-to-first-instruction latency on our reference hardware with varying memory footprints (256MB, 1GB, 2GB, 4GB). Document the userfaultfd configuration and verify that the Samsung 9100's random read latency (which is excellent due to the 4GB LPDDR4X cache on the 4TB model) dominates the restore path as expected.

The gVisor Layer (Application)

gVisor is Google's application kernel. Inside each Firecracker microVM, we run gVisor as the OCI runtime for Docker:

  1. Syscall interception. gVisor's Sentry reimplements the Linux kernel interface in user-space. The agent's containers never touch the guest kernel directly—every read(), write(), mmap(), clone() goes through the Sentry.
  2. Standard Docker experience. The agent uses docker run, docker pull, docker exec—the same API it would use anywhere else. The only difference is that --runtime=runsc is the default.
  3. systrap platform. gVisor's systrap platform (successor to ptrace) works well inside KVM guests. This has been validated in production at Google scale.

TODO: Verify systrap performance inside Firecracker's minimal KVM environment. Measure the syscall interception overhead for typical agent workloads (Python process spawning, file I/O, network calls). Compare with ptrace platform as a fallback.

Golden Snapshots and Forking

One of the most powerful patterns in Avtivka is the golden snapshot. Consider the scenario: you want to run one agent for each of N items (rows, files, issues). Instead of booting N microVMs from scratch:

  1. Create one golden snapshot with the common harness: Python environment, toolbelt, base configuration.
  2. Fork N cables from this snapshot. Each fork is a copy-on-write restore—the pages that all N cables share (Python runtime, libraries, toolbelt code) are read from NVMe exactly once and shared in the host page cache.
  3. Each cable diverges only in the pages it actually writes to. For a typical agent workload, this means the marginal cost of the Nth cable is a few megabytes of dirty pages, not gigabytes of duplicated runtime.
graph LR
    GS[Golden Snapshot<br/>Python + Toolbelt<br/>~500MB on NVMe]
    C1[Cable 1<br/>+12MB dirty]
    C2[Cable 2<br/>+8MB dirty]
    C3[Cable 3<br/>+15MB dirty]
    CN[Cable N<br/>+...MB dirty]

    GS --> C1
    GS --> C2
    GS --> C3
    GS --> CN

This is fundamentally more efficient than any container-based approach, because we're sharing at the memory page level, not the filesystem layer level.
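
The saving can be checked with simple arithmetic. The sketch below uses the illustrative figures from the diagram above (a ~500MB golden snapshot and 12MB, 8MB, and 15MB of dirty pages); these are not measurements, and the function name is made up for this example.

```python
def resident_mb(golden_mb: float, dirty_mb: list) -> dict:
    """Compare memory use with page sharing against fully duplicated runtimes."""
    shared = golden_mb + sum(dirty_mb)                  # one golden copy plus per-cable dirty pages
    duplicated = sum(golden_mb + d for d in dirty_mb)   # every cable holds its own full copy
    return {"shared": shared, "duplicated": duplicated, "saved": duplicated - shared}


if __name__ == "__main__":
    print(resident_mb(500, [12, 8, 15]))
```

For N cables the saving is (N - 1) times the golden snapshot size, which is why the marginal cost of each extra cable is only its dirty pages.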

TODO: Measure the actual copy-on-write overhead for N concurrent cables forked from the same golden snapshot. Determine the optimal golden snapshot size for different agent profiles (lightweight Python scripts vs. heavy ML workloads with large library imports).

Resumption Lifecycle

The full lifecycle of a cable wake event, from network packet to first agent instruction:

sequenceDiagram
    participant Net as 25G/100G Fabric
    participant XDP as XDP Program (eBPF)
    participant Buf as Packet Buffer
    participant Sup as Avtivka Supervisor
    participant NVMe as Samsung 9100
    participant FC as Firecracker
    participant GV as gVisor Container

    Net->>XDP: IPv6 packet for Cable ULA
    XDP->>XDP: Lookup: Cable is SLEEPING
    XDP->>Buf: Buffer packet
    XDP->>Sup: Wake signal (AF_XDP / eventfd)
    Sup->>NVMe: mmap snapshot file
    NVMe-->>FC: Metadata restore (~3-5ms)
    FC->>FC: Resume vCPUs
    FC-->>GV: First instruction executes (< 6ms total)
    Sup->>Buf: Drain buffered packets
    Buf->>Net: Inject into VM's virtio-net
    loop Demand Paging
        GV->>FC: Page fault on cold page
        FC->>NVMe: userfaultfd read (~67μs per 4KB page)
    end
    GV->>GV: Process request
    GV->>Net: Response
    Sup->>Sup: Idle timer starts
    Note over Sup,NVMe: After idle timeout...
    Sup->>FC: Pause vCPUs
    FC->>NVMe: Write snapshot (differential)
    Sup->>XDP: Mark cable as SLEEPING

Hardware-Direct: The "Siena" Advantage

Avtivka is not "cloud-native." It is cloud-agnostic: open to offloading some storage and edge compute capabilities, or highly-available control planes to the cloud, but fundamentally it is a VM image meant to run on modern bare-metal hardware. And it doesn't have to be expensive.

The Reference Platform

We perform complete integration testing on one platform:

| Component | Model | Key Specs |
|---|---|---|
| CPU | AMD EPYC 8434PN | 48 cores / 96 threads, Zen 4c, 155W TDP |
| Memory | DDR5-4800 | 6 channels, ~230 GB/s peak bandwidth |
| Motherboard | ASUS S14NA-U12 | Single socket, 96 PCIe 5.0 lanes |
| Storage | Samsung 9100 PRO (4TB) | PCIe 5.0 x4, 14.8 GB/s read, 4GB LPDDR4X cache |
| Network | Broadcom BCM57414 | Dual 25G SFP28, XDP native mode, RoCE v2, SR-IOV |
| Switch | MikroTik CRS504 | 4x 100G QSFP28 |

This setup is chosen because:

  • S14NA-U12 is one of the most cost-effective single-socket EPYC boards available, with massive PCIe 5.0 expansion.
  • EPYC 8004 "Siena" is optimized for power efficiency and density, not raw clock speed. You get 96 threads at 155W—this is the edge/telco chip that AMD built for exactly this kind of "always-on, mostly-idle" workload.
  • Samsung 9100 4TB specifically has a 4GB LPDDR4X DRAM cache. Always take the 4TB variant; the smaller models have proportionally smaller caches and the performance difference under mixed workloads is dramatic.

Why PCIe 5.0 Is King

The entire Avtivka performance story rests on PCIe 5.0 bandwidth. Consider what becomes possible:

  • Samsung 9100: 14.8 GB/s sequential read. This is not a theoretical maximum—the Samsung Presto controller and 236-layer TLC NAND sustain this in practice.
  • BCM57414 with RoCE v2: 25 Gbps per port (50 Gbps aggregate) with RDMA, zero-copy, kernel-bypass networking.
  • 96 PCIe 5.0 lanes: Enough to feed multiple NVMe drives, multiple NICs, and still have room for GPU accelerators if needed.

The point we want to emphasize: you can now have hardware that is superior to that of hyperscalers for this specific workload, without having to sacrifice any of your existing cloud workloads. This is what cloud-agnostic means. The hyperscaler gives you a shared NVMe with 3 GB/s if you're lucky, and a virtualized NIC with milliseconds of added latency. We give the cable a dedicated NVMe namespace at 14.8 GB/s and a dedicated SR-IOV VF at 25 Gbps with XDP.

Performance Analysis

On the reference platform, we can estimate expected restore latencies from published specifications:

Storage throughput (Samsung 9100, 4TB):

  • Sequential read: 14.8 GB/s → 67.6 μs per MB
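  • Random read (4KB, QD32): ~2.2M IOPS → ~0.45 μs per page, amortized across outstanding requests
  • Random read (4KB, QD1): bounded by per-request device latency rather than by the IOPS figure; the resumption diagram budgets ~67 μs per page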

Firecracker restore overhead:

  • Snapshot metadata parsing: ~1-2ms (CPU-bound, includes KVM setup)
  • vCPU resume: < 1ms
  • userfaultfd handler registration: < 0.5ms

End-to-end restore estimates:

| VM Memory | Active Working Set | Time to First Instruction | Execution Ready |
|---|---|---|---|
| 256 MB | ~16 MB | ~4 ms | ~5 ms |
| 1 GB | ~64 MB | ~4 ms | ~8 ms |
| 2 GB | ~256 MB | ~5 ms | ~22 ms |
| 4 GB | ~512 MB | ~5 ms | ~39 ms |

Key insight: Time to First Instruction is nearly constant regardless of VM size, because Firecracker doesn't load all memory upfront. The "Execution Ready" time (when the agent has its full Python environment paged in) scales linearly with the active working set, but for typical CodeAct payloads, this is dominated by the Python runtime itself (~50-100MB).
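
The "Execution Ready" column appears to follow from a simple model: time to first instruction plus the time to stream the active working set at the drive's sequential read rate. The sketch below reproduces the estimates under that assumption; the function name is made up, and the model ignores random-access effects and contention.

```python
SEQ_READ_GB_S = 14.8  # sequential read figure quoted for the Samsung 9100 (4TB)


def execution_ready_ms(first_instruction_ms: float, working_set_mb: float) -> float:
    """Estimate time until the active working set is paged in, in milliseconds."""
    # 1 GB/s equals 1 MB/ms, so MB divided by GB/s gives milliseconds.
    page_in_ms = working_set_mb / SEQ_READ_GB_S
    return first_instruction_ms + page_in_ms


if __name__ == "__main__":
    rows = [(256, 16, 4), (1024, 64, 4), (2048, 256, 5), (4096, 512, 5)]
    for vm_mb, working_set_mb, first_ms in rows:
        ready = execution_ready_ms(first_ms, working_set_mb)
        print(f"{vm_mb:>5} MB VM, {working_set_mb:>3} MB working set: {ready:.1f} ms")
```

Because the model is linear in the working set, halving the pages an agent touches on wake roughly halves the paging portion of the restore.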

TODO: Run actual benchmarks on the reference hardware. The numbers above are derived from spec sheets and Firecracker documentation. Real-world performance will be affected by: filesystem metadata overhead, NVMe controller firmware behavior under mixed read/write (snapshots being written while other cables demand-page), and NUMA topology effects on the 8434PN.

Scale-Out with RoCE v2

While local NVMe is the primary target, the architecture extends naturally to disaggregated storage via RDMA over Converged Ethernet v2 (RoCE v2).

The BCM57414's bnxt_re driver provides native RoCE v2 support. Combined with NVMe-over-Fabrics (NVMe-oF), this means:

  • Remote snapshots: A cable's snapshot can live on a remote NVMe target. Restore reads bypass the remote CPU entirely—RDMA reads go straight from remote NVMe to local RAM.
  • 25G single link: ~3.1 GB/s effective throughput. A 256MB working set restores in ~82ms—slower than local NVMe but still fast enough for many workloads.
  • Dual 25G (bonded): ~6.2 GB/s. Working set restore drops to ~41ms.
  • 100G (via CRS504 uplink): ~12.5 GB/s. Approaching local NVMe speeds.
  • 400G (future): Theoretical 50 GB/s. At this point, remote NVMe-oF is faster than local PCIe 4.0 drives.
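
The restore times in this list follow directly from link speed. A minimal sketch of the arithmetic, assuming the full line rate is usable and ignoring protocol overhead (the function name is made up for this example):

```python
def restore_ms(working_set_mb: float, link_gbps: float) -> float:
    """Time to transfer a working set over a link, in milliseconds."""
    bits = working_set_mb * 1e6 * 8          # megabytes to bits
    return bits / (link_gbps * 1e9) * 1000   # seconds to milliseconds


if __name__ == "__main__":
    for gbps in (25, 50, 100, 400):
        print(f"{gbps:>3} Gbps: {restore_ms(256, gbps):.1f} ms for a 256 MB working set")
```

Real RoCE v2 throughput will be somewhat lower than line rate, so these figures are best read as lower bounds.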

TODO: Test NVMe-oF RDMA with bnxt_re driver. Validate end-to-end latency for snapshot restore over RoCE v2 at 25G. Determine whether the CRS504's switching fabric introduces meaningful latency at the packet sizes typical of userfaultfd page reads (4KB).


Pure IPv6 L3 Networking

Why IPv6-Only?

Avtivka is opinionated: the dataplane is IPv6-only. This is not a limitation—it is a prerequisite for the kind of dense, dynamic, zero-configuration networking that cables demand.

The advantages:

  • Massive address pools: A single ULA /48 gives us 65,536 /64 subnets, or 256 /56 prefixes. We never need to NAT between cables and never need to manage DHCP leases.
  • Prefix Delegation (PD): Each cable receives its own /56 prefix. This is not a single address; it is 256 /64 subnets that the cable can subdivide among its internal gVisor containers. The Firecracker VM acts as the IPv6-PD delegating router for its gVisor network namespaces.
  • SLAAC: Containers inside a cable configure their own addresses via Stateless Address Autoconfiguration. No DHCP server, no static configuration, no address conflicts.
  • Pure L3: There is no L2 domain crossing cable boundaries. This eliminates ARP/NDP broadcast storms, MAC table exhaustion, and all the other L2 pathologies that plague large flat networks. In heavily interconnected overlays with potentially thousands of cables talking to each other, this is not a nice-to-have—it's essential.
  • No broadcast storms: In a conventional bridge-based container network, a misbehaving container can broadcast-storm the entire bridge domain. In Avtivka, there is no bridge. Each cable is a routed endpoint.

Address Architecture

fd00:avtivka::/48                 ← Avtivka ULA allocation (placeholder for a 48-bit prefix)
├── fd00:avtivka:0100::/56        ← Cable 1 (256 × /64 subnets)
│   ├── fd00:avtivka:0100::/64    ← Cable 1, gVisor bridge
│   └── fd00:avtivka:0101::/64    ← Cable 1, additional namespace
├── fd00:avtivka:0200::/56        ← Cable 2
│   └── ...
└── fd00:avtivka:ff00::/56        ← Cable 255

Each cable's /56 prefix is large enough to host 256 separate /64 subnets. In practice, most cables will use a single /64 for the gVisor bridge network, but the headroom exists for complex multi-network topologies. A single /48 holds 256 such /56 prefixes; a host that needs more cables would need additional /48 allocations or longer per-cable prefixes.
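
The prefix arithmetic can be verified with Python's standard `ipaddress` module. This is a sketch only: the ULA prefix `fd12:3456:789a::/48` is a made-up example value, and `cable_prefix` is a hypothetical helper, not an Avtivka function.

```python
import ipaddress

# Made-up ULA prefix for illustration; a real deployment would generate its own.
ULA = ipaddress.IPv6Network("fd12:3456:789a::/48")


def cable_prefix(index: int) -> ipaddress.IPv6Network:
    """Return the /56 that would be delegated to the cable with this index (0-255)."""
    if not 0 <= index < 256:
        raise ValueError("a /48 holds only 256 /56 prefixes")
    # Bits 48-55 of the address select the /56 within the /48.
    base = int(ULA.network_address) + (index << (128 - 56))
    return ipaddress.IPv6Network((base, 56))


if __name__ == "__main__":
    prefix = cable_prefix(1)
    subnets = list(prefix.subnets(new_prefix=64))
    print(prefix, "->", len(subnets), "x /64, first:", subnets[0])
```

Note that the /56 boundary falls in the middle of the fourth 16-bit group, so cable 1 is written `...:100::/56` rather than `...:1::/56`.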

XDP Dataplane

The Broadcom BCM57414 supports XDP in native mode, meaning eBPF programs execute directly in the NIC driver's NAPI context, before any socket buffer allocation. This is critical for the wake-on-packet mechanism:

  1. Packet classification: The XDP program inspects incoming IPv6 packets and looks up the destination address in a BPF map that tracks cable state (RUNNING, SLEEPING, STOPPED).
  2. Fast path: If the cable is RUNNING, the packet is passed through to the VM's virtio-net device via XDP_PASS or redirected via XDP_REDIRECT to the appropriate VF.
  3. Wake path: If the cable is SLEEPING, the packet is buffered in a per-cable BPF ring buffer, and the XDP program sends a wake signal to the Avtivka supervisor via AF_XDP or an eventfd.
  4. Drop path: If the cable is STOPPED or the destination is unknown, XDP_DROP discards the packet at line rate.
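
The four paths above can be modelled in a few lines of ordinary Python. This is a userspace model for reasoning about the logic only: it is not eBPF, the class and method names are hypothetical, and the "BUFFERED" verdict is a label for the wake path rather than a real XDP return code.

```python
from collections import defaultdict, deque


class DataplaneModel:
    """Userspace model of the classification steps; not Avtivka's actual code."""

    def __init__(self):
        self.cable_state = {}               # destination address -> state string
        self.pending = defaultdict(deque)   # per-cable buffered packets
        self.wake_signals = []              # stand-in for the supervisor wake signal

    def classify(self, dst: str, packet: bytes) -> str:
        state = self.cable_state.get(dst)
        if state == "RUNNING":
            return "XDP_PASS"               # fast path
        if state == "SLEEPING":
            if not self.pending[dst]:
                self.wake_signals.append(dst)  # signal once, on the first buffered packet
            self.pending[dst].append(packet)
            return "BUFFERED"               # wake path
        return "XDP_DROP"                   # STOPPED or unknown destination

    def on_restored(self, dst: str) -> list:
        """Mark the cable RUNNING and hand back buffered packets in arrival order."""
        self.cable_state[dst] = "RUNNING"
        return list(self.pending.pop(dst, ()))


if __name__ == "__main__":
    dp = DataplaneModel()
    dp.cable_state["fd00::1"] = "SLEEPING"
    print(dp.classify("fd00::1", b"first"), dp.classify("fd00::1", b"second"))
    print(dp.on_restored("fd00::1"), dp.classify("fd00::1", b"third"))
```

Signalling only on the first buffered packet keeps a burst of traffic from flooding the supervisor with duplicate wake requests.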

TODO: Prototype the XDP wake-trigger program. Determine the optimal buffering strategy for packets that arrive during the ~5-20ms restore window. BPF ring buffer has size limits; for bursty traffic, we may need to spill to a userspace buffer or implement backpressure.

NAT64 and DNS64

Since the external internet is still predominantly IPv4, Avtivka runs a NAT64/DNS64 service for outbound connectivity. This is implemented as an XDP-accelerated translation layer on the host:

  • DNS64: Cables resolve external hostnames through a DNS64 resolver that synthesizes AAAA records for IPv4-only destinations using the well-known 64:ff9b::/96 prefix.
  • NAT64: Outbound packets to 64:ff9b::/96 are translated to IPv4 by the host's NAT64 gateway.

This is a convenience layer, not a core architectural component. Cables that only need to communicate with other cables or with IPv6-native services (which includes many major cloud APIs) do not interact with NAT64 at all.
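
The address synthesis itself is mechanical: the 32-bit IPv4 address is embedded in the low bits of the 64:ff9b::/96 prefix. A minimal sketch with the standard `ipaddress` module follows; the function names are made up for this example, and a real DNS64 resolver does considerably more than this mapping.

```python
import ipaddress

NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_aaaa(ipv4: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of the well-known prefix."""
    v4 = ipaddress.IPv4Address(ipv4)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))


def extract_ipv4(ipv6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 destination, as a NAT64 gateway would before translating."""
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in NAT64_PREFIX:
        raise ValueError(f"{addr} is not inside {NAT64_PREFIX}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


if __name__ == "__main__":
    synthesized = synthesize_aaaa("192.0.2.1")
    print(synthesized, "->", extract_ipv4(str(synthesized)))
```

The mapping is reversible without any lookup table, which is part of why translation state is limited to connection tracking.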

TODO: Evaluate whether to use an existing NAT64 implementation (e.g., Jool) or implement a minimal XDP-based translator. The advantage of XDP-based NAT64 is that it can share the same BPF maps as the wake-trigger program, avoiding redundant packet classification. The disadvantage is complexity—NAT64 state tracking is nontrivial.


Zero-Trust Overlay

Consul for Discovery

Avtivka integrates with HashiCorp Consul for service discovery and health checking:

  • Service Registration: Every running cable registers itself as a Consul service with its ULA address and MCP server port. Metadata tags indicate the cable's capabilities, current state, and owning agent.
  • DNS Interface: Other cables can discover peers via Consul's DNS interface: cable-foo.service.consul resolves to the cable's ULA.
  • Health Checks: Consul performs periodic health checks. A snapshotted cable cannot respond to them, so it should be placed in "maintenance" rather than reported as unhealthy. This distinction is important: a sleeping cable is not broken, it's conserving resources.
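
As an illustration of what a registration might carry, the sketch below builds a service definition as a plain dictionary. The field names (`ID`, `Name`, `Address`, `Port`, `Tags`, `Meta`) follow Consul's agent service registration format as we understand it. The helper names, the example address, the port 8080, and the choice to store state under `Meta` are all assumptions made for this example.

```python
import json


def registration_payload(cable_id: str, ula: str, mcp_port: int,
                         state: str, capabilities: list) -> dict:
    """Build a service definition for one cable (hypothetical layout)."""
    name = f"cable-{cable_id}"
    return {
        "ID": name,
        "Name": name,
        "Address": ula,
        "Port": mcp_port,
        "Tags": sorted(capabilities),
        "Meta": {"state": state},
    }


def dns_name(payload: dict) -> str:
    """Name under which peers would look the cable up via Consul's DNS interface."""
    return f"{payload['Name']}.service.consul"


if __name__ == "__main__":
    payload = registration_payload("foo", "fd12:3456:789a:100::1", 8080,
                                   "RUNNING", ["shell", "filesystem"])
    print(json.dumps(payload, indent=2))
    print(dns_name(payload))
```

Sorting the tags keeps the payload stable across re-registrations, which makes changes easier to spot in diffs and logs.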

Vault for Identity and Secrets

HashiCorp Vault provides two critical services:

  1. mTLS Identity: Every newly spawned cable receives a short-lived X.509 certificate from Vault's PKI secrets engine. All inter-cable communication is mutually authenticated TLS. Certificate lifetimes are short (hours, not days) and renewed automatically on wake.

  2. Dynamic Credentials: Cables can lease credentials for external resources via Vault's secrets engines:

    • Database: PostgreSQL, MySQL, MongoDB credentials rotated per-cable.
    • Cloud IAM: Temporary AWS, GCP, Azure credentials for cloud resource access.
    • KV: Shared secrets and configuration distributed to cables by policy.

Inter-Cable Policy

Access between cables is governed by Vault policies and Consul intentions:

  • Vault Policies: Define which cables (by role, tag, or identity) can access which secret paths. A "database-agent" cable can lease PostgreSQL credentials; a "code-review" cable cannot.
  • Consul Intentions: Define which cables can communicate with which other cables at the network level. These are L7 intentions that enforce mTLS identity checks.

TODO: Design the policy language for inter-cable access. Should this be purely Consul intentions + Vault policies, or do we need a higher-level abstraction? Consider whether a simplified Zanzibar-style RBAC is warranted, or if the existing HashiCorp primitives are sufficient. The cablectl prior art mentioned Extended RBAC as in-scope.


Observability

Avtivka provides deep observability through two complementary paths:

  1. Langfuse: The primary tracing backend for LLM interactions. Every inference call, every tool invocation, every agent decision is traced in Langfuse. The cable semantics calls for strong LLM Ops so that the intuition space and the action space can be traced simultaneously.

  2. Pub/Sub Shell Access: Avtivka exposes a pub/sub interface that allows operators to:

    • Acquire a live shell in any sandbox (similar to docker exec).
    • Tap into real-time logs from any container in any cable.
    • Stream LLM traces alongside execution traces for a unified view.

TODO: Define the pub/sub transport. gRPC streaming? WebSockets? NATS? The transport must support multiplexing (many subscribers to one cable's output), backpressure (don't overwhelm observers), and authentication (only authorized operators can attach to a cable).


Prior Art

Avtivka builds on and supersedes several prior projects:

  • cablectl: The original cable controller, built on Jupyter Enterprise Gateway. It proved the cable semantics concept but was limited by Jupyter's kernel lifecycle model—cold starts were too slow, and persistence was not native to the architecture.
  • simp: The CLI tool and API gateway for LLM inference. Simp remains the primary frontend for the intuition space—the cmd/simp daemon handles model routing, batch API normalization, and cable format parsing. Avtivka takes over the action space that simp's tool-use features were never able to fully address.
  • CodeAct: The research paper (from UIUC and Apple) that introduced the idea of agents communicating through code rather than JSON function calls. Avtivka is the infrastructure that makes CodeAct practical at scale.

License

GNU Affero General Public License v3.0 (AGPL-3.0)

If you're going to modify our software, we want to see the patches, not a fig in the pocket.

About

Avtivka implements CodeAct action spaces via persistent, interruptible FaaS
