Clawdbot just hit 60k GitHub stars. Mac Minis are selling out everywhere. The vibe-coded personal agent moment has arrived and everyone is scrambling to figure out where these things actually live.
I've been working through the same problem from a different angle. Not "how do I run Claude on a Mac Mini" but the layer underneath that — how do you build infrastructure that lets multiple agents spin up, do work, and die cleanly without turning your home network into a liability? The answer I keep arriving at is an orchestration layer. One agent to rule them all, Docker containers for isolation, Tailscale for access. Not a product. A topology.
The architecture
```
┌─────────────────────────────────────────┐
│              Tailscale VPN              │
├─────────────────────────────────────────┤
│             Homelab Server              │
│  ├─ Orchestrator (Anthropic SDK)        │
│  ├─ Dashboard (observability/control)   │
│  └─ Docker daemon                       │
├─────────────────────────────────────────┤
│      Docker Containers (on-demand)      │
│  ├─ Agent A                             │
│  ├─ Agent B                             │
│  └─ ...                                 │
└─────────────────────────────────────────┘
```
The insight that made this click for me: the orchestrator isn't a manager, it's a thick thread. You talk to one agent. That agent decides whether the task needs help, spawns containers for sub-agents if it does, collects results when they're done. Each spawned agent gets its own Docker container — isolated filesystem, no host access, destroyed on completion. The orchestrator is the only thing with a persistent identity.
This matters because most "multi-agent" setups I've seen are really just multiple API calls wearing a trench coat. No isolation. No lifecycle management. State bleeding everywhere. The container boundary changes that equation entirely.
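To make the lifecycle concrete — spin up, execute, collect, destroy — here's roughly what the orchestrator's spawn path could look like using the Docker SDK for Python. This is a minimal sketch: the image name, the agent.py entrypoint, and the convention of passing the task as a command argument are all illustrative assumptions, not a settled interface.

```python
# Minimal sub-agent lifecycle sketch using the Docker SDK for Python
# (pip install docker). Image name and entrypoint are assumptions.
import docker

client = docker.from_env()

def run_subagent(task: str, image: str = "agent-runtime:latest") -> str:
    """Spin up an isolated container, run one task, return its output."""
    container = client.containers.run(
        image,
        command=["python", "agent.py", task],  # hypothetical entrypoint
        detach=True,
    )
    try:
        result = container.wait()   # block until the agent exits
        logs = container.logs().decode()
        if result["StatusCode"] != 0:
            raise RuntimeError(f"sub-agent failed:\n{logs}")
        return logs
    finally:
        container.remove(force=True)  # destroyed on completion, success or not
```

The try/finally is the point: the container dies whether the agent succeeds, fails, or throws. No lingering processes, no state bleed between runs.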
Development to production
The workflow I'm planning splits into two distinct phases.
Development happens in Claude Code with a Max subscription — iterate fast, test locally, break things cheaply. The agent works on your machine, you can watch it think, you can intervene when it goes sideways. I've been building agents this way for months and the feedback loop is tight enough to be genuinely productive.
Production is different. The orchestrator calls the API directly, manages container lifecycle — spin up, execute, monitor, destroy. No lingering processes. No state bleed between runs. The same agent code runs in both environments but the operational constraints are completely different.
This separation is the whole point. Development should be messy and cheap. Production should be predictable and contained.
Why Docker and not just... running things
I keep getting asked why containers instead of just running agents as processes. Fair question. Here's what pushed me toward isolation.
Agents that can execute code have access to whatever the host process has access to. Your SSH keys. Your browser cookies. That .env file with API keys you forgot to rotate. One hallucinated rm -rf away from a very bad afternoon. Containers draw a hard line. The agent sees its own filesystem and nothing else.
Dependencies are the other reason. One agent needs Node 18 and Puppeteer. Another needs Python 3.12 with specific ML libraries. A third needs a full LaTeX installation. Running these as bare processes means dependency conflicts, version mismatches, and debugging sessions that have nothing to do with the actual work. Containers make that someone else's problem — specifically, the Dockerfile's problem.
The orchestrator handles the complexity of building containers on demand, mounting the right volumes, setting environment variables. From the agent's perspective it's just running. From mine, it's safely boxed.
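For the curious, the creation call with the isolation knobs turned might look something like this — every concrete value here (image, limits, mount paths, the API key placeholder) is an assumption for illustration, not a recommendation:

```python
# Sketch of a hardened container launch with docker-py. All values
# are placeholders; tune limits and mounts to the actual task.
import docker

client = docker.from_env()

container = client.containers.run(
    "agent-latex:latest",              # per-task image owns its dependencies
    command=["python", "agent.py"],
    detach=True,
    read_only=True,                    # immutable root filesystem
    tmpfs={"/tmp": "size=256m"},       # writable scratch space only
    mem_limit="2g",                    # cap memory
    nano_cpus=2_000_000_000,           # cap at 2 CPUs
    pids_limit=256,                    # no fork bombs
    cap_drop=["ALL"],                  # drop all Linux capabilities
    environment={"ANTHROPIC_API_KEY": "..."},  # only the secrets this task needs
    volumes={"/srv/jobs/task-42": {"bind": "/workspace", "mode": "rw"}},
)
```

The agent sees /workspace and a scratch /tmp. Your SSH keys, cookies, and forgotten .env files simply aren't in its universe.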
The dashboard problem
Here's something I've learned from running agents on my own projects: the scariest moment is when an agent has been running for forty-five minutes and you have no idea what it's doing. Is it stuck in a loop? Did it burn through your API budget? Is it quietly rewriting files it shouldn't touch?
The dashboard exists to answer those questions. Active agents and their current state. Context window usage — how close to the limit. API costs accumulating in real time. Logs from each execution. Files consumed and produced.
Not pretty graphs. Operational awareness. The difference between "my agent is running" and "my agent is on its third attempt at a task that should have taken one, and it's consumed 200k tokens doing it." That second thing happens more than people admit.
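The raw material for that awareness comes back on every API call — the Anthropic Messages API reports token usage per response. Here's a minimal meter the dashboard could poll; the per-token prices and the 200k context window are placeholder assumptions, so substitute current numbers for whatever model you run:

```python
# Per-agent usage accounting the dashboard can poll. Prices and the
# context-window size are assumptions — check your model's actuals.
from dataclasses import dataclass

PRICE_PER_MTOK_IN = 3.00    # assumed $/million input tokens
PRICE_PER_MTOK_OUT = 15.00  # assumed $/million output tokens
CONTEXT_WINDOW = 200_000    # assumed context limit in tokens

@dataclass
class AgentMeter:
    input_tokens: int = 0
    output_tokens: int = 0
    last_input_tokens: int = 0  # proxy for context size on the latest turn
    turns: int = 0

    def record(self, usage) -> None:
        """Feed this the `usage` object from each messages.create() response."""
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens
        self.last_input_tokens = usage.input_tokens
        self.turns += 1

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens * PRICE_PER_MTOK_IN
                + self.output_tokens * PRICE_PER_MTOK_OUT) / 1_000_000

    @property
    def context_fraction(self) -> float:
        return self.last_input_tokens / CONTEXT_WINDOW
```

Three attempts at a one-attempt task shows up immediately as a turn count and a cost curve, not as a surprise on next month's bill.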
Tailscale as the glue
The networking piece was the part I expected to be painful. Port forwarding, dynamic DNS, firewall rules, SSL certificates for a home server — all the reasons homelabs stay local. Tailscale collapses that entire problem. Your server joins a tailnet. You access it from any device you've authorized. No exposed ports. No public IP. The orchestrator API and dashboard listen only on the Tailscale interface.
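Binding to the tailnet address is a one-liner once you ask Tailscale for it. A minimal sketch using the stock tailscale CLI and Python's standard library — the port and handler are arbitrary stand-ins for the real dashboard:

```python
# Bind the dashboard only to this machine's Tailscale address, so it
# is unreachable from the LAN and the public internet alike.
import subprocess
from http.server import HTTPServer, SimpleHTTPRequestHandler

# First IPv4 address on the tailnet (a 100.x.y.z CGNAT address).
ts_ip = subprocess.check_output(["tailscale", "ip", "-4"], text=True).split()[0]

server = HTTPServer((ts_ip, 8080), SimpleHTTPRequestHandler)
print(f"dashboard on http://{ts_ip}:8080 — tailnet devices only")
server.serve_forever()
```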
Practically, this means kicking off a research task from my phone while I'm out, checking progress from my laptop later, reviewing results at home. The infrastructure follows me without me having to think about network topology. I've been running other services on Tailscale for a while and the reliability has been solid enough that I trust it for this.
The Anthropic SDK as foundation
Building on the Anthropic Agent SDK — the same thing that powers Claude Code — gives you portability for free. Develop locally, deploy to homelab, burst to cloud if a task needs more compute than a home server provides. E2B slots in here naturally: the orchestrator can choose local Docker for most tasks and E2B sandboxes when you need something your hardware can't handle.
The SDK handles tool use, conversation management, token counting. The orchestrator ends up surprisingly thin — mostly container lifecycle management and routing. That thinness is a feature. Less orchestrator code means fewer places for bugs to hide in the layer between you and your agents.
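To make "surprisingly thin" concrete, here's the shape of that routing loop, sketched against the lower-level anthropic Python package rather than the Agent SDK's higher-level interface. The model id and the spawn_subagent tool are assumptions for illustration; run_subagent is the lifecycle function from the earlier sketch:

```python
# Orchestrator core: a tool-use loop where the only interesting tool
# spawns a containerized sub-agent. Model id is an assumed placeholder.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

TOOLS = [{
    "name": "spawn_subagent",
    "description": "Run a task in an isolated Docker container and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {"task": {"type": "string"}},
        "required": ["task"],
    },
}]

messages = [{"role": "user", "content": "Summarize the repos in /workspace."}]

while True:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model id — substitute your own
        max_tokens=4096,
        tools=TOOLS,
        messages=messages,
    )
    if response.stop_reason != "tool_use":
        break  # final answer reached
    messages.append({"role": "assistant", "content": response.content})
    results = [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": run_subagent(block.input["task"]),  # lifecycle sketch above
        }
        for block in response.content
        if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": results})
```

That's essentially the whole coordination layer: the model decides when to delegate, the loop turns each delegation into a container, and everything else is plumbing.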
What I haven't figured out yet
How minimal can the orchestrator actually be? My instinct says it needs agent registration, dependency management, and basic scheduling. But every feature I add is a feature I have to maintain, and the history of infrastructure software is littered with orchestrators that became more complex than the things they orchestrated.
Long-running tasks are the interesting design problem. Should the orchestrator implement proper job queues, or is fire-and-forget plus polling from the dashboard enough? I've gone back and forth on this. Queues add reliability but also add a message broker, persistence, retry logic — suddenly you're building a distributed system instead of a thin coordination layer.
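For scale, the fire-and-forget half of that trade-off is genuinely small — tag containers at launch and let the dashboard poll Docker itself, no broker required. A sketch, with the label names as arbitrary conventions of my own:

```python
# Fire-and-forget launch plus dashboard polling, using Docker's own
# state as the "queue". Label keys here are made-up conventions.
import docker

client = docker.from_env()

def launch(task_id: str, image: str = "agent-runtime:latest") -> None:
    client.containers.run(
        image,
        detach=True,
        labels={"orchestrator.task": task_id},
    )

def poll() -> dict[str, str]:
    """task_id -> container status, as the dashboard would render it."""
    return {
        c.labels["orchestrator.task"]: c.status
        for c in client.containers.list(all=True,
                                        filters={"label": "orchestrator.task"})
    }
```

What this doesn't give you is retries, priorities, or survival across daemon restarts — the moment you need those, you're back to shopping for a queue.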
The terminal-versus-dashboard question keeps coming back too. I want CLI access for debugging — tailing logs, inspecting container state, killing runaway processes. But the dashboard gives you the overview that makes CLI poking unnecessary most of the time. Probably both. But which one is primary shapes everything else about the interface.
Where this actually stands
I want to be direct about what exists and what doesn't. The architecture is designed. The individual pieces — the Anthropic SDK, Docker containerization, Tailscale networking — are all things I've used in production for other projects. The orchestration layer that ties them together is what I'm building toward.
The demand is obviously real. Clawdbot's reception proved that people want personal agents running on their own hardware. But most of what's out there right now is "run an agent on a Mac Mini" — single-agent, single-machine, no isolation, no observability. That works for demos. It doesn't work for running agents you actually depend on.
What I'm after is the infrastructure layer that sits underneath the exciting stuff. Container isolation so agents can't trash your system. Observability so you know what's happening. Remote access so the server works for you whether you're at your desk or not. None of that is glamorous. All of it is necessary once you move past the "look what Claude can do" phase into "I need this to work reliably every day."
The pieces exist. The question is how thin the glue between them can be while still holding.