Architecture
ZERG is a production-grade agentic orchestration platform built on three pillars: the Sol Server (Erlang/OTP), the Luna Agent (LuaJIT), and Mango (Python/Tornado).
System Architecture
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Limon    │     │  ZERG CLI   │     │     API     │
│  (Vue.js)   │     │  (LuaJIT)   │     │   Clients   │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────┬───────┴───────────────────┘
                   │ HTTP/SSE/WebSocket
       ┌───────────┴───────────┐
       │      Sol Server       │
       │     (Erlang/OTP)      │
       │  ┌─────────────────┐  │
       │  │ Cowboy HTTP     │  │
       │  │ 120+ REST       │  │
       │  │ endpoints       │  │
       │  ├─────────────────┤  │
       │  │ Event Store     │  │
       │  │ (Mnesia, 35     │  │
       │  │ event types)    │  │
       │  ├─────────────────┤  │
       │  │ Workflow Engine │  │
       │  │ (gen_statem)    │  │
       │  ├─────────────────┤  │
       │  │ Inference       │  │
       │  │ (llama.cpp)     │  │
       │  ├─────────────────┤  │
       │  │ Scheduler       │  │
       │  │ (cron/at/every) │  │
       │  └─────────────────┘  │
       └───────────┬───────────┘
                   │ ZMQ (ROUTER/DEALER)
       ┌───────────┴───────────┐
       │    Worker Boundary    │
       │   ┌─────┐   ┌─────┐   │
       │   │Luna │   │Luna │   │
       │   │     │   │     │   │
       │   └─────┘   └─────┘   │
       │ (LuaJIT, <10MB each)  │
       └───────────────────────┘
Component Details
Sol Server (Erlang/OTP)
- 289 modules, ~39,500 lines of Erlang code
- 120+ REST endpoints with SSE streaming
- WebSocket bidirectional chat with Mnesia history (30-day retention)
- Event sourcing with Mnesia (35 event types)
- gen_statem-based workflow engine with compensation
- llama.cpp local inference with remote provider fallback
- 12 provider adapters, including Anthropic, OpenAI, z.ai, Ollama, Gemini, DeepSeek, Alibaba, Bedrock, Xiaomi, MiniMax, and a declarative adapter
- Prometheus metrics + OpenTelemetry tracing
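The "workflow engine with compensation" item can be illustrated with a short sketch. It is written in Python rather than Erlang, and the names `Step` and `run_workflow` are hypothetical. It shows only the general pattern of undoing completed steps in reverse order when a later step fails, not Sol's actual gen_statem implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]       # forward operation
    compensate: Callable[[], None]   # undo operation for this step


def run_workflow(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            # Only steps that finished successfully are rolled back.
            for done in reversed(completed):
                done.compensate()
            return False
    return True


log: List[str] = []


def fail() -> None:
    raise RuntimeError("step failed")


steps = [
    Step("reserve", lambda: log.append("reserve"), lambda: log.append("undo reserve")),
    Step("charge", lambda: log.append("charge"), lambda: log.append("undo charge")),
    Step("notify", fail, lambda: log.append("undo notify")),
]
ok = run_workflow(steps)
```

Here the third step fails, so the two completed steps are compensated in reverse order and the failed step itself is not.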
Luna Agent (LuaJIT)
- 283 modules, ~50,000 lines of Lua code
- 994KB binary (bytecode + XOR + CRC32)
- 45+ built-in tools + 14 computer use tools + MCP discovery
- 6 agent modes (Build/Plan/Explore/Review/General/Coordinate) + user-defined
- MCP client (stdio + HTTP + WebSocket, multi-server, OAuth PKCE)
- Dual-write sessions (JSONL + SQLite with FTS5)
- Git auto-snapshots, undo stack, diff view
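The "bytecode + XOR + CRC32" packaging can be sketched roughly as follows. The actual Luna binary layout is not described here, so the key, the trailing 4-byte checksum, and the function names `pack` and `unpack` are all assumptions made for illustration. Note that XOR masking is obfuscation, not encryption.

```python
import zlib
from itertools import cycle


def pack(payload: bytes, key: bytes) -> bytes:
    """XOR-mask the payload and append a big-endian CRC32 of the original bytes."""
    masked = bytes(b ^ k for b, k in zip(payload, cycle(key)))
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return masked + checksum


def unpack(blob: bytes, key: bytes) -> bytes:
    """Reverse pack(): unmask, then verify the CRC32 before returning the payload."""
    masked, checksum = blob[:-4], blob[-4:]
    payload = bytes(b ^ k for b, k in zip(masked, cycle(key)))
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        raise ValueError("CRC32 mismatch: blob is corrupt or the key is wrong")
    return payload


key = b"\x5a\xa5"                      # hypothetical key, illustration only
original = b"\x1bLJ\x02bytecode"       # stand-in for compiled bytecode
blob = pack(original, key)
restored = unpack(blob, key)
```

The checksum is computed over the unmasked bytes, so a wrong key and a corrupted file are both reported as the same error.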
Mango (Python/Tornado)
- Auth service with token validation and RBAC (56 endpoint-specific permissions)
- ZMQ event bridge to Sol for live cache invalidation
- Migration runner, rate limiting (IP + per-account brute-force)
- Microservice framework for extending the platform
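As an illustration of the rate-limiting item, a sliding-window limiter keyed by IP address or by account might look like the sketch below. The class name, the limit of 3 attempts, and the 60-second window are assumptions, not Mango's actual configuration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most max_attempts per key within the last window_seconds."""

    def __init__(self, max_attempts: int, window_seconds: float) -> None:
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        attempts = self._attempts[key]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True


# Hypothetical limits; the same class serves per-IP and per-account keys.
limiter = SlidingWindowLimiter(max_attempts=3, window_seconds=60.0)
first_three = [limiter.allow("account:alice", now=t) for t in (0.0, 1.0, 2.0)]
fourth = limiter.allow("account:alice", now=3.0)
after_window = limiter.allow("account:alice", now=61.0)
other_key = limiter.allow("ip:203.0.113.7", now=3.0)
```

Passing `now` explicitly keeps the example deterministic; in a running service the default monotonic clock would be used instead.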
ZMQ Gateway Architecture
The ZMQ gateway uses a dedicated sharded architecture for scalability:
- sol_zmq_sender — owns the ROUTER socket, serializes all outbound sends
- sol_zmq_recv_dispatcher — dedicated gen_server for inbound message routing
- sol_zmq_result_router — routes child task results and manages parent/child cleanup
- sol_zmq_recv_bridge — recv loop with error backoff, extracted from gateway
- ETS indexes (sol_zmq_tasks, sol_zmq_workers) for O(1) task/worker lookups
- Plugin tool executor pool (simple_one_for_one) isolates plugin tool execution
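The task and worker indexes can be pictured as two hash tables kept in sync. The sketch below uses Python dictionaries as a stand-in for ETS tables. The class `GatewayIndex` and its methods are hypothetical, and the real table schemas may differ.

```python
from typing import Dict, Optional, Set


class GatewayIndex:
    """Two hash-based indexes, loosely mirroring sol_zmq_tasks and sol_zmq_workers."""

    def __init__(self) -> None:
        self.tasks: Dict[str, str] = {}          # task_id -> worker_id
        self.workers: Dict[str, Set[str]] = {}   # worker_id -> in-flight task_ids

    def assign(self, task_id: str, worker_id: str) -> None:
        self.tasks[task_id] = worker_id
        self.workers.setdefault(worker_id, set()).add(task_id)

    def worker_for(self, task_id: str) -> Optional[str]:
        # Average-case constant-time lookup, no scan over all tasks.
        return self.tasks.get(task_id)

    def complete(self, task_id: str) -> Optional[str]:
        """Remove a finished task from both indexes and return its worker."""
        worker_id = self.tasks.pop(task_id, None)
        if worker_id is not None:
            self.workers[worker_id].discard(task_id)
        return worker_id


index = GatewayIndex()
index.assign("task-1", "luna-a")
index.assign("task-2", "luna-a")
finished_on = index.complete("task-1")
```

Keeping both directions indexed means routing a result to its task and listing a worker's in-flight tasks are each a single lookup.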
Extension System
Plugins can register custom tools, commands, modes, and event handlers:
- 15 bridge modules (tools, commands, sessions, providers, workflows, agents, memory, scheduler, etc.)
- Extension lifecycle (init/register/start/stop) with context invalidation
- Plugin tool dispatch via ZMQ RPC (server-side Luerl execution)
- Permission priority chain: user config > extension rules > mode defaults
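The permission priority chain can be sketched as a first-match lookup across three layers. The function name `resolve_permission`, the rule values ("allow", "deny", "ask"), and the fallback are assumptions for illustration; only the ordering comes from the list above.

```python
from typing import Dict


def resolve_permission(
    tool: str,
    user_config: Dict[str, str],
    extension_rules: Dict[str, str],
    mode_defaults: Dict[str, str],
    fallback: str = "deny",
) -> str:
    """Return the first rule found, checking the highest-priority layer first."""
    for layer in (user_config, extension_rules, mode_defaults):
        if tool in layer:
            return layer[tool]
    return fallback


# Hypothetical rule sets.
user_config = {"shell": "deny"}
extension_rules = {"shell": "allow", "deploy": "ask"}
mode_defaults = {"shell": "allow", "deploy": "allow", "read_file": "allow"}

shell = resolve_permission("shell", user_config, extension_rules, mode_defaults)
deploy = resolve_permission("deploy", user_config, extension_rules, mode_defaults)
read_file = resolve_permission("read_file", user_config, extension_rules, mode_defaults)
unknown = resolve_permission("unknown_tool", user_config, extension_rules, mode_defaults)
```

In this example the user's "deny" for `shell` wins over the extension's "allow", and `deploy` takes the extension rule because the user has not configured it.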
Background Subagents
Fire-and-forget subagents with result aggregation:
- 5 backends (local/zmq/luerl/team/background) with tool restriction
- Background spawn with token budget propagation
- Result aggregation (collect_results, aggregate_results)
- Permission gating on the background_spawn tool
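A rough sketch of background spawning with budget propagation and result aggregation follows. The names `collect_results` and `aggregate_results` come from the list above, but their signatures here are assumptions. The even split of the parent budget and the stand-in `run_subagent` are also assumptions.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubagentResult:
    task: str
    output: str
    tokens_used: int


def run_subagent(task: str, token_budget: int) -> SubagentResult:
    """Stand-in for a real subagent; never reports more tokens than its budget."""
    used = min(token_budget, len(task) * 10)
    return SubagentResult(task, f"done: {task}", used)


def spawn_background(tasks: List[str], parent_budget: int,
                     pool: ThreadPoolExecutor) -> List[Future]:
    # Assumption: the parent's token budget is split evenly across children.
    share = parent_budget // len(tasks)
    return [pool.submit(run_subagent, task, share) for task in tasks]


def collect_results(futures: List[Future]) -> List[SubagentResult]:
    """Block until every background subagent has finished."""
    return [future.result() for future in futures]


def aggregate_results(results: List[SubagentResult]) -> Dict[str, object]:
    return {
        "outputs": {r.task: r.output for r in results},
        "tokens_used": sum(r.tokens_used for r in results),
    }


with ThreadPoolExecutor(max_workers=2) as pool:
    futures = spawn_background(["lint", "test"], parent_budget=100, pool=pool)
    summary = aggregate_results(collect_results(futures))
```

The caller is free to continue other work between `spawn_background` and `collect_results`, which is what makes the spawn fire-and-forget from the parent's point of view.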
Chat Interface
Bidirectional chat via WebSocket with SSE fallback:
- Cookie-based authentication (zerg_session, HttpOnly; Secure; SameSite=Strict)
- Server-side message history in Mnesia (30-day retention, hourly cleanup)
- Streaming token delivery with interrupt support
- Limon chat UI with markdown rendering, syntax highlighting, auto-reconnect
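The cookie attributes listed above correspond to a `Set-Cookie` header value like the one built below. This uses Python's standard `http.cookies` module purely for illustration; the helper name `session_cookie_header` and the session value are hypothetical, and the server itself is Erlang.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie value with the attributes named in the list above."""
    cookie = SimpleCookie()
    cookie["zerg_session"] = session_id
    cookie["zerg_session"]["httponly"] = True      # not readable from JavaScript
    cookie["zerg_session"]["secure"] = True        # sent over HTTPS only
    cookie["zerg_session"]["samesite"] = "Strict"  # requires Python 3.8+
    return cookie["zerg_session"].OutputString()


header_value = session_cookie_header("abc123")
```

`HttpOnly` keeps the token away from page scripts, `Secure` restricts it to encrypted connections, and `SameSite=Strict` stops the browser from attaching it to cross-site requests.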
Memory Pipeline
Cross-session persistent memory with extraction and consolidation:
- Heuristic memory extractor (fact/preference extraction from conversations)
- Algorithmic consolidator (dedup + prune with 80% similarity threshold)
- Session close hook triggers memory extraction
- KV+TTL storage with provider abstraction
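The deduplication step of the consolidator can be sketched as below. The similarity metric used by ZERG is not specified, so `difflib.SequenceMatcher` is swapped in as a stand-in. Treating "80%" as an inclusive cutoff and keeping the first of two similar entries are also assumptions.

```python
from difflib import SequenceMatcher
from typing import List

SIMILARITY_THRESHOLD = 0.80  # from the 80% figure above; inclusive cutoff assumed


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]; a stand-in for the real metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def consolidate(memories: List[str],
                threshold: float = SIMILARITY_THRESHOLD) -> List[str]:
    """Keep a memory only if it is not too similar to one already kept."""
    kept: List[str] = []
    for memory in memories:
        if all(similarity(memory, existing) < threshold for existing in kept):
            kept.append(memory)
    return kept


memories = [
    "User prefers dark mode",
    "user prefers dark mode.",
    "Project uses Erlang/OTP",
]
deduped = consolidate(memories)
```

The second entry differs from the first only in case and a trailing period, so it is dropped, while the unrelated third entry is kept.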
Key Design Decisions
- Erlang/OTP for massive concurrency and fault tolerance
- LuaJIT for ultra-lightweight worker nodes
- ZMQ for the polyglot worker boundary
- Event sourcing as the orchestration backbone
- llama.cpp for local-first inference