Chat API
ZERG provides two chat transports: WebSocket (bidirectional, recommended for UI) and SSE (unidirectional, for simple integrations).
WebSocket Transport
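This is the primary chat interface. It authenticates via the `zerg_session` cookie (set on Mango login) or an `Authorization` header. Browser clients rely on the cookie, because the browser WebSocket API cannot set custom request headers.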
Connect
`ws://host:11434/api/v1/chat/:session_id/ws`
The `session_id` is the `id` returned by `POST /api/v1/chat` (see Create Session below).
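A small helper for building the connection URL. This is a sketch that assumes the plain `ws://` scheme and default port 11434 shown above; `chatSocketUrl` is a hypothetical name, not part of ZERG.

```javascript
// Hypothetical helper: builds the chat WebSocket URL for one session.
// Assumes the ws:// scheme and port 11434 documented above.
function chatSocketUrl(host, sessionId, port = 11434) {
  if (!sessionId) {
    throw new Error('session_id is required (use the id from POST /api/v1/chat)')
  }
  // Encode the id so unexpected characters cannot alter the path.
  return `ws://${host}:${port}/api/v1/chat/${encodeURIComponent(sessionId)}/ws`
}

// chatSocketUrl('127.0.0.1', 'a1b2c3d4e5f6')
// → 'ws://127.0.0.1:11434/api/v1/chat/a1b2c3d4e5f6/ws'
```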
Client → Server Messages
Send a message
```json
{ "type": "message", "content": "Explain quicksort" }
```
Interrupt streaming
```json
{ "type": "interrupt" }
```
Request message history
```json
{ "type": "history" }
```
Heartbeat
```json
{ "type": "ping" }
```
Server → Client Messages
| Type | Fields | Description |
|---|---|---|
| `connected` | `session_id` | Connection confirmed |
| `history` | `messages[]` | Last 50 messages (on connect/reconnect) |
| `token` | `data` | Streaming token fragment |
| `done` | `usage` | Generation complete, with token usage |
| `error` | `error` | Error message |
| `interrupted` | (none) | Generation was interrupted |
| `pong` | `ts` | Heartbeat response |
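The two message tables above can be captured in a small client-side codec: one builder per client frame, and a dispatcher that hands each server frame's payload field to a handler. This is a sketch with hypothetical names (`clientFrames`, `dispatchServerFrame`). Types missing from the table go to an optional `unknown` handler instead of being rejected, because the auth refresh challenge described below has no entry in the table and its wire format is not documented here.

```javascript
// Client → server frames, one builder per message type above.
const clientFrames = {
  message: (content) => JSON.stringify({ type: 'message', content }),
  interrupt: () => JSON.stringify({ type: 'interrupt' }),
  history: () => JSON.stringify({ type: 'history' }),
  ping: () => JSON.stringify({ type: 'ping' })
}

// Server → client frames: the payload field each type carries,
// taken from the table. `interrupted` carries no payload.
const SERVER_FIELDS = {
  connected: 'session_id',
  history: 'messages',
  token: 'data',
  done: 'usage',
  error: 'error',
  interrupted: null,
  pong: 'ts'
}

// Parses one server frame and calls handlers[type](payload).
// Types missing from the table go to handlers.unknown(msg), if given.
function dispatchServerFrame(raw, handlers) {
  const msg = JSON.parse(raw)
  if (!Object.prototype.hasOwnProperty.call(SERVER_FIELDS, msg.type)) {
    if (handlers.unknown) handlers.unknown(msg)
    return msg.type
  }
  const field = SERVER_FIELDS[msg.type]
  const handler = handlers[msg.type]
  if (handler) handler(field === null ? undefined : msg[field])
  return msg.type
}
```

With this in place, the `switch` in the example below could be replaced by `dispatchServerFrame(event.data, { token: appendText, done: finishGeneration, error: showError })`.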
Example
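```javascript
// Assumes `host` and `sessionId` are defined, and that appendText,
// finishGeneration and showError are your own UI functions.
const ws = new WebSocket(
  `ws://${host}:11434/api/v1/chat/${sessionId}/ws`
)

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data)
  switch (msg.type) {
    case 'token':
      appendText(msg.data)
      break
    case 'done':
      finishGeneration(msg.usage)
      break
    case 'error':
      showError(msg.error)
      break
  }
}

// send() throws while the socket is still CONNECTING, so wait for open.
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'message',
    content: 'Write a Fibonacci function'
  }))
}
```
Auth Refresh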
The server sends an auth refresh challenge every 5 minutes. The client must re-authenticate via the same cookie/header mechanism. If refresh fails, the connection is closed with code 4001.
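Only close code 4001 is specified here. A client can treat it differently from ordinary drops, since reconnecting with credentials that just failed to refresh would fail again. This is a sketch: the function name, the handling of 1000 (normal closure, per RFC 6455), and the backoff parameters are assumptions.

```javascript
// Hypothetical close-code policy for the chat socket.
// 4001 is the documented "auth refresh failed" code; the other
// branches are assumptions about sensible client behaviour.
const AUTH_REFRESH_FAILED = 4001

function onChatClose(code, attempt) {
  if (code === AUTH_REFRESH_FAILED) {
    // Reconnecting with the same stale credentials would fail again:
    // send the user back through Mango login, which sets zerg_session.
    return { action: 'reauthenticate' }
  }
  if (code === 1000) {
    // Normal closure (RFC 6455): the session ended on purpose.
    return { action: 'stop' }
  }
  // Anything else: retry with capped exponential backoff.
  const delayMs = Math.min(30000, 500 * 2 ** attempt)
  return { action: 'reconnect', delayMs }
}

// Usage: ws.onclose = (event) => handleClose(onChatClose(event.code, attempts++))
// where handleClose is your own (hypothetical) reaction to each action.
```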
Heartbeat
The server sends a ping every 30 seconds, and the client should respond with a pong. Three consecutive missed responses close the connection.
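The liveness rule can be tracked with a small counter that advances once per 30-second interval and resets on any reply. This is a sketch with hypothetical names; the counter works the same whichever side sends the probe, and the usage comment follows the client-initiated `ping`/`pong` pair from the message tables.

```javascript
// Hypothetical liveness tracker for the 30 s heartbeat.
// Call tick() once per interval; call responded() when the reply arrives.
class HeartbeatMonitor {
  constructor(maxMissed = 3) {
    this.maxMissed = maxMissed
    this.missed = 0
    this.awaiting = false
  }

  // If the previous probe was never answered, count it as a miss,
  // then start waiting on a new one. Returns true once the link is dead.
  tick() {
    if (this.awaiting) this.missed += 1
    this.awaiting = true
    return this.isDead()
  }

  // A reply clears the outstanding probe and resets the miss streak.
  responded() {
    this.awaiting = false
    this.missed = 0
  }

  isDead() {
    return this.missed >= this.maxMissed
  }
}

// Usage sketch:
// const monitor = new HeartbeatMonitor()
// setInterval(() => {
//   if (monitor.tick()) return ws.close()
//   ws.send(JSON.stringify({ type: 'ping' }))
// }, 30000)
// ...and in onmessage: case 'pong': monitor.responded()
```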
SSE Transport
A fallback transport for simple integrations. It is unidirectional: the client sends messages via REST and receives tokens via SSE.
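The browser `EventSource` API cannot attach an `Authorization` header, so a client that authenticates with a bearer token (as in the Stream Tokens curl example) typically reads the stream with `fetch` and parses it itself. A minimal parser for the `text/event-stream` format follows. It is a sketch: `SseParser` is a hypothetical name, and it assumes the server names its events (`connected`, `token`, `done`, `error`) with the standard `event:` field, which this page does not confirm.

```javascript
// Hypothetical minimal parser for the text/event-stream format.
// Feed it decoded text chunks as they arrive; it calls onEvent with
// { event, data } whenever a blank line ends an event.
// Handles \n and \r\n line endings (bare \r is not supported here).
class SseParser {
  constructor(onEvent) {
    this.onEvent = onEvent
    this.buffer = ''
    this.event = ''
    this.data = []
  }

  push(chunk) {
    this.buffer += chunk
    const lines = this.buffer.split('\n')
    // The last piece may be an incomplete line; keep it for the next chunk.
    this.buffer = lines.pop()
    for (const line of lines) {
      this.processLine(line.endsWith('\r') ? line.slice(0, -1) : line)
    }
  }

  processLine(line) {
    if (line === '') {
      // Blank line: dispatch the pending event if it carried any data.
      if (this.data.length > 0) {
        this.onEvent({ event: this.event || 'message', data: this.data.join('\n') })
      }
      this.event = ''
      this.data = []
      return
    }
    if (line.startsWith(':')) return // comment line, often a keep-alive
    const colon = line.indexOf(':')
    const field = colon === -1 ? line : line.slice(0, colon)
    let value = colon === -1 ? '' : line.slice(colon + 1)
    if (value.startsWith(' ')) value = value.slice(1)
    if (field === 'event') this.event = value
    else if (field === 'data') this.data.push(value)
    // id and retry fields are ignored in this sketch.
  }
}

// Usage sketch (Node 18+, where fetch response bodies are async iterable):
// const res = await fetch(`http://127.0.0.1:11434/api/v1/chat/${id}/stream`,
//   { headers: { Authorization: `Bearer ${token}` } })
// const parser = new SseParser(({ event, data }) => console.log(event, data))
// const decoder = new TextDecoder()
// for await (const chunk of res.body) parser.push(decoder.decode(chunk, { stream: true }))
```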
Create Session
`POST /api/v1/chat`
```json
{ "model": "glm-4.7-flash", "mode": "build" }
```
Response:
```json
{
  "ok": true,
  "data": {
    "id": "a1b2c3d4e5f6",
    "worker_id": "chat-a1b2c3d4e5f6",
    "model": "glm-4.7-flash",
    "mode": "build",
    "status": "active"
  }
}
```
Stream Tokens
`GET /api/v1/chat/:id/stream`
Returns an SSE stream. Events: `connected`, `token`, `done`, `error`.
```shell
curl -N http://127.0.0.1:11434/api/v1/chat/a1b2c3d4e5f6/stream \
  -H "Authorization: Bearer $TOKEN"
```
Send Message
`POST /api/v1/chat/:id/message`
```json
{ "content": "Explain quicksort" }
```
End Session
`DELETE /api/v1/chat/:id`
Session Management
List Sessions
`GET /api/v1/chat`
Returns active sessions for the authenticated user. Admin users see all sessions.
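The REST calls documented so far (create, send, list) can share one small wrapper that adds the bearer token and unwraps the `{ ok, data }` envelope. This is a sketch that assumes every endpoint replies with that envelope (only the create and history responses are shown on this page) and that a failed call sets `ok` to `false`. The function names are hypothetical, and `fetchImpl` is injectable so the helpers can be tested without a server.

```javascript
// Hypothetical REST wrapper. Assumes every endpoint answers with the
// { ok, data } envelope shown on this page and that failures set ok to
// false; the failure body is attached to the error unchanged.
async function zergRequest(fetchImpl, baseUrl, token, method, path, body) {
  const headers = { Authorization: `Bearer ${token}` }
  if (body !== undefined) headers['Content-Type'] = 'application/json'
  const res = await fetchImpl(baseUrl + path, {
    method,
    headers,
    body: body === undefined ? undefined : JSON.stringify(body)
  })
  const json = await res.json()
  if (!json.ok) {
    const err = new Error(`${method} ${path} failed (HTTP ${res.status})`)
    err.body = json
    throw err
  }
  return json.data
}

const chatPath = (id) => `/api/v1/chat/${encodeURIComponent(id)}`

// Create Session, Send Message and List Sessions, as documented above.
const createSession = (f, base, token, model, mode) =>
  zergRequest(f, base, token, 'POST', '/api/v1/chat', { model, mode })
const sendMessage = (f, base, token, id, content) =>
  zergRequest(f, base, token, 'POST', `${chatPath(id)}/message`, { content })
const listSessions = (f, base, token) =>
  zergRequest(f, base, token, 'GET', '/api/v1/chat')
```

For example, `createSession(fetch, 'http://127.0.0.1:11434', token, 'glm-4.7-flash', 'build')` should resolve to the `data` object shown under Create Session, whose `id` is the `session_id` for the WebSocket URL.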
Get Session Details
`GET /api/v1/chat/:id`
Message History
`GET /api/v1/chat/:id/history?limit=50`
Messages are stored in Mnesia for 30 days. Pagination is controlled by the `limit` query parameter.
Response:
```json
{
  "ok": true,
  "data": [
    {
      "id": "msg-abc123",
      "session_id": "a1b2c3d4e5f6",
      "role": "user",
      "content": "Hello",
      "timestamp": 1745600000
    }
  ]
}
```
Permissions
Chat endpoints require chat:read permission. Sessions are scoped to the authenticated user — non-admin users can only access their own sessions.
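The access rule can be stated as a small predicate, which also explains the List Sessions behaviour (admins see everything, others see their own). This is a sketch: the user and session field names (`permissions`, `isAdmin`, `owner`) are assumptions, not ZERG's actual data model.

```javascript
// Hypothetical access rule mirroring the text above: chat:read is
// required, admins may open any session, everyone else only their own.
// Field names (permissions, isAdmin, owner) are assumed for illustration.
function canAccessSession(user, session) {
  if (!user.permissions.includes('chat:read')) return false
  if (user.isAdmin) return true
  return session.owner === user.id
}

// The same rule applied to GET /api/v1/chat (list sessions).
function visibleSessions(user, sessions) {
  return sessions.filter((s) => canAccessSession(user, s))
}
```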
Architecture: Luna Harness, Not a Thin Proxy
The web chat does NOT talk directly to LLM providers. Every message flows through the full Luna agent harness — the same runtime used by the TUI and CLI. This means web chat users get the complete agent experience: tool execution, session persistence, memory, MCP servers, compaction, and multi-provider failover.
Limon is a thin UI shell. Sol is the message bus. Luna is the brain.
Data Flow
```text
┌──────────┐       ┌──────────┐       ┌──────────┐       ┌──────────┐
│  Limon   │       │   Sol    │       │   ZMQ    │       │   Luna   │
│ (Browser)│       │ (Erlang) │       │ (ROUTER) │       │ (LuaJIT) │
└────┬─────┘       └────┬─────┘       └────┬─────┘       └────┬─────┘
     │                  │                  │                  │
     │ WS connect       │                  │                  │
     │─────────────────►│ spawn worker     │                  │
     │                  │─────────────────►│ DEALER connect   │
     │                  │                  │─────────────────►│
     │                  │                  │                  │
     │ {type:message}   │                  │                  │
     │─────────────────►│ task msgpack     │                  │
     │                  │─────────────────►│ JSONL stdin      │
     │                  │                  │─────────────────►│
     │                  │                  │                  │
     │                  │                  │                  │ ┌───────────────┐
     │                  │                  │                  │ │ Agent Loop    │
     │                  │                  │                  │ │ ┌───────────┐ │
     │                  │                  │                  │ │ │Provider   │ │
     │                  │                  │                  │ │ │(Anthropic │ │
     │                  │                  │                  │ │ │OpenAI     │ │
     │                  │                  │                  │ │ │z.ai)      │ │
     │                  │                  │                  │ │ └───────────┘ │
     │                  │                  │                  │ │ ┌───────────┐ │
     │                  │                  │                  │ │ │Tools      │ │
     │                  │                  │                  │ │ │(45+MCP)   │ │
     │                  │                  │                  │ │ └───────────┘ │
     │                  │                  │                  │ │ ┌───────────┐ │
     │                  │                  │                  │ │ │Session    │ │
     │                  │                  │                  │ │ │(JSONL+    │ │
     │                  │                  │                  │ │ │ SQLite)   │ │
     │                  │                  │                  │ │ └───────────┘ │
     │                  │                  │                  │ │ ┌───────────┐ │
     │                  │                  │                  │ │ │Memory     │ │
     │                  │                  │                  │ │ │(KV+TTL)   │ │
     │                  │                  │                  │ │ └───────────┘ │
     │                  │                  │                  │ └───────────────┘
     │                  │                  │                  │
     │                  │                  │ token (JSONL)    │
     │                  │ {sol_token}      │◄─────────────────│
     │ {type:token}     │◄─────────────────│                  │
     │◄─────────────────│                  │                  │
     │                  │                  │ ...more tokens   │
     │                  │                  │                  │
     │                  │                  │ done (JSONL)     │
     │                  │ {sol_done}       │◄─────────────────│
     │ {type:done}      │◄─────────────────│                  │
     │◄─────────────────│                  │                  │
```
What Luna Provides in Web Chat Mode
Every web chat session runs a complete Luna agent. When Limon sends a message, the Luna worker:
- Appends to session — JSONL + SQLite dual-write, same as TUI/CLI
- Builds context — system prompt, session history, memory injection, context file scanning
- Calls provider — multi-provider failover (Anthropic → OpenAI → z.ai → Ollama)
- Streams tokens — each token fragment sent back to Limon via ZMQ → Sol → WS
- Executes tools — 45 built-in + MCP tools, parallel execution for async tools
- Manages memory — cross-session KV+TTL, heuristic extraction on session close
- Compacts when needed — multi-strategy compaction at 70% context threshold
- Persists checkpoints — automatic checkpoint before risky operations
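The provider failover step can be pictured as an ordered loop where the first successful reply wins. Luna is LuaJIT, so this JavaScript illustrates the technique and is not Luna's code. The provider objects and result shape are hypothetical, and each call is treated as all-or-nothing: failing over partway through a streamed reply is not modeled.

```javascript
// Illustrative failover loop (not Luna's implementation).
// providers: ordered list of { name, call }, where call is a
// hypothetical async function that resolves with a reply or throws.
async function completeWithFailover(providers, request) {
  const failures = []
  for (const { name, call } of providers) {
    try {
      const reply = await call(request)
      return { provider: name, reply, failures }
    } catch (err) {
      // Record why this provider failed, then fall through to the next.
      failures.push({ provider: name, error: err instanceof Error ? err.message : String(err) })
    }
  }
  const err = new Error('all providers failed')
  err.failures = failures
  throw err
}

// Order from the list above (Anthropic, OpenAI, z.ai, Ollama):
// const providers = [
//   { name: 'anthropic', call: callAnthropic },
//   { name: 'openai', call: callOpenAI },
//   { name: 'zai', call: callZai },
//   { name: 'ollama', call: callOllama }
// ]
```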
What Sol Does NOT Do
- Sol does not call LLM providers directly for chat
- Sol does not execute tools or manage conversation state
- Sol does not handle sessions or memory
- Sol is a message bus: it receives WS messages, dispatches to Luna via ZMQ, and relays streaming tokens back
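Sol's relay role can be sketched as a stateless translation from Luna's output to the WebSocket frames in the server → client table. Sol is Erlang and the Luna-side line format is not documented on this page, so both the JavaScript and the assumed `{"type": ...}` JSONL shape are illustrative only; `relayLunaLine` is a hypothetical name.

```javascript
// Illustrative relay step (not Sol's code). Assumes each JSONL line from
// Luna looks like {"type":"token","data":"..."} or
// {"type":"done","usage":{...}}; the real format is not documented here.
function relayLunaLine(line) {
  const evt = JSON.parse(line)
  switch (evt.type) {
    case 'token':
      return { type: 'token', data: evt.data }
    case 'done':
      return { type: 'done', usage: evt.usage }
    case 'error':
      return { type: 'error', error: evt.error }
    default:
      // Sol holds no conversation state, so it drops anything it does
      // not recognise instead of interpreting it.
      return null
  }
}

// The resulting frame is serialised and written to the client socket:
// const frame = relayLunaLine(line)
// if (frame) ws.send(JSON.stringify(frame))
```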
Why This Matters
- Consistency: Same agent behavior whether using TUI, CLI, or web chat
- Full tool access: Web chat users can run bash, edit files, search code — not just talk
- Session portability: Start in web, resume in TUI with `--resume` on the same JSONL session
- Provider independence: Swap providers without touching Limon or Sol