
Chat API

ZERG provides two chat transports: WebSocket (bidirectional, recommended for UI) and SSE (unidirectional, for simple integrations).

WebSocket Transport

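The primary chat interface. It authenticates via the zerg_session cookie (set on Mango login) or an Authorization header. Browsers cannot attach custom headers to a WebSocket handshake, so browser clients rely on the cookie, and the header option is useful for non-browser clients.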

Connect

ws://host:11434/api/v1/chat/:session_id/ws

The session_id is the id field returned by POST /api/v1/chat (see Create Session below).

Client → Server Messages

Send a message

```json
{ "type": "message", "content": "Explain quicksort" }
```

Interrupt streaming

```json
{ "type": "interrupt" }
```

Request message history

```json
{ "type": "history" }
```

Heartbeat

```json
{ "type": "ping" }
```
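The four client frames above can be produced by one small builder, which keeps the JSON shapes in a single place. This is a minimal sketch. The helper name clientFrame and the client-side rejection of empty prompts are assumptions, not part of the ZERG API.

```javascript
// Hypothetical helper that serializes the four documented client → server frames.
const CLIENT_TYPES = new Set(['message', 'interrupt', 'history', 'ping'])

function clientFrame(type, content) {
  if (!CLIENT_TYPES.has(type)) {
    throw new Error(`unknown client frame type: ${type}`)
  }
  if (type === 'message') {
    // Only "message" carries a payload; rejecting empty prompts is our own choice.
    if (typeof content !== 'string' || content.length === 0) {
      throw new Error('message frames need non-empty string content')
    }
    return JSON.stringify({ type, content })
  }
  // interrupt, history and ping are bare {"type": ...} objects.
  return JSON.stringify({ type })
}

// Usage with an open socket:
// ws.send(clientFrame('message', 'Explain quicksort'))
// ws.send(clientFrame('interrupt'))
```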

Server → Client Messages

| Type | Fields | Description |
|------|--------|-------------|
| connected | session_id | Connection confirmed |
| history | messages[] | Last 50 messages (on connect/reconnect) |
| token | data | Streaming token fragment |
| done | usage | Generation complete, with token usage |
| error | error | Error message |
| interrupted | (none) | Generation was interrupted |
| pong | ts | Heartbeat response |
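Server frames arrive one at a time, so a client can fold them into a single state object. This is a minimal sketch. It assumes a plain-object UI state of our own design, and initialState and reduceChat are hypothetical names. Frame field names follow the table above, but the contents of usage are not specified in this document.

```javascript
// Hypothetical UI state; the field names read from each frame follow the server → client table.
function initialState() {
  return { sessionId: null, history: [], draft: '', streaming: false, usage: null, error: null }
}

function reduceChat(state, msg) {
  switch (msg.type) {
    case 'connected':
      return { ...state, sessionId: msg.session_id }
    case 'history':
      // Sent on connect/reconnect: replace rather than append, to avoid duplicates.
      return { ...state, history: msg.messages }
    case 'token':
      // Each fragment extends the in-progress reply.
      return { ...state, draft: state.draft + msg.data, streaming: true, error: null }
    case 'done':
      return { ...state, streaming: false, usage: msg.usage }
    case 'interrupted':
      return { ...state, streaming: false }
    case 'error':
      return { ...state, streaming: false, error: msg.error }
    default:
      // pong and any unknown frame types leave the visible state unchanged.
      return state
  }
}
```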

Example

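```javascript
// host and sessionId are placeholders; sessionId is the "id" returned by POST /api/v1/chat
const host = '127.0.0.1'
const sessionId = 'a1b2c3d4e5f6'

const ws = new WebSocket(
  `ws://${host}:11434/api/v1/chat/${sessionId}/ws`
)

// appendText, finishGeneration and showError stand in for your own UI code
ws.onmessage = (event) => {
  const msg = JSON.parse(event.data)
  switch (msg.type) {
    case 'token':
      appendText(msg.data)
      break
    case 'done':
      finishGeneration(msg.usage)
      break
    case 'error':
      showError(msg.error)
      break
  }
}

// send() throws InvalidStateError while the socket is still connecting, so wait for "open"
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'message',
    content: 'Write a Fibonacci function'
  }))
}
```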

Auth Refresh

The server sends an auth refresh challenge every 5 minutes. The client must re-authenticate via the same cookie/header mechanism. If refresh fails, the connection is closed with code 4001.
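Close code 4001 is worth handling separately from ordinary disconnects, because reconnecting with the same expired credentials would presumably fail again. The following is a sketch of a close-handler policy. The function name, the capped exponential backoff (1 s, doubling up to 30 s), and stopping on a normal closure (1000, the standard WebSocket close code) are assumptions, not documented ZERG behavior.

```javascript
// Only 4001 (auth refresh failed) comes from the ZERG docs; the rest is an illustrative policy.
const AUTH_REFRESH_FAILED = 4001
const NORMAL_CLOSURE = 1000 // standard WebSocket close code (RFC 6455)

function onCloseAction(code, attempt) {
  if (code === AUTH_REFRESH_FAILED) {
    // Credentials are no longer valid: send the user back through login first.
    return { action: 'reauth' }
  }
  if (code === NORMAL_CLOSURE) {
    return { action: 'stop' }
  }
  // Anything else (network drop, server restart): retry with capped exponential backoff.
  const delayMs = Math.min(30000, 1000 * 2 ** attempt)
  return { action: 'reconnect', delayMs }
}

// Usage (handle is a hypothetical dispatcher in your app):
// ws.onclose = (ev) => handle(onCloseAction(ev.code, attempts++))
```

After a successful reconnect, the server's history frame (the last 50 messages) can be used to repaint the transcript.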

Heartbeat

The server sends a ping every 30 seconds, and the client should respond with a pong. After 3 consecutive missed responses, the connection is closed.
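The documented rule (one ping per 30-second interval, close after 3 consecutive misses) can be modeled as a small counter. This is a sketch. HeartbeatMonitor is a hypothetical class, and the same logic works on whichever side sends the pings. For example, a client could drive its own watchdog with the { "type": "ping" } frame.

```javascript
// Hypothetical counter for "close after N consecutive missed replies" (N = 3 per the docs).
class HeartbeatMonitor {
  constructor(maxMissed = 3) {
    this.maxMissed = maxMissed
    this.missed = 0
    this.awaitingReply = false
  }

  // Call once per interval. Returns 'close' once too many replies were missed, else 'ping'.
  tick() {
    if (this.awaitingReply) {
      this.missed += 1 // the previous ping went unanswered
    }
    if (this.missed >= this.maxMissed) {
      return 'close'
    }
    this.awaitingReply = true
    return 'ping'
  }

  // Call whenever the peer answers; a single reply resets the streak.
  replyReceived() {
    this.awaitingReply = false
    this.missed = 0
  }
}
```

For example, a client could call hb.tick() from setInterval every 30000 ms and send a ping frame when it returns 'ping'. When it returns 'close', the client calls ws.close(), and it calls hb.replyReceived() on each pong.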

SSE Transport

A fallback transport for simple integrations. It is unidirectional: the client sends messages via REST and receives tokens via SSE.

Create Session

POST /api/v1/chat

Request body:

```json
{ "model": "glm-4.7-flash", "mode": "build" }
```

Response:

```json
{
  "ok": true,
  "data": {
    "id": "a1b2c3d4e5f6",
    "worker_id": "chat-a1b2c3d4e5f6",
    "model": "glm-4.7-flash",
    "mode": "build",
    "status": "active"
  }
}
```
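Creating a session is a plain JSON POST. The id in the response is the value used as :id in the REST endpoints and as :session_id in the WebSocket URL. This is a sketch. It assumes Bearer-token auth (as in the curl example below) and a global fetch (browsers, Node 18+). createSession is a hypothetical helper, and fetchImpl is injectable so the helper can be exercised without a server.

```javascript
// Hypothetical helper: create a chat session and return its id.
async function createSession(baseUrl, token, { model, mode }, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/v1/chat`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ model, mode }),
  })
  const body = await res.json()
  if (!body.ok) {
    // The error envelope is not documented here; surface whatever came back.
    throw new Error(`create session failed: ${JSON.stringify(body)}`)
  }
  // data.id is used as :id in REST paths and :session_id in the WebSocket URL.
  return body.data.id
}
```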

Stream Tokens

GET /api/v1/chat/:id/stream

Returns an SSE stream. Events: connected, token, done, error.

```bash
curl -N http://127.0.0.1:11434/api/v1/chat/a1b2c3d4e5f6/stream \
  -H "Authorization: Bearer $TOKEN"
```
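In a browser, the built-in EventSource cannot send an Authorization header. A client that authenticates with a Bearer token (as the curl call does) may therefore need to read the stream with fetch and parse the text/event-stream format itself. Below is a minimal parser sketch following the WHATWG server-sent events format, with LF and CRLF line endings only: event: and data: fields, comment lines starting with ":", and a blank line that ends each event. It does not interpret the data payload, whose shape is not documented here. It also ignores the id and retry fields.

```javascript
// Minimal text/event-stream parser: feed it decoded text chunks and it returns
// every complete event as { event, data }. Partial lines are buffered across calls.
function createSseParser() {
  let buffer = ''
  let eventType = ''
  let dataLines = []

  return function feed(chunk) {
    buffer += chunk
    const lines = buffer.split(/\r?\n/)
    buffer = lines.pop() // incomplete trailing line ('' if the chunk ended in a newline)
    const events = []
    for (const line of lines) {
      if (line === '') {
        // A blank line ends the event; events with no data lines are dropped.
        if (dataLines.length > 0) {
          events.push({ event: eventType || 'message', data: dataLines.join('\n') })
        }
        eventType = ''
        dataLines = []
      } else if (!line.startsWith(':')) { // lines starting with ':' are comments
        const colon = line.indexOf(':')
        const field = colon === -1 ? line : line.slice(0, colon)
        let value = colon === -1 ? '' : line.slice(colon + 1)
        if (value.startsWith(' ')) value = value.slice(1) // one optional leading space
        if (field === 'event') eventType = value
        else if (field === 'data') dataLines.push(value)
        // id and retry fields are ignored in this sketch
      }
    }
    return events
  }
}

// Usage (sketch): read the response body and feed decoded chunks in.
// const res = await fetch(`http://127.0.0.1:11434/api/v1/chat/${id}/stream`,
//   { headers: { Authorization: `Bearer ${token}` } })
// const feed = createSseParser()
// const decoder = new TextDecoder()
// const reader = res.body.getReader()
// for (;;) {
//   const { value, done } = await reader.read()
//   if (done) break
//   for (const ev of feed(decoder.decode(value, { stream: true }))) console.log(ev.event, ev.data)
// }
```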

Send Message

POST /api/v1/chat/:id/message

Request body:

```json
{ "content": "Explain quicksort" }
```

End Session

DELETE /api/v1/chat/:id
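The remaining SSE-transport calls are plain requests. This is a sketch. It assumes Bearer-token auth, JSON bodies, and a global fetch. sendMessage and endSession are hypothetical helpers with an injectable fetchImpl for testing. The reply itself arrives on the /stream connection, so it is probably safest to open the stream before sending. This document does not say whether tokens are buffered for a late subscriber.

```javascript
// Hypothetical helpers for POST /api/v1/chat/:id/message and DELETE /api/v1/chat/:id.
async function sendMessage(baseUrl, token, sessionId, content, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/v1/chat/${encodeURIComponent(sessionId)}/message`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ content }),
  })
  // Tokens for the reply arrive on the open /stream connection, not in this response.
  return res.ok
}

async function endSession(baseUrl, token, sessionId, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/v1/chat/${encodeURIComponent(sessionId)}`, {
    method: 'DELETE',
    headers: { Authorization: `Bearer ${token}` },
  })
  return res.ok
}
```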

Session Management

List Sessions

GET /api/v1/chat

Returns active sessions for the authenticated user. Admin users see all sessions.

Get Session Details

GET /api/v1/chat/:id

Message History

GET /api/v1/chat/:id/history?limit=50

Messages are stored in Mnesia for 30 days. The limit query parameter caps the number of messages returned.

Response:

```json
{
  "ok": true,
  "data": [
    {
      "id": "msg-abc123",
      "session_id": "a1b2c3d4e5f6",
      "role": "user",
      "content": "Hello",
      "timestamp": 1745600000
    }
  ]
}
```
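Only limit is documented, so a client mainly needs to build the request URL and unwrap the { ok, data } envelope. This is a sketch. historyUrl, unwrapHistory, and isExpired are hypothetical helpers. The sketch assumes timestamp is in Unix seconds, which fits the example value (1745600000 falls in April 2025).

```javascript
// Hypothetical helpers around GET /api/v1/chat/:id/history.
const RETENTION_SECONDS = 30 * 24 * 60 * 60 // documented 30-day retention in Mnesia

function historyUrl(baseUrl, sessionId, limit = 50) {
  // The leading slash makes the path absolute, replacing any path already on baseUrl.
  const url = new URL(`/api/v1/chat/${encodeURIComponent(sessionId)}/history`, baseUrl)
  url.searchParams.set('limit', String(limit))
  return url.toString()
}

function unwrapHistory(body) {
  if (!body.ok) throw new Error('history request failed')
  return body.data.map((m) => ({
    role: m.role,
    content: m.content,
    at: new Date(m.timestamp * 1000), // assumed Unix seconds → milliseconds
  }))
}

// True if a message is older than the retention window and may no longer be returned.
function isExpired(timestampSeconds, nowMs = Date.now()) {
  return nowMs / 1000 - timestampSeconds > RETENTION_SECONDS
}
```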

Permissions

Chat endpoints require chat:read permission. Sessions are scoped to the authenticated user — non-admin users can only access their own sessions.
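The rule above amounts to a permission check plus an ownership check. This is a sketch. The user and session shapes (permissions, isAdmin, id, ownerId) are hypothetical, not ZERG's actual data model.

```javascript
// Hypothetical model of the documented access rule.
function canAccessSession(user, session) {
  if (!user.permissions.includes('chat:read')) return false
  // Admins may see every session; everyone else only their own.
  return user.isAdmin === true || session.ownerId === user.id
}

// GET /api/v1/chat behaves like this filter over all active sessions.
function visibleSessions(user, sessions) {
  return sessions.filter((s) => canAccessSession(user, s))
}
```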

Architecture: Luna Harness, Not a Thin Proxy

The web chat does NOT talk directly to LLM providers. Every message flows through the full Luna agent harness — the same runtime used by the TUI and CLI. This means web chat users get the complete agent experience: tool execution, session persistence, memory, MCP servers, compaction, and multi-provider failover.

Limon is a thin UI shell. Sol is the message bus. Luna is the brain.

Data Flow

```text
┌──────────┐       ┌──────────┐       ┌──────────┐       ┌──────────┐
│  Limon   │       │   Sol    │       │   ZMQ    │       │   Luna   │
│ (Browser)│       │(Erlang)  │       │(ROUTER)  │       │(LuaJIT)  │
└────┬─────┘       └────┬─────┘       └────┬─────┘       └────┬─────┘
     │                  │                  │                  │
     │  WS connect      │                  │                  │
     │─────────────────►│  spawn worker    │                  │
     │                  │─────────────────►│  DEALER connect  │
     │                  │                  │─────────────────►│
     │                  │                  │                  │
     │  {type:message}  │                  │                  │
     │─────────────────►│  task msgpack    │                  │
     │                  │─────────────────►│  JSONL stdin     │
     │                  │                  │─────────────────►│
     │                  │                  │                  │
     │                  │                  │                  │  ┌──────────────┐
     │                  │                  │                  │  │ Agent Loop   │
     │                  │                  │                  │  │ ┌──────────┐ │
     │                  │                  │                  │  │ │Provider  │ │
     │                  │                  │                  │  │ │(Anthropic│ │
     │                  │                  │                  │  │ │OpenAI    │ │
     │                  │                  │                  │  │ │z.ai)     │ │
     │                  │                  │                  │  │ └──────────┘ │
     │                  │                  │                  │  │ ┌──────────┐ │
     │                  │                  │                  │  │ │Tools     │ │
     │                  │                  │                  │  │ │(45+MCP)  │ │
     │                  │                  │                  │  │ └──────────┘ │
     │                  │                  │                  │  │ ┌──────────┐ │
     │                  │                  │                  │  │ │Session   │ │
     │                  │                  │                  │  │ │(JSONL+   │ │
     │                  │                  │                  │  │ │SQLite)   │ │
     │                  │                  │                  │  │ └──────────┘ │
     │                  │                  │                  │  │ ┌──────────┐ │
     │                  │                  │                  │  │ │Memory    │ │
     │                  │                  │                  │  │ │(KV+TTL)  │ │
     │                  │                  │                  │  │ └──────────┘ │
     │                  │                  │                  │  └──────────────┘
     │                  │                  │                  │
     │                  │                  │  token (JSONL)   │
     │                  │  {sol_token}     │◄─────────────────│
     │  {type:token}    │◄─────────────────│                  │
     │◄─────────────────│                  │                  │
     │                  │                  │  ...more tokens  │
     │                  │                  │                  │
     │                  │                  │  done (JSONL)    │
     │                  │  {sol_done}      │◄─────────────────│
     │  {type:done}     │◄─────────────────│                  │
     │◄─────────────────│                  │                  │
```
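In this flow, Sol translates frames rather than reasoning over them: it turns each line Luna writes into the frame Limon expects. Sol is written in Erlang, so the following JavaScript only models that data transformation and collapses the intermediate msgpack/{sol_token} hop. It assumes a hypothetical Luna line schema ({ type, text | usage | message }), and the real JSONL fields may differ.

```javascript
// Split a JSONL byte stream into objects, buffering a partial trailing line between chunks.
function createJsonlDecoder() {
  let buffer = ''
  return function decode(chunk) {
    buffer += chunk
    const lines = buffer.split('\n')
    buffer = lines.pop() // keep the incomplete last line for the next chunk
    return lines.filter((l) => l.trim() !== '').map((l) => JSON.parse(l))
  }
}

// Map a (hypothetical) Luna event to the WebSocket frame from the server → client table.
function toWsFrame(event) {
  switch (event.type) {
    case 'token':
      return { type: 'token', data: event.text }
    case 'done':
      return { type: 'done', usage: event.usage }
    case 'error':
      return { type: 'error', error: event.message }
    default:
      return null // internal events the browser does not need to see
  }
}
```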

What Luna Provides in Web Chat Mode

Every web chat session runs a complete Luna agent. When Limon sends a message, the Luna worker:

  1. Appends to session — JSONL + SQLite dual-write, same as TUI/CLI
  2. Builds context — system prompt, session history, memory injection, context file scanning
  3. Calls provider — multi-provider failover (Anthropic → OpenAI → z.ai → Ollama)
  4. Streams tokens — each token fragment sent back to Limon via ZMQ → Sol → WS
  5. Executes tools — 45 built-in + MCP tools, parallel execution for async tools
  6. Manages memory — cross-session KV+TTL, heuristic extraction on session close
  7. Compacts when needed — multi-strategy compaction at 70% context threshold
  8. Persists checkpoints — automatic checkpoint before risky operations
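Two of these steps reduce to simple rules: try providers in order until one succeeds (step 3), and compact once the context reaches the 70% threshold (step 7). This is a sketch. It assumes provider objects with a name and an async complete(request) method (hypothetical), and it reads "at 70%" as "at or above 70%". Luna itself runs on LuaJIT, so this JavaScript models only the control flow.

```javascript
// Hypothetical model of ordered provider failover and the 70% compaction trigger.
const COMPACTION_THRESHOLD = 0.7

function needsCompaction(usedTokens, contextWindow) {
  return usedTokens / contextWindow >= COMPACTION_THRESHOLD
}

async function completeWithFailover(providers, request) {
  const errors = []
  for (const provider of providers) {
    try {
      // The first provider that succeeds wins; later ones are never called.
      return { provider: provider.name, result: await provider.complete(request) }
    } catch (err) {
      errors.push(`${provider.name}: ${err.message}`)
    }
  }
  throw new Error(`all providers failed: ${errors.join('; ')}`)
}
```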

What Sol Does NOT Do

  • Sol does not call LLM providers directly for chat
  • Sol does not execute tools or manage conversation state
  • Sol does not handle sessions or memory
  • Sol is a message bus: it receives WS messages, dispatches to Luna via ZMQ, and relays streaming tokens back

Why This Matters

  • Consistency: Same agent behavior whether using TUI, CLI, or web chat
  • Full tool access: Web chat users can run bash, edit files, search code — not just talk
  • Session portability: Start in web, resume in TUI with --resume — same JSONL session
  • Provider independence: Swap providers without touching Limon or Sol

Released under the MIT License.