Chat API

ZERG provides two chat transports: WebSocket (bidirectional, recommended for UI) and SSE (unidirectional, for simple integrations).

WebSocket Transport

The primary chat interface. Connections authenticate with the zerg_session cookie (set on Mango login) or an Authorization header.

Connect

ws://host:11434/api/v1/chat/:session_id/ws

The session_id is the id returned by POST /api/v1/chat (see Create Session under SSE Transport).
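The following is a minimal sketch of building the connection URL from that id. It assumes the default port 11434 shown above. `chatSocketUrl` is a hypothetical helper and is not part of ZERG.

```javascript
// Hypothetical helper: build the chat WebSocket URL for a session.
// Assumes the documented default port 11434.
function chatSocketUrl(host, sessionId, port = 11434) {
  if (!sessionId) {
    throw new Error('sessionId is required (the id from POST /api/v1/chat)')
  }
  // Encode the id so unexpected characters cannot alter the path.
  return `ws://${host}:${port}/api/v1/chat/${encodeURIComponent(sessionId)}/ws`
}

console.log(chatSocketUrl('127.0.0.1', 'a1b2c3d4e5f6'))
// → ws://127.0.0.1:11434/api/v1/chat/a1b2c3d4e5f6/ws
```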

Client → Server Messages

Send a message

json

{ "type": "message", "content": "Explain quicksort" }

Interrupt streaming

json

{ "type": "interrupt" }

Request message history

json

{ "type": "history" }

Heartbeat

json

{ "type": "ping" }
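The four client frames above can be wrapped in small serializers so that callers never hand-build JSON. This is a sketch only. The `clientMessages` object is hypothetical, and rejecting empty content is a client-side choice rather than documented server behavior.

```javascript
// Hypothetical serializers for the documented client → server frames.
// Only the `type` and `content` fields come from the docs.
const clientMessages = {
  message: (content) => {
    // Client-side guard (assumption): refuse to send empty prompts.
    if (typeof content !== 'string' || content.length === 0) {
      throw new TypeError('content must be a non-empty string')
    }
    return JSON.stringify({ type: 'message', content })
  },
  interrupt: () => JSON.stringify({ type: 'interrupt' }),
  history: () => JSON.stringify({ type: 'history' }),
  ping: () => JSON.stringify({ type: 'ping' }),
}

// Usage with an open WebSocket `ws`:
// ws.send(clientMessages.message('Explain quicksort'))
// ws.send(clientMessages.interrupt())
```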

Server → Client Messages

| Type | Fields | Description |
|---|---|---|
| `connected` | `session_id` | Connection confirmed |
| `history` | `messages[]` | Last 50 messages (on connect/reconnect) |
| `token` | `data` | Streaming token fragment |
| `done` | `usage` | Generation complete, with token usage |
| `error` | `error` | Error message |
| `interrupted` | (none) | Generation was interrupted |
| `pong` | `ts` | Heartbeat response |
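Because every server frame carries a `type` plus the fields in the table, a client can check frames before acting on them. The sketch below encodes the table as data and routes each frame to a handler. `dispatchServerMessage` and the `handlers` shape are hypothetical. Treating a missing field as invalid is an assumption about how strict a client should be.

```javascript
// Required fields per server → client message type, taken from the table above.
const REQUIRED_FIELDS = {
  connected: ['session_id'],
  history: ['messages'],
  token: ['data'],
  done: ['usage'],
  error: ['error'],
  interrupted: [],
  pong: ['ts'],
}

// Parse one frame and route it to handlers[msg.type].
// Non-JSON frames, unknown types and frames missing a documented field
// are reported through onInvalid and return false.
function dispatchServerMessage(raw, handlers, onInvalid = console.warn) {
  let msg
  try {
    msg = JSON.parse(raw)
  } catch (err) {
    onInvalid(`not JSON: ${err.message}`)
    return false
  }
  if (msg === null || typeof msg !== 'object' ||
      !Object.prototype.hasOwnProperty.call(REQUIRED_FIELDS, msg.type)) {
    onInvalid(`unknown message: ${raw}`)
    return false
  }
  const missing = REQUIRED_FIELDS[msg.type].filter((field) => !(field in msg))
  if (missing.length > 0) {
    onInvalid(`${msg.type} is missing ${missing.join(', ')}`)
    return false
  }
  const handler = handlers[msg.type]
  if (handler) handler(msg)
  return true
}
```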

Example

javascript

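// `host` and `sessionId` come from your app; sessionId is the id
// returned by POST /api/v1/chat. appendText, finishGeneration and
// showError are placeholders for your own UI functions.
const ws = new WebSocket(
  `ws://${host}:11434/api/v1/chat/${sessionId}/ws`
)

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data)
  switch (msg.type) {
    case 'token':
      appendText(msg.data)
      break
    case 'done':
      finishGeneration(msg.usage)
      break
    case 'error':
      showError(msg.error)
      break
  }
}

// send() throws while the socket is still CONNECTING,
// so wait for the open event before sending.
ws.onopen = () => {
  ws.send(JSON.stringify({
    type: 'message',
    content: 'Write a Fibonacci function'
  }))
}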

Auth Refresh

The server sends an auth refresh challenge every 5 minutes. The client must re-authenticate via the same cookie/header mechanism. If refresh fails, the connection is closed with code 4001.
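The following is a sketch of a close handler that treats code 4001 separately from other closes. Codes 4000 to 4999 are reserved for application use by RFC 6455, and 1000 is a normal closure. The reconnect policy and the `reauthenticate` and `reconnect` callbacks are hypothetical and supplied by the application.

```javascript
// Documented close code for a failed auth refresh.
const AUTH_REFRESH_FAILED = 4001

// Decide what to do when the chat socket closes.
// reauthenticate() must return a Promise; reconnect() may return anything.
function handleClose(event, { reauthenticate, reconnect }) {
  if (event.code === AUTH_REFRESH_FAILED) {
    // Credentials were rejected: log in again first, then open a fresh
    // socket so the new cookie or header is sent with the handshake.
    return reauthenticate().then(() => reconnect())
  }
  if (event.code === 1000) {
    // Normal closure: do not reconnect.
    return Promise.resolve()
  }
  // Any other code (network drop, server restart): reconnect directly.
  return Promise.resolve(reconnect())
}

// Usage: ws.onclose = (event) => handleClose(event, { reauthenticate, reconnect })
```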

Heartbeat

The server sends a ping every 30 seconds, and the client should respond with a pong. After 3 consecutive missed responses, the server closes the connection.
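A client that wants to detect a dead connection from its own side can use the `ping` and `pong` messages listed earlier. The sketch below assumes the client mirrors the 30-second interval and three-miss limit described above. `HeartbeatMonitor` is hypothetical. The caller drives it from a timer, which keeps the counting logic independent of real time.

```javascript
// Counts unanswered client pings; the caller drives tick() from a timer.
class HeartbeatMonitor {
  constructor(send, onDead, maxMissed = 3) {
    this.send = send          // writes one frame to the socket
    this.onDead = onDead      // invoked once when too many pongs are missed
    this.maxMissed = maxMissed
    this.awaiting = false     // true while the last ping is unanswered
    this.missed = 0
    this.dead = false
  }

  // Call once per interval. If the previous ping got no pong, count a miss;
  // give up after maxMissed misses, otherwise send a new ping.
  tick() {
    if (this.dead) return
    if (this.awaiting) {
      this.missed += 1
      if (this.missed >= this.maxMissed) {
        this.dead = true
        this.onDead()
        return
      }
    }
    this.awaiting = true
    this.send(JSON.stringify({ type: 'ping' }))
  }

  // Call when a { type: 'pong' } frame arrives.
  pong() {
    this.awaiting = false
    this.missed = 0
  }
}

// Usage (browser):
// const hb = new HeartbeatMonitor((frame) => ws.send(frame), () => ws.close())
// const timer = setInterval(() => hb.tick(), 30000)
// ws.addEventListener('close', () => clearInterval(timer))
```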

SSE Transport

Fallback transport for simple integrations. It is unidirectional: the client sends messages via REST and receives tokens via SSE.

Create Session

POST /api/v1/chat

json

{ "model": "glm-4.7-flash", "mode": "build" }

Response:

json

{
  "ok": true,
  "data": {
    "id": "a1b2c3d4e5f6",
    "worker_id": "chat-a1b2c3d4e5f6",
    "model": "glm-4.7-flash",
    "mode": "build",
    "status": "active"
  }
}
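The following is a minimal sketch of creating a session from JavaScript and extracting the id used by both transports. The Bearer header mirrors the curl example under Stream Tokens. `createChatSession`, `baseUrl` and the injectable `fetchImpl` parameter are hypothetical. `fetch` is global in browsers and Node 18+.

```javascript
// Create a chat session and return its id (the session_id used in
// the WebSocket and SSE URLs). fetchImpl can be swapped out in tests.
async function createChatSession(baseUrl, token, { model, mode }, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/v1/chat`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ model, mode }),
  })
  const body = await res.json()
  // Responses wrap the payload as { ok, data }.
  if (!res.ok || !body.ok) {
    throw new Error(`create session failed (HTTP ${res.status})`)
  }
  return body.data.id
}

// Usage:
// const id = await createChatSession('http://127.0.0.1:11434', token,
//   { model: 'glm-4.7-flash', mode: 'build' })
```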

Stream Tokens

GET /api/v1/chat/:id/stream

Returns an SSE stream. Events: connected, token, done, error.

bash

curl -N http://127.0.0.1:11434/api/v1/chat/a1b2c3d4e5f6/stream \
  -H "Authorization: Bearer $TOKEN"
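The browser `EventSource` API cannot attach an Authorization header. With bearer tokens, the stream therefore has to be read with `fetch` and parsed by hand. The sketch below is a simplified parser for the `text/event-stream` format defined in the WHATWG HTML standard. It handles `event:`, `data:`, comments and blank-line dispatch, and ignores `id:`, `retry:` and bare-CR line endings. `SseParser` is hypothetical. The payload format of `token` events is not specified here, so `data` is returned as a raw string.

```javascript
// Incremental, simplified text/event-stream parser.
class SseParser {
  constructor(onEvent) {
    this.onEvent = onEvent   // receives { event, data }
    this.buffer = ''
    this.eventType = ''
    this.dataLines = []
  }

  // Feed decoded text; chunks may split lines anywhere.
  feed(chunk) {
    this.buffer += chunk
    const lines = this.buffer.split('\n')
    this.buffer = lines.pop()   // keep the trailing partial line
    for (const line of lines) this.processLine(line)
  }

  processLine(line) {
    if (line.endsWith('\r')) line = line.slice(0, -1)   // CRLF endings
    if (line === '') return this.dispatch()             // blank line ends an event
    if (line.startsWith(':')) return                     // comment / keep-alive
    const colon = line.indexOf(':')
    const field = colon === -1 ? line : line.slice(0, colon)
    let value = colon === -1 ? '' : line.slice(colon + 1)
    if (value.startsWith(' ')) value = value.slice(1)
    if (field === 'event') this.eventType = value
    else if (field === 'data') this.dataLines.push(value)
  }

  dispatch() {
    if (this.dataLines.length > 0) {
      this.onEvent({ event: this.eventType || 'message', data: this.dataLines.join('\n') })
    }
    this.eventType = ''
    this.dataLines = []
  }
}

// Usage (browser or Node 18+):
// const res = await fetch(url, { headers: { Authorization: `Bearer ${token}` } })
// const parser = new SseParser(({ event, data }) => console.log(event, data))
// const reader = res.body.getReader()
// const decoder = new TextDecoder()
// for (;;) {
//   const { value, done } = await reader.read()
//   if (done) break
//   parser.feed(decoder.decode(value, { stream: true }))
// }
```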

Send Message

POST /api/v1/chat/:id/message

json

{ "content": "Explain quicksort" }

End Session

DELETE /api/v1/chat/:id
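The two remaining REST calls of the SSE transport can be wrapped the same way. The paths are the documented ones. Reusing the Bearer header and treating any non-2xx status as failure are assumptions. `sendChatMessage` and `endChatSession` are hypothetical names.

```javascript
// POST a user message to an existing session; tokens arrive on the SSE stream.
async function sendChatMessage(baseUrl, token, sessionId, content, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/v1/chat/${encodeURIComponent(sessionId)}/message`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${token}` },
    body: JSON.stringify({ content }),
  })
  if (!res.ok) throw new Error(`send failed (HTTP ${res.status})`)
}

// DELETE the session when the conversation is over.
async function endChatSession(baseUrl, token, sessionId, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/v1/chat/${encodeURIComponent(sessionId)}`, {
    method: 'DELETE',
    headers: { Authorization: `Bearer ${token}` },
  })
  if (!res.ok) throw new Error(`delete failed (HTTP ${res.status})`)
}
```

As a precaution, open the stream before sending the first message. This gives early tokens a listener to reach.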

Session Management

List Sessions

GET /api/v1/chat

Returns active sessions for the authenticated user. Admin users see all sessions.

Get Session Details

GET /api/v1/chat/:id

Message History

GET /api/v1/chat/:id/history?limit=50

Messages are retained in Mnesia for 30 days. Use the limit query parameter to control how many messages are returned.

Response:

json

{
  "ok": true,
  "data": [
    {
      "id": "msg-abc123",
      "session_id": "a1b2c3d4e5f6",
      "role": "user",
      "content": "Hello",
      "timestamp": 1745600000
    }
  ]
}
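The following is a sketch of reading history and turning `timestamp` into a `Date`. It assumes `timestamp` is Unix time in seconds, which the example value 1745600000 suggests but this page does not state. `fetchHistory` and its parameters are hypothetical.

```javascript
// Fetch up to `limit` history messages and add a Date for each timestamp.
async function fetchHistory(baseUrl, token, sessionId, limit = 50, fetchImpl = fetch) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer')
  }
  const url = new URL(`/api/v1/chat/${encodeURIComponent(sessionId)}/history`, baseUrl)
  url.searchParams.set('limit', String(limit))
  const res = await fetchImpl(url.toString(), {
    headers: { Authorization: `Bearer ${token}` },
  })
  const body = await res.json()
  if (!res.ok || !body.ok) throw new Error(`history failed (HTTP ${res.status})`)
  // Assumption: timestamp is in seconds, so convert to milliseconds.
  return body.data.map((m) => ({ ...m, time: new Date(m.timestamp * 1000) }))
}
```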

Permissions

Chat endpoints require the chat:read permission. Sessions are scoped to the authenticated user: non-admin users can only access their own sessions, while admin users can see all sessions.
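The access rule can be stated as two small predicates. The user and session shapes (`{ id, isAdmin, permissions }` and `{ ownerId }`) are hypothetical. The sketch also assumes that admins still need chat:read, because the rule applies to all chat endpoints.

```javascript
// Can `user` read `session`? Requires chat:read; admins see every session,
// everyone else only their own.
function canAccessSession(user, session) {
  if (!user.permissions.includes('chat:read')) return false
  return user.isAdmin === true || session.ownerId === user.id
}

// What GET /api/v1/chat would return for this user under the same rule.
function visibleSessions(user, sessions) {
  return sessions.filter((session) => canAccessSession(user, session))
}
```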

Architecture: Luna Harness, Not a Thin Proxy

The web chat does NOT talk directly to LLM providers. Every message flows through the full Luna agent harness — the same runtime used by the TUI and CLI. This means web chat users get the complete agent experience: tool execution, session persistence, memory, MCP servers, compaction, and multi-provider failover.

Limon is a thin UI shell. Sol is the message bus. Luna is the brain.

Data Flow

┌──────────┐       ┌──────────┐       ┌──────────┐       ┌──────────┐
│  Limon   │       │   Sol    │       │   ZMQ    │       │   Luna   │
│ (Browser)│       │(Erlang)  │       │(ROUTER)  │       │(LuaJIT)  │
└────┬─────┘       └────┬─────┘       └────┬─────┘       └────┬─────┘
     │                  │                  │                  │
     │  WS connect      │                  │                  │
     │─────────────────►│  spawn worker    │                  │
     │                  │─────────────────►│  DEALER connect  │
     │                  │                  │─────────────────►│
     │                  │                  │                  │
     │  {type:message}  │                  │                  │
     │─────────────────►│  task msgpack    │                  │
     │                  │─────────────────►│  JSONL stdin     │
     │                  │                  │─────────────────►│
     │                  │                  │                  │
     │                  │                  │                  │  ┌──────────────┐
     │                  │                  │                  │  │ Agent Loop   │
     │                  │                  │                  │  │ ┌──────────┐ │
     │                  │                  │                  │  │ │Provider  │ │
     │                  │                  │                  │  │ │(Anthropic│ │
     │                  │                  │                  │  │ │OpenAI    │ │
     │                  │                  │                  │  │ │z.ai)     │ │
     │                  │                  │                  │  │ └──────────┘ │
     │                  │                  │                  │  │ ┌──────────┐ │
     │                  │                  │                  │  │ │  Tools   │ │
     │                  │                  │                  │  │ │(45+MCP)  │ │
     │                  │                  │                  │  │ └──────────┘ │
     │                  │                  │                  │  │ ┌──────────┐ │
     │                  │                  │                  │  │ │ Session  │ │
     │                  │                  │                  │  │ │(JSONL+   │ │
     │                  │                  │                  │  │ │ SQLite)  │ │
     │                  │                  │                  │  │ └──────────┘ │
     │                  │                  │                  │  │ ┌──────────┐ │
     │                  │                  │                  │  │ │ Memory   │ │
     │                  │                  │                  │  │ │(KV+TTL)  │ │
     │                  │                  │                  │  │ └──────────┘ │
     │                  │                  │                  │  └──────────────┘
     │                  │                  │                  │
     │                  │                  │  token (JSONL)   │
     │                  │  {sol_token}     │◄─────────────────│
     │  {type:token}    │◄─────────────────│                  │
     │◄─────────────────│                  │                  │
     │                  │                  │  ...more tokens  │
     │                  │                  │                  │
     │                  │                  │  done (JSONL)    │
     │                  │  {sol_done}      │◄─────────────────│
     │  {type:done}     │◄─────────────────│                  │
     │◄─────────────────│                  │                  │
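In the flow above, Luna reports tokens and completion as JSONL: one JSON object per line. Any reader of that stream has to reassemble lines that arrive split across reads. The JavaScript sketch below illustrates only this framing, since Sol itself is written in Erlang. `JsonlDecoder` is hypothetical, and no particular Luna event schema is assumed.

```javascript
// Reassembles newline-delimited JSON records from arbitrary text chunks.
class JsonlDecoder {
  constructor(onRecord, onError = () => {}) {
    this.onRecord = onRecord   // called with each parsed object
    this.onError = onError     // called with (error, line) for bad lines
    this.pending = ''          // incomplete trailing line
  }

  push(chunk) {
    this.pending += chunk
    let newline
    while ((newline = this.pending.indexOf('\n')) !== -1) {
      const line = this.pending.slice(0, newline).trim()
      this.pending = this.pending.slice(newline + 1)
      if (line === '') continue   // tolerate blank lines
      let record
      try {
        record = JSON.parse(line)
      } catch (err) {
        this.onError(err, line)   // one bad line must not stop the stream
        continue
      }
      this.onRecord(record)
    }
  }
}
```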

What Luna Provides in Web Chat Mode

Every web chat session runs a complete Luna agent. When Limon sends a message, the Luna worker:

Appends to session — JSONL + SQLite dual-write, same as TUI/CLI
Builds context — system prompt, session history, memory injection, context file scanning
Calls provider — multi-provider failover (Anthropic → OpenAI → z.ai → Ollama)
Streams tokens — each token fragment sent back to Limon via ZMQ → Sol → WS
Executes tools — 45 built-in + MCP tools, parallel execution for async tools
Manages memory — cross-session KV+TTL, heuristic extraction on session close
Compacts when needed — multi-strategy compaction at 70% context threshold
Persists checkpoints — automatic checkpoint before risky operations
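Two of these behaviors reduce to short rules: try providers in order until one succeeds, and compact once usage reaches 70% of the context window. The sketch below is illustrative only and is not Luna's implementation. The provider objects (`{ name, complete(request) }`) are hypothetical, and "used tokens" is assumed to mean the tokens currently in context.

```javascript
// Try each provider in order; return the first successful completion.
async function completeWithFailover(providers, request) {
  const failures = []
  for (const provider of providers) {
    try {
      return { provider: provider.name, result: await provider.complete(request) }
    } catch (err) {
      failures.push(`${provider.name}: ${err.message}`)   // fall through to the next one
    }
  }
  throw new Error(`all providers failed: ${failures.join('; ')}`)
}

// Compact at 70% of the context window. Integer math (x * 10 >= w * 7)
// avoids floating-point rounding at the boundary.
function needsCompaction(usedTokens, contextWindow) {
  return usedTokens * 10 >= contextWindow * 7
}
```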

What Sol Does NOT Do

Sol does not call LLM providers directly for chat
Sol does not execute tools or manage conversation state
Sol does not handle sessions or memory
Sol is a message bus: it receives WS messages, dispatches to Luna via ZMQ, and relays streaming tokens back

Why This Matters

Consistency: Same agent behavior whether using TUI, CLI, or web chat
Full tool access: Web chat users can run bash, edit files, search code — not just talk
Session portability: Start in web, resume in TUI with --resume — same JSONL session
Provider independence: Swap providers without touching Limon or Sol
