ZMQ Protocol Reference
The ZMQ worker protocol defines how Sol communicates with worker agents over the wire. Workers connect as DEALER sockets to Sol's ROUTER gateway using msgpack encoding.
Protocol Version
| Property | Value |
|---|---|
| Version | 1.0 |
| Transport | ZMQ (ZMTP/4.0) |
| Pattern | DEALER-ROUTER |
| Encoding | msgpack (map format, binary strings) |
Topology

    Worker (DEALER) <--> Sol ROUTER (port 5555)

                             +-- Worker 1 (DEALER)
    Sol ROUTER (5555) -------+-- Worker 2 (DEALER)
                             +-- Worker 3 (DEALER)

    Sol PUB (5556) --------> All subscribers (event bridge)

Multiple workers connect simultaneously. Sol routes tasks to the first available (ready) worker. If no workers are ready, tasks queue until one becomes available.
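The routing rule above can be modelled in a few lines. This is a minimal sketch, not Sol's gateway code: the `Dispatcher` class and its method names are hypothetical, and it assumes "first available" means the worker that has been ready the longest.

```python
from collections import deque


class Dispatcher:
    """Toy model of the routing rule: first ready worker wins, otherwise queue."""

    def __init__(self):
        self.ready = deque()    # workers that announced "ready", oldest first
        self.pending = deque()  # tasks waiting for a worker
        self.assigned = []      # (worker, task) pairs in dispatch order

    def worker_ready(self, worker):
        """A worker announced availability: hand it queued work or park it."""
        if self.pending:
            self.assigned.append((worker, self.pending.popleft()))
        elif worker not in self.ready:
            self.ready.append(worker)

    def submit(self, task):
        """A new task arrived: dispatch it now or queue it."""
        if self.ready:
            self.assigned.append((self.ready.popleft(), task))
        else:
            self.pending.append(task)
```

A task submitted before any worker is ready stays in `pending` and is dispatched by the next `worker_ready` call.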
Connection Model
- Worker opens a DEALER socket, sets `ZMQ.IDENTITY` (optional), and connects to `tcp://<host>:<port>`
- Worker sends a `ready` message immediately after connecting
- Sol registers the worker and begins routing tasks
- Worker sends `ready` again after each task completes (heartbeat + re-availability)
- Reconnection is handled by ZMQ internally (DEALER reconnects automatically)
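The connection steps above might look like this with pyzmq and msgpack-python. It is a sketch under assumptions: the helper names are hypothetical, the worker id is reused as the ZMQ identity (the protocol only says the identity is optional), and the payload is sent as a single frame.

```python
def endpoint(host, port=5555):
    """Build the tcp:// endpoint; 5555 is the ROUTER port named in this document."""
    return f"tcp://{host}:{port}"


def ready_message(worker_id, capabilities):
    """Build the `ready` announcement as a plain dict, packed later with msgpack."""
    return {"type": "ready", "worker_id": worker_id, "capabilities": list(capabilities)}


def connect_worker(host, worker_id, capabilities, port=5555):
    """Open a DEALER socket, connect, and announce readiness."""
    # Third-party imports are local so the helpers above work without them.
    import msgpack
    import zmq

    sock = zmq.Context.instance().socket(zmq.DEALER)
    # Optional: an identity must be set before connect() to take effect.
    # Reusing worker_id here is an assumption, not a protocol requirement.
    sock.setsockopt(zmq.IDENTITY, worker_id.encode("utf-8"))
    sock.connect(endpoint(host, port))
    # Sent as one payload frame. If the gateway expects an explicit empty
    # delimiter, use sock.send_multipart([b"", payload]) instead.
    payload = msgpack.packb(ready_message(worker_id, capabilities), use_bin_type=True)
    sock.send(payload)
    return sock
```

Calling `connect_worker("localhost", "worker-1", ["tools", "streaming"])` returns the connected socket, on which the worker then waits for `task` messages.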
Wire Framing
DEALER/ROUTER uses multipart ZMQ messages. The ROUTER socket prepends the worker's identity frame:

    [identity_frame] [payload_frame]

Workers send:

    [empty_delimiter] [payload_frame]

The DEALER socket adds the empty delimiter automatically. The `payload_frame` is a single msgpack-packed map.
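On the gateway side, the frame list from the ROUTER socket has to be split into the routing identity and the payload. The sketch below uses a hypothetical helper name and, as a defensive assumption, accepts the frame list with or without an empty delimiter between the two parts.

```python
def split_router_frames(frames):
    """Return (identity, payload) from the frames a ROUTER socket receives.

    Accepts [identity, payload] and, defensively, [identity, b"", payload].
    """
    if len(frames) < 2:
        raise ValueError(f"expected at least 2 frames, got {len(frames)}")
    identity, rest = frames[0], frames[1:]
    if len(rest) > 1 and rest[0] == b"":
        rest = rest[1:]  # drop the empty delimiter if one is present
    if len(rest) != 1:
        raise ValueError("expected exactly one payload frame")
    return identity, rest[0]
```

The identity returned here is what the gateway would later place in front of a `task` payload to route it back to the same worker.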
Encoding Conventions
- msgpack options: `{pack_str, from_binary}`, `{unpack_str, as_binary}`, `{map_format, map}`
- All keys are binary strings (not atoms in Erlang, not native strings in Python)
- Python: `msgpack.pack()` with `use_bin_type=True`
- Python: `msgpack.unpack()` with `raw=False`
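A Python worker could wrap these options in two small helpers. This sketch uses `msgpack.packb()` and `msgpack.unpackb()`, the bytes-oriented counterparts of `pack()` and `unpack()`, which take the same options. The helper names and the key check are assumptions added for illustration.

```python
def check_message(msg):
    """Raise TypeError unless msg is a map with string keys and a string 'type'."""
    if not isinstance(msg, dict):
        raise TypeError("message must be a map")
    for key in msg:
        if not isinstance(key, str):
            raise TypeError(f"non-string key: {key!r}")
    if not isinstance(msg.get("type"), str):
        raise TypeError("message needs a string 'type' field")


def encode(msg):
    """Pack one protocol message into a single payload frame."""
    import msgpack  # third-party; imported here so check_message has no dependency

    check_message(msg)
    return msgpack.packb(msg, use_bin_type=True)


def decode(payload):
    """Unpack one payload frame; raw=False decodes msgpack strings to Python str."""
    import msgpack

    msg = msgpack.unpackb(payload, raw=False)
    check_message(msg)
    return msg
```

With `use_bin_type=True`, Python `bytes` values (such as the `identity` field) are packed as msgpack bin and `str` values as msgpack str, so the two stay distinguishable after a round trip.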
Message Types
Worker to Sol
ready -- Announce Availability
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"ready"` |
| `worker_id` | string | yes | Unique worker identifier |
| `capabilities` | array of strings | yes | Feature tags (e.g., `["tools", "streaming"]`) |
Send on connect and after completing each task. Acts as heartbeat and availability signal.
token -- Stream Partial Output
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"token"` |
| `task_id` | string | yes | Task identifier from the `task` message |
| `content` | string | yes | Partial output text |
Send zero or more times during task execution. Enables real-time streaming to HTTP clients.
result -- Task Completed
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"result"` |
| `task_id` | string | yes | Task identifier |
| `status` | string | yes | `"ok"` or `"cancelled"` |
| `content` | string | yes | Final output text |
Send exactly once per task. Worker transitions from busy back to ready.
error -- Task Failed
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"error"` |
| `task_id` | string | yes | Task identifier |
| `error` | string | yes | Error description |
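The `token`, `result`, and `error` messages fit together as one stream per task. The following is a minimal sketch: `run_task` and `handler` are hypothetical names, and treating the final `content` as the concatenation of the streamed chunks is an assumption rather than something the protocol requires.

```python
def run_task(task, handler):
    """Yield the worker-to-Sol messages for one task.

    `handler` is a hypothetical callable that takes the prompt and returns
    an iterable of text chunks.
    """
    task_id = task["task_id"]
    chunks = []
    try:
        for chunk in handler(task["prompt"]):
            chunks.append(chunk)
            yield {"type": "token", "task_id": task_id, "content": chunk}
    except Exception as exc:
        # A failure ends the task with a single error message.
        yield {"type": "error", "task_id": task_id, "error": str(exc)}
        return
    yield {
        "type": "result",
        "task_id": task_id,
        "status": "ok",
        "content": "".join(chunks),
    }
```

Each yielded dict would be packed with msgpack and sent on the DEALER socket, followed by a fresh `ready` message once the generator is exhausted.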
log -- Diagnostic Log
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"log"` |
| `level` | string | yes | `"info"`, `"warn"`, or `"error"` |
| `message` | string | yes | Log message text |
Optional. Accepted but not processed by Sol.
spawn_ack -- Child Task Accepted
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"spawn_ack"` |
| `parent_task_id` | string | yes | Parent task identifier |
| `child_task_id` | string | yes | Child task identifier |
| `status` | string | yes | `"ok"` or `"rejected"` |
child_result -- Child Task Result
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"child_result"` |
| `parent_task_id` | string | yes | Parent task identifier |
| `child_task_id` | string | yes | Child task identifier |
| `status` | string | yes | `"ok"` or `"error"` |
| `content` | string | yes | Child task output |
Sol to Worker
task -- Dispatch Work
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"task"` |
| `task_id` | string | yes | Unique task identifier |
| `identity` | binary | yes | Worker's ZMQ identity (routing frame) |
| `prompt` | string | yes | The work to perform |
| `model` | string | no | Suggested model for inference |
| `opts` | map | no | Additional options |
| `context` | string | no | Additional context |
Worker must respond with zero or more `token` messages followed by exactly one `result` or `error`.
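A worker may want to check an incoming `task` against the field table before acting on it. This is a sketch with hypothetical names. It assumes the `identity` field decodes to Python `bytes` and the other fields to `str` or `dict`.

```python
# Field name -> expected Python type after msgpack decoding (assumed mapping).
REQUIRED_FIELDS = {"task_id": str, "identity": bytes, "prompt": str}
OPTIONAL_FIELDS = {"model": str, "opts": dict, "context": str}


def validate_task(msg):
    """Return a list of problems; an empty list means the task is well-formed."""
    problems = []
    if msg.get("type") != "task":
        problems.append('type must be "task"')
    for name, kind in REQUIRED_FIELDS.items():
        if name not in msg:
            problems.append(f"missing required field: {name}")
        elif not isinstance(msg[name], kind):
            problems.append(f"{name} must be {kind.__name__}")
    for name, kind in OPTIONAL_FIELDS.items():
        if name in msg and not isinstance(msg[name], kind):
            problems.append(f"{name} must be {kind.__name__}")
    return problems
```

A worker could answer a task that fails validation with an `error` message carrying the joined problem list, provided `task_id` itself is present.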
cancel -- Abort Running Task
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"cancel"` |
| `task_id` | string | yes | Task to cancel |
Worker should abort and send result with status = "cancelled".
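One way to honor `cancel` is cooperative cancellation: the receive loop sets a flag and the task loop checks it between chunks. The sketch below uses a hypothetical `run_cancellable` helper and assumes the partial output produced so far is returned as `content`, which the protocol does not specify.

```python
import threading


def run_cancellable(task_id, chunks, cancelled: threading.Event):
    """Consume chunks until done or cancelled; return the final `result` message."""
    produced = []
    status = "ok"
    for chunk in chunks:
        if cancelled.is_set():
            # The receive loop saw a `cancel` for this task: stop early.
            status = "cancelled"
            break
        produced.append(chunk)
    return {
        "type": "result",
        "task_id": task_id,
        "status": status,
        "content": "".join(produced),
    }
```

The thread that reads from the socket would call `cancelled.set()` when a `cancel` message with a matching `task_id` arrives.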
spawn -- Create Child Task
| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | yes | `"spawn"` |
| `parent_task_id` | string | yes | Parent task identifier |
| `prompt` | string | yes | Child task prompt |
| `model` | string | no | Suggested model |
| `opts` | map | no | Additional options |
Worker responds with spawn_ack and later child_result.
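The `spawn` exchange can be sketched as a generator that yields the acknowledgement first and the child result afterwards. The source does not say which side allocates `child_task_id`; this example assumes the worker does, using a UUID. `handle_spawn`, `run_child`, and `new_id` are hypothetical names.

```python
import uuid


def handle_spawn(msg, run_child, new_id=lambda: uuid.uuid4().hex):
    """Yield `spawn_ack`, then `child_result`, for one `spawn` message.

    `run_child` is a hypothetical callable taking the child prompt and
    returning the child's output text.
    """
    ids = {"parent_task_id": msg["parent_task_id"], "child_task_id": new_id()}
    yield {"type": "spawn_ack", "status": "ok", **ids}
    try:
        content, status = run_child(msg["prompt"]), "ok"
    except Exception as exc:
        content, status = str(exc), "error"
    yield {"type": "child_result", "status": status, "content": content, **ids}
```

A worker that cannot take on child work would instead send a single `spawn_ack` with `status` set to `"rejected"`.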
Lifecycle State Machine

     connect
        |
        v
    +-------+
    | READY |<----------------------+
    +---+---+                       |
        | receive task              | send result/error
        v                           |
    +-------+   receive cancel      |
    | BUSY  |------------------>    |
    +---+---+   send result         |
        |       (cancelled)         |
        +---------------------------+

- Worker starts in READY after sending `ready`
- On `task`: transition to BUSY, execute, send `token`s, send `result`/`error`, send `ready`
- On `cancel` while BUSY: abort work, send `result` with `"cancelled"`, send `ready`
- On disconnect: Sol removes worker from registry; queued tasks reassigned
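The two-state lifecycle can be tracked on the worker side with a small class. This is a sketch with hypothetical names. It only records state and leaves sending messages to the caller.

```python
from enum import Enum


class State(Enum):
    READY = "ready"
    BUSY = "busy"


class WorkerLifecycle:
    """Track the READY/BUSY state of one worker."""

    def __init__(self):
        self.state = State.READY  # entered after the initial `ready` is sent
        self.task_id = None

    def on_task(self, task_id):
        """A `task` arrived: READY -> BUSY."""
        if self.state is not State.READY:
            raise RuntimeError("received task while busy")
        self.state, self.task_id = State.BUSY, task_id

    def on_cancel(self, task_id):
        """Return True if the running task matches and should be aborted."""
        return self.state is State.BUSY and self.task_id == task_id

    def on_finished(self):
        """Call after sending result/error: BUSY -> READY, then send `ready`."""
        if self.state is not State.BUSY:
            raise RuntimeError("no task in progress")
        self.state, self.task_id = State.READY, None
```

A `cancel` for an unknown or already finished task makes `on_cancel` return `False`, so the worker can ignore it.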
PUB/SUB Event Bridge
Sol exposes a PUB socket on port 5556 for event broadcasting:

    Sol PUB (5556) --> Subscribers

Workers and external services can subscribe to receive real-time events:
- Workflow state changes
- Task lifecycle events
- Worker registration/deregistration
- System status updates
Events are msgpack-encoded maps with type and data fields.
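A subscriber could be sketched as follows with pyzmq and msgpack-python. The function names are hypothetical, and the sketch assumes each event arrives as a single frame with no topic prefix, which the source does not state.

```python
def parse_event(obj):
    """Return (type, data) from a decoded event map, or raise ValueError."""
    if not isinstance(obj, dict) or "type" not in obj or "data" not in obj:
        raise ValueError("event must be a map with 'type' and 'data' fields")
    return obj["type"], obj["data"]


def subscribe(host, port=5556):
    """Yield (type, data) for every event received from Sol's PUB socket."""
    # Third-party imports are local so parse_event works without them.
    import msgpack
    import zmq

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix: receive all events
    sock.connect(f"tcp://{host}:{port}")
    while True:
        yield parse_event(msgpack.unpackb(sock.recv(), raw=False))
```

Iterating over `subscribe("localhost")` blocks until events arrive, and filtering by event type can be done on the yielded tuples.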
Error Handling
| Scenario | Behavior |
|---|---|
| Malformed msgpack | Logged at warning level, message dropped |
| Unknown message type | Logged at warning level, message dropped |
| Worker disconnect | Tasks remain queued, reassigned to next worker |
| Task timeout | Not enforced by ZMQ gateway (handled by HTTP layer) |
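The first two rows of the table amount to a log-and-drop policy. Below is a sketch of that policy in Python. The real gateway is written in Erlang, so the names here are hypothetical, and the decoder is passed in as a parameter to keep the example free of dependencies.

```python
import logging

log = logging.getLogger("zmq_gateway_sketch")

# Worker-to-Sol message types defined in this document.
KNOWN_TYPES = {"ready", "token", "result", "error", "log", "spawn_ack", "child_result"}


def accept(payload, decode):
    """Decode one worker payload; return the message, or None if it is dropped.

    `decode` is any callable that turns bytes into a map and raises on
    malformed input (for example a msgpack unpack function).
    """
    try:
        msg = decode(payload)
    except Exception as exc:
        log.warning("malformed msgpack, dropping message: %s", exc)
        return None
    msg_type = msg.get("type") if isinstance(msg, dict) else None
    if not isinstance(msg_type, str) or msg_type not in KNOWN_TYPES:
        log.warning("unknown message type %r, dropping message", msg_type)
        return None
    return msg
```

Dropping instead of raising keeps one misbehaving worker from interrupting the gateway's receive loop.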
Reference Implementations
| Language | File | Notes |
|---|---|---|
| Lua (Luna) | client/core/zmq_worker.lua | Full agent with streaming, cancel, IOLoop |
| Python | example/python/zmq_worker.py | Reference worker with handler function |
| Erlang (Sol) | server/src/sol_zmq_gateway.erl | ROUTER gateway, task routing |
| Erlang (Protocol) | server/src/sol_zmq_protocol.erl | msgpack encode/decode |