ZMQ Protocol Reference

The ZMQ worker protocol defines how Sol communicates with worker agents over the wire. Workers connect as DEALER sockets to Sol's ROUTER gateway using msgpack encoding.

Protocol Version

Property   Value
Version    1.0
Transport  ZMQ (ZMTP/4.0)
Pattern    DEALER-ROUTER
Encoding   msgpack (map format, binary strings)

Topology

Worker (DEALER) <--> Sol ROUTER (port 5555)

                         +-- Worker 1 (DEALER)
Sol ROUTER (5555) -------+-- Worker 2 (DEALER)
                         +-- Worker 3 (DEALER)

Sol PUB (5556)   --------> All subscribers (event bridge)

Multiple workers connect simultaneously. Sol routes tasks to the first available (ready) worker. If no workers are ready, tasks queue until one becomes available.
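The routing rule above can be modelled in a few lines. This is a hypothetical Python sketch of the described behaviour, not Sol's Erlang implementation: the `Dispatcher` class and its method names are illustrative, and "first available" is assumed here to mean the worker that has been ready the longest.

```python
from collections import deque


class Dispatcher:
    """Toy model: hand each task to the first ready worker, else queue it."""

    def __init__(self):
        self.ready = deque()    # worker ids, in the order they became ready
        self.pending = deque()  # tasks waiting for a ready worker

    def worker_ready(self, worker_id):
        """Record a ready message.

        Returns (worker_id, task) if a queued task was assigned, else None.
        """
        if self.pending:
            return worker_id, self.pending.popleft()
        if worker_id not in self.ready:
            self.ready.append(worker_id)
        return None

    def submit(self, task):
        """Route a task.

        Returns (worker_id, task) if a worker was ready, else None (task queued).
        """
        if self.ready:
            return self.ready.popleft(), task
        self.pending.append(task)
        return None
```

In this model a task submitted with no ready workers stays in `pending` until the next `worker_ready` call picks it up.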

Connection Model

  1. Worker opens DEALER socket, sets ZMQ.IDENTITY (optional), connects to tcp://<host>:<port>
  2. Worker sends a ready message immediately after connect
  3. Sol registers the worker and begins routing tasks
  4. Worker sends ready again after each task completes (heartbeat + re-availability)
  5. Reconnection is handled by ZMQ internally (DEALER reconnects automatically)
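Steps 1 and 2 might look like the following in Python. This is a minimal sketch assuming the pyzmq and msgpack packages. The function names `connect_worker` and `send_ready` are illustrative, and reusing the worker_id as the socket identity is an assumption rather than a protocol requirement.

```python
import msgpack
import zmq


def send_ready(sock, worker_id, capabilities):
    """Send a ready message (on connect and again after each task)."""
    ready = {"type": "ready", "worker_id": worker_id, "capabilities": capabilities}
    sock.send(msgpack.packb(ready, use_bin_type=True))


def connect_worker(ctx, endpoint, worker_id, capabilities):
    """Open a DEALER socket, connect to the gateway, and announce availability."""
    sock = ctx.socket(zmq.DEALER)
    # Optional. Assumption: reuse the worker_id as the ZMQ identity.
    # The identity must be set before connect() to take effect.
    sock.setsockopt(zmq.IDENTITY, worker_id.encode("utf-8"))
    sock.connect(endpoint)
    send_ready(sock, worker_id, capabilities)
    return sock
```

A worker would call something like `connect_worker(zmq.Context.instance(), "tcp://localhost:5555", "worker-1", ["tools"])`, where the host is a placeholder and the port is the ROUTER port from the topology above.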

Wire Framing

DEALER/ROUTER uses multipart ZMQ messages. The ROUTER socket prepends the worker's identity frame:

[identity_frame] [payload_frame]

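Workers send only the payload:

[payload_frame]

A DEALER socket does not add an empty delimiter frame (REQ sockets do). The ROUTER prepends the identity frame when it receives the message, and strips it again when it sends to the worker. The payload_frame is a single msgpack-packed map.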
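On the ROUTER side, the multipart message can be separated into routing identity and payload as sketched below. The helper name `split_router_frames` is hypothetical. Tolerating an empty delimiter between the two frames is a defensive assumption, so that the helper also accepts REQ-style envelopes.

```python
def split_router_frames(frames):
    """Return (identity, payload) from a multipart message received on a ROUTER socket."""
    if len(frames) < 2:
        raise ValueError("expected at least an identity frame and a payload frame")
    identity, payload = frames[0], frames[-1]
    # Anything between identity and payload must be an empty delimiter frame.
    if any(middle != b"" for middle in frames[1:-1]):
        raise ValueError("unexpected non-empty frame between identity and payload")
    return identity, payload
```

With pyzmq, `frames` would be the list returned by `recv_multipart()` on the ROUTER socket.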

Encoding Conventions

  • msgpack options: {pack_str, from_binary}, {unpack_str, as_binary}, {map_format, map}
  • All keys are strings on the wire (msgpack str type): binaries in Erlang, never atoms; in Python, raw=False decodes them as str
  • Python: msgpack.packb() with use_bin_type=True (packb() returns bytes for a frame; pack() writes to a stream)
  • Python: msgpack.unpackb() with raw=False
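A round trip with these Python options might look like this. It is a small sketch assuming the msgpack package; the `encode` and `decode` wrappers are illustrative names, not part of the reference worker.

```python
import msgpack


def encode(message):
    # use_bin_type=True packs str as msgpack str and bytes as msgpack bin.
    return msgpack.packb(message, use_bin_type=True)


def decode(payload):
    # raw=False decodes msgpack str values (including map keys) to Python str.
    return msgpack.unpackb(payload, raw=False)


ready = {"type": "ready", "worker_id": "worker-1", "capabilities": ["tools", "streaming"]}
wire = encode(ready)        # bytes, suitable for a single ZMQ frame
roundtrip = decode(wire)    # back to a dict with str keys
```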

Message Types

Worker to Sol

ready -- Announce Availability

Field         Type              Required  Description
type          string            yes       "ready"
worker_id     string            yes       Unique worker identifier
capabilities  array of strings  yes       Feature tags (e.g., ["tools", "streaming"])

Send on connect and after completing each task. Acts as heartbeat and availability signal.

token -- Stream Partial Output

Field    Type    Required  Description
type     string  yes       "token"
task_id  string  yes       Task identifier from the task message
content  string  yes       Partial output text

Send zero or more times during task execution. Enables real-time streaming to HTTP clients.

result -- Task Completed

Field    Type    Required  Description
type     string  yes       "result"
task_id  string  yes       Task identifier
status   string  yes       "ok" or "cancelled"
content  string  yes       Final output text

Send exactly once per task that completes or is cancelled; a failed task sends error instead. The worker then sends ready to move from busy back to ready.

error -- Task Failed

Field    Type    Required  Description
type     string  yes       "error"
task_id  string  yes       Task identifier
error    string  yes       Error description

log -- Diagnostic Log

Field    Type    Required  Description
type     string  yes       "log"
level    string  yes       "info", "warn", or "error"
message  string  yes       Log message text

Optional. Accepted but not processed by Sol.

spawn_ack -- Child Task Accepted

Field           Type    Required  Description
type            string  yes       "spawn_ack"
parent_task_id  string  yes       Parent task identifier
child_task_id   string  yes       Child task identifier
status          string  yes       "ok" or "rejected"

child_result -- Child Task Result

Field           Type    Required  Description
type            string  yes       "child_result"
parent_task_id  string  yes       Parent task identifier
child_task_id   string  yes       Child task identifier
status          string  yes       "ok" or "error"
content         string  yes       Child task output

Sol to Worker

task -- Dispatch Work

Field     Type    Required  Description
type      string  yes       "task"
task_id   string  yes       Unique task identifier
identity  binary  yes       Worker's ZMQ identity (routing frame)
prompt    string  yes       The work to perform
model     string  no        Suggested model for inference
opts      map     no        Additional options
context   string  no        Additional context

Worker must respond with zero or more token messages, followed by exactly one result or error.

cancel -- Abort Running Task

Field    Type    Required  Description
type     string  yes       "cancel"
task_id  string  yes       Task to cancel

Worker should abort and send result with status = "cancelled".

spawn -- Create Child Task

Field           Type    Required  Description
type            string  yes       "spawn"
parent_task_id  string  yes       Parent task identifier
prompt          string  yes       Child task prompt
model           string  no        Suggested model
opts            map     no        Additional options

Worker responds with spawn_ack and later child_result.
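The required fields from the tables in this section can be collected into a lookup table for a basic sanity check on decoded messages. This is an illustrative sketch: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the check covers field presence only, not types or allowed values.

```python
REQUIRED_FIELDS = {
    # Worker to Sol
    "ready": ("worker_id", "capabilities"),
    "token": ("task_id", "content"),
    "result": ("task_id", "status", "content"),
    "error": ("task_id", "error"),
    "log": ("level", "message"),
    "spawn_ack": ("parent_task_id", "child_task_id", "status"),
    "child_result": ("parent_task_id", "child_task_id", "status", "content"),
    # Sol to Worker
    "task": ("task_id", "identity", "prompt"),
    "cancel": ("task_id",),
    "spawn": ("parent_task_id", "prompt"),
}


def missing_fields(message):
    """Return the required fields absent from a decoded message.

    Raises ValueError when "type" is missing or not a known message type.
    """
    msg_type = message.get("type")
    if msg_type not in REQUIRED_FIELDS:
        raise ValueError(f"unknown message type: {msg_type!r}")
    return [name for name in REQUIRED_FIELDS[msg_type] if name not in message]
```

An empty list means every required field is present; optional fields such as model, opts, and context are not checked.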

Lifecycle State Machine

          connect
             |
             v
         +-------+
         | READY |<---------------------------+
         +---+---+                            |
             | receive task                   |
             v                                |
         +-------+  send result/error         |
         | BUSY  |--------------------------->+
         +---+---+                            |
             | receive cancel                 |
             v                                |
         send result ("cancelled") ---------->+

  • Worker starts in READY after sending ready
  • On task: transition to BUSY, execute, send tokens, send result/error, send ready
  • On cancel while BUSY: abort work, send result with "cancelled", send ready
  • On disconnect: Sol removes worker from registry; queued tasks reassigned
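The worker-side transitions can be sketched as a small state machine that returns the messages to send at each step. This is a hypothetical model, not the reference worker: the `WorkerLifecycle` class is an illustrative name, ignoring a cancel for a task that is not running is an assumption, and so is the empty content on a cancelled result.

```python
class WorkerLifecycle:
    """Tracks READY/BUSY and returns the messages a worker should send."""

    def __init__(self, worker_id, capabilities):
        self.worker_id = worker_id
        self.capabilities = capabilities
        self.state = "READY"
        self.task_id = None

    def _ready(self):
        return {"type": "ready", "worker_id": self.worker_id,
                "capabilities": self.capabilities}

    def _complete(self, final_message):
        # Every task ends with one result/error followed by a fresh ready.
        self.state, self.task_id = "READY", None
        return [final_message, self._ready()]

    def connect(self):
        return [self._ready()]

    def on_task(self, task):
        if self.state != "READY":
            raise RuntimeError("received a task while busy")
        self.state, self.task_id = "BUSY", task["task_id"]
        return []

    def finish(self, content):
        return self._complete({"type": "result", "task_id": self.task_id,
                               "status": "ok", "content": content})

    def fail(self, description):
        return self._complete({"type": "error", "task_id": self.task_id,
                               "error": description})

    def on_cancel(self, cancel):
        if self.state != "BUSY" or cancel["task_id"] != self.task_id:
            return []  # assumption: a cancel for a task not running is ignored
        # Assumption: a cancelled result carries empty content.
        return self._complete({"type": "result", "task_id": self.task_id,
                               "status": "cancelled", "content": ""})
```

Token messages are left out because they do not change state; a worker would send them between `on_task` and `finish`.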

PUB/SUB Event Bridge

Sol exposes a PUB socket on port 5556 for event broadcasting:

Sol PUB (5556) --> Subscribers

Workers and external services can subscribe to receive real-time events:

  • Workflow state changes
  • Task lifecycle events
  • Worker registration/deregistration
  • System status updates

Events are msgpack-encoded maps with type and data fields.

Error Handling

Scenario              Behavior
Malformed msgpack     Logged at warning level, message dropped
Unknown message type  Logged at warning level, message dropped
Worker disconnect     Tasks remain queued, reassigned to next worker
Task timeout          Not enforced by ZMQ gateway (handled by HTTP layer)

Reference Implementations

Language           File                             Notes
Lua (Luna)         client/core/zmq_worker.lua       Full agent with streaming, cancel, IOLoop
Python             example/python/zmq_worker.py     Reference worker with handler function
Erlang (Sol)       server/src/sol_zmq_gateway.erl   ROUTER gateway, task routing
Erlang (Protocol)  server/src/sol_zmq_protocol.erl  msgpack encode/decode

Released under the MIT License.