Completions API

ZERG exposes an endpoint compatible with the OpenAI Chat Completions API. OpenAI SDKs and tools can therefore target ZERG as a drop-in replacement.

Overview

The /api/v1/chat/completions endpoint mirrors the OpenAI wire format. ZERG translates the request internally, dispatches it to the configured provider (Anthropic, OpenAI, z.ai, Ollama, etc.), and returns the response in OpenAI format. Streaming is supported via Server-Sent Events (SSE).

Authentication

Every request must include an Authorization: Bearer <token> header. Mango issues the token at POST /api/v1/auth/token.
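As a minimal sketch of how a client might attach the token, the following builds (but does not send) an authenticated request using only the Python standard library. The base URL is taken from the curl examples on this page; the helper name `build_chat_request` and the token value are hypothetical, and how the token is obtained from Mango is not shown because its request format is not documented here.

```python
import json
import urllib.request

# Assumption: same host and port as the curl examples on this page.
BASE_URL = "http://127.0.0.1:11434"


def build_chat_request(token: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated POST for the chat completions endpoint.

    Hypothetical helper for illustration; it only constructs the request.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "example-token",  # placeholder, not a real token
    {
        "model": "claude-sonnet-4-20250514",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
```

Passing `req` to `urllib.request.urlopen` would send it; a request without a valid token should come back as a 401 according to the error table.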

Endpoints

Create Chat Completion

POST /api/v1/chat/completions

Request Body:

json

{
  "model": "claude-sonnet-4-20250514",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain quicksort" }
  ],
  "max_tokens": 1024,
  "temperature": 0.7,
  "stream": false
}

Response (non-streaming):

json

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1745600000,
  "model": "claude-sonnet-4-20250514",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quicksort is a divide-and-conquer algorithm..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 156,
    "total_tokens": 198
  }
}
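To illustrate how the non-streaming response might be consumed, here is a small sketch that pulls the assistant text, finish reason, and token total out of a body shaped like the one above. The function name `extract_reply` is hypothetical, and the sketch assumes a single choice at index 0.

```python
import json


def extract_reply(body: str) -> dict:
    """Return the assistant text, finish reason, and total token count.

    Assumes the response has at least one choice and a usage object,
    as in the sample response on this page.
    """
    data = json.loads(body)
    choice = data["choices"][0]
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": data["usage"]["total_tokens"],
    }


sample_body = json.dumps({
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1745600000,
    "model": "claude-sonnet-4-20250514",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Quicksort is a divide-and-conquer algorithm..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 42, "completion_tokens": 156, "total_tokens": 198},
})

reply = extract_reply(sample_body)
```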

Streaming Mode (stream: true):

Returns Server-Sent Events. Each event carries a data: {"choices": [{"delta": {"content": "..."}}]} fragment holding the next piece of the reply. The final event contains finish_reason and usage.
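The fragments can be reassembled on the client by concatenating each `delta.content`. The sketch below works on an iterable of already-decoded SSE lines, so it runs without a server. The function name `collect_stream` is hypothetical. Handling of a `data: [DONE]` sentinel is an assumption borrowed from the OpenAI stream format; this page does not state whether ZERG sends it.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Join streamed delta fragments and capture the final finish_reason."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed sentinel, tolerated if present
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


sample_lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]

text, reason = collect_stream(sample_lines)
```

In a real client the lines would come from the HTTP response body read incrementally, for example by iterating over the response object returned by `urllib.request.urlopen`.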

Supported Parameters:

Parameter	Type	Default	Description
`model`	string	required	Model identifier from the model catalog
`messages`	array	required	Array of message objects (system, user, assistant, tool)
`max_tokens`	integer	4096	Maximum tokens in the response
`temperature`	float	0.7	Sampling temperature (0.0–2.0)
`stream`	boolean	false	Enable SSE streaming
`stop`	string[]	[]	Stop sequences
`top_p`	float	1.0	Nucleus sampling parameter
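The table can be mirrored on the client to fill in defaults and catch obvious mistakes before a request is sent. This is a sketch under the assumption that the listed defaults and the 0.0 to 2.0 temperature range are all that needs checking; the name `normalize_request` is hypothetical, and how the server itself validates requests is not described here.

```python
import copy

# Defaults and required fields copied from the parameter table.
DEFAULTS = {
    "max_tokens": 4096,
    "temperature": 0.7,
    "stream": False,
    "stop": [],
    "top_p": 1.0,
}
REQUIRED = ("model", "messages")


def normalize_request(payload: dict) -> dict:
    """Return a copy of payload with table defaults applied.

    Raises ValueError for missing required fields or an out-of-range
    temperature.
    """
    missing = [key for key in REQUIRED if key not in payload]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # deepcopy so callers never share the mutable default "stop" list
    merged = {**copy.deepcopy(DEFAULTS), **payload}
    if not 0.0 <= merged["temperature"] <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    return merged


normalized = normalize_request({
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100,
})
```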

Provider Translation:

The ZERG gateway translates OpenAI-format messages into the target provider's wire format. Tool calls are converted in both directions between OpenAI tool_calls and Anthropic tool_use blocks. Switching providers mid-conversation preserves context.
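To make the tool-call conversion concrete, the sketch below maps a single OpenAI `tool_calls` entry to an Anthropic `tool_use` content block and back, following the publicly documented shapes of both formats. It is not ZERG's internal implementation, which is not shown on this page; the function names and the `get_weather` tool are hypothetical.

```python
import json


def openai_tool_call_to_anthropic(call: dict) -> dict:
    """Convert one OpenAI tool_calls entry into an Anthropic tool_use block.

    OpenAI carries arguments as a JSON-encoded string; Anthropic carries
    them as a parsed object under "input".
    """
    fn = call["function"]
    return {
        "type": "tool_use",
        "id": call["id"],
        "name": fn["name"],
        "input": json.loads(fn.get("arguments") or "{}"),
    }


def anthropic_tool_use_to_openai(block: dict) -> dict:
    """Convert one Anthropic tool_use block into an OpenAI tool_calls entry."""
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            "arguments": json.dumps(block["input"]),
        },
    }


openai_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'},
}

anthropic_block = openai_tool_call_to_anthropic(openai_call)
round_trip = anthropic_tool_use_to_openai(anthropic_block)
```

The main asymmetry is the argument encoding, so a round trip preserves the parsed arguments but not necessarily the exact whitespace of the original JSON string.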

Error Codes:

Code	Description
400	Invalid request body or missing required fields
401	Missing or invalid authentication token
429	Rate limit exceeded or insufficient quota
500	Internal server error or provider failure
503	Provider unavailable or circuit breaker open
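A client might use these codes to decide whether a retry is worthwhile. The sketch below treats 429, 500, and 503 as retryable and 400 and 401 as permanent; that split is an assumption, not documented ZERG behavior, and a 429 caused by insufficient quota may not clear on retry. The helper names and the backoff parameters are hypothetical.

```python
# Assumption: these statuses may succeed on a later attempt.
RETRYABLE_STATUSES = {429, 500, 503}


def is_retryable(status: int) -> bool:
    """Return True if a request with this status is worth retrying."""
    return status in RETRYABLE_STATUSES


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> list:
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


delays = backoff_delays(5)
```

Adding random jitter to each delay is a common refinement when many clients may retry at once.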

Example:

bash

curl http://127.0.0.1:11434/api/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'

Streaming variant:

bash

curl -N http://127.0.0.1:11434/api/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Count to 10"}],
    "stream": true
  }'
