
Completions API

ZERG exposes an OpenAI Chat Completions-compatible endpoint, so any OpenAI SDK or tool can target ZERG as a drop-in replacement.

Overview

The /api/v1/chat/completions endpoint mirrors the OpenAI wire format. ZERG translates each request internally, dispatches it to the configured provider (Anthropic, OpenAI, z.ai, Ollama, etc.), and returns an OpenAI-formatted response. Streaming is supported via Server-Sent Events (SSE).
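Because the wire format matches OpenAI's, many OpenAI-based tools can be redirected to ZERG through environment variables alone. The sketch below assumes the tool reads OPENAI_BASE_URL and OPENAI_API_KEY (recent official OpenAI Python and Node SDKs do; other tools may use different names or flags). ZERG_URL and zerg_openai_env are hypothetical names, and the default host and port are the ones used in the curl examples on this page.

```shell
#!/bin/sh
# Hypothetical helper: point OpenAI-SDK-based tools at ZERG.
# Assumes the tool honors OPENAI_BASE_URL / OPENAI_API_KEY; not all tools do.
zerg_openai_env() {
  base=${ZERG_URL:-http://127.0.0.1:11434}   # ZERG_URL is an assumed variable
  # The SDKs append /chat/completions to the base URL themselves.
  export OPENAI_BASE_URL="${base%/}/api/v1"
  # The SDK sends this value as "Authorization: Bearer <key>".
  export OPENAI_API_KEY="${TOKEN:-}"
}

zerg_openai_env
```

With these variables exported, a client constructed with no explicit base URL or key sends its chat completion requests to ZERG instead of api.openai.com.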

Authentication

Every request requires an Authorization: Bearer <token> header. Tokens are issued by Mango at POST /api/v1/auth/token.
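A small wrapper can fail fast when the token is missing, instead of sending a request that will come back as 401. This is a sketch only: zerg_chat and ZERG_URL are hypothetical names, the default URL is taken from the examples on this page, and the request body is read from stdin so longer JSON does not need to be inlined on the command line.

```shell
#!/bin/sh
# Hypothetical wrapper: POST a JSON body (read from stdin) to the completions
# endpoint with the headers ZERG requires. Extra arguments are passed to curl.
zerg_chat() {
  if [ -z "${TOKEN:-}" ]; then
    echo "zerg_chat: TOKEN is empty; obtain one from POST /api/v1/auth/token" >&2
    return 1
  fi
  curl -sS "${ZERG_URL:-http://127.0.0.1:11434}/api/v1/chat/completions" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    --data-binary @- "$@"
}
```

Usage: echo '{"model": "claude-sonnet-4-20250514", "messages": [{"role": "user", "content": "Hello"}]}' | zerg_chat, adding -N as an extra argument when the body sets "stream": true.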

Endpoints

Create Chat Completion

POST /api/v1/chat/completions

Request Body:

```json
{
  "model": "claude-sonnet-4-20250514",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain quicksort" }
  ],
  "max_tokens": 1024,
  "temperature": 0.7,
  "stream": false
}
```
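Building this body by string interpolation breaks as soon as a prompt contains a double quote or a newline. The sketch below escapes the characters a typical prompt contains, using only sed and awk so it does not depend on jq. It is an assumption-laden helper, not part of ZERG: json_escape and zerg_body are hypothetical names, and tabs and other control characters are deliberately not handled.

```shell
#!/bin/sh
# Hypothetical helpers for building a request body in plain sh.

# Escape backslashes, double quotes, and newlines for a JSON string value.
# Tabs and other control characters are NOT escaped, and a trailing newline
# is dropped; use a real JSON tool if prompts may contain those.
json_escape() {
  printf '%s' "$1" |
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Build a single-turn request: zerg_body MODEL PROMPT [MAX_TOKENS]
# MAX_TOKENS defaults to 1024 here, as in the example above
# (the server-side default when the field is omitted is 4096).
zerg_body() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":%s}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "${3:-1024}"
}
```

The output can be piped straight into curl with --data-binary @-, for example zerg_body claude-sonnet-4-20250514 'Explain "quicksort"' | curl ... --data-binary @-.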

Response (non-streaming):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1745600000,
  "model": "claude-sonnet-4-20250514",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quicksort is a divide-and-conquer algorithm..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 156,
    "total_tokens": 198
  }
}
```

Streaming Mode (stream: true):

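Returns a stream of SSE events. Each event's data: line carries an OpenAI-format JSON chunk such as data: {"choices": [{"delta": {"content": "..."}}]}, where delta.content holds the next fragment of the response text. The final event contains finish_reason and usage.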
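To consume the stream from a shell, the data: payloads need to be separated from the SSE framing. The function below is a minimal sketch: it prints one JSON chunk per line and ignores blank lines, comments, and other SSE fields. It assumes one data: line per event and an OpenAI-style data: [DONE] terminator; this page does not state that ZERG sends [DONE], so the loop also simply ends at end of input. sse_payloads is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read an SSE stream on stdin and print the JSON payload
# of each "data:" line. Assumes one data line per event (OpenAI style).
sse_payloads() {
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}              # tolerate CRLF line endings
    case $line in
      data:*)
        payload=${line#data:}
        payload=${payload#" "}      # the space after "data:" is optional in SSE
        [ "$payload" = "[DONE]" ] && break   # assumed OpenAI-style sentinel
        printf '%s\n' "$payload"
        ;;
    esac                            # blank lines, ": comments", "event:" lines are skipped
  done
}
```

Piping the streaming curl command shown later on this page into sse_payloads yields one JSON chunk per line, ready for a JSON tool to pull out choices[0].delta.content.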

Supported Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | Model identifier from the model catalog |
| messages | array | required | Array of message objects (roles: system, user, assistant, tool) |
| max_tokens | integer | 4096 | Maximum number of tokens in the response |
| temperature | float | 0.7 | Sampling temperature (0.0–2.0) |
| stream | boolean | false | Enable SSE streaming |
| stop | string[] | [] | Stop sequences |
| top_p | float | 1.0 | Nucleus sampling parameter |
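Scripts that take these values from user input can check them before sending. The helpers below are hypothetical client-side guards mirroring the table above (temperature from 0.0 to 2.0, max_tokens a positive integer); the server may enforce different or additional limits, and an out-of-range value would presumably come back as a 400.

```shell
#!/bin/sh
# Hypothetical client-side checks mirroring the parameter table.

# Succeeds if $1 is a plain non-negative number within 0.0-2.0.
zerg_valid_temperature() {
  awk -v t="$1" 'BEGIN { exit !(t ~ /^[0-9]+(\.[0-9]+)?$/ && t + 0 >= 0 && t + 0 <= 2) }'
}

# Succeeds if $1 is a positive integer.
zerg_valid_max_tokens() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;        # empty or contains a non-digit
  esac
  [ "$1" -gt 0 ]
}
```

For example, zerg_valid_temperature "$TEMP" || { echo "temperature must be 0.0-2.0" >&2; exit 1; } stops a script before it sends a bad request.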

Provider Translation:

The ZERG gateway translates OpenAI-format messages into the target provider's wire format. Tool calls are converted in both directions between OpenAI tool_calls and Anthropic tool_use blocks, and context is preserved when a conversation switches providers partway through.

Error Codes:

| Code | Description |
|---|---|
| 400 | Invalid request body or missing required fields |
| 401 | Missing or invalid authentication token |
| 429 | Rate limit exceeded or insufficient quota |
| 500 | Internal server error or provider failure |
| 503 | Provider unavailable or circuit breaker open |
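Of these codes, 429 and 503 describe conditions that may clear on their own, while 400 and 401 call for fixing the request or the token. The sketch below retries only 429 and 503 with exponential backoff. It is a client-side policy of our own, not ZERG behavior: the function names, the 5-attempt limit, and the 30-second cap are arbitrary choices, and a 429 caused by insufficient quota will not resolve by retrying. Whether to also retry 500 is left to the caller.

```shell
#!/bin/sh
# Hypothetical retry policy for the error codes above.

# Succeeds (returns 0) when a status code is worth retrying.
zerg_is_retryable() {
  case $1 in
    429|503) return 0 ;;   # rate limited, or provider unavailable / breaker open
    *) return 1 ;;         # 400/401 need a corrected request or token
  esac
}

# Delay in seconds before retry number $1 (1-based): 1, 2, 4, ... capped at 30.
zerg_backoff_delay() {
  d=1; i=1
  while [ "$i" -lt "$1" ]; do d=$((d * 2)); i=$((i + 1)); done
  if [ "$d" -gt 30 ]; then d=30; fi
  echo "$d"
}

# POST the JSON body in $1, making up to 5 attempts in total.
# Prints the last response body; returns 0 only on a 2xx status.
zerg_post_with_retry() {
  out=$(mktemp) || return 1        # mktemp is widely available but not POSIX
  attempt=1
  while :; do
    status=$(curl -sS -o "$out" -w '%{http_code}' \
      "${ZERG_URL:-http://127.0.0.1:11434}/api/v1/chat/completions" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d "$1")
    if zerg_is_retryable "$status" && [ "$attempt" -lt 5 ]; then
      sleep "$(zerg_backoff_delay "$attempt")"
      attempt=$((attempt + 1))
      continue
    fi
    cat "$out"; rm -f "$out"
    case $status in 2??) return 0 ;; *) return 1 ;; esac
  done
}
```

If curl cannot connect at all, %{http_code} reports 000, which is treated as non-retryable here so that configuration errors surface immediately.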

Example:

```bash
curl http://127.0.0.1:11434/api/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
```

Streaming variant:

```bash
curl -N http://127.0.0.1:11434/api/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Count to 10"}],
    "stream": true
  }'
```

Released under the MIT License.