Completions API
ZERG exposes an endpoint compatible with the OpenAI Chat Completions API, so any OpenAI SDK or tool can target ZERG as a drop-in replacement.
Overview
The /api/v1/chat/completions endpoint mirrors the OpenAI wire format. ZERG translates the request internally, dispatches to the configured provider (Anthropic, OpenAI, z.ai, Ollama, etc.), and returns OpenAI-formatted responses. Streaming is supported via SSE.
Authentication
Every request must include an Authorization: Bearer <token> header. The token is issued by Mango at POST /api/v1/auth/token.
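As a minimal sketch of attaching the token on the client side, the helper below builds the request headers. The `TOKEN` environment variable mirrors the curl examples in this document; the helper name `auth_headers` is hypothetical and not part of ZERG. Obtaining the token from POST /api/v1/auth/token is not shown, because that endpoint's request body is not documented here.

```python
import os


def auth_headers(token=None):
    """Build headers for a ZERG request (hypothetical helper, not part of ZERG)."""
    # Fall back to the TOKEN environment variable used by the curl examples.
    if token is None:
        token = os.environ.get("TOKEN", "")
    token = token.strip()
    if not token:
        raise ValueError("no bearer token available")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Failing early on an empty token avoids a round trip that would only come back as a 401.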
Endpoints
Create Chat Completion
POST /api/v1/chat/completions
Request Body:
{
  "model": "claude-sonnet-4-20250514",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain quicksort" }
  ],
  "max_tokens": 1024,
  "temperature": 0.7,
  "stream": false
}
Response (non-streaming):
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1745600000,
  "model": "claude-sonnet-4-20250514",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quicksort is a divide-and-conquer algorithm..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 156,
    "total_tokens": 198
  }
}
Streaming Mode (stream: true):
Returns server-sent events (SSE), each carrying a data: {"choices": [{"delta": {"content": "..."}}]} fragment. The final event contains finish_reason and usage.
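The following is a rough sketch of how a client might reassemble a streamed reply from those fragments. It operates on already-decoded text lines, so it runs without a server. The function name `collect_stream` is hypothetical, and the handling of a `data: [DONE]` sentinel is an assumption borrowed from the OpenAI convention; this document does not state whether ZERG emits one.

```python
import json


def collect_stream(lines):
    """Accumulate delta content from an iterable of SSE text lines.

    Returns (text, finish_reason, usage). Treating "[DONE]" as an
    end-of-stream marker is an assumption, not documented ZERG behavior.
    """
    parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, comments, and non-data fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        for choice in event.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if event.get("usage"):
            usage = event["usage"]
    return "".join(parts), finish_reason, usage
```

When reading from a live HTTP response, each line arrives as bytes and would need `.decode("utf-8")` before being passed in.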
Supported Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string | required | Model identifier from the model catalog |
| messages | array | required | Array of message objects (system, user, assistant, tool) |
| max_tokens | integer | 4096 | Maximum tokens in the response |
| temperature | float | 0.7 | Sampling temperature (0.0–2.0) |
| stream | boolean | false | Enable SSE streaming |
| stop | string[] | [] | Stop sequences |
| top_p | float | 1.0 | Nucleus sampling parameter |
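To make the defaults in the table concrete, here is a small client-side sketch that fills in omitted parameters and rejects obviously invalid bodies before they are sent. The name `normalize_request` is hypothetical, and the checks are an assumption about what is worth validating locally; the server's own validation rules are not documented here and may differ.

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "max_tokens": 4096,
    "temperature": 0.7,
    "stream": False,
    "stop": [],
    "top_p": 1.0,
}
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}


def normalize_request(body):
    """Return a copy of body with table defaults applied; raise ValueError if invalid."""
    for field in ("model", "messages"):
        if field not in body:
            raise ValueError(f"missing required field: {field}")
    if not isinstance(body["messages"], list):
        raise ValueError("messages must be an array")
    for message in body["messages"]:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    merged = {**DEFAULTS, **body}
    merged["stop"] = list(merged["stop"])  # do not share the default list
    if not 0.0 <= merged["temperature"] <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    return merged
```

Catching these cases locally saves a request that would otherwise presumably come back as a 400.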
Provider Translation:
The ZERG gateway translates OpenAI messages to the target provider's wire format. Tool calls are converted bidirectionally between OpenAI tool_calls and Anthropic tool_use blocks. Mid-conversation provider switching preserves context.
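To illustrate what the bidirectional tool-call conversion involves, the sketch below maps between the two public message shapes. It is not ZERG's implementation, which is not shown in this document; the function names are hypothetical. The main difference it handles is that OpenAI carries tool arguments as a JSON string, while Anthropic carries them as an object under `input`.

```python
import json


def openai_tool_calls_to_anthropic(message):
    """Convert an OpenAI assistant message into Anthropic content blocks."""
    blocks = []
    if message.get("content"):
        blocks.append({"type": "text", "text": message["content"]})
    for call in message.get("tool_calls") or []:
        blocks.append({
            "type": "tool_use",
            "id": call["id"],
            "name": call["function"]["name"],
            # OpenAI sends arguments as a JSON string; Anthropic uses an object.
            "input": json.loads(call["function"]["arguments"] or "{}"),
        })
    return blocks


def anthropic_blocks_to_openai(blocks):
    """Convert Anthropic content blocks back into an OpenAI assistant message."""
    text = "".join(b["text"] for b in blocks if b["type"] == "text")
    tool_calls = [
        {
            "id": b["id"],
            "type": "function",
            "function": {"name": b["name"], "arguments": json.dumps(b["input"])},
        }
        for b in blocks
        if b["type"] == "tool_use"
    ]
    message = {"role": "assistant", "content": text or None}
    if tool_calls:
        message["tool_calls"] = tool_calls
    return message
```

Preserving the call `id` in both directions matters, since a later tool result has to reference the same identifier regardless of which provider produced the call.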
Error Codes:
| Code | Description |
|---|---|
| 400 | Invalid request body or missing required fields |
| 401 | Missing or invalid authentication token |
| 429 | Rate limit exceeded or insufficient quota |
| 500 | Internal server error or provider failure |
| 503 | Provider unavailable or circuit breaker open |
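One way a client might act on these codes is sketched below. The retry policy is an assumption, not something ZERG prescribes: it treats 429, 500, and 503 as possibly transient and 400 and 401 as permanent. Note that a 429 caused by insufficient quota will not be fixed by retrying, so a real client would want to inspect the error body as well. The names `should_retry` and `backoff_seconds` are hypothetical.

```python
# Status codes from the table that may succeed on a later attempt (assumed policy).
RETRYABLE = {429, 500, 503}


def should_retry(status, attempt, max_attempts=4):
    """Return True when a failed request is worth retrying."""
    return status in RETRYABLE and attempt < max_attempts


def backoff_seconds(attempt, base=0.5, cap=8.0):
    """Exponential backoff delay for a 1-based attempt number, capped at cap."""
    return min(cap, base * (2 ** (attempt - 1)))
```

A caller would sleep for `backoff_seconds(attempt)` between attempts and stop as soon as `should_retry` returns False.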
Example:
curl http://127.0.0.1:11434/api/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
Streaming variant:
curl -N http://127.0.0.1:11434/api/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "messages": [{"role": "user", "content": "Count to 10"}],
    "stream": true
  }'
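For readers who prefer a script to curl, the non-streaming request can be sketched with the Python standard library as follows. The base URL is taken from the curl examples; the function names `build_chat_request`, `extract_reply`, and `send` are hypothetical. Building the request is separated from sending it so the construction can be inspected without a running server.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:11434"  # host and port taken from the curl examples


def build_chat_request(token, model, messages, **params):
    """Build (but do not send) a POST request for /api/v1/chat/completions."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        BASE_URL + "/api/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response):
    """Pull the assistant text and total token count out of a parsed response."""
    choice = response["choices"][0]
    return choice["message"]["content"], response["usage"]["total_tokens"]


def send(request):
    """Send the request and return the parsed JSON body (requires a running server)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

A typical call would be `extract_reply(send(build_chat_request(token, "claude-sonnet-4-20250514", [{"role": "user", "content": "Hello"}], max_tokens=100)))`, with `token` read from wherever the client stores it.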