AsyncAPI definition for Moonshot AI's Kimi `POST /v1/chat/completions` streaming response channel. Moonshot's chat completions surface is OpenAI-compatible. When the request body sets `"stream": true`, the server returns a `text/event-stream` response on the same HTTPS connection used for the initial POST. Each Server-Sent Event has a `data:` line whose payload is a JSON `chat.completion.chunk` object. The stream terminates with a literal `data: [DONE]` sentinel. When `stream_options.include_usage` is `true`, an additional chunk is emitted immediately before `data: [DONE]`. That chunk has an empty `choices` array and a populated `usage` field summarising token consumption for the entire request; all preceding chunks include a `usage` field whose value is `null`. This document describes ONLY the streaming SSE channel. The non-streaming JSON response form is covered by the project's OpenAPI document (`openapi/kimi-moonshot-openapi.json`).
View SpecView on GitHubLLMLong ContextAIOpenAI CompatibleMultimodalChinaAsyncAPIWebhooksEvents
Channels
/v1/chat/completions
publishreceiveChatCompletionStream
Receive chat completion streaming events
Kimi / Moonshot chat completions streaming channel. The client issues a single HTTPS POST to `/v1/chat/completions` with `"stream": true` in the request body. The server replies with a `text/event-stream` body composed of `chat.completion.chunk` events (one JSON object per SSE `data:` line), optionally followed by a final `usage`-only chunk when `stream_options.include_usage` is set, and terminated by a literal `data: [DONE]` line. Supported request models (per the Moonshot OpenAPI document): `moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k`, `moonshot-v1-auto`, `moonshot-v1-8k-vision-preview`, `moonshot-v1-32k-vision-preview`, `moonshot-v1-128k-vision-preview`, `kimi-k2-0905-preview`, `kimi-k2-0711-preview`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, `kimi-k2-thinking-turbo`, `kimi-k2.5`, `kimi-k2.6`.
Messages
✉
ChatCompletionChunk
chat.completion.chunk
Incremental chat completion chunk emitted during streaming.
✉
ChatCompletionUsageChunk
chat.completion.chunk (usage)
Final usage-only chunk emitted immediately before `data: [DONE]` when `stream_options.include_usage` is true. Carries an empty `choices` array and a populated `usage` object.
✉
ChatCompletionDoneSentinel
[DONE] sentinel
Stream termination sentinel. The literal payload is the ASCII string `[DONE]` (not a JSON object) on a single SSE `data:` line. After this line the server closes the response stream.
Servers
https
productionapi.moonshot.cn
Moonshot AI production HTTPS endpoint. Streaming chat completions are returned as Server-Sent Events on the same HTTP/1.1 (or HTTP/2) connection used for the initial `POST /v1/chat/completions` request. This is HTTP+SSE, NOT WebSocket.
asyncapi: 2.6.0
info:
title: Kimi (Moonshot AI) Streaming Chat Completions API
version: 1.0.0
description: |
AsyncAPI definition for Moonshot AI's Kimi `POST /v1/chat/completions`
streaming response channel.
Moonshot's chat completions surface is OpenAI-compatible. When the
request body sets `"stream": true`, the server returns a
`text/event-stream` response on the same HTTPS connection used for the
initial POST. Each Server-Sent Event has a `data:` line whose payload
is a JSON `chat.completion.chunk` object. The stream terminates with
a literal `data: [DONE]` sentinel.
When `stream_options.include_usage` is `true`, an additional chunk is
emitted immediately before `data: [DONE]`. That chunk has an empty
`choices` array and a populated `usage` field summarising token
consumption for the entire request; all preceding chunks include a
`usage` field whose value is `null`.
This document describes ONLY the streaming SSE channel. The
non-streaming JSON response form is covered by the project's OpenAPI
document (`openapi/kimi-moonshot-openapi.json`).
contact:
name: Moonshot AI Platform
url: https://platform.moonshot.cn/docs
x-transport: HTTP+SSE
x-not-websocket: true
servers:
production:
url: api.moonshot.cn
protocol: https
description: |
Moonshot AI production HTTPS endpoint. Streaming chat completions
are returned as Server-Sent Events on the same HTTP/1.1 (or HTTP/2)
connection used for the initial `POST /v1/chat/completions` request.
This is HTTP+SSE, NOT WebSocket.
bindings:
http:
type: response
method: POST
headers:
type: object
properties:
Content-Type:
type: string
const: text/event-stream
Cache-Control:
type: string
const: no-cache
Connection:
type: string
const: keep-alive
bindingVersion: '0.3.0'
security:
- bearerAuth: []
x-transport-details:
transport: HTTP+SSE
requestContentType: application/json
responseContentType: text/event-stream
framing: "SSE (each event is one JSON document on a `data:` line, with the stream terminated by `data: [DONE]`)"
defaultContentType: application/json
channels:
/v1/chat/completions:
description: |
Kimi / Moonshot chat completions streaming channel. The client
issues a single HTTPS POST to `/v1/chat/completions` with
`"stream": true` in the request body. The server replies with a
`text/event-stream` body composed of `chat.completion.chunk`
events (one JSON object per SSE `data:` line), optionally followed
by a final `usage`-only chunk when `stream_options.include_usage`
is set, and terminated by a literal `data: [DONE]` line.
Supported request models (per the Moonshot OpenAPI document):
`moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k`,
`moonshot-v1-auto`, `moonshot-v1-8k-vision-preview`,
`moonshot-v1-32k-vision-preview`,
`moonshot-v1-128k-vision-preview`, `kimi-k2-0905-preview`,
`kimi-k2-0711-preview`, `kimi-k2-turbo-preview`,
`kimi-k2-thinking`, `kimi-k2-thinking-turbo`, `kimi-k2.5`,
`kimi-k2.6`.
bindings:
http:
type: request
method: POST
bindingVersion: '0.3.0'
publish:
operationId: receiveChatCompletionStream
summary: Receive chat completion streaming events
description: |
Server-Sent Events streamed back from
`POST /v1/chat/completions` when the request body sets
`stream: true`. Events are JSON-encoded
`chat.completion.chunk` objects; the stream terminates with the
literal sentinel `data: [DONE]`.
message:
oneOf:
- $ref: '#/components/messages/ChatCompletionChunk'
- $ref: '#/components/messages/ChatCompletionUsageChunk'
- $ref: '#/components/messages/ChatCompletionDoneSentinel'
components:
securitySchemes:
bearerAuth:
type: http
scheme: bearer
bearerFormat: API Key
description: |
Moonshot API key (`MOONSHOT_API_KEY`) passed as
`Authorization: Bearer <MOONSHOT_API_KEY>`. Generated from the
Moonshot platform console at
https://platform.kimi.com/console/api-keys.
messages:
ChatCompletionChunk:
name: chatCompletionChunk
title: chat.completion.chunk
summary: Incremental chat completion chunk emitted during streaming.
contentType: application/json
payload:
$ref: '#/components/schemas/ChatCompletionChunk'
ChatCompletionUsageChunk:
name: chatCompletionUsageChunk
title: chat.completion.chunk (usage)
summary: |
Final usage-only chunk emitted immediately before
`data: [DONE]` when `stream_options.include_usage` is true.
Carries an empty `choices` array and a populated `usage` object.
contentType: application/json
payload:
$ref: '#/components/schemas/ChatCompletionUsageChunk'
ChatCompletionDoneSentinel:
name: chatCompletionDone
title: '[DONE] sentinel'
summary: |
Stream termination sentinel. The literal payload is the ASCII
string `[DONE]` (not a JSON object) on a single SSE `data:`
line. After this line the server closes the response stream.
contentType: text/plain
payload:
$ref: '#/components/schemas/DoneSentinel'
schemas:
ChatCompletionChunk:
type: object
description: |
Streaming chunk for `POST /v1/chat/completions` when
`stream: true`. Mirrors the OpenAI-compatible
`chat.completion.chunk` shape that Moonshot returns.
required: [id, object, created, model, choices]
properties:
id:
type: string
description: Unique identifier for the completion. Stable across all chunks in a single stream.
object:
type: string
enum: [chat.completion.chunk]
created:
type: integer
description: Unix timestamp (seconds) for when the completion was created.
model:
type: string
description: Model that produced the completion (echoes the request `model`).
choices:
type: array
description: |
Per-choice incremental deltas. Length matches the request's
`n` parameter (default 1). On the terminal chunk for a choice,
`finish_reason` is populated.
items:
$ref: '#/components/schemas/ChatCompletionChunkChoice'
usage:
description: |
Per-chunk usage field. `null` on intermediate chunks. Only
populated on the dedicated final usage chunk emitted when
`stream_options.include_usage` is true; see
`ChatCompletionUsageChunk`.
oneOf:
- type: 'null'
- $ref: '#/components/schemas/Usage'
ChatCompletionChunkChoice:
type: object
required: [index, delta]
properties:
index:
type: integer
description: Index of this choice in the `choices` array.
delta:
$ref: '#/components/schemas/ChatCompletionChunkDelta'
finish_reason:
description: |
Reason the model stopped emitting tokens for this choice.
`null` on every chunk except the terminal one for the choice.
oneOf:
- type: 'null'
- type: string
enum: [stop, length, tool_calls]
ChatCompletionChunkDelta:
type: object
description: |
Incremental delta for a single choice. The first chunk for a
choice typically carries `role: assistant`; subsequent chunks
carry incremental `content` text and/or `tool_calls` argument
fragments; the terminal chunk for a choice carries an empty
delta and a populated `finish_reason` on the parent.
properties:
role:
type: string
enum: [assistant]
description: Role of the streamed message. Present on the first delta for a choice.
content:
description: |
Next fragment of assistant-generated text. May be `null` or
absent on chunks that only carry `tool_calls` deltas.
oneOf:
- type: 'null'
- type: string
tool_calls:
type: array
description: |
Incremental tool-call deltas. Each entry carries the
tool-call `index`, an optional stable `id`, `type`, and a
`function` object whose `name` is sent on the first delta
for that tool call and whose `arguments` field is streamed
as a JSON-fragment string across subsequent deltas.
items:
$ref: '#/components/schemas/ChatCompletionChunkToolCallDelta'
ChatCompletionChunkToolCallDelta:
type: object
required: [index]
properties:
index:
type: integer
description: Stable index of the tool call within the assistant message.
id:
type: string
description: Stable tool-call identifier. Typically present only on the first delta for a tool call.
type:
type: string
enum: [function]
function:
type: object
properties:
name:
type: string
description: Function name. Typically present only on the first delta for a tool call.
arguments:
type: string
description: |
Incremental fragment of the function arguments JSON
string. The complete arguments JSON is reconstructed by
concatenating these fragments in order.
ChatCompletionUsageChunk:
type: object
description: |
Final chunk emitted when `stream_options.include_usage` is set
on the request. Structurally identical to
`ChatCompletionChunk` but with an empty `choices` array and a
populated `usage` object describing total token consumption for
the entire request.
required: [id, object, created, model, choices, usage]
properties:
id:
type: string
object:
type: string
enum: [chat.completion.chunk]
created:
type: integer
model:
type: string
choices:
type: array
maxItems: 0
description: Always an empty array on the usage-only chunk.
items: {}
usage:
$ref: '#/components/schemas/Usage'
Usage:
type: object
description: |
Token-usage summary for the request. Mirrors the
non-streaming `usage` object documented on the Moonshot chat
completion response.
properties:
prompt_tokens:
type: integer
description: Number of tokens in the prompt.
completion_tokens:
type: integer
description: Number of tokens in the completion.
total_tokens:
type: integer
description: Total tokens consumed by the request.
DoneSentinel:
type: string
enum: ['[DONE]']
description: |
Literal SSE termination sentinel. Emitted as the payload of the
final `data:` line in the stream. Not JSON; the line is
exactly `data: [DONE]` on the wire.