Kimi (Moonshot AI) · AsyncAPI Specification

Kimi (Moonshot AI) Streaming Chat Completions API

Version 1.0.0

AsyncAPI definition for Moonshot AI's Kimi `POST /v1/chat/completions` streaming response channel. Moonshot's chat completions surface is OpenAI-compatible. When the request body sets `"stream": true`, the server returns a `text/event-stream` response on the same HTTPS connection used for the initial POST. Each Server-Sent Event has a `data:` line whose payload is a JSON `chat.completion.chunk` object. The stream terminates with a literal `data: [DONE]` sentinel. When `stream_options.include_usage` is `true`, an additional chunk is emitted immediately before `data: [DONE]`. That chunk has an empty `choices` array and a populated `usage` field summarising token consumption for the entire request; all preceding chunks include a `usage` field whose value is `null`. This document describes ONLY the streaming SSE channel. The non-streaming JSON response form is covered by the project's OpenAPI document (`openapi/kimi-moonshot-openapi.json`).

View Spec View on GitHub LLMLong ContextAIOpenAI CompatibleMultimodalChinaAsyncAPIWebhooksEvents

Channels

/v1/chat/completions
publish receiveChatCompletionStream
Receive chat completion streaming events
Kimi / Moonshot chat completions streaming channel. The client issues a single HTTPS POST to `/v1/chat/completions` with `"stream": true` in the request body. The server replies with a `text/event-stream` body composed of `chat.completion.chunk` events (one JSON object per SSE `data:` line), optionally followed by a final `usage`-only chunk when `stream_options.include_usage` is set, and terminated by a literal `data: [DONE]` line. Supported request models (per the Moonshot OpenAPI document): `moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k`, `moonshot-v1-auto`, `moonshot-v1-8k-vision-preview`, `moonshot-v1-32k-vision-preview`, `moonshot-v1-128k-vision-preview`, `kimi-k2-0905-preview`, `kimi-k2-0711-preview`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, `kimi-k2-thinking-turbo`, `kimi-k2.5`, `kimi-k2.6`.

Messages

ChatCompletionChunk
chat.completion.chunk
Incremental chat completion chunk emitted during streaming.
ChatCompletionUsageChunk
chat.completion.chunk (usage)
Final usage-only chunk emitted immediately before `data: [DONE]` when `stream_options.include_usage` is true. Carries an empty `choices` array and a populated `usage` object.
ChatCompletionDoneSentinel
[DONE] sentinel
Stream termination sentinel. The literal payload is the ASCII string `[DONE]` (not a JSON object) on a single SSE `data:` line. After this line the server closes the response stream.

Servers

https
production api.moonshot.cn
Moonshot AI production HTTPS endpoint. Streaming chat completions are returned as Server-Sent Events on the same HTTP/1.1 (or HTTP/2) connection used for the initial `POST /v1/chat/completions` request. This is HTTP+SSE, NOT WebSocket.

AsyncAPI Specification

Raw ↑
asyncapi: 2.6.0
info:
  title: Kimi (Moonshot AI) Streaming Chat Completions API
  version: 1.0.0
  description: |
    AsyncAPI definition for Moonshot AI's Kimi `POST /v1/chat/completions`
    streaming response channel.

    Moonshot's chat completions surface is OpenAI-compatible. When the
    request body sets `"stream": true`, the server returns a
    `text/event-stream` response on the same HTTPS connection used for the
    initial POST. Each Server-Sent Event has a `data:` line whose payload
    is a JSON `chat.completion.chunk` object. The stream terminates with
    a literal `data: [DONE]` sentinel.

    When `stream_options.include_usage` is `true`, an additional chunk is
    emitted immediately before `data: [DONE]`. That chunk has an empty
    `choices` array and a populated `usage` field summarising token
    consumption for the entire request; all preceding chunks include a
    `usage` field whose value is `null`.

    This document describes ONLY the streaming SSE channel. The
    non-streaming JSON response form is covered by the project's OpenAPI
    document (`openapi/kimi-moonshot-openapi.json`).
  contact:
    name: Moonshot AI Platform
    url: https://platform.moonshot.cn/docs
  x-transport: HTTP+SSE
  x-not-websocket: true

servers:
  production:
    url: api.moonshot.cn
    protocol: https
    description: |
      Moonshot AI production HTTPS endpoint. Streaming chat completions
      are returned as Server-Sent Events on the same HTTP/1.1 (or HTTP/2)
      connection used for the initial `POST /v1/chat/completions` request.
      This is HTTP+SSE, NOT WebSocket.
    bindings:
      http:
        type: response
        method: POST
        headers:
          type: object
          properties:
            Content-Type:
              type: string
              const: text/event-stream
            Cache-Control:
              type: string
              const: no-cache
            Connection:
              type: string
              const: keep-alive
        bindingVersion: '0.3.0'
    security:
      - bearerAuth: []
    x-transport-details:
      transport: HTTP+SSE
      requestContentType: application/json
      responseContentType: text/event-stream
      framing: "SSE (each event is one JSON document on a `data:` line, with the stream terminated by `data: [DONE]`)"

defaultContentType: application/json

channels:

  /v1/chat/completions:
    description: |
      Kimi / Moonshot chat completions streaming channel. The client
      issues a single HTTPS POST to `/v1/chat/completions` with
      `"stream": true` in the request body. The server replies with a
      `text/event-stream` body composed of `chat.completion.chunk`
      events (one JSON object per SSE `data:` line), optionally followed
      by a final `usage`-only chunk when `stream_options.include_usage`
      is set, and terminated by a literal `data: [DONE]` line.

      Supported request models (per the Moonshot OpenAPI document):
      `moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k`,
      `moonshot-v1-auto`, `moonshot-v1-8k-vision-preview`,
      `moonshot-v1-32k-vision-preview`,
      `moonshot-v1-128k-vision-preview`, `kimi-k2-0905-preview`,
      `kimi-k2-0711-preview`, `kimi-k2-turbo-preview`,
      `kimi-k2-thinking`, `kimi-k2-thinking-turbo`, `kimi-k2.5`,
      `kimi-k2.6`.
    bindings:
      http:
        type: request
        method: POST
        bindingVersion: '0.3.0'
    publish:
      operationId: receiveChatCompletionStream
      summary: Receive chat completion streaming events
      description: |
        Server-Sent Events streamed back from
        `POST /v1/chat/completions` when the request body sets
        `stream: true`. Events are JSON-encoded
        `chat.completion.chunk` objects; the stream terminates with the
        literal sentinel `data: [DONE]`.
      message:
        oneOf:
          - $ref: '#/components/messages/ChatCompletionChunk'
          - $ref: '#/components/messages/ChatCompletionUsageChunk'
          - $ref: '#/components/messages/ChatCompletionDoneSentinel'

components:

  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API Key
      description: |
        Moonshot API key (`MOONSHOT_API_KEY`) passed as
        `Authorization: Bearer <MOONSHOT_API_KEY>`. Generated from the
        Moonshot platform console at
        https://platform.kimi.com/console/api-keys.

  messages:

    ChatCompletionChunk:
      name: chatCompletionChunk
      title: chat.completion.chunk
      summary: Incremental chat completion chunk emitted during streaming.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/ChatCompletionChunk'

    ChatCompletionUsageChunk:
      name: chatCompletionUsageChunk
      title: chat.completion.chunk (usage)
      summary: |
        Final usage-only chunk emitted immediately before
        `data: [DONE]` when `stream_options.include_usage` is true.
        Carries an empty `choices` array and a populated `usage` object.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/ChatCompletionUsageChunk'

    ChatCompletionDoneSentinel:
      name: chatCompletionDone
      title: '[DONE] sentinel'
      summary: |
        Stream termination sentinel. The literal payload is the ASCII
        string `[DONE]` (not a JSON object) on a single SSE `data:`
        line. After this line the server closes the response stream.
      contentType: text/plain
      payload:
        $ref: '#/components/schemas/DoneSentinel'

  schemas:

    ChatCompletionChunk:
      type: object
      description: |
        Streaming chunk for `POST /v1/chat/completions` when
        `stream: true`. Mirrors the OpenAI-compatible
        `chat.completion.chunk` shape that Moonshot returns.
      required: [id, object, created, model, choices]
      properties:
        id:
          type: string
          description: Unique identifier for the completion. Stable across all chunks in a single stream.
        object:
          type: string
          enum: [chat.completion.chunk]
        created:
          type: integer
          description: Unix timestamp (seconds) for when the completion was created.
        model:
          type: string
          description: Model that produced the completion (echoes the request `model`).
        choices:
          type: array
          description: |
            Per-choice incremental deltas. Length matches the request's
            `n` parameter (default 1). On the terminal chunk for a choice,
            `finish_reason` is populated.
          items:
            $ref: '#/components/schemas/ChatCompletionChunkChoice'
        usage:
          description: |
            Per-chunk usage field. `null` on intermediate chunks. Only
            populated on the dedicated final usage chunk emitted when
            `stream_options.include_usage` is true; see
            `ChatCompletionUsageChunk`.
          oneOf:
            - type: 'null'
            - $ref: '#/components/schemas/Usage'

    ChatCompletionChunkChoice:
      type: object
      required: [index, delta]
      properties:
        index:
          type: integer
          description: Index of this choice in the `choices` array.
        delta:
          $ref: '#/components/schemas/ChatCompletionChunkDelta'
        finish_reason:
          description: |
            Reason the model stopped emitting tokens for this choice.
            `null` on every chunk except the terminal one for the choice.
          oneOf:
            - type: 'null'
            - type: string
              enum: [stop, length, tool_calls]

    ChatCompletionChunkDelta:
      type: object
      description: |
        Incremental delta for a single choice. The first chunk for a
        choice typically carries `role: assistant`; subsequent chunks
        carry incremental `content` text and/or `tool_calls` argument
        fragments; the terminal chunk for a choice carries an empty
        delta and a populated `finish_reason` on the parent.
      properties:
        role:
          type: string
          enum: [assistant]
          description: Role of the streamed message. Present on the first delta for a choice.
        content:
          description: |
            Next fragment of assistant-generated text. May be `null` or
            absent on chunks that only carry `tool_calls` deltas.
          oneOf:
            - type: 'null'
            - type: string
        tool_calls:
          type: array
          description: |
            Incremental tool-call deltas. Each entry carries the
            tool-call `index`, an optional stable `id`, `type`, and a
            `function` object whose `name` is sent on the first delta
            for that tool call and whose `arguments` field is streamed
            as a JSON-fragment string across subsequent deltas.
          items:
            $ref: '#/components/schemas/ChatCompletionChunkToolCallDelta'

    ChatCompletionChunkToolCallDelta:
      type: object
      required: [index]
      properties:
        index:
          type: integer
          description: Stable index of the tool call within the assistant message.
        id:
          type: string
          description: Stable tool-call identifier. Typically present only on the first delta for a tool call.
        type:
          type: string
          enum: [function]
        function:
          type: object
          properties:
            name:
              type: string
              description: Function name. Typically present only on the first delta for a tool call.
            arguments:
              type: string
              description: |
                Incremental fragment of the function arguments JSON
                string. The complete arguments JSON is reconstructed by
                concatenating these fragments in order.

    ChatCompletionUsageChunk:
      type: object
      description: |
        Final chunk emitted when `stream_options.include_usage` is set
        on the request. Structurally identical to
        `ChatCompletionChunk` but with an empty `choices` array and a
        populated `usage` object describing total token consumption for
        the entire request.
      required: [id, object, created, model, choices, usage]
      properties:
        id:
          type: string
        object:
          type: string
          enum: [chat.completion.chunk]
        created:
          type: integer
        model:
          type: string
        choices:
          type: array
          maxItems: 0
          description: Always an empty array on the usage-only chunk.
          items: {}
        usage:
          $ref: '#/components/schemas/Usage'

    Usage:
      type: object
      description: |
        Token-usage summary for the request. Mirrors the
        non-streaming `usage` object documented on the Moonshot chat
        completion response.
      properties:
        prompt_tokens:
          type: integer
          description: Number of tokens in the prompt.
        completion_tokens:
          type: integer
          description: Number of tokens in the completion.
        total_tokens:
          type: integer
          description: Total tokens consumed by the request.

    DoneSentinel:
      type: string
      enum: ['[DONE]']
      description: |
        Literal SSE termination sentinel. Emitted as the payload of the
        final `data:` line in the stream. Not JSON; the line is
        exactly `data: [DONE]` on the wire.