Together AI · AsyncAPI Specification

Together AI Streaming Inference API

Version 1.0.0

AsyncAPI 2.6 description of Together AI's streaming (Server-Sent Events) inference surface. Together AI exposes OpenAI-compatible HTTP endpoints that upgrade to a `text/event-stream` response when the client sets `"stream": true` in the request body. Each `data:` line in the response carries a JSON-encoded chunk; the stream terminates with the sentinel `data: [DONE]`. This document covers the three documented SSE-capable surfaces: * `POST /chat/completions` — token-level deltas for chat models * `POST /completions` — token-level deltas for legacy text completion models * `POST /audio/speech` — base64-encoded PCM/raw audio chunks for text-to-speech models Endpoints that do not stream (embeddings, image generations, rerank, files, models, batch, fine-tuning, etc.) are intentionally omitted.

View Spec View on GitHub AILLMInferenceFoundation ModelsGPUOpen Source AIAsyncAPIWebhooksEvents

Channels

/chat/completions

publish createStreamingChatCompletion

Open a streaming chat completion request.

OpenAI-compatible chat completions. Send a `POST` with `"stream": true` and a chat `messages` array. The server responds with `text/event-stream` and emits one `chat.completion.chunk` per token (or token group) followed by a `[DONE]` sentinel.

/completions

publish createStreamingCompletion

Open a streaming text completion request.

Legacy text completions endpoint. With `"stream": true` the server returns `text/event-stream` emitting `completion.chunk` events terminated by `[DONE]`.

/audio/speech

publish createStreamingAudioSpeech

Open a streaming text-to-speech request.

Text-to-speech (TTS). When `"stream": true` the server responds with `text/event-stream` emitting `audio.tts.chunk` events containing base64-encoded raw PCM audio. When streaming, the only supported `response_format` is `raw`. The stream terminates with `[DONE]`.

Messages

✉

ChatCompletionRequest

Chat Completion Request

JSON body posted by the client to open a streaming chat session.

✉

ChatCompletionChunk

Chat Completion Chunk

A single `data:` event emitted while streaming a chat completion.

✉

CompletionRequest

Completion Request

JSON body posted by the client to open a streaming text completion.

✉

CompletionChunk

Completion Chunk

A single `data:` event emitted while streaming a legacy text completion.

✉

AudioSpeechRequest

Audio Speech Request

JSON body posted by the client to open a streaming TTS session.

✉

AudioSpeechChunk

Audio Speech Chunk

A single `data:` event carrying a base64-encoded audio segment.

✉

StreamDone

Stream Done Sentinel

Final SSE event signalling the end of the stream. The literal payload is `[DONE]` (not JSON).

Servers

https

production api.together.xyz/v1

Together AI inference production base URL.

AsyncAPI Specification

asyncapi: '2.6.0'
info:
  title: Together AI Streaming Inference API
  version: '1.0.0'
  description: |
    AsyncAPI 2.6 description of Together AI's streaming (Server-Sent Events)
    inference surface. Together AI exposes OpenAI-compatible HTTP endpoints that
    upgrade to a `text/event-stream` response when the client sets `"stream": true`
    in the request body. Each `data:` line in the response carries a JSON-encoded
    chunk; the stream terminates with the sentinel `data: [DONE]`.

    This document covers the three documented SSE-capable surfaces:

      * `POST /chat/completions` — token-level deltas for chat models
      * `POST /completions`      — token-level deltas for legacy text completion
        models
      * `POST /audio/speech`     — base64-encoded PCM/raw audio chunks for
        text-to-speech models

    Endpoints that do not stream (embeddings, image generations, rerank, files,
    models, batch, fine-tuning, etc.) are intentionally omitted.
  contact:
    name: API Evangelist
    url: https://apievangelist.com
    email: [email protected]
  license:
    name: Together AI Terms of Service
    url: https://www.together.ai/terms-of-service

defaultContentType: text/event-stream

servers:
  production:
    url: api.together.xyz/v1
    protocol: https
    description: Together AI inference production base URL.
    security:
      - bearerAuth: []
    bindings:
      http:
        bindingVersion: '0.3.0'

channels:
  /chat/completions:
    description: |
      OpenAI-compatible chat completions. Send a `POST` with `"stream": true`
      and a chat `messages` array. The server responds with `text/event-stream`
      and emits one `chat.completion.chunk` per token (or token group) followed
      by a `[DONE]` sentinel.
    bindings:
      http:
        type: request
        method: POST
        bindingVersion: '0.3.0'
    subscribe:
      summary: Receive streaming chat completion chunks.
      operationId: streamChatCompletions
      bindings:
        http:
          type: request
          method: POST
          bindingVersion: '0.3.0'
      message:
        oneOf:
          - $ref: '#/components/messages/ChatCompletionChunk'
          - $ref: '#/components/messages/StreamDone'
    publish:
      summary: Open a streaming chat completion request.
      operationId: createStreamingChatCompletion
      bindings:
        http:
          type: request
          method: POST
          bindingVersion: '0.3.0'
      message:
        $ref: '#/components/messages/ChatCompletionRequest'

  /completions:
    description: |
      Legacy text completions endpoint. With `"stream": true` the server
      returns `text/event-stream` emitting `completion.chunk` events terminated
      by `[DONE]`.
    bindings:
      http:
        type: request
        method: POST
        bindingVersion: '0.3.0'
    subscribe:
      summary: Receive streaming text completion chunks.
      operationId: streamCompletions
      bindings:
        http:
          type: request
          method: POST
          bindingVersion: '0.3.0'
      message:
        oneOf:
          - $ref: '#/components/messages/CompletionChunk'
          - $ref: '#/components/messages/StreamDone'
    publish:
      summary: Open a streaming text completion request.
      operationId: createStreamingCompletion
      bindings:
        http:
          type: request
          method: POST
          bindingVersion: '0.3.0'
      message:
        $ref: '#/components/messages/CompletionRequest'

  /audio/speech:
    description: |
      Text-to-speech (TTS). When `"stream": true` the server responds with
      `text/event-stream` emitting `audio.tts.chunk` events containing
      base64-encoded raw PCM audio. When streaming, the only supported
      `response_format` is `raw`. The stream terminates with `[DONE]`.
    bindings:
      http:
        type: request
        method: POST
        bindingVersion: '0.3.0'
    subscribe:
      summary: Receive streaming text-to-speech audio chunks.
      operationId: streamAudioSpeech
      bindings:
        http:
          type: request
          method: POST
          bindingVersion: '0.3.0'
      message:
        oneOf:
          - $ref: '#/components/messages/AudioSpeechChunk'
          - $ref: '#/components/messages/StreamDone'
    publish:
      summary: Open a streaming text-to-speech request.
      operationId: createStreamingAudioSpeech
      bindings:
        http:
          type: request
          method: POST
          bindingVersion: '0.3.0'
      message:
        $ref: '#/components/messages/AudioSpeechRequest'

components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      bearerFormat: API Key
      description: 'Together AI API key passed as `Authorization: Bearer <TOGETHER_API_KEY>`.'

  messages:
    ChatCompletionRequest:
      name: ChatCompletionRequest
      title: Chat Completion Request
      summary: JSON body posted by the client to open a streaming chat session.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/ChatCompletionRequestBody'

    ChatCompletionChunk:
      name: ChatCompletionChunk
      title: Chat Completion Chunk
      summary: A single `data:` event emitted while streaming a chat completion.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/ChatCompletionChunk'

    CompletionRequest:
      name: CompletionRequest
      title: Completion Request
      summary: JSON body posted by the client to open a streaming text completion.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/CompletionRequestBody'

    CompletionChunk:
      name: CompletionChunk
      title: Completion Chunk
      summary: A single `data:` event emitted while streaming a legacy text completion.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/CompletionChunk'

    AudioSpeechRequest:
      name: AudioSpeechRequest
      title: Audio Speech Request
      summary: JSON body posted by the client to open a streaming TTS session.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/AudioSpeechRequestBody'

    AudioSpeechChunk:
      name: AudioSpeechChunk
      title: Audio Speech Chunk
      summary: A single `data:` event carrying a base64-encoded audio segment.
      contentType: application/json
      payload:
        $ref: '#/components/schemas/AudioSpeechChunk'

    StreamDone:
      name: StreamDone
      title: Stream Done Sentinel
      summary: |
        Final SSE event signalling the end of the stream. The literal payload
        is `[DONE]` (not JSON).
      contentType: text/plain
      payload:
        type: string
        const: '[DONE]'

  schemas:
    # ---------- Chat Completions ----------

    ChatCompletionRequestBody:
      type: object
      required:
        - model
        - messages
      properties:
        model:
          type: string
          description: Name of the model to query.
        messages:
          type: array
          items:
            $ref: '#/components/schemas/ChatMessage'
        stream:
          type: boolean
          description: If true, stream tokens as Server-Sent Events.
        max_tokens:
          type: integer
        stop:
          type: array
          items:
            type: string
        temperature:
          type: number
          format: float
          minimum: 0
          maximum: 1
        top_p:
          type: number
          format: float
        top_k:
          type: integer
        min_p:
          type: number
          format: float
        repetition_penalty:
          type: number
        presence_penalty:
          type: number
          minimum: -2.0
          maximum: 2.0
        frequency_penalty:
          type: number
          minimum: -2.0
          maximum: 2.0
        logprobs:
          type: integer
          minimum: 0
          maximum: 20
        echo:
          type: boolean
        n:
          type: integer
          minimum: 1
          maximum: 128
        logit_bias:
          type: object
          additionalProperties:
            type: number
        seed:
          type: integer
        safety_model:
          type: string
        context_length_exceeded_behavior:
          type: string
          enum: [truncate, error]
        response_format:
          type: object
        tools:
          type: array
          items:
            type: object
        tool_choice: {}
        function_call: {}
        reasoning_effort:
          type: string
          enum: [low, medium, high]
        reasoning:
          type: object
          properties:
            enabled:
              type: boolean
        chat_template_kwargs:
          type: object

    ChatMessage:
      type: object
      required:
        - role
      properties:
        role:
          type: string
          enum: [system, user, assistant, tool, function]
        content:
          oneOf:
            - type: string
            - type: array
              items:
                type: object
        name:
          type: string
        tool_calls:
          type: array
          items:
            type: object
        tool_call_id:
          type: string
        function_call:
          type: object

    ChatCompletionChunk:
      type: object
      description: |
        One streamed chunk of a chat completion. Emitted on every `data:` line
        until the terminal `[DONE]` sentinel.
      required:
        - id
        - object
        - created
        - model
        - choices
      properties:
        id:
          type: string
        object:
          type: string
          const: chat.completion.chunk
        created:
          type: integer
          description: Unix timestamp (seconds) when the chunk was generated.
        model:
          type: string
        choices:
          type: array
          items:
            $ref: '#/components/schemas/ChatCompletionChunkChoice'
        usage:
          oneOf:
            - $ref: '#/components/schemas/UsageData'
            - type: 'null'
          description: Present only on the final chunk.
        warnings:
          type: array
          items:
            type: object
        system_fingerprint:
          type: string

    ChatCompletionChunkChoice:
      type: object
      required:
        - index
        - delta
      properties:
        index:
          type: integer
        delta:
          $ref: '#/components/schemas/ChatCompletionChunkDelta'
        finish_reason:
          oneOf:
            - type: string
              enum: [stop, eos, length, tool_calls, function_call]
            - type: 'null'
          description: Present only on the final chunk.
        seed:
          oneOf:
            - type: integer
            - type: 'null'
        logprobs:
          oneOf:
            - type: number
            - type: 'null'
        top_logprobs:
          type: object

    ChatCompletionChunkDelta:
      type: object
      properties:
        role:
          type: string
          enum: [system, user, assistant, function, tool]
        content:
          oneOf:
            - type: string
            - type: 'null'
        reasoning:
          oneOf:
            - type: string
            - type: 'null'
        tool_calls:
          type: array
          items:
            type: object
        function_call:
          type: object
          description: Deprecated. Use `tool_calls`.
        token_id:
          type: integer

    # ---------- Text Completions ----------

    CompletionRequestBody:
      type: object
      required:
        - model
        - prompt
      properties:
        model:
          type: string
        prompt:
          type: string
        stream:
          type: boolean
        max_tokens:
          type: integer
        stop:
          type: array
          items:
            type: string
        temperature:
          type: number
          format: float
          minimum: 0
          maximum: 1
        top_p:
          type: number
          format: float
        top_k:
          type: integer
        min_p:
          type: number
          format: float
          minimum: 0
          maximum: 1
        repetition_penalty:
          type: number
          format: float
        logprobs:
          type: integer
          minimum: 0
          maximum: 20
        echo:
          type: boolean
        n:
          type: integer
          minimum: 1
          maximum: 128
        presence_penalty:
          type: number
          minimum: -2.0
          maximum: 2.0
        frequency_penalty:
          type: number
          minimum: -2.0
          maximum: 2.0
        logit_bias:
          type: object
          additionalProperties:
            type: number
        seed:
          type: integer
        safety_model:
          type: string

    CompletionChunk:
      type: object
      description: One streamed chunk of a legacy text completion.
      required:
        - id
        - object
        - created
        - choices
      properties:
        id:
          type: string
        object:
          type: string
          const: completion.chunk
        created:
          type: integer
        model:
          type: string
        token:
          $ref: '#/components/schemas/CompletionToken'
        choices:
          type: array
          items:
            $ref: '#/components/schemas/CompletionChunkChoice'
        usage:
          oneOf:
            - $ref: '#/components/schemas/UsageData'
            - type: 'null'
        seed:
          type: integer
        finish_reason:
          oneOf:
            - type: string
              enum: [stop, eos, length, tool_calls, function_call]
            - type: 'null'

    CompletionToken:
      type: object
      properties:
        id:
          type: integer
        text:
          type: string
        logprob:
          type: number
        special:
          type: boolean

    CompletionChunkChoice:
      type: object
      properties:
        text:
          type: string
        index:
          type: integer
        delta:
          $ref: '#/components/schemas/CompletionChunkDelta'

    CompletionChunkDelta:
      type: object
      properties:
        role:
          type: string
          enum: [system, user, assistant, function, tool]
        content:
          oneOf:
            - type: string
            - type: 'null'
        token_id:
          type: integer

    # ---------- Audio Speech (TTS) ----------

    AudioSpeechRequestBody:
      type: object
      required:
        - model
        - input
        - voice
      properties:
        model:
          type: string
          description: TTS model identifier (e.g. `cartesia/sonic`, `hexgrad/Kokoro-82M`, `canopylabs/orpheus-3b-0.1-ft`).
        input:
          type: string
          description: Text to convert to audio.
        voice:
          type: string
          description: Model-specific voice identifier.
        stream:
          type: boolean
          default: false
          description: If true, output is streamed for several characters at a time instead of waiting for the full response.
        response_format:
          type: string
          enum: [mp3, wav, raw]
          default: wav
          description: If streaming is true, the only supported format is `raw`.
        response_encoding:
          type: string
          enum: [pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw]
          default: pcm_f32le
        sample_rate:
          type: integer
          default: 44100
          description: Sample rate in Hz.

    AudioSpeechChunk:
      type: object
      description: One streamed audio chunk carrying base64-encoded raw audio.
      required:
        - object
        - model
        - b64
      properties:
        object:
          type: string
          const: audio.tts.chunk
        model:
          type: string
        b64:
          type: string
          format: byte
          description: Base64-encoded audio stream segment.

    # ---------- Shared ----------

    UsageData:
      type: object
      properties:
        prompt_tokens:
          type: integer
        completion_tokens:
          type: integer
        total_tokens:
          type: integer