Lambda Inference API Chat Completions Streaming (HTTP + SSE)
Version 1.0.0
AsyncAPI 2.6 description of the Lambda (formerly Lambda Labs) **Inference API** chat completion streaming surface. The Lambda Inference API is an OpenAI-compatible REST gateway hosted at `https://api.lambda.ai/v1`. Chat completions are issued by `POST /chat/completions` with a JSON body that follows the OpenAI Chat Completions schema. When the request body sets `stream: true`, the server responds with `Content-Type: text/event-stream` and emits a sequence of Server-Sent Events whose `data:` payloads each carry one `chat.completion.chunk` JSON object, followed by a final `data: [DONE]` sentinel that marks end of stream. This SSE behavior is inherited from the OpenAI Chat Completions contract that Lambda advertises full compatibility with. SSE is a one-way, server-to-client HTTP streaming channel; it is **not** WebSocket. Lambda does not publish a WebSocket, MQTT, AMQP, Kafka, or webhook surface for inference. This AsyncAPI document models only the streamed events emitted on the SSE response. The request body fields (model, messages, temperature, max_tokens, tools, etc.) belong to the synchronous request side and are out of scope here; the parent REST surface is cataloged separately. Status note (verified 2026-05-29 on https://lambda.ai/inference): Lambda has announced that the Inference API is winding down in favor of customer self-hosted deployments on Lambda GPU instances. No end-of-life date was published at the time of authoring; the SSE contract described here remains active while the service is operational. Only fields and behaviors that Lambda explicitly advertises as OpenAI-compatible are modeled. Provider-proprietary metadata (e.g. Groq-style `x_groq`, Together-style `usage` sidecars) is intentionally not invented for Lambda; if Lambda later publishes proprietary stream extensions, they should be added here against a primary source.
Channels
Messages
Servers
api.lambda.ai/v1