Stream responses to start rendering before the full completion is ready.
import openai

# Point the OpenAI SDK at the llm7.io endpoint.
client = openai.OpenAI(
    base_url="https://api.llm7.io/v1",
    api_key="none",  # or your token
)

# stream=True returns an iterator of incremental chunks
# instead of a single completed response.
stream = client.chat.completions.create(
    model="fast",
    messages=[
        {"role": "system", "content": "Answer concisely."},
        {"role": "user", "content": "List three tips for fast Python scripts."},
    ],
    stream=True,
    temperature=0.4,
)

# Print each text fragment as soon as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
Each chunk carries the next text fragment in choices[0].delta.content. Watch choices[0].finish_reason to know when the stream ends: it is None on intermediate chunks and set (for example, to "stop") on the final one.
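As a minimal sketch of that pattern, the variant below accumulates the full reply while checking finish_reason on each chunk. It assumes the same endpoint, model name, and finish_reason semantics as the example above; the parts list and the printed status line are illustrative choices, not part of the API.

import openai

client = openai.OpenAI(base_url="https://api.llm7.io/v1", api_key="none")

stream = client.chat.completions.create(
    model="fast",
    messages=[{"role": "user", "content": "List three tips for fast Python scripts."}],
    stream=True,
)

# Collect fragments so the complete reply is available after the loop.
parts = []
for chunk in stream:
    choice = chunk.choices[0]
    parts.append(choice.delta.content or "")
    if choice.finish_reason is not None:  # e.g. "stop" or "length"
        print(f"\n[stream ended: {choice.finish_reason}]")

full_text = "".join(parts)

Accumulating into a list and joining once at the end keeps the loop cheap, and checking finish_reason rather than waiting for the iterator to exhaust lets you log why the stream stopped (a "length" finish, for instance, signals a truncated reply).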