Waiting for a long model response to finish before showing anything feels slow. Streaming fixes that: the model sends its answer token by token as it is generated, so your users see text appear in real time, just like a typing effect. Model Database supports streaming through the same OpenAI-compatible interface using Server-Sent Events (SSE).
This tutorial shows how streaming works on the wire and how to consume it from curl, Python, and Node.
How streaming works
To enable streaming, add "stream": true to your chat completion request. Instead of returning one JSON body, the server keeps the connection open and pushes a sequence of SSE events. Each event is a line beginning with data: followed by a JSON chunk, and events are separated by a blank line. Each chunk carries only the newly generated text in choices[0].delta.content, not the full message so far. The stream ends with a final line:
data: [DONE]
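As a rough illustration of the wire format described above, the sketch below parses raw SSE lines by hand using only the standard library. The function name iter_sse_deltas and the sample events are hypothetical and hand-written for this example; real chunks carry more fields (id, model, finish_reason, and so on), and in practice an SDK does this parsing for you.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from raw SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {}).get("content")
        if delta:
            yield delta


# Hand-written sample events, for illustration only
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]

text = "".join(iter_sse_deltas(sample))
print(text)
```

The first sample event carries only a role, so it contributes no text; the parser stops as soon as it sees the [DONE] line.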
Streaming with curl
You can watch the raw event stream directly. The -N flag disables buffering so chunks print as they arrive:
curl -N https://modeldatabase.com/v1/chat/completions \
  -H "Authorization: Bearer $MDB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4-6",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about the ocean."}]
  }'
You will see a series of data: {...} lines, each carrying a small piece of the haiku, then data: [DONE].
Streaming in Python
The OpenAI SDK turns the SSE stream into a simple iterator. Set stream=True and loop over the chunks, printing each delta as it arrives:
import os

from openai import OpenAI

# Point the OpenAI SDK at Model Database
client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key=os.environ["MDB_API_KEY"],  # read the key from the environment instead of hardcoding it
)

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Explain SSE in three sentences."}],
    stream=True,
)

for chunk in stream:
    # Some chunks carry no choices or no content; skip them
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
The flush=True ensures text appears immediately rather than being buffered. To capture the full text, accumulate the deltas into a string as you go.
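One way to accumulate the deltas is sketched below. The helper name collect_text is hypothetical, and the fake chunks built with SimpleNamespace only stand in for the SDK's chunk objects so the example runs without a network call; with a real stream you would pass the stream object itself.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_text(chunks: Iterable[Any]) -> str:
    """Join the content deltas of a chat completion stream into one string."""
    parts = []
    for chunk in chunks:
        # Skip chunks with no choices or with an empty/None content delta
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object shaped like an SDK chunk (illustrative only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    fake_chunk(None),  # e.g. a role-only first chunk
    fake_chunk("Server-"),
    fake_chunk("Sent "),
    fake_chunk("Events"),
]
full_text = collect_text(fake_stream)
print(full_text)
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation on long responses.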
Streaming in Node.js
In Node, the returned stream is an async iterable. Use for await to read chunks and write them to stdout without newlines:
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://modeldatabase.com/v1",
  apiKey: process.env.MDB_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "google/gemini-2.0-flash",
  messages: [{ role: "user", content: "Count from 1 to 5 slowly." }],
  stream: true,
});

for await (const chunk of stream) {
  // Fall back to an empty string when a chunk has no content
  const delta = chunk.choices[0]?.delta?.content || "";
  process.stdout.write(delta);
}
process.stdout.write("\n");
Billing and usage with streaming
Streaming responses are billed just like normal ones. The X-MDB-Charged-USD and X-MDB-Balance-USD headers are sent with the response, and they are available on the raw HTTP response object before you begin iterating the body. If you want token usage in the stream itself, you can request it with the OpenAI stream_options parameter:
stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
    stream_options={"include_usage": True},
)
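With this enabled, one final chunk arrives after the content finishes, carrying a usage object with prompt and completion token counts. In the OpenAI streaming format that final chunk has an empty choices list, so check that choices is non-empty before indexing choices[0].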
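A minimal sketch of reading both the text and the usage from such a stream follows. It assumes the chunks follow the OpenAI streaming shape, where the usage-bearing chunk has an empty choices list and fields named prompt_tokens and completion_tokens; the helper name read_text_and_usage and the fake chunks are hypothetical and exist only to make the example runnable offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def read_text_and_usage(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Return the full text and the usage object (if any) from a stream."""
    parts = []
    usage = None
    for chunk in chunks:
        # The usage object, when requested, is attached to a late chunk
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        # That chunk is assumed to have no choices, so guard before indexing
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts), usage


def content_chunk(content):
    """Stand-in for a content chunk (illustrative only)."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], usage=None)


# Stand-in for the final usage chunk; the token counts are made up
usage_chunk = SimpleNamespace(
    choices=[],
    usage=SimpleNamespace(prompt_tokens=8, completion_tokens=2, total_tokens=10),
)

reply, usage = read_text_and_usage(
    [content_chunk("Hi "), content_chunk("there"), usage_chunk]
)
print(reply)
if usage is not None:
    print(usage.prompt_tokens, usage.completion_tokens)
```

If the stream is cut off early, usage stays None, so callers should treat it as optional.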
Tips for production
- Handle disconnects. If a client closes the connection mid-stream, stop iterating and clean up. Wrap the loop in try/finally.
- Flush to the browser. If you are proxying the stream to a web frontend, forward each chunk immediately and disable response buffering on your server.
- Always check for empty deltas. Some chunks carry role or metadata with no content, so guard against None or undefined.
Streaming makes your app feel dramatically faster for the same cost. Get your key at your dashboard and see the full streaming reference in the docs.