This tutorial gets you from an empty Python file to a working chat completion against Model Database. We will install the OpenAI SDK, point it at Model Database, send a message, read the reply, and inspect the billing headers that tell you what each call cost.
Because Model Database is OpenAI-SDK compatible, you use the familiar openai Python package, just with a different base URL and your mdb_live_ key.
Step 1: Set up your environment
Create a project folder and a virtual environment, then install the SDK:
python -m venv .venv
source .venv/bin/activate
pip install openai
Store your key in an environment variable so it never ends up in source control:
export MDB_API_KEY="mdb_live_xxxxxxxxxxxxxxxxxxxxxxxx"
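If the variable is missing, the os.environ lookup used in the next step fails with a bare KeyError. A minimal sketch of a friendlier check follows. The helper name load_api_key is hypothetical, and the prefix test assumes every key starts with mdb_live_ as in the example above, which the tutorial does not state outright.

```python
import os

KEY_PREFIX = "mdb_live_"  # assumption: taken from the example key shown above


def load_api_key(env=os.environ, var="MDB_API_KEY"):
    """Return the key stored in `var`, or raise a readable error."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running the script")
    if not key.startswith(KEY_PREFIX):
        # Assumption: a key without this prefix is a copy-paste mistake.
        raise RuntimeError(f"{var} does not start with {KEY_PREFIX!r}")
    return key
```

You could then pass api_key=load_api_key() to the client instead of indexing os.environ directly.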
Step 2: Create the client
The only Model Database-specific configuration is the base_url and the key. Create a file called chat.py:
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key=os.environ["MDB_API_KEY"],
)
Step 3: Send your first message
A chat completion takes a model and a list of messages. Each message has a role (system, user, or assistant) and content:
resp = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-6",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what an API is in two sentences."},
    ],
)
print(resp.choices[0].message.content)
Run it with python chat.py and you will see the model's reply printed to your terminal. The system message sets behavior, and the user message is the actual question.
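Since every message is just a dictionary with a role and content, a small builder can keep the list tidy and catch a mistyped role before the request is sent. This is an illustrative sketch; the names message and build_messages are hypothetical helpers, not part of the SDK.

```python
VALID_ROLES = {"system", "user", "assistant"}  # the three roles named above


def message(role, content):
    """Build one chat message, rejecting roles the tutorial does not list."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    return {"role": role, "content": content}


def build_messages(user_prompt, system_prompt=None):
    """Return a message list: optional system message first, then the user turn."""
    messages = []
    if system_prompt:
        messages.append(message("system", system_prompt))
    messages.append(message("user", user_prompt))
    return messages
```

The result can be passed straight to the messages argument, for example messages=build_messages("Explain what an API is in two sentences.", "You are a concise assistant.").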
Step 4: Control the output
You can tune the response with standard sampling parameters. temperature controls randomness (lower values give more deterministic output) and max_tokens caps the number of tokens in the reply:
resp = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Give me three blog title ideas about caching."}],
    temperature=0.7,
    max_tokens=200,
)
# Only one choice comes back unless you request more, but looping handles either case.
for choice in resp.choices:
    print(choice.message.content)
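When max_tokens is too low, the reply is cut off mid-thought. In the OpenAI chat completions format, each choice carries a finish_reason, and the value "length" signals that the token cap was hit. The sketch below assumes Model Database passes that field through unchanged; was_truncated is a hypothetical helper, and the stand-in object only mimics the shape of an SDK response so the example runs offline.

```python
from types import SimpleNamespace


def was_truncated(resp):
    """True if the first choice stopped because it hit the max_tokens cap."""
    return resp.choices[0].finish_reason == "length"


# Stand-in shaped like an SDK response (assumption: real responses expose
# the same choices[0].finish_reason attribute).
fake = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
print(was_truncated(fake))  # True
```

If it returns True for a real response, raising max_tokens or asking for a shorter answer are the usual remedies.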
Step 5: Hold a multi-turn conversation
The API is stateless, so to continue a conversation you send the full history back each time, appending the model's previous reply as an assistant message:
messages = [{"role": "user", "content": "My name is Sam."}]
resp = client.chat.completions.create(model="anthropic/claude-sonnet-4-6", messages=messages)
# append the assistant reply, then ask a follow-up
messages.append({"role": "assistant", "content": resp.choices[0].message.content})
messages.append({"role": "user", "content": "What did I say my name was?"})
resp = client.chat.completions.create(model="anthropic/claude-sonnet-4-6", messages=messages)
print(resp.choices[0].message.content)
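The append-then-resend pattern above can be wrapped in a small class so the history bookkeeping lives in one place. This is a sketch, not an SDK feature: Conversation and its send callable are hypothetical names, and send is injected so the class can be exercised without a network call.

```python
class Conversation:
    """Keeps the full message history and resends it on every turn."""

    def __init__(self, send, model, system_prompt=None):
        self.send = send      # callable(model, messages) -> reply text
        self.model = model
        self.messages = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, text):
        # Record the user turn, send the whole history, then record the reply.
        self.messages.append({"role": "user", "content": text})
        reply = self.send(self.model, list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


# Wiring it to the client from Step 2 might look like this (left commented
# so the sketch runs offline):
# def send(model, messages):
#     resp = client.chat.completions.create(model=model, messages=messages)
#     return resp.choices[0].message.content
# chat = Conversation(send, "anthropic/claude-sonnet-4-6")
```

Because the history grows with every turn, each request carries more prompt tokens than the previous one, which is worth keeping in mind for long conversations.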
Step 6: Check what each call cost
Every billable response carries the headers X-MDB-Charged-USD and X-MDB-Balance-USD. With the OpenAI SDK you can grab the raw response to read them:
raw = client.chat.completions.with_raw_response.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print("charged:", raw.headers.get("X-MDB-Charged-USD"))
print("balance:", raw.headers.get("X-MDB-Balance-USD"))
completion = raw.parse()  # the normal completion object
print(completion.choices[0].message.content)
You can also read resp.usage on a normal response to see the prompt and completion token counts, which is useful for estimating what a similar call will cost.
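Header values arrive as strings, so it helps to convert them before doing arithmetic. The sketch below assumes both headers hold plain decimal numbers such as "0.0004"; the tutorial does not specify the exact format, so treat that as an assumption. read_billing is a hypothetical helper, and it uses Decimal rather than float to avoid rounding drift when summing many small charges.

```python
from decimal import Decimal, InvalidOperation


def read_billing(headers):
    """Return (charged, balance) as Decimals, or None where a value is absent or malformed."""

    def to_decimal(name):
        value = headers.get(name)
        if value is None:
            return None
        try:
            return Decimal(value)
        except InvalidOperation:
            # Assumption violated: the header was not a plain decimal string.
            return None

    return to_decimal("X-MDB-Charged-USD"), to_decimal("X-MDB-Balance-USD")
```

With the raw response from Step 6 you would call read_billing(raw.headers). A plain dict works too, as long as the key casing matches exactly.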
Switching models
To try a different model, change one string. The same code runs against google/gemini-2.0-flash, meta-llama/llama-3.3-70b-instruct, or deepseek/deepseek-chat without any other edits.
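Because only the model string changes, comparing models is a short loop. The sketch below is illustrative: ask_each is a hypothetical helper, and send is an injected callable with the shape send(model, messages) that returns the reply text, so the loop can be tried without spending credit.

```python
MODELS = [
    "google/gemini-2.0-flash",
    "meta-llama/llama-3.3-70b-instruct",
    "deepseek/deepseek-chat",
]


def ask_each(send, prompt, models=MODELS):
    """Send the same prompt to each model and collect the replies by model name."""
    results = {}
    for model in models:
        results[model] = send(model, [{"role": "user", "content": prompt}])
    return results


# With the client from Step 2, send could be (left commented to stay offline):
# def send(model, messages):
#     resp = client.chat.completions.create(model=model, messages=messages)
#     return resp.choices[0].message.content
```

Each iteration is a separate billable call, so the headers from Step 6 apply to every model you try.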
You now have a complete Python workflow: client, messages, parameters, multi-turn history, and cost tracking. Get a key and credit from your dashboard, and explore every parameter in the docs.