You cannot control what you cannot see. The teams that keep LLM costs under control are the ones that treat spend as a first-class metric, monitored as closely as latency or error rate. Model Database makes this straightforward: every response reports what the request cost and how much prepaid credit remains.
This article shows how to build real-time visibility into usage and spend using the X-MDB headers and your dashboard.
Two headers, total transparency
Every response from Model Database carries two headers:
- X-MDB-Charged-USD — the exact cost of that request.
- X-MDB-Balance-USD — your remaining prepaid credit after the charge.
Because the charge is per request, you never have to estimate or reconcile against a monthly statement. The number is authoritative and immediate.
Capture the headers in code
With the OpenAI SDK, grab the raw response so you can read the headers alongside the parsed body.
import openai

client = openai.OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key="mdb_live_...",
)
# with_raw_response exposes the HTTP headers alongside the parsed body.
resp = client.chat.completions.with_raw_response.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
charged = float(resp.headers["X-MDB-Charged-USD"])
balance = float(resp.headers["X-MDB-Balance-USD"])
completion = resp.parse()  # the usual parsed completion object
# log_spend is your own logging helper, not part of the SDK.
log_spend(endpoint="chat", model="openai/gpt-4o-mini",
          charged=charged, balance=balance)
Or read them straight off a raw HTTP response:
curl -sD - https://modeldatabase.com/v1/chat/completions \
  -H "Authorization: Bearer mdb_live_..." \
  -H "Content-Type: application/json" \
  -d @req.json | grep -i x-mdb
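The snippets above assume both headers are always present and well formed. A more defensive reader might look like the sketch below. The names `SpendReading` and `read_spend_headers` are illustrative, not part of any SDK, and the sketch assumes the header values are plain decimal strings. It returns `Decimal` values so that many small charges can be summed without floating-point drift.

```python
from decimal import Decimal, InvalidOperation
from typing import Mapping, NamedTuple, Optional


class SpendReading(NamedTuple):
    charged: Optional[Decimal]  # cost of this request in USD, if reported
    balance: Optional[Decimal]  # remaining prepaid credit in USD, if reported


def _to_decimal(value: Optional[str]) -> Optional[Decimal]:
    """Parse a header value, returning None if it is missing or malformed."""
    if value is None:
        return None
    try:
        parsed = Decimal(value.strip())
    except InvalidOperation:
        return None
    # Reject NaN and Infinity, which Decimal would otherwise accept.
    return parsed if parsed.is_finite() else None


def read_spend_headers(headers: Mapping[str, str]) -> SpendReading:
    """Read the two X-MDB headers from any mapping of header names to values."""
    # Lower-case the keys so a plain dict works as well as a
    # case-insensitive headers object.
    lowered = {name.lower(): value for name, value in headers.items()}
    return SpendReading(
        charged=_to_decimal(lowered.get("x-mdb-charged-usd")),
        balance=_to_decimal(lowered.get("x-mdb-balance-usd")),
    )
```

With the SDK example, this would be called as `read_spend_headers(resp.headers)`. A `None` field is a cue to log a warning rather than record a zero charge.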
Turn per-request data into metrics
Once you log X-MDB-Charged-USD with useful labels, you can answer the questions that actually matter. Emit it to your metrics system tagged by endpoint, model, customer, and feature.
- Cost per feature: sum charges by the feature label to see what each part of your product costs.
- Cost per customer: spot the heavy users who may need a different plan.
- Cost per model: confirm your cheap-model routing is actually routing.
- Spend rate: dollars per minute, so a runaway loop shows up instantly.
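To make the four views above concrete, here is a minimal in-memory sketch of the aggregation. `SpendTracker` is a hypothetical stand-in for whatever metrics backend you use; in production you would more likely emit a counter with these labels and let the backend do the sums. The 60-second window is an assumption, and floats are used only for brevity.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class SpendTracker:
    """Aggregates per-request charges by label and over a sliding time window."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        # totals[label_name][label_value] -> cumulative USD
        self.totals: Dict[str, Dict[str, float]] = defaultdict(
            lambda: defaultdict(float)
        )
        # (timestamp, charged) pairs that fall inside the sliding window
        self._recent: Deque[Tuple[float, float]] = deque()

    def record(self, charged: float, now: Optional[float] = None, **labels: str) -> None:
        """Record one request's charge under every label supplied."""
        now = time.monotonic() if now is None else now
        for name, value in labels.items():
            self.totals[name][value] += charged
        self._recent.append((now, charged))
        self._evict(now)

    def spend_per_minute(self, now: Optional[float] = None) -> float:
        """Return spend inside the window, scaled to dollars per minute."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        window_total = sum(charged for _, charged in self._recent)
        return window_total * 60.0 / self.window_seconds

    def _evict(self, now: float) -> None:
        # Drop samples that are older than the window.
        cutoff = now - self.window_seconds
        while self._recent and self._recent[0][0] <= cutoff:
            self._recent.popleft()
```

A call such as `tracker.record(charged, feature="summarize", customer="acme", model="openai/gpt-4o-mini", endpoint="chat")` feeds all four views at once. `tracker.totals["feature"]` then holds cost per feature, and `tracker.spend_per_minute()` gives the current burn rate.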
Alert on the balance
The balance header is a built-in low-fuel gauge. Because billing is prepaid and a zero balance returns HTTP 402, you want to act before you hit empty. Watch the trend and alert early.
if balance < 25.0:
    notify_ops(f"MDB balance low: ${balance:.2f}")
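A fixed threshold ignores how fast you are spending. One way to watch the trend is to combine the balance with a spend rate in dollars per minute, however you measure it, and alert on the estimated runway. The sketch below is illustrative: the function names and the 120-minute runway are assumptions, and the 25.0 floor simply mirrors the check above.

```python
from typing import Optional


def minutes_until_empty(balance_usd: float, spend_per_minute: float) -> Optional[float]:
    """Estimate minutes of credit left at the current rate; None if nothing is being spent."""
    if spend_per_minute <= 0:
        return None
    return max(balance_usd, 0.0) / spend_per_minute


def should_alert(
    balance_usd: float,
    spend_per_minute: float,
    floor_usd: float = 25.0,            # absolute floor, as in the check above
    min_runway_minutes: float = 120.0,  # assumed lead time needed to top up
) -> bool:
    """Alert when the balance is under the floor or the runway is too short."""
    if balance_usd < floor_usd:
        return True
    runway = minutes_until_empty(balance_usd, spend_per_minute)
    return runway is not None and runway < min_runway_minutes
```

With these defaults, a 100.00 balance burning 2.00 per minute triggers an alert (50 minutes of runway) even though it is well above the floor.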
Also handle the 402 explicitly so a depleted balance degrades gracefully instead of throwing raw errors at users.
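try:
    resp = client.chat.completions.create(...)
except openai.APIStatusError as e:
    if e.status_code == 402:
        # Balance is depleted: park the request until credit is topped up.
        queue_for_retry_after_topup(request)
    else:
        raise  # let every other API error propagate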
Use the dashboard for the big picture
Your code gives you fine-grained, labeled telemetry; the dashboard gives you the aggregate view: current balance, spend over time, and usage trends across models. Use the dashboard to top up credit, watch daily totals, and verify that the numbers your logs report match the account of record. Together they form a complete loop: real-time signals in your app and a reliable summary in the console.
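The verification step comes down to simple arithmetic: the starting balance, plus any top-ups, minus the charges you logged, should equal the ending balance. A minimal sketch follows, assuming you can read the two balances (from the dashboard or from X-MDB-Balance-USD) and that you know the top-ups made in between. The `reconcile` helper and the one-cent tolerance are illustrative choices.

```python
from decimal import Decimal
from typing import Iterable, Tuple


def reconcile(
    start_balance: Decimal,
    end_balance: Decimal,
    logged_charges: Iterable[Decimal],
    top_ups: Decimal = Decimal("0"),
    tolerance: Decimal = Decimal("0.01"),
) -> Tuple[bool, Decimal]:
    """Return (ok, drift), where drift is the expected end balance minus the observed one."""
    expected_end = start_balance + top_ups - sum(logged_charges, Decimal("0"))
    drift = expected_end - end_balance
    return abs(drift) <= tolerance, drift
```

A positive drift means the account spent more than your logs recorded, which usually points to a code path that calls the API without logging the charge.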
Watch the cost cap in action
The per-request cost cap is part of your monitoring story too. If you see requests getting blocked by the cap, that is a signal: either a prompt has grown too large or something is generating far more output than intended. Treat cap hits as an alertable event rather than silent noise.
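One way to make cap hits alertable is to count them in a sliding window and fire once they cluster. This article does not say how a cap-blocked request is reported, so the sketch below leaves detection to the caller: call `record_hit` wherever your code recognizes a blocked request. `CapHitMonitor` and its default threshold and window are hypothetical.

```python
from collections import deque
from typing import Deque


class CapHitMonitor:
    """Counts cap-blocked requests in a sliding window."""

    def __init__(self, threshold: int = 3, window_seconds: float = 300.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._hits: Deque[float] = deque()  # timestamps of recent cap hits

    def record_hit(self, now: float) -> bool:
        """Record one cap hit; return True once hits in the window reach the threshold."""
        self._hits.append(now)
        # Drop hits that are older than the window.
        cutoff = now - self.window_seconds
        while self._hits and self._hits[0] <= cutoff:
            self._hits.popleft()
        return len(self._hits) >= self.threshold
```

Setting `threshold=1` alerts on every hit, which suits low-traffic services; a higher threshold filters out the occasional oversized prompt.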
Build the habit early
Add header logging on day one, before you have a cost problem. A few lines of instrumentation now mean that when traffic grows you already have the dashboards to understand it, and a surprise bill becomes far less likely.
Top up credit and review your spend trends on your dashboard, or check per-model rates on the pricing page.