
Estimating Cost per Feature Before You Ship

Elena Fischer | Jan 9, 2026 | 4 min read

The worst time to discover what an LLM feature costs is after it ships and the bill arrives. The best time is before you write the production code, when a back-of-the-envelope estimate can still change your design. Estimating cost per feature up front turns LLM spend from a surprise into a planned line item.

Here is a repeatable way to estimate, validate, and track feature cost on Model Database.

Start with a usage model

Every feature has a few drivers that determine its cost. Write them down:

With those five numbers you can estimate cost before a single request is sent.
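One way to record the drivers is a small data structure. The five fields below are an assumption, taken from the quantities used in the "summarize this thread" example in the next section (system prompt, context, and output tokens, plus users and calls per user per day); the class and field names are hypothetical, so substitute whatever drivers apply to your feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageModel:
    """Hypothetical container for a feature's cost drivers (illustrative names)."""
    system_prompt_tokens: int       # fixed instructions sent on every call
    context_tokens: int             # variable input pulled in per call
    output_tokens: int              # expected size of the response
    active_users: int               # users expected to use the feature each day
    calls_per_user_per_day: float   # how often each of them triggers it

    @property
    def input_tokens_per_call(self) -> int:
        return self.system_prompt_tokens + self.context_tokens

    @property
    def calls_per_day(self) -> float:
        return self.active_users * self.calls_per_user_per_day


# Values from the "summarize this thread" example in this article.
summarize = UsageModel(
    system_prompt_tokens=500,
    context_tokens=1_500,
    output_tokens=250,
    active_users=10_000,
    calls_per_user_per_day=3,
)
```

Keeping the drivers in one place makes it easy to change a single assumption, such as calls per user, and see how the estimate moves.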

Do the napkin math

Use the rough rule that one token is about four characters of English. Suppose a "summarize this thread" feature has a 500-token system prompt, pulls in 1,500 tokens of thread context, and produces a 250-token summary. That is 2,000 input tokens and 250 output tokens per call.

Now layer in volume. If 10,000 users each use it three times a day, that is 30,000 calls daily: 60 million input tokens and 7.5 million output tokens per day. Multiply each by the per-token rate for your chosen model and you have a daily cost estimate you can defend in a planning meeting.
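The arithmetic above can be sketched as a short function. The two rates here are placeholders chosen only to make the example concrete, not real prices for any model; look up the actual per-token rates for the model you pick.

```python
def daily_cost_usd(calls_per_day, input_tokens, output_tokens,
                   input_rate_per_mtok, output_rate_per_mtok):
    """Estimate daily spend from per-call token counts and per-million-token rates."""
    daily_input_tokens = calls_per_day * input_tokens
    daily_output_tokens = calls_per_day * output_tokens
    return ((daily_input_tokens / 1_000_000) * input_rate_per_mtok
            + (daily_output_tokens / 1_000_000) * output_rate_per_mtok)


# PLACEHOLDER rates in USD per million tokens (assumptions, not real prices).
INPUT_RATE = 1.00
OUTPUT_RATE = 4.00

# 30,000 calls/day, 2,000 input and 250 output tokens per call (from the text).
estimate = daily_cost_usd(30_000, 2_000, 250, INPUT_RATE, OUTPUT_RATE)
print(f"${estimate:,.2f} per day")  # $90.00 per day with the placeholder rates
```

Input and output are priced separately because the two rates usually differ, which is why the estimate tracks both token counts rather than one total.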

Validate with a real probe

An estimate built on assumptions can be wrong. Before trusting it, send a handful of realistic requests and read the actual charge from the X-MDB-Charged-USD response header. Ground truth beats a spreadsheet.

import openai

client = openai.OpenAI(base_url="https://modeldatabase.com/v1",
                       api_key="mdb_live_...")

# representative_inputs: 20-50 real examples that you supply.
# build_messages: your own helper that turns one example into a chat message list.
total = 0.0
for sample in representative_inputs:
    # with_raw_response exposes the HTTP response headers, not just the parsed body.
    r = client.chat.completions.with_raw_response.create(
        model="anthropic/claude-sonnet-4-6",
        max_tokens=300,
        messages=build_messages(sample))
    total += float(r.headers["X-MDB-Charged-USD"])

print("avg cost/call:", total / len(representative_inputs))

Multiply that measured average by your projected call volume and you have an estimate grounded in real X-MDB-Charged-USD figures rather than guesses.
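A small sample also has spread, so it can help to project a range alongside the average. The sketch below assumes you have collected the per-call charges into a list; the function name and the sample charges are made up for illustration.

```python
import statistics


def project_daily_cost(charges_usd, calls_per_day):
    """Project daily spend from measured per-call charges, with crude bounds."""
    avg = statistics.mean(charges_usd)
    return {
        "avg_per_call": avg,
        "daily": avg * calls_per_day,
        # Bounds: what the day would cost if every call matched the
        # cheapest or the most expensive call seen in the probe.
        "daily_low": min(charges_usd) * calls_per_day,
        "daily_high": max(charges_usd) * calls_per_day,
    }


# Made-up charges standing in for values read from X-MDB-Charged-USD.
sample_charges = [0.0030, 0.0028, 0.0035, 0.0031]
projection = project_daily_cost(sample_charges, calls_per_day=30_000)
```

If the low and high bounds are far apart, the inputs vary a lot in size and a larger probe sample is probably worth the extra few cents.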

Compare designs before committing

Estimation pays off most when you use it to choose. Run the same probe across a few designs and pick the cheapest one that meets your quality bar.

Often a design change cuts cost far more than any per-request tweak, and you can only see that by comparing estimates side by side.
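The selection rule can be sketched as a filter followed by a minimum. Everything in the example is hypothetical: the design names, the costs, and the quality scores are invented, and the quality score stands in for whatever evaluation you already run.

```python
from dataclasses import dataclass


@dataclass
class DesignResult:
    name: str
    avg_cost_usd: float    # measured average charge per call from the probe
    quality_score: float   # from your own evaluation, scaled 0..1


def cheapest_acceptable(results, min_quality):
    """Return the lowest-cost design that clears the quality bar, or None."""
    acceptable = [r for r in results if r.quality_score >= min_quality]
    if not acceptable:
        return None
    return min(acceptable, key=lambda r: r.avg_cost_usd)


# Invented numbers for illustration only.
candidates = [
    DesignResult("full-thread context", 0.0060, 0.93),
    DesignResult("last-20-messages context", 0.0031, 0.91),
    DesignResult("title-only context", 0.0009, 0.62),
]
choice = cheapest_acceptable(candidates, min_quality=0.90)
print(choice.name if choice else "no design meets the quality bar")
```

Filtering on quality first keeps the cheapest option from winning by default when it does not actually do the job.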

Convert cost to a unit that matters

Raw daily dollars are hard to reason about. Translate the estimate into a business unit: cost per active user, per document processed, or per conversation. A feature that costs a fraction of a cent per user per day is easy to approve; one that costs more than a user is worth needs a redesign. This framing also tells you whether the feature can pay for itself.
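As a rough illustration of the conversion, the sketch below divides a daily estimate by a business unit and compares it with what that unit is worth. The $90 daily cost and the revenue figure are placeholders, not measurements; the 10,000 users come from the example earlier in this article.

```python
def cost_per_unit(daily_cost_usd, units_per_day):
    """Convert a daily dollar estimate into cost per business unit."""
    if units_per_day <= 0:
        raise ValueError("units_per_day must be positive")
    return daily_cost_usd / units_per_day


def pays_for_itself(cost_per_user_per_day, value_per_user_per_day):
    """True when a user is worth more per day than the feature costs for them."""
    return cost_per_user_per_day < value_per_user_per_day


# Placeholder inputs: a hypothetical $90/day estimate spread over 10,000 users.
per_user = cost_per_unit(90.0, 10_000)
viable = pays_for_itself(per_user, value_per_user_per_day=0.05)
print(f"${per_user:.4f} per user per day, viable: {viable}")
```

The same function works for documents or conversations; only the denominator changes.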

Set guardrails from the estimate

Your estimate directly informs your safety limits. Set max_tokens from the output size you measured, and set the per-request cost cap just above your largest legitimate call so a malfunction is blocked rather than billed. The estimate is not just a forecast; it is the source of your limits.
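One way to derive those two limits from probe data is sketched below. The headroom percentages, the function name, and the sample values are assumptions; the sketch only computes the numbers and does not show how a cost cap is configured, since that mechanism is not covered here.

```python
import math


def derive_limits(output_token_counts, charges_usd,
                  token_headroom_pct=20, cost_headroom_pct=25):
    """Derive max_tokens and a per-request cost cap from measured calls."""
    # Leave some room above the longest output seen so normal
    # responses are not truncated.
    max_tokens = math.ceil(
        max(output_token_counts) * (100 + token_headroom_pct) / 100)
    # Cap just above the most expensive legitimate call observed.
    cost_cap_usd = max(charges_usd) * (100 + cost_headroom_pct) / 100
    return max_tokens, cost_cap_usd


# Made-up probe measurements for illustration.
max_tokens, cost_cap_usd = derive_limits(
    output_token_counts=[210, 250, 238],
    charges_usd=[0.0028, 0.0035, 0.0031],
)
```

The headroom is a judgment call: too tight and legitimate long responses get cut off, too loose and the cap stops protecting you.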

Keep the estimate alive

After launch, compare logged X-MDB-Charged-USD values against your forecast. If reality drifts from the forecast, your prompts or usage patterns have probably changed. Update the estimate so your forecasts stay trustworthy for the next feature.
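A drift check can be as simple as comparing the logged average with the forecast average. This is a minimal sketch: the 15% tolerance, the function names, and the sample charges are all assumptions to adapt to your own logging.

```python
def drift_ratio(logged_charges_usd, forecast_avg_usd):
    """Relative difference between the logged average charge and the forecast."""
    actual_avg = sum(logged_charges_usd) / len(logged_charges_usd)
    return (actual_avg - forecast_avg_usd) / forecast_avg_usd


def needs_review(logged_charges_usd, forecast_avg_usd, tolerance=0.15):
    """Flag the estimate for an update when drift exceeds the tolerance."""
    return abs(drift_ratio(logged_charges_usd, forecast_avg_usd)) > tolerance


# Made-up logged charges compared against a hypothetical $0.0031 forecast.
drifted = needs_review([0.0040, 0.0038, 0.0042], forecast_avg_usd=0.0031)
stable = needs_review([0.0031, 0.0030, 0.0032], forecast_avg_usd=0.0031)
print("drifted:", drifted, "stable:", stable)
```

Checking the absolute value catches drift in both directions; a feature that got cheaper is also a sign that usage no longer matches the model.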

Probe a few real requests and watch the charges add up on your dashboard, then compare model rates on the pricing page before you ship.
