Building a Customer Support Assistant

Jonas Meyer | Jun 10, 2026 | 4 min read

A customer support assistant is one of the highest-leverage features you can ship: it deflects repetitive tickets, answers customers instantly, and hands off cleanly to humans when it hits a wall. With Model Database you can build one against a single OpenAI-compatible API, swapping models freely as your needs change.

This walkthrough covers the architecture, a working backend endpoint, and the production concerns that separate a demo from something you can put in front of real users.

Architecture at a glance

A practical support assistant has four moving parts:

Model Database sits in the middle: your code talks to https://modeldatabase.com/v1 and you pick a model per request. Use a fast, inexpensive model like openai/gpt-4o-mini for routine questions, and reserve anthropic/claude-sonnet-4-6 for complex or sensitive threads.
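The per-request model choice described above can be sketched as a small routing function. This is only an illustration and not part of the Model Database API: the name `pick_model`, the category labels, and the length threshold are assumptions made for the example; only the two model identifiers come from the text.

```python
# Hypothetical routing helper. The function name, category labels, and
# length threshold are illustrative assumptions, not Model Database features.
FAST_MODEL = "openai/gpt-4o-mini"
STRONG_MODEL = "anthropic/claude-sonnet-4-6"

# Assumed labels for threads that deserve the stronger model.
SENSITIVE_CATEGORIES = {"billing", "refund", "account_security"}


def pick_model(category: str, question: str, max_routine_chars: int = 500) -> str:
    """Return the model identifier to use for one request."""
    if category in SENSITIVE_CATEGORIES:
        return STRONG_MODEL
    # Treat very long messages as a rough proxy for complexity.
    if len(question) > max_routine_chars:
        return STRONG_MODEL
    return FAST_MODEL
```

The returned string would be passed as the `model` argument of the chat completion call, so routing stays a one-line change at the call site.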

The core endpoint

Point the OpenAI SDK at Model Database by changing two lines: the base URL and the API key. Everything else is standard.

from openai import OpenAI

client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key="mdb_live_...",
)

SYSTEM_PROMPT = """You are a support agent for Acme.
Answer only from the provided context. If the context does
not contain the answer, say you are not sure and offer to
connect the user with a human. Be concise and friendly."""

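def answer(question, context_docs, history):
    """Answer one question from retrieved docs and prior turns.

    context_docs: list of strings retrieved from the knowledge base.
    history: list of earlier {"role": ..., "content": ...} messages.
    """
    # Join the retrieved passages into a single grounding block.
    context = "\n\n".join(context_docs)
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": f"Knowledge base:\n{context}"},
        *history,  # earlier turns, oldest first
        {"role": "user", "content": question},
    ]
    resp = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=messages,
        temperature=0.2,  # low temperature for more consistent answers
    )
    return resp.choices[0].message.content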

Keeping temperature low makes answers more deterministic and less likely to invent policy. Grounding the model in retrieved documents is what keeps it honest.
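To make the grounding step concrete, here is a deliberately simple sketch of how `context_docs` might be selected. It ranks documents by word overlap with the question. This is a toy stand-in, assumed for illustration only: the article does not specify a retrieval method, and a real deployment would more likely use a search index or embeddings. The names `tokenize` and `retrieve` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 3) -> list[str]:
    """Return up to k docs that share the most words with the question.

    Toy keyword-overlap ranking, assumed for illustration only.
    """
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(doc)), i, doc) for i, doc in enumerate(docs)]
    # Drop docs with no shared words; sort by overlap, then original order.
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc for _, _, doc in hits[:k]]
```

When nothing matches, `retrieve` returns an empty list, which leaves the knowledge-base block empty and lets the system prompt's "say you are not sure" instruction take over.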

Streaming for a responsive UI

Support chat feels much faster when tokens appear as they are generated. Set stream=True and forward chunks to your frontend over Server-Sent Events or a WebSocket.

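def stream_answer(messages):
    """Yield answer text piece by piece as the model generates it."""
    stream = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or no text (for example the final one).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta  # push to the browser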
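Forwarding over Server-Sent Events mostly comes down to framing each text delta as a `data:` line followed by a blank line. The sketch below shows one way to do that, independent of any web framework. The helper name `sse_frames`, the JSON payload shape, and the closing `done` event are assumptions for the example, not something Model Database prescribes.

```python
import json
from typing import Iterable, Iterator


def sse_frames(deltas: Iterable[str]) -> Iterator[str]:
    """Wrap text deltas in Server-Sent Events frames.

    The payload shape and the final "done" event are illustrative assumptions.
    """
    for delta in deltas:
        # JSON-encode so a newline inside a delta cannot break the frame.
        payload = json.dumps({"delta": delta})
        yield f"data: {payload}\n\n"
    # Tell the client the answer is complete.
    yield "event: done\ndata: {}\n\n"
```

A web framework's streaming response can then send these strings with the `text/event-stream` content type, and the browser can read them with `EventSource`.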

Knowing when to escalate

The assistant should never guess about billing, refunds, or account security. A simple, reliable pattern is to have the model classify each message first and return structured output with a category, an escalation flag, and a short reason.

import json

ROUTER_PROMPT = """Classify the user's message. Return JSON:
{"category": "...", "needs_human": true|false, "reason": "..."}
Set needs_human to true for billing, refunds, legal, or account security."""

def triage(question):
    resp = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[
            {"role": "system", "content": ROUTER_PROMPT},
            {"role": "user", "content": question},
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)

If needs_human is true, create a ticket and tell the customer a person will follow up. This keeps the bot inside its competence and builds trust.
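The escalation flow can be sketched as a small orchestration function that takes its collaborators as arguments. Everything here is hypothetical scaffolding: `handle_message`, `create_ticket`, and the hand-off wording are assumptions, and `answer` is assumed to be a one-argument wrapper with retrieval and history already bound. The sketch also escalates when the triage output cannot be parsed or lacks the flag, which is one possible fail-safe choice rather than a requirement.

```python
from typing import Callable

HANDOFF_MESSAGE = "I have passed this to our support team, and a person will follow up with you."


def handle_message(
    question: str,
    triage: Callable[[str], dict],
    answer: Callable[[str], str],
    create_ticket: Callable[[str, str], str],
) -> dict:
    """Route one message to the bot or to a human (hypothetical helper)."""
    try:
        verdict = triage(question)
    except ValueError:
        # json.JSONDecodeError is a ValueError; treat bad output as "escalate".
        verdict = {"needs_human": True, "reason": "unparseable triage output"}
    # A missing flag also escalates, so the bot never answers by accident.
    if verdict.get("needs_human", True):
        ticket_id = create_ticket(question, verdict.get("reason", ""))
        return {"reply": HANDOFF_MESSAGE, "escalated": True, "ticket_id": ticket_id}
    return {"reply": answer(question), "escalated": False, "ticket_id": None}
```

Passing the collaborators in as callables keeps the routing logic testable without network calls or a real ticketing system.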

Production concerns

Where to go next

Once the basics work, add per-customer context (plan tier, recent orders) to the retrieval step, and test alternative models by changing a single string. A nightly job can review escalated threads to find gaps in your help center.
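As a rough sketch of the per-customer context idea, the helper below turns a plan tier and recent orders into a short text block that could be added to the list of context documents. The function name `customer_context` and the order fields `id` and `status` are assumptions for illustration; real field names depend on your own data model.

```python
def customer_context(plan_tier: str, recent_orders: list[dict]) -> str:
    """Format account details as a text block for the knowledge-base context.

    Assumes each order is a dict with "id" and "status" keys (hypothetical).
    """
    lines = [f"Customer plan: {plan_tier}"]
    if recent_orders:
        lines.append("Recent orders:")
        for order in recent_orders:
            lines.append(f"- {order['id']}: {order['status']}")
    else:
        lines.append("Recent orders: none")
    return "\n".join(lines)
```

The resulting string can be placed ahead of the retrieved passages, for example `[customer_context(tier, orders), *retrieved_docs]`, so the model sees account details alongside help-center content.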

Grab an API key and free credit from your dashboard, and see the docs for the full request and streaming reference.
