Building Agents That Actually Work

An "agent" is a loop: the model observes state, decides on an action, the action runs, and the result feeds back in until a goal is met. The concept is simple. Making one that's reliable, bounded, and debuggable is where most projects struggle. This article focuses on the engineering, not the hype.

Everything here uses the Model Database API with standard OpenAI-style tool calling, so you can swap models without rewriting your loop.

Start with the smallest loop that works

Resist building a multi-agent swarm on day one. A single model with three good tools and a clear stopping condition beats an elaborate graph you can't debug. The minimal agent is: a system prompt defining the goal and rules, a set of tools, and a loop that executes tool calls until the model returns a final answer.

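import json

from openai import OpenAI

# Model Database exposes an OpenAI-compatible endpoint, so the stock client works.
client = OpenAI(base_url="https://modeldatabase.com/v1", api_key="mdb_live_...")

def run_agent(goal, tools, handlers, max_steps=8):
    """Run the tool-calling loop until the model answers or the step budget runs out."""
    messages = [
        {"role": "system", "content": "You are a task-completing agent. "
         "Use tools when needed. Stop when the goal is met."},
        {"role": "user", "content": goal},
    ]
    for step in range(max_steps):
        resp = client.chat.completions.create(
            model="anthropic/claude-sonnet-4-6",
            messages=messages, tools=tools,
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # no tool calls means this is the final answer
        messages.append(msg)  # keep the assistant turn so each tool result has a matching call
        for call in msg.tool_calls:
            try:
                args = json.loads(call.function.arguments)
                result = handlers[call.function.name](**args)
            except Exception as exc:  # malformed JSON, unknown tool, or handler failure
                result = {"error": f"{type(exc).__name__}: {exc}"}
            messages.append({
                "role": "tool", "tool_call_id": call.id,
                "content": json.dumps(result, default=str),
            })
    return "Stopped: step budget exhausted."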
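
The loop takes `tools` in the OpenAI function-calling schema and `handlers` as a dict from tool name to a Python callable. Here is a minimal sketch of both, assuming a hypothetical `get_weather` tool backed by canned data (the tool name, schema, and values are illustrative, not part of any real service):

```python
# Hypothetical tool for illustration: the name, schema, and data are assumptions.
def get_weather(city: str) -> dict:
    """Return canned weather data; a real handler would call a weather service."""
    known = {"Paris": {"temp_c": 18, "conditions": "cloudy"}}
    if city not in known:
        return {"error": "city not found, try a full name"}
    return known[city]

# OpenAI-style tool schema: one entry per tool the model may call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Full city name"},
            },
            "required": ["city"],
        },
    },
}]

# Handlers map each tool name in the schema to the function that runs it.
handlers = {"get_weather": get_weather}

# With run_agent from the loop above in scope:
# answer = run_agent("What's the weather in Paris?", tools, handlers)
```

Keeping the schema and the handler dict side by side makes it harder for a tool to be advertised to the model without a function behind it.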

Bound everything

Unbounded agents burn credits and can loop indefinitely. Always enforce:

A step budget (max_steps above) so the loop terminates.
A token/cost ceiling per run, checked between steps.
A wall-clock timeout on each tool call.

When a budget is hit, return partial progress with a clear status rather than failing silently.
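
The three limits can be sketched roughly as follows. `Budget`, `call_with_timeout`, `finish`, and `slow_tool` are hypothetical helpers, and the default limits are placeholder assumptions to tune for your workload:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass


@dataclass
class Budget:
    """Tracks per-run limits; the default values are illustrative assumptions."""
    max_steps: int = 8
    max_tokens: int = 50_000
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one completed step and the tokens it consumed."""
        self.steps += 1
        self.tokens += tokens_used

    def exceeded(self):
        """Return the name of the limit that was hit, or None."""
        if self.steps >= self.max_steps:
            return "step_budget"
        if self.tokens >= self.max_tokens:
            return "token_budget"
        return None


_pool = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s, **kwargs):
    """Run a tool handler, returning an error dict if it exceeds timeout_s."""
    future = _pool.submit(fn, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread keeps running in the background; Python threads
        # cannot be killed, so truly hung tools need a subprocess instead.
        return {"error": f"tool timed out after {timeout_s}s"}


def finish(status, partial):
    """Return partial progress with an explicit status instead of failing silently."""
    return {"status": status, "result": partial}


def slow_tool(delay_s: float) -> dict:
    """Hypothetical tool that simulates a slow external call."""
    time.sleep(delay_s)
    return {"ok": True}
```

Between steps, the loop could call something like `budget.charge(resp.usage.total_tokens)` and then return `finish(budget.exceeded(), notes_so_far)` when a limit is reported. The `usage` field may be absent for some providers, so guard for that before relying on it.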

Make tools forgiving and observable

Tool results are the agent's only feedback. Return errors as data, not exceptions, so the model can adapt: a tool that returns {"error": "city not found, try a full name"} lets the model recover, while a raw crash ends the run. Log every tool call with its arguments and result; this trace is the only way you'll debug why an agent went sideways.
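
One way to get both properties is a decorator around each handler. This is a sketch, not a prescribed pattern: `safe_tool`, the logger name, and the `lookup_timezone` tool are all hypothetical.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("agent.tools")  # logger name is an arbitrary choice


def safe_tool(fn):
    """Wrap a handler so failures come back as data and every call is logged."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        start = time.monotonic()
        try:
            result = fn(**kwargs)
        except Exception as exc:
            # Errors become a normal tool result the model can read and react to.
            result = {"error": f"{type(exc).__name__}: {exc}"}
        logger.info(json.dumps({
            "tool": fn.__name__,
            "args": kwargs,
            "result": result,
            "ms": round((time.monotonic() - start) * 1000),
        }, default=str))
        return result
    return wrapper


@safe_tool
def lookup_timezone(city: str) -> dict:
    """Hypothetical tool backed by a tiny in-memory table."""
    zones = {"Springfield, Illinois": "America/Chicago"}
    if city not in zones:
        raise ValueError("city not found, try a full name")
    return {"timezone": zones[city]}
```

Writing one JSON object per call keeps the trace easy to filter later by tool name or by error.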

Manage the context window

Long-running agents accumulate history until they overflow the window or get expensive. Strategies that work:

Summarize old steps into a compact running note once history grows.
Drop verbose tool outputs after they've been used, keeping only conclusions.
Externalize memory to a store and retrieve only what's relevant per step.
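
As a rough illustration of the second strategy, the sketch below shortens old tool outputs while leaving recent messages untouched. Plain truncation is a crude stand-in for keeping only conclusions; a real summarizer would call a model. `compact_history` and its thresholds are hypothetical, and the function assumes messages are stored as plain dicts.

```python
def compact_history(messages, keep_recent=6, max_tool_chars=200):
    """Shorten tool outputs older than the last keep_recent messages.

    Assumes each message is a plain dict with "role" and "content" keys.
    Returns a new list; the input list and its dicts are not modified.
    """
    cutoff = max(len(messages) - keep_recent, 0)
    compacted = []
    for i, m in enumerate(messages):
        content = m.get("content")
        is_old_tool_output = (
            i < cutoff
            and m.get("role") == "tool"
            and isinstance(content, str)
            and len(content) > max_tool_chars
        )
        if is_old_tool_output:
            # Keep tool_call_id and role so the call/result pairing stays valid.
            m = {**m, "content": content[:max_tool_chars] + " ...[truncated]"}
        compacted.append(m)
    return compacted
```

Running this on the message list before each model call caps how much any single old tool result can cost, without breaking the pairing between tool calls and their results.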

Plan, then act, for hard tasks

For multi-step goals, ask the model to produce a short plan first, then execute it. A separate planning step using a stronger model (for example anthropic/claude-opus-4-8) followed by a cheaper model for execution often gives better results per dollar than one model doing both.
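
A sketch of the two-stage pattern follows. To stay independent of any client library, it takes a hypothetical `complete(model, prompt)` callable that returns the model's text; the prompts and the `plan_then_act` name are illustrative assumptions.

```python
def plan_then_act(goal, complete,
                  planner_model="anthropic/claude-opus-4-8",
                  executor_model="anthropic/claude-sonnet-4-6",
                  max_plan_steps=5):
    """Ask a stronger model for a short plan, then run each step on a cheaper one.

    complete(model, prompt) -> str is an assumed helper wrapping your API client.
    Returns a list of (step, result) pairs.
    """
    plan_text = complete(
        planner_model,
        f"Write a plan of at most {max_plan_steps} short numbered steps "
        f"for this goal:\n{goal}",
    )
    # One plan step per non-empty line, capped so a rambling plan stays bounded.
    steps = [line.strip() for line in plan_text.splitlines() if line.strip()]
    steps = steps[:max_plan_steps]

    results = []
    for step in steps:
        done_so_far = "\n".join(f"{s} -> {r}" for s, r in results)
        result = complete(
            executor_model,
            f"Goal: {goal}\nDone so far:\n{done_so_far}\nDo this step now: {step}",
        )
        results.append((step, result))
    return results
```

Passing earlier results into each execution prompt gives the cheaper model the context it needs without asking it to re-plan.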

Honest limitations

Agents compound errors: a wrong step early poisons everything after it. They are non-deterministic, so the same input can take different paths, which makes testing harder. Keep humans in the loop for irreversible actions (sending money, deleting data) by requiring an explicit confirmation tool. And measure: log success rate on a fixed task set so you know whether a prompt change actually helped.
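
Two of these safeguards are small enough to sketch. Both helpers are hypothetical. Note that `require_confirmation` enforces approval inside the handler rather than exposing a separate confirmation tool to the model, which is a simpler variant of the same idea. `success_rate` runs each task several times because a single run of a non-deterministic agent says little.

```python
def require_confirmation(action, confirm):
    """Wrap an irreversible handler so it runs only if confirm(name, args) is True.

    confirm is an assumed callback, e.g. a CLI prompt or an approval queue.
    """
    def gated(**kwargs):
        if not confirm(action.__name__, kwargs):
            return {"error": "action declined by human reviewer"}
        return action(**kwargs)
    return gated


def success_rate(agent, tasks, runs_per_task=3):
    """Fraction of runs whose output passes the task's check function.

    tasks is a list of (goal, check) pairs; check(output) returns True on success.
    """
    passed = total = 0
    for goal, check in tasks:
        for _ in range(runs_per_task):
            total += 1
            try:
                passed += bool(check(agent(goal)))
            except Exception:
                pass  # a crashed run counts as a failure
    return passed / total if total else 0.0
```

Comparing `success_rate` on the same fixed task list before and after a prompt change gives a number to argue about instead of an impression.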

Build and instrument your first agent with a key from your dashboard, and find the tool-calling reference in the docs.
