Most people in a company cannot write SQL, yet they all have questions about the data. A natural-language analytics layer lets them ask "What were our top five products by revenue last month?" and get an answer. This article builds that feature on Model Database by translating English into SQL, running it safely, and explaining the result.
The core idea: the LLM writes queries, but your application keeps full control over what runs. Never let a model touch your database directly.
The safe architecture
A trustworthy text-to-SQL system separates generation from execution:
- Schema context: give the model your table and column definitions, not your data.
- Generation: the model returns a single read-only SQL statement.
- Guardrails: your code validates the SQL before it runs.
- Execution and explanation: run against a read replica with a restricted role, then have the model summarize the rows.
Providing schema context
The model can only write correct SQL if it knows your schema. Supply a compact description of relevant tables.
SCHEMA = """
Tables:
orders(id, customer_id, product_id, amount_cents, created_at)
products(id, name, category)
customers(id, name, country, signup_date)
"""
For large databases, retrieve only the tables relevant to the question rather than dumping the whole catalog. That keeps the prompt small and the model focused.
Generating the query
from openai import OpenAI
client = OpenAI(
base_url="https://modeldatabase.com/v1",
api_key="mdb_live_...",
)
SYS = f"""You write PostgreSQL for a read-only analytics database.
{SCHEMA}
Rules: SELECT statements only. Always add a LIMIT (max 1000).
Return only SQL, no commentary."""
def to_sql(question):
resp = client.chat.completions.create(
model="anthropic/claude-sonnet-4-6",
messages=[
{"role": "system", "content": SYS},
{"role": "user", "content": question},
],
temperature=0,
)
return resp.choices[0].message.content.strip()
A capable model like anthropic/claude-sonnet-4-6 handles joins and aggregation reliably. Temperature zero gives reproducible queries.
Guardrails are non-negotiable
Prompt instructions are not a security boundary. Enforce read-only access in code and at the database level.
import re
BANNED = re.compile(
r"\b(insert|update|delete|drop|alter|truncate|grant|copy)\b",
re.IGNORECASE,
)
def is_safe(sql):
if BANNED.search(sql):
return False
if sql.count(";") > 1: # block stacked statements
return False
return sql.lower().lstrip().startswith("select")
On top of this, connect with a database user that only has SELECT on a read replica, and set a statement timeout. Defense in depth means a bad query can never cause damage even if it slips past the regex.
Explaining the results
Raw rows are not an answer. Send a sample of the results back to the model to produce a plain-language summary that a non-technical user understands.
def explain(question, rows):
resp = client.chat.completions.create(
model="openai/gpt-4o-mini",
messages=[
{"role": "system", "content":
"Summarize the query result in 2-3 sentences for a "
"business user. State numbers plainly. Do not invent data."},
{"role": "user", "content":
f"Question: {question}\nRows: {rows[:20]}"},
],
temperature=0.3,
)
return resp.choices[0].message.content
Use a small model here; summarizing a handful of rows is easy work and keeps cost down.
Accuracy and trust
- Show the SQL: let users expand the generated query. Transparency builds trust and helps analysts catch mistakes.
- Validate the shape: if a question is ambiguous, have the model ask a clarifying question instead of guessing.
- Cache: identical questions produce identical SQL at temperature zero, so cache by question text to save tokens.
- Curate examples: include a few example question-to-SQL pairs in the system prompt for your trickiest tables.
Choosing and testing models
Text-to-SQL rewards stronger reasoning, so start with anthropic/claude-sonnet-4-6 for generation and a cheaper model for explanation. Build a test set of question-and-expected-SQL pairs and run it whenever you change models. Since every model is behind the same Model Database endpoint, switching is one string, and prepaid billing lets you benchmark candidates on real questions before committing.
Create a key and load credit at your dashboard, and see the chat reference in the docs.