Natural-Language Analytics With LLMs

Most people in a company cannot write SQL, yet they all have questions about the data. A natural-language analytics layer lets them ask "What were our top five products by revenue last month?" and get an answer. This article builds that feature on Model Database by translating English into SQL, running it safely, and explaining the result.

The core idea: the LLM writes queries, but your application keeps full control over what runs. Never let a model touch your database directly.

The safe architecture

A trustworthy text-to-SQL system separates generation from execution:

Schema context: give the model your table and column definitions, not your data.
Generation: the model returns a single read-only SQL statement.
Guardrails: your code validates the SQL before it runs.
Execution and explanation: run against a read replica with a restricted role, then have the model summarize the rows.

Providing schema context

The model can only write correct SQL if it knows your schema. Supply a compact description of relevant tables.

SCHEMA = """
Tables:
orders(id, customer_id, product_id, amount_cents, created_at)
products(id, name, category)
customers(id, name, country, signup_date)
"""

For large databases, retrieve only the tables relevant to the question rather than dumping the whole catalog. That keeps the prompt small and the model focused.

Generating the query

from openai import OpenAI

client = OpenAI(
    base_url="https://modeldatabase.com/v1",
    api_key="mdb_live_...",
)

SYS = f"""You write PostgreSQL for a read-only analytics database.
{SCHEMA}
Rules: SELECT statements only. Always add a LIMIT (max 1000).
Return only SQL, no commentary."""

def to_sql(question):
    resp = client.chat.completions.create(
        model="anthropic/claude-sonnet-4-6",
        messages=[
            {"role": "system", "content": SYS},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

A capable model like anthropic/claude-sonnet-4-6 handles joins and aggregation reliably. Temperature zero gives reproducible queries.

Guardrails are non-negotiable

Prompt instructions are not a security boundary. Enforce read-only access in code and at the database level.

import re

BANNED = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|copy)\b",
    re.IGNORECASE,
)

def is_safe(sql):
    if BANNED.search(sql):
        return False
    if sql.count(";") > 1:   # block stacked statements
        return False
    return sql.lower().lstrip().startswith("select")

On top of this, connect with a database user that only has SELECT on a read replica, and set a statement timeout. Defense in depth means a bad query can never cause damage even if it slips past the regex.

Explaining the results

Raw rows are not an answer. Send a sample of the results back to the model to produce a plain-language summary that a non-technical user understands.

def explain(question, rows):
    resp = client.chat.completions.create(
        model="openai/gpt-4o-mini",
        messages=[
            {"role": "system", "content":
             "Summarize the query result in 2-3 sentences for a "
             "business user. State numbers plainly. Do not invent data."},
            {"role": "user", "content":
             f"Question: {question}\nRows: {rows[:20]}"},
        ],
        temperature=0.3,
    )
    return resp.choices[0].message.content

Use a small model here; summarizing a handful of rows is easy work and keeps cost down.

Accuracy and trust

Show the SQL: let users expand the generated query. Transparency builds trust and helps analysts catch mistakes.
Validate the shape: if a question is ambiguous, have the model ask a clarifying question instead of guessing.
Cache: identical questions produce identical SQL at temperature zero, so cache by question text to save tokens.
Curate examples: include a few example question-to-SQL pairs in the system prompt for your trickiest tables.

Choosing and testing models

Text-to-SQL rewards stronger reasoning, so start with anthropic/claude-sonnet-4-6 for generation and a cheaper model for explanation. Build a test set of question-and-expected-SQL pairs and run it whenever you change models. Since every model is behind the same Model Database endpoint, switching is one string, and prepaid billing lets you benchmark candidates on real questions before committing.

Create a key and load credit at your dashboard, and see the chat reference in the docs.

Natural-Language Analytics With LLMs

The safe architecture

Providing schema context

Generating the query

Guardrails are non-negotiable

Explaining the results

Accuracy and trust

Choosing and testing models

More in Use Cases

Building a Customer Support Assistant

A Content Generation Pipeline That Scales

Automating Code Review With LLMs