OpenAI shares how its internal data agent works
Written by Joseph Nordqvist | January 29, 2026 at 10:50 PM UTC
4 min read

- 1. What it is: OpenAI described an internal-only AI “data agent” for employees to query company data in natural language.
- 2. How it works (high level): It uses layered context (metadata, annotations, code enrichment, institutional docs), memory, and evals to improve reliability.
- 3. Why it matters: It shows what “enterprise-ready” AI analytics looks like in practice: permissions, grounding, and continuous testing, not just chat.

OpenAI published a technical write-up on January 29, 2026 describing an internal AI “data agent” it built for employees to query and analyze OpenAI’s own data systems.[1]
The company says the tool is internal-only and not a product offering, and that it is designed to speed up routine analysis and reduce common errors when working with large, complex datasets.
Context and background
A lot of organizations are adding natural-language interfaces on top of data warehouses so non-specialists can ask questions without writing SQL, or so analysts can iterate faster. Examples include Microsoft’s Copilot experiences for SQL in Fabric[2], Google’s Gemini assistance in BigQuery data canvas[3], and Snowflake’s Cortex Analyst for text-to-SQL style workflows[4].
A core challenge in these systems is grounding responses in the right tables, definitions, and access rules, especially when data is spread across many datasets with similar schemas. OpenAI positions its approach as a response to those practical problems at internal scale.
Key details
OpenAI says it built the agent to operate over a large internal data platform, citing more than 3,500 internal users, over 600 petabytes of data, and around 70,000 datasets.
The agent is described as being powered by GPT-5.2 and integrated into places employees already work, including Slack, a web interface, and development environments.
OpenAI says the system blends multiple layers of context, including schema metadata, human annotations, code-derived understanding of how tables are produced, institutional documentation, and a memory mechanism that can retain corrections for future use.
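The write-up does not include implementation details, but the layered-context idea can be sketched roughly as follows. All class and field names here are illustrative assumptions, not OpenAI's actual schema; the point is that each layer is merged into a single grounding block the model sees alongside the question.

```python
# Hypothetical sketch of "layered context" for one table. Every name below
# (TableContext, build_prompt_context, the field names) is an assumption.
from dataclasses import dataclass, field

@dataclass
class TableContext:
    schema: str                                            # schema metadata (columns, types)
    annotations: list[str] = field(default_factory=list)   # human-written notes
    lineage: str = ""                                      # code-derived note on how the table is produced
    docs: list[str] = field(default_factory=list)          # institutional documentation
    memories: list[str] = field(default_factory=list)      # corrections retained by the memory mechanism

def build_prompt_context(table: str, ctx: TableContext) -> str:
    """Merge the layers into one grounding block for the model prompt."""
    sections = [f"Table: {table}", f"Schema: {ctx.schema}"]
    if ctx.annotations:
        sections.append("Annotations: " + "; ".join(ctx.annotations))
    if ctx.lineage:
        sections.append("Produced by: " + ctx.lineage)
    if ctx.docs:
        sections.append("Docs: " + "; ".join(ctx.docs))
    if ctx.memories:
        sections.append("Remembered corrections: " + "; ".join(ctx.memories))
    return "\n".join(sections)

ctx = TableContext(
    schema="user_id BIGINT, event_ts TIMESTAMP, action STRING",
    annotations=["event_ts is UTC"],
    lineage="daily_events_etl.py",
    memories=["exclude internal test accounts"],
)
print(build_prompt_context("events_daily", ctx))
```

The memory layer is what distinguishes this from a static prompt: a correction made once ("exclude internal test accounts") can be injected into every future query against that table.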
For retrieval, OpenAI describes using retrieval-augmented generation (RAG): context is embedded ahead of time so the agent can pull only the most relevant snippets at query time, rather than scanning large metadata stores during each request.
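A minimal illustration of that retrieval step, using a toy bag-of-words "embedding" in place of the learned embedding model and vector store a real system would use (the snippets and function names are invented for the example):

```python
# Toy RAG sketch: context snippets are embedded once, then only the most
# similar snippets are pulled at query time. A real system would use a
# learned embedding model and a vector index instead of word counts.
from collections import Counter
import math
import re

def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Embedded ahead of time, not scanned at runtime.
snippets = [
    "events_daily: one row per user action, event_ts in UTC",
    "revenue_monthly: finance-owned, restated quarterly",
    "signups: new account creations by source channel",
]
index = [(s, embed(s)) for s in snippets]

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k snippets most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [s for s, _ in ranked[:k]]

print(retrieve("which table has user action events?"))
```

The efficiency argument is the same at any scale: similarity search over precomputed vectors stays cheap even when the underlying catalog holds tens of thousands of datasets.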
On reliability, OpenAI says it uses evaluation pipelines that compare both generated SQL and resulting data, and feeds signals into an eval “grader” to track correctness and regressions as the system changes.
On security, OpenAI says the agent enforces existing access controls through “pass-through” permissions, meaning users can only query data they are already authorized to access, and the tool flags missing access or suggests alternatives the user can access.
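The pass-through idea is that the agent holds no privileges of its own; every request is checked against the caller's existing grants before any query runs. A minimal sketch under that assumption (the grant table and function below are illustrative, not OpenAI's access-control system):

```python
# Illustrative pass-through permission check: allow only datasets the user
# can already read; otherwise flag what is missing and suggest accessible
# alternatives. All names and data here are invented for the example.
USER_GRANTS = {"alice": {"events_daily", "signups"}}
ALL_DATASETS = {"events_daily", "signups", "revenue_monthly"}

def check_access(user: str, requested: set[str]) -> dict:
    granted = USER_GRANTS.get(user, set())
    missing = requested - granted
    if missing:
        return {
            "allowed": False,
            "missing": sorted(missing),                 # flagged for the user
            "suggestions": sorted(granted & ALL_DATASETS),  # alternatives they can use
        }
    return {"allowed": True, "missing": [], "suggestions": []}

print(check_access("alice", {"events_daily", "revenue_monthly"}))
```

Because the check runs with the caller's identity, the agent cannot be used to widen anyone's effective access, which is the property the "pass-through" framing is meant to guarantee.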
The company also says the agent is built for transparency by surfacing assumptions and execution steps and linking to underlying query results for inspection.
Why this matters
This write-up offers a concrete example of how a company is trying to make AI systems useful in day-to-day analytical work without treating them as free-form chat tools.
It also highlights where these systems tend to succeed or fail in practice: table selection, definitions, and permissions often matter as much as model quality, and organizations are increasingly combining models with retrieval, evaluation, and auditability features to manage that complexity.
Outlook (uncertain)
OpenAI says it plans to keep improving the agent’s ability to handle ambiguous questions and to strengthen validations, but it does not describe plans to ship this specific internal tool externally.
More broadly, the same pattern is likely to continue across enterprise data products, where vendors add natural-language query and analysis features on top of governed data systems, and then expand them into more workflows over time.[5][6] This is a general industry direction, not a prediction about any one vendor’s roadmap.
Editorial Transparency
This article was produced with the assistance of AI tools as part of our editorial workflow. All analysis, conclusions, and editorial decisions were made by human editors. Read our Editorial Guidelines.
References
- 1. Inside OpenAI’s in-house data agent — Bonnie Xu, Aravind Suresh, and Emma Tang, OpenAI, January 29, 2026 (Primary)
- 2.
- 3.
- 4.
- 5. The Rise of AI Data Teams: From Chatbots to Autonomous Agents — David Jayatillake, Cube Dev, Inc., May 27, 2025
- 6. A new era of data engineering: dbt Copilot is GA — Chakshu Mehta and Tom Grabowski, dbt Labs, Inc., March 19, 2025