OpenAI shares how its internal data agent works

Joseph Nordqvist

January 29, 2026 at 10:50 PM UTC

4 min read
  1. What it is: OpenAI described an internal-only AI “data agent” for employees to query company data in natural language.
  2. How it works (high level): It uses layered context (metadata, annotations, code enrichment, institutional docs), memory, and evals to improve reliability.
  3. Why it matters: It shows what “enterprise-ready” AI analytics looks like in practice: permissions, grounding, and continuous testing, not just chat.

OpenAI published a technical write-up on January 29, 2026, describing an internal AI “data agent” it built for employees to query and analyze OpenAI’s own data systems.[1]

The company says the tool is internal-only and not a product offering, and that it is designed to speed up routine analysis and reduce common errors when working with large, complex datasets. 

Context and background

Many organizations are adding natural-language interfaces on top of data warehouses so non-specialists can ask questions without writing SQL, or so analysts can iterate faster. Examples include Microsoft’s Copilot experiences for SQL in Fabric[2], Google’s Gemini assistance in BigQuery data canvas[3], and Snowflake’s Cortex Analyst for text-to-SQL-style workflows[4].

A core challenge in these systems is grounding responses in the right tables, definitions, and access rules, especially when data is spread across many datasets with similar schemas. OpenAI positions its approach as a response to those practical problems at internal scale. 

Key details

OpenAI says it built the agent to operate over a large internal data platform, citing more than 3,500 internal users, over 600 petabytes of data, and around 70,000 datasets. 

Figure: OpenAI’s internal data agent workflow. Employees ask questions via Slack, web, or developer tools; an agent API pulls relevant company context and data knowledge, runs live warehouse queries when needed, and uses GPT-5.2 to generate an answer. (Image: OpenAI)

The agent is described as being powered by GPT-5.2 and integrated into places employees already work, including Slack, a web interface, and development environments. 

OpenAI says the system blends multiple layers of context, including schema metadata, human annotations, code-derived understanding of how tables are produced, institutional documentation, and a memory mechanism that can retain corrections for future use. 
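To make the idea of layered context more concrete, here is a minimal Python sketch. It is illustrative only and is not taken from OpenAI's write-up: the `ContextStore` class, its field names, and the bracketed layer labels are all hypothetical. It assumes each layer can be represented as a short text note keyed by table name, and that "memory" is simply a list of corrections appended over time.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStore:
    """Hypothetical store with one mapping per context layer, keyed by table name."""

    schema_metadata: dict[str, str] = field(default_factory=dict)
    annotations: dict[str, str] = field(default_factory=dict)
    code_lineage: dict[str, str] = field(default_factory=dict)
    docs: dict[str, str] = field(default_factory=dict)
    memory: dict[str, list[str]] = field(default_factory=dict)

    def remember_correction(self, table: str, note: str) -> None:
        """Retain a user correction so later questions about this table see it."""
        self.memory.setdefault(table, []).append(note)

    def build_context(self, table: str) -> str:
        """Concatenate every non-empty layer for a table into one prompt fragment."""
        layers = [
            ("schema", self.schema_metadata.get(table)),
            ("annotation", self.annotations.get(table)),
            ("lineage", self.code_lineage.get(table)),
            ("docs", self.docs.get(table)),
        ]
        lines = [f"[{name}] {text}" for name, text in layers if text]
        lines += [f"[memory] {note}" for note in self.memory.get(table, [])]
        return "\n".join(lines)
```

In this sketch, a correction stored with `remember_correction` appears at the end of every later `build_context` call for the same table, which is one simple way a correction could carry over to future questions.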

For retrieval, OpenAI describes using retrieval-augmented generation (RAG) so the agent can pull relevant embedded context rather than scanning large stores at runtime. 
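The retrieval step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not OpenAI's implementation: the `embed` function below uses bag-of-words counts as a stand-in for a learned embedding model, and the function names and sample snippets are hypothetical. The point is the shape of the technique: context is embedded once ahead of time, and each question is compared against that index instead of scanning the underlying data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(snippets: dict[str, str]) -> dict[str, Counter]:
    """Embed every context snippet once, ahead of query time."""
    return {name: embed(text) for name, text in snippets.items()}


def retrieve(question: str, index: dict[str, Counter], k: int = 2) -> list[str]:
    """Return the names of the k snippets most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:k]
```

A production system would typically swap the toy `embed` for a real embedding model and the dictionary for a vector index, but the query-time flow stays the same.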

On reliability, OpenAI says it uses evaluation pipelines that compare both generated SQL and resulting data, and feeds signals into an eval “grader” to track correctness and regressions as the system changes. 
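One way to picture an evaluation that checks both the SQL and the resulting data is the sketch below. It is a simplified illustration, not OpenAI's pipeline: it assumes a reference ("golden") query exists for each test question, uses Python's built-in `sqlite3` module as a stand-in for the warehouse, and the `grade` and `normalize_sql` helpers are hypothetical names. Results are compared without regard to row order, so a generated query can differ textually from the reference and still pass.

```python
import sqlite3


def normalize_sql(sql: str) -> str:
    """Lowercase, drop a trailing semicolon, and collapse whitespace."""
    return " ".join(sql.strip().rstrip(";").lower().split())


def run(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute a query and return its rows in a deterministic order."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


def grade(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> dict:
    """Emit signals comparing a generated query against a reference query."""
    signals = {
        "sql_match": normalize_sql(generated_sql) == normalize_sql(reference_sql)
    }
    try:
        signals["result_match"] = run(conn, generated_sql) == run(conn, reference_sql)
    except sqlite3.Error:
        # A query that fails to execute counts as an incorrect result.
        signals["result_match"] = False
    # In this sketch, matching data is what decides pass or fail.
    signals["passed"] = signals["result_match"]
    return signals
```

Running the same set of graded questions after every change to prompts, context, or model gives a simple regression signal: a drop in the share of passing cases indicates that something broke.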

On security, OpenAI says the agent enforces existing access controls through “pass-through” permissions, meaning users can only query data they are already authorized to access, and the tool flags missing access or suggests alternatives the user can access. 
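The pass-through idea can be illustrated with a short Python sketch. This is an assumption-laden example rather than OpenAI's design: the `AccessDecision` type, the `check_access` function, and the `related` mapping of substitute tables are hypothetical. It assumes the agent already knows which tables a query needs and which tables the requesting user has been granted, and it never grants anything the user does not already have.

```python
from dataclasses import dataclass


@dataclass
class AccessDecision:
    allowed: bool            # True only if the user can read every needed table
    missing: list[str]       # tables the user lacks access to
    alternatives: list[str]  # accessible tables that might serve instead


def check_access(
    user_grants: set[str],
    tables_needed: list[str],
    related: dict[str, list[str]],
) -> AccessDecision:
    """Check a planned query against the user's existing grants.

    `related` is a hypothetical mapping from a table to similar tables
    that could be suggested when access is missing.
    """
    missing = [t for t in tables_needed if t not in user_grants]
    alternatives = sorted(
        {alt for t in missing for alt in related.get(t, []) if alt in user_grants}
    )
    return AccessDecision(
        allowed=not missing, missing=missing, alternatives=alternatives
    )
```

If `allowed` is false, the agent in this sketch would not run the query; it would report the `missing` tables and offer the `alternatives`, mirroring the flag-or-suggest behavior described above.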

The company also says the agent is built for transparency by surfacing assumptions and execution steps and linking to underlying query results for inspection. 

Why this matters

This write-up offers a concrete example of how a company is trying to make AI systems useful in day-to-day analytical work without treating them as free-form chat tools. 

It also highlights where these systems tend to succeed or fail in practice: table selection, definitions, and permissions often matter as much as model quality, and organizations are increasingly combining models with retrieval, evaluation, and auditability features to manage that complexity.

Outlook (uncertain)

OpenAI says it plans to keep improving the agent’s ability to handle ambiguous questions and to strengthen validations, but it does not describe plans to ship this specific internal tool externally. 

More broadly, the same pattern is likely to continue across enterprise data products, where vendors add natural-language query and analysis features on top of governed data systems, and then expand them into more workflows over time.[5][6] This is a general industry direction, not a prediction about any one vendor’s roadmap. 

Written by

Joseph Nordqvist

Joseph founded AI News Home in 2026. He holds a degree in Marketing and Publicity and completed a PGP in AI and ML: Business Applications at the McCombs School of Business. He is currently pursuing an MSc in Computer Science at the University of York.

This article was written by the AI News Home editorial team with the assistance of AI-powered research and drafting tools. All analysis, conclusions, and editorial decisions were made by human editors.

References

  1. Inside OpenAI’s in-house data agent. Bonnie Xu, Aravind Suresh, and Emma Tang, OpenAI, January 29, 2026. (Primary)
  2.
  3.
  4. Cortex Analyst, Snowflake.
  5. The Rise of AI Data Teams: From Chatbots to Autonomous Agents. David Jayatillake, Cube Dev, Inc, May 27, 2025.
  6. A new era of data engineering: dbt Copilot is GA. Chakshu Mehta, Tom Grabowski, dbt Labs, Inc., March 19, 2025.

Jan 29, 2026 at 11:56 PM UTC (latest)

Formatting.

Joseph Nordqvist
Jan 29, 2026 at 11:53 PM UTC

Formatting.

Joseph Nordqvist
Jan 29, 2026 at 11:52 PM UTC

Added more details about the internal data platform.

Joseph Nordqvist
Story published