
OpenAI reveals how deep research transforms inquiry


OpenAI has introduced a new agentic AI system called 'deep research,' designed to handle complex, time-consuming research tasks by simulating the work of a human analyst.

Presented by researchers Isa Fulford and Edward Sun during an OpenAI forum event, the new tool is powered by a fine-tuned version of OpenAI's upcoming o3 model and leverages advanced reasoning and browsing capabilities.

"Deep research is an agent in ChatGPT that can do work for you independently," Fulford explained.

"You give it a prompt, and it will find, analyse, and synthesise hundreds of online sources to create a comprehensive report at the level of a research analyst."

The system is intended to help users across a range of sectors—from academia and medicine to business and software development. "Members are finding that deep research asks clarifying questions to refine research before it even starts," said Fulford. "We think that deep research can accomplish in tens of minutes what would take a human many hours."

The model represents a major step forward in OpenAI's work with reasoning systems, building on reinforcement learning techniques introduced in its earlier models. Fulford explained how the company developed the tool: "We launched o1 in September of last year. This was the first model that we released in this new paradigm of training where models are trained to think before answering… and we called this text where the model is thinking, 'chain of thought'."

This method of structured, internal reasoning proved effective not only in tasks such as maths and coding, but also in navigating complex real-world information environments. "Around a year ago internally, we were seeing really great success… and we wondered if we could apply these same methods but for tasks that are more similar to what a large number of users do in their daily lives and jobs," Fulford said.

Sun detailed how the tool works by combining reasoning with specialised capabilities like web browsing and code execution. "The browser tool helps the model to aggregate or synthesise real-time data, and the Python tool is helping the model to process this data," he explained. The system dynamically alternates between reasoning and action, using reinforcement learning to improve over time.

One striking example involved analysing medal data from the 2020 Tokyo Olympics. "You can see how the model interleaved reasoning with actual tool calls to search for information, refine the data, and process it programmatically," Sun said.
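To make that interleaving concrete, the sketch below shows a minimal agentic loop that alternates between a reasoning step and tool calls for browsing and code execution. It is an illustration only: the function names (plan_next_step, search_web, run_python) and control flow are assumptions for this example, not OpenAI's actual implementation or API.

```python
# Minimal sketch of an agentic research loop that interleaves reasoning with
# tool calls. All names here are illustrative placeholders, not OpenAI's API.

from dataclasses import dataclass, field


@dataclass
class ResearchState:
    question: str
    notes: list[str] = field(default_factory=list)


def plan_next_step(state: ResearchState) -> dict:
    """Stand-in for the reasoning model: decide whether to browse more,
    process gathered data, or finish with a report."""
    if len(state.notes) < 3:
        return {"action": "search", "query": state.question}
    return {"action": "finish"}


def search_web(query: str) -> str:
    """Placeholder browser tool; a real agent would fetch and read pages."""
    return f"[snippet found for '{query}']"


def run_python(code: str) -> str:
    """Placeholder code-execution tool for processing gathered data."""
    return f"[result of running {code!r}]"


def deep_research(question: str) -> str:
    state = ResearchState(question)
    for _ in range(10):                       # cap the reasoning/tool cycles
        step = plan_next_step(state)          # reasoning step
        if step["action"] == "search":
            state.notes.append(search_web(step["query"]))   # tool call: browse
        elif step["action"] == "python":
            state.notes.append(run_python(step["code"]))    # tool call: run code
        else:
            break
    return "Report based on:\n" + "\n".join(state.notes)    # synthesise a report


print(deep_research("Medal totals at the 2020 Tokyo Olympics"))
```

In a real system the planning step would be the model itself and the loop would be driven by its chain of thought; the point of the sketch is only the alternation between reasoning and tool use that Sun describes.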

Unlike older approaches that rely on a single-pass search or instruction-following, deep research iteratively refines its answers. "We train the model with end-to-end reinforcement learning," Sun added. "We directly optimise the model to actively learn from the feedback, both positive and negative."

OpenAI tested the model extensively against both public and internal benchmarks. According to Fulford, "the model powering deep research scored a new high of 26.6%" on Humanity's Last Exam, an expert-level evaluation spanning over 100 subjects.

On another benchmark, GAIA, the tool also achieved a state-of-the-art result for multi-step web browsing and reasoning.

The model also underwent safety evaluations prior to release. "We did extensive red teaming with external testers, and then also went through preparedness and governance reviews that we always do at OpenAI," Fulford said.

Despite strong results, the researchers acknowledged current limitations. "It still may hallucinate facts or infer things incorrectly," Fulford said.

"Sometimes it struggles to distinguish between authoritative sources and rumours."

Use cases continue to emerge in unexpected domains. "People might be using the model a lot for coding. And that's been a really big use case," Fulford observed. Other domains include scientific and medical research, where professionals have begun verifying the model's output against their own expertise.

Users are also adapting their behaviour to suit the model. "We've seen interesting user behaviour where people put a lot of effort into refining their prompts using o1 or another model," Fulford said. "And then only after really refining that instruction, they'll send it to deep research… which makes sense if you're going to wait a long time for an output."

Currently, deep research is available to users on the Plus, Pro, Team, Enterprise and Edu plans.

"We're very excited to release a smaller, cheaper model to the free tier," Fulford confirmed. The team also plans to improve personalisation and explore ways to let users incorporate subscription services or private data into the research process.

"This showcases how the model can effectively break down a complex task, gather information from various sources, and structure the response coherently for the user," Sun said in closing.

OpenAI's forum audience, composed of members across academia, government, and business, left the event with a clear sense that deep research marks a meaningful step toward AI systems capable of handling work currently done by skilled analysts.
