OpenAI on Thursday, July 17, launched ChatGPT agent, a powerful new tool that can automate complex tasks on its own. During a livestream, OpenAI CEO Sam Altman and members of the company's engineering team demonstrated ChatGPT agent's capabilities.
With this, ChatGPT can now handle a wide range of tasks, from web browsing to creating presentations. The AI powerhouse's latest offering has been described as a tool that can carry out tasks on the user's behalf using its own virtual computer.
ChatGPT agent is powered by a new model that the company developed specifically for it. The tool can scan a user's calendar to brief them on upcoming meetings, and it can assist with planning, shopping, and even generating slides for presentations.
While the model has no particular name, it was reportedly trained through reinforcement learning on complex tasks that often required multiple tools. These tools include a text browser, a visual browser, and a terminal into which users can import their own data.
OpenAI said that starting today, Pro, Plus, and Team users can access ChatGPT's new agentic capabilities at any point in a conversation by selecting 'agent mode' from the tools dropdown in the composer.
In simple terms, ChatGPT agent is an AI system designed to browse websites, filter results, execute code, run analyses, and create editable documents, among other tasks. OpenAI said that at the core of the update is what it calls a 'unified agentic system', which brings together the capabilities of its existing tools, Operator and Deep Research.
ChatGPT agent comes with multiple tools, including a visual browser, a text-based browser, and direct API access, and it is designed to automatically pick the best tool for each task. It can also access apps such as Gmail and GitHub through connectors.
On performance benchmarks, the model behind ChatGPT agent scored 41.6 per cent on Humanity's Last Exam (HLE), one of the toughest tests, which measures an AI's expertise on academic questions. It also scored 27.4 per cent on FrontierMath, a mathematics benchmark, 45.5 per cent on SpreadsheetBench, and 68.9 per cent on BrowseComp, a benchmark that tests an AI model's web navigation abilities. On data science tasks (DSBench), ChatGPT agent outperformed humans.
The concept of AI agents gained popularity in 2023, after which big tech companies such as Amazon, Google, and Meta began pushing their own AI agents. Recently, in its pursuit of agentic AI, Google hired Windsurf's CEO and R&D team. Now OpenAI has joined the push with ChatGPT agent, which builds on Operator, its existing AI tool designed to perform web-based tasks. The race among big tech now appears to be to develop AI agents that become essential tools for users.
© IE Online Media Services Pvt Ltd