Google DeepMind has introduced SIMA 2, its Gemini-powered AI agent that can follow instructions, reason, and teach itself new skills in virtual environments. The company says the new agent significantly outperforms its predecessor, approaching human-level task completion.
SIMA, short for Scalable Instructable Multiworld Agent, was introduced last year as a generalist AI that could follow basic instructions across a wide range of virtual environments. According to Google, SIMA was a big leap in teaching AI how to translate language into meaningful action in rich, 3D worlds.
Google describes SIMA 2 as the next milestone in its research on general and helpful AI agents. Built on Gemini models, the new agent has evolved from an ‘instruction-follower’ into an interactive gaming companion. Beyond following human-language instructions in virtual worlds, SIMA 2 can reason about goals, converse with users, and improve itself over time.
Google claims that this is a significant step towards Artificial General Intelligence (AGI).
When it comes to performance, the agent reportedly completed 45-75 per cent of tasks in never-before-seen games such as ASKA and MineDojo, compared with the 15-30 per cent SIMA 1 managed on the same challenges.
According to the official blog post, SIMA 2 improves itself through trial and error, without additional human training data, using Gemini to create tasks, score attempts, and learn from mistakes. The AI agent explores games by analysing on-screen visuals, simulating keyboard and mouse inputs, and interacting with the user like a gaming companion.
Reportedly, DeepMind also tested SIMA 2 in worlds generated by its Genie 3 model. The AI agent successfully adapted to environments it had never seen or been trained on before.
According to the company, SIMA 2’s architecture, backed by Gemini’s powerful reasoning abilities, allows it to understand high-level goals, reason about how to pursue them, and skillfully execute goal-oriented actions within games.
“We trained SIMA 2 using a mixture of human demonstration videos with language labels as well as Gemini-generated labels. As a result, SIMA 2 can now describe to the user what it intends to do and detail the steps it’s taking to accomplish its goals,” read the blog post by Google DeepMind.