
Google unveils Gemini AI upgrade for universal assistant vision

Google has detailed developments in its Gemini artificial intelligence (AI) platform and outlined plans to expand the capabilities of multimodal foundation models, with a focus on integrating these systems into products for everyday use.

Over the past decade, Google has concentrated on foundational AI research, including the development of the Transformer architecture that underpins large language models. The company has also advanced AI agent systems through projects such as AlphaGo and AlphaZero, which demonstrated learning and planning in complex games. These methods have been applied to fields including quantum computing, mathematics, life sciences and algorithmic discovery.

Google stated, "We've applied these techniques to make breakthroughs in quantum computing, mathematics, life sciences and algorithmic discovery. And we continue to double down on the breadth and depth of our fundamental research, working to invent the next big breakthroughs necessary for artificial general intelligence (AGI)."

The company is working to extend its Gemini 2.5 Pro model into what it calls a "world model", one capable of contextual understanding, planning and simulation. According to Google, "This is why we're working to extend our best multimodal foundation model, Gemini 2.5 Pro, to become a 'world model' that can make plans and imagine new experiences by understanding and simulating aspects of the world, just as the brain does."

Progress in creating these models has drawn from previous work in training AI agents for complex games such as Go and StarCraft, as well as the development of tools like Genie 2, which is able to generate interactive 3D simulated environments from a single image prompt. Google stated, "We've been taking strides in this direction for a while, from our pioneering work training agents to master complex games like Go and StarCraft, to building Genie 2, which is capable of generating 3D simulated environments that you can interact with, from a single image prompt."

Gemini models are already showing emerging "world model" capabilities, from using world knowledge and reasoning to represent and simulate natural environments, to Veo's grasp of intuitive physics and the robot training work in Gemini Robotics. Google noted, "Already, we can see evidence of these capabilities emerging in Gemini's ability to use world knowledge and reasoning to represent and simulate natural environments, Veo's deep understanding of intuitive physics, and the way Gemini Robotics teaches robots to grasp, follow instructions and adjust on the fly."

Developing Gemini into a "world model" is described by Google as a critical step in producing a universal AI assistant. The company explained, "Making Gemini a world model is a critical step in developing a new, more general and more useful kind of AI — a universal AI assistant. This is an AI that's intelligent, understands the context you are in, and that can plan and take action on your behalf, across any device."

Google is aiming to transform the Gemini app into an assistant that can manage everyday administrative tasks and provide personalised recommendations. The company said, "Our ultimate vision is to transform the Gemini app into a universal AI assistant that will perform everyday tasks for us, take care of our mundane admin and surface delightful new recommendations — making us more productive and enriching our lives."

This effort begins with capabilities such as video understanding, screen sharing and memory, first explored through the Project Astra research prototype. In the past year, features such as improved voice output, enhanced memory and computer control have been integrated into Gemini Live. Google commented, "Over the past year, we've been integrating capabilities like these into Gemini Live for more people to experience today. We continue to relentlessly improve and explore new innovations at the frontier. For example, we upgraded voice output to be more natural with native audio, we've improved memory and added computer control."

Google is collecting feedback from trusted testers on these capabilities, with plans to make them available through Gemini Live, new experiences in Search, the Live API for developers and additional devices such as smart glasses. The company emphasised its commitment to safety and responsibility, stating, "Through every step of this process, safety and responsibility are central to our work. We recently conducted a large research project, exploring the ethical issues surrounding advanced AI assistants, and this work continues to inform our research, development and deployment."
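The Live API referenced above is the developer-facing entry point to these streaming capabilities. As a rough orientation only, the sketch below opens a Live session with Google's google-genai Python SDK and exchanges a single text turn; the model name and configuration shape follow the SDK's published quickstarts at the time of writing and may change between releases.

```python
# Minimal sketch of a Gemini Live API session using the google-genai SDK.
# Assumptions: the "google-genai" package is installed, GEMINI_API_KEY is set
# in the environment, and the Live-capable model named below is available.
import asyncio
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

MODEL = "gemini-2.0-flash-live-001"  # assumed Live-capable model name
CONFIG = {"response_modalities": ["TEXT"]}  # Live sessions also support audio

async def main() -> None:
    # Open a persistent, bidirectional session with the model.
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        # Send one complete user turn over the open session.
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Summarise what you can do."}]},
            turn_complete=True,
        )
        # Stream the model's incremental responses back as they arrive.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```

The persistent session is what distinguishes the Live API from the one-shot request-and-response calls elsewhere in the Gemini API, and it is the mechanism that makes continuous inputs such as voice and screen sharing possible.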

In addition, Google is investigating how agentic AI capabilities can help users multitask through Project Mariner, which is designed to support human-agent interaction, primarily in web browsers. Project Mariner agents can now complete up to ten different tasks concurrently, such as information searches, bookings, online shopping and research. "Project Mariner now includes a system of agents that can complete up to ten different tasks at a time. These agents can help you look up information, make bookings, buy things, do research and more — all at the same time," according to Google.
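Google has not published how Project Mariner schedules its agents, but "up to ten different tasks at a time" describes a standard bounded-concurrency pattern. The sketch below is a generic Python illustration of that pattern; run_agent_task is a hypothetical placeholder for whatever an individual agent does, and none of the names are Google's.

```python
# Generic bounded-concurrency sketch: run many agent tasks, at most 10 at once.
# This illustrates the pattern only; it is not Project Mariner's actual code.
import asyncio

MAX_CONCURRENT_TASKS = 10  # mirrors the "up to ten tasks at a time" claim

async def run_agent_task(task: str, limit: asyncio.Semaphore) -> str:
    """Hypothetical stand-in for one agent task (a search, a booking, etc.)."""
    async with limit:  # at most MAX_CONCURRENT_TASKS coroutines run past here
        await asyncio.sleep(1)  # placeholder for browsing and tool calls
        return f"done: {task}"

async def main() -> None:
    limit = asyncio.Semaphore(MAX_CONCURRENT_TASKS)
    tasks = [
        "look up flight times",
        "book a table for two",
        "compare laptop prices",
        "research local tradespeople",
        "order groceries",
    ]
    # Launch every task; the semaphore caps how many actually run in parallel.
    results = await asyncio.gather(*(run_agent_task(t, limit) for t in tasks))
    for line in results:
        print(line)

asyncio.run(main())
```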

The updated Project Mariner is available to Google AI Ultra subscribers in the United States, and its computer use capabilities will be introduced into the Gemini API, with plans for broader integration across Google products later in the year.
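The computer use tooling itself had not been publicly documented at the time of writing, but the Gemini API surface it is slated to join is available today. For orientation, a minimal request through the google-genai Python SDK looks like the sketch below; the model name is an assumption, and the computer use tool configuration is deliberately omitted because it has not been published.

```python
# Minimal Gemini API request via the google-genai SDK, shown for orientation.
# Computer use capabilities are slated to arrive on this same API surface;
# their tool configuration was not public when this article was written.
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",  # assumed model name; substitute any available model
    contents="Outline the steps involved in booking a restaurant table online.",
)
print(response.text)
```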

Google concluded, "With this, and all our groundbreaking work, we're building AI that's more personal, proactive and powerful, enriching our lives, advancing the pace of scientific progress and ushering in a new golden age of discovery and wonder."
