Google debuted a new version of its flagship artificial intelligence model that it said is twice as fast as its previous version and will power virtual agents that assist users.
The new model, Gemini 2.0, can generate images and audio across languages, and can assist during Google searches and coding projects, the company said. The new capabilities of Gemini “make it possible to build agents that can think, remember, plan and even take action on your behalf”, said Tulsee Doshi, a director of product management at the company, in a briefing with reporters.
Google has been working to ensure that the latest wave of AI tools pushed by OpenAI and other start-ups does not loosen its hold on search and advertising. The company has so far held onto its market share in search, but OpenAI is weaving more search features into ChatGPT, putting pressure on the industry leader. Both companies’ ultimate aim is to build artificial general intelligence, or software that can perform tasks as well as or better than humans.
“We want to build that technology — that is where the real value is,” Koray Kavukcuoglu, chief technology officer of AI lab Google DeepMind, said in an interview. “And on the path to that, what we are trying to do is try to pick the right applications, try to pick the right problems to solve.”
Beyond experimental products, Google incorporated more AI into its search engine, which remains its lifeblood. The company said that this week it would begin testing Gemini 2.0 in search and in AI Overviews, the artificial intelligence-powered summaries displayed at the top of Google search. That will improve the speed and quality of search results for increasingly complex questions, like advanced maths equations. The company on Wednesday also gave developers access to an experimental version of Gemini 2.0 Flash, its speedy and efficient AI model, which Google said could better process images and approximate the human ability to reason.
‘Deep research’
Google debuted a new web feature called “deep research”, which it says will allow Gemini users to dive into topics with AI-generated detailed reports. The feature, billed as an AI-powered research assistant, is available immediately to users of Gemini Advanced, Google’s paid AI subscription product. Meanwhile, Gemini users worldwide will be able to tap into a chat-optimised version of the experimental Gemini 2.0 Flash on the web, the company said. The model will come to more Google products in the new year.
The products featured on Wednesday show how Google’s premier AI lab, Google DeepMind, is playing a more pivotal role in product development. The lab is expanding tests of Project Astra, an AI agent that uses a smartphone camera to process visual input. In an elaborate space evoking a home library, with towering bookshelves containing titles on computer programming and travel, Google employees showed how Astra can summarise information on the page. A hidden door nestled in the shelves revealed a small art gallery, where the agent reflected on how Norwegian painter Edvard Munch’s “The Scream” captured the artist’s own anxiety and the general paranoia of his age.
But the agent still showed some limitations. In a live demonstration, it was unable to say whether any novels sat on the bookshelf.
DeepMind researcher Greg Wayne said the agent had improved since it was first introduced at Google’s landmark developer conference earlier this year and can now respond conversationally at the same speed that a human would. The agent once struggled with the name of DeepMind CEO Demis Hassabis, interpreting it as a request for information about the Syrian capital of Damascus, but it now handles that request and others with ease, Wayne said in an interview.
“The founding motto has been developing AI with eyes, ears and a voice, helping you in the real or the digital world,” Wayne said.
The company is also testing Mariner, an experimental web-based assistant designed to help users fill their online shopping carts and organise their digital lives. In a demo, Google director of product management Jaclyn Konzelmann used Mariner, which is an extension in the Chrome browser, to add items from a recipe to her shopping cart at grocer Safeway. For now, Mariner doesn’t offer any time savings, as users must watch the assistant complete tasks. The company wants to keep users in the loop for key decisions, such as making a purchase, Helen King, Google DeepMind senior director of responsibility, said in an interview.
“Many people are like, ‘Yeah, but it’s just a shopping cart,’” she said. “But when 100 toilet rolls turn up to your door because the agent managed to miss a zero somewhere, you will be less like, ‘It’s just a shopping cart.’”
In a briefing with reporters, the company demonstrated two more AI agents that it said it was experimenting with internally and with groups of trusted testers. The first, called Jules, is an AI-powered code agent for engineers that focuses on fixing bugs in software code and handling routine programming tasks. Google also showed off an as-yet-unnamed AI agent for videogames, which aims to help players by reasoning about the game based on the screen, and offering suggestions in real-time conversation. The company called the effort an “early experimental stage” meant to demonstrate some of the AI agent experiences possible with Gemini 2.0.
Investors have expressed concern that Google and its rivals may see diminishing returns from their costly investments in AI. But Kavukcuoglu, the DeepMind leader, tried to dispel any notions of a slowdown in progress.
“I compare where we were a year ago to where we are now,” Kavukcuoglu said, adding that the flash model the company is releasing is “a lot more capable than anything that we had a year ago at a fraction of the cost”. — Julia Love and Davey Alba, (c) 2024 Bloomberg LP