As Google’s annual developer conference got underway in Mountain View, California, artificial intelligence (AI) hogged the limelight as the strategic centrepiece for boosting the company’s popular products. Alphabet Chief Executive Officer Sundar Pichai said the company is still innovating at breakneck speed by putting AI at the forefront of nearly every service and product it ships.
“We see so much opportunity for creators or developers or startups or everyone helping to (advance) those opportunities – is what Gemini is all about,” said Pichai.
As part of its suite of announcements at the high-profile I/O event, Google unveiled a brand-new personal AI assistant that could be the successor to Google Assistant. Called “Project Astra” and powered by Gemini AI, it can see the world around you and answer questions about it. In a pre-recorded demo of a live Project Astra session, a person held up an Android smartphone with the camera’s live feed open and asked questions; Astra correctly identified what it was looking at and responded accurately through voice interaction with the user.
Project Astra is a lot faster and more intuitive than Google Assistant. Google DeepMind CEO Demis Hassabis describes Project Astra as an effort to create a “universal AI agent helpful in everyday life.” Hassabis calls Astra a “multimodal” AI assistant, meaning it can respond to various inputs, such as text, images, audio, and video, making it work more like a human. The company plans to roll out parts of Project Astra later this year through the Gemini app. Project Astra is Google’s answer to OpenAI’s new GPT-4o model, which can also speak and view the world through the user’s smartphone camera.
Perhaps the biggest announcement came in the form of Veo, a new generative AI model that rivals OpenAI’s Sora and takes Google beyond text and images into video generation for the first time. The model lets a user type out a desired scene and turns it into a 1080p clip that, Google says, can run well beyond a minute in different cinematic and visual styles. “Veo has an advanced understanding of natural language and visual semantics and can generate video that closely represents the user’s creative vision — accurately rendering details in longer prompts and capturing tone,” the company said of the new video-generation model. Veo understands cinematic terms like “timelapse” or “aerial shots of a landscape” and can create footage of people, animals, and objects moving realistically through shots. Google says Veo is already available to select creators as a private preview inside VideoFX, and users can sign up to join the waitlist. The company also promises to bring some of Veo’s capabilities to YouTube Shorts and other products in the future.
Video could be the next big thing in generative AI. While companies like Google and OpenAI say these tools create more opportunities for people in creative industries, the recent Hollywood strikes opened a new battle over artificial intelligence and ethics. Industry watchers also warn that text-to-video generation tools raise serious misinformation concerns as major political elections get underway in many parts of the world. According to data from Clarity, a deepfake detection firm, 900 percent more deepfakes have been created and published this year than last year.
Google also said it is launching Imagen 3, its new text-to-image AI model that produces photorealistic, lifelike images and can be used for tasks such as creating personalized birthday messages or adding visual elements to presentations.
Google has also begun integrating artificial intelligence more deeply into the core search experience. The “Search Generative Experience” (SGE), as Google dubs the feature, has been available for nearly a year in the US, but only to users who signed up via Google Labs. The company has also been experimenting with “AI Overviews”, summaries generated for the kinds of queries where it thinks generative AI can be especially helpful at pulling information from a range of web pages. Since last year, Google has offered a Search Labs section where searchers could opt in to see and use SGE results. Google is now opening AI Overviews to everyone in the US, with more countries to follow soon. The company says it continues to improve AI Overviews, and a new customised Gemini model tailored specifically for Google Search will add multi-step reasoning capabilities. Google says AI Overviews were already good with complex queries, and they are now set to become even more helpful: instead of breaking a question into multiple searches, users can ask their most complex questions in a single search. For example, “find the best yoga or pilates studios in Boston and show details on their intro offers and walking time from Beacon Hill.”
Google is also beefing up Search’s ability to handle queries related to planning. For example, if you ask Google to “create a 3-day meal plan for a group that’s easy to prepare,” the results will show a wide range of recipes from across the web. You can also customize your meal plan or export it to Docs and Gmail. Trip planning works the same way. Both meal and trip planning are available in Search Labs in English in the US, and Google says it plans to extend these planning capabilities to more categories, such as parties, date nights, and workouts.
Google used its developer conference to add a myriad of new features and capabilities to Gemini, its AI chatbot that, like ChatGPT, can answer questions in text form and generate pictures in response to text prompts. As part of the I/O announcements, Google said it is bringing Gemini 1.5 Pro to Gemini Advanced, the premium tier of its AI offering, which costs $20 per month, the same price OpenAI charges for its upgraded ChatGPT Plus.
The new model’s 1 million-token context window allows users to upload large PDFs, code repositories, and lengthy videos as prompts. Gemini Advanced is also getting a new planning feature that lets subscribers build a customised itinerary from their flight timings, meal preferences, and information about local museums, drawn from Gmail, Search, and Maps, all with a simple prompt. And there’s more. Google will also let Gemini Advanced subscribers customise the chatbot by creating Gems, such as a writing guide, gym buddy, coding partner, or chef. Just describe the Gem and it will respond with your specific needs in mind; for example, you can ask a Gem to come up with a daily running plan that charges you up in the morning and keeps you motivated.
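To give a sense of what a 1 million-token context window means in practice, here is a minimal sketch using Google’s public google-generativeai Python SDK. The article describes the consumer Gemini Advanced app, not the developer API, so this is an illustration only; the file path and prompt are hypothetical.

```python
# Minimal sketch: long-context prompting with Gemini 1.5 Pro via the
# google-generativeai SDK. The file path and prompt are hypothetical.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes you have a Gemini API key

# Upload a large document; the 1M-token window can take lengthy PDFs whole.
report = genai.upload_file(path="annual_report.pdf")

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    [report, "Summarise the key risks discussed in this document."]
)
print(response.text)
```

The notable point is that the whole document travels in the prompt itself, so no chunking or separate retrieval pipeline is needed for material that fits in the window.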
Google has announced that it is adding new AI capabilities right inside the Gmail mobile app, as well as bringing the power of Gemini to its Workspace apps, including Gmail, Drive, Slides, Docs, and more, with an AI-powered sidebar.
Google is also introducing a new Gemini model called 1.5 Flash, which it says is optimised for high-volume, high-frequency tasks at scale and is more cost-efficient. It, too, supports a large context window, allowing the model to process and understand extremely long documents, books, scripts, or codebases that would otherwise have to be broken up and processed separately. Meanwhile, Gemini Nano, the smaller version of Google’s Gemini AI model for on-device AI currently built into some Android phones, will expand beyond text inputs to include sight, sound, and spoken language.
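For developers, choosing Flash over Pro is essentially a one-line model-name change in the same SDK as above. A minimal sketch of the kind of high-volume, latency-sensitive task Flash is pitched at; the classification prompt and sample tickets are hypothetical.

```python
# Minimal sketch: using the lighter Gemini 1.5 Flash for a cheap, fast,
# high-volume task. The prompts and sample inputs are hypothetical.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
flash = genai.GenerativeModel("gemini-1.5-flash")

# Flash targets many small, quick calls, e.g. classifying short texts at scale.
for ticket in ["My order arrived damaged.", "How do I reset my password?"]:
    response = flash.generate_content(
        f"Classify this support ticket as 'complaint' or 'question': {ticket}"
    )
    print(ticket, "->", response.text.strip())
```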
In the two-hour presentation, which was almost entirely centered on AI, Google spent the least amount of stage time on Android 15. The newest version of Android is in beta now and is expected in the fall. The company’s focus was on three breakthroughs coming to Android this year: better search on your device, Gemini becoming your AI assistant, and on-device AI unlocking new experiences. Android’s near-absence from the stage shows how Google’s priorities have shifted, even though the dominant mobile operating system still plays a giant role in bringing Google’s services and offerings to billions of people every day.
The internet giant’s strategy is to reassure investors and developers about Google’s place in the digital universe, not just in the past but in the future as well. However, the rise of OpenAI, the Microsoft-backed developer behind ChatGPT, threatens to eat into Google’s dominance. The Sam Altman-run company recently debuted a new model, GPT-4o, which it claims is “much faster,” with improved capabilities in text, video, and audio, and it says it eventually plans to let users video chat with ChatGPT. OpenAI chose to announce GPT-4o ahead of Google’s developer conference, making trade pundits more confident about the startup’s ability to compete with a behemoth like Google in the AI arms race and to create more hit products like ChatGPT. In fact, many believe OpenAI is gearing up to add a search function to ChatGPT, which would put the rival AI chatbot in direct competition with Google Search.
At its I/O conference, Google showed how generative AI is ready to make meaningful improvements for users in areas that matter to them, from Search to Gmail to video, spanning sectors from creative work to business. The company also used the event to highlight its research emphasis on technologies that may not be ready yet but could have a huge impact with the right use cases.