It happens. Again. Google drops something huge at I/O and the headline says one thing but the tech shows another. They call it Gemini Omni. Sounds ambitious. It is. But here’s the trick: the company wants you to believe it creates “anything from any input” yet the demo screamed one message.
Video.
And specifically, video that understands the world. Not just pixels arranged nicely. Gemini Omni Flash is the first model out of this new family.
Demis Hassabis doesn’t mince words. He called it a step toward AGI.
Most tools are linear. You type text, get a clip. Boring. Omni is different because it accepts everything. Text, sure. Audio? Yes. Images and video as input too. It digests all that multimodal mess and spits out an interactive world backed by Gemini’s training on actual reality. Physics works better now. Historical context matters. If you ask for 1920s Paris, it won’t accidentally dress a pedestrian in modern sportswear.
Think about that. The AI gets the joke, not just the prompt.
Editing works differently here. Forget keyframes or layer masks for a moment. You just talk. “Change the background.” Done. “Shift the angle.” Done. Whether you shot the clip or the AI generated it, the model handles the specific changes without you tearing your hair out. Even styles and scenery bend to the will of conversation.
Then there’s the Avatar feature. You create a digital likeness of yourself. Sounds fun, right?
Sort of.
Google admits it’s still testing that part. Responsible launch and all. They are wary, probably wisely so, about how quickly this should hit the general public.
What is released now is free to try, but the heavy lifting sits behind a paywall. Google AI Plus, Pro, and Ultra subscribers get it in the Gemini app and Google Flow right away. But the masses? They aren’t forgotten.
This week brings it to YouTube Shorts and the Create app. No charge there. Just the standard watermark. Every clip generated carries a SynthID tag, so it can be identified as machine-made. We can argue about the implications later or right now; frankly, it hardly matters when the output is this good.
The tool is live. The watermark is invisible to the eye but embedded in the clip itself. We are watching worlds being built from scratch.
