Gemini’s First Steps: AI Automation on Phones Is Slow, But Shows the Future


Google’s Gemini can now automate tasks directly within apps on Pixel and Galaxy phones. Early tests show a clunky but functional first iteration of what could become a transformative AI assistant experience. It is limited to basic tasks like food delivery and ridesharing for now, but an AI that can independently navigate app interfaces represents a significant leap forward, even if the current performance is far from seamless.

Why This Matters

For years, smartphone assistants have relied on voice commands and predefined integrations. Gemini’s task automation is different: it controls apps directly, tapping buttons, scrolling menus, and making decisions the way a human user would. The implications go beyond convenience, pointing to a future where AI handles routine mobile tasks autonomously and frees users for more complex work. The current execution, however, shows how far we are from that reality.

Slow, But Functional

Testing reveals Gemini is noticeably slower than a human user: ordering dinner through Uber Eats took nearly nine minutes as the AI struggled to navigate the menu. By default the system runs in the background, which lets it work without direct oversight but also makes it opaque. The main window into its progress is a text log of its reasoning (“Selecting a second portion of Chicken Teriyaki”), which some will find fascinating and others will read as a record of just how inefficient the process still is.

Accuracy and Limitations

Despite its slowness, Gemini is surprisingly accurate. In tests it rarely completed an order without pausing for user review, and the errors it did make tended to occur early in the process (for example, when an app needed location permissions). One particularly impressive feat was scheduling an Uber to the airport: Gemini pulled calendar and flight details to suggest an optimal departure time.

However, the AI’s performance is heavily dependent on app design. Interfaces built for humans, filled with ads and irrelevant visuals, hinder its efficiency. Google acknowledges this, suggesting the current approach is a stopgap until app developers adopt more AI-friendly protocols, such as the Model Context Protocol (MCP).
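To make that concrete, here is a minimal sketch (in TypeScript) of the kind of interface MCP enables: rather than an assistant tapping through screens, an app publishes a named tool with a machine-readable schema that the assistant can call directly. The tool name, fields, and the delivery app itself are hypothetical, illustrative choices, not part of any real service or of Google’s implementation.

    // Hypothetical MCP tool a delivery app could publish. The assistant reads
    // the schema and issues one structured call instead of tapping through
    // menus designed for human eyes. (MCP tools declare a name, a description,
    // and a JSON Schema describing their input.)
    const placeOrderTool = {
      name: "place_order", // hypothetical tool name
      description: "Place a food order for delivery to a saved address.",
      inputSchema: {
        type: "object",
        properties: {
          restaurantId: { type: "string" },
          items: {
            type: "array",
            items: {
              type: "object",
              properties: {
                dishId: { type: "string" },
                quantity: { type: "integer", minimum: 1 },
              },
              required: ["dishId", "quantity"],
            },
          },
          deliveryAddressId: { type: "string" },
        },
        required: ["restaurantId", "items", "deliveryAddressId"],
      },
    };

With a contract like this, “a second portion of Chicken Teriyaki” becomes a quantity field in a single structured call rather than minutes of scrolling and tapping.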

The Future of App Design

If apps were built for AI, they would look radically different: the emphasis would shift from visual presentation to structured data. Gemini’s current struggles underline that effective AI automation needs infrastructure designed for machines, not interfaces optimized for human eyes and attention.
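As a thought experiment, an AI-first menu could be little more than typed data, along the lines of this hypothetical sketch (all type and field names are illustrative):

    // A hypothetical machine-readable menu: no images, no promotions,
    // just the facts an assistant needs to place an order correctly.
    interface MenuItem {
      id: string;
      name: string;          // e.g. "Chicken Teriyaki"
      priceCents: number;    // integer cents avoids floating-point rounding
      allergens: string[];
      available: boolean;
    }

    interface Menu {
      restaurantId: string;
      items: MenuItem[];
    }

An assistant consuming data like this needs no screen time at all; the nine-minute order collapses into a lookup and a call.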

This version of task automation feels like a notable first step toward a new way of using our mobile assistants: awkward and slow, but genuinely promising.

Even in this imperfect first iteration, Gemini’s task automation is a crucial step toward fully integrated AI assistants. The core takeaway is that AI-driven app control is now possible, and its evolution will reshape how we interact with our phones.