Google’s Interactions API: A New Era for AI Development


For the past two years, AI development has largely operated on a “stateless” model: prompts in, responses out, no memory between turns. This worked for basic chatbots, but it’s now a major bottleneck for more complex agents that require long-term memory, tool use, and extended reasoning. Last week, Google DeepMind launched the public beta of the Interactions API, a solution designed to address this fundamental infrastructure gap.

This move signals a strategic shift from treating Large Language Models (LLMs) as simple text generators to managing them as remote operating systems with persistent state. OpenAI took the first step with its Responses API in March 2025, but Google’s entry reinforces the industry’s direction toward “stateful” AI.

The Shift to Stateful AI: Why It Matters

The traditional stateless approach forced developers to manage conversation histories themselves, resending the full context as a potentially massive JSON payload with every request. The Interactions API eliminates this by storing state server-side; developers simply provide a previous_interaction_id, and Google handles the rest. As DeepMind’s Ali Çevik and Philipp Schmid explain, forcing these capabilities into the old generateContent endpoint would have made it unstable and overly complex.

This unlocks Background Execution, a crucial feature for autonomous agents. Workflows that previously timed out due to HTTP limits can now run in the background, with developers polling for results later. The API effectively becomes an intelligent job queue.
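The submit-then-poll workflow looks roughly like this. The client below is a stand-in that simulates the server, and the status values (`in_progress`, `completed`) are assumptions for illustration:

```python
import time

# Minimal sketch of the background-execution pattern: submit a job,
# then poll for its result instead of holding an HTTP connection open.

class FakeInteractionsClient:
    """Stand-in for a real client; completes a job after a few polls."""

    def __init__(self, polls_until_done: int = 3):
        self._remaining = polls_until_done

    def create_background(self, prompt: str) -> str:
        # Returns an interaction id to poll later.
        return "job-001"

    def get(self, interaction_id: str) -> dict:
        self._remaining -= 1
        if self._remaining <= 0:
            return {"status": "completed", "output": "research summary"}
        return {"status": "in_progress"}

def poll_until_done(client, interaction_id: str,
                    interval: float = 0.01, timeout: float = 5.0) -> dict:
    """Poll a background interaction until it completes or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = client.get(interaction_id)
        if result["status"] == "completed":
            return result
        time.sleep(interval)
    raise TimeoutError(f"{interaction_id} did not finish in {timeout}s")

client = FakeInteractionsClient()
job_id = client.create_background("Research the history of HTTP timeouts.")
result = poll_until_done(client, job_id)
```

In production you would persist the interaction ID and poll from a scheduled worker rather than a blocking loop, but the shape of the exchange is the same.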

Key Features: Deep Research and MCP Support

Google is leveraging this new infrastructure to introduce its first built-in agent: Gemini Deep Research. This agent performs long-horizon research tasks, synthesizing information through iterative cycles of searching and reading rather than producing a single-pass answer.

Equally important is Google’s embrace of the Model Context Protocol (MCP). This allows Gemini models to call external tools (like weather services or databases) without custom integration code, streamlining workflows.
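Attaching a remote MCP server to a request might look like the sketch below. The field names (`tools`, `type`, `url`) and the weather-server URL are hypothetical, chosen only to show the shape of the configuration:

```python
# Hedged sketch: point a request at remote MCP servers so the model
# can call their tools without custom integration code.
# Field names and URLs are assumptions for illustration.

def with_mcp_tools(body: dict, server_urls: list[str]) -> dict:
    """Return a copy of the request body with MCP servers attached."""
    body = dict(body)  # avoid mutating the caller's dict
    body["tools"] = [{"type": "mcp", "url": u} for u in server_urls]
    return body

request = with_mcp_tools(
    {"model": "gemini-3-pro-preview", "input": "What's the weather in Zurich?"},
    ["https://weather.example.com/mcp"],
)
```

The appeal is that the integration lives in configuration rather than code: swapping a weather service for a database means changing a URL, not rewriting glue logic.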

Google vs. OpenAI: Two Approaches to State Management

While both Google and OpenAI are solving the same problem – context bloat – their approaches differ significantly. OpenAI prioritizes token efficiency through Compaction, compressing conversation history into opaque, encrypted items. This creates a “black box” where the model’s reasoning is hidden.

Google, in contrast, retains full conversation history, allowing for inspection, manipulation, and debugging. The data model is transparent, prioritizing composability over compression.

Supported Models and Pricing

The Interactions API is now available in public beta via Google AI Studio, supporting:

  • Gemini 3.0: Gemini 3 Pro Preview.
  • Gemini 2.5: Flash, Flash-Lite, and Pro.
  • Agents: Deep Research Preview (deep-research-pro-preview-12-2025).

Pricing follows Google’s standard token rates, but the new data retention policies change the economics. The Free Tier offers only 1-day retention, while the Paid Tier extends this to 55 days. This extended retention lowers total costs by maximizing cache hits, as recurring users avoid re-processing massive context windows.

Note: This is a Beta release, so expect breaking changes.

Implications for Teams: Efficiency and Risks

For AI engineers, the Interactions API offers a direct solution to timeout problems through Background Execution. Instead of building custom asynchronous handlers, you can offload complexity to Google. However, this convenience trades control for speed: the Deep Research agent is a “black box” compared to custom LangChain or LangGraph flows.

Senior engineers managing budgets will benefit from Implicit Caching. By leveraging server-side state, you avoid token costs associated with re-uploading context. But integrating MCP means validating the security of remote tools.

Data engineers will appreciate the structured data model, improving overall pipeline integrity. However, the current Deep Research agent returns “wrapped” URLs that may expire, requiring cleaning steps in ETL pipelines.
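One such cleaning step might simply flag redirect-style wrapped URLs for resolution before they expire. The wrapper domain below is a made-up example; the real wrapped URLs would use whatever redirect host the agent emits:

```python
import re

# Hedged sketch of an ETL cleaning step: flag "wrapped" (redirect-style)
# URLs in agent output so they can be resolved or dropped before expiry.
# The wrapper domain is a made-up example.

WRAPPED_URL = re.compile(r"https://redirect\.example\.com/\S+")

def flag_wrapped_urls(text: str) -> list[str]:
    """Return all wrapped URLs found in an agent response."""
    return WRAPPED_URL.findall(text)

sample = (
    "See https://redirect.example.com/abc123 and the stable "
    "source at https://arxiv.org/abs/1234.5678 for details."
)
flagged = flag_wrapped_urls(sample)  # only the wrapped link is flagged
```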

Finally, IT security directors must weigh the trade-offs of centralized state: offloading history storage to Google simplifies client-side handling, but introduces new data residency and retention risks. Google’s retention policies (1 day for Free, 55 days for Paid) are critical to consider here.

In conclusion, Google’s Interactions API is a fundamental shift in how AI agents are built. By prioritizing state management and background execution, it offers significant efficiency gains, but also introduces new considerations for control, transparency, and data security. This marks a clear evolution in the developer stack, moving beyond simple text-in, text-out interactions toward true system-level intelligence.