xAI has started rolling out Imagine Agent, a beta feature inside Grok on the web that replaces the standard chat interface with an infinite canvas where an autonomous agent generates images, videos, and short films from natural language descriptions. The feature is gradually opening to Grok Heavy and SuperGrok subscribers who already have Grok Imagine access, according to TestingCatalog.
How It Works
Imagine Agent handles multi-step creative projects autonomously. A user can request a one-minute short film, and the agent drafts a scenario, generates individual scene clips, stitches them into a longer sequence, and produces a companion poster image, according to TestingCatalog. The same logic applies to product photoshoots, composite scenes built from multiple source images, and complete manga sets.
The beta includes preset workflow templates for creating worlds, short films, product stories in the style of user-generated content, and brand identities, according to MetaEra via Phemex. Users describe the desired output in a single sentence, and the agent handles generation, editing, image-to-video conversion, and video effects.
xAI has not issued a formal announcement. User reports and screenshots confirmed the rollout on April 30, according to TestingCatalog.
The Competitive Landscape
The launch follows xAI’s Imagine API release on partner platforms earlier this year and the introduction of a multi-agent architecture inside Grok 4.20. Imagine Agent extends that architecture into a consumer-facing creative product, according to TestingCatalog.
The feature positions Grok against a crowded field. OpenAI’s Images 2.0, Meta’s Vibes platform, and Google’s Stitch and AI Studio tools are all moving toward agentic creative surfaces. They share a trajectory: replacing prompt-by-prompt iteration with agents that execute entire creative projects from a single brief.
From Prompts to Projects
The practical shift is from describing individual outputs to delegating entire campaigns. Instead of iterating on a single image across multiple prompts, a subscriber can describe a product launch and receive a photoshoot, video assets, and brand materials as a single output. The canvas becomes a workspace populated by an autonomous collaborator rather than a tool that responds to one instruction at a time, according to TestingCatalog.
For content teams and solo creators producing marketing assets, the question is no longer which model generates the best single image. It is which platform can produce an entire content package autonomously.