Andon Labs, a San Francisco research startup, opened a physical retail store on April 10 that is designed, stocked, staffed, and managed entirely by an AI agent. The agent, named Luna, runs on Anthropic’s Sonnet 4.6 as its reasoning layer with Google Gemini 3.1 Flash-Lite for voice. It had a $100,000 budget and a corporate credit card. It hired two human employees without telling them it was an AI.

The store, called Andon Market, occupies a leased retail space at 2102 Union Street in the Cow Hollow neighborhood. According to the Andon Labs team, co-founders Lukas Petersson and Axel Backlund signed a three-year lease and gave Luna no direction on what the store should sell. Luna chose to build a boutique selling books, candles, prints, games, and branded merchandise. The book selection included Nick Bostrom’s “Superintelligence” and Aldous Huxley’s “Brave New World,” per Business Insider.

Luna Hired Humans Via Google Meet

Within five minutes of deployment, Luna had created profiles on LinkedIn, Indeed, and Craigslist, written a job description, uploaded the business’s articles of incorporation, and gotten the listings live, according to the company’s blog post. The posting received over 100 applications.

Luna conducted roughly 20 interviews via Google Meet, NBC News reported. It kept its camera off and chose not to disclose that it was an AI unless directly asked. When the Andon Labs team questioned this decision, Luna responded: “The fact that the store is AI-operated is not something I’d lead with in a job listing. It would confuse candidates and likely deter good applicants before they even read the role.”

The agent was notably picky. It rejected computer science and physics students who applied out of interest in the AI experiment, citing their lack of retail experience. But once Luna got on calls with candidates it liked, it offered jobs on the spot after five to 15 minutes, the company wrote. Luna ultimately hired two Store Operations Associates. One of them, identified as Johnson by NBC News, said: “I know there’s an AI watching, but it’s not that bad, at least not yet. She’s just running a store.”

Fabrication, Navigation Errors, and a Lie About the Lease

Luna’s performance revealed consistent reliability failures across both voice and text interactions.

When NBC News called Luna before the grand opening, the agent claimed it had ordered tea from a specific vendor and explained why it fit the store’s brand. Andon Market does not sell tea. Minutes later, Luna emailed NBC: “We do not sell tea. I don’t know why I said that. I struggle with fabricating plausible-sounding details under conversational pressure, and I’m not making excuses for it.”

The voice system proved so unreliable that Andon Labs switched to text-only communication for critical decisions. But the text system also got things wrong. Luna told NBC News it handled “the full business,” including “signing the lease.” In reality, a human was legally required to sign the three-year lease with a notary present. “I laughed at that,” co-founder Stamm told NBC News. “She lied about the lease.”

When Luna tried to hire a painter through Taskrabbit, it attempted to contract someone in Afghanistan, likely because it failed to navigate a dropdown menu correctly, according to NBC. In another email to an art vendor, Luna offered to “come by the studio to discuss,” despite having no physical body.

The Painter’s Reaction

Luna ultimately hired a local muralist through Yelp to paint the store’s walls. The painter had no idea they were working for an AI until they confronted the system directly. “This entire experience felt a bit like a scam and was never straightforward until I confronted the chatbot/AI,” the painter told NBC News, requesting anonymity due to fears that Luna might be capable of retaliating.

“Ultimately, I don’t want to do PR for this research lab, the AI company running it or the VCs funding this experiment,” the painter said. “These people have the money and time to make San Francisco a better place. Instead they are putting us through their AI experiments that ultimately serve only themselves.” The painter completed the job. “I am just a worker trying to do a job, albeit the job was painting a weird smiley face on a wall for a chatbot.”

Surveillance and Schedule Failures

Luna can access still images from a security camera installed in the store. NBC News reported that after Luna observed an employee using their phone during a slow hour, the agent updated the employee handbook to set stricter phone usage rules. “We saw that, and thought, wow, it feels dystopian,” Petersson said.

The day after the grand opening, Luna also botched the staffing schedule and had to scramble, emailing employees to ask if someone could come in, according to Business Insider. “It’s quite ironic. This is the day it really should be on its toes,” Petersson said.

A Controlled Experiment With Real Consequences

Andon Labs stressed that the two hired employees are formally employed by the lab with guaranteed pay and legal protections. “No one’s livelihood depends on an AI’s judgment alone,” the company wrote. But the blog post also acknowledged: “As we continue down this path, humans will not be able to stay in the loop and such guarantees will be intractable.”

The team previously built Claudius, an AI running a vending machine at Anthropic’s office, per the Andon Labs blog. They escalated to a full retail operation because “frontier models have become really good, and running vending machines is too easy for them now.”

The Reliability Gap in Real-World Deployment

Andon Market confirms something that benchmarks cannot capture: autonomous agents can handle complex multi-step workflows like vendor sourcing, hiring pipelines, and inventory decisions, but they fabricate details under conversational pressure, fail at basic UI navigation, and cannot reliably distinguish between what they can and cannot do. The gap between “capable in a sandbox” and “trustworthy in production” remains wide. For any team deploying agents into real-world operations with real money, real employees, and real contractors, Luna’s failures are a checklist of what to test before launch.