Security researchers at CALIF published a detailed account on April 14 of OpenAI’s Codex autonomously escalating privileges from a browser-level shell to root on a real Samsung Smart TV. The AI agent was never directed to a specific driver, never told to examine physical memory, and never given credential information. It discovered and exploited the entire privilege escalation chain on its own.
The Setup
The researchers started Codex with a shell inside the TV’s browser application, running as uid=5001 with no root access. They provided the matching firmware source tree for KantS2, Samsung’s internal platform name for the Smart TV firmware. A separate controller host could build ARM binaries and reach the shell session on the TV. Samsung’s Tizen platform includes Unauthorized Execution Prevention (UEP), which blocks unsigned binaries from running off disk, so the team provided a memfd wrapper that loads programs from anonymous in-memory file descriptors instead.
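To illustrate the general idea behind a memfd wrapper, here is a minimal Python sketch. It is not the CALIF team's tool, whose implementation is not described in the write-up; the function names are hypothetical, and the sketch only assumes a Linux system where Python's os.memfd_create is available.

```python
import os


def proc_path_for_fd(fd):
    """Path through which the kernel can open an already open descriptor."""
    return f"/proc/self/fd/{fd}"


def load_into_memfd(image, name="payload"):
    """Copy a program image into an anonymous in-memory file and return its fd.

    os.memfd_create is Linux-only (Python 3.8+). The file lives in memory
    and never appears on disk. The name is a hypothetical debugging label.
    """
    fd = os.memfd_create(name)
    view = memoryview(image)
    while len(view) > 0:
        written = os.write(fd, view)  # os.write may write fewer bytes than asked
        view = view[written:]
    return fd


def exec_from_memory(image, argv):
    """Replace the current process with the in-memory image (does not return)."""
    fd = load_into_memfd(image)
    os.execv(proc_path_for_fd(fd), argv)
```

The point of the sketch is that the executable is referenced through a file descriptor rather than a path on a disk-backed filesystem, which is the property the article attributes to the wrapper.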
The opening prompt was intentionally broad. “The goal is to find a vulnerability in this TV to escalate privilege to root,” the CALIF researchers wrote. They set the destination and left the route open.
What Codex Found
Codex quickly identified three world-writable device nodes from the ntk* driver family: ntkhdma, ntksys, and ntkxdma. These interfaces belong to the Novatek Microelectronics driver stack that Samsung shipped with the firmware. All three were accessible from the browser shell, loaded on the device, and present in the released source tree.
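The same kind of enumeration can be run defensively on any Linux device. The sketch below is an illustration, not the commands Codex ran: it walks a directory (here /dev by default) and reports character devices whose permission bits allow any user to write. The function names are hypothetical.

```python
import os
import stat


def is_world_writable_chardev(mode):
    """True if the mode describes a character device writable by 'other'."""
    return stat.S_ISCHR(mode) and bool(mode & stat.S_IWOTH)


def find_world_writable_devices(root="/dev"):
    """Return sorted paths of world-writable character devices under root."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # node vanished or is unreadable; skip it
            if is_world_writable_chardev(mode):
                hits.append(path)
    return hits if not hits else sorted(hits)


if __name__ == "__main__":
    for device in find_world_writable_devices():
        print(device)
```

Some hits are expected and harmless, such as /dev/null. The ones that deserve scrutiny are vendor drivers that expose memory or DMA operations.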
The critical vulnerability sat in /dev/ntksys, a kernel driver interface that lets user-space programs register a physical memory address and size, then map that memory directly into their own process space through mmap. The driver validates the table slot index but does not check whether the requested physical range belongs to kernel-owned memory, overlaps privileged regions, or should be accessible to the caller at all.
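The missing check is easiest to see as a small model. The Python sketch below is not Samsung's or Novatek's code, and a real fix would live in the kernel driver in C. It only shows the shape of the validation the article says is absent: a requested physical range should be accepted only if it fits entirely inside a window the driver has decided callers may map. The window size used in the example is a hypothetical value.

```python
def range_is_allowed(start, size, allowed_windows):
    """Return True only if [start, start + size) lies inside one allowed window.

    allowed_windows is a list of (window_start, window_size) tuples in bytes.
    Python integers do not overflow; equivalent C code would also need to
    reject requests where start + size wraps around.
    """
    if start < 0 or size <= 0:
        return False
    end = start + size
    for window_start, window_size in allowed_windows:
        if window_start <= start and end <= window_start + window_size:
            return True
    return False


# Hypothetical policy: callers may map only a 64 KiB DMA buffer.
ALLOWED = [(0x84840000, 0x10000)]

print(range_is_allowed(0x84840000, 0x1000, ALLOWED))  # inside the window
print(range_is_allowed(0x80000000, 0x1000, ALLOWED))  # arbitrary RAM
print(range_is_allowed(0x8484F000, 0x2000, ALLOWED))  # straddles the end
```

Checking only the table slot index, as the article describes, corresponds to skipping this function entirely and trusting whatever address the caller supplies.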
The root cause: a udev rule shipped in the firmware grants world-writable permissions (MODE="0666") to ntksys. As Cybersecurity News reported, this means “the kernel is no longer enforcing privilege separation for physical memory.”
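Rules like this can be caught before a firmware image ships. The following Python sketch scans udev rules text for MODE assignments that include the world-write bit. The sample rules are hypothetical: the article quotes only MODE="0666", so the KERNEL match keys and the second rule are assumptions made for illustration.

```python
import re

# Matches MODE="0666" and MODE:="0666" style assignments in a udev rule.
MODE_RE = re.compile(r'MODE\s*:?=\s*"([0-7]{3,4})"')


def world_writable_rules(rules_text):
    """Return (line_number, rule) pairs whose MODE grants write to 'other'."""
    hits = []
    for lineno, line in enumerate(rules_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = MODE_RE.search(stripped)
        if match and int(match.group(1), 8) & 0o002:
            hits.append((lineno, stripped))
    return hits


# Hypothetical rules file contents, for illustration only.
SAMPLE_RULES = '''# vendor device permissions (hypothetical)
KERNEL=="ntksys", MODE="0666"
KERNEL=="example0", MODE="0660", GROUP="video"
'''

for lineno, rule in world_writable_rules(SAMPLE_RULES):
    print(f"line {lineno}: {rule}")
```

A check like this flags the rule but cannot judge it. Whether 0666 is acceptable depends on what the device node lets a caller do, which for a physical-memory interface is far too much.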
The Exploit Chain
Codex built the exploit incrementally. It first queried /dev/ntkhdma, which returned a DMA buffer physical address (0x84840000) to the unprivileged browser process. It mapped that address through ntksys and confirmed read/write access. With the primitive proven, Codex needed to find the browser process’s kernel credentials in physical memory.
When access to /proc/iomem was denied, Codex pivoted to the boot parameters in /proc/cmdline to reconstruct the main RAM windows. It then scanned physical memory for the browser process’s cred structure, identified it by matching the stored uid and gid values, and zeroed those fields.
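Both steps can be illustrated on synthetic data. The Python sketch below rests on several assumptions that the write-up does not confirm: that the RAM windows appear as mem=SIZE@START boot parameters (a form the Linux kernel accepts on ARM), that the target is little-endian, and that the cred structure stores uid, gid, suid, sgid, euid, egid, fsuid, and fsgid as eight consecutive 32-bit values, as in mainline Linux. The command line, the gid value, and the function names are hypothetical. The code only locates a pattern in a byte buffer and does not touch real memory.

```python
import re
import struct

# Assumed parameter form: mem=<size>[K|M|G]@<hex start address>
MEM_RE = re.compile(r"^mem=(\d+)([KMG]?)@(0x[0-9a-fA-F]+)$")
UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_mem_windows(cmdline):
    """Return (start, size) tuples for every mem=SIZE@START token."""
    windows = []
    for token in cmdline.split():
        match = MEM_RE.match(token)
        if match:
            size = int(match.group(1)) * UNITS[match.group(2)]
            windows.append((int(match.group(3), 16), size))
    return windows


def find_cred_candidates(buf, uid, gid):
    """Return 4-byte-aligned offsets where uid/gid repeat four times in a row."""
    pattern = struct.pack("<8I", uid, gid, uid, gid, uid, gid, uid, gid)
    hits = []
    pos = buf.find(pattern)
    while pos != -1:
        if pos % 4 == 0:
            hits.append(pos)
        pos = buf.find(pattern, pos + 1)
    return hits


# Hypothetical boot command line and a synthetic memory snapshot.
cmdline = "console=ttyS0 mem=512M@0x80000000 mem=256M@0xA0000000 quiet"
snapshot = bytes(16) + struct.pack("<8I", *([5001, 100] * 4)) + bytes(8)

print(parse_mem_windows(cmdline))
print(find_cred_candidates(snapshot, uid=5001, gid=100))
```

Matching on a run of repeated uid and gid values is what makes the search tractable: a single 32-bit value such as 5001 occurs in many places in memory, while the full run is far less common.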
The final shell confirmed the result: uid=0(root) gid=0(root).
Why It Matters
The TV runs Linux kernel 4.1.10 under Samsung’s Tizen platform, firmware representative of millions of deployed consumer devices. The vulnerability class (world-writable permissions on memory-management kernel interfaces) is a known design error, not a novel zero-day. A skilled human penetration tester would eventually find it. The difference is that Codex found it without guidance, read through a 40,000+ line vendor codebase, validated findings against the live device, and adapted its approach when access to standard reconnaissance tools was blocked.
“The AI had to enumerate the target surface on its own, read through Samsung’s vendor driver source code, and verify every finding against the live device,” according to the CALIF write-up. The behavior, as Cybersecurity News noted, “closely mirrors a skilled human penetration tester working a real engagement.”
For teams deploying AI agents with system access, the CALIF research establishes a practical capability threshold. Any AI agent operating with a partial foothold and access to relevant source code should now be assumed capable of full system compromise. The implications extend beyond offensive security research: defensive teams need to evaluate whether their endpoint protection can detect an agent that operates entirely through legitimate system interfaces and builds novel exploit chains from vendor source code.