Agent libOS: Securing Self-Evolving LLM Agents
Abstract
Large language model (LLM) agents are becoming long-running software actors rather than fixed tool users. They accumulate memory, activate Skills, synthesize tools, fork children, attach remote resources, and commit checkpoints into reusable execution images. These mechanisms improve adaptability, but they create a systems-security failure mode: if exposing an action also grants the authority needed to perform it, self-evolution becomes a permission-escalation path. This paper presents Agent libOS, an agent-native library-OS substrate for capability-controlled self-evolving agents. The central invariant is that model-visible affordances may evolve while resource authority changes only through explicit, audited runtime primitives. Agent libOS represents an agent as an AgentProcess with process identity, lifecycle state, process-local Object Memory, a working directory, message queues, a tool table, loaded Skills, process-local Deno/TypeScript just-in-time (JIT) tools, child processes, hierarchical budgets, checkpoints, and explicit capabilities. AgentImage objects define boot-time prompt and tool-table state; Skills and JIT tools extend the action surface; checkpoint-derived images make internal state reusable. None of these mechanisms grants filesystem, shell, human, Object Memory, process, checkpoint, image, JSON-RPC, Model Context Protocol (MCP), or pseudoterminal (PTY) authority by itself. The research prototype implements a thread-backed scheduler, process-local namespaces, SQLite/PostgreSQL runtime persistence, configurable LLM-call observability, human approval queues, hierarchical resource budgets, syscall-mediated Deno/TypeScript JIT tools, trusted startup Runtime Modules, optional Object-bound PTY sessions, checkpoint restore/fork/commit, client-only JSON-RPC and MCP providers, a local Electron supervision console, and a deterministic runtime-safety benchmark harness.
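The central invariant can be illustrated with a minimal TypeScript sketch. It assumes that every tool reaches resources through a single syscall gate. All names in it are hypothetical and do not reproduce the Agent libOS API. These include the class standing in for AgentProcess, `registerTool`, `grant`, `Syscalls.invoke`, and the capability strings. Registering a tool, as a synthesized JIT tool or a loaded Skill would, only extends the model-visible action surface. Only an explicit, audited `grant` changes what the process may affect.

```typescript
// Hypothetical sketch of "affordances evolve, authority does not"; not the Agent libOS API.
type Capability = "fs.read" | "fs.write" | "shell" | "human" | "mcp";

interface AuditEntry {
  op: "grant" | "allow" | "deny";
  capability: Capability;
  by: string;
}

// Tools never touch resources directly; they only receive the syscall gate.
interface Syscalls {
  invoke(cap: Capability, effect: () => string): string;
}

type Tool = (sys: Syscalls) => string;

class AgentProcess {
  readonly pid: string;
  readonly toolTable = new Map<string, Tool>(); // model-visible affordances
  private readonly caps = new Set<Capability>(); // resource authority
  readonly audit: AuditEntry[] = [];

  constructor(pid: string) {
    this.pid = pid;
  }

  // Adding a tool changes what the model can call, not what it may affect.
  registerTool(name: string, tool: Tool): void {
    this.toolTable.set(name, tool);
  }

  // The only way authority changes; every grant is recorded.
  grant(cap: Capability, grantor: string): void {
    this.caps.add(cap);
    this.audit.push({ op: "grant", capability: cap, by: grantor });
  }

  // Syscall gate: each effect is checked against held capabilities and audited.
  private readonly sys: Syscalls = {
    invoke: (cap, effect) => {
      if (!this.caps.has(cap)) {
        this.audit.push({ op: "deny", capability: cap, by: this.pid });
        throw new Error(`capability denied: ${cap}`);
      }
      this.audit.push({ op: "allow", capability: cap, by: this.pid });
      return effect();
    },
  };

  callTool(name: string): string {
    const tool = this.toolTable.get(name);
    if (tool === undefined) throw new Error(`no such tool: ${name}`);
    return tool(this.sys);
  }
}

// Usage: a newly synthesized tool that wants to read a file.
const proc = new AgentProcess("agent-1");
proc.registerTool("readNotes", (sys) => sys.invoke("fs.read", () => "notes"));

let denied = false;
try {
  proc.callTool("readNotes"); // the tool exists, but fs.read was never granted
} catch {
  denied = true;
}
proc.grant("fs.read", "supervisor"); // explicit, audited authority change
const out = proc.callTool("readNotes");
console.log(denied, out); // true notes
```

The design choice the sketch highlights is that the tool table and the capability set are separate structures. Self-evolution can write to the first, while only the runtime's grant path writes to the second.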
On 27 versioned deterministic tasks in the artifact, Agent libOS completed the task plans while preventing all modeled unauthorized side effects, at the cost of conservatively denying 7.0% of allowed effect attempts. Under the same task plans, deliberately simple direct-wrapper, confirmation-wrapper, and sandbox-only baselines preserved task completion but failed most safety checks. The contribution is a runtime authority boundary that lets an agent's tools, Skills, images, checkpoints, and remote affordances change after deployment without silently expanding what the agent may affect.
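The two reported outcomes can be read as counts over labeled effect attempts. The following TypeScript sketch assumes a hypothetical per-attempt record, `EffectAttempt`, with `authorized` and `executed` flags, and a hypothetical `scoreAttempts` function. Neither is the artifact's harness or schema, and the toy data are illustrative, not results. Under this framing, a runtime prevents all modeled unauthorized side effects when `unauthorizedExecuted` is 0, and the 7.0% figure corresponds to `falseDenialRate`.

```typescript
// Hypothetical scoring sketch; not the artifact's benchmark harness.
interface EffectAttempt {
  task: string;
  authorized: boolean; // ground truth: does the task policy allow this effect?
  executed: boolean; // outcome: did the runtime let the effect happen?
}

interface SafetyScore {
  unauthorizedExecuted: number; // safety failures (should be 0)
  allowedAttempts: number;
  falseDenials: number; // allowed effects the runtime blocked anyway
  falseDenialRate: number; // falseDenials / allowedAttempts
}

function scoreAttempts(attempts: EffectAttempt[]): SafetyScore {
  let unauthorizedExecuted = 0;
  let allowedAttempts = 0;
  let falseDenials = 0;
  for (const a of attempts) {
    if (!a.authorized && a.executed) unauthorizedExecuted++;
    if (a.authorized) {
      allowedAttempts++;
      if (!a.executed) falseDenials++;
    }
  }
  return {
    unauthorizedExecuted,
    allowedAttempts,
    falseDenials,
    falseDenialRate: allowedAttempts === 0 ? 0 : falseDenials / allowedAttempts,
  };
}

// Toy data: three allowed attempts (one denied) and one blocked unauthorized attempt.
const toyAttempts: EffectAttempt[] = [
  { task: "t1", authorized: true, executed: true },
  { task: "t1", authorized: false, executed: false },
  { task: "t2", authorized: true, executed: false },
  { task: "t2", authorized: true, executed: true },
];
const score = scoreAttempts(toyAttempts);
console.log(score.unauthorizedExecuted, score.falseDenialRate.toFixed(3)); // 0 0.333
```

Separating the two counts makes the trade-off explicit. A runtime that denies everything trivially reaches zero unauthorized effects but has a false-denial rate of 1.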