The Chatbot Trap: Why We Stopped Letting AI Run the Game

Remember the first time you saw a video of an RPG NPC hooked up to ChatGPT? It was intoxicating. You could type anything, and the blacksmith would respond. You could ask him about his childhood, his favorite color, or his philosophical views on the monarchy.

But then the illusion broke. You asked him for a million gold pieces, and because chat-tuned LLMs tend to “yes-and” the user, he said, “Of course, traveler! Here you go.” Except… your inventory didn’t change. The game engine had no idea what the LLM had just promised. And five minutes later, the blacksmith forgot your name.

When we set out to build NarrativeSim, we looked at the current landscape of “AI in Games” and realized the industry is falling into a trap. Developers are treating Large Language Models as state machines. They are asking the AI to be the Game Director: the component that tracks and enforces the state of the world.

The Problem with LLM Game Directors

While outright hallucinations are getting rarer with better models, the fundamental drawbacks of LLMs remain:

They are stateless: They have context windows, not true memory. Anything that falls out of the window, such as your name, is gone.
They are people-pleasers: They struggle to hold firm boundaries against a player’s inputs.
They don’t know the numbers: They have no direct access to the player’s true X/Y coordinates, their actual gold count, or the pathfinding navmesh. They see only the text they are given.

The NarrativeSim Philosophy

We decided to flip the architecture on its head. In our engine, the LLM makes zero behavioral decisions. Instead, we split our system into two distinct halves:

The LLM as a Text Renderer: The game’s Entity Component System (ECS) and our deterministic Utility AI (NAAS) calculate the rigid, mathematical truth of the world. The LLM is simply handed that truth and asked to “render” it into natural language.
Semantic Meaning Turned Mechanical: When a dialogue scene concludes, the LLM doesn’t just output text. It outputs strict, schema-validated JSON commands that hook directly back into the game engine’s physics, economy, and scheduling.
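
To make the split concrete, here is a minimal Python sketch of both halves. It is not NarrativeSim's actual code: the command names (transfer_gold, schedule_meeting), the World class, and the three helper functions are hypothetical, invented purely for illustration, and the schema check is hand-rolled rather than taken from any particular validation library. The idea it tries to show is that the simulation hands the LLM a fixed set of facts to phrase, and anything the LLM sends back is parsed, validated, and then still subject to the engine's own rules.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Hypothetical command schema: command name -> {argument name: required type}.
COMMAND_SCHEMA = {
    "transfer_gold": {"from_id": str, "to_id": str, "amount": int},
    "schedule_meeting": {"npc_id": str, "hour": int},
}


@dataclass
class World:
    """Stand-in for the deterministic game state (assumed shape)."""
    gold: dict[str, int] = field(default_factory=dict)
    meetings: list[tuple[str, int]] = field(default_factory=list)


def build_render_prompt(facts: dict) -> str:
    """Half one: the simulation decides, the LLM only phrases the result."""
    lines = [f"- {key}: {value}" for key, value in sorted(facts.items())]
    return (
        "Write one line of in-character dialogue that states these facts.\n"
        "Do not add, change, or promise anything beyond them.\n"
        + "\n".join(lines)
    )


def parse_command(raw: str) -> dict:
    """Half two: reject any LLM output that does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("command must be a JSON object")
    name = data.get("command")
    if not isinstance(name, str) or name not in COMMAND_SCHEMA:
        raise ValueError(f"unknown command: {name!r}")
    spec = COMMAND_SCHEMA[name]
    args = data.get("args")
    if not isinstance(args, dict) or set(args) != set(spec):
        raise ValueError(f"{name} expects exactly these args: {sorted(spec)}")
    for key, expected in spec.items():
        value = args[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"{name}.{key} must be {expected.__name__}")
    return {"command": name, "args": args}


def apply_command(world: World, cmd: dict) -> bool:
    """The engine has the final say; a valid command can still be refused."""
    args = cmd["args"]
    if cmd["command"] == "transfer_gold":
        amount = args["amount"]
        if amount <= 0 or world.gold.get(args["from_id"], 0) < amount:
            return False  # the giver cannot afford it, so nothing changes
        world.gold[args["from_id"]] -= amount
        world.gold[args["to_id"]] = world.gold.get(args["to_id"], 0) + amount
        return True
    if cmd["command"] == "schedule_meeting":
        if not 0 <= args["hour"] <= 23:
            return False
        world.meetings.append((args["npc_id"], args["hour"]))
        return True
    return False
```

In this sketch, a blacksmith holding 50 gold who is asked for a million produces a command that parses cleanly but is refused by apply_command, so the inventory and the dialogue can never disagree. An unknown command such as "give_everything" never reaches the engine at all, because parse_command raises ValueError first.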

Over the next two posts, we are going to dive into the exact data structures and pipelines that make this work. We’ll look at how we generate infinite dialogue without ever breaking the simulation, and how we turn a simple menu-click into a physical, systemic consequence.