In a previous article, I proposed the holographic hypothesis: an LLM isn't a database of facts, but an interference field — a landscape of probabilities shaped by billions of texts. But a static landscape is just potential. How does the model actually move through it? How does it choose one specific answer from infinite possibilities?

This is where the Narrative Engine comes in. If the holographic hypothesis describes the structure of an LLM's "mind," the narrative engine hypothesis describes its dynamics. It is the mechanism that drives the model, forcing its probabilistic calculations to follow the coherent pathways of stories. This article critiques modern prompting techniques through this new lens, arguing that we are not programming a machine, but initiating a narrative.

Humanity has spent millennia telling stories, and now, LLMs are trained on those very stories.

What is a narrative? It's a story with a plot — remember Joseph Campbell's hero's journey? While we won't delve into the philosophical depths of how humans create stories to make sense of themselves and their experiences (for those interested, Karl E. Weick is a great resource), we will explore how the concept of narrative can be applied to understand how LLMs work.

First, let me explain why I believe I have the right to use this word in the context of an LLM at all.

On a fundamental level, an LLM is a prediction machine, governed by probabilities and a colossal set of learned weights. To the disappointment of many, it contains no internal "self" or "storytelling center." But the connection to narrative doesn't arise as a programmed function; it emerges as a property of the model itself. An LLM is trained on billions upon billions of pages of human text — from scientific papers to novels and dialogues. Every text in its dataset is permeated with an invisible structure: logic, cause-and-effect relationships, and most importantly, plots.

In learning to predict the next word, the model hasn't just learned words; it has learned the rules of plot construction — or, as we might call it, the principles of textual coherence. In this sense, a narrative is the universal principle for creating coherence that humanity has been using for thousands of years.

As a result of this training, the model's drive to generate the most probable sequence of tokens compels it to follow these learned narrative patterns. It's like iron filings, possessing no will of their own, aligning themselves along the lines of a magnetic field.

This is precisely why prompting is not programming, but narrative initiation. When a prompt author writes something like, "You are a seasoned noir detective investigating a case in rainy Chicago," they aren't giving the model an instruction. They are creating a powerful narrative magnet. This initial set of tokens drastically narrows the boundless field of probabilities, activating the specific learned patterns associated with "noir," "detective," and "Chicago."

Every subsequently generated token becomes a part of this story. To remain coherent, the model will choose words that continue, rather than disrupt, the established plot. Probability and training are the engine, but narrative structures serve as its fuel and compass. The model doesn't understand the story, but its mathematical apparatus has learned to mimic these structures so well that it is forced to act as if it understands. Otherwise, its responses would be statistically incoherent.

Consider a classic prompt: "You are an expert in economics." It's fair to expect the model to activate patterns related to expertise in that field and adopt an appropriate response style. But escalating it to "You are a world-renowned expert in economics" will do nothing to improve the AI's understanding of economic problems. Instead, this prompt intensifies the narrative pressure on the response (amusingly, impacting its form first, as that's the easiest reaction) and, suddenly, provokes the model to hallucinate in its attempt to live up to the role.

Why? The LLM obviously doesn't possess some hidden factual information accessible only to a world-renowned expert but not a regular one. But the prompter has set a narrative, and the model must conform. So, it first complicates its language, because form is the simplest way to meet the demand. Then, it tries to recombine facts in novel ways. And finally, it just makes them up. The narrative has defined the plot, and when the model has to choose between following the narrative and sticking to the facts, the entire world of text it was trained on screams back: the narrative is more important.

Here is an example of a good prompt by @21_copilots:

Example

Generate a Python class template for working with a 'users' database, with the methods add_user, get_user_by_id, update_user, and delete_user. Use the Repository pattern and implement asynchronous operations with asyncio and AIOHTTP. I only need the class code with empty method implementations, no comments.

From the perspective of the narrative hypothesis, this prompt is nearly perfect because it creates an ultra-specific and maximally coherent technical plot. It doesn't just give instructions; it initiates the narrative of "a competent developer writing asynchronous code." Here, every word — from "Python class" to "Repository pattern" and "asyncio" — acts as a powerful magnet, pulling probabilities toward a single point in the latent space.

The strict constraints ("only code," "no comments") function like an editor, cutting out the noise and reinforcing the main plot rather than contradicting it. Most importantly, the prompt minimizes narrative pressure by asking the model to demonstrate its true strength — its knowledge of structure and form — instead of forcing it to invent facts. As a result, the LLM doesn't just generate text; it flawlessly performs one of its most well-rehearsed roles, ensuring a highly accurate and relevant response.
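For reference, here is a rough sketch of the kind of template such a prompt tends to produce. It is illustrative, not a canonical output: the class and parameter names are my own, and comments are added for the reader even though the prompt itself asks for none.

```python
import asyncio  # requested by the prompt; the caller would run these methods in an event loop
import aiohttp


class UserRepository:
    # Repository over the 'users' store, accessed asynchronously via an aiohttp session.
    def __init__(self, session: aiohttp.ClientSession, base_url: str) -> None:
        self._session = session
        self._base_url = base_url

    async def add_user(self, user: dict) -> None:
        ...

    async def get_user_by_id(self, user_id: int) -> dict | None:
        ...

    async def update_user(self, user_id: int, data: dict) -> None:
        ...

    async def delete_user(self, user_id: int) -> None:
        ...
```

Every token in that skeleton reinforces the same plot the prompt set in motion, which is exactly why the model rarely stumbles on requests like this.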

"Great! The model is hallucinating and making mistakes. The prompt probably just isn't detailed enough. I know, I'll create a 50-page JSON file outlining every single rule and instruction for the LLM. That will surely unlock its true computational power."

This approach is classic programmer logic: if something works incorrectly, add a rule stating it shouldn't work that way. But how will the model react from a narrative perspective? It will suddenly realize that what the user really wants is for it to follow the rules! And it will direct all of its limited attention to this insane legal code, causing the actual plot to fragment into disconnected episodes. The user's original goal will fade into the background.

An implicit consequence: if your project requires a massive instruction manual for the AI, it's probably more effective to use a weaker but specialized neural network. The result will be at least as good.

In everyday life, it's sometimes better to just say, "Do good, don't do bad." At least you won't waste your energy trying to turn an LLM into an x86 processor.

Now, imagine you have a long-term project: a novel, a book translation, a complex software system. You've got access to Claude's 200k context window and you're full of confidence. Excellent! After the hundredth iteration, you find yourself cursing both Rosenblatt and Anthropic, wondering if it would have been easier to just do it yourself.

What happened? The massive context windows advertised by developers are impressive, and they are real, but unfortunately they aren't the whole truth. An LLM's focus of attention is quite limited. The model reads its own plot in small chunks, often losing the story's overall coherence. The narrative falls apart, and with it, the entire project.

An LLM cannot yet maintain the plot of a novel. So, write short stories. Write short modules where you clearly define the input and output. You will have to be the one to maintain the overall coherence. This applies to programming as well.

Smart people came up with Chain-of-Thought — essentially, an attempt to break the plot into smaller parts, introducing short pauses where the model can evaluate its progress, re-read its conclusions, and move forward. This method reduces narrative pressure by dividing it into stages, which in turn lowers the probability of hallucinations. But, as always, it's a band-aid that treats the symptoms, not the cause.
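A minimal sketch of the difference, assuming a hypothetical llm() helper standing in for whatever model API you actually call:

```python
def llm(prompt: str) -> str:
    """Hypothetical helper: send the prompt to your model of choice and return its reply."""
    return "<model output>"  # placeholder


question = "A train leaves at 14:10 and arrives at 17:45. How long is the trip?"

# Plain prompt: the whole answer is produced in one narrative leap.
direct = llm(question)

# Chain-of-Thought: the plot is broken into small steps the model can re-read
# before committing to a final answer.
cot = llm(
    question
    + "\nThink step by step: write out each intermediate calculation, "
    "then give the final answer on its own line."
)
```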

This same narrative logic explains why other, seemingly purely technical, prompting techniques work the way they do.

Few-Shot Prompting

Before asking the main question, the model is given several example request-response pairs, in the reasonable hope that it will answer the real question in the same way.

From a narrative perspective, this is not merely providing examples. It's creating a precedent. We aren't instructing the model; we are initiating a genre. The provided examples aren't data; they are fragments of a previous, already successfully completed plot. When the model receives the final question, its attention is drawn not to its entire vast experience, but primarily to these freshly received "memories." It sees a pattern, catches a rhythm and style.

In essence, the prompt is saying: "Look, this is what a successful story looks like in this world. Now, do it like this. Continue the story that has already begun, and you'll be correct." This sets a very rigid but very short narrative, and the model follows it with high probability because it's the simplest path to coherence. Such prompts drastically narrow the field of probabilities, defining not just the theme but the very form of the plot.
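A small sketch of the pattern, again with a hypothetical llm() helper and made-up examples:

```python
def llm(prompt: str) -> str:
    """Hypothetical helper: send the prompt to your model and return its reply."""
    return "<model output>"  # placeholder


# Each example is a fragment of an already completed plot the model is invited to continue.
examples = [
    ("I loved this film, watched it twice!", "positive"),
    ("Forty minutes on hold and nobody answered.", "negative"),
    ("The package arrived on the promised date.", "neutral"),
]

query = "The soup was cold, but the waiter replaced it immediately."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt += f"\nReview: {query}\nSentiment:"

print(llm(prompt))
```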

RAG (Retrieval-Augmented Generation)

If the model keeps forgetting and losing track, let's just give it a cheat sheet. Before passing a query to the model, the system finds relevant text fragments from a knowledge base (like a company's internal documentation) and injects them directly into the prompt.

RAG is about grounding the narrative. Why let the model flutter around in a world of metaphors when we can hand it a sacred text and demand: "Your story must be based only on this." The retrieved documents become the model's temporary but absolute authority. This is its canon. Hallucinating becomes difficult because any attempt to invent a fact would immediately contradict the provided text. The model's internal coherence assessment gains a powerful tool for verification. The LLM won't be doing much creating here — but then again, creators aren't always what businesses are looking for. On the other hand, the model will get an A+ for reading a provided passage and rephrasing it to answer the user's question.
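A toy sketch of the mechanism. The retriever here is deliberately naive word overlap (a real system would use embeddings and a vector index), and llm() is again a hypothetical stand-in:

```python
def llm(prompt: str) -> str:
    """Hypothetical helper: send the prompt to your model and return its reply."""
    return "<model output>"  # placeholder


def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    words = set(query.lower().split())
    return sorted(passages, key=lambda p: -len(words & set(p.lower().split())))[:top_k]


knowledge_base = [
    "Refunds are issued within 14 days of purchase.",
    "Support is available on weekdays from 9:00 to 18:00.",
    "The warranty covers manufacturing defects for two years.",
]

question = "How many days do I have to request a refund?"
canon = retrieve(question, knowledge_base)

# The retrieved passages become the model's temporary, absolute canon.
prompt = (
    "Answer using ONLY the passages below. If they do not contain the answer, say so.\n\n"
    + "\n".join(f"- {p}" for p in canon)
    + f"\n\nQuestion: {question}"
)
print(llm(prompt))
```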

ReAct (Reason + Act)

The model doesn't just provide an answer; it executes a three-step cycle: Thought, Action (e.g., a Google search), and Observation (the search result). This loop repeats until an answer is found.

ReAct transforms the LLM from a storyteller into a participant. The model doesn't just tell a story; it lives it in real-time, performing actions and facing their consequences.

  • Thought: This is attention made external. The model verbalizes its plan, creating a narrative of its intentions. "I think I need to find out the population of the capital city."

  • Action: The narrative intersects with the world, moving beyond text to interact with an external tool. This is the hero's deed.

  • Observation: This is the encounter with reality. The world responds to the hero's action. "Search result: population 12 million." This response is the confirmation that corrects the subsequent plot.

The problem with this approach is that the story turns into an Icelandic saga — it becomes rigid, linear, and fatalistic. Each step depends on the quality of the previous one, and every failure or error leads to a dead end.
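The loop itself is simple. Below is a minimal sketch under obvious assumptions: llm() and search() are hypothetical stand-ins, and the Action parsing is the crudest possible version of what real ReAct frameworks do:

```python
def llm(transcript: str) -> str:
    """Hypothetical helper: ask the model for its next Thought and Action."""
    return "Thought: I need the capital's population.\nAction: search[population of the capital]"


def search(query: str) -> str:
    """Hypothetical tool: a real setup would call a search or database API here."""
    return "population 12 million"


transcript = "Question: What is the population of the capital?\n"

# Thought -> Action -> Observation: the model narrates a plan, acts through a tool,
# and the tool's answer is written back into the plot it must continue.
for _ in range(3):  # cap the steps so a failed plotline cannot loop forever
    step = llm(transcript)
    transcript += step + "\n"
    if "Final Answer:" in step:
        break
    if "Action: search[" in step:
        query = step.split("Action: search[", 1)[1].split("]", 1)[0]
        transcript += f"Observation: {search(query)}\n"

print(transcript)
```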

Self-Consistency

We make the model answer the same question multiple times (with a high temperature to ensure varied responses) and then choose the most frequent answer from the results.

This is an exploration of a multiverse of narratives. The model attempts to tell all possible stories that could begin with our question. Each individual answer is one possible plotline. With the temperature raised, the model offers a wider spectrum of responses, from the wild to the mundane. Then everything comes back to earth: the most common answer is selected. This is the most stable, foundational plot, the narrative toward which most paths converge.

It's not guaranteed to be correct, of course. But that's not the main issue. What's more troublesome is that the story turns into a generic template — though, I suppose, pop-culture stories have their place too.
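In code, the whole technique is little more than sampling and a majority vote. A sketch, with llm() once more a hypothetical helper that samples one answer per call:

```python
from collections import Counter


def llm(prompt: str, temperature: float = 1.0) -> str:
    """Hypothetical helper: sample one answer from your model at the given temperature."""
    return "24"  # placeholder


question = "A book costs 12 and a pen costs 3. How much are one book and four pens?"

# Sample several independent plotlines at a high temperature...
answers = [llm(question, temperature=0.9) for _ in range(5)]

# ...then keep the plot most of them converge on.
final_answer, votes = Counter(answers).most_common(1)[0]
print(final_answer, f"({votes} of {len(answers)} samples agree)")
```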

Epilogue

Perhaps we can stop here, but here's an amusing fact: although LLMs are based on mathematics, they perceive math itself as a strange and complex text and work with it accordingly. If you specify 0.3 in a prompt, the LLM won't understand it, no matter what you call it — a metric, a voltage, a satisfaction score. For the model, within the constraints of its tokenizer, the closest possible responses will be 0.2 and 0.4.

But if you say "below average," the LLM will easily integrate that into its worldview, because the model lives in a world of narratives, and that's just one more. This is why, more often than not, qualitative metrics will work better.
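A concrete contrast, with both prompt fragments being purely illustrative:

```python
# Numeric threshold: for the model, "0.3" is just a token sitting between "0.2" and "0.4".
numeric_prompt = "The customer's satisfaction score is 0.3. Write a follow-up email."

# Qualitative framing: "well below average" slots straight into the model's world of narratives.
qualitative_prompt = "The customer's satisfaction is well below average. Write a follow-up email."
```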

So, write stories, not prompts. (Or is that too poetic?) Fine: write good prompts. Don't write bad ones. You know what I mean. The most important thing is to find the balance between narrative pressure, facts, and your ultimate goal.