
Japanese Tacos & How Large Language Models Work

Introduction: Meet the World’s Most Knowledgeable Chef

Imagine walking into a restaurant and meeting a chef who has read every cookbook ever written. Not just skimmed them—truly absorbed a billion cookbooks, understanding not just individual recipes, but the underlying principles of cuisine itself. This chef can answer questions about cooking, create entirely new dishes you’ve never imagined, and explain the “why” behind culinary techniques. Remarkably, this chef has never actually tasted food, yet somehow creates coherent, delicious-sounding dishes.

This is, in essence, what a Large Language Model (LLM) like ChatGPT does with words.

If you’ve ever wondered how AI systems seem to “know” so much, or how they generate thoughtful responses to questions they’ve never encountered before, you’re about to discover that the answer is far more intuitive than you might think. It all comes down to patterns, relationships, and an extraordinary amount of learning. Let’s explore this culinary metaphor and uncover the fascinating world of how LLMs are built and how they work.


The Chef’s Education: Training on a Billion Cookbooks

Before our chef could create anything remarkable, they needed to learn. Lots and lots of learning.

An LLM begins its life the same way: by reading. Specifically, it reads an enormous amount of text data—we’re talking hundreds of billions of words. For context, that’s on the order of millions of books’ worth of text. This text comes from books, websites, articles, and countless other sources. It’s the digital equivalent of our chef reading a billion cookbooks.

But here’s the crucial part: the LLM isn’t just passively reading. During this training phase, the system is actively learning patterns. It notices that certain words tend to appear near other words. It observes that questions often follow a particular structure. It sees that recipes have ingredients, then instructions, then results. It recognises that conversations between people follow predictable patterns—someone asks a question, the other person provides an answer.

Think of our chef reading cookbooks. They don’t just memorise individual recipes. They start noticing patterns: Italian cuisine often pairs certain herbs with tomatoes. Japanese cuisine emphasises subtle flavours and presentation. Mexican cuisine balances heat, acid, and richness. The chef begins to understand not just what to cook, but why certain combinations work.

This is the training phase. The LLM is absorbing patterns about how language works, what topics connect to other topics, and how ideas flow together. It’s learning the grammar of human communication—not just grammatical rules, but the deeper patterns of how we structure thoughts, tell stories, and exchange information.


The Chef’s Mind: Building an Intricate Map of Knowledge

Here’s where it gets truly fascinating. Our chef doesn’t store knowledge by memorising recipes word-for-word. Instead, they’ve built something far more sophisticated: an intricate mental map of cooking.

In this mental map, ingredients are connected to techniques, which are connected to cuisines, which are connected to flavour profiles, which are connected to cultural traditions. When the chef thinks about “salt,” they don’t just recall one recipe—they understand that salt enhances sweetness, that it’s essential in cuisines the world over, that it preserves food, that too much ruins a dish, and that it pairs with countless ingredients. Salt isn’t an isolated fact; it’s a node in an enormous web of relationships.

This is what’s happening inside an LLM, though the mechanism is mathematical rather than conceptual. The LLM has learned to represent language as patterns of numerical values. Words and concepts are encoded as vectors—imagine them as positions in an incredibly high-dimensional space. The relationships between concepts are captured as connections with varying strengths.

For instance, the words “king” and “queen” might be positioned near each other in this space, but not identically. The difference between them largely encodes gender. Similarly, “Paris” and “France” are close together, but the relationship captures the idea of a capital city. “Paris” and “London” are also relatively close, but their relationship reflects that they’re both European capitals. These relationships aren’t explicitly programmed—the LLM learned them from patterns in the training data.
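The “king/queen” relationship can be sketched with toy numbers. The four 3-dimensional vectors below are invented purely for illustration (real embeddings have hundreds or thousands of dimensions, all learned from data), but they show how simple vector arithmetic can capture a relationship like gender:

```python
import numpy as np

# Toy 3-dimensional "embeddings" -- invented for illustration only.
# Real models learn hundreds or thousands of dimensions from data.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.2, 0.1]),
    "man":   np.array([0.1, 0.8, 0.0]),
    "woman": np.array([0.1, 0.2, 0.0]),
}

def cosine(a, b):
    """Similarity between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy: king - man + woman should land near queen.
result = vectors["king"] - vectors["man"] + vectors["woman"]
closest = max(vectors, key=lambda w: cosine(vectors[w], result))
print(closest)  # queen
```

In these made-up coordinates the second dimension plays the role of “gender,” so subtracting “man” and adding “woman” moves “king” exactly onto “queen”—a tidy toy version of the fuzzier geometry real models learn.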

This is the LLM’s knowledge base, and it’s not stored as a database of facts. It’s stored as this intricate web of relationships, patterns, and connections. The LLM doesn’t “know” that Paris is the capital of France the way a database does. Instead, it has learned the statistical patterns of how these concepts relate to each other in human language.

The beauty of this approach is that it captures nuance and context in ways that simple fact memorisation never could. Just as our chef understands that the same ingredient can play different roles depending on the dish, the LLM understands that the same word can mean different things depending on context.


Creating Something New: The Art of Generation

Now comes the magic. A customer walks into the restaurant and says: “I want something I’ve never had before. Something that combines Japanese and Mexican cuisine in a way that’s never been done.”

The chef pauses, closes their eyes, and thinks. They’re not retrieving a recipe from their memory—they’re creating something entirely new. They draw upon their deep understanding of Japanese culinary principles: precision, minimalism, balance, the importance of rice and umami. They also draw upon their knowledge of Mexican traditions: bold flavours, the use of chillies and lime, the importance of texture and temperature contrasts.

Then, the chef begins to create. They decide on a crispy wonton wrapper as a taco shell—combining the crispness of Japanese tempura technique with the handheld format of a taco. They add a filling of miso-marinated short rib with a lime crema. They top it with pickled daikon, cilantro, and a drizzle of jalapeño-infused ponzu sauce. Each element is coherent. Each element makes sense. The dish is entirely new, yet it feels inevitable—like it should have existed all along.

This is exactly how an LLM generates text.

When you ask an LLM a question, it doesn’t retrieve a pre-written answer from memory. Instead, it generates a response word by word. More precisely, it generates token by token (a token is roughly a word or part of a word). Here’s how it works:

The Process Begins: You ask the LLM, “What’s the best way to learn a new language?”

The Prediction Game: The LLM looks at your question and asks itself: “What word should come next?” Based on all the patterns it learned during training, it calculates probabilities for thousands of possible next words. Words like “is,” “involves,” and “depends” have high probabilities because they often follow questions. Words like “purple” or “refrigerator” have near-zero probability because they don’t fit the context.

The LLM doesn’t always choose the single most likely word; picking the top word every time (a strategy called greedy decoding) makes responses repetitive and boring. Instead, it samples from the probability distribution: usually a likely word, occasionally a less likely but still reasonable one. This introduces creativity and variation.
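This sampling step can be sketched in a few lines. The probabilities below are invented for illustration (a real model scores tens of thousands of candidate tokens), but the mechanics—greedy picking versus weighted sampling—are the same:

```python
import random

# Invented next-token probabilities for the first word of a reply to
# "What's the best way to learn a new language?" -- illustration only.
next_token_probs = {
    "The": 0.30,
    "Learning": 0.25,
    "It": 0.20,
    "Immersion": 0.15,
    "Consistency": 0.10,
}

def sample_token(probs, temperature=1.0):
    """Sample one token. Lower temperature sharpens the distribution
    (more predictable); higher temperature flattens it (more variety)."""
    tokens = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(tokens, weights=weights, k=1)[0]

# Greedy decoding: always take the single most likely word.
greedy = max(next_token_probs, key=next_token_probs.get)
print(greedy)  # The

# Sampling: usually a likely word, occasionally a rarer one.
print(sample_token(next_token_probs))
```

The `temperature` knob here mirrors the setting of the same name exposed by many LLM APIs: it trades predictability against variety by reshaping the distribution before sampling.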

Building the Response: The LLM has chosen its first word. Let’s say it’s “The.” Now it looks at the context again: your original question plus the word “The.” It recalculates probabilities for the next word. “best” has a high probability. It gets selected.

This process continues. “The best way…” then “The best way to…” then “The best way to learn…” The LLM is building your response token by token, each token informed by everything that came before it. It’s like our chef building the Japanese taco, deciding each component based on what’s already been chosen.
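The loop above can be sketched with a deliberately tiny “model.” The bigram table below is hand-written for illustration—a real LLM conditions on the entire context and chooses probabilistically among thousands of tokens—but the autoregressive shape of the loop is the same: predict, append, repeat.

```python
# A toy autoregressive loop: each "next token" is predicted from what
# came before. This hand-written bigram table maps each word to its
# successor; a real LLM learns such patterns from data and looks at
# the whole context, not just the previous word.
bigram = {
    "<start>": "The",
    "The": "best",
    "best": "way",
    "way": "to",
    "to": "learn",
    "learn": "is",
    "is": "practice",
    "practice": "<end>",
}

def generate(model, max_tokens=20):
    tokens, current = [], "<start>"
    for _ in range(max_tokens):
        current = model[current]   # "predict" the next token
        if current == "<end>":     # stop token ends the response
            break
        tokens.append(current)     # append it and feed the context forward
    return " ".join(tokens)

print(generate(bigram))  # The best way to learn is practice
```

Even in this trivial sketch, the output is built one token at a time, each choice conditioned on what was already generated—the same loop that produces an LLM’s paragraph-length answers.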

Coherence Emerges: Here’s the remarkable part: even though the LLM is making local decisions (choosing one word at a time), the overall response is coherent. It stays on topic. It follows a logical structure. It builds toward a meaningful conclusion. Why? Because the LLM has learned, from billions of examples, what coherent text looks like. It has internalised the patterns of how ideas connect, how paragraphs flow, and how arguments develop.

This is similar to how our chef, by understanding cooking principles, creates a dish that works as a unified whole—not because they’re following a recipe, but because they understand how flavours and textures should interact.


The Limitations: When the Chef Gets Creative (Too Creative)

Our chef analogy is powerful, but it’s important to acknowledge where it breaks down.

Our chef can actually taste food. They know when a dish works because they’ve experienced it. An LLM, by contrast, has never experienced language in any meaningful sense. It doesn’t understand that “The capital of France is Rome” is wrong. It understands statistical patterns, not truth.

This can lead to a phenomenon called “hallucination.” The LLM might generate something that sounds plausible and coherent—it follows all the patterns of real text—but it’s completely false. Imagine our chef creating a dish that sounds delicious and follows all the right culinary principles, but uses ingredients that don’t actually pair well in reality. The chef created something structurally sound but practically inedible.

Similarly, an LLM might confidently provide a citation to a paper that doesn’t exist, or describe a historical event with completely fabricated details. The response is grammatically perfect and contextually coherent. It just isn’t true.

There’s also a fundamental difference between pattern matching and understanding. Our chef understands why certain flavours work together—they’ve tasted thousands of combinations and developed intuition. An LLM has learned statistical patterns about which words tend to appear together, but it doesn’t have the embodied understanding that comes from experience.

This doesn’t make LLMs useless—far from it. It just means they’re best used as tools that amplify human capabilities, not as replacements for human judgment. An LLM is excellent at brainstorming, explaining concepts, writing drafts, and exploring ideas. It’s less reliable when you need absolute truth or when the stakes of being wrong are high.


Conclusion: The Power of Patterns

So, how do Large Language Models work? They’re like super-skilled chefs who have read a billion cookbooks, learned the deep patterns of how ingredients, techniques, and cuisines relate to each other, and can now create entirely new dishes by drawing on that learned knowledge.

They don’t memorise. They pattern-match. They don’t understand in the human sense. They predict based on statistical relationships. They don’t know truth. They know coherence.

And yet, from these seemingly simple principles emerges something remarkable: a system that can engage in conversation, answer questions, write essays, explain complex concepts, and help us think through problems. The chef may have never tasted food, but they can still create something delicious.

Understanding this chef analogy gives us insight into both the power and the limitations of modern AI. It explains why LLMs are so good at certain tasks (anything involving pattern recognition and coherent generation) and why they struggle with others (anything requiring grounded truth or genuine understanding). It demystifies the “magic” of AI and reveals that underneath is something more prosaic, yet in its own way, more wondrous: the power of learning patterns from vast amounts of data.

The next time you interact with an LLM, remember: you’re not talking to a computer that “knows” things. You’re collaborating with a sophisticated pattern-recognition engine that has learned to generate human-like text by absorbing the patterns of human communication. And that, perhaps, is remarkable enough.
