---
title: "How Large Language Models Work, in Plain English"
description: "No math, no jargon. A clear explanation of what a large language model is doing when it writes, why it works at all, and where it breaks."
url: https://buildfirstbrain.com/journal/how-large-language-models-work/
canonical: https://buildfirstbrain.com/journal/how-large-language-models-work/
author: "Lawrence Arya"
authorUrl: https://www.linkedin.com/in/vibecoding/
published: 2026-05-23
updated: 2026-05-23
category: "AI & Cognition"
tags: ["large language models", "ai", "how it works", "explainer"]
lang: en
---

# How Large Language Models Work, in Plain English

> **TL;DR** A large language model is a system trained to predict the next chunk of text, over and over, across a huge amount of writing. From that single skill it learns the structure of language well enough to answer questions, translate, and reason. It does not look anything up and it does not have beliefs. Understanding that it predicts text rather than retrieving facts explains both why it is useful and why it makes things up.

People talk about large language models as if they were either magic or a trick. They are neither. The real explanation is simpler than the magic and more interesting than the trick, and you do not need any mathematics to follow it. This is the plain-English version.

It is also background for the bigger argument I make about [how AI is changing human language](/journal/how-ai-is-changing-human-language/).

## One sentence: it predicts the next word

A large language model is a system trained to do one thing: given some text, predict what comes next. That is the entire core skill. Show it "the capital of France is" and it predicts "Paris," not because it looked anything up, but because that continuation is overwhelmingly common in the text it learned from.

Everything else (answering questions, writing code, holding a conversation) is that same next-word prediction, repeated one chunk at a time, with each new chunk fed back in to predict the next.
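If you are comfortable reading a little code, the loop can be sketched in a few lines of Python. This is a toy under loud assumptions: the `NEXT_WORD` table and its probabilities are invented for illustration, and the toy looks only at the last word, whereas a real model computes its odds from all of the text so far and stores no such table.

```python
# Hypothetical hand-written table: for each word, the words that may follow
# and how likely each one is. A real model computes these odds on the fly.
NEXT_WORD = {
    "the": {"capital": 0.6, "city": 0.4},
    "capital": {"of": 1.0},
    "of": {"France": 0.7, "Spain": 0.3},
    "France": {"is": 1.0},
    "is": {"Paris": 0.9, "large": 0.1},
}


def predict_next(word):
    """Return the most likely next word, or None if the table has no entry."""
    options = NEXT_WORD.get(word)
    if not options:
        return None
    return max(options, key=options.get)


def generate(prompt, max_words=10):
    """Extend the prompt one word at a time, feeding each new word back in."""
    words = prompt.split()
    for _ in range(max_words):
        next_word = predict_next(words[-1])
        if next_word is None:
            break  # nothing plausible follows, so stop writing
        words.append(next_word)
    return " ".join(words)


print(generate("the"))  # prints: the capital of France is Paris
```

Nothing in `generate` knows what a capital city is. It only keeps asking "what usually comes next?" and appending the answer.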

## How it learns

Training works like this: the model is shown enormous amounts of human writing with the next word hidden, it guesses that word, and it is corrected. Repeat that billions of times and the model is forced to build an internal representation of how language works: grammar, facts, styles, the shape of an argument, the format of an answer.
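The guess-and-correct loop can be sketched as a toy in Python. To be plain about the assumptions: the three-sentence `CORPUS` is made up, and the "correction" here is just keeping a tally of which word followed which. A real model instead nudges a very large set of internal numbers after each wrong guess, so treat this as an illustration of the loop, not of how the learning itself is implemented.

```python
from collections import Counter, defaultdict

# Made-up training text, standing in for "enormous amounts of human writing".
CORPUS = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat chased the mouse",
]


def guess_next(counts, word):
    """Guess the word that has most often followed `word` so far."""
    if word not in counts:
        return None  # never seen this word, so there is nothing to guess from
    return counts[word].most_common(1)[0][0]


def train_one_pass(counts, sentences):
    """Read the text once: guess each hidden next word, then learn the real one."""
    correct = total = 0
    for sentence in sentences:
        words = sentence.split()
        for current, actual in zip(words, words[1:]):
            guess = guess_next(counts, current)  # the next word is still hidden
            correct += int(guess == actual)
            total += 1
            counts[current][actual] += 1  # the correction: record what really came next
    return correct, total


counts = defaultdict(Counter)
first = train_one_pass(counts, CORPUS)
second = train_one_pass(counts, CORPUS)
print(f"first pass:  {first[0]} of {first[1]} guesses right")
print(f"second pass: {second[0]} of {second[1]} guesses right")
```

The second pass scores higher than the first because the corrections stuck. It still misses some guesses, since "the" is followed by several different words in the text, and no amount of repetition makes that choice certain.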

Nobody programs in the rules of grammar or a database of facts. They fall out of the single goal of predicting text well. This is the surprising result that the [2017 transformer paper](https://arxiv.org/abs/1706.03762) set in motion: a simple objective, at a large enough scale, produces general-seeming competence.

## Why it is useful

Because so much of human knowledge is written down, a system that has absorbed the structure of that writing can be genuinely helpful:

- It can rephrase, summarise, and translate, because those are transformations of text it has seen done.
- It can answer many questions, because the answers are statistically encoded in how the relevant words relate.
- It can follow instructions, because it has seen countless examples of instructions being followed.

## Why it breaks

The same design explains the failures, which are not bugs so much as the flip side of the method:

1. **It makes things up.** When the most plausible next words are not the true ones, the model produces them anyway, with the same confidence. This is usually called hallucination.
2. **It can be out of date.** A base model knows only what was in its training data, frozen at a point in time, unless it is connected to live tools.
3. **It has no built-in sense of certainty.** Fluent and wrong looks exactly like fluent and right, because only fluency was trained directly.
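The first and third failures can be shown with a deliberately tiny Python sketch. The assumptions are loud ones: `TRAINING_SENTENCES` is invented so that a common mistake outnumbers the truth, and the "model" simply picks the most frequent ending. Real hallucination is subtler than this, but the shape of the problem is the same: the most common continuation wins whether or not it is true.

```python
from collections import Counter

# Invented mini "training data". The wrong answer appears more often than the
# right one, as popular misconceptions often do in real writing.
TRAINING_SENTENCES = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

TRUE_ANSWER = "canberra"


def complete(prompt, sentences):
    """Return the most common ending for `prompt` and its share of the data."""
    endings = Counter(
        s[len(prompt):].strip() for s in sentences if s.startswith(prompt)
    )
    if not endings:
        return None, 0.0  # the prompt never appeared in the training text
    answer, count = endings.most_common(1)[0]
    return answer, count / sum(endings.values())


answer, share = complete("the capital of australia is", TRAINING_SENTENCES)
print(f"model says:  {answer} ({share:.0%} of the training text)")
print(f"true answer: {TRUE_ANSWER}")
```

The toy answers "sydney" as smoothly as it would answer "canberra". The 75% figure measures how common the ending was in the text, not how likely it is to be true, and nothing in the code can tell those two apart.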

The practical rule that falls out of this is the one I keep coming back to: trust a model on the shape of language, verify it on facts about the world. For the deeper version of why that split exists, see [do large language models actually understand language](/journal/do-large-language-models-understand-language/).

## What this means

Once you see that a language model is a next-word predictor that learned the structure of human writing, the mystery dissolves without the usefulness going away. It is not a mind and not a search engine. It is a model of language itself, which is exactly why it belongs in a conversation about where language is going.

That conversation is the subject of my book, [Building Your First Brain](/), which is free for the first 1,000 readers and assumes no technical background.

## Further reading

- ["Attention Is All You Need"](https://arxiv.org/abs/1706.03762), the architecture behind today's models.
- For where this fits in the long arc of communication, [the evolution of language from speech to code](/journal/the-evolution-of-language-speech-to-code/).

---

Source: https://buildfirstbrain.com/journal/how-large-language-models-work/
Author: Lawrence Arya — https://www.linkedin.com/in/vibecoding/
