Large Language Models (LLMs) have become a central technology of modern AI, yet most explanations are overly technical, vague, or buried under metaphors.
This book offers a clear and rigorous introduction to how LLMs actually work, using only standard school-level mathematics and avoiding misleading simplifications. It answers the questions that many tutorials skip: how neural networks can perform prediction, why pattern-matching scores can be interpreted as probabilities, what embeddings really represent, and what the Query-Key-Value mechanism is truly doing inside a Transformer.
Starting from the early Bengio-style neural language model, the book gradually builds up the essential concepts (feedforward neural networks, feature recognition, logits, softmax, gradient descent) before reaching the Transformer architecture and its attention mechanism.
Written for curious readers without a machine-learning background, this is an accessible but faithful journey into the foundations of today's language models.
If you want to understand LLMs without handwaving, buzzwords, or empty metaphors, this book is for you.