Small Language Models: When Smaller Is Better (Paperback)

ISBN: B0H2GPZQ2N

ISBN13: 9798197568922

Small Language Models: When Smaller Is Better

Bigger is not always better. In production AI systems, bigger is often slower, more expensive, harder to deploy, harder to customize, and harder to control.

Small Language Models: When Smaller Is Better is a practical guide to building useful AI systems when latency, cost, privacy, reliability, and deployment constraints matter as much as raw benchmark scores.

Large language models are extraordinary generalists, but most products do not need the largest possible model for every request. They need the right model for the job. Sometimes that means a compact local model. Sometimes it means a fine-tuned specialist. Sometimes it means retrieval, routing, adapters, quantization, or a hybrid system where a small model handles the common path and a larger model becomes the fallback.
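The hybrid arrangement described above, where a small model handles the common path and a larger model is the fallback, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Cascade` class, the `toy_small` and `toy_large` stand-in models, and the 0.8 confidence threshold are all hypothetical and do not come from the book. A real system would call actual models and derive confidence from something like token log-probabilities or a separate verifier.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Assumption: a "model" is any callable returning (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class Cascade:
    small: Model
    large: Model
    threshold: float = 0.8  # hypothetical cutoff, would be tuned per task

    def answer(self, prompt: str) -> Tuple[str, str]:
        """Return (answer, name of the model that produced it)."""
        text, confidence = self.small(prompt)
        if confidence >= self.threshold:
            return text, "small"  # common path: the small model is trusted
        text, _ = self.large(prompt)  # fallback: escalate to the larger model
        return text, "large"


def toy_small(prompt: str) -> Tuple[str, float]:
    # Hypothetical stand-in: pretends to be confident only on short prompts.
    confidence = 0.9 if len(prompt) < 20 else 0.3
    return f"small:{prompt}", confidence


def toy_large(prompt: str) -> Tuple[str, float]:
    # Hypothetical stand-in for a larger, slower, more expensive model.
    return f"large:{prompt}", 0.99


cascade = Cascade(small=toy_small, large=toy_large)
short_result = cascade.answer("hi")
long_result = cascade.answer("x" * 30)
```

In this sketch the short prompt stays on the small model and the long prompt escalates, so the larger model's cost is only paid when the cheap path is not trusted.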

This book treats small language models as engineering components, not as weaker clones of frontier models. You will learn how to reason about SLMs as classifiers, extractors, summarizers, local assistants, retrieval partners, tool callers, routing stages, draft generators, privacy-preserving workers, and cost-control mechanisms inside real systems.

Inside, you will learn how to:


Decide when a small language model is good enough, and when it is not
Understand tokens, embeddings, attention, context windows, KV cache, logits, sampling, and instruction tuning
Think clearly about scaling laws, data quality, synthetic data, distillation, and the lessons behind Phi-style training recipes
Use compression techniques such as distillation, pruning, quantization, LoRA, QLoRA, and adapter-based fine-tuning
Choose an SLM by task fit, license, hardware target, latency budget, context window, evaluation results, and operational risk
Run models locally with tools and formats such as llama.cpp, GGUF, Ollama, ONNX Runtime GenAI, MLX, vLLM, and related inference stacks
Design retrieval-augmented generation systems that help smaller models answer with better context
Build evaluations that measure task quality, hallucination risk, latency, regressions, and cost-per-success
Use routing, cascades, speculative decoding, tool calling, structured outputs, caching, and AI gateways
Handle safety, privacy, governance, model observability, rollout strategy, and production operation
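One plausible reading of the cost-per-success metric mentioned in the list above is total spend divided by the number of requests that actually met the task's quality bar. The sketch below assumes that definition; the function name and the example figures are hypothetical and are not taken from the book.

```python
import math


def cost_per_success(total_cost: float, successes: int) -> float:
    """Total spend divided by successful requests (assumed definition)."""
    if successes < 0:
        raise ValueError("successes must be non-negative")
    if successes == 0:
        return math.inf  # no successes: every unit of spend was wasted
    return total_cost / successes


# Hypothetical figures for 100 requests sent to each model.
small_cps = cost_per_success(total_cost=0.10, successes=80)
large_cps = cost_per_success(total_cost=1.00, successes=95)
```

With these made-up numbers the small model succeeds less often but is still far cheaper per successful request, which is the kind of trade-off such a metric is meant to expose.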

The book is written for backend engineers, platform engineers, machine learning engineers, product engineers, architects, tech leads, and developers who want to build AI systems that survive real constraints. You do not need to be a research scientist. You need enough technical grounding to ask better questions before sending every request to the biggest model available.

If you are building AI features for mobile, desktop, edge devices, private environments, customer VPCs, low-latency workflows, high-volume products, or specialized domain tasks, this book gives you the mental models and system-design vocabulary to make better trade-offs.

By the end, you will have a practical decision framework for answering the central question: when is a smaller model not just cheaper, but architecturally better?

Format: Paperback

Condition: New

$12.27