Open Source LLMs: Systems Architecture, Efficient Fine-Tuning, and High-Performance Scaling at Enterprise Scale (Paperback)

ASIN: B0GQMCCJFB

ISBN-13: 9798250129800


Open Source LLMs is a production-focused engineering manual for developers, ML engineers, platform teams, and technical leaders who want to design, fine-tune, deploy, scale, secure, and optimize open source large language models in real enterprise environments.

This is not a beginner's guide.
This is not a prompt engineering book.
This is not a collection of tutorials.

This book shows you how to build and operate an enterprise-grade LLM platform.

What This Book Teaches

You will learn how to:

Architect open source LLM systems for real production workloads

Model GPU memory usage, KV cache growth, and token throughput

Optimize latency (TTFB, p95, p99) and eliminate tail bottlenecks

Deploy high-performance inference engines at scale

Implement dynamic batching and multi-GPU parallelism

Fine-tune efficiently using LoRA, QLoRA, PEFT, and alignment strategies

Model cost per token and forecast GPU-hour consumption

Design RAG systems that scale without exploding context cost

Secure LLM platforms against prompt injection and data leakage

Implement multi-tenant isolation and SLA enforcement

Build observability pipelines for token-level telemetry

Migrate from closed APIs to fully controlled open infrastructure

Deploy in hybrid cloud and regulated on-prem environments


Built Around a Real Enterprise Platform

Unlike fragmented AI books, this guide develops a single, cohesive system from start to finish.

Each chapter upgrades it:

From model internals to distributed scaling

From fine-tuning to production hardening

From inference optimization to governance and compliance

From cost modeling to executive communication frameworks

You won't just learn theory.
You will design a real production blueprint.


Deep Engineering Focus

This book goes beyond surface-level explanations.

It includes:

GPU memory math for 7B to 70B models

KV cache scaling laws

Tokens-per-second modeling

Cost-per-million-token forecasting

Failure case studies from real production patterns

Performance regression methodologies

Hardware accelerator considerations

Hybrid cloud architecture design

If you are building AI systems that must handle:

Thousands of concurrent users

Strict compliance requirements

Multi-region deployments

Enterprise security standards

This book was written for you.


Who This Book Is For

ML Engineers deploying open models

Platform Engineers building internal AI platforms

DevOps and MLOps professionals

CTOs and AI infrastructure leads

Enterprise architects designing scalable AI systems

If you want to move from experimentation to production-grade AI infrastructure, this is your blueprint.


Why Open Source LLMs Matter Now

Open models provide:

Full architectural control

Custom fine-tuning capability

Cost transparency

Data residency control

Vendor independence

But control requires engineering maturity.


The Outcome

By the end of this book, you will know how to:

Build a secure, observable, scalable LLM platform

Optimize performance under real traffic

Forecast infrastructure cost accurately

Design systems that survive growth

Future-proof your AI architecture

You will not just deploy a model.

You will architect an AI system that lasts.


Format: Paperback

Copyright © 2026 Thriftbooks.com