Warp 2.0: High-Performance Infrastructure for Low-Latency AI and Distributed Inference: Distributed serving, quantization, hardware orchestration, edg

By Marcus Endo

No Customer Reviews

This book is a technical manual for building ultra-low-latency inference platforms for LLMs and multimodal models. It explains model sharding, tensor/model parallelism, pipeline parallelism, batching strategies, quantization techniques, kernel-level optimizations, and GPU/TPU orchestration. It also covers on-device and edge patterns, caching strategies, network optimizations, and benchmarking methodologies for selecting cost-effective hardware-software stacks.

Who this book is for

Infrastructure and performance engineers optimizing inference pipelines.
Platform architects designing low-latency, high-throughput model serving.
CTOs evaluating hardware and deployment trade-offs for AI services.
Engineers deploying on-edge or hybrid cloud-edge inference topologies.

What the reader will learn

Model sharding and parallelism techniques for throughput and latency.
Batching and pipelining heuristics for real-world traffic patterns.
Quantization, pruning, and distillation tactics to reduce compute.
Autoscaling, scheduler design, and GPU orchestration best practices.
Edge inference patterns and hybrid on-prem/cloud deployment strategies.
How to benchmark, profile, and tune end-to-end latency and costs.

Format: Paperback

Language: English

ISBN: B0FQVLNM51

ISBN13: 9798264957444

Release Date: September 2025

Publisher: Independently Published

Length: 324 Pages

Weight: 1.24 lbs.

Dimensions: 0.7" x 7.0" x 10.0"

Related Subjects

Computers
Computers & Technology

