Multimodal AI Systems: Architectures, Training, and Applications (Transformer Principles Series)

By Wei Sun


The Transformer Principles Series is a three-volume graduate-level treatise that builds a complete mathematical and engineering understanding of modern AI systems, from the foundational attention mechanism to large language models and multimodal architectures.

Volume III - Multimodal AI Systems: Architectures, Training, and Applications extends the Transformer paradigm beyond text into vision, audio, and video. It covers modality-specific encoders and tokenizers, cross-modal fusion and contrastive alignment (CLIP, SigLIP), diffusion and flow-matching generative models, vision-language architectures (ViT, LLaVA, Q-Former), text-to-image and text-to-video generation, speech and audio processing, efficient inference for multimodal models, long-context scaling, and reasoning agents that perceive and act across modalities.

Format: Paperback

Language: English

ISBN: B0H6NW8LG1

ISBN13: 9798184326054

Release Date: June 2026

Publisher: Independently published

Length: 478 Pages

Weight: 2.43 lbs.

Dimensions: 11.0" x 1.1" x 8.5"


