Build production-grade ML compilers with MLIR, from TensorFlow and PyTorch graphs to fast GPU, CPU, and embedded executables
Machine learning teams struggle to turn research models into efficient binaries across diverse hardware. Toolchains are fragmented, passes are opaque, and small changes can break performance or correctness.
This book gives you a clear path. You get a practical workflow that starts with readable IR, enforces graph invariants with strong verifiers, and lowers to portable or vendor-specific code that you can ship with confidence.
- Design solid operators using ODS and traits, add verifiers and builders that keep graphs legal, and attach interfaces that unlock tiling, fusion, and bufferization
- Import TensorFlow with StableHLO and VHLO, use the TFLite and TF bridges, and keep portability with TOSA when you need framework-neutral flows
- Capture PyTorch programs with Torch-MLIR, decompose to arith, tensor, and linalg, and manage distinct training and inference paths without forking pipelines
- Apply shape reasoning with the Shape dialect, handle static and dynamic ranks, and wire in inference that feeds downstream transforms
- Run post-training quantization with the Quant dialect, carry scales and zero points correctly, and build calibration-aware dequant pipelines
- Bufferize tensors with One-Shot Bufferize, control function boundaries, model effects precisely, and validate lifetimes with ownership-based deallocation
- Tune memory with MemRef layout maps, alignment, and packing, and pick layouts that suit accelerators without losing legality
- Generate GPU code with the GPU and NVGPU dialects, target NVVM or ROCDL, and use vector and tensor core paths that map to real intrinsics
- Target SPIR-V for Vulkan environments with capability gating, or generate portable C and C++ for microcontrollers with EmitC
- JIT with ExecutionEngine and JitRunner, or use IREE end to end for compilation and runtime on mobile, desktop, and server
- Drive performance with tiling, fusion, and vectorization in Linalg and Vector, add autotuning hooks, and apply the SparseTensor dialect for structured sparsity
- Profile with remarks, counters, and traces, then lock down stability with lit and FileCheck, mlir-reduce, bytecode, and dialect versioning
- Work through complete case studies: TensorFlow ResNet to CUDA with NVGPU and NVVM, PyTorch Transformer to ROCm with ROCDL, quantized MobileNet to EmitC for Cortex-M, and sparse attention to SPIR-V for Vulkan

This is a code-heavy guide with labeled MLIR, Python, C++, Shell, and TableGen listings; you can copy pipelines and schedules directly into your builds to stand up real projects.
Grab your copy today.