{"product_id":"quantization-and-fast-inference-vivek-kalyanarangan-9781633433915","title":"Quantization and Fast Inference: A Practitioner's Guide to Efficient AI","description":"\u003cb\u003eGet the eBook free when you register your print book at Manning.\u003c\/b\u003e \u003cp\u003e\u003c\/p\u003eToday's AI models demand a lot of memory, compute, and server horsepower--which quickly translates into cost. This book shows you how to optimize AI models without architectural redesigns or task-specific compression. It reveals practical techniques for quantization, systematically reducing numerical precision to achieve faster inference, lower memory usage, and cheaper deployment--all with minimal accuracy loss. \u003cp\u003e\u003c\/p\u003eFrom quantization fundamentals to runtime packaging, the book gives you a comprehensive overview of the full quantization pipeline. It starts by deriving the quantization mapping from first principles, and then builds your knowledge and skill through techniques for production-tested PTQ and QAT workflows and a fully compressed deployment. You'll learn to apply post-training quantization to production models, run quantization-aware training using fake quantization and straight-through estimators, and handle subtle tradeoffs like activation outliers in LLMs, KV cache pressure, and sub-8-bit formats like NF4 and FP4. \u003cp\u003e\u003c\/p\u003e \u003cb\u003eWhat's inside\u003c\/b\u003e \u003cp\u003e\u003c\/p\u003e - Applying post-training quantization to production models\u003cbr\u003e - Deploying efficiently on CPUs, edge devices, and mobile\u003cbr\u003e - Framework-agnostic techniques and real cross-framework parity testing\u003cbr\u003e - Flowcharts and checklists for efficient decision making \u003cp\u003e\u003c\/p\u003e\u003cb\u003eAbout the reader\u003c\/b\u003e \u003cp\u003e\u003c\/p\u003e For ML engineers and researchers experienced in Python. 
\u003cp\u003e\u003c\/p\u003e \u003cb\u003eAbout the author\u003c\/b\u003e \u003cp\u003e\u003c\/p\u003e \u003cb\u003eVivek Kalyanarangan\u003c\/b\u003e is an AI\/ML architect, researcher, and educator with over twelve years of experience designing and deploying large-scale machine learning systems.\u003cbr\u003e\u003cbr\u003e\u003cb\u003eAuthor:\u003c\/b\u003e Vivek Kalyanarangan\u003cbr\u003e\u003cb\u003eISBN-10:\u003c\/b\u003e 1633433919\u003cbr\u003e\u003cb\u003eISBN-13:\u003c\/b\u003e 9781633433915\u003cbr\u003e\u003cb\u003ePublisher:\u003c\/b\u003e Manning Publications\u003cbr\u003e\u003cb\u003eLanguage:\u003c\/b\u003e English\u003cbr\u003e\u003cb\u003ePublished:\u003c\/b\u003e 12\/29\/2026\u003cbr\u003e\u003cb\u003ePages:\u003c\/b\u003e 350\u003cbr\u003e\u003cb\u003eFormat:\u003c\/b\u003e Paperback\u003cbr\u003e\u003cb\u003eWeight:\u003c\/b\u003e 0.92lbs","brand":"Vivek Kalyanarangan","offers":[{"title":"Paperback","offer_id":48748327108863,"sku":"9781633433915","price":59.99,"currency_code":"USD","in_stock":false}],"url":"https:\/\/www.whiterainbookhouse.com\/products\/quantization-and-fast-inference-vivek-kalyanarangan-9781633433915","provider":"WR Book House","version":"1.0","type":"link"}