AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment (Production AI Engineering Series) - Paperback

Book 6 of 11: Production AI Engineering Series

Team, ChatVariety

ISBN 13: 9798199720021

Publisher: Independently published, 2026

0 Used

5 New

From EUR 13,59

Slash LLM Deployment Costs and Latency

Deploying Large Language Models (LLMs) in production is a massive economic and engineering hurdle. AI Inference Optimization Engineering is your comprehensive, hands-on guide to mastering the full stack of modern LLM optimization techniques. From memory-bandwidth solutions to hardware-specific compilation, this book bridges the gap between research-level models and enterprise-grade execution.

What you will master inside this book:

Hardware-Aware Optimization: Dive deep into KV cache mechanics, autoregressive decoding, and GPU memory hierarchies to eliminate latency bottlenecks.
State-of-the-Art Quantization: Apply GPTQ and AWQ quantization and the GGUF model format to shrink large neural networks with minimal loss of model accuracy.
Advanced Acceleration Methods: Implement speculative decoding (including draft-head approaches like Medusa and EAGLE), PagedAttention, and FlashAttention to boost throughput by up to 2-3x.
Production-Grade Serving: Build ultra-low-latency deployment infrastructures using vLLM, Triton Inference Server, and continuous batching.
Cross-Platform Deployment: Optimize models for specific target hardware, including NVIDIA H100 (TensorRT-LLM), Apple Silicon (llama.cpp/Metal), and Qualcomm mobile/edge accelerators.

Whether you are an ML infrastructure engineer, an AI platform architect, or a technical leader looking to scale LLMs cost-effectively, this book provides the production-ready code, equations, and architectural patterns you need to build hyper-efficient AI pipelines.

"Synopsis" may belong to another edition of this book.

Publisher: Independently published
Publication year: 2026
Language: English
ISBN 13: 9798199720021
Binding: Paperback
Number of pages: 95
Manufacturer contact: Manufactured by Amazon on behalf of the author
https://www.amazon.es/hz/contact-us

c/o Amazon Media EU S.à.r.l., 38 Avenue John F. Kennedy
Luxembourg
L-1855
Luxembourg

Search results for AI Inference Optimization Engineering: Quantization,...

AI Inference Optimization Engineering: Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment (Production AI Engineering Series)

Team, ChatVariety

Published by Independently published, 2026

ISBN 13: 9798199720021

New Paperback

Print on demand

Bookseller: California Books, Miami, FL, United States of America

Seller rating: 4 out of 5 stars

Condition: New. Print on Demand. Item ref. no.: I-9798199720021

Buy new

EUR 13,59

Free shipping
Ships within United States of America

Quantity available: More than 20

AI Inference Optimization Engineering

Team, Chatvariety

Published by Independently published, 2026

ISBN 13: 9798199720021

New PAP

Bookseller: PBShop.store US, Wood Dale, IL, United States of America

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Item ref. no.: L2-9798199720021

Buy new

EUR 14,14

Free shipping
Ships within United States of America

Quantity available: More than 20

AI Inference Optimization Engineering

Team, Chatvariety

Published by Independently published, 2026

ISBN 13: 9798199720021

New PAP

Bookseller: PBShop.store UK, Fairford, GLOS, United Kingdom

Seller rating: 5 out of 5 stars

PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Item ref. no.: L2-9798199720021

Buy new

EUR 13,42

Shipping: EUR 3,85
Ships from United Kingdom to United States of America

Quantity available: More than 20

AI Inference Optimization Engineering (Paperback)

Chatvariety Team

Published by Independently Published, 2026

ISBN 13: 9798199720021

New Paperback

Print on demand

Bookseller: CitiRetail, Stevenage, United Kingdom

Seller rating: 5 out of 5 stars

Paperback. Condition: New. This item is printed on demand. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability. Item ref. no.: 9798199720021

Buy new

EUR 16,84

Shipping: EUR 43,25
Ships from United Kingdom to United States of America

Quantity available: 1

AI Inference Optimization Engineering : Quantization, Speculative Decoding, and Hardware-Specific LLM Deployment

Chatvariety Team

Published by Independently Published, Jun 2026

ISBN 13: 9798199720021

New Paperback

Bookseller: AHA-BUCH GmbH, Einbeck, Germany

Seller rating: 5 out of 5 stars

Paperback. Condition: New. New stock. Item ref. no.: 9798199720021

Buy new

EUR 13,00

Shipping: EUR 60,71
Ships from Germany to United States of America

Quantity available: 2
