Bookseller: GreatBookPrices, Columbia, MD, United States
EUR 18,59
Quantity available: More than 20 available
Condition: As New. Unread book in perfect condition.
Bookseller: GreatBookPrices, Columbia, MD, United States
EUR 19,00
Quantity available: More than 20 available
Condition: New.
EUR 21,37
Quantity available: More than 20 available
PAP. Condition: New. New book. Shipped from UK. Established seller since 2000.
EUR 19,20
Quantity available: More than 20 available
PAP. Condition: New. New book. Shipped from UK. Established seller since 2000.
Bookseller: GreatBookPricesUK, Woodford Green, United Kingdom
EUR 19,19
Quantity available: More than 20 available
Condition: New.
Bookseller: GreatBookPricesUK, Woodford Green, United Kingdom
EUR 20,83
Quantity available: More than 20 available
Condition: As New. Unread book in perfect condition.
Bookseller: Grand Eagle Retail, Bensenville, IL, United States
EUR 21,94
Quantity available: 1 available
Paperback. Condition: New.

Are you struggling to scale your large language models (LLMs) without breaking the bank or sacrificing latency? This book offers a clear roadmap to optimize inference, reduce costs, and scale seamlessly across platforms like PyTorch, ONNX, vLLM, and more.

Optimizing LLM Performance is your hands-on guide to boosting the efficiency of large language models in production environments. Whether you're building chatbots, document summarizers, or enterprise AI tools, this book teaches proven methods to accelerate inference while maintaining accuracy. It dives deep into hardware-aware optimizations, quantization, model pruning, compiler acceleration, and memory-efficient runtime strategies without locking you into any single framework.

Written with clarity and real-world use in mind, the book features practical case studies, side-by-side performance comparisons, and up-to-date techniques from the cutting edge of AI deployment. If you're building, serving, or scaling LLMs in 2025, this is the performance engineering guide you've been waiting for.

Key Features:
- Framework-agnostic optimization techniques using PyTorch, ONNX Runtime, vLLM, llama.cpp, and more
- Deep dive into quantization (INT8/4-bit), distillation, pruning, and KV caching
- Hands-on examples with FastAPI, Hugging Face Transformers, and serverless deployment
- Covers performance profiling, streaming, batching, and cost-efficient scaling
- Future-proof insights on compiler-aware models, LoRA 2.0, and edge inference

Ready to build LLM systems that are faster, cheaper, and more scalable? Grab your copy of Optimizing LLM Performance today and deploy smarter.

This item is printed on demand. Shipping may be from multiple locations in the US or from the UK, depending on stock availability.
Bookseller: CitiRetail, Stevenage, United Kingdom
EUR 23,26
Quantity available: 1 available
Paperback. Condition: New. (Same book description as the Grand Eagle Retail listing above.) This item is printed on demand. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability.