This timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance (ABFT). Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.
"Synopsis" may belong to another edition of this book.
This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC).
The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.
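Topics and features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.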
This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing.
Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
"About this title" may belong to another edition of this book.
EUR 10.00 shipping from Germany to Spain
EUR 9.76 shipping from the United States to Spain
Seller: Universitätsbuchhandlung Herta Hold GmbH, Berlin, Germany
ix, 320 p. Hardcover. We dispatch from Germany via Air Mail. Imperfect copy due to slightly bumped cover, and stamped as such; apart from this in very good condition. Computer Communications and Networks. Language: English. Item ref. no.: 4823IB
Quantity available: 2
Seller: Buchpark, Trebbin, Germany
Condition: Very good | Pages: 332 | Language: English | Product type: Books. Item ref. no.: 25708812/12
Quantity available: 1
Seller: Books Puddle, New York, NY, United States
Condition: New. pp. 320. Item ref. no.: 26372815544
Quantity available: 1
Seller: moluna, Greven, Germany
Condition: New. This is a print-on-demand item and will be printed for you after you place your order. The first complete overview of this increasingly important field. Presents a unique, rigorous approach based on the design of analytical models to predict performance. Provides a coherent collection of valuable insights from internationally-renown. Item ref. no.: 31406393
Quantity available: More than 20
Seller: Majestic Books, Hounslow, United Kingdom
Condition: New. pp. 320. Item ref. no.: 374278503
Quantity available: 1
Seller: BuchWeltWeit Ludwig Meier e.K., Bergisch Gladbach, Germany
Book. Condition: New. This item is printed on demand and takes 3-4 days longer. New stock. 332 pp. Language: English. Item ref. no.: 9783319209425
Quantity available: 2
Seller: AHA-BUCH GmbH, Einbeck, Germany
Book. Condition: New. Printed after ordering. New stock. Item ref. no.: 9783319209425
Quantity available: 1
Seller: Ria Christie Collections, Uxbridge, United Kingdom
Condition: New. In. Item ref. no.: ria9783319209425_new
Quantity available: More than 20
Seller: Biblios, Frankfurt am Main, Hesse, Germany
Condition: New. pp. 320. Item ref. no.: 18372815538
Quantity available: 1
Seller: GreatBookPrices, Columbia, MD, United States
Condition: New. Item ref. no.: 23922726-n
Quantity available: More than 20