Incident Management: Leading When Systems Fail
Modern systems don’t fail quietly—and neither do the organisations that run them.
When critical technology breaks, the real challenge is rarely the code. It’s decision-making under pressure, communication under uncertainty, and leadership when the cost of mistakes is measured in trust, revenue, and real-world impact.
Drawing on years of experience managing incidents in high-stakes financial and banking environments, this book reframes incident management as what it truly is: a leadership discipline, not a debugging exercise.
Using clear analogies from emergency response—fires, floods, earthquakes, and tsunamis—Incident Management provides a practical, human-centred guide to staying effective when everything is on the line.
What You'll Learn
✔ What an incident really is—and why impact matters more than urgency
✔ How to stabilise situations in the first five critical minutes
✔ Why clearly defined roles outperform heroics under pressure
✔ How communication becomes infrastructure during outages
✔ Techniques for decision-making when information is incomplete
✔ How to manage cascading failures, dependency shocks, and systemic events
✔ Practical guidance on observability, triage, and recovery
✔ How to run blameless post-incident reviews with real accountability
✔ How to build resilient systems and resilient people
Inside the Book
This book covers the full lifecycle of incident management, including:
Incident command and team structure
Crisis communication for technical and non-technical stakeholders
Flood control patterns such as rate limiting and backpressure
Dependency failures and staged recovery
Customer trust during outages
Security and compliance incidents
Automation, drills, and simulation
Metrics that matter—and those that mislead
The psychology of stress, fatigue, and group dynamics
Each chapter combines practical frameworks with timeless insight, supported by reflections from philosophy, leadership, and emergency management.
Who This Book Is For
This book is written for:
Technology leaders and engineering managers
Incident commanders and on-call engineers
SRE, platform, and reliability teams
Executives responsible for critical systems
Anyone expected to lead calmly when systems fail
No prior incident management framework is required—just the responsibility to act when things go wrong.
Why This Book Is Different
Most books focus on tools, dashboards, and postmortems.
This one focuses on how people think, communicate, and decide under pressure.
Because when systems fail, leadership—not technology—determines the outcome.
"Synopsis" may belong to another edition of this book.
Seller: California Books, Miami, FL, United States
Condition: New. Print on Demand. Seller inventory no.: I-9798242485730
Quantity available: More than 20
Seller: PBShop.store US, Wood Dale, IL, United States
PAP. Condition: New. New Book. Shipped from UK. THIS BOOK IS PRINTED ON DEMAND. Established seller since 2000. Seller inventory no.: L0-9798242485730
Quantity available: More than 20
Seller: Grand Eagle Retail, Bensenville, IL, United States
Paperback. Condition: New. This item is printed on demand. Shipping may be from multiple locations in the US or from the UK, depending on stock availability. Seller inventory no.: 9798242485730
Quantity available: 1
Seller: PBShop.store UK, Fairford, GLOS, United Kingdom
PAP. Condition: New. New Book. Delivered from our UK warehouse in 4 to 14 business days. THIS BOOK IS PRINTED ON DEMAND. Established seller since 2000. Seller inventory no.: L0-9798242485730
Quantity available: More than 20
Seller: CitiRetail, Stevenage, United Kingdom
Paperback. Condition: New. This item is printed on demand. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability. Seller inventory no.: 9798242485730
Quantity available: 1