This book introduces a novel approach to the design and operation of large ICT systems. It views the technical solutions and their stakeholders as complex adaptive systems and argues that traditional risk analyses cannot predict all future incidents with major impacts. To avoid unacceptable events, it is necessary to establish and operate anti-fragile ICT systems that limit the impact of all incidents, and which learn from small-impact incidents how to function increasingly well in changing environments.
The book applies four design principles and one operational principle to achieve anti-fragility for different classes of incidents. It discusses how systems can achieve high availability, prevent malware epidemics, and detect anomalies. Analyses of Netflix’s media streaming solution, Norwegian telecom infrastructures, e-government platforms, and Numenta’s anomaly detection software show that cloud computing is essential to achieving anti-fragility for classes of events with negative impacts.