
Resilience In Complex Adaptive Systems: Operating At The Edge Of Failure

Video of Richard Cook's talk at Velocity NY 2013, "Resilience In Complex Adaptive Systems". Cook also wrote an excellent article, "How complex systems fail", which you can access here. Although Cook's initial work was in health care, the article is a good reference when considering crisis management in Information Technology.

View the excellent video below:



Popular posts from this blog

Using OPENDNS on a Mikrotik

At the office we use a Mikrotik router connected via fibre to Cool Ideas. We use OpenDNS as an information security tool. It prevents ransomware and bots from becoming major incidents within the office.

The router runs a scheduled script that updates the OpenDNS settings daily. The script is shown below:

# OpenDNS account credentials and the hostname value sent in the update request
:local opendnsuser "user@domain.co.za";
:local opendnspass "itsprivate";
:local opendnshost "office";

:log info "OpenDNS Update";
:local url "https://updates.opendns.com/nic/update";
# \3F is the escaped question mark that starts the query string
/tool fetch url=($url . "\3Fhostname=$opendnshost") user=("$opendnsuser") password=("$opendnspass") mode=https dst-path=opendnsupdate.txt
# Read the server response from the downloaded file and log it
:local opendnsresult [/file get opendnsupdate.txt contents];
:log info "OpenDNS: Host $opendnshost - $opendnsresult";
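
The daily schedule itself is not part of the script above. One way it could be set up is sketched here, assuming the script has been saved under /system script with the hypothetical name opendns-update. The scheduler entry name opendns-daily is also only illustrative.

```
# Assumption: the update script is stored as a /system script entry named "opendns-update"
# Runs once after the router starts up, then repeats every day
/system scheduler add name="opendns-daily" start-time=startup interval=1d on-event="opendns-update"
```

Starting the schedule at startup means the OpenDNS settings are also refreshed after a reboot, when the office's public IP address may have changed.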

The importance of the major incident process

ITIL mentions the Major Incident process as a special case of the incident management process, as well as its close relationship to problem management. However, the Major Incident process requires greater clarity and specification, as in many large enterprises the process is crucial for overcoming a crisis. A Major Incident is typically defined as an incident with severe negative business consequences, and an important duty of any designated Information Technology (IT) resources is to deal with Major Incidents in a structured manner. We will address this important topic in a series of articles on the process and on crisis management in general.

Read the full article here.

Checklist of the information a manager of a NOC needs to have close at hand

This is a checklist devised by DS of the information a manager of a Network Operations Centre (NOC) needs to have close at hand:
Command and control
- Date and time of current shift including start, finish and handover
- NOC manager on duty
- Shift leaders
- Service Delivery Manager on duty/standby
- Major Incident Manager on duty/standby
- Tiger Team status (refer here for process)
  - Echo
  - Whisky
  - Delta
  - Romeo
  - Bravo
  - Alpha

Red-Amber-Green (RAG) of the trenches
- Security
- Data centre
- Apps
- Support
- Infrastructure

Notifications
- Ongoing Service Level Agreement (SLA) or contract violations
- All Major Incidents
- All failures and outages
- Last 10 maintenance tasks completed
- Next 10 maintenance tasks scheduled
- Planned continuity tests scheduled (inverter/generator tests, network path protection tests, business continuity or application high availability tests)
- Resources available to the NOC
- Resources unavailable to the NOC
- Changes completed during the past week (includes the status on whether they were successful or failed)
- Changes sch…