The ingredients for effective alarm management
20 August 2014
This paper proposes an approach to deliver an alarm management improvement program based on a field-proven, phased methodology that not only supports industry best practice, but also delivers it in a practical and cost-effective manner. The focus here is on phases one and two of the alarm management improvement process: Phase 1 – Identification and elimination of bad actors and Phase 2 – Alarm rationalisation.
Honeywell DynAMo Metrics & Reporting Dashboard
Too often, approaches to alarm management within process industries have been dictated by, and even restricted to, the capabilities and functions of specific tools or local site knowledge. Investments in software tools, for example, while an essential part of any alarm improvement program, can do little to resolve underlying problems if the results and findings are not actioned in a structured and timely way.
Even where process companies take a wider view to include software tools, formal standards and procedures, a framework to tie these otherwise disparate elements together is usually lacking.
Alarm management is crucial not only in the process industries. In April 2013, The Joint Commission, a US non-profit which accredits the country’s hospitals, warned that doctors were increasingly desensitised to, or overwhelmed by, constant hospital medical alarms – "alarm fatigue", as researchers put it.
The US Food and Drug Administration’s (FDA) Manufacturer and User Facility Device Experience (MAUDE) database shows that 566 alarm-related patient deaths were reported between January 2005 and June 2010.
The issues will be wearily familiar to all those who have tackled industrial alarm strategies: staff are faced with thousands of alarm signals from a myriad of medical instruments on every unit, up to 99% of which require no clinical intervention. As a result, the report found, physicians were prone to turn down alarm volumes, turn them off, or adjust the alarm settings outside safe parameters – with serious, potentially fatal, consequences.
A new approach
At the very outset of this program, it is important for the process company to be clear about its reasons for embarking on an alarm management project. These will vary between businesses, but without clear agreement on the benefits sought, it is impossible to tailor an effective program to the organisation’s needs, or to achieve buy-in from those tasked with implementing it. There are a number of common drivers:
• Regulatory compliance
• Safety improvement
• Operator efficiency
• Insurance premiums
• Reduction of trips
• Increased production
• Reduced maintenance costs.
The impact of the insurance industry is an interesting driver. No insurer wants exposure to a Texas City or Milford Haven, and underwriters are increasingly insistent that clients demonstrate a proactive approach to alarm handling before agreeing to provide coverage.
Moreover, each of the aforementioned drivers is reason enough in itself to embark on an alarm improvement program, and there is convincing evidence to show measurable benefits can be achieved. Honeywell’s own research suggests plants can, on average, cut in half both unplanned downtime and the number/cost of incidents with an effectively executed alarm management program. Other observed benefits included a 3% increase in capacity utilisation, 5% better energy utilisation, and a 5% improvement in mechanical availability.
What these figures suggest is that the primary goal of alarm management is not to reduce the number of alarms. A reduction will occur, but it is the result of a good system put in place to achieve business and operating goals, rather than the aim in itself. The quality and clarity of the alarms presented, not the number, is the most important aspect of any alarm management program.
There are, in fact, few areas of business improvement where better alarm management cannot have an impact. Whether the object is better reliability, productivity or performance, safeguarding the plant or reducing costs, alarm management is an important contributor. Consider one example: the challenge of an aging workforce and the skills gap that exists between them and the next generation workforce. An effective alarm management system will capture the knowledge of experienced staff, cataloguing causes, consequences and corrective actions for each alarm, and retain this information for the benefit of less experienced recruits.
In all cases, the key to developing an effective alarm management strategy is to first understand why it is being developed. Within the same development framework, different requirements will drive different solutions.
Techniques, standards, tools and best practices all play a role in alarm management. Most plants will be familiar with, and employ, a range of these.
• Standards and guidelines: The EEMUA 191 guidelines are widely used in Europe; the ISA 18.2 standard in the US and across Asia.
• Best practice: This comes from a variety of sources, but the work of the Abnormal Situation Management (ASM) Consortium, founded by Honeywell, has been particularly important in driving best practices in the area of alarm management.
• Tools: Software, databases and written procedures are all commonly used. Among the most frequently found are alarm management metrics, recording and analysis software, master alarm databases, and alarm response manuals.
A study that oil and gas giant Total presented to a meeting of the ASM Consortium is interesting in this respect. It looked at all of these areas – standards, techniques, procedures and tools – to see if any one of them had an overwhelming impact on the effectiveness of the alarm management program. The conclusion: there was no “silver bullet”. No single, specific action had the desired result; instead, a combination of actions made a positive and significant impact. An effective solution requires such a combination – a model.
The alarm management program provides the framework to ensure the various contributors are applied in a coordinated and coherent way in order to achieve the business goals. It ensures lessons learned in one area of a plant or facility can be captured and applied elsewhere. It also ensures the strategy is resilient: by building alarm management around a program rather than a specific tool, process or workflow, the strategy – the goal – is never undermined.
Human factors are, of course, crucial to effective alarm management. The operator is the most important link in the chain in any alarm system. They are the vital supercomputer, without which no system can be effective, regardless of how technologically advanced it may be.
Figure 1 - A model for continual improvement
Human factors are not included as a distinct element of this model simply because they are integral to every part of it. Established best practice and research should inform every step within the program. If we know that it takes, on average, one to two minutes for an operator to read an alarm, understand the consequences and take corrective action, then there is little point, for example, in an alarm system that relies on alerting an operator with two emergency alarms at the same time. How do you prioritise the emergencies?
Operators should be engaged at every stage of the improvement plan, in part to ensure buy-in to the program, but also to capture their knowledge and insight into human limitations.
Alarm management improvement is achieved through a phased approach. The two phases that have the largest impact on the program are the identification and elimination of bad actors and alarm rationalisation.
As each phase is successfully accomplished, the overall number of daily alarms will reduce, as will the alarm floods (Figure 1).
Phase 1: Identification and Elimination of Bad Actors
This phase should focus first on the areas of biggest risk and greatest return, while generating EEMUA- and ISA-compliant KPI assessment reports with quantifiable deliverables. EEMUA- and ISA-compliant metrics are used because most regulatory or management bodies want to benchmark their assets against agreed-upon best practices. The focus is to address the problem alarms as they occur so that the facility can stay within the KPIs listed in these guidelines and standards. The software tools employed should be easy to use and should generate web-based key performance indicator reports that provide a snapshot of current alarm system performance.
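A KPI snapshot of this kind can be sketched in a few lines of Python. The example is illustrative only: it assumes the alarm journal has been exported as a list of timestamps, and the function names, the 10-minute window and the flood threshold of more than 10 alarms per window are assumptions made for this sketch, not values taken from this paper. A site would take its actual targets from EEMUA 191, ISA 18.2 and its own alarm philosophy.

```python
from collections import Counter
from datetime import datetime, timedelta


def alarms_per_window(timestamps, window=timedelta(minutes=10)):
    """Count alarms in consecutive fixed windows, starting at the first alarm."""
    if not timestamps:
        return []
    ordered = sorted(timestamps)
    start = ordered[0]
    # timedelta // timedelta yields the integer index of the window
    counts = Counter((t - start) // window for t in ordered)
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def kpi_snapshot(timestamps, flood_threshold=10):
    """Summarise alarm load. flood_threshold is an assumed, site-specific value."""
    counts = alarms_per_window(timestamps)
    if not counts:
        return {"windows": 0, "average_per_window": 0.0,
                "peak_per_window": 0, "flood_windows": 0}
    return {
        "windows": len(counts),
        "average_per_window": sum(counts) / len(counts),
        "peak_per_window": max(counts),
        # A window counts as a flood when it exceeds the assumed threshold
        "flood_windows": sum(1 for c in counts if c > flood_threshold),
    }


if __name__ == "__main__":
    base = datetime(2014, 8, 20, 8, 0)
    # Hypothetical journal: a burst of 12 alarms, then one isolated alarm
    journal = [base + timedelta(seconds=30 * i) for i in range(12)]
    journal.append(base + timedelta(minutes=25))
    print(kpi_snapshot(journal))
```

Tracking the same few numbers week after week shows whether the actions taken in this phase are actually moving the facility towards its targets.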
In practice, though, reporting on KPI metrics is only one part of the solution. Improvement comes only from acting on the information these metrics provide. Bad actors can range from faulty transmitters to inadequate on/off delays or dead-band settings, leading to repeat offenders and chattering alarms, all of which create unnecessary ‘noise’ and ‘fog’ for the board operator.
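As an illustration of how chattering alarms might be flagged from an alarm journal, the sketch below marks any tag that annunciates several times within a short interval. The function name, the event format of (timestamp, tag) pairs, and the defaults of three annunciations within 60 seconds are assumptions chosen for the example; a site should use the definition in its own alarm philosophy.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def chattering_tags(events, min_repeats=3, window=timedelta(seconds=60)):
    """Return tags that annunciate at least min_repeats times within any window.

    events is an iterable of (timestamp, tag) pairs; both thresholds are
    assumed defaults, not values from a standard.
    """
    by_tag = defaultdict(list)
    for stamp, tag in events:
        by_tag[tag].append(stamp)

    flagged = set()
    for tag, stamps in by_tag.items():
        stamps.sort()
        # Compare each annunciation with the one min_repeats - 1 places later
        for i in range(len(stamps) - min_repeats + 1):
            if stamps[i + min_repeats - 1] - stamps[i] <= window:
                flagged.add(tag)
                break
    return sorted(flagged)


if __name__ == "__main__":
    t0 = datetime(2014, 8, 20, 8, 0)
    # Hypothetical tags: FT-101 repeats quickly, LT-202 does not
    journal = [(t0 + timedelta(seconds=s), "FT-101") for s in (0, 20, 40)]
    journal += [(t0 + timedelta(minutes=m), "LT-202") for m in (0, 5, 10)]
    print(chattering_tags(journal))
```

A tag flagged this way is a candidate for a dead-band or delay review, or for an instrument check, rather than proof of a fault.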
The improvement plan can start simple. Identify the top three alarms each week and engage with Maintenance and Operations to have these problems addressed. Build this weekly process into the workflow of the organisation to ensure ownership and continued delivery of the improvement plan. Do this at the same frequency throughout the plan, and results show it is possible to achieve an 80% reduction in overall alarms in a very short space of time.
Don’t let these quick wins make you think you can short-circuit the rest of the model, however. Solving the alarm problem requires completion of several component parts (or phases), but this phase at least delivers quick improvement, a fast return on investment and confidence that you are moving in the right direction.
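The weekly review can be supported by a very small script. The sketch below groups alarm events by ISO week and lists the most frequent tags in each week; the function name, the (date, tag) event format and the tag names are hypothetical and stand in for whatever the site's alarm journal export provides.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_bad_actors(events, top_n=3):
    """Return the top_n most frequent alarm tags for each ISO week.

    events is an iterable of (date, tag) pairs. The result maps
    (iso_year, iso_week) to a list of (tag, count) pairs.
    """
    per_week = defaultdict(Counter)
    for day, tag in events:
        iso = day.isocalendar()
        per_week[(iso[0], iso[1])][tag] += 1
    return {week: counts.most_common(top_n)
            for week, counts in sorted(per_week.items())}


if __name__ == "__main__":
    monday = date(2014, 8, 18)
    # Hypothetical one-week journal
    journal = ([(monday, "FT-101")] * 5 + [(monday, "LT-202")] * 3
               + [(monday, "PT-303")] * 2 + [(monday, "TT-404")])
    for week, worst in weekly_bad_actors(journal).items():
        print(week, worst)
```

The output of each run is the agenda for that week's meeting with Maintenance and Operations.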
The use of an advanced software dashboard enables users to focus on the bad actors for quick identification and resolution.
Phase 2: Alarm Rationalisation
Alarm rationalisation is commonly misunderstood. Effective alarm rationalisation can only take place when the “noise” caused by the mass of nuisance alarms has been eliminated. Alarm rationalisation is not specifically about reducing the number of alarms, but about improving their quality by ensuring each alarm is designed correctly in the first place.
The process involves analysing each alarm and looking at its cause, potential consequences and any corrective actions that are required: an alarm is only an alarm if a defined operator action is specified. If there is no operator action, it is not an alarm. Operator alerts may be more beneficial in these circumstances; however, the need for each alert should be evaluated, otherwise alerts will become the next problem to solve.
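The rule that an alarm needs a defined operator action can be captured in a simple record structure. The sketch below is one possible shape for such a record; the class name, field names and example tags are assumptions for illustration and do not describe any particular tool.

```python
from dataclasses import dataclass


@dataclass
class RationalisationRecord:
    """One tag's rationalisation result (hypothetical structure)."""
    tag: str
    cause: str
    consequence: str
    operator_action: str = ""

    def is_alarm(self) -> bool:
        # An alarm is only an alarm if a defined operator action is specified
        return bool(self.operator_action.strip())


def split_alarms(records):
    """Separate genuine alarms from tags that need an alert-or-remove decision."""
    alarms = [r for r in records if r.is_alarm()]
    to_review = [r for r in records if not r.is_alarm()]
    return alarms, to_review


if __name__ == "__main__":
    records = [
        RationalisationRecord("LT-202", "Inlet valve stuck open",
                              "Tank overflow", "Close manual isolation valve"),
        RationalisationRecord("XI-900", "Pump running status", "None"),
    ]
    alarms, to_review = split_alarms(records)
    print([r.tag for r in alarms], [r.tag for r in to_review])
```

Tags that land in the review list are not automatically converted to alerts; each still needs the evaluation described above.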
Alarm rationalisation will include review and approval of Phase 2 changes: grouping, cloning and a tag-by-tag review, as well as addressing standing alarms, operating modes, alarm priorities and so on. This should lead to an end-of-assessment review, summary and training, highlighting the changes made and the reasons for them.
Essentially, plants must determine whether alarms have the correct priority, whether operators know what to do, and what the action(s) associated with each alarm are. The purpose of priority is to indicate to the operator which alarm to respond to first when two or more alarms ring in at the same time. An alarm’s priority should conform to the guidelines set out in the alarm system design document – the Alarm Philosophy Document (APD).
A typical alarm priority matrix will take into account the severity of the incident as it relates to company priorities, against the time it takes an operator to safely respond. For example, if the consequence of the alarm is severe and the time the operator has to respond is less than 2 minutes, the alarm priority will most likely be critical. An alarm with relatively minor consequences and greater than 30 minutes of time allocated to respond will have a lower priority. The consequences and response time will be site-specific and detailed in the APD.
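A priority matrix of this kind reduces to a lookup table. In the sketch below, only two cells come from the text above (severe consequence with under 2 minutes to respond gives critical; minor consequence with over 30 minutes gives a lower priority). The severity names, the middle response band, the priority labels and every other cell are assumptions for illustration; a real matrix comes from the site's APD.

```python
# Hypothetical matrix: rows are consequence severity, columns are response time bands
PRIORITY_MATRIX = {
    "severe": {"under_2_min": "critical", "2_to_30_min": "high", "over_30_min": "medium"},
    "major": {"under_2_min": "high", "2_to_30_min": "medium", "over_30_min": "low"},
    "minor": {"under_2_min": "medium", "2_to_30_min": "low", "over_30_min": "low"},
}


def response_band(minutes_to_respond):
    """Map the time available to the operator onto an assumed band."""
    if minutes_to_respond < 0:
        raise ValueError("response time cannot be negative")
    if minutes_to_respond < 2:
        return "under_2_min"
    if minutes_to_respond <= 30:
        return "2_to_30_min"
    return "over_30_min"


def alarm_priority(severity, minutes_to_respond):
    """Look up the priority for a consequence severity and response time."""
    if severity not in PRIORITY_MATRIX:
        raise ValueError(f"unknown severity: {severity!r}")
    return PRIORITY_MATRIX[severity][response_band(minutes_to_respond)]


if __name__ == "__main__":
    print(alarm_priority("severe", 1))   # critical
    print(alarm_priority("minor", 45))   # low
```

Keeping the matrix as data rather than logic means a change to the APD is a change to one table, which can be reviewed and approved like any other alarm change.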
It is also critical at this stage to document findings in a master alarm database and alarm response manual. This will ensure the change management process is followed and that the knowledge of the rationalisation exercise is captured.
One of the most important cautions is that alarms added because of new equipment or changes in operations need to be rationalised prior to implementation; otherwise, the effort invested in this phase of the alarm management improvement program will be lost and the results will eventually erode.
These two phases will have a positive impact on the alarm rate and improve the overall performance of the alarm system. Software tools are key enablers for managing, monitoring and maintaining the alarm system. Using the tools continuously will guarantee long-term success of the alarm system and improvement of operator reaction to alarms.
About the authors:
Tyron Vardy is Senior Alarm Management Consultant and Kevin Brown is Alarm Management Best Practices Leader at Honeywell Process Solutions.