Why is it so difficult to learn from someone else’s mistakes?
01 May 2014
Over the last eight years, functional safety expert Tino Vande Capelle has trained over 1,500 process industry employees worldwide under TÜV Rheinland’s competency review programme. In this article he looks at the human dimension in high hazard industry incidents, how KPIs and competency can reduce the likelihood of future events, and concludes with a checklist of the Top 10 functional safety pitfalls and how to avoid them.
There have been many process industry disasters in the past, and there will likely be many more in the future as working conditions, materials, equipment and processes change and become ever more demanding.
Major accidents such as Seveso, Flixborough, Piper Alpha, Bhopal, Chernobyl, Texas City and, most recently, Deepwater Horizon have all painfully revealed failures we can learn from; failures that have caused loss of life, environmental damage and capital losses. Today we know that each of them could have been prevented if the plant or processes had been designed with failure in mind and staff had possessed adequate competency to avoid such events.
For the past 30 years, standards have helped engineers apply good engineering practices. DIN 19250, ISA 84.00.01, IEC 61508 & 61511 and others have been put in place to create a safety culture in our industry in the hope of achieving a better world where people, environment and investment can be safe. But human nature does not like to acknowledge problems, so the weakest link in safety culture remains the human being.
Some recent disasters
In Europe in the seventies, a series of disasters raised awareness of the potential risks inherent in process industry operations. The accident at Flixborough in the UK caused 28 fatalities in 1974, and the chemical release at Seveso in Italy resulted in widespread dioxin contamination and led to the evacuation of 117,000 people in 1976.
These incidents were followed by the disaster at Bhopal in India in 1984, where there were between 3,000 and 5,000 fatalities on the night of the accident and up to 25,000 subsequent deaths, and by Piper Alpha in 1988, where 167 died in an oil rig fire in the UK sector of the North Sea.
The sorry roll-call of major disasters continued into the new century. In 2005, 15 were killed at the Texas City refinery in the USA and, the same year, 43 were injured and 2,000 evacuated at Buncefield in the UK. 2010 then saw the Deepwater Horizon disaster, in which 11 were killed and the worst oil spill ever recorded contaminated large parts of the Gulf of Mexico.
All of these events had one thing in common: human failure, with safety culture (or the lack thereof) an integral part of the fundamental cause, thus confirming chemical safety guru Trevor Kletz’s statement: “Accidents are not due to lack of knowledge, but failure to use the knowledge we have.”
The UK Health and Safety Executive (HSE) analysed 34 incidents for its report “Out of control: why control systems go wrong and how to prevent failure”. It concluded that 44% of all failures were caused by inadequate specification, 20% by changes after commissioning, 15% by design and implementation, 15% during operation and maintenance, and 6% during installation and commissioning. This means that around two-thirds of all control system failures - those arising from specification, design and implementation, and installation and commissioning - are built in before operation commences.
Functional safety standards and norms
In 1984, TÜV released the handbook Microcomputers in Safety Technique to help developers and manufacturers design safety systems; this was followed by the requirement classes (RC) specified in the DIN 19250 standard in 1989. The first safety lifecycle approach with Safety Integrity Level (SIL) definitions was specified in IEC 61508 (1997) and later IEC 61511 (2004), and these changed ‘Safety’ into ‘Functional Safety’. From then on, good engineering practices were available to help engineers design, maintain and operate safety systems to a high standard and to achieve process safety by protecting against residual risk.
Both IEC 61508 & 61511 are performance-oriented standards, not prescriptive.
Safety systems and instrumentation used nowadays are increasingly reliable, but the weakest link remains the human contribution to the safety chain. Human factors can be categorised systematically, and the functional safety standards describe specific measures and methods for avoiding these potential failures.
Human error and Key Performance Indicators (KPIs)
In the safety world we can try to implement better or reliable instrumentation and systems using redundancy and diversity. But how about the operator, maintenance engineer or manager? How do we ensure that we have adequate competency that will help us achieve the necessary process safety level?
In recent years there have been many initiatives defining leading and lagging indicators to measure and encourage process safety performance improvements (see the references at the end of this article). However, “listing ‘human error’ as one of the causes of an accident is about as helpful as listing gravity as the cause of a fall” (T. Kletz, Lessons from Disaster).
The aphorisms “You get what you inspect, not what you expect” and “You don’t improve what you don’t measure” underline the simple truths behind Key Performance Indicators, one of the most effective tools for improving process safety.
Here are some examples of potential KPIs:
· Employee Participation
· Process Hazard Analysis
· Training / Competency
· Trade Secrets
· Hot Work Permits
· Incident Investigation
· Pre-Startup Safety Reviews
· Process Safety Information
· Operating Procedures
· Compliance Audits
· Mechanical Integrity
· Management of Change
· Emergency Planning and Response
Despite the voluminous literature on the subject, Key Performance Indicators remain a challenge for most organisations to understand and apply effectively. Only with a strong safety culture, the support of management and competent staff can measurements be controlled and subjectivity minimised.
Competency and training
Despite the release of guidelines such as IEC61511, different cultures, languages and interpretations have led to different approaches on how to comply with leading functional safety standards. And safety tools such as SILs and PFDavg measurements can lose the engineer in a jungle of functional safety definitions, leading ever further away from the common sense solution.
The challenges for process safety implementation are not getting easier, mainly due to:
· Increasing complexity of process operations, process control and safeguarding equipment
· Poor management, ineffective communication with staff, lack of competency
· A focus on optimising production
· Technology transfer to countries with different cultures and standards
· Loss of process-specific experience due to job hopping or retirement of key personnel
A comprehensive training strategy aimed at promoting competence can go some way towards overcoming these challenges.
Since the release of IEC 61508 edition 2.0 (April 2010), competency has become a normative requirement, and several competency review schemes are now available, including CFSE, TÜV FS Eng, ISA 84 training and TÜV FS for SIS professionals, amongst others.
The TÜV Rheinland scheme, for example, was launched in 2004 and by August 2012 had trained and certified more than 5,500 engineers, illustrating the process industries’ increasing understanding of the importance of having competent staff involved in the safety lifecycle.
Avoiding the Top 10 functional safety pitfalls
The following list is based on the author’s first-hand experience from discussions with thousands of participants over more than 20 years in safety seminars, workshops and training courses.
1. Hazard identification
This is the most crucial phase in the life cycle of any project, and yet many companies use the HAZOP methodology as a mere formality. It should be the first and most important step in identifying the required safety functions for your safety instrumented system (SIS): a safety function is useless if it cannot be linked to a hazard or hazardous event. Thinking about the unthinkable, or outside the box, is a challenge for every hazard and risk analysis (HRA) team. Top tips include keeping the brainstorming session to a maximum of six hours, with no more than eight of the most experienced engineers in attendance. Another tip: take all HAZOP reports and have the auditor check that ALL required actions from previous HAZOP exercises have been implemented, tested, verified, assessed and documented.
2. Risk reduction tools
Many companies use risk reduction tools such as risk matrices, risk graphs and LOPA without calibrating them, perhaps because head office defined the criteria or the EPC consultant proposed their own preferences. Whatever tool you decide to use, make sure that:
a) You calibrate the tool(s) first to your specific needs, criteria, environment, projects and plant specifics
b) You don’t copy and paste between projects
c) You periodically review (e.g. yearly) your tools and recalibrate them if needed
3. Layer of protection (LOPA)
LOPA is an ideal tool for playing with numbers, which is probably why so many companies like to use it. However, make sure that ALL layers are completely independent of the initiating event and of the other layers; a layer can be credited only once in LOPA. Any combination of normal PLC or DCS/BPCS interlocks can claim a maximum risk reduction factor of 10 (SIL 0). Beware of common design (systematic) failures.
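The arithmetic LOPA rests on is simple enough to sketch. In the fragment below (hypothetical figures, not design values), the mitigated event frequency is the initiating event frequency multiplied by the PFD of each genuinely independent layer, which is exactly why double-counting a non-independent layer is so dangerous:

```python
# LOPA arithmetic sketch (hypothetical figures, not design values).
# The mitigated event frequency is the initiating event frequency
# multiplied by the PFD of each *independent* protection layer.

def mitigated_frequency(initiating_freq, layer_pfds):
    """Frequency per year of the hazardous event after credited layers."""
    freq = initiating_freq
    for pfd in layer_pfds:
        freq *= pfd
    return freq

# Example: initiating event 0.1/yr, one BPCS interlock (maximum credit
# PFD 0.1, i.e. RRF 10) and a relief valve (PFD 0.01).
f = mitigated_frequency(0.1, [0.1, 0.01])
# f == 1e-4 per year; compare this against the tolerable event
# frequency to size any additional SIF.
```

If the BPCS interlock and the relief valve shared an instrument, only one of the two PFDs could be credited.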
4. SIL & PFD
There is a common misunderstanding that a Safety Integrity Level (SIL) and an average Probability of Failure on Demand (PFDavg) figure together express the safety achieved by your safety instrumented functions (SIF). But the SIL and PFDavg are only a small part of the technical requirements. What is very often forgotten are the management, or non-technical, requirements of the functional safety (FS) standards. Applying a good FS management strategy can help you avoid systematic failures and oversee competency, assessments and audits.
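For reference, the low-demand-mode PFDavg bands of IEC 61508 can be captured in a few lines. The sketch below is illustrative only; it says nothing about the architectural constraints and management requirements that must also be satisfied before a SIL can be claimed:

```python
# Low-demand-mode PFDavg bands from IEC 61508 (illustrative sketch;
# check the standard itself for the authoritative tables).

def sil_from_pfdavg(pfd_avg):
    """Return the SIL band a PFDavg falls into (low demand mode)."""
    bands = [(4, 1e-5, 1e-4),
             (3, 1e-4, 1e-3),
             (2, 1e-3, 1e-2),
             (1, 1e-2, 1e-1)]
    for sil, lower, upper in bands:
        if lower <= pfd_avg < upper:
            return sil
    return 0  # outside the defined SIL 1-4 bands

print(sil_from_pfdavg(5e-3))  # a PFDavg of 5e-3 sits in the SIL 2 band
```

Meeting the band is necessary but not sufficient: the non-technical clauses of the standard still apply.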
5. SIS and complete loop concept
Simply speaking, many safety instrumented functions (SIF) are built from a combination of technologies from different manufacturers. Be aware that the weakest link can undermine the safety integrity of the complete SIF: it makes no sense, for example, to use a safety-related output module to drive a non-safety interposing relay. Every single subsystem should fulfil the SIL requirements.
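To first order, the PFDavg of a complete loop is the sum of the subsystem PFDavg values, so the weakest subsystem dominates. A minimal sketch with hypothetical figures:

```python
# First-order approximation: loop PFDavg is the sum of the subsystem
# PFDavg values (sensor + logic solver + final element).
# All figures below are hypothetical.

def loop_pfdavg(sensor, logic_solver, final_element):
    """Combined PFDavg of a complete SIF loop."""
    return sensor + logic_solver + final_element

pfd = loop_pfdavg(sensor=1e-3, logic_solver=1e-5, final_element=8e-3)
# pfd == 9.01e-3: the final element (often the valve) contributes
# almost 90% of the total. Swapping in a non-safety interposing relay
# with a PFD of, say, 2e-2 would drag the whole loop out of its band,
# however good the logic solver is.
```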
6. Proof test coverage and frequency
Some SIL calculation software packages on the market set the default proof test coverage as high as 90% when calculating the achieved SIL per SIF, and some companies believe they actually achieve 80-90% coverage during the periodically required SIF functionality test. The frequency at which the functions are tested is important, but even more important is the coverage actually achievable for the safety functionality.
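The effect of imperfect coverage is easy to quantify with a common first-order approximation: failures the proof test does detect are renewed every test interval, while the undetected fraction accumulates until the device is overhauled or replaced. A sketch with hypothetical figures:

```python
# Effect of imperfect proof test coverage on a 1oo1 subsystem
# (common first-order approximation; all figures are hypothetical).
# Detected failures are renewed every test interval; undetected ones
# accumulate over the device lifetime.

def pfd_avg_1oo1(lambda_du, test_interval, coverage, lifetime):
    """PFDavg with imperfect proof test coverage (times in years)."""
    detected = coverage * lambda_du * test_interval / 2
    undetected = (1 - coverage) * lambda_du * lifetime / 2
    return detected + undetected

lam = 1e-6 * 8760  # lambda_DU = 1e-6 per hour, expressed per year
perfect = pfd_avg_1oo1(lam, 1.0, 1.0, 15)   # the 100% the tool assumes
realistic = pfd_avg_1oo1(lam, 1.0, 0.6, 15)  # 60% coverage, 15-yr life
# With 60% coverage the PFDavg comes out several times worse than the
# "perfect test" figure, enough to lose a SIL band.
```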
7. Hardware with implemented software, SIL by FMEA?
Nowadays most plants use field devices with embedded software, field transmitters for example. Some manufacturers may only have used a failure mode and effects analysis (FMEA) to predict the achievable SIL of such a device, and often the software will not have been checked or verified. But even when the software and hardware in the device are fully compliant, putting two together in a 1oo2 configuration does not necessarily achieve a higher SIL, because of software design limitations.
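A common first-order simplification of the IEC 61508-6 formulas makes the point concrete: the beta (common cause) factor, not the redundancy itself, ends up dominating the achievable PFDavg, and no amount of hardware redundancy helps against a systematic software fault shared by both channels. A sketch with hypothetical figures:

```python
# First-order 1oo2 model (hypothetical figures): a fraction beta of
# dangerous undetected failures hits both channels simultaneously.

def pfd_1oo1(lambda_du, test_interval):
    """PFDavg of a single channel (times in years)."""
    return lambda_du * test_interval / 2

def pfd_1oo2(lambda_du, test_interval, beta):
    """PFDavg of a 1oo2 pair with common cause factor beta."""
    independent = ((1 - beta) * lambda_du * test_interval) ** 2 / 3
    common_cause = beta * lambda_du * test_interval / 2
    return independent + common_cause

single = pfd_1oo1(8.76e-3, 1.0)          # lambda_DU = 1e-6/h, 1-yr test
redundant = pfd_1oo2(8.76e-3, 1.0, 0.1)  # 10% beta factor
# The common cause term dominates: with beta = 10% the pair is only
# about 10x better than a single channel, far short of what the
# independent term alone would suggest.
```

And this model covers random hardware failures only; identical software in both channels gains nothing from the redundancy.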
8. Certificates, reports and safety manuals
It is hard to believe how many people never read the small print in the certificates that come with safety devices and systems - only the magical SILx number. In most cases the accompanying certificate report will not be read either, yet this is the document that explains to the user how the certificate (SIL level) was achieved and what the potential restrictions are (if any). Furthermore, IEC 61508 ed2.0 also calls for a safety manual, in which the manufacturer explains to the end user how to install, commission, operate, maintain and repair the device so as to comply with the SIL level.
So do not buy a product unless it comes with a certificate, certificate report and safety manual, and then read and digest fully the information within!
9. Safety availability versus Process availability
This is probably one of the oldest and biggest misunderstandings in the process industries. FS standards have nothing to say about process availability, only safety availability - achieved by predicting potential ‘dangerous failures’.
10. The jungle of Functional Safety
FS standards are not prescriptive or cast in stone; they are performance-oriented. This means they are open to interpretation - and just as open to misinterpretation. You should be aware that there is a jungle of functional safety documentation, definitions and concepts out there!
This paper is based on a presentation delivered at the Hazardex Conference on 26 February 2014.
* CCPS – AIChE, Process Safety Leading and Lagging Metrics (2008)
* OECD, Guidance on developing Safety Performance Indicators (2008)
* OGP, Process Safety – recommended practice on Key Performance Indicators, report No. 456 (Nov 2011)
* HSE-UK, Developing process safety indicators HSG254, ISBN 978 0 7176 6180 0
* HSE-UK, Out of control: Why control systems go wrong and how to prevent failure (2nd edition) ISBN 0-7176-2192-8
* CCPS – AIChE, Layer of Protection Analysis, simplified process risk assessment (2001) ISBN 0-8169-0811-7
* CCPS – AIChE, Guidelines for Safe and Reliable Instrumented Protective Systems (2007) ISBN 978-0-471-97940-1
* IChemE – UK, HAZOP, Guide to best practice, ISBN 978-0-85295-525-3
* HIMA Italia safety road show presentation May 2012, “HIMA FSCS - Why is it so difficult to learn from someone else’s mistakes - rev 02” T. Vande Capelle - HIMA Paul Hildebrandt GmbH + Co KG
* SIL Manual, Safety Instrumented Systems, 3rd edition, GM International, technology for safety
* White paper, Functional Safety: Guiding principles for End-Users and System Integrators, (2009) Dr. M.J.M Houtermans – Risknowlogy, T. Vande Capelle - HIMA Paul Hildebrandt GmbH + Co KG
* White paper, Functional Safety: Improve Industrial Process Plant Safety & Availability via Reliability Engineering (2008) Dr. M.J.M Houtermans - Risknowlogy, Mufeed Al-Ghumgham – Safco, T. Vande Capelle - HIMA Paul Hildebrandt GmbH + Co KG
* White paper, Safety Availability versus Process Availability, introducing Spurious Trip Levels™ (2006) Dr. M.J.M Houtermans - Risknowlogy
* Kletz, Trevor A. (2001). Learning from Accidents, 3rd edition. Oxford U.K.: Gulf Professional. ISBN 978-0-7506-4883-7.
* Kletz, Trevor A. (1993). Lessons from Disaster, How Organizations Have No Memory and Accidents Recur. Gulf Professional. ISBN 978-0884151548.
About the author:
Tino Vande Capelle was educated in Belgium where he gained a qualification in Automation & Critical Control Systems. He spent 28 years in the LNG, Petrochemical, Refining and Petroleum industries in a variety of engineering and management positions, and set up as an independent contractor in 2005. He is a Functional Safety Expert and Trainer for Safety Instrumented Systems (SIS) within TÜV Rheinland Group’s International Functional Safety Accreditation program (FS Expert ID 109/05).