Considering people in functional safety
Author: Ian Curtis, Functional Safety Consultant, Siemens Industry.
04 September 2012
All systems have a human input, and while people can often “save the day” by averting serious incidents, human failings are among the many influences that can defeat the layers of protection which might otherwise prevent an incident. Given that human activity can affect many different aspects of process safety, it is important to consider human factors and human error as part of our overall approach to risk reduction, and specifically in relation to functional safety.
Functional safety often provides part of the overall risk reduction, with the Safety Instrumented System (SIS) acting as an important preventative layer of protection. Human error and human factors can have a considerable influence on a SIS: they affect the likelihood of a hazard occurring in the first place, and they can hamper the effectiveness of other protection layers, thereby affecting the demand rate on a given safety function.
Process safety uses a ‘defence in depth’ approach, with multiple layers of protection combining to give the desired overall risk reduction. When an operator is involved in a non-SIS layer of protection, the amount of credit that can be taken for that layer will be limited by human-factors issues, such as how quickly action needs to be taken and the complexity of the tasks involved.
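As a simplified illustration of how these layers combine (the figures here are purely illustrative, and a real layer-of-protection analysis imposes strict independence requirements on each layer), the mitigated event frequency is the initiating event frequency multiplied by the probability of failure on demand (PFD) of each independent protection layer:

$$
f_{\text{mitigated}} = f_{\text{initiating}} \times \prod_{i} PFD_i
$$

For example, an initiating event frequency of 0.1 per year, with a control-system layer at PFD 0.1, an operator response at PFD 0.1 and a SIS at PFD 0.01, gives a mitigated frequency of $0.1 \times 0.1 \times 0.1 \times 0.01 = 10^{-5}$ per year.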
For non-safety-related systems, the amount of risk reduction credit associated with the Basic Process Control System (BPCS) is limited to a risk reduction factor (RRF) of 10 in a best-case scenario (i.e. a 10% probability of failure on demand). On the face of it, claiming an RRF of 10 or less would seem easily achievable; however, research demonstrates that for tasks which are complex, performed infrequently or carried out in unfamiliar situations, the likelihood of human error and system failure is greatly increased.
This is especially true in the type of highly stressful situation that could lead to a hazardous and potentially life-threatening incident. Achieving even a claimed RRF of 10 should not be considered trivial.
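For reference, the relationship between these two figures is simply reciprocal:

$$
RRF = \frac{1}{PFD_{avg}}
$$

so the best-case BPCS or operator-response credit of PFD 0.1 (a 10% chance of failing when demanded) corresponds to an RRF of exactly 10.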
Operators need to be trained and on hand in the control room, and the information presented to them needs to help them take the right decisions. The EEMUA 191 guidance, 'Alarm systems - a guide to design, management and procurement', is considered best practice for presenting alarm information to an operator in a meaningful way. In addition, there needs to be a strong safety culture in place, whereby operators feel they have the necessary authority to take appropriate action - if necessary, by shutting down the process - without fear of repercussions should they make the wrong call.
Safety culture is often highlighted as an area for improvement in the “lessons learnt” from previous incidents. Human factors and cultural issues undoubtedly played a part at Fukushima. One critical decision involved whether to pump seawater into the reactors: doing so would ruin them, but it could also keep them cool and prevent meltdowns. The engineers on site hesitated for some hours before deciding to go ahead; had the decision been taken earlier, the damage to the fuel could most likely have been greatly limited or even prevented.
In a similar vein, on board the Transocean Deepwater Horizon a number of critical actions which would normally have been automated were left to the operator. There were systems on board designed to trigger a general master alarm if any of the many sensors placed across the rig detected fire or gas. Transocean set the system up so that this general master alarm was triggered manually rather than automatically as designed - allegedly to stop spurious alarms waking the crew in the middle of the night. This change had the blessing of the authorities but was also a source of concern. When the incident occurred, with as many as 20 sensors in alarm, the operator on duty chose not to activate the general master alarm immediately. In addition, the Emergency Shutdown System had to be triggered manually, and the operator and her supervisor showed a similar reluctance to “push the button”.
According to a recent report from the US Chemical Safety Board: “Hazard assessments of major accident risks on the Deepwater Horizon relied heavily on prompt, correct manual intervention by the rig crew to prevent a catastrophe, for example to divert the flow of flammable hydrocarbons away from the rig during a blowout. Depending on a human reaction alone during an emergency situation – with many distractions – is not a reliable safety layer. A comprehensive hazard assessment should have identified this risk.”
The recommendations coming out of the Buncefield incident point strongly to the use of automated safety shutdown systems for overfill protection of gasoline storage vessels.
These recent examples point to having less human involvement in the SIS layer. In a typical process scenario the SIS acts as a backstop, coming into play when the control system and the operator have failed to bring a situation under control, so it makes sense for the SIS to act autonomously once things have gone this far. However, one of the paradoxes the industry has faced over the last 20 years is that, as levels of automation increase, operators become more remote from the process and their “feel” for the process dynamics is lessened. For much of the time the process is on “autopilot”, which can lead to a false sense of security that the control system will take care of everything.
In the event of a control-system-related failure, operators may be less well equipped to deal with the resulting situation - just when they are most needed. While no-one would argue for less automation, this gap in real-world experience needs to be addressed. Training with simulators in realistic scenarios is one possible way of achieving this.
Functional Safety Standards
Where an operator takes action as a result of an alarm and the risk reduction claimed is greater than a factor of 10, the overall system will need to be treated as safety-related and designed accordingly. Typically, however, while some credit may be taken for the operator’s involvement in other protection layers, there is no direct operator involvement in the safety function itself.
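For context, the low-demand-mode Safety Integrity Level (SIL) bands defined in the standards relate the claimed risk reduction to the required integrity level as follows (quoted here as a guide only):

$$
\begin{aligned}
\text{SIL 1}: \quad & 10^{-2} \le PFD_{avg} < 10^{-1} \quad (10 < RRF \le 100) \\
\text{SIL 2}: \quad & 10^{-3} \le PFD_{avg} < 10^{-2} \quad (100 < RRF \le 1\,000) \\
\text{SIL 3}: \quad & 10^{-4} \le PFD_{avg} < 10^{-3} \quad (1\,000 < RRF \le 10\,000) \\
\text{SIL 4}: \quad & 10^{-5} \le PFD_{avg} < 10^{-4} \quad (10\,000 < RRF \le 100\,000)
\end{aligned}
$$

A claimed RRF of more than 10 for an alarm-plus-operator response therefore puts that loop into at least SIL 1 territory.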
A typical SIS is largely autonomous, but there are still many opportunities for human error to render it ineffective. For the SIS to be dependable it needs to be specified, designed, realised, installed, operated, tested and maintained correctly, and human error can affect all of these activities through systematic failures.
The international standards IEC61508 and IEC61511 are increasingly being used as a benchmark of best practice. They are performance-based rather than prescriptive, and are intended to ensure that the right level of functional safety is achieved throughout the safety lifecycle.
Functional Safety Management
The standards seek to address both random hardware failures and systematic errors by having competent people develop, implement, operate and maintain a sound technical solution, following good processes and procedures.
The latest version of the standard, IEC61508 Ed.2 (2010), significantly increases the emphasis on functional safety management (FSM) and makes competence a normative requirement. In essence, companies must ensure that those involved in the safety lifecycle have the right knowledge, experience, training and qualifications to perform the activities required of them, and that they perform those duties following methodologies, procedures and systems that are in accordance with the requirements of the standard. It is also essential that they can provide documented evidence to support this. As an additional check, the standards require an independent Functional Safety Assessment (FSA) to be undertaken.
Where sub-suppliers are used, it is incumbent on a company to ensure that they too address competence and FSM. A “joined-up” approach between organisations in the supply chain is required to ensure nothing falls through the cracks. Roles and responsibilities need to be assigned and documented in a project safety plan.
Reducing risk of human error with safety lifecycle tools
This emphasis on a safety lifecycle approach has prompted a move towards more use of safety lifecycle tools. Standards suggest that tools should be “selected so as to reduce human error in their practical application”.
The traditional Cause & Effect Matrix (CEM) approach to documenting and defining safety logic is well established, but a move towards encompassing other aspects of the lifecycle has taken it beyond being simply a specification tool for the analysis phase. The newer breed of safety lifecycle tool is not just a planning aid that lets an engineer document the CEM logic required for a SIS in a familiar form: it can subsequently automate the creation of the SIS logic and support testing, commissioning and visualisation using the same CEM format throughout.
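As a minimal sketch of the CEM concept (the class and tag names below are hypothetical illustrations, not any vendor’s tool or API; real tools generate certified logic-solver code, not Python), a matrix can be represented as a mapping from causes to the effects they demand, which can then drive both logic generation and testing:

```python
# Minimal illustrative Cause & Effect Matrix (CEM) evaluator.
# All tag names are hypothetical examples.

class CauseEffectMatrix:
    def __init__(self):
        # matrix[cause] is the set of effects that cause must trigger
        self.matrix = {}

    def link(self, cause: str, effect: str) -> None:
        """Place an 'X' in the matrix: this cause demands this effect."""
        self.matrix.setdefault(cause, set()).add(effect)

    def evaluate(self, active_causes: set) -> set:
        """Return every effect demanded by the currently active causes."""
        demanded = set()
        for cause in active_causes:
            demanded |= self.matrix.get(cause, set())
        return demanded

# Build a small matrix: two gas detectors and a high-pressure trip.
cem = CauseEffectMatrix()
cem.link("GD-101 gas detected", "Close ESD valve XV-201")
cem.link("GD-102 gas detected", "Close ESD valve XV-201")
cem.link("PT-301 high pressure", "Trip pump P-401")
cem.link("PT-301 high pressure", "Close ESD valve XV-201")

# A single active cause demands every effect marked against it.
print(cem.evaluate({"PT-301 high pressure"}))
# -> {'Trip pump P-401', 'Close ESD valve XV-201'}
```

Because the same matrix drives engineering, testing and visualisation, everyone reads one representation of the safety logic rather than reconciling several.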
This tool-based approach can significantly reduce engineering time as well as the possibility of human error and misinterpretation, thereby reducing systematic errors. The enhanced functionality of such tools can also embed the mechanisms for implementing overrides and bypasses in a carefully controlled manner, without these needing to be custom-engineered within the code. Essentially, these tools tame the extra power and capability of state-of-the-art programmable safety logic solvers and help keep the logic in a form that everyone, from the process engineer through to the regulatory authorities, can understand.
Software development typically follows a “V” model approach - and this is also advocated by IEC61508 for SIS software. At various levels within the “V” there are requirements for test plans, verification activities and ultimately validation. The closer the code is to the original design document, the easier all of these activities become and the more human error can be avoided, so the use of a CEM can bring significant benefits in terms of streamlining the software development activity.
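Continuing the hypothetical sketch above, a verification activity on the right-hand side of the “V” can be as direct as asserting that the implemented logic reproduces the specified matrix, for example:

```python
# Hypothetical verification checks against the CEM specification,
# reusing the `cem` object from the earlier sketch.

def test_high_pressure_demands_both_effects():
    expected = {"Trip pump P-401", "Close ESD valve XV-201"}
    assert cem.evaluate({"PT-301 high pressure"}) == expected

def test_no_active_cause_demands_no_effect():
    assert cem.evaluate(set()) == set()
```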
By automating the creation of the operator graphics for the SIS logic, these tools also make a significant contribution to the later stages of the safety lifecycle and help to close the loop by supporting change management of the SIS code. In another exciting development, these CEM tools can also generate Cause & Effect diagrams from the Safety Instrumented Function (SIF) models contained in a typical SIL verification tool.
By getting the basics right - considering human factors, building on a sound foundation of effective functional safety management and competence, and using suitable tools - SIS designers and operators can help reduce complexity and deliver value. This ultimately helps prevent high-consequence incidents by effectively controlling systematic errors in Safety Instrumented Systems throughout the lifecycle.