Functional safety: Keeping a lid on danger
21 March 2016
Ensuring the functional safety of safety systems – such as alarms – helps to minimise the chance of accidents due to abnormal conditions in hazardous process plant. Jeremy Gadd, Head of Instrumentation, Control & Automation at GSE Systems, explains how to achieve this.
Alarms are a key safeguard in preventing industrial accidents, especially in a large and complex site such as a chemical plant.
However, an alarm system must be properly designed and maintained. On the one hand, a lack of alarm sensors can result in critical information – such as an overfilling tank – being missed. At the same time, too many alarms can prove dangerous: in the 11 minutes before the 1994 Texaco Milford Haven refinery explosion, for example, two operators had to recognise and act on 275 different alarms. This is known as an ‘alarm flood’.
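As a rough illustration of how an alarm flood might be spotted in an alarm log, the sketch below counts alarms inside a sliding time window. The function name and the default of more than 10 alarms in any 10-minute window are assumptions chosen for illustration; a site should take its own threshold from its alarm philosophy.

```python
from collections import deque

def has_alarm_flood(alarm_times_s, window_s=600, threshold=10):
    """Return True if any sliding window of `window_s` seconds
    contains more than `threshold` alarms.

    alarm_times_s: alarm timestamps in seconds (any order).
    The default window and threshold are illustrative assumptions.
    """
    window = deque()
    for t in sorted(alarm_times_s):
        window.append(t)
        # Drop alarms that have fallen out of the window ending at t.
        while t - window[0] >= window_s:
            window.popleft()
        if len(window) > threshold:
            return True
    return False

# 275 alarms spread evenly over 11 minutes, as in the Milford Haven example.
milford_haven = [i * 660 / 275 for i in range(275)]
print(has_alarm_flood(milford_haven))
```

At 275 alarms in 11 minutes the operators were facing 25 alarms a minute, far above any threshold of this kind.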
Poor alarm performance has contributed to a number of high-profile incidents. This can be overcome by designing and implementing alarm systems that comply with the guidance contained in EEMUA 191 which, if the organisation comes under the Control of Major Accident Hazards (COMAH) regulations, is likely to be audited as part of the Competent Authority’s (CA) remit. These regulations are the wide-ranging set of rules that apply to process plants in the UK that handle hazardous substances. They were introduced in 1999, and have since undergone two major updates – of which the latest happened last year.
However, alarms make up only one part of the Electrical, Control and Instrumentation (EC&I) operational delivery guide. This document, issued last year and produced by the CA – primarily the Health & Safety Executive (HSE) – describes the approach that the CA follows for inspecting EC&I systems at COMAH establishments. The guide specifies the benchmark standards used to assess the management of risks by the operators of COMAH sites.
There are three priority topics for EC&I systems: explosive atmospheres (hazardous areas); electrical power systems; and – the subject of this article – functional safety.
Functional safety defined
Inspections for functional safety are concerned with the management, design, installation, operation and maintenance of instrumented process safety systems that reduce the risk of a major accident. These include alarms, process control systems and safety instrumented systems. The benchmark standard for these activities is BS EN 61511 Functional Safety - Safety instrumented systems for the process industry sector.
Functional safety is the risk reduction provided by safety functions (typically instrumentation) implemented for the purposes of safer process operation. Process safety is arranged in ‘layers’ – from mechanical and process design and Safety Instrumented Systems, through to operator responses and emergency shutdown. The intention is that if one layer should fail, the next is there to mitigate or contain any danger. Each layer has a prescribed level of reliability to act when it is called on. For Safety Instrumented Systems, the levels of reliability are called Safety Integrity Levels (SILs). Each system should have an appropriate Safety Instrumented Function (SIF) that reduces risk. The SIF is a set of instrumentation – such as sensors – that works to bring a hazard (such as over-pressure or excess temperature) under control.
The SIF is one of several layers of protection, as other systems also help to reduce risk. The basic process control system (BPCS), for instance – when working well – maintains a safe, efficient operating envelope for the process plant or unit. If it fails, this leads to a demand on the next layer – such as the process alarm, which must be acted on quickly in order to stop the event from escalating. All of these affect the demand frequency and required risk reduction of the SIF.
Mechanical devices such as relief valves can add further protection. Mitigation layers, like bunds, are common plant safety strategies, but are not normally included in the quantification of risk – as they serve only to lessen the impact of a hazard once it has occurred, rather than reduce the likelihood of an accident.
A robust, compliant approach to functional safety comprises four main elements:
• The Safety Lifecycle
• Hazard Identification and Quantification
• Engineering and Design
• Commissioning, Operation and Maintenance
1. Safety Lifecycle
The overall functional safety lifecycle – from concept, through hazard analysis, requirements, realisation and operation, to end of life decommissioning – is described in the BS EN 61511 standard. A key element is the Safety Requirement Specification (SRS). There are detailed requirements for the SRS, but companies have considerable flexibility to determine how it will be delivered.
During an inspection, the CA wants to see that an organisation has a clear and consistently applied SRS. It is good practice to create a single document containing all the core information – with appropriate mapping to other managed documents, such as SIL demonstration calculations or the main process hazard tables.
At the system design stage, the SRS sets out base requirements, including acceptable spurious trip rates, and operational constraints such as the frequency of shutdown testing or differences between process streams. It also describes the approach needed for manual shutdowns, emergency stops, overrides, operator resets and any allowable auto-resets. The SRS should also list the requirements for system testing, including proof test intervals and the process conditions required for these.
The value of a compliant, comprehensive SRS goes far beyond CA inspections: it becomes a benchmark for the organisation’s Safety Instrumented System (SIS), allowing its performance to be assessed objectively. It also helps companies take a consistent approach to changes as they upgrade and improve the plant and its safety systems.
2. Hazard Identification and Quantification
There are several recognised approaches to hazard identification – in categories including hazard to personnel leading to injury, damage to the environment, and financial and societal hazards. These need to follow a corporate procedure that is formally and fully documented.
There are also various approaches – both qualitative and quantitative – to determine the required level of risk reduction for a particular hazard. One qualitative approach is a risk graph or matrix, where the output is given as an order of scale. Though quite simple to apply, it is conservative and may lead to over-specification in design.
The most common quantitative method is Layers of Protection Analysis (LOPA), which produces a target risk reduction expressed as the required probability of failure on demand (PFD) of the safety instrumented function. The output of the LOPA approach is definitive, but it is still subjective, relying on the experience and knowledge of the team applying it.
Effective LOPA studies should use a written procedure, based on defined tolerable risk requirements. The team – whose skills, training and experience should be defined – needs an experienced leader. The study should take care to avoid claiming a layer as protection when its failure is an initiating cause of the hazard. It should also exclude the SIF under review, because the output of the study is the target level of risk reduction for that SIF, which in turn determines its appropriate Safety Integrity Level (SIL).
At the same time, everything – the basis, background assumptions, team members and all documents used in the study – should be recorded.
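The arithmetic behind a LOPA study can be sketched in a few lines of Python. This is a minimal illustration, not a substitute for a documented corporate procedure: the function names and all the frequencies in the example are hypothetical, and the only fixed inputs are the low-demand SIL bands tabulated in IEC 61508 and IEC 61511. The required PFD of the SIF is the tolerable event frequency divided by the frequency that would remain with only the other independent layers in place.

```python
import math

# Low-demand SIL bands as (lower PFD bound inclusive, upper bound exclusive, SIL).
SIL_BANDS = [
    (1e-2, 1e-1, 1),
    (1e-3, 1e-2, 2),
    (1e-4, 1e-3, 3),
    (1e-5, 1e-4, 4),
]

def target_pfd(tolerable_freq, initiating_freq, other_layer_pfds):
    """Required PFD of the SIF under review.

    tolerable_freq:   tolerable frequency of the hazardous event (per year)
    initiating_freq:  frequency of the initiating cause (per year)
    other_layer_pfds: PFDs of the other independent protection layers,
                      excluding the SIF itself
    """
    freq_without_sif = initiating_freq * math.prod(other_layer_pfds)
    return tolerable_freq / freq_without_sif

def sil_for_pfd(pfd):
    """Map a target PFD to a SIL; 0 means no SIL-rated function is needed."""
    if pfd >= 1e-1:
        return 0
    for lower, upper, sil in SIL_BANDS:
        if lower <= pfd < upper:
            return sil
    raise ValueError("target is beyond SIL 4; reconsider the process design")

# Hypothetical scenario: initiating cause once every two years, an operator
# alarm response and a relief valve each credited with a PFD of 0.1, and a
# tolerable frequency of 1e-5 per year.
pfd = target_pfd(1e-5, 0.5, [0.1, 0.1])
print(sil_for_pfd(pfd))  # prints 2
```

Because the result is only as good as the frequencies and credited layers fed into it, recording those assumptions matters as much as the calculation itself.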
3. Engineering and Design
This phase is the realisation of the requirements from the SRS and the target risk reduction. The CA will look for evidence that sufficient effort has been put into the engineering and design process to achieve these aims. It is important that a company provides documentary evidence that it has completed work to the right standard. This documentation should record:
• How the design achieves the SRS
• Which components will be used within the safety circuits
• How the circuits will be designed, built and commissioned
• The design basis and programming conventions for the software used to program the logic solver or safety PLC
• The competence of contributors to the design
An important point to consider here is that the design should be optimised, and not over-engineered – as this has a maintenance cost overhead.
This phase of the lifecycle must demonstrate that the design meets the SRS, including the required level of risk reduction and fault tolerance for the required safety integrity level. Calculating that the design delivers the PFD needed for the target SIL is only part of the requirements of BS EN 61511: the design must also demonstrate the required level of Hardware Fault Tolerance (HFT) for the desired SIL.
Proprietary tools are available to assist with SIL calculation, and they offer advantages such as certified calculation engines to remove the risk of manual error, and access to managed databases of component reliability and failure rates. Some offer additional design support, such as showing the components in the design that are the largest contributors to failures or spurious trip rates.
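To give a feel for what such a calculation involves, the sketch below uses the widely quoted simplified equations for average PFD. It assumes a constant dangerous undetected failure rate, a perfect proof test, negligible repair time and no common cause failures, so it illustrates the principle rather than what a certified tool does. The function names and the failure rate in the example are hypothetical.

```python
HOURS_PER_YEAR = 8760

def pfd_avg_1oo1(lambda_du_per_hour, test_interval_hours):
    """Simplified average PFD for a single (1oo1) channel."""
    return lambda_du_per_hour * test_interval_hours / 2

def pfd_avg_1oo2(lambda_du_per_hour, test_interval_hours):
    """Simplified average PFD for two redundant channels (1oo2),
    ignoring common cause failures."""
    return (lambda_du_per_hour * test_interval_hours) ** 2 / 3

# Hypothetical sensor: 2e-6 dangerous undetected failures per hour,
# proof tested once a year.
single = pfd_avg_1oo1(2e-6, HOURS_PER_YEAR)
redundant = pfd_avg_1oo2(2e-6, HOURS_PER_YEAR)
print(f"1oo1: {single:.2e}, 1oo2: {redundant:.2e}")
```

The comparison shows why redundancy and the proof test interval are both design levers: halving the interval halves the 1oo1 figure, while adding a second channel improves it by roughly two orders of magnitude under these idealised assumptions.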
4. Commissioning, Operation and Maintenance
It’s important to demonstrate that comprehensive testing has been done, in effect proving the effectiveness of the SIF, before introducing the hazard that a particular safety system protects against. The approach in each case will be different depending on the type of testing.
Factory Acceptance Testing (FAT) should include a definitive design freeze date. It must be designed to include extensive negative testing – that is, testing how the system responds to conditions that shouldn’t happen – and the required logic testing should be to a definitive test script against an approved cause and effect logic.
Once installed at the site, the system needs formal Site Acceptance Testing (SAT). This should include integrated function tests, including full system architecture performance testing – such as interface communications and data refresh. SAT should also test services such as power and air supplies, and include failover testing and UPS autonomy.
The final proof testing of each safety function should cover full end-to-end loop testing including settings, parameters and trip points. Any failed tests should be rectified and fully proof tested again before using the safety function.
Once the system is in service, a clear and documented regime is needed to ensure it achieves the required integrity level over its operating life.
Frequency of testing should be set as part of the risk reduction calculation. The management of the testing must ensure tests are called and completed in time. Test methods must be clear, comprehensive and adequate for the ‘competent’ technician to complete.
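One way to make sure tests are called in time is to hold the interval from the risk reduction calculation alongside each loop and flag anything past its due date. The sketch below is a minimal illustration: the `ProofTest` record, its fields and the tag names are hypothetical, and a real site would normally do this inside its maintenance management system.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class ProofTest:
    tag: str              # loop or SIF identifier (hypothetical naming)
    last_tested: date     # date of the last completed proof test
    interval_days: int    # interval assumed in the risk reduction calculation

    def due_date(self):
        return self.last_tested + timedelta(days=self.interval_days)

def overdue_tests(tests, today):
    """Return tests whose due date has passed, most overdue first."""
    late = [t for t in tests if t.due_date() < today]
    return sorted(late, key=lambda t: t.due_date())

schedule = [
    ProofTest("LZ-101", date(2015, 1, 10), 365),
    ProofTest("PZ-205", date(2015, 9, 1), 365),
]
for test in overdue_tests(schedule, date(2016, 3, 21)):
    print(test.tag, "was due on", test.due_date())
```

A test that slips past its interval means the safety function is running on a longer interval than the one its integrity calculation assumed, which is why deferrals need formal approval.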
In-service proof-testing should inspect hardware, to ensure that it is in good condition and meets ATEX requirements if appropriate. This requires suitable systems for Hazardous Area inspections, equipment condition inspection, and predictive and preventative maintenance procedures.
All faults should be repaired, and all failures logged and recorded even if the repair was simple – such as with a small calibration error. Fault record systems must contain all historical records, and the system should ideally be able to identify systemic faults and repeat errors by equipment type, duty or service.
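Identifying repeat errors can be as simple as counting logged failures by a chosen attribute. The sketch below assumes a hypothetical fault log held as a list of dictionaries; the field names, tags and threshold are illustrative only.

```python
from collections import Counter

def repeat_faults(records, key, threshold=2):
    """Count fault records by `key` (for example equipment type, duty or
    service) and return the groups with at least `threshold` failures."""
    counts = Counter(record[key] for record in records)
    return {group: n for group, n in counts.items() if n >= threshold}

# Hypothetical fault log entries.
fault_log = [
    {"tag": "LT-101", "equipment_type": "radar level", "service": "crude", "fault": "calibration drift"},
    {"tag": "LT-102", "equipment_type": "radar level", "service": "crude", "fault": "calibration drift"},
    {"tag": "PT-205", "equipment_type": "pressure transmitter", "service": "steam", "fault": "blocked impulse line"},
]

print(repeat_faults(fault_log, "equipment_type"))
print(repeat_faults(fault_log, "service"))
```

Grouping the same log by different attributes is what reveals a systemic fault, such as one instrument type failing repeatedly across several loops.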
All processes should be supported by a clear set of management procedures including management of change, management audit of testing, HR competency management and engineering line management.
Finally – and this is often forgotten in any system based on so much hardware, software and electronics – the safety lifecycle relies on skilled people. The CA must be satisfied that engineers and technicians are competent, and will look for evidence of appropriate competence management systems to support this. It uses a benchmark document – HSE Human Factors Guides – Managing competence for safety-related systems – to assess this.
The document details the four phases – Plan; Design; Operate; Audit and Review – and 15 principles that internal competence management systems and processes should follow in order to gather the right evidence of staff competence.
Having the right skills for the relevant task is crucial. Senior technical staff can demonstrate competence through related higher education courses such as degrees, professional membership and continuing professional development. Engineers need good awareness of laws and regulations, including training in local procedures and company standards. They should also have a clear responsibility and authority to sign off or approve key processes such as change management, deferrals and test procedures.
There are several accredited training courses for functional safety, which can support an individual’s competence level demonstration. This type of in-depth training can be cascaded down through an organisation.
Technician competence can be based on experience, but this alone is not sufficient: it should be backed by evidence of training through apprenticeship, formal external training courses and technical qualifications from a recognised technical college. Local training in company procedures must be documented, as must on-the-job training, inductions and mentoring. An individual’s attitude, fitness for the task, and personal and communication skills should also be assessed.
As with all other areas of COMAH compliance, record keeping is key. Companies must establish whether responsibility for training and authorisation records is with HR or with technical functions, whether the records are centralised or dispersed, and how the CA can access them during an inspection.