Unlocking the potential of data in offshore maintenance records
Authors: Matthew Celnik and Chris Bell, DNV GL
14 December 2020
Today’s increased focus on digitalisation is interwoven with the oil and gas industry’s drive towards greater cost and production efficiency and a deeper understanding and application of complex digital systems, ultimately to improve safety and sustainability.
Image: DNV GL
According to the latest report from the Oil & Gas Authority (OGA), production efficiency in the UKCS improved for the seventh consecutive year, reaching 80% and meeting the industry target three years ahead of schedule. This figure represents a five-percentage-point increase on the 2018 study, with nearly half (43%) of UKCS hubs achieving 80% or more. The OGA describes this as a ‘significant achievement’ that reflects the continuing efforts by industry to improve operational efficiencies.1
There is no doubt that the growing array of digital technologies and enhancements offers sector leaders and decision-makers the chance to automate high-cost and error-prone tasks in which the cumulative effects of inconsistency and analytical error can adversely impact safety. Those with successful programmes must think in terms of total lifecycle cost economics, and to benefit from the application of such technologies, leaders must build teams that are fully engaged with the process from the outset.
In 2019, DNV GL was tasked with performing a preliminary data review of the Safety and Environment Critical Elements (SECEs) in an offshore asset’s maintenance records.2 Using machine learning techniques, the review accurately analysed thousands of records in just a few seconds. This information was used to apply a more focussed approach to verification, thereby reducing manual effort and the potential for human error or bias in sampling.
This year, the technical advisor to the oil and gas industry performed analyses of a larger dataset related to the operation of a range of fire and gas (F&G) detection systems on a different installation. The goal of the project was to improve the operator’s understanding of the constituent SECEs and how they interact. However, a lack of clarity on what data was available and what information should be investigated made the project more challenging.
Methodology – the CRISP-DM approach
As part of any asset’s assurance process, it can be instructive to review maintenance records for insights, particularly to identify trending issues and potential improvements. The goal is to ensure the asset is performing safely and effectively, with high reliability, while adopting the most cost-effective strategies for all maintenance work.
A CRISP-DM (cross-industry standard process for data mining) sprint methodology was adopted for this project and involved working closely with the client to analyse the available data. CRISP-DM is an iterative approach whereby short “sprints” are used to explore the data sources, determine potential outcomes, and develop models for data analytics.
The software industry has applied such agile development techniques for several years with great success; they offer an efficient and effective way to answer critical business questions. Essentially, the project team can shortcut management debate cycles and condense requirements gathering and solution design into a single week. The key cycle steps are shown in Figure 1 below.
After each evaluation step, the team reviews the business requirements with the client. Hence, the requirements are subject to change during the project life cycle. The project schedule included a 4-day iterative sprint with all team members in the same place.
Figure 1: CRISP-DM key cycle steps
Potential outcomes identified prior to kick-off included a comparison of perceived versus actual system performance (as determined using the data analytics model), a review of system availability, and recommendations to enhance verification activities.
Application
For the sprint to be successful, it must deliver something of value to the client. This can be achieved by understanding what output is sought from the analysis or tool and what value the business gains from that output. The responses to these questions are essential in driving the scope of work and the project lifecycle. “User stories” were then generated to guide the sprint data analysis steps.
These are key elements of an agile approach to software development that can also be applied to data analyses: they help the entire team understand what data to target for analysis, for whom, why and when. They also help non-technical team members understand the purpose of the analysis, encourage their participation, and define the desired outcomes in terms of the main stakeholder requirements.
To create and track tasks assigned to each user story, Microsoft Azure DevOps3 was used in the sprint approach. Using this method, the sprint team can assign priorities to each task, which reflect the expected value for the user, complexity, dependencies and other business requirements. Three user stories, their priorities and potential outputs were identified:
1. The operator technical authority (TA) wanted a better understanding of how the firewater system is operating. This was deemed a high priority, aiming to produce figures for each component and time-series plots showing overlapping periods of component outage
2. The operator TA wanted a better understanding of how failures and isolations are reported (or not). As a medium priority, the approach aimed to identify areas where there are unrecorded faults, for example long inhibits or process system faults that do not correspond with maintenance records
3. The independent verifier wanted to focus effort on deficiencies revealed during the assurance process. DNV GL NextGen Verification relies on being able to identify risk from the data available from operations or from assurance, maintenance and testing records.
This enables future work and knowledge gaps to be captured, which can facilitate future management of change. The work was thus split into three main areas:
1. Apply the machine learning (ML) classification algorithm from the preliminary work to other systems
2. Using the process system logger data, calculate statistics relating to inhibits, alarms and faults logged by the system
3. Using process data and work order history, plot equipment availability history relative to the performance standard criteria.
While other, lower priority tasks were identified, they were not considered in this sprint due to time constraints. However, they were tracked in Azure DevOps for future work.
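By way of illustration, the following Python sketch shows the kind of calculation behind items 2 and 3 above: summarising inhibit, alarm and fault events from a process system logger export. The file name, column names and event labels are placeholders, not the project’s actual schema.

```python
# Minimal sketch: summarise inhibit/alarm/fault events from a process
# system logger export. Column names and labels are illustrative only.
import pandas as pd

logs = pd.read_csv("process_system_log.csv",
                   parse_dates=["start_time", "end_time"])

# Duration of each logged event (inhibit, alarm or fault)
logs["duration"] = logs["end_time"] - logs["start_time"]

# Basic statistics per event type and per tag
summary = (
    logs.groupby(["event_type", "tag"])["duration"]
        .agg(count="count", total="sum", longest="max")
        .sort_values("longest", ascending=False)
)

# Flag long-duration inhibits (e.g. longer than one day) for follow-up
long_inhibits = logs[(logs["event_type"] == "INHIBIT") &
                     (logs["duration"] > pd.Timedelta(days=1))]

print(summary.head(10))
print(f"{len(long_inhibits)} inhibits exceeded one day")
```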
Notably, a study by PwC identified five key drivers of digitalisation in the oil and gas industry. One of these is ML, used to analyse data and identify operational patterns and shortcomings that can improve efficiency, for example in predictive maintenance.4 Adopting such technologies and applications has the potential to aid better decision-making, dramatically improve efficiency and sustainability, and transform company-wide operations.
Data understanding, preparation and model creation
The sprint made extensive use of Python (version 3.6)5 for data processing, machine learning and plotting, with code written in Jupyter Notebooks6 for the data analysis. All code was managed using a Git repository7, another tool repurposed from software development, which enables a distributed team to work on the same code.
Using this technology, together with additional text-interpretation code, a tag list categorised by system or equipment was developed to tie together all the datasets, since the datasets do not use a single consistent tag format. This allowed a quick comparison of related items. A range of 18 detectors, generators, switchboards, pumps and telecommunications items were analysed; sensors and valves were not categorised. These items were instead considered when reviewing the inhibits, alarms and faults in the process system logger logs.
Tag use is not consistent between datasets. In particular, the process system logs use a different naming convention that is sometimes difficult to map directly to the equipment tags. We recommend that all systems use the same naming convention with a common standardised format where possible, or that a separate mapping table between systems is maintained. This is a common problem in the process safety industry, where many systems are set up and configured independently of each other.
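As an indication of how such a mapping can be implemented, the sketch below normalises differently formatted tags to a common key before joining the datasets. The regular expression, file names and column names are illustrative assumptions, not the installation’s actual tag convention.

```python
# Illustrative sketch: reduce differently formatted tags to a common
# "AREA-TYPE-NNNN" key so that work orders and process logs can be joined.
import re
import pandas as pd

def normalise_tag(raw):
    """Return an upper-case AREA-TYPE-NNNN key, or None if no match."""
    match = re.search(r"([A-Z]{1,3})[-_ ]?([A-Z]{1,4})[-_ ]?(\d{2,5})",
                      str(raw).upper())
    if not match:
        return None
    area, kind, number = match.groups()
    return f"{area}-{kind}-{int(number):04d}"

work_orders = pd.read_csv("work_orders.csv")
process_logs = pd.read_csv("process_system_log.csv")

work_orders["tag_key"] = work_orders["equipment_tag"].astype(str).map(normalise_tag)
process_logs["tag_key"] = process_logs["point_name"].astype(str).map(normalise_tag)

# Join on the common key; unmatched tags are kept for manual review
merged = work_orders.merge(process_logs, on="tag_key", how="outer", indicator=True)
unmatched = merged[merged["_merge"] != "both"]
```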
A work order review was then carried out, including a heat map of work order counts. For instance, the mains power and firewater ring main showed a relatively higher number of work orders per tag than other systems. Failure and corrective maintenance counts were also compiled and analysed.
Considering deferrals (the deviation of the actual finish date from the target finish date), heat detectors and mains power have the largest number of deferred work orders. These values are not weighted by tag.
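A minimal sketch of this type of work order review is shown below, assuming placeholder column names; it builds a heat map of counts per system and identifies deferred work orders. The use of seaborn here is simply one convenient plotting choice, not necessarily what the study used.

```python
# Sketch: work order counts per system as a heat map, plus deferrals
# (actual finish later than target finish). Column names are placeholders.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

wo = pd.read_csv("work_orders.csv",
                 parse_dates=["target_finish_date", "actual_finish_date"])

# Work order counts by system and work order type, shown as a heat map
counts = wo.pivot_table(index="system", columns="order_type",
                        values="work_order_id", aggfunc="count", fill_value=0)
sns.heatmap(counts, annot=True, fmt="d")
plt.title("Work order counts per system")
plt.tight_layout()
plt.show()

# Deferrals: work orders finished after their target date
wo["deferral_days"] = (wo["actual_finish_date"] - wo["target_finish_date"]).dt.days
deferred = wo[wo["deferral_days"] > 0]
print(deferred.groupby("system")["deferral_days"].agg(["count", "max"]))
```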
The work order classification (PASS/FAIL analysis) from the preliminary study was repeated and applied to all equipment items, not just detectors. As before, the data is highly skewed towards PASS, which required additional steps in the classification algorithm to prevent bias. This allowed timelines for related equipment (e.g. all fire pumps) to be constructed, enabling quick identification of periods with high activity, long deferrals, and overlapping periods of assurance and PM fails.
Several models and cleaning workflows were therefore trialled to achieve an acceptable prediction accuracy for the classification model; the most suitable model demonstrated 96% accuracy when its predictions for the entire work order dataset were compared to the as-reported values, including the test data (Figure 2). Importantly, this allowed NULL or unclassified records, which would otherwise have been difficult to use in reliability/availability calculations, to be classified automatically.
Figure 2: Prediction accuracy of best PASS/FAIL model
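The study’s exact model is not detailed here, but the sketch below shows one common way to build such a PASS/FAIL text classifier while countering the skew towards PASS: TF-IDF features with a class-weighted logistic regression. The data file and column names are assumptions for illustration.

```python
# Hedged sketch of a PASS/FAIL classifier for work order text, with class
# weighting to counter the skew towards PASS. Columns are placeholders.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

wo = pd.read_csv("work_orders.csv")
labelled = wo.dropna(subset=["result"])      # records already marked PASS/FAIL
unlabelled = wo[wo["result"].isna()]         # NULL/unclassified records

X_train, X_test, y_train, y_test = train_test_split(
    labelled["description"], labelled["result"],
    test_size=0.2, stratify=labelled["result"], random_state=42)

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Classify the NULL/unclassified records automatically, as described above
wo.loc[unlabelled.index, "predicted_result"] = model.predict(unlabelled["description"])
```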
There are some caveats to estimating availability of equipment over time:
· Other than assurance routines, it is not guaranteed that a PASS or FAIL record means the system is left in a functioning state. It may simply mean the work order was completed as per the description
· There are often large time gaps between work orders, and certainly between assurance routines. This results in a very coarse estimate of availability
· The work order history does not relate directly to the performance standard criteria, hence it gives a measure only of component availability, not system availability
· The work order history as provided did not include creation dates.
The work instead attempted to determine availability using the process system logs. This enables review of periods when the performance standards are not met, as well as calculation of the system availability, and involved creating timeline figures to illustrate live work orders and periods of unavailability (after failed assurance routines). Running the ML classification algorithm presented a more complex picture of availability, with some non-assurance work orders potentially resulting in a FAIL. The caveats above were therefore noted when interpreting this data.
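The sketch below outlines one way such an availability figure can be derived: merging overlapping unavailability intervals per component and expressing uptime as a fraction of the review period. The interval source, file name and column names are assumptions for illustration.

```python
# Sketch: availability per tag from merged unavailability intervals.
import pandas as pd

def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals and return total downtime."""
    total = pd.Timedelta(0)
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total

outages = pd.read_csv("unavailability_periods.csv", parse_dates=["start", "end"])
period_start, period_end = outages["start"].min(), outages["end"].max()
period = period_end - period_start

for tag, group in outages.groupby("tag"):
    downtime = merge_intervals(zip(group["start"], group["end"]))
    availability = 1 - downtime / period
    print(f"{tag}: {availability:.1%} available")
```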
To identify long inhibit durations, the process system logs were assessed. As expected, the majority of inhibits are of short duration, under one day, though some lasted several months. The process system (F&G) logs for the detectors were analysed to investigate unreported faults. Most failures were short, lasting a few seconds or minutes, but the longest (a smoke detector) lasted almost a month. Most detectors demonstrate a low fault count, though many have over 50 faults (approximately one per week), and some far more. The most frequently in-fault detector was a gas detector.
Plotting the same data against time of day revealed an unexpected result: there is a definite, and as yet unexplained, increase in detector failures at the start and end of the day shift. There are also fewer failures overnight than during the day. This suggests a human factor influences the failure rates. A similar analysis was performed for the fire pumps, which gave a system availability of 99.6% over the year based on the performance standard criteria.
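The time-of-day analysis can be reproduced with a simple grouping by hour, as in the illustrative sketch below (file and column names assumed).

```python
# Sketch: detector fault events bucketed by hour of day to reveal
# shift-change patterns. Column names are illustrative.
import pandas as pd
import matplotlib.pyplot as plt

faults = pd.read_csv("detector_faults.csv", parse_dates=["fault_start"])
by_hour = faults["fault_start"].dt.hour.value_counts().sort_index()

by_hour.plot(kind="bar")
plt.xlabel("Hour of day")
plt.ylabel("Detector fault count")
plt.title("Detector faults by time of day")
plt.tight_layout()
plt.show()
```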
Conclusion
The DNV GL study applied a sprint methodology often used in data science applications where the precise outcomes are undefined. This differs from a “traditional” project approach, which is better suited to cases where the required outcomes are known. While there is a risk that the sprint does not produce meaningful or useful results, this is mitigated because the rapid sprint cycles enable effort to be re-focussed at regular intervals, or the work to be stopped if it reaches a natural end-point.
Applying a CRISP-DM framework to the process safety data held by asset operators is a beneficial means of achieving a high-level overview of the data and unlocking its potential quickly and effectively.
The focus of this sprint was to understand the fire and gas system in order to drive improvements in process safety, but the same approach can equally be used to develop analytics, methodologies and products that result in better utilisation of staff, maintenance activities and production uptime.
Matthew Celnik and Chris Bell, DNV GL
About the authors:
Matthew Celnik, Principal Consultant at DNV GL, is a chartered chemical engineer (MIChemE) with a background in offshore safety and research, and holds a PhD in Chemical Engineering. He has over 15 years' experience in engineering R&D and consultancy, with DNV GL, other companies and in academia. Within DNV GL he specialises in technical risk, working on complex numerical problems such as quantitative risk assessment (QRA). Matthew's main area of expertise is numerical modelling: developing models to solve engineering problems and implementing them using computational methods. He has extensive computer programming and software development experience.
Chris Bell, Senior Consultant at DNV GL, works in the UK Digital Innovation team. He is a chartered engineer with the IMechE, and holds a PhD in Materials Science and a Masters in Physics and Applied Mathematics. Chris has over 10 years' engineering/R&D experience, particularly focused on projects using new methods, materials and/or equipment. He has experience with the planning, design, manufacturing and analytical stages of developing new technologies, complemented by a strong safety/risk and asset integrity background and a proven track record of programming and developing unique digital solutions for customers.
References
1. https://www.ogauthority.co.uk/news-publications/publications/2020/ukcs-production-efficiency-2019-report/
2. Celnik, Matthew, and Chris Bell. 2019. “Automated review of offshore maintenance records.” IChemE Hazards 2019
3. Microsoft. 2019. Accessed 30 December 2019. https://azure.microsoft.com/en-gb/services/devops/
4. https://www.strategyand.pwc.com/gx/en/insights/2020/digital-operations-study-for-oil-and-gas/2020-digital-operations-study-for-energy-oil-and-gas.pdf
5. Python Software Foundation. 2016. https://www.python.org/downloads/release/python-360/
6. Project Jupyter. 2019. https://jupyter.org/
7. Git. https://git-scm.com/about