What A Healthy SMS Looks Like

After several years of operating with a safety management system (SMS), an SMS enterprise should be operating with zero regulatory findings. The accountable executive (AE) should have full control over the path their SMS has taken in the past and should have established a vision in their SMS policy of what to expect in the future. There are three regulatory compliance principles for a successful safety management system: the accountable executive is responsible for compliance with all regulations, the certificate holder (CH) is responsible for the quality assurance program (QAP), and the person managing the safety management system (SMS manager) is responsible for monitoring concerns that the aviation industry has about your airport. A healthy SMS includes a risk management officer (RMO) position. Risk management is what keeps a safety management system healthy within a fluid environment with ever-changing priorities.

The duties of a risk management officer are often assigned to an SMS manager when the CH appoints a person to manage their SMS. The person managing the safety management system shall identify hazards and carry out risk management analyses of those hazards. Other duties assigned to an SMS manager are to maintain a reporting system; investigate, analyze and identify the cause or probable cause of all hazards, incidents and accidents; maintain a safety data system, by either electronic or other means, to monitor and analyze trends in hazards, incidents and accidents; monitor and evaluate the results of corrective actions with respect to hazards, incidents and accidents; monitor the concerns of the civil aviation industry in respect of safety and their perceived effect on your airport; and determine the adequacy of the training required. These responsibilities, which the regulations assign to an SMS manager, are extremely labor intensive, research intensive, data collection intensive and comprehension intensive. There are not enough hours in a 24-hour day for one person to comply with these requirements in addition to carrying out daily risk management analyses.

If anyone for a minute thought that risk management analyses are not a daily and ongoing task, their SMS is not only rolling downhill, but it is also rolling down a path to operational failure. SMS itself cannot fail, since all it does is paint a true picture of a failed operation, but operations can fail by ignoring SMS drift and trends. Just as investment professionals must assess risk daily, an airline or airport operator must also assess their risks daily.

Conventional wisdom is that airlines and airports only need to assess the risks for accidents that have already happened. This is also a misconception, but it does not imply that it is wrong or incorrect. When SMS was first introduced, there was little to no information or literature available on what an aviation safety management system actually is. Airlines and airports required to implement SMS continued on the path they were on, which was to react to incidents and accidents. SMS was not fully understood at that time. A common phrase was that safety is common sense, even though common sense had produced accidents since the beginning of time on December 17, 1903.

Some time ago, I received a practice SMS report, and this is what the report said: “On 17 DEC 1903 two unlicensed pilots, Orville and Wilbur Wright, made 4 unauthorized flights in an unregistered aircraft. They departed and arrived without communicating with air traffic control or utilizing local CTAF. Their airplane, which had not received its annual inspection by a licensed Aircraft Mechanic, was damaged during their last flight. They failed to report the incident to the TC and TSB, neither of which had been invented yet. Corrective Action: Recommend TC to be invented immediately, and Wilbur and Orville Wright's pilot certificates to be issued then revoked.” 

In the safety oversight component, the reactive reporting process was the first operational task for airlines and airports. This task was fully understood, since reactive reporting with corrective actions was how safety was managed before a regulated SMS was implemented. There were several other options available for how to initiate the regulated SMS process, and the consensus was to begin with the reactive reporting process.

When operating with a reactive process system, an incident or accident must first happen before it is reported and analyzed by applying statistical process control (SPC). The first step, reporting an accident, was familiar to operators, but the challenge came with the analytical process. In the pre-SMS days, the broken piece was fixed and forgotten about, and nobody conducted process analysis. Special cause variation for root cause analysis was unknown, and most operators could not identify the difference between common cause variations and special cause variations. SMS was implemented with several other new definitions and tasks in the reactive system, which immediately caused confrontations. Since the SMS regulations are performance based, the golden rule is that if the regulation does not specifically state what needs to be done, that is the exact reason why an airline or airport operator must do what it takes to meet the intent of the regulations. A common phrase during the SMS implementation was that “the regulations do not say that.”
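To make the difference between common cause and special cause variation concrete, here is a minimal sketch of an individuals (XmR) control chart, one common SPC technique. The weekly report counts, the function names, and the choice of an XmR chart with 3-sigma limits are assumptions for illustration only; nothing here is prescribed by the SMS regulations.

```python
from statistics import mean

def xmr_limits(baseline):
    """Return (center, lower, upper) limits for an individuals (XmR) chart."""
    center = mean(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 1.128 is the standard d2 constant for a moving range of two points.
    sigma_hat = mean(moving_ranges) / 1.128
    lower = max(0.0, center - 3 * sigma_hat)  # counts cannot be negative
    upper = center + 3 * sigma_hat
    return center, lower, upper

def special_cause_points(values, lower, upper):
    """Indexes of points outside the limits (candidate special causes)."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical weekly hazard-report counts from a stable period.
baseline_weeks = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5]
center, lcl, ucl = xmr_limits(baseline_weeks)

# Hypothetical new weeks to assess against the baseline limits.
new_weeks = [5, 6, 14, 4]
flagged = special_cause_points(new_weeks, lcl, ucl)
print(f"center={center:.2f} limits=({lcl:.2f}, {ucl:.2f}) flagged={flagged}")
```

Points inside the limits are treated as common cause variation, the normal noise of the process. A point outside the limits is a candidate special cause and is where a root cause analysis would start.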

The next step of the safety oversight element was to phase in the proactive process. There was still confusion among airlines and airport operators, including the Regulator, about what defined an SMS process. Since the phase-in was a proactive task, the consensus became to identify hazards and do something about each hazard before it became a bigger problem or led to an incident. Operators dangled carrots, or bribes, for employees to report hazards. Whoever reported the most hazards in a month would receive a gift. Gifts, or bribes, are acceptable when initiating a process in order to learn the process itself, but within a fully operational SMS, bribes or carrots do not paint a true picture of the health of an SMS.

The Heinrich Pyramid, or Heinrich's Law, was used as justification for acting on minor hazards immediately, since they would, unquestionably, lead to accidents. Heinrich's Law is based on probability and assumes that the number of accidents is inversely proportional to the severity of those accidents: in a workplace, for every accident that causes a major injury, there are 29 accidents that cause minor injuries and 300 accidents that cause no injuries. It leads to the conclusion that minimizing the number of minor incidents will lead to a reduction in major accidents, which is not necessarily the case. Heinrich's Law is applicable to an overcontrolled environment with common cause variations only, and where special cause variations are excluded. Eventually, several airline and airport operators put Heinrich's Law aside and referenced this principle as guidance and instruction material only, rather than as a law written in stone.
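The arithmetic behind the 1:29:300 ratio can be sketched in a few lines. The function names and the sample count of 150 are hypothetical. The second function only shows what the ratio would imply if it held as a fixed law, which is exactly the assumption questioned above.

```python
# Heinrich's ratio: 1 major injury : 29 minor injuries : 300 no-injury accidents.
HEINRICH_RATIO = {"major_injury": 1, "minor_injury": 29, "no_injury": 300}

def heinrich_shares(ratio=HEINRICH_RATIO):
    """Share of each category out of all accidents in the pyramid."""
    total = sum(ratio.values())
    return {category: count / total for category, count in ratio.items()}

def implied_major(no_injury_count, ratio=HEINRICH_RATIO):
    """Major-injury accidents implied by a no-injury count IF the ratio were fixed."""
    return no_injury_count * ratio["major_injury"] / ratio["no_injury"]

shares = heinrich_shares()
for category, share in shares.items():
    print(f"{category}: {share:.1%}")

# Hypothetical: halving no-injury accidents from 300 to 150.
print(implied_major(150))
```

Under a fixed ratio, halving the no-injury accidents would halve the implied major accidents. A special cause variation does not follow the ratio, which is why the pyramid is better read as guidance than as a prediction.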

After the reactive and proactive process systems were phased in, the next step in the SMS was to implement investigation and analysis. The first constraint for this phase-in period was to determine what to investigate, and the consensus was that it made sense to investigate accidents and incidents. After all, this is what TSB did, so operators assumed they were expected to do the same. Accidents and incidents investigated by operators were not limited by the severity of the outcome; anything that failed was placed in the investigation hat. Upon completion of an investigation, an operations bulletin was issued for personnel to read and accept, and after just a few months, the paper clipboard was overloaded with bulletins. An airport would conduct a root cause analysis and investigate a burnt-out runway edge light, and an airline would do the same for a burnt-out aircraft taxi light. During the phase-in period, SMS personnel had limited training to comprehend the safety management system. The investigations and analyses of incidents done at that time were not the wrong thing to do, since they were common sense based on the knowledge at the time. Investigating the outcome itself was the incorrect thing to do. The difference between doing the wrong thing and the incorrect thing is that doing the wrong thing is to do a task against better knowledge, while doing the incorrect thing comes from a lack of knowledge of what needs to be done. As the SMS learning level progressed, it became clear that the purpose of an investigation was not to investigate the outcome, but to investigate the hazard and how the hazard was carried forward in the operational process.

The final step in the 4-year phase-in period was to implement the quality assurance program and assess the effectiveness of SMS. The struggle with this phase-in period was to determine what makes an effective SMS. Conventional wisdom was that operating with zero accidents or incidents was the prime key performance indicator, and the SMS performance level was assessed by the number of incidents during an established time period. This is still an ongoing assessment process used to establish an effective SMS. Effectiveness is analyzed in graph-charts and run-charts, where downward trends are good and upward trends are bad. Applying this process provides some useful information, but the analysis is based on opinions and emotions. When opinions and emotions are the foundation for analyses, the trap to fall into is overcontrolling of processes. When there is overcontrolling of processes, the ops-bulletin clipboard gets filled up faster than the paper can be printed. An invaluable benefit of operating with a paper-format SMS is that process overcontrol can easily be identified by viewing the number of paper files. When operating with a flawed system, e.g. flying an airplane without required maintenance, the odds are that by random chance the flight will be successful and safe. If a pilot on a precision approach misreads the approach chart minimums, e.g. because of a flawed training system, and lands in zero-zero, the odds are that by random chance the flight will be successful. The moral of the story is that lack of accidents is not a key performance indicator (KPI) of how effective an SMS is.
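A short calculation shows why a clean record says little about a flawed process. The sketch below assumes, purely for illustration, that each flight under the flawed process has the same small, independent probability of ending in an accident; the probability and the flight count are hypothetical numbers, not data.

```python
def prob_zero_accidents(p_accident, flights):
    """Probability of no accidents in `flights` independent flights,
    each with accident probability `p_accident` (assumed constant)."""
    return (1 - p_accident) ** flights

# Hypothetical: a flawed process with a 1-in-1000 accident chance per flight.
p = 0.001
n = 500
clean_record = prob_zero_accidents(p, n)
print(f"Chance of a clean record over {n} flights: {clean_record:.1%}")
```

With these assumed numbers, the operator has roughly a 60 percent chance of seeing zero accidents over 500 flights even though the process is flawed on every one of them, so the accident count alone cannot separate a healthy system from a lucky one.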

The most critical and difficult task in assessing the effectiveness of a safety management system is to rate, or classify, processes into different risk levels, safety critical areas and safety critical functions within these areas. From a non-analytical point of view, all processes in flying must be assessed as high-risk levels, since there is always a possibility for an element to cause an accident. Operating with possibilities is an emotional assessment of effectiveness. There is no evidence that missing one or all items on a landing checklist will cause an accident. The effectiveness of a safety management system cannot be determined without applying statistical process control, since it must be assessed by probabilities, as opposed to possibilities.

The quality assurance program is a component of the safety management system and is therefore an integrated part of an SMS in the same manner as the safety policy, processes for setting goals, measuring the attainment of goals, hazard identification, training, reporting system, process manual, communication to personnel, periodic review of the SMS and review for cause are integrated components of the SMS.

A regulatory requirement of a safety management system is to conduct an audit of the entire quality assurance program every three years. During the 4th year phase-in period, the struggle with this requirement was to identify what the quality assurance program actually was and what it should look like. Since the quality assurance program is a component of the SMS, it must be treated the same way as a safety policy, goal-setting processes, or reporting processes. Since none of these components include specific text on what an airline or airport must include to meet the performance requirement, an airline or airport must design their own quality assurance program tailored specifically to their operations. One vital component, and prerequisite, of a healthy quality assurance program is an operational daily quality control system. This system is not included in the text of the regulations but is a component of the overarching quality assurance system. With the daily quality control program implemented, and just as any small or large grocery store counts the cash at the end of the day, an SMS enterprise must count their quality control processes daily. When the quality control system is counted, an audit of the quality assurance program is possible, and the checkboxes may be downgraded to be incidental to the daily quality control.

Over a period of four years, both airlines and airports had been operating with an SMS without knowing or comprehending its definite purpose. This also caused conflicts and struggles within the industry over how to define the SMS path and apply it to operations. A consensus for a solution was to ensure that all required checkboxes were completed, and the aviation SMS quality assurance program built its platform on this principle. The checkbox syndrome is still the basis of SMS performance and effectiveness and has become so powerful that it was also implemented in the initial pilot training programs. Checkboxes are necessary for a healthy SMS, but when checkboxes become the primary task, the accountable executive takes their SMS down the wrong path. As I learned from a groundbreaking woman in aviation, who also became one of the first female pilots hired by a major airline, completing all checkboxes has become a more important task than the actual individual flight training.

Operating with a healthy SMS is a simple task when all the groundwork is completed. A healthy SMS does not interfere with or affect the roles, responsibilities or tasks that an airline or airport has assigned to a consultant, director of operations, airside crew, airport manager, SMS manager, airfield maintainers, airside operations personnel, or cloud-based SMS resource systems. A healthy SMS is scaled to the size and complexity of operations by assigning multiple regulatory requirements to one task and by operating with a regulatory element of the SMS and an operational element of the SMS separately, but with both integrated in the SMS analysis.

The single most significant role for a healthy SMS is to accept that the accountable executive is the person who is responsible for operations, and who is accountable on behalf of the certificate holder for meeting the requirements of the regulations. A healthy SMS looks like an organization where major factors affecting operations are monitored daily. A healthy SMS collects data from multiple different sources, such as web cameras, internal and external reports, and publicly available flight critical observations and predictions. A healthy SMS operates with an Above the Fold system, where factors that the risk management officer has assessed as operational priority risk levels for that day are placed above the fold, communicated to the AE, and monitored by the SMS manager.
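One way to picture an Above the Fold list is as a simple daily split of monitored factors. The sketch below is an illustration only: the `DailyFactor` structure, its field names, and the sample entries are hypothetical and are not taken from any regulation or existing SMS tool.

```python
from dataclasses import dataclass

@dataclass
class DailyFactor:
    name: str
    source: str      # e.g. "web camera", "internal report", "public forecast"
    priority: bool   # True if the RMO assessed it as a priority risk for the day

def split_by_fold(factors):
    """Split factors into (above_the_fold, below_the_fold), keeping input order."""
    above = [f for f in factors if f.priority]
    below = [f for f in factors if not f.priority]
    return above, below

# Hypothetical factors collected for one day.
today = [
    DailyFactor("Freezing rain forecast", "public forecast", True),
    DailyFactor("Runway edge light out", "internal report", False),
    DailyFactor("Wildlife near threshold", "web camera", True),
]

above, below = split_by_fold(today)
for factor in above:
    print(f"ABOVE THE FOLD: {factor.name} ({factor.source})")
```

In this sketch the risk management officer sets the `priority` flag, the `above` list is what would be communicated to the AE, and the full list remains available for the SMS manager to monitor.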

A healthy SMS is when an accountable executive accepts that a healthy SMS is a maturity system. 
 

OffRoadPilots
