Monday, 18 May 2015

# 006 Common Sense Approach to Root Cause Analysis and Effective Corrective Action


I have carried out close to 1500 days of third party certification audits. One of the shortcomings I have noticed on the part of almost all auditees is their poor understanding of the Corrective Actions for the Non Conformances raised. There have been many articles about Root Cause Analysis (RCA) and Corrective Action (CA), but hardly any that address effective corrective action.
 
The need for corrective action can arise from many different sources, including internal audits, external audits, customer complaints and feedback, employee complaints and suggestions, and the requirements of regulatory bodies. All these inputs to the corrective action process should be dealt with in a timely and effective manner.

Effective corrective action is that action which eliminates the actual/root cause of the problem, thereby eliminating the recurrence of the Non Conformity or Problem. All too often people responsible for providing corrective action responses focus on the incident itself, rather than the larger issue of what caused the incident to happen.

The first step in getting to an effective corrective action solution is to clearly understand what the problem is (or is not). Sometimes the statement of a nonconformity is so vague or poorly worded that it quickly loses its meaning and may lead to no meaningful action at all. An example of this would be “Documents are not controlled.” What documents? Where did you find these documents? Why do they need to be controlled? Is it in all areas, by all persons and all the time? This type of nonconformity statement will not result in any useful action being taken to correct the problem. Therefore the NCR must first be properly documented.

For a nonconformity report to be proper, effective and useful, it must have the following 3 parts:

Finding – System Level deficiency.

Requirement – The exact requirement that is not fulfilled.

Evidence – Evidence that proves the requirement is not fulfilled.

Finding, here, is that part or element of the System that is found to be not in conformity with the requirement.

Requirement is a restatement of the exact requirement against which the audit is carried out and which was not met. It could be a clause of a Standard, a Policy or Procedural requirement of the organization, a legal or customer requirement, or a requirement of any other interested party. When the requirement is not directly attributable to the standard, it is important to link the cited requirement to the standard.

Evidence is the actual incident, problem or fact obtained during the audit; gathering it is the main activity of the audit. It should be factual, stating exactly what happened, where it was noticed and so on. The problem has to be at the System level, i.e. one-off incidents, mistakes and omissions of the kind that are normally expected are not to be considered; what counts are problems that occur all or most of the time, or cases where the entire planning, determination or implementation, or the whole area under audit, does not meet the requirement. This takes the emphasis off the incident and puts the focus on the System, while still giving the auditee an actual incident to investigate while performing RCA.

The following is a well-documented nonconformity:

Finding: Planning for product realization is not effective.
Requirement: Clause 7.1 of ISO 9001 requires: “in planning product realization, the organization shall determine … c) required verification, validation, … activities specific to the product and the criteria for product acceptance.”
Evidence: Acceptance criteria/ tolerance for PCB thickness of 1.6 mm is not specified in PD 1376 R01 for ETL Adaptor 7.2 V PCB.
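The three-part structure can also be captured as a simple record, which makes it easy to spot a vague statement before it reaches the corrective action process. The sketch below is only an illustration in Python; the class name `NonconformityReport`, its field names and the `missing_parts` helper are assumptions made for this example and do not come from any standard or audit tool.

```python
from dataclasses import dataclass


@dataclass
class NonconformityReport:
    """Hypothetical record holding the three parts of an NCR."""
    finding: str      # system-level deficiency
    requirement: str  # exact requirement that is not fulfilled
    evidence: str     # factual incident observed during the audit

    def missing_parts(self) -> list[str]:
        """Return the names of any parts that were left blank."""
        parts = {
            "finding": self.finding,
            "requirement": self.requirement,
            "evidence": self.evidence,
        }
        return [name for name, text in parts.items() if not text.strip()]


# The vague statement from earlier in the article: only a finding, nothing else.
vague = NonconformityReport(
    finding="Documents are not controlled.", requirement="", evidence=""
)

# The well-documented example, abbreviated.
complete = NonconformityReport(
    finding="Planning for product realization is not effective.",
    requirement="ISO 9001, Clause 7.1 c)",
    evidence="Tolerance for PCB thickness of 1.6 mm is not specified in PD 1376 R01.",
)

print(vague.missing_parts())     # ['requirement', 'evidence']
print(complete.missing_parts())  # []
```

A check like this only confirms that all three parts are present; whether the finding is truly at System level still needs the auditor's judgment.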

Most organizations give the CA as “PD 1376 will be revised to include the tolerance for the thickness.” This only contains the particular incident of the system level problem. It is just a correction and not a CA. Correction must definitely be done as part of the corrective action process. But without finding out why this happened in the first place, the problem is likely to be repeated. Addressing the nonconformity at the system (or process) level will force the organization to investigate further to determine how widespread the problem is and then address the larger contributing factors. This is where good RCA is vital.

A number of tools are available for carrying out RCA, such as Ishikawa Diagrams (Fishbone, Cause and Effect), Pareto Analysis, 5 Why, Fault Tree Analysis (FTA), Interrelation Diagrams, and many others. Perhaps one of the easiest to implement is the ‘5 Why’ Method. Starting with the incident, keep asking “Why did this happen?” until you arrive at the root cause. Generally, asking WHY five times takes one to the root cause, but five is not mandatory. The root cause may be reached in just 3 WHYs, or it may take more than 5. I once came across a problem that took 16 WHYs to arrive at the root! The following example may provide good clarity.

Scenario 1

Tolerance for PCB thickness is not included in PD 1376

Why?             It is not documented by the Design.

Why?             Design Review is not effective.

Why?             Approver/Reviewer is careless.

Correction: PD 1376 will be reviewed and revised.

CA: Reviewer will be counseled. A system of random checks of drawings by the HOD, after approval of the drawing but before its release, will be introduced. All documents will be reviewed for completeness in the next 30 days.

Scenario 2

Tolerance for PCB thickness is not included in PD 1376

Why?             It is not documented by the Design.

Why?             Review is not effective.

Why?             Reviewer simply relied on the person who prepared the drawing and approved without actually verifying the drawing.

Correction: PD 1376 will be reviewed and revised.

CA: Reviewer will be counseled. A system of random checks of drawings by the HOD, after approval of the drawing but before its release, will be introduced. All documents will be reviewed for completeness in the next 30 days. Competence of various personnel will be reviewed and revised if necessary.

Scenario 3

Tolerance for PCB thickness is not included in PD 1376

Why?             It is not documented by the Design.

Why?             Review is not effective.

Why?             Reviewer is not clear about his roles & responsibilities.

Correction: PD 1376 will be reviewed and revised.

CA: Reviewer will be made aware of his roles & responsibilities. A system of random checks of drawings by the HOD, after approval of the drawing but before its release, will be introduced.

Scenario 4

Tolerance for PCB thickness is not included in PD 1376

Why?             It is not documented by the Design.

Why?             Procedure for documenting specification is not clear.

Why?             Roles of the person Preparing and the person Reviewing are not clearly defined.

Correction: PD 1376 will be reviewed and revised.

CA: Procedure for Communication of Roles & responsibilities will be reviewed and improved.

Scenario 5

Tolerance for PCB thickness is not included in PD 1376

Why?             It is not documented by the Design.

Why?             It is from our sister organization. Review of documents from sister companies is not carried out.

Why?             Procedure for review of documents does not cover documents from the sister organization.

Correction: PD 1376 will be revised by the sister organization.

CA: Procedure for Control of documents will be revised to include review of ALL documents, including those received from the sister organization and others outside the organization.

The RCA has unearthed some underlying issues that allowed the system to fail. Now we have some good candidates for corrective actions to be taken. In this case the problem was not merely the product specification per se, but other processes that act as inputs to determining the product specification. Please note that there could be more than one possible root cause and, correspondingly, different CAs. One needs to choose the CA that matches the cause most likely to be the actual one. Taking action on the root/actual cause of the problem is likely to eliminate the possibility of this problem recurring.
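A 5 Why chain such as Scenario 5 above can be written down as an ordered list of answers, with the last answer treated as the root cause that the CA has to address. The following is a minimal sketch under that assumption; the `FiveWhy` class and its field names are hypothetical, and the wording of the answers is paraphrased from the scenario.

```python
from dataclasses import dataclass, field


@dataclass
class FiveWhy:
    """Hypothetical record of one 5 Why analysis."""
    problem: str
    whys: list[str] = field(default_factory=list)  # answers, in the order asked
    correction: str = ""         # fixes the incident only
    corrective_action: str = ""  # acts on the root cause

    def root_cause(self) -> str:
        """The last answer in the chain is taken as the root cause."""
        if not self.whys:
            raise ValueError("no 'Why?' has been answered yet")
        return self.whys[-1]


scenario_5 = FiveWhy(
    problem="Tolerance for PCB thickness is not included in PD 1376",
    whys=[
        "It is not documented by the Design.",
        "Documents from sister organizations are not reviewed.",
        "The document review procedure does not cover such documents.",
    ],
    correction="PD 1376 will be revised by the sister organization.",
    corrective_action="Revise the document control procedure to cover all documents.",
)

print(len(scenario_5.whys))     # 3
print(scenario_5.root_cause())
```

Keeping the correction and the corrective action as separate fields mirrors the distinction made earlier: the first fixes PD 1376, the second acts on whatever `root_cause()` returns.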

Now coming to the question of verifying the effectiveness of the CA taken, it would make sense to evaluate the results, say by an audit. In some situations it might take quite a few months before a verification audit would be suitable to evaluate the effectiveness of the actions taken. For example, in the first case, one needs to wait for one month, or until another new Product Specification is received or documented, to see whether the Reviewer is being careful. If the problem has not recurred since the implementation of the corrective actions, then we can assume that the root cause has been correctly identified and eliminated. If, however, similar problems are found on verification audits, then there would be a need to go back to the RCA and initiate CA based on the next most likely root cause.

In the second case, commitment on the part of the reviewer, rather than just knowledge or awareness, is important. The 3rd scenario, however, demands creating the right awareness in the person about his role in reviewing, while the 4th and 5th scenarios call for going back to a procedure and improving its clarity (the communication procedure in the 4th, the document review procedure in the 5th).

For example, suppose that in the above case this NC is repeated. It is then necessary to go back to the RCA, select the second most likely cause and its corresponding CA, and so on. Sometimes the RCA has to be carried out all over again based on the results.
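This fallback can be pictured as walking down a list of candidate root causes ordered by likelihood. The sketch below is illustrative only: the `next_cause` function is a hypothetical helper, and the ranking of the three causes is an assumption made for the example, not a conclusion drawn from the scenarios.

```python
from typing import Optional


def next_cause(ranked_causes: list[str], tried: set[str]) -> Optional[str]:
    """Return the most likely cause not yet acted on, or None if all were tried."""
    for cause in ranked_causes:
        if cause not in tried:
            return cause
    return None  # nothing left: the RCA has to be carried out again


# Candidate root causes, most likely first (illustrative ranking only).
ranked = [
    "Reviewer is careless",
    "Reviewer is not clear about roles and responsibilities",
    "Roles of preparer and reviewer are not defined",
]

tried = set()
first = next_cause(ranked, tried)
tried.add(first)

# Suppose the verification audit finds that the NC has recurred:
second = next_cause(ranked, tried)
tried.add(second)

print(first)
print(second)
```

A return value of `None` corresponds to the situation where every candidate has been acted on and the RCA must be redone.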

A common problem with RCA during the corrective action process is that many organizations will stop at “Operator Error” if that is one of the causes of a given problem. While in many cases the human factor may well have contributed to the incident, it is likely that a system element has failed, or was not in place, to avoid the problem. “Operator Error” as root cause usually results in a brief meeting with the operator concerned to remind him how to do the job. While this might work for some people some of the time, it is not likely to result in any lasting protection from recurrence.

For example, an operator is producing left-hand and right-hand knobs on the same piece of equipment. The left-hand and right-hand knobs are to be placed in different shipping bins. The potential exists for the operator to mix parts, as there is no system to prevent it. In the event of a customer complaint about mixed parts, it is likely that the cause of the nonconformity would focus on “Operator Error,” leading to ineffective corrective actions that do not prevent recurrence. This situation lends itself to potential customer dissatisfaction and repeat problems with the same parts. The risk to the customer may be high and the cost of containment (on-site sorting, 100% or 200% inspection) would definitely make a strong argument for investing in some good RCA and effective corrective actions.

In the same example, a 5 Why RCA might uncover that the Production Planning Group failed to implement any sort of mistake-proofing to prevent mixed parts. A system for mistake-proofing the operator’s process could probably be developed and implemented in a timely and cost-effective manner, thereby eliminating the root cause and providing the operator with tools to perform the job correctly. Auditing of the planning and production processes after implementation will determine the overall effectiveness of the corrective actions taken. Is the mistake-proofing installed? Is the operator using the mistake-proofing equipment correctly? Has subsequent inspection revealed any escapes? Have customers had repeat complaints about mixed parts? Are mistake-proofing concepts being applied to similar processes? These and other such questions need to be answered to determine the effectiveness of the CA taken.
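One way to keep track of such a verification is to pair each question with the answer that would indicate an effective CA, since for some questions the desired answer is "no". The sketch below assumes simple yes/no answers; the `expected` table and the `open_points` helper are hypothetical names used only for illustration.

```python
# Each question is paired with the answer that indicates an effective CA.
expected = {
    "Is the mistake-proofing installed?": True,
    "Is the operator using the mistake-proofing equipment correctly?": True,
    "Has subsequent inspection revealed any escapes?": False,
    "Have customers had repeat complaints about mixed parts?": False,
    "Are mistake-proofing concepts being applied to similar processes?": True,
}


def open_points(observed: dict[str, bool]) -> list[str]:
    """Return the questions whose observed answer differs from the expected one.

    Unanswered questions are returned too, since they are not yet verified.
    """
    return [q for q, want in expected.items() if observed.get(q) != want]


# Example audit result: everything as hoped, except that an escape was found.
observed = dict(expected)
observed["Has subsequent inspection revealed any escapes?"] = True

print(open_points(observed))
```

An empty list would suggest the CA is holding; any remaining question points back to the RCA.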

The examples given above are rather simplistic in nature. In actual practice some NCs may be quite complex and require the application of more elaborate techniques such as FMEA, Control Charts etc. in addition to the ones already mentioned.

In one instance, the FMEA carried out for a manufacturer covered over 13,800 process inputs and identified over 5,600 failure modes that came from just 20 root causes! For an asset management company there were 9,800 inputs, over 3,450 failure modes and 8 root causes. A root cause investigation for a service failure using the SPCO-FMEA tool revealed 400 process inputs and identified 89 failure modes that came from just 2 root causes!

The end result of utilizing sound root cause analysis practices should be effective corrective action. The end result of effective corrective action should be improved processes, and ultimately improved organisational performance and customer satisfaction. Regardless of the nonconformity’s source, organizations that only take action on the incidents are bound to repeat the same ineffective corrective actions over and over again. The purpose of this article was to present a common sense approach to implementing RCA and effective corrective action. All organizations will experience different problems with varying degrees of severity. But by applying some good investigative tools and taking appropriate action on the causes of problems, repeat issues can become a thing of the past.