TQM Assignment on Failure Mode Effect Analysis
Srideep Kumar Mohanta 215112033 2013-14
Introduction
Failure modes and effects analysis (FMEA), also called potential failure modes and effects analysis or failure modes, effects and criticality analysis (FMECA), is a step-by-step approach for identifying all possible failures in a design, a manufacturing or assembly process, or a product or service. "Failure modes" means the ways, or modes, in which something might fail. Failures are any errors or defects, especially ones that affect the customer, and can be potential or actual. "Effects analysis" refers to studying the consequences of those failures.
Failures are prioritized according to how serious their consequences are, how frequently they occur and how easily they can be detected. The purpose of the FMEA is to take actions to eliminate or reduce failures, starting with the highest-priority ones. FMEA also documents current knowledge and actions about the risks of failures, for use in continuous improvement.
FMEA is used during design to prevent failures. Later it is used for control, before and during ongoing operation of the process. Ideally, FMEA begins during the earliest conceptual stages of design and continues throughout the life of the product or service.
For years, FMEA has been an integral part of engineering design, and it has been an indispensable tool in the aerospace and automobile industries in particular. Government agencies (e.g., Air Force, Navy) require that FMEAs be performed on their systems to ensure safety as well as reliability. Most notably, the automotive industry has adopted FMEAs in the design and manufacturing/assembly of automobiles. Although there are many types of FMEA (design, process, equipment) and analyses vary from hardware to software, one common aim has remained through the years: to resolve potential problems before they occur.
Design Overview
What is an FMEA?
An FMEA is a design tool that has been in use for many years and is recognised as an essential function in design, from concept through to development, for every conceivable type of equipment. It is commonly defined as a systematic process for identifying potential design and process failures before they occur, with the intent to eliminate them or minimise the risk associated with them. FMEA procedures are based on standards in the reliability engineering industry, both military and commercial.
Who wants an FMEA?
Whenever an item of equipment or a system must work in an environment in which any failure mode has the potential for a catastrophic effect on the process, it is common sense and responsible design practice to carry out an FMEA. Consequently, a number of people, organisations and other bodies should be very interested in the findings of an FMEA.
When to perform an FMEA?
Ideally, the FMEA should be initiated as early in the design process as possible and then run in parallel with the design phase. Where dynamic positioning (DP) is concerned, on new builds and conversions, the vessel owner or yard typically contracts for the study near the end of the vessel construction or conversion phase, with the objective of identifying any single point failures. For maximum benefit, however, the time to identify and eliminate or mitigate the effect of equipment failure is during the design process, not in the latter stages of vessel construction or conversion.
Who performs an FMEA?
The FMEA should be initiated by the design engineer for the hardware approach, and by the systems engineer for the functional approach. Once the initial FMEA has been completed, the entire engineering team should participate in the review process. The team reviews the analysis for consensus and completeness and identifies the high-risk areas that must be addressed. Changes are then identified and implemented to improve the reliability of the product.
The following is a suggested team for conducting and reviewing an FMEA:
o Project Manager
o Design Engineer (hardware/software/systems)
o Test Engineer
o Reliability Engineer
o Quality Engineer
o Field Service Engineer
o Manufacturing/Process Engineer
o Safety Engineer
Outside supplier engineering and/or manufacturing could be added to the team. Customer representation is recommended if a joint development program between user and supplier exists.
What standards are used for an FMEA?
There are a number of standards to which an FMEA can be carried out. Using a recognised standard is important so that the FMEA will be accepted by all parties interested in it:
o US Department of Defense MIL-STD-1629A
o CEI/IEC 812, Analysis techniques for system reliability - Procedure for failure modes and effects analysis (FMEA)
o BSI BS 5760-5:1991, Reliability of systems, equipment and components. Guide to failure modes, effects and criticality analysis
FMEA Process
Since the FMEA concentrates on identifying possible failure modes and their effects on the equipment, design deficiencies can be identified and improvements can be made. Identification of potential failure modes leads to a recommendation for an effective reliability program. Priorities among the failure modes can be set according to the FMEA's risk priority number (RPN) system. Effort can then be concentrated on the items with the highest RPNs, as identified by a Pareto analysis of the results. As the equipment proceeds through its life cycle phases, the FMEA should be continued and becomes more detailed.
The FMEA process consists of the following steps:
o FMEA prerequisites
o Functional block diagram
o Failure mode analysis and preparation of worksheets
o Team review
o Corrective action
FMEA Prerequisites
A. Review specifications such as the statement of work (SOW) and the system requirement document (SRD). The type of information necessary to perform the analysis includes equipment configurations, designs, specifications, and operating procedures.
B. Collect all available information that describes the subassembly to be analyzed. Systems engineering can provide system configuration (i.e., equipment types, quantities and redundancy), interface information, and functional descriptions.
C. Compile information on earlier or similar designs from in-house and customer users, such as data flow diagrams and reliability performance data from the company's failure reporting, analysis and corrective action system (FRACAS). Data may also be collected by interviewing design personnel; operations, testing, and maintenance personnel; component suppliers; and outside experts, to gather as much information as possible.
The above information should provide enough design detail to organize the equipment configuration to the level required for analysis (e.g., wafer handler, pre-aligner, computer keyboard).
Functional Block Diagram
A functional block diagram is used to show how the different parts of the system interact with one another and to verify the critical path. The recommended way to analyze the system is to break it down into different levels (i.e., system, subsystem, subassemblies and field replaceable units). Review schematics and other engineering drawings of the system being analyzed to show how the different subsystems, assemblies or parts interface with one another through their critical support systems (power, plumbing, actuation signals, data flow, etc.), in order to understand the normal functional flow requirements.
Failure mode analysis and preparation of work sheets
A. Determine the potential failure modes: Put yourself in the place of the end user by simply asking, what can go wrong? Assume that if it can, it will! What will the operators see?
Subassembly examples of failure modes
o Mechanical load positions out of tolerance
o Multiple readjustments
o Unspecified surface finish on wafer chuck
Assembly examples of failure modes
o Inadequate torque
o Surface wear
o Loose/tight fit
o Interference
Manufacturing/Process examples of failure modes
o Over/undersize
o Cracked
o Omitted
o Misassembled
o Improper finish
o Rough
o Eccentric
o Leaky
o Imbalance
o Porous
o Damaged surface
Component examples of failure modes
o Semiconductor open/short (stuck at 0 or 1)
o Detail parts: broken wire/part (permanent fault)
o Worn part (intermittent/transient fault)
o Noise level (intermittent/transient fault)
B. Determine the potential effects of the failure mode: The potential effects of each failure mode need to be identified both locally (subassembly) and globally (system). For example, the local effect of a malfunction of a wafer handler flip arm could be a wafer rejection, but the end effect could be a system failure resulting in equipment down-time, loss of product, etc. Customer satisfaction is key in determining the effect of a failure mode. Safety criticality is also determined at this time, based on Environmental Safety and Health (ES&H) levels. Based on this information, a severity ranking is assigned that expresses the criticality of the subassembly failure mode in terms of its end effect. It is easy to overlook the effects of a failure by focusing on the subassembly itself rather than on the overall effect on the system. The end (global) effect of the failure mode is the one to be used for determining the severity ranking.
C. Determine the potential cause of the failure: Identify the most probable causes associated with each potential failure mode. As a minimum, examine its relation to:
o Preventive maintenance operation
o Failure to operate at a prescribed time
o Intermittent operation
o Failure to cease operation at a prescribed time
o Loss of output or failure during operation
o Degraded output or operational capability
o Other, unique failure conditions based upon system characteristics and operational requirements or constraints
o Design causes (improper tolerance, improper stress calculations)
D. Determine current controls/fault detection: Many organizations have design criteria that help prevent the causes of failure modes through their design guidelines. Checking of drawings prior to release and prescribed design reviews are paramount in determining compliance with design guidelines.
Typical detection methods might be:
o Local hardware concurrent with operation (e.g., parity)
o Downstream or at a higher level
o Built-in test (BIT), on-line background, off-line
o Application software exception handling
o Time-out
o Visual methods
o Alarms
Typical recovery methods:
o Retry (intermittent/transient vs. permanent)
o Re-load and retry
o Alternate path or redundancy
o Degraded (accepted degradation in performance)
o Repair and restart
E. Determine the Risk Priority Number (RPN): The RPN is the critical indicator for determining proper corrective action on the failure modes. The RPN is calculated by multiplying the severity (1-10), occurrence (1-10) and detection (1-10) rankings, resulting in a scale from 1 to 1000:
RPN = Severity × Occurrence × Detection
A smaller RPN is better; the larger the RPN, the worse the failure mode. A Pareto analysis should be performed on the RPNs once all the possible failure modes, effects and causes have been determined. The high RPNs help justify corrective action on the corresponding failure modes. Generating the RPN allows the engineering team to focus its attention on solutions to priority items rather than trying to address every failure mode at once. The effect of improvements can be assessed immediately by recalculating the RPN. Priorities are then re-evaluated so that the highest priority is always the focus for improvement.
F. Preparation of FMEA worksheets: The FMEA worksheet references the "Fault Code Number" for continuity and traceability. For example, the code I-WH-PA-001 represents the following:
I: system I
WH: wafer handler subsystem
PA: pre-aligner subassembly
001: field replaceable unit
The data presented in the worksheets should track the normal design development process, in which the system hardware goes through several iterations. The worksheet should therefore follow the latest design information available on the baseline equipment block diagram. The outcome of the worksheet is a better design, one that has been thoroughly analyzed before the detailed design of the equipment begins. Other information on the worksheet should include:
o System name
o Subsystem name
o Subassembly name
o Field replaceable unit (FRU)
o Reference drawing number
o Date of worksheet revision (or effective date of design review)
o Sheet number (of total)
o Preparer's name
Team Review
The suggested engineering team reviews and comments on the worksheets, concentrating on the failure modes with the highest RPNs. By reviewing the worksheets, the team can determine which potential improvements can be made. If the team discovers potential problems and/or identifies improvements to the design, the block diagrams need to be revised and the FMEA worksheets updated to reflect the changes. Since the FMEA is an iterative process, the worksheets need to keep reflecting the changes until the design of the equipment is final. When the design is finalized, the worksheets are distributed to the users, design engineering, technical support and manufacturing. This helps ensure that the recommended improvements are implemented, where appropriate. The worksheets may also inform other engineering areas that may not have been aware of potential problems.
Determine Corrective Action
A. Design Engineering
Design engineering uses the completed FMEA worksheets to identify and correct potential design-related problems. This is where the FMEA becomes the basis for continuous improvement. Software upgrades can also be planned from the worksheet information.
B. Technical Support
From the FMEA worksheets, the engineering team can suggest a statistically based preventive maintenance schedule derived from the frequency and type of failure. A spares provisioning list can also be generated from the worksheet. Field service benefits as well as the design engineers.
C. Manufacturing
From the FMEA worksheets, the team could suggest that a process be changed to optimize installation, acceptance testing, etc. This is possible because the sensitivities of the design are known and documented. FMEA spreads design information as it is applied. The selection of suppliers can be optimized as well. Statistical process control on the manufacturing floor can also be aided by the use of the FMEA. FMEA can be a way to communicate design deficiencies found in the manufacturing of the equipment. If the equipment being manufactured has workmanship defects, improper adjustments/set-ups, or parts that are improperly toleranced, this can be fed back into the FMEA, which in turn makes the problem visible to the design engineer. These issues relate to design for manufacturability (DFM). This is one effective way that FMEA can be used to influence DFM, since many failure modes have their origins in the manufacturing process.
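The RPN calculation, the fault code convention and the Pareto ordering described in steps E and F can be sketched in a few lines of Python. This is only an illustrative sketch: the class and function names, the failure mode descriptions and all the rankings below are hypothetical and are not taken from any real worksheet. Only the fault code layout (system-subsystem-subassembly-FRU) and the formula RPN = Severity × Occurrence × Detection come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One worksheet row (hypothetical structure, for illustration only)."""
    fault_code: str    # e.g. "I-WH-PA-001": system-subsystem-subassembly-FRU
    description: str
    severity: int      # 1-10, judged on the end (global) effect
    occurrence: int    # 1-10
    detection: int     # 1-10, where 10 means detection is very unlikely

    def __post_init__(self):
        # Each ranking must lie on the 1-10 scale used by the FMEA.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be in 1..10, got {value}")

    @property
    def rpn(self) -> int:
        # RPN = Severity x Occurrence x Detection, giving a value from 1 to 1000.
        return self.severity * self.occurrence * self.detection


def split_fault_code(code: str) -> dict:
    """Split a fault code such as 'I-WH-PA-001' into its four levels."""
    system, subsystem, subassembly, fru = code.split("-")
    return {"system": system, "subsystem": subsystem,
            "subassembly": subassembly, "fru": fru}


def pareto(modes):
    """Return (fault_code, rpn, cumulative % of total RPN), highest RPN first."""
    ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
    total = sum(m.rpn for m in ranked)
    rows, running = [], 0
    for m in ranked:
        running += m.rpn
        rows.append((m.fault_code, m.rpn, round(100 * running / total, 1)))
    return rows


# Hypothetical rankings, invented purely to exercise the functions.
modes = [
    FailureMode("I-WH-PA-001", "Pre-aligner sensor out of tolerance", 7, 4, 3),
    FailureMode("I-WH-PA-002", "Pre-aligner wire broken", 9, 2, 8),
    FailureMode("I-WH-FA-001", "Flip arm surface wear", 5, 6, 2),
]

for row in pareto(modes):
    print(row)
```

With these made-up rankings, the second item (RPN 144) comes first in the Pareto list and accounts for half of the total RPN, so it would be the first candidate for corrective action. After a design change, the rankings would be revised and the list regenerated, which is the re-evaluation loop described in step E.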
Ranking Criteria for the FMEA
Severity Ranking Criteria
Calculating the severity levels provides for a classification ranking that encompasses safety, production continuity, scrap loss, etc. There could be other factors to consider (contributors to the overall severity of the event being analyzed). Table 1 is just a reference; the customer and supplier should collaborate in formalizing severity ranking criteria that provide the most useful information.
Table 1. Severity ranking criteria
Rank   Description
1-2    Failure is of such minor nature that the customer (internal or external) will probably not detect the failure.
3-5    Failure will result in slight customer annoyance and/or slight deterioration of part or system performance.
6-7    Failure will result in customer dissatisfaction and annoyance and/or deterioration of part or system performance.
8-9    Failure will result in high degree of customer dissatisfaction and cause non-functionality of system.
10     Failure will result in major customer dissatisfaction and cause non-system operation or non-compliance with government regulations.
Occurrence Ranking Criteria
The probability that a failure will occur during the expected life of the system can be described in potential occurrences per unit time. Individual failure mode probabilities are grouped into distinct, logically defined levels.
Rank   Description
1      An unlikely probability of occurrence during the item operating time interval. Unlikely is defined as a single failure mode (FM) probability < 0.001 of the overall probability of failure during the item operating time interval.
2-3    A remote probability of occurrence during the item operating time interval (i.e. once every two months). Remote is defined as a single FM probability > 0.001 but < 0.01 of the overall probability of failure during the item operating time interval.
4-6    An occasional probability of occurrence during the item operating time interval (i.e. once a month). Occasional is defined as a single FM probability > 0.01 but < 0.10 of the overall probability of failure during the item operating time interval.
7-9    A moderate probability of occurrence during the item operating time interval (i.e. once every two weeks). Probable is defined as a single FM probability > 0.10 but < 0.20 of the overall probability of failure during the item operating time interval.
10     A high probability of occurrence during the item operating time interval (i.e. once a week). High probability is defined as a single FM probability > 0.20 of the overall probability of failure during the item operating interval.
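As a rough illustration of how the occurrence criteria above might be applied, the following Python sketch maps a failure mode's share of the overall failure probability onto the corresponding rank band. The function name and the band labels are hypothetical. The criteria use strict inequalities and so leave the exact boundary values (0.001, 0.01, 0.10, 0.20) unassigned; the sketch assumes they fall into the higher band, which is a choice made here and not something stated in the source.

```python
from bisect import bisect_right

# Upper limits of the first four bands, taken from the occurrence criteria.
_UPPER_LIMITS = [0.001, 0.01, 0.10, 0.20]
# (lowest rank, highest rank) for each band, in the same order.
_RANK_BANDS = [(1, 1), (2, 3), (4, 6), (7, 9), (10, 10)]
# Short labels for the bands (hypothetical naming).
_LABELS = ["unlikely", "remote", "occasional", "moderate", "high"]


def occurrence_band(fraction: float):
    """Return (label, (min_rank, max_rank)) for a failure mode.

    `fraction` is the single failure mode probability expressed as a
    fraction of the overall probability of failure. Assumption: a value
    exactly on a boundary is placed in the higher band.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    index = bisect_right(_UPPER_LIMITS, fraction)
    return _LABELS[index], _RANK_BANDS[index]


print(occurrence_band(0.05))    # ('occasional', (4, 6))
print(occurrence_band(0.0005))  # ('unlikely', (1, 1))
```

The function returns a band and not a single number because the criteria assign a range of ranks to most levels; picking the exact rank within the band remains a judgment for the review team.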
Detection Ranking Criteria
This section provides a ranking based on an assessment of the probability that the failure mode will be detected given the controls that are in place. The probability of detection is ranked in reverse order. For example, a "1" indicates a very high probability that a failure would be detected before reaching the customer; a "10" indicates a very low or zero probability that the failure will be detected, so the failure would be experienced by the customer. The table below gives the recommended criteria.
Rank   Description
1-2    Very high probability that the defect will be detected. Verification and/or controls will almost certainly detect the existence of a deficiency or defect.
3-4    High probability that the defect will be detected. Verification and/or controls have a good chance of detecting the existence of a deficiency or defect.
5-7    Moderate probability that the defect will be detected. Verification and/or controls are likely to detect the existence of a deficiency or defect.
8-9    Low probability that the defect will be detected. Verification and/or controls not likely to detect the existence of a deficiency or defect.
10     Very low (or zero) probability that the defect will be detected. Verification and/or controls will not or cannot detect the existence of a deficiency or defect.
FMEA Example: Pressure Cooker FMEA
Scope:
o Resolution: The analysis will be restricted to four major sub-systems (electrical system, safety valve, thermostat and pressure gage).
o Focus: Safety
Safety Features:
o The safety valve relieves pressure before it reaches a dangerous level.
o The thermostat opens the circuit through the heating coil when the temperature rises above 250 °C.
o The pressure gage is divided into red and green sections.
[Figure: pressure cooker system breakdown diagram. Box labels: Pressure, Safety Valve, Pressure Gage, Electrical System, Thermostat, Heating Coil, Cord, Plug, Valve Spring, Valve Casing]
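To show how a breakdown like this feeds the worksheets, the sketch below turns the four sub-systems named in the scope into a list of items to analyze. It is a hypothetical illustration: the grouping of the heating coil, cord and plug under the electrical system, and of the valve spring and valve casing under the safety valve, is an assumption inferred from the component labels in the example's breakdown diagram. The names `PRESSURE_COOKER` and `worksheet_items` are invented for this sketch.

```python
# Assumed grouping of components under the four sub-systems in the scope.
# Sub-systems with no listed components are analyzed as a single item.
PRESSURE_COOKER = {
    "Electrical System": ["Heating Coil", "Cord", "Plug"],
    "Safety Valve": ["Valve Spring", "Valve Casing"],
    "Thermostat": [],
    "Pressure Gage": [],
}


def worksheet_items(breakdown):
    """Return (sub-system, item) pairs, one per worksheet entry to fill in."""
    items = []
    for subsystem, components in breakdown.items():
        if components:
            items.extend((subsystem, component) for component in components)
        else:
            # No lower level given: the sub-system itself is the item.
            items.append((subsystem, subsystem))
    return items


for subsystem, item in worksheet_items(PRESSURE_COOKER):
    print(f"{subsystem:18} {item}")
```

Each pair would then receive its own failure modes, effects, causes and rankings, following the process steps described earlier.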