A Gage R&R study is used to quantify the inherent variation of all components comprising the measurement system selected for a task.

The analysis of a measurement system is based on the statistical properties of reproducible measurements taken in a stable environment. To be valid, the measurement system must be in statistical, product and process control. This means that its variation is random (common cause only) and that its measurements have appropriate resolution relative to the specified part tolerance. All measurements have error; the question that must be asked is: Are the measurements good enough? Answering it requires that the measuring system be calibrated against a known standard. Calibration compares a gage’s reading to an agreed-upon reference value. In general, the calibrating equipment should be at least ten times (10:1) more accurate than the instrument it is calibrating. After calibration and bias analyses are done, a Gage R&R study is used to quantify the inherent variation of all components comprising the measurement system selected for a task.
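The 10:1 rule of thumb reduces to a single ratio. The sketch below is only an illustration: the function name `meets_accuracy_ratio` is hypothetical, and it assumes each instrument’s accuracy can be summarized as one uncertainty value expressed in common units (micrometres here).

```python
def meets_accuracy_ratio(gage_uncertainty_um: float,
                         standard_uncertainty_um: float,
                         required_ratio: float = 10.0) -> bool:
    """Hypothetical helper: True if the calibrating standard is at least
    `required_ratio` times more accurate than the gage it calibrates.

    Both uncertainties must be in the same units (micrometres assumed here).
    """
    if standard_uncertainty_um <= 0:
        raise ValueError("standard uncertainty must be positive")
    return gage_uncertainty_um / standard_uncertainty_um >= required_ratio


# A 10 um gage checked against a 1 um standard meets 10:1;
# the same gage checked against a 2 um standard is only 5:1.
print(meets_accuracy_ratio(10.0, 1.0))
print(meets_accuracy_ratio(10.0, 2.0))
```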

Gage Bias Study

Bias is defined as a response error introduced into the measurement such that it always trends in one direction; it is a systematic error. Example: an operator uses a gage in a consistent manner, yet every reading falls consistently above (or below) the reference measurement. Bias tells you how accurate the gage is with respect to a master reference. A Type 1 Gage Study requires repeated measurements of the same part by one person. How many? At least (10), but more are preferred. A reference measurement must first be taken against a known standard of sufficient accuracy. Without a reference value, the sampling is nothing more than a repeatability study.
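The bias estimate from a Type 1 study is the mean of the repeated readings minus the reference value. The sketch below, using hypothetical readings in millimetres, also computes a one-sample t statistic so the bias can be judged against the random scatter of the readings; compare |t| with a t-table value for n - 1 degrees of freedom (about 2.26 for n = 10 at 95% confidence). The function name `type1_bias` is an assumption for illustration.

```python
import math
import statistics


def type1_bias(readings: list[float], reference: float) -> dict:
    """Estimate gage bias from repeated readings of one reference part."""
    n = len(readings)
    if n < 10:
        raise ValueError("a Type 1 study needs at least 10 readings")
    mean = statistics.mean(readings)
    s = statistics.stdev(readings)      # sample standard deviation (n - 1)
    bias = mean - reference             # positive: the gage reads high
    if s > 0:
        t = bias / (s / math.sqrt(n))   # one-sample t statistic for bias = 0
    else:
        t = 0.0 if bias == 0 else math.copysign(math.inf, bias)
    return {"n": n, "mean": mean, "stdev": s, "bias": bias, "t": t}


# Ten readings of a 25.000 mm reference part (hypothetical data).
readings = [25.002, 25.003, 25.001, 25.002, 25.004,
            25.002, 25.003, 25.001, 25.002, 25.003]
result = type1_bias(readings, 25.000)
print(f"bias = {result['bias']:.4f} mm, t = {result['t']:.2f}")
```

With these numbers the bias is 0.0023 mm and t is about 7.7, well above 2.26, so the bias would be judged real rather than random scatter.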

Gage Linearity and Bias Study

This is the same as the Bias study, but it uses several parts of different sizes spanning the operating range of the gage. Master reference values must be obtained for all parts. The study looks at the bias for each part and how it changes from part to part with respect to these reference values. The slope of the line fitted through the bias measurements, plotted against each reference dimension, determines the linearity of the measurement system. Linearity reveals the accuracy of the gage throughout a range of measurements (Figure 1).

Example: Six parts are submitted for study. The parts should encompass dimensions covering the full range of the gage or the expected range of a specific process. A precision gage measures each part to establish its master measurement, or standard. Then one appraiser measures each part, in random order, with the in-process gage that will be specified for the task. This is repeated until each of the (6) parts has been measured (10) times in random order.
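The linearity slope can be estimated with an ordinary least-squares fit of each reading’s bias (reading minus reference) against its reference value, a common way of quantifying linearity. This is an illustrative sketch: the function name and the data layout are assumptions, and the toy data are noise-free so the fitted line is easy to check by hand.

```python
import statistics


def linearity(references: list[float], readings: list[list[float]]) -> dict:
    """Fit bias = intercept + slope * reference by least squares.

    references[i] is the master value of part i; readings[i] holds the
    repeated in-process gage readings of that part.
    """
    xs, ys = [], []
    for ref, part_readings in zip(references, readings):
        for r in part_readings:
            xs.append(ref)
            ys.append(r - ref)              # bias of each individual reading
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # change in bias per unit of size
    intercept = y_bar - slope * x_bar
    avg_bias = {ref: statistics.mean(pr) - ref
                for ref, pr in zip(references, readings)}
    return {"slope": slope, "intercept": intercept, "avg_bias": avg_bias}


# Three toy parts whose bias grows by 0.25 per unit of reference size.
fit = linearity([2.0, 4.0, 6.0], [[3.0, 3.0], [5.5, 5.5], [8.0, 8.0]])
print(fit["slope"], fit["intercept"])
```

A slope near zero means the bias stays the same across the range; a clearly non-zero slope is the linearity problem described above.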

Also, if the study is based on random parts from the shop floor, getting a consistent Master reference measurement can be difficult without proper fixtures to hold the part securely and measure it in exactly the same place and way every time. The same fixture would be required to duplicate that effort with a production gage. It is most desirable to use artifacts such as gage pins or gage blocks in the appropriate measurement range for the Master reference and bias study. The point is to determine where the production gage reads, not to measure parts.

What is done with a gage bias once it is discovered? If the measuring equipment is adjustable, recalibrate it to center the reading on the Master reference. If the gage is not adjustable, and the deviation is linear with variation rising as the dimension increases, then a critical judgment must be made: either adjust all future readings by the average bias value, or, for the utmost precision, adjust each reading by the actual bias value for its range. Gage Linearity and Bias should be known before starting any GR&R study, both to determine the required bias offsets and to verify that the equipment selected for the task is appropriate to the part specification tolerances.
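The two correction options for a non-adjustable gage can be sketched as follows. The range table (sorted upper limits of each measurement range and the bias found in that range) is a hypothetical layout, and bias is taken as reading minus reference, so each correction subtracts it.

```python
import bisect


def correct_by_average(reading: float, average_bias: float) -> float:
    """Option 1: subtract a single average bias from every reading."""
    return reading - average_bias


def correct_by_range(reading: float, upper_limits: list[float],
                     biases: list[float]) -> float:
    """Option 2: subtract the bias measured for the range containing the reading.

    upper_limits is sorted ascending; biases[i] applies to readings up to and
    including upper_limits[i]. Readings above the last limit use the last bias.
    """
    i = bisect.bisect_left(upper_limits, reading)   # first limit >= reading
    i = min(i, len(biases) - 1)
    return reading - biases[i]


# Hypothetical bias table for a non-adjustable gage (mm).
limits = [10.0, 20.0, 30.0]
biases = [0.001, 0.002, 0.004]
print(correct_by_average(15.000, 0.002))
print(correct_by_range(15.000, limits, biases))
```

The range-based correction tracks a bias that grows with size; the single average is simpler but over- or under-corrects at the ends of the range.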

Gage R&R Study and DFCI

For Double Flank Composite Inspection, the Crossed ANOVA GR&R method is recommended. A Gage R&R study evaluates people, measurement processes and equipment; specifically, it measures repeatability and reproducibility. It involves several parts, each measured a predetermined number of times by multiple operators. There is an important distinction between repeatability and reproducibility. Repeatability is the variation among repeated readings of the same characteristic with the same gage, for example, measuring a parameter (6X) in a row and evaluating the differences between the readings. Repeatability also takes into account repeated gage and fixture setups. Reproducibility is the ability of the measurement system to reproduce the same data with any appraiser on any given day, and is commonly referred to as “Appraiser Variation”. The GR&R metric is therefore an evaluation of the entire measuring system for a specific task.
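For the (6X) repeatability example, the spread of the readings can be summarized by their range and sample standard deviation. The readings below are hypothetical. This captures only one appraiser’s repeatability; separating repeatability from reproducibility across several appraisers is the job of the crossed ANOVA analysis.

```python
import statistics

# Six consecutive readings of the same parameter (hypothetical values, mm).
readings = [12.004, 12.006, 12.005, 12.003, 12.006, 12.005]

spread = max(readings) - min(readings)   # range of the six readings
sigma = statistics.stdev(readings)       # sample standard deviation (n - 1)
print(f"range = {spread:.3f} mm, s = {sigma:.4f} mm")
```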

An Important Note on Reproducibility

When automated equipment is used to handle, fixture and measure parts, there is effectively only one appraiser, so appraiser reproducibility is zero and only a repeatability analysis is possible. However, if more than one fixture is used, reproducibility is analyzed “between fixtures” rather than between appraisers.

GR&R Studies Follow These Steps

1. Obtain a representative sample of parts to measure. How many? The more data taken, the better the reproducibility statistic, the degree of confidence and the power of the test will be. The parts may be unique, but each measurement must be taken in exactly the same way and location, measurement to measurement and operator to operator.

2. Each appraiser shall measure the parts in random order. If possible, operators should be selected at random from those who regularly operate the system. How many operators? At least (3) are recommended; a minimum of (2) appraisers is needed to measure any reproducibility.

3. Decide how many parts to measure. How many? That depends on the difficulty of measurement and setup, the availability of parts and the degree of confidence desired, but in general (5-30) is good; (10) parts are often selected.

4. Generate a data collection sheet showing the measurement procedure and the randomized order of data collection.

5. Take the measurements.

6. Analyze the data.

Note that a GR&R is a controlled study. Equipment and inspections that require setup must be torn down and set up again for each part. For example, if (10) parts are to be measured in a fixture, every part must be removed and re-inserted into the fixture each time. If a fixture has to be assembled and set up before the first measurement can begin, then it must be taken down and set up again every time. There are no shortcuts. The GR&R is an evaluation of the measurement system, and the setup of the measuring equipment is part of the variability of that system.

Randomness: every part must be measured in a “blind study” to reduce the effects of bias from any physical or environmental factors. It is a convenient practice to give parts a random identification, such as (A1, E3, ZE, 6Q, ...), to discourage any appraiser’s tendency to “organize the parts” (Figure 2).
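Random part codes like these are easy to generate. In this sketch the two-character code format and the function name `blind_ids` are assumptions; the returned mapping is the study key linking each code back to its real part number, and only the codes should be visible to the appraisers.

```python
import random
import string
from typing import Optional


def blind_ids(n_parts: int, seed: Optional[int] = None) -> dict[int, str]:
    """Map part numbers 1..n_parts to unique, random two-character codes."""
    rng = random.Random(seed)
    alphabet = string.ascii_uppercase + string.digits
    pool = [a + b for a in alphabet for b in alphabet]   # 36 * 36 = 1296 codes
    if n_parts > len(pool):
        raise ValueError("too many parts for two-character codes")
    codes = rng.sample(pool, n_parts)                    # sampled without repeats
    return dict(zip(range(1, n_parts + 1), codes))


key = blind_ids(10, seed=42)   # the study key: keep it away from the appraisers
for part, code in key.items():
    print(part, code)
```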

Note: The Total Gage R&R %Study Variation represents all the sources of variation in the study except the variation of the parts themselves, i.e. the equipment, fixtures, gages and appraisers. A low value such as (10.2%) means that most of the observed variation is part-to-part, which is exactly where we want it to be. Because %Study Variation is a ratio of standard deviations, the gage and part-to-part percentages do not add up to 100%; the variance-based %Contribution values do. A %Study Variation of 10.2% corresponds to a %Contribution of about 1%, leaving about 99% of the variance part-to-part. The preferred analysis of measurement variation is a Crossed Gage R&R using the ANOVA or XBAR / R method for non-destructive testing. The XBAR / R method is simpler, but the ANOVA method is more accurate and can also estimate the operator-by-part interaction.
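The ANOVA calculation behind a crossed GR&R can be sketched in plain Python. This is a minimal sketch, not Minitab’s implementation: it assumes a balanced study (every appraiser measures every part the same number of times), always keeps the operator-by-part interaction term (Minitab can pool a non-significant interaction into the error term, which this sketch does not do), and sets negative variance components to zero. %Study Variation is computed as the ratio of the gage and total standard deviations, so the 6σ (or 5.15σ) multiplier used for study variation cancels out.

```python
import math
import statistics


def crossed_anova_grr(data: list[list[list[float]]]) -> dict:
    """Crossed Gage R&R by the ANOVA method, keeping the operator*part term.

    data[i][j] holds the replicate readings of part i by operator j; the study
    must be balanced (same replicate count r in every cell).
    """
    p, o, r = len(data), len(data[0]), len(data[0][0])
    if p < 2 or o < 2 or r < 2:
        raise ValueError("need at least 2 parts, 2 operators and 2 replicates")

    cell = [[statistics.mean(data[i][j]) for j in range(o)] for i in range(p)]
    part_mean = [statistics.mean(row) for row in cell]
    oper_mean = [statistics.mean([cell[i][j] for i in range(p)]) for j in range(o)]
    grand = statistics.mean(part_mean)

    # Sums of squares for the two-factor crossed design.
    ss_part = o * r * sum((m - grand) ** 2 for m in part_mean)
    ss_oper = p * r * sum((m - grand) ** 2 for m in oper_mean)
    ss_int = r * sum((cell[i][j] - part_mean[i] - oper_mean[j] + grand) ** 2
                     for i in range(p) for j in range(o))
    ss_err = sum((x - cell[i][j]) ** 2
                 for i in range(p) for j in range(o) for x in data[i][j])

    # Mean squares = sum of squares / degrees of freedom.
    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_int = ss_int / ((p - 1) * (o - 1))
    ms_err = ss_err / (p * o * (r - 1))

    # Variance components from the expected mean squares (negatives -> 0).
    repeat = ms_err
    interaction = max(0.0, (ms_int - ms_err) / r)
    operator = max(0.0, (ms_oper - ms_int) / (p * r))
    part = max(0.0, (ms_part - ms_int) / (o * r))

    reproduce = operator + interaction
    grr = repeat + reproduce
    total = grr + part
    if total == 0:
        raise ValueError("no variation in the data")
    return {
        "repeatability": repeat,
        "reproducibility": reproduce,
        "part_to_part": part,
        "pct_contribution": 100 * grr / total,          # variance ratio
        "pct_study_var": 100 * math.sqrt(grr / total),  # std-dev ratio
    }


# Toy study: 2 parts x 2 appraisers x 2 replicates (hypothetical readings).
study = [
    [[1.0, 3.0], [1.0, 3.0]],   # part 1: appraiser A, appraiser B
    [[5.0, 7.0], [5.0, 7.0]],   # part 2
]
res = crossed_anova_grr(study)
print(f"%Contribution = {res['pct_contribution']:.1f}, "
      f"%Study Var = {res['pct_study_var']:.1f}")
```

In this toy data set the appraisers agree perfectly, so all gage variation is repeatability (variance 2 against a part-to-part variance of 8), giving a %Contribution of 20% and a %Study Variation of about 44.7%. For the double flank example, `data` would hold (5) test gears by (2) appraisers by (3) replicates.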

Performing a Gage R&R Study for Double Flank Composite Inspection (DFCI)

Performing a GR&R study with double flank testing is unique to gear inspection, yet it can incorporate all the classic attributes of Gage R&R study techniques. Historically, GR&R studies done for this type of inspection have shown a total %Study Variation above 20%, and sometimes above 30%, especially for gears with higher levels of composite variation. But such poor MSA results may be more a function of how the GR&R was done than of the true accuracy of the measuring equipment. As a general rule:

• Total GR&R ≤ 10% is deemed excellent
• Total GR&R > 10% and < 20% is marginal
• Total GR&R ≥ 30% is unacceptable in any situation
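The rule of thumb maps directly onto a small lookup. The function name is hypothetical; the rule as stated assigns no label to the band from 20% up to 30%, so the sketch reports that band separately rather than guessing a label.

```python
def classify_grr(pct_study_var: float) -> str:
    """Classify a Total GR&R %Study Variation using the rule of thumb."""
    if pct_study_var <= 10:
        return "excellent"
    if pct_study_var < 20:
        return "marginal"
    if pct_study_var >= 30:
        return "unacceptable"
    return "not classified by the rule (20% to 30%)"


for value in (8.5, 10.2, 25.0, 31.0):
    print(value, classify_grr(value))
```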

Getting at the true double flank measurement system variation of a specific test machine is best shown by example.

First question: Is the GR&R meant to determine whether the machine is acceptable for use with a variety of test gears (i.e. different sizes, qualities or materials), or is the GR&R about one production test gear? If it is for a variety of gears, then select (3-5) distinct test gears spanning the range of variation you expect to encounter. For example:

• Precision Gears (perhaps two of the same master gears)
• Typical production gear (medium variation) & Master
• Higher variation test gear & Master

Then do a separate GR&R for each test gear and its master. For brevity, the following example involves (5) test gears, (2) appraisers and (3) replicate measurements of each part.


1. Choose the number of inspectors, test gears and replicates for the GR&R.

2. Create a detailed data collection sheet so each inspector follows the exact procedure. Randomize the part numbers differently for each appraiser.

3. Have each inspector select each test gear in the order specified by the data collection sheet.

4. Select the test gears, Masters, collets, spindles and all the fixtures for the specific machine.

5. Set up the 1st test gear and Master for rolling.

6. Mark a mating test gear tooth and the corresponding Master tooth at a logical and repeatable starting position on the rolling machine. Both inspectors use the same alignment marks on each test gear. For each of the (3) replicates, the inspector aligns the gears at exactly the same starting position. The fixture or system does not need to be set up again for replicate measurements; the Test Gear and Master only need to be re-indexed after each roll. It is important to roll the same teeth of the test gear and master every time for the repeatability statistic.

7. For each individual part (1 to 5), the fixture setup must be disassembled and reassembled in the same way the inspector would do it on any given day. Remove the gears, clamps, collets and any fixtures, then reassemble. Any variation due to setting up the equipment on any given day must be part of the MSA reproducibility statistic.

8. Roll the next part, and continue until all parts and replicates are finished. In every case, the appraisers roll the test gear and master with the same starting and ending tooth positions.
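The procedure above can be turned into a printable run list. This is a sketch under stated assumptions: the function name, the appraiser labels and the action wording are hypothetical, and each appraiser-gear setup is kept together so its (3) replicates are rolled consecutively between teardowns, as in steps 6 and 7. Shuffling the appraiser-gear setups together gives each appraiser a random gear order and also interleaves the two appraisers, in line with the data collection sheet advice to randomize both runs and operators.

```python
import random
from typing import Optional


def dfci_run_plan(n_gears: int = 5, appraisers: tuple = ("A", "B"),
                  replicates: int = 3, seed: Optional[int] = None) -> list:
    """Hypothetical run list for the double flank GR&R steps.

    Each (appraiser, gear) pair is one setup: the first roll starts with a full
    teardown and re-setup; later replicates only re-index to the marked teeth.
    """
    rng = random.Random(seed)
    setups = [(a, g) for a in appraisers for g in range(1, n_gears + 1)]
    rng.shuffle(setups)   # random gear order per appraiser, appraisers interleaved
    plan = []
    for appraiser, gear in setups:
        for rep in range(1, replicates + 1):
            action = "full re-setup" if rep == 1 else "re-index to marks"
            plan.append((appraiser, gear, rep, action))
    return plan


for row in dfci_run_plan(seed=1):
    print(*row, sep="\t")
```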


There are no data outliers in an MSA, so take great care that no abnormal or external variation affects the measuring system during the GR&R. The key is consistency and stability. However, if the measurement system is to be used on the shop floor, the GR&R should be done at that specific station rather than in a lab, to mimic the production task and obtain a true MSA.

The Data Collection Sheet

The No-Randomization layout is not an acceptable practice because of its propensity to introduce bias into the results from any number of physical and environmental factors. Randomizing only the runs creates a separate block of measurements for each operator; the combined physical and environmental factors may differ between blocks and create a distinct block-to-block bias in the data. Randomizing both the runs and the operators does not eliminate the physical and environmental factors, but tends to average their bias out of the results, giving the most accurate MSA (Table 1).
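The three randomization schemes discussed above can be generated as follows. The scheme names and the function `collection_sheet` are hypothetical; each run is one (operator, part) measurement, repeated for the number of replicates the study specifies.

```python
import random
from typing import Optional


def collection_sheet(parts: int, operators: list[str], replicates: int,
                     scheme: str, seed: Optional[int] = None) -> list[tuple[str, int]]:
    """Return the measurement order as a list of (operator, part) runs.

    scheme "none":     each operator measures parts 1..n in order, in turn
    scheme "runs":     each operator's runs are shuffled; operators stay in blocks
    scheme "runs+ops": all runs of all operators are shuffled together
    """
    rng = random.Random(seed)
    runs = {op: [p for _ in range(replicates) for p in range(1, parts + 1)]
            for op in operators}
    if scheme == "none":
        return [(op, p) for op in operators for p in runs[op]]
    if scheme == "runs":
        sheet = []
        for op in operators:
            block = runs[op][:]
            rng.shuffle(block)            # randomize within this operator's block
            sheet.extend((op, p) for p in block)
        return sheet
    if scheme == "runs+ops":
        sheet = [(op, p) for op in operators for p in runs[op]]
        rng.shuffle(sheet)                # randomize runs and operators together
        return sheet
    raise ValueError(f"unknown scheme: {scheme}")


for run in collection_sheet(5, ["A", "B"], 3, "runs+ops", seed=3):
    print(*run)
```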

The desired outcome for Double Flank Composite Inspection (DFCI) is a GR&R %Study Variation typically less than (15%). Determining the “Standard” for composite variation is sometimes problematic, because the reference value usually cannot be established with the 10:1 “rule of thumb” accuracy ratio. Very accurate DFCI equipment is available and should be used whenever possible to set the reference values. Failing that, the supplier and customer must agree on the gaging and equipment to be used in the study.

When the GR&R %Study Variation is not acceptable, it can be a real wake-up call to all involved. One of the most common responses heard is: “We thought we were accurately measuring the parts until we did our MSA.” Some of the most frequently discovered issues are:

• Equipment and Instruments need maintenance
• Fixtures and gaging need to be more rigid
• Greater process inspection controls are necessary
• Appraisers need better training and practice for consistency



“Portions of information contained in this publication/book are printed with permission of Minitab Inc. All such material remains the exclusive property and copyright of Minitab Inc. All rights reserved.”