IV.C.3 Statistical analysis Description

### Excerpt From The Certified Manager of Quality/Organizational Excellence Handbook

Simple Statistics Used to Analyze Groups of Data

Some measures of central tendency are:
• Average. Also called the arithmetic mean, it is the sum of the individual data values (∑x) divided by the number of samples (n).
• Median. If data are arranged in numeric order, the median is the center number. If there is an even number of data points, the median is the average of the two middle numbers.
• Mode. Another measure of central tendency, mode is the number in the data set that occurs most often.
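As a quick sketch, the three measures above can be computed with Python's standard library; the data values here are invented purely for illustration:

```python
import statistics

# Hypothetical sample data, purely for illustration
data = [3, 5, 5, 8, 9, 12, 5, 8]

mean = sum(data) / len(data)      # average: (sum of x) / n
median = statistics.median(data)  # center of the sorted data (mean of middle two here)
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)  # → 6.875 6.5 5
```

Note that with an even number of data points (n = 8), `statistics.median` averages the two middle numbers, exactly as described above.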

Measures of the spread of a sample group of data (also called dispersion or variation) are:
• Range (R). The arithmetic difference between the largest and smallest numbers in the data set.
• Standard deviation. The standard deviation s of a sample of data is given as s = √(∑(x − x̄)² / (n − 1)), where x̄ is the sample average and n is the number of samples.
• Variance. Another measure of dispersion, the standard deviation squared.
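A minimal sketch of these spread measures, again on invented data:

```python
import statistics

data = [3, 5, 5, 8, 9, 12, 5, 8]  # hypothetical sample

r = max(data) - min(data)        # range R: largest minus smallest
s = statistics.stdev(data)       # sample standard deviation (n - 1 denominator)
var = statistics.variance(data)  # variance: the standard deviation squared

print(r)                       # → 9
print(abs(s**2 - var) < 1e-9)  # → True: variance equals stdev squared
```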

If variance is simply the standard deviation squared, why does it exist as a separate measure? The standard deviation is in the same units as the mean or average, the units of the measurement, which allows a clear representation of both the central tendency and the variability. If there are multiple sources of variability (such as tool choice and material choice), several components of variability add together to make the total. Standard deviations from multiple sources cannot be added directly, but the squares of the standard deviation components (that is, the variances) may be added.
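The additivity argument can be made concrete in a few lines. This sketch assumes the two sources are independent; the component values are hypothetical:

```python
# Hypothetical standard-deviation components from two independent sources
s_tool = 0.3      # variability attributed to tool choice
s_material = 0.4  # variability attributed to material choice

# Variances (the squares) add; the standard deviations themselves do not.
total_variance = s_tool**2 + s_material**2  # 0.09 + 0.16 = 0.25
total_stdev = total_variance**0.5           # 0.5, not 0.3 + 0.4 = 0.7
```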

Two sets of data could be analyzed simply by comparing their central tendencies and spread. For example, standardized test scores for groups of students who attend two different schools could be compared to try to determine whether the scores are higher (average, median, or mode) or more consistent (range or standard deviation) at one school versus the other. (Note: Making decisions based solely on these rough comparisons would of course be dangerous due to the large number and the interactive nature of the variables that could contribute to differences.)
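A rough version of this school comparison might look like the following sketch; all scores are made up for illustration:

```python
import statistics

# Hypothetical standardized test scores for two schools
school_a = [78, 80, 82, 84, 85, 86, 88, 90]
school_b = [60, 70, 75, 80, 85, 90, 95, 99]

# Compare central tendency (mean) and spread (standard deviation)
higher_at_a = statistics.mean(school_a) > statistics.mean(school_b)
more_consistent_at_a = statistics.stdev(school_a) < statistics.stdev(school_b)

print(higher_at_a, more_consistent_at_a)  # → True True
```

As the text cautions, such a comparison is only a rough screen, not a basis for decisions on its own.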

Another way to analyze two comparable data sets is to create a histogram to see how their distributions compare. (Care must be taken to use the same x- and y-axis scales.)
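A text-only sketch of comparing distributions on identical bins (scores invented; a plotting library would normally draw the actual histograms):

```python
# Hypothetical score sets; identical bins keep the comparison fair
school_a = [78, 80, 82, 84, 85, 86, 88, 90]
school_b = [60, 70, 75, 80, 85, 90, 95, 99]

bins = range(60, 100, 10)  # shared x-axis bins: [60,70), [70,80), ...

def bin_counts(scores):
    """Count how many scores fall in each ten-point bin."""
    return {lo: sum(lo <= s < lo + 10 for s in scores) for lo in bins}

counts_a = bin_counts(school_a)
counts_b = bin_counts(school_b)
print(counts_a)  # → {60: 0, 70: 1, 80: 6, 90: 1}
print(counts_b)  # → {60: 1, 70: 2, 80: 2, 90: 3}
```

The shared `bins` range plays the role of the common x-axis scale the text warns about.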

Probability Distributions. Comparisons that take into account statistical probability can be done using tests for significance such as the F-test to compare standard deviations and the t-test to compare means. Juran’s Quality Handbook contains a table to help in selecting these and other test statistics. Advantages of this approach include compensating for sample size differences (by adjusting based on the degrees of freedom) and basing the decision on whether observed differences are actually statistically significant.
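As a sketch, the two test statistics can be computed from scratch on invented samples; turning them into significance decisions still requires the F and t distributions (tabled critical values, or a library such as scipy.stats for p-values):

```python
import math
import statistics

# Hypothetical samples from two processes
a = [78, 82, 85, 88, 90, 91]
b = [70, 75, 80, 84, 86, 95]

var_a, var_b = statistics.variance(a), statistics.variance(b)

# F statistic: ratio of the sample variances, larger over smaller
f_stat = max(var_a, var_b) / min(var_a, var_b)

# Two-sample t statistic with pooled variance (assumes equal variances);
# the degrees of freedom (n_a + n_b - 2) provide the sample-size adjustment
n_a, n_b = len(a), len(b)
pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
    pooled * (1 / n_a + 1 / n_b)
)

print(round(f_stat, 2), round(t_stat, 2))
```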

Although the normal distribution may be more common, there are several different shapes that probability distributions can take, depending on the type of data (for example, discrete versus continuous data) and characteristics of the process that produces the data. Following are some of the more widely used distributions; Figure 15.4 depicts them graphically:
• Normal. A bell-shaped distribution for continuous data where most of the data are concentrated around the average (approximately two-thirds are within ±1 standard deviation) and it is equally likely that an observation will occur above or below the average. This distribution especially applies to processes where there are usually many variables that contribute to the variation. This is one of the most common distributions.
• Exponential. A continuous distribution where data are more likely to occur below the average than above it (63.2 percent and 36.8 percent respectively, compared to 50 percent/50 percent for the normal distribution). Because its failure rate is constant, it is typically used to describe the random-failure (useful life) portion of the failure bathtub curve.
Juran’s Quality Handbook, Chapter 48.5, discusses the bathtub curve for predicting reliability of a product over its lifetime. There would be a high incidence of failures during the product’s “infant mortality” phase, random failures during the period of normal use, and an increase in wear-out failures toward the end of the product’s life cycle. The chart of the distribution appears to resemble a bathtub.
• Weibull. A distribution of continuous data that can take on many different shapes and is therefore used to describe a variety of patterns, including distributions similar to but slightly different from the normal and exponential.
• Binomial. Defines the probability for discrete data of r occurrences in n trials of an event that has a probability of occurrence of p for each trial. Used when the sample size is small compared to the population size and when the proportion defective is greater than 0.10.
• Poisson. Also used for discrete data and resembles the binomial. It is especially applicable when there are many opportunities for occurrence of an event, but a low probability (less than 0.10) on each trial.
• Hypergeometric. A discrete distribution defining the probability of r occurrences in n trials of an event when there are a total of d occurrences in a population of N. It is most applicable when the sample size is a large proportion of the population.
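The binomial/Poisson relationship described above can be sketched directly from the formulas; the values of n, p, and r below are arbitrary illustrative choices:

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, p):
    """Probability of exactly r occurrences in n trials, each with probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_pmf(r, mean):
    """Probability of exactly r occurrences when the expected count is mean."""
    return mean**r * exp(-mean) / factorial(r)

# Many opportunities (n = 100), low per-trial probability (p = 0.02 < 0.10):
# the Poisson with mean n*p closely approximates the binomial.
n, p, r = 100, 0.02, 3
print(round(binomial_pmf(r, n, p), 4))  # → 0.1823
print(round(poisson_pmf(r, n * p), 4))  # → 0.1804
```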
ASQ Statistics Division

Quality Management BOK Reference

IV Quality Management Tools
IV.C Measurement: Assessment and Metrics
IV.C.3 Statistical analysis - Calculate basic statistics: measures of central tendency (mean, median, mode), and measures of dispersion (range, standard deviation, and variance). Identify basic distribution types (normal, bimodal, skewed) and evaluate run charts, statistical process control (SPC) reports, and other control charts to make data-based decisions.
