DOE Versus Multiple Regression - Which One to Use?

Posted by Scott Sterbenz on Sep 18, 2018 8:44 am

I always prefer a DOE to a multiple regression, because the analysis is stronger and the factors are completely independent of each other. However, there are two specific instances when you must use a multiple regression over a DOE.

The first is when someone hands you a stack of data and asks if there are any relationships between the variables. In this case, you have no power to set up the study the way you want: the data is already collected and ready to analyze.

The second is when you have the ability to set up the study, but you find out that controlling each of the potential factors is not as easy as you thought. I always ask the question "Can I specify that each of the factors be set to a given value that I demand, in any combination I want?" If the answer is yes, then a DOE is the way to go. If the answer is no, then multiple regression is the way to go.

There have been a handful of cases where I ran into the second situation. The most recent example was some work I was doing with the United States Bowling Congress. We wanted to predict certain characteristics of ball motion based on properties of the ball. The problem was that I could not manufacture, at will, a ball with every combination of specific properties I wanted. We ran the multiple regression, and it still turned out very successful. You can learn about this study by visiting bowl.com and browsing to the Equipment Specifications and Certification Department.
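As a rough illustration of the independence point made at the start of this post, the sketch below builds a two-level full factorial in coded units and checks that every pair of factor columns is orthogonal. The three-factor size and the helper names are assumptions made for the example, not part of any study mentioned here.

```python
from itertools import product, combinations

def full_factorial(k):
    """Return all 2**k runs of a two-level design in coded units (-1, +1)."""
    return [list(run) for run in product((-1, 1), repeat=k)]

def column_dot_products(design):
    """Dot product of every pair of factor columns; 0 means orthogonal."""
    k = len(design[0])
    cols = [[run[j] for run in design] for j in range(k)]
    return {
        (a, b): sum(x * y for x, y in zip(cols[a], cols[b]))
        for a, b in combinations(range(k), 2)
    }

# Hypothetical example: three factors, so 2**3 = 8 runs.
design = full_factorial(3)
dots = column_dot_products(design)

print(len(design), "runs")
print(dots)  # every pairwise dot product is 0
```

Because every pairwise dot product is zero, each factor's effect can be estimated without being entangled with the others. Data that is simply handed to you rarely has this property, which is why the regression analysis is weaker in that case.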
Best, Scott C. Sterbenz, P.E. ASQ Six Sigma Forum

Re: DOE Versus Multiple Regression - Which One to Use?

Posted by Jerry Rice on Oct 11, 2018 8:57 am

Based on your post, I take it the real answer to your question is “It depends.” I agree. However, I see DOE and multiple regression as tools used for different purposes. To your point, the controls in place during a fractional factorial DOE usually provide a more robust model of the process. Those controls come at a cost, though.

Let’s say there are 6 or more independent input variables you suspect influence the dependent output variable. A fractional factorial using 7 or 8 input variables can become expensive and very difficult to control. You must really start making tradeoffs between efficiency and effectiveness of the experiment.
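To put rough numbers on that tradeoff, here is a minimal sketch comparing a full two-level factorial in 7 factors with an 8-run fraction. The generators used (D=AB, E=AC, F=BC, G=ABC) are a common textbook choice for a 2^(7-4) design and are an assumption for this example. The price of so few runs is that main effects are aliased with two-factor interactions.

```python
from itertools import product, combinations

def full_factorial_runs(k):
    """Number of runs in a two-level full factorial with k factors."""
    return 2 ** k

def fractional_2_7_4():
    """8-run design for 7 factors, built from a full factorial in A, B, C.

    Assumed generators: D=AB, E=AC, F=BC, G=ABC (resolution III).
    """
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs

design = fractional_2_7_4()

# Check that all 7 factor columns are still mutually orthogonal.
orthogonal = all(
    sum(run[i] * run[j] for run in design) == 0
    for i, j in combinations(range(7), 2)
)

print(full_factorial_runs(7), "runs for the full factorial")
print(len(design), "runs for the fraction; columns orthogonal:", orthogonal)
```

The fraction keeps the main-effect columns orthogonal while cutting the run count from 128 to 8, but it can no longer separate a main effect from the interactions it is aliased with.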
  
What if you could get your hands on some passive data (without turning knobs) to screen variables before performing active experimentation (turning knobs)? This is where I see the most value in tools such as multiple regression, ANOVA, and t-tests. By the time you go into the active DOE, you are working with only 4 or 5 pink and red X’s. That makes for an experiment with fewer tradeoffs between efficiency and effectiveness. It’s sort of like Shainin’s “Talk to the parts,” except what you are really doing is listening to the process, not relying on the intuition of engineers to set up the experiment.
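A minimal sketch of that screening idea follows, using only the Python standard library. The data is simulated, and the "true" coefficients, sample size, and noise level are all made up for illustration. It fits an ordinary least-squares model by solving the normal equations. A real screening study would also look at standard errors, p-values, and collinearity among the inputs, which this sketch leaves out.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(xs, ys):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)

# Simulated passive data (hypothetical): x2 has no real effect on y.
random.seed(0)
n = 200
xs = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
ys = [5 + 2 * x[0] + 0 * x[1] - 1.5 * x[2] + random.gauss(0, 0.5) for x in xs]

coefs = ols(xs, ys)
for name, b in zip(["intercept", "x1", "x2", "x3"], coefs):
    print(f"{name}: {b:.2f}")
```

Since the simulated inputs are on the same scale, a coefficient near zero (x2 here) marks a candidate to drop before the active experiment.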

The DOE then becomes less exploratory and more of a way to develop and refine the process model using the 4 or 5 significant variables. The results of the DOE are used to increase your ability to predict the process output based on where the inputs are set. Hypothesis testing using passive data alone can seldom do that.
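As a sketch of how DOE results turn into a predictive model, the example below fits a main-effects model to a 2^3 full factorial and predicts the output at a chosen setting. The response function is made up and noise-free, purely to show the arithmetic. It relies on the design being orthogonal, so each coefficient is just the average of the response weighted by that factor's coded level.

```python
from itertools import product

# Hypothetical 2^3 full factorial in coded units (-1 = low, +1 = high).
design = list(product((-1, 1), repeat=3))

def true_process(a, b, c):
    """Made-up, noise-free response used only to illustrate the fit."""
    return 10 + 3 * a - 2 * b + 0 * c

responses = [true_process(*run) for run in design]

def fit_main_effects(design, responses):
    """Intercept and main-effect coefficients for an orthogonal coded design."""
    n = len(design)
    k = len(design[0])
    intercept = sum(responses) / n
    coefs = [
        sum(run[j] * y for run, y in zip(design, responses)) / n
        for j in range(k)
    ]
    return intercept, coefs

def predict(intercept, coefs, settings):
    """Predicted output at the given coded factor settings."""
    return intercept + sum(b * x for b, x in zip(coefs, settings))

intercept, coefs = fit_main_effects(design, responses)
prediction = predict(intercept, coefs, (1, -1, 1))

print("intercept:", intercept)
print("coefficients:", coefs)
print("prediction at (+1, -1, +1):", prediction)
```

Each coefficient in coded units is half the classical "effect" (the high-minus-low difference in average response). With real, noisy data you would add replication or center points to judge which coefficients are significant.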
 
Anyway, I see multiple regression as one of several good screening tools for DOE. The DOE is used to “set” the process model. It seems to be the most effective and efficient way I’ve found. With all that said, I have seen such a strong signal analyzing passive data with multiple regression that a DOE isn’t even needed, but that is rare.