Sampling plans for validating processes

I am curious whether any members are familiar with developing sampling plans for process validations. Ideally, I would like to determine, or at least attempt to quantify, a sampling plan based on the following:

  • Risk factor/analysis of the parameter being tested/inspected (including factors such as the time it takes to complete a test and whether the test is destructive or nondestructive)
  • Required number of inspections during the production run as defined on an MPR (e.g., # of checks per hour)
  • Length of PV
  • # Units produced

I have done a few web searches, but I am not having any luck finding guidance that addresses the factors listed above.

10 Replies

Hi Michael,

Great topic, thanks for raising it!

I would suggest using the fantastic ASQ database, which has a huge variety of publications on this topic.

E.g., this is what I've found on selection of statistically valid sampling plans:

I should clarify that I am looking for an attribute sampling plan based on go/no-go or pass/fail decisions, and it needs to include a confidence level. Unfortunately, ANSI/ASQ Z1.4 doesn't include a confidence level in its calculation. I need to include one because it is a requirement for validating a process (e.g., Process Performance Qualification).

I have done some research and am leaning toward a binomial approach or SPRT, but I have been unsuccessful at finding a workable formula because the ones I've found require a standard deviation and/or margin of error, which I don't have. I am not sure whether I can substitute AQL or confidence levels for those values.

For ANSI/ASQ Z1.4, I thought about scaling the sample size by a confidence factor using Z(alpha/2) (e.g., Z = 1 corresponds to roughly 84% one-sided confidence). I don't think doing this would be a statistically sound sampling plan.

Let's see what more experienced colleagues say, but to my mind, an attribute sampling plan based on go/no-go or pass/fail decisions that also needs to include a confidence level is a somewhat unusual combination of prerequisites.
If the characteristic you need to control can be assessed via continuous data (as opposed to attribute data), the best choice, in my opinion, would be to use process capability ratios to prevent OOS results from happening, i.e., to detect process variation early and adjust before an OOS occurs.
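For the continuous-data case, the capability ratios mentioned above are easy to compute. A minimal sketch (it uses the overall sample standard deviation for simplicity; a real capability study would use a within-subgroup estimate from a control chart and confirm normality first, and the data and spec limits here are made up):

```python
import statistics

def capability(data, lsl, usl):
    """Estimate Cp and Cpk from a sample of continuous measurements,
    given lower (lsl) and upper (usl) specification limits."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # overall (long-term) estimate
    cp = (usl - lsl) / (6 * sd)                      # potential capability
    cpk = min(usl - mean, mean - lsl) / (3 * sd)     # accounts for centering
    return cp, cpk

# Hypothetical measurements against hypothetical limits of 9.0-11.0:
cp, cpk = capability([9.8, 10.0, 10.2, 9.9, 10.1], lsl=9.0, usl=11.0)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")  # -> Cp = 2.11, Cpk = 2.11
```

When Cpk starts drifting below your target, you adjust the process before any unit goes out of specification, which is the point being made above.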

Great topic. I felt like there was still something missing here.

A few notes:

  • Risk factors are the biggest concern I would have and would focus on.
  • With regard to destructive vs. non-destructive testing and the time a test takes, does that really change your need for assurance? I am sensitive in this area, but at the end of the day, do the work or don't. You don't get much of anything going at it half-baked.
  • Dr. Wayne Taylor is the main resource I use. In the validation stats world, he is amazing.
  • When it comes to choosing a sample size, I look at the reliability and confidence you want: n = ln(1 - C) / ln(R) when no failures are allowed. With failures allowed, use a binomial proportion confidence interval. In the past, I have used 99.9/95 (99.9% reliability at 95% confidence), 99/95, 95/95, and 90/95.
  • The length of the run should cover normal operations. If you run a 3-shift operation, you should cover all three. Get some stops and starts in the middle.
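For reference, the zero-failure (success-run) formula in the list above takes only a couple of lines to evaluate. A small sketch for the reliability/confidence pairs mentioned:

```python
import math

def success_run_n(reliability, confidence):
    """Zero-failure sample size: smallest n such that seeing n passes
    gives the stated confidence, i.e. n = ln(1 - C) / ln(R), rounded up."""
    return math.ceil(math.log(1 - confidence) / math.log(reliability))

# The reliability/confidence pairs mentioned above, all at 95% confidence:
for r in (0.999, 0.99, 0.95, 0.90):
    print(f"{r * 100:g}/95 -> n = {success_run_n(r, 0.95)}")
# 99.9/95 -> n = 2995
# 99/95   -> n = 299
# 95/95   -> n = 59
# 90/95   -> n = 29
```

The jump from 95/95 (n = 59) to 99.9/95 (n = 2995) is why the destructive-testing and time-per-test factors from the original post end up mattering in practice.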

I used to use ISO 2859 (sampling procedures for inspection by attributes) - check it out.

Hey Michael,

My opinion is that the criteria you listed may be more a matter of practical significance than statistical significance, with the exception of the population size, which is taken into consideration under the ANSI standard for acceptance sampling. If I'm understanding correctly, though, this process validation isn't really about establishing acceptance criteria, so I agree that a binomial approach for proportion defective sounds more like what you're looking for. I have been in a similar situation, tasked with determining a statistically valid sampling plan with no data to start from, and I have a recommendation that might help. It is possible to establish a sample size using a hypothesized defect rate, a desired margin of error, and a desired level of confidence.

  1. Defect Rate: The outcome you are measuring is binomial (go/no-go or pass/fail). With no prior data, 50% is the conservative choice of hypothesized “defect rate,” because p = 0.5 maximizes the variance of an estimated proportion and therefore gives the largest (safest) sample size.
  2. Margin of Error: The margin of error is the half-width of the confidence interval: we have a point estimate calculated from the sample statistics and a range around that estimate where the true population parameter might lie. When selecting a desired margin of error, we are saying that we want the true population parameter to fall within plus or minus that amount of the measured point estimate.
  3. Confidence Level: Confidence levels are based on the probability of certain outcomes from the sample data. If we could observe every possible combination of samples, each would give a slightly different calculated result. Being 95% confident, for example, is saying that in 95 out of 100 such sample combinations, we would observe a statistical result within a certain range.

I’m sorry that I do not recall the exact formula to calculate the sample size with this method, but I can walk you through how to do it in Minitab.

  • From the top menu bar, select Stat -> Power and Sample Size -> Sample Size for Estimation…
  • In the dialogue box:
    • select the Parameter: Proportion (Binomial)
    • Enter Planning Value Proportion: 0.5
    • Enter desired Margins of error for confidence intervals: e.g., 0.01 for a 1% margin of error, 0.02 for a 2% margin of error, etc.
    • Click the Options… box
      • Enter the desired Confidence level: e.g., the default is 95.0, but it can be changed to 90 or 99, etc.
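If you don't have Minitab handy, the textbook normal-approximation formula gives comparable numbers (my assumption is that Minitab uses an exact method, so its results can run somewhat larger than this approximation):

```python
import math
from statistics import NormalDist

def sample_size(p, margin, confidence):
    """Normal-approximation sample size for estimating a proportion:
    n = z^2 * p * (1 - p) / E^2, rounded up, where z is the two-sided
    standard normal quantile for the chosen confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Worst-case p = 0.5, 5% margin of error, 95% confidence:
print(sample_size(0.5, 0.05, 0.95))  # -> 385
```

Note how the required n grows with the square of 1/margin: halving the margin of error roughly quadruples the sample size.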

Alternatively, below is a table to get you started. The hypothesized defect rate is 50% (p = 0.5), the levels of confidence run across the columns, the margins of error run down the rows, and the sample sizes that meet those criteria populate the table. So, for example, with a sample size of n = 289, you can say that you are 90% confident that the true population parameter is within +/- 5% of the calculated point estimate.

p = 0.5

Margin of error    80%      85%      90%      95%      99%   (confidence level)
2%                 1,075    1,344    1,739    2,449    4,193
5%                 183      226      289      402      680
10%                50       60       76       104      172

The benefit of this sampling method is that you can consider the types of tests, resource demands, etc., and choose a reasonable sample size with practical significance, knowing how confident you can be in the results and the confidence interval. The drawback is that this is something of a “worst case scenario” approach and is (hopefully!) an overestimate, so you might risk over-sampling.

If I'm mistaken about the binomial recommendation and it is acceptance criteria that you are looking for, then you can also use Minitab to determine a sampling plan (as opposed to the ANSI standard) by entering the desired acceptable quality level (AQL), lot tolerance percent defective (LTPD), and alpha and beta risk percentages. This is a little different, though: if you're thinking confidence intervals, what you might actually mean is the acceptable range of the defect rate, which is covered between the AQL and the LTPD. If you think about sampling for attributes in a real-time scenario where you might be using an SPC control chart, the AQL is like the centerline and the LTPD is like the upper control limit. The risk criteria are the inverse of the confidence level: you're basically saying that x% of the time, you might make the wrong decision about whether a lot is or is not acceptable.

In my opinion, everything that goes into determining a sample size for acceptance should be established by whoever makes the business decisions, before the sample size is determined. For example, your team might establish that for minor defects, you can accept an average of 4% defective (AQL) and an absolute maximum of 10% defective (LTPD) in any single lot. (Also consider whether the acceptance rates might be different for major or critical defects.) Additionally, the team might decide that they are willing to risk the wrong decision 10% of the time (consider both producer's (alpha) and consumer's (beta) risk).

With that information, you can go to Minitab -> Stat -> Quality Tools -> Acceptance Sampling by Attributes and enter the applicable information. The Operating Characteristic curve shows you the probability of accepting the lot (y-axis) with your sample size and acceptance number, assuming hypothetical defect rate(s) of the population (x-axis).
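If you want to sanity-check such a plan without Minitab, the search it performs can be approximated with a brute-force loop over single-sampling plans. A sketch, using the hypothetical 4% AQL / 10% LTPD / 10% risk numbers from the example above:

```python
from math import comb

def accept_prob(n, c, p):
    """P(accept) = P(at most c defectives in a sample of n) when the true
    defect rate is p. This is one point on the Operating Characteristic curve."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(c + 1))

def find_plan(aql, ltpd, alpha, beta, max_n=2000):
    """Smallest single-sampling plan (n, c) with P(accept) >= 1 - alpha
    at the AQL (producer's risk) and P(accept) <= beta at the LTPD
    (consumer's risk)."""
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            if accept_prob(n, c, aql) >= 1 - alpha:
                # c is the smallest acceptance number protecting the producer;
                # any larger c only raises P(accept) at the LTPD, so check once.
                if accept_prob(n, c, ltpd) <= beta:
                    return n, c
                break
    return None

n, c = find_plan(aql=0.04, ltpd=0.10, alpha=0.10, beta=0.10)
print(f"sample n = {n}, accept on c <= {c} defectives")
```

Evaluating accept_prob over a grid of p values gives you the OC curve itself, which is the same plot Minitab shows.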

Hope this helps, and good luck!

Thank you, Sarah - the OC (Operating Characteristic) curve is a great tool for illustrating the options to leadership teams and business decision-makers!
Also, as you show in the table above, the ability to “filter” below 5% defective will cost hundreds of items to inspect, while aiming at around 1% costs thousands of individual samples.
This is a kind of “rule of thumb” that I find useful when explaining why downstream testing should only be done where it is not technically avoidable or is, to the point, “prescribed” from a compliance perspective (standards, industrial codes, etc.)…
For many industries, even 1% of defective components slipping through inspection is not acceptable and won't be well received by either the customer or the regulator, I believe.

Duke Okes

FYI:

Hello Michael

Hope this article helps you.

One practical (empirical) thing to consider: how many defective products are you willing to produce before you detect a bad one? Let's say the line starts producing 100% defective items immediately after the last sample, and no one detects it until the next item is sampled. You can then have a discussion of how much a defective unit costs you versus how much it costs to sample a unit. That comparison can help put some bones on a discussion of risk.
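That worst-case scenario turns into simple arithmetic. A hypothetical sketch (the production rate and the two costs are made-up numbers, purely for illustration):

```python
def worst_case_exposure(units_per_hour, samples_per_hour, cost_per_defect):
    """Worst case from the scenario above: the line goes 100% defective
    right after a sample is pulled, and every unit made before the next
    sample is scrap."""
    units_at_risk = units_per_hour / samples_per_hour
    return units_at_risk * cost_per_defect

# Hypothetical numbers: 600 units/hr, $5 scrap cost per unit, $2 per sample.
for samples_per_hour in (1, 2, 5, 10):
    exposure = worst_case_exposure(600, samples_per_hour, 5.0)
    testing = samples_per_hour * 2.0
    print(f"{samples_per_hour:>2} samples/hr: worst-case exposure "
          f"${exposure:,.0f}, testing cost ${testing:.0f}/hr")
```

Laying the two columns side by side makes the risk trade-off concrete: sampling more often is cheap insurance until the testing cost starts to rival the exposure it removes.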