Best Fit (Johnson)
This family of distributions,
published by statistician N.L. Johnson in 1949, is perhaps the most versatile
choice. It is based on a transformation of the standard normal variable,
and includes four forms:
- Unbounded: the
set of distributions that go to infinity in both the upper and lower
tails.
- Bounded: the set
of distributions that have a fixed boundary on either the upper or lower
tail, or both.
- Log Normal: a border
between the Unbounded and Bounded distribution forms.
- Normal: a special
case of the Unbounded form.
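As a rough illustration of how the four forms relate to a standard normal variable, the sketch below maps a data value x to a normal score z. It assumes a common textbook parameterization with shape parameters gamma and eta, a location epsilon, and a scale lam. These names, and the function johnson_z itself, are illustrative and are not taken from this page.

```python
import math


def johnson_z(x, form, gamma=0.0, eta=1.0, epsilon=0.0, lam=1.0):
    """Map x to a normal score z (assumed textbook parameterization)."""
    if form == "SN":
        # Normal: a plain linear transform (epsilon and lam are unused here).
        return gamma + eta * x
    if form == "SL":
        # Log Normal: requires x > epsilon (lam is unused in this sketch).
        return gamma + eta * math.log(x - epsilon)
    if form == "SU":
        # Unbounded: defined for every real x.
        return gamma + eta * math.asinh((x - epsilon) / lam)
    if form == "SB":
        # Bounded: requires epsilon < x < epsilon + lam.
        return gamma + eta * math.log((x - epsilon) / (epsilon + lam - x))
    raise ValueError(f"unknown form: {form!r}")
```

With the default parameters, the midpoint of the bounded range maps to z = 0, as does x = 0 in the unbounded form.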
The choice of form
and fitting parameters allows great flexibility
in adjusting the curve to fit the data. The fact that the Johnson system
involves a transformation of the raw variable to a Normal variable allows
estimates of the percentiles of the fitted distribution to be calculated
from the Normal distribution percentiles, for use in control limit calculations
(on the Individual-X chart) or for Capability Analysis. Thus, although capability
indices and control limits are generally only defined for normal variables,
this approach allows their calculation for any distribution that the Johnson system can fit.
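The percentile calculation described above can be sketched by inverting the transformation. The example assumes an Unbounded (Su) fit of the form z = gamma + eta * asinh((x - epsilon) / lam), and uses the conventional 0.135th and 99.865th percentiles as the equivalents of the normal minus and plus 3 sigma points. The function names and the parameter names are illustrative assumptions, not this page's notation.

```python
import math
from statistics import NormalDist


def su_percentile(p, gamma, eta, epsilon, lam):
    """Percentile of an assumed Su fit, obtained from the normal percentile."""
    z = NormalDist().inv_cdf(p)  # standard normal percentile for p
    # Invert z = gamma + eta * asinh((x - epsilon) / lam) for x.
    return epsilon + lam * math.sinh((z - gamma) / eta)


def percentile_limits(gamma, eta, epsilon, lam):
    """Lower limit, median, and upper limit of the fitted distribution."""
    lower = su_percentile(0.00135, gamma, eta, epsilon, lam)
    median = su_percentile(0.5, gamma, eta, epsilon, lam)
    upper = su_percentile(0.99865, gamma, eta, epsilon, lam)
    return lower, median, upper
```

The three values play the roles that the mean and the 3 sigma limits play for a normal variable, so they can feed an Individual-X chart or a percentile-based capability calculation.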
One of two methods is
used to fit the curve to the data: the Four-Point
Method or the All-Point Method.
The Four-Point method does well for large data sets, since it tends to smooth
the data in fitting the distribution. The All-Point method takes more time
because the fit must be made at all data values rather than just the four
percentile points.
Regardless of the method
used, all the data values are used; they are simply
used differently. The Four-Point method has been shown to give results nearly
identical to the All-Point method for these larger data sets. However, the
All-Point method is generally recommended for overall accuracy,
and given the speed of today's computers, the difference in run time is
rarely a significant concern.
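This page does not say which four percentiles the Four-Point method uses. One widely cited four-percentile scheme is that of Slifker and Shapiro (1980), and the sketch below follows its form-selection step under that assumption. The spacing z = 0.524, the tolerance tol, and both function names are illustrative choices rather than settings documented here.

```python
from statistics import NormalDist


def sample_quantile(sorted_data, p):
    """Interpolated sample quantile at the (n * p + 0.5)-th order statistic."""
    n = len(sorted_data)
    pos = min(max(n * p + 0.5 - 1.0, 0.0), n - 1.0)  # 0-based position
    low = int(pos)
    high = min(low + 1, n - 1)
    frac = pos - low
    return sorted_data[low] + frac * (sorted_data[high] - sorted_data[low])


def choose_form(data, z=0.524, tol=0.05):
    """Pick a Johnson form from four percentiles (assumed selection rule)."""
    xs = sorted(data)
    phi = NormalDist().cdf
    # Sample percentiles matching normal scores -3z, -z, z, 3z.
    x_m3z, x_mz, x_z, x_3z = (
        sample_quantile(xs, phi(k * z)) for k in (-3, -1, 1, 3)
    )
    m = x_3z - x_z    # upper tail spacing
    n = x_mz - x_m3z  # lower tail spacing
    p = x_z - x_mz    # central spacing
    ratio = m * n / (p * p)
    if abs(ratio - 1.0) <= tol:
        return "SL", ratio
    return ("SU" if ratio > 1.0 else "SB"), ratio
```

Although only four percentiles enter the ratio, each one is computed from the full sorted sample, which is the sense in which every data value is still used.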
The Cumulative Distribution
Function of the transformed standardized normal variable z is (Hahn &
Shapiro; Johnson (1983)):
where
Normal form (Sn):
Log Normal form (Sl):
Unbounded form (Su):
Bounded form (Sb):
and for computational
reasons, the data values, x are first transformed to standardized values: