All-Point Method
The All-Point method of curve
fitting provides a best fit based on all of the data points using each of
the four transformations (Bounded, Unbounded, Log Normal, Normal). In the
case of the Normal and Log Normal forms, the regression of the transformed
standardized normal variable on the cumulative Normal distribution involves
only two parameters, resulting in simple linear regression. In the case
of the more general Bounded and Unbounded forms, the estimation of the four
parameters requires the use of the Marquardt nonlinear regression algorithm.
This algorithm, well documented in mathematical literature, iteratively
varies each of the parameters in the direction of best fit until an optimal
solution results (Marquardt). The Kolmogorov-Smirnov (K-S)
test is then applied to each of the Johnson forms to test the significance
of the fit of its transformed (normalized) values. The form having the
least maximum absolute deviation is selected as the final fitted form.
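The selection step can be sketched as follows. This is a minimal illustration, not the program's own code: it assumes the transformed values of each form are compared against the standard Normal cumulative distribution, and the names normal_cdf, ks_statistic and select_form are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative standard Normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_statistic(z_values):
    """Maximum absolute deviation between the empirical CDF of the
    transformed values and the standard Normal CDF (assumed reference)."""
    z_sorted = sorted(z_values)
    n = len(z_sorted)
    d_max = 0.0
    for i, z in enumerate(z_sorted):
        cdf = normal_cdf(z)
        # The empirical CDF steps from i/n to (i+1)/n at z; check both sides.
        d_max = max(d_max, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d_max

def select_form(transformed):
    """transformed maps a form name to its list of transformed values.
    Returns the form with the least maximum absolute deviation."""
    scores = {name: ks_statistic(z) for name, z in transformed.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

Calling select_form with one list of transformed values per Johnson form returns the name of the form with the smallest K-S statistic, together with all of the statistics so they can be inspected.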
An all-points Johnson fit
to data may generate one of two messages even for a fit with a satisfactory
value for the K-S test. Although the fit may be acceptable, tests on the
fit can indicate that there may be an opportunity for improvement in the
fit. The tests are based on the deviations (residuals) between the data
and the fit to the data. There must be more than 6 data points for the
outliers test and more than 9 for the deviations test, with at least 4
positive and 4 negative deviations. If there are not enough data points,
the tests are not done and no message is printed.
Message 1: There
appear to be systematic deviations between the data and the fitted curve,
which may indicate the effect of multiple processes.
This message may be triggered
by either or both of two tests on the randomness of the deviations between
the data and the fit:
- The first test calculates
a neighborhood correlation coefficient, rho, for the residuals between
the data and the fitted curve. The neighborhood correlation coefficient
compares each residual with its previous and next neighbors in sequence.
The message is triggered when rho is significant at the 0.05 level.
In the one-sided test at the 0.05 significance level against the
hypothesis that rho = 0 (Sachs), rho is not significant when:
abs(rho) * sqrt(m - 1) < 1.7
where m is the number of data points.
- The second test looks
for too many or too few runs (a run is a series of consecutive residuals
all of the same sign). This two-sided test at the 0.05 level, against the
hypothesis that the number of runs is equal to the expected number, is (Swed):
number of runs < E - (1.645 * sqrt(V)) - 0.5 (for significantly too few runs)
number of runs > E + (1.645 * sqrt(V)) + 0.5 (for significantly too many runs)
where 0.5 is a continuity
correction, m is the number of data points, m1 is the number of positive
deviations, and the expected number of runs and its variance are (Gibbons):
E(number of runs) = 1 + 2*m1*(m - m1) / m
V(number of runs) = (E - 1)*(E - 2) / (m - 1)
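The two randomness tests above can be sketched as follows. This is an illustration under stated assumptions, not the program's own code. The runs test follows the formulas as given. The text does not give a formula for the neighborhood correlation coefficient, so a lag-1 serial correlation stands in for it here. All function names are hypothetical, and skipping zero residuals in the runs count is also an assumption.

```python
import math

def count_runs(residuals):
    """Return (runs, m1, m): the number of same-sign runs, the number of
    positive residuals, and the number of residuals counted.
    Zero residuals are skipped (an assumption)."""
    signs = [r > 0 for r in residuals if r != 0]
    runs = 1 if signs else 0
    for prev, cur in zip(signs, signs[1:]):
        if cur != prev:
            runs += 1
    return runs, sum(signs), len(signs)

def runs_test(residuals, z=1.645):
    """Runs test with the 0.5 continuity correction.
    Returns None when there are too few data for the test."""
    runs, m1, m = count_runs(residuals)
    # Minimum-data rule: more than 9 data, at least 4 of each sign.
    if m <= 9 or m1 < 4 or (m - m1) < 4:
        return None
    e = 1 + 2 * m1 * (m - m1) / m          # expected number of runs
    v = (e - 1) * (e - 2) / (m - 1)        # variance of the number of runs
    half_width = z * math.sqrt(v)
    return {
        "runs": runs,
        "expected": e,
        "variance": v,
        "too_few": runs < e - half_width - 0.5,
        "too_many": runs > e + half_width + 0.5,
    }

def neighborhood_rho(residuals):
    """Lag-1 serial correlation of the residuals, used as a stand-in for
    the neighborhood correlation coefficient (an assumption)."""
    m = len(residuals)
    mean = sum(residuals) / m
    den = sum((r - mean) ** 2 for r in residuals)
    if den == 0:
        return 0.0
    num = sum((a - mean) * (b - mean)
              for a, b in zip(residuals, residuals[1:]))
    return num / den

def rho_is_significant(residuals, limit=1.7):
    """True when abs(rho) * sqrt(m - 1) reaches the limit."""
    m = len(residuals)
    return abs(neighborhood_rho(residuals)) * math.sqrt(m - 1) >= limit
```

Under these assumptions, Message 1 would be printed when rho_is_significant returns True, or when runs_test reports either too few or too many runs.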
Message 2: There
may be outliers in the data.
The residuals are expected to include as many positive values as negative
values. If m1 is the number of positive deviations of the fit and m is the
number of data points, the expected number of positive deviations is
E(m1) = m / 2. For this two-sided test at the 0.10 level, a rule of thumb
(Duckworth) is:
abs(2*m1 - m) / sqrt(m) < 1.645 for lack of significance.
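The rule of thumb can be sketched as follows. This is a minimal illustration, not the program's own code: the function name outlier_sign_test is hypothetical, and zero residuals are counted as non-positive, which is an assumption.

```python
import math

def outlier_sign_test(residuals, limit=1.645):
    """Sign-balance check on the residuals.
    Returns None when there are not more than 6 data."""
    m = len(residuals)
    if m <= 6:
        return None
    # m1 is the number of positive deviations; zeros count as non-positive.
    m1 = sum(1 for r in residuals if r > 0)
    statistic = abs(2 * m1 - m) / math.sqrt(m)
    return {
        "m1": m1,
        "statistic": statistic,
        # Message 2 applies when the statistic is not below the limit.
        "possible_outliers": statistic >= limit,
    }
```

For example, 9 positive residuals out of 10 give abs(18 - 10) / sqrt(10), which is about 2.53 and exceeds 1.645, so the message would be printed under this sketch.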
References:
Duckworth and Wyatt. "Rapid Statistical Techniques for OR Workers." Operational Research Quarterly, 9 (1958) pp.218+.
Gibbons. Non-Parametric Statistical Inference, 2nd Ed., New York: Marcel Dekker, Inc., 1985.
Sachs. Applied Statistics, Springer, 1982.
Swed and Eisenhart. "Testing Randomness of Grouping in a Sequence." Annals of Mathematical Statistics, 14 (1943) pp.83+.