Interpreting a Scatter Diagram
The x-axis in the diagram measures one characteristic (called the independent
variable), and the y-axis measures the second (called the dependent variable).
If the two characteristics are related, the points plotted in a Scatter
Diagram will cluster in a certain direction and with a certain tightness.
The more closely the cluster resembles a line, the more likely it is that
the two characteristics are linearly correlated.
The relative correlation of one characteristic to another can be seen both
from how closely the points cluster around the fitted line and from the
correlation coefficient in the Statistics window. Coefficient values near +1
or -1 imply very high correlation between the characteristics, meaning that
a change in one characteristic will be accompanied by a change in the other.
Positive correlation means that as one increases, so does the other, and is
shown on the Scatter Diagram as a line with positive slope. Negative
correlation means that as one characteristic increases, the other decreases,
and a negative slope is seen on the Scatter Diagram.
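As a rough illustration of how the sign and size of the correlation coefficient track the pattern in the plot, the sketch below computes the standard sample Pearson coefficient by hand. The function name `pearson_r` and the data values are hypothetical, chosen only for this example; the Statistics window may compute or round its value differently.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
temperature = [10, 20, 30, 40, 50]
rising = [12, 19, 33, 41, 48]    # increases with temperature
falling = [48, 41, 33, 19, 12]   # decreases with temperature

print(pearson_r(temperature, rising))   # positive, close to +1
print(pearson_r(temperature, falling))  # negative, close to -1
```

Both data sets cluster equally tightly around a line, so the two coefficients have the same magnitude and differ only in sign.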
Remember that correlation
does not necessarily mean a cause-and-effect relationship exists. Both
characteristics may be the effect of a number of other causes. All it means
is that there appears to be a relationship between the two over the range
of the data. Be careful not to extrapolate beyond the data region, since
you have no experience upon which to draw.
The confidence lines indicate the bounds of variation that can be expected
for the fitted Regression function. The width of the Confidence Interval
provides an indication of the quality of the fitted Regression function.
The fact that the confidence lines diverge at the ends and converge in the
middle may be explained in one of two ways:
- The regression function, in this case a line, requires estimation of two
  parameters: slope and y-intercept. The error in estimating slope can be
  visualized by imagining the fitted line's slope varying about its middle.
  This results in the hourglass-shaped region shown by the confidence
  intervals.
- The center of the data is located near the middle of the fitted line. The
  ability to predict the regression function should be better where there is
  more data; hence the confidence limits are narrower at the middle.
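Both explanations can be seen in the textbook standard error of a fitted line at a point x0, which is s * sqrt(1/n + (x0 - mean_x)^2 / Sxx). The 1/n term reflects how well the center of the data is known, and the second term grows with distance from the center because of uncertainty in the slope. The sketch below assumes an ordinary least-squares line and this standard formula; the source does not state exactly how its confidence lines are computed, and the function name `mean_response_se` and the data are hypothetical. The plotted half-width would be this value times a t-distribution multiplier for the chosen confidence level.

```python
import math

def mean_response_se(xs, ys, x0):
    """Standard error of the least-squares fitted line's value at x0."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Least-squares estimates of the two parameters of the line.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard deviation, with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # 1/n: uncertainty at the center; second term: slope uncertainty away from it.
    return s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8]

for x0 in (1, 4, 7):
    print(x0, mean_response_se(xs, ys, x0))
```

With these data the mean of x is 4, so the standard error is smallest at x0 = 4 and equally larger at x0 = 1 and x0 = 7, which is the hourglass shape described above.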
See also:
When to Use a Scatter Diagram
Regression Function
Correlation Coefficient
Coefficient of Determination
F Statistic