Interpreting a Scatter Diagram

The x-axis of the diagram measures one characteristic (called the independent variable), and the y-axis measures the second (called the dependent variable). If the two characteristics are related, the points plotted in a Scatter Diagram will cluster in a particular direction and with a particular tightness. The more closely the cluster resembles a line, the more likely it is that the two characteristics are linearly correlated.

The relative correlation of one characteristic to another can be seen both from how closely the points cluster about the line and from the correlation coefficient in the Statistics window. The coefficient ranges from -1 to +1. Values near +1 or -1 imply very high correlation between the characteristics, meaning that a change in one characteristic will be accompanied by a change in the other; values near zero imply little or no linear relationship. Positive correlation means that as one characteristic increases, so does the other, and is shown on the Scatter Diagram as a line with positive slope. Negative correlation means that as one characteristic increases, the other decreases, and is shown as a line with negative slope.
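To make the sign and size of the coefficient concrete, the following is a minimal sketch of the usual sample (Pearson) correlation coefficient in plain Python. The function name `pearson_r` and the data values are hypothetical illustrations, not taken from the software described here, and the sketch does not claim to reproduce how the Statistics window computes its value.

```python
import math

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements (illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2.1, 3.9, 6.2, 8.1, 9.8]    # increases with x
falling = [9.8, 8.1, 6.2, 3.9, 2.1]   # decreases as x increases

r_pos = pearson_r(x, rising)    # close to +1: strong positive correlation
r_neg = pearson_r(x, falling)   # close to -1: strong negative correlation
print(f"r (rising)  = {r_pos:.4f}")
print(f"r (falling) = {r_neg:.4f}")
```

Both hypothetical data sets cluster tightly about a line, so both coefficients have magnitude near one; only the sign differs, matching the direction of the slope.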

Remember that correlation does not necessarily mean that a cause and effect relationship exists. Both characteristics may be the effects of a number of other causes. All it means is that there appears to be a relationship between the two over the range of the data. Be careful not to extrapolate beyond the data region, since you have no data there to support the fitted relationship.

The confidence lines indicate the bounds of variation that can be expected for the fitted Regression function. The width of the Confidence Interval provides an indication of the quality of the fitted Regression function: the narrower the interval, the more precisely the function has been estimated. The fact that the confidence lines diverge at the ends and converge in the middle may be explained in one of two ways:

  1. The regression function, in this case a line, requires estimation of two parameters: slope and y-intercept. The error in estimating slope can be visualized by imagining the fitted line's slope varying about its middle. This results in the hourglass-shaped region shown by the confidence intervals.

  2. The center of the data is located near the middle of the fitted line. The ability to predict the regression function should be better where there is more data; hence the confidence limits are narrower at the middle.
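Both explanations above are reflected in the standard textbook formula for the confidence band of a fitted straight line, whose half-width at a point x0 is t * s * sqrt(1/n + (x0 - mean_x)^2 / Sxx). The sketch below assumes ordinary least squares with the usual error assumptions; the source does not state how its confidence lines are computed, so this is an illustration rather than the software's method. The function names, the data, and the multiplier `t_crit` (2.447, the two-sided 95% Student's t value for 6 degrees of freedom) are supplied here as assumptions.

```python
import math

def fit_line(xs, ys):
    """Least-squares line; returns slope, intercept, residual std. error, mean of x, Sxx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard error uses n - 2 degrees of freedom (two fitted parameters)
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))
    return slope, intercept, s, mean_x, sxx

def band_half_width(x0, n, s, mean_x, sxx, t_crit):
    """Half-width of the confidence band for the fitted line at x0."""
    return t_crit * s * math.sqrt(1.0 / n + (x0 - mean_x) ** 2 / sxx)

# Hypothetical data (illustration only)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.3, 3.1, 4.8, 5.2, 6.9, 7.4, 9.1, 9.6]

slope, intercept, s, mean_x, sxx = fit_line(xs, ys)
t_crit = 2.447  # assumed: two-sided 95% t value for n - 2 = 6 degrees of freedom

w_low = band_half_width(min(xs), len(xs), s, mean_x, sxx, t_crit)
w_mid = band_half_width(mean_x, len(xs), s, mean_x, sxx, t_crit)
w_high = band_half_width(max(xs), len(xs), s, mean_x, sxx, t_crit)
print(f"half-width at x = {min(xs)}: {w_low:.3f}")
print(f"half-width at x = {mean_x}: {w_mid:.3f}")
print(f"half-width at x = {max(xs)}: {w_high:.3f}")
```

The (x0 - mean_x)^2 term is zero at the center of the data and grows toward either end, so the band is narrowest at the mean of x and widens symmetrically away from it, which produces the hourglass shape.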

See also:

When to Use a Scatter Diagram

Regression Function

Correlation Coefficient

Coefficient of Determination

F Statistic



Copyright © 1995-2008 Quality America Inc. All Rights Reserved