L23 & UNIT 4 REVIEW
| | Requirement | How to Check | What you hope to see |
|---|---|---|---|
| 1. | Linear Relationship | Scatterplot | “Hot dog” shape |
| | | Residual Plot | No pattern in the residuals |
| 2. | Normal Error Term | Histogram of the Residuals | A shape that is approximately normal |
| 3. | Constant Variance | Residual Plot | No megaphone shape in the residuals |
| 4. | $X$'s are Known | Cannot be checked directly | $X$'s should be measured |
| 5. | Observations are Independent | Cannot be checked directly | Knowing the value of one of the points tells you nothing about any other points |
By the end of this lesson, you should be able to:
- Confidence Intervals for the slope of the regression line:
  - Calculate and interpret a confidence interval for the slope of the regression line given a confidence level.
  - Identify a point estimate and margin of error for the confidence interval.
  - Show the appropriate connections between the numerical and graphical summaries that support the confidence interval.
  - Check the requirements for the confidence interval.
- Hypothesis Testing for the slope of the regression line:
  - State the null and alternative hypotheses.
  - Calculate the test statistic, degrees of freedom, and p-value of the hypothesis test.
  - Assess the statistical significance by comparing the p-value to the α-level.
  - Check the requirements for the hypothesis test.
  - Show the appropriate connections between the numerical and graphical summaries that support the hypothesis test.
  - Draw a correct conclusion for the hypothesis test.
Creating a scatterplot of bivariate data lets us visualize its shape (linear or nonlinear), direction (positive, negative, or neither), and strength (strong, moderate, or weak).
The correlation coefficient ($r$) is a number between $-1$ and $1$ that tells us the direction and strength of the linear association between two variables. A positive $r$ corresponds to a positive association, while a negative $r$ corresponds to a negative association. A value of $r$ closer to $-1$ or $1$ indicates a stronger association than a value of $r$ closer to zero.
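In practice $r$ comes from software, but a short sketch of the standard formula can make the number less mysterious. The Python below is a minimal illustration, assuming plain lists of paired values; the function name `correlation` and the data are made up for this example.

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient r for paired data (x_i, y_i)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists of at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, made up for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(correlation(x, y), 3))  # close to 1: strong positive association
```

Python 3.10 and later also provide `statistics.correlation(x, y)`, which computes the same Pearson coefficient.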
In statistics, we write the linear regression equation as $\hat{Y} = b_0 + b_1 X$, where $b_0$ is the $Y$-intercept of the line and $b_1$ is the slope of the line. The values of $b_0$ and $b_1$ are calculated using software.
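Software obtains $b_0$ and $b_1$ by least squares. As a sketch of what that computation looks like (not the code of any particular package), the standard formulas are $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$. The function name and data below are hypothetical.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```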
Linear regression allows us to predict values of $Y$ for a given $X$. This is done by first calculating the coefficients $b_0$ and $b_1$ and then plugging in the desired value of $X$ and solving for $\hat{Y}$.
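Once the coefficients are known, prediction is plain arithmetic. A minimal Python sketch; the coefficient values are hypothetical placeholders standing in for numbers reported by software.

```python
def predict(b0, b1, x):
    """Plug x into the fitted line: y_hat = b0 + b1 * x."""
    return b0 + b1 * x

# Hypothetical coefficients, standing in for values reported by software
b0, b1 = 0.05, 1.99
for x_new in [2.5, 6]:
    print(x_new, round(predict(b0, b1, x_new), 2))
```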
The independent (or explanatory) variable ($X$) is the variable that is not affected by what happens to the other variable. The dependent (or response) variable ($Y$) is the variable that is affected by what happens to the other variable. For example, in the relationship between the number of powerboats and the number of manatee deaths, the number of deaths is affected by the number of powerboats in the water, but not the other way around. So we would let $X$ represent the number of powerboats and $Y$ represent the number of manatee deaths.
The unknown true linear regression line is $E\{Y\} = \beta_0 + \beta_1 X$, where $\beta_0$ is the true $Y$-intercept of the line and $\beta_1$ is the true slope of the line.
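Written with an error term, this model is commonly stated as follows; the normality and constant-variance requirements in the table above are assumptions about the error $\epsilon$:

$$Y = \beta_0 + \beta_1 X + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)$$

so the mean of $Y$ at a given $X$ lies on the true line, and every $Y$ scatters around that line with the same standard deviation $\sigma$.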
A residual is the difference between the observed value of $Y$ for a given $X$ and the predicted value $\hat{Y}$ on the regression line for the same $X$. It can be expressed as:

$$\text{Residual} = Y - \hat{Y}$$
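A minimal Python sketch of computing residuals, assuming the coefficients are already known. The data are made up, and 0.05 and 1.99 are the least-squares intercept and slope for these particular made-up points; the function name `residuals` is hypothetical.

```python
def residuals(xs, ys, b0, b1):
    """Residual for each point: observed y minus predicted y_hat = b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data and the least-squares coefficients for it
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99
res = residuals(x, y, b0, b1)
print([round(r, 2) for r in res])
# For a least-squares line, the residuals add up to (essentially) zero
print(round(sum(res), 10))
```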
To check all the requirements for bivariate inference, you will need to create a scatterplot of $X$ and $Y$, a residual plot, and a histogram of the residuals.
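The sketch below uses only the Python standard library to gather the numbers behind each of the three plots and to print a crude text histogram of the residuals. The function names and data are hypothetical; in practice you would draw real plots in your statistics software (in Python, for example, with matplotlib's `scatter` and `hist`).

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1

def plot_data(xs, ys):
    """Points behind the scatterplot, the residual plot, and the histogram."""
    b0, b1 = fit_line(xs, ys)
    res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    scatter = list(zip(xs, ys))          # scatterplot: Y against X
    residual_plot = list(zip(xs, res))   # residual plot: residual against X
    return scatter, residual_plot, res   # res is what the histogram bins

def text_histogram(values, bins=5):
    """Print one row of '*' per equal-width bin and return the bin counts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0      # guard against all-equal values
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * width:+8.3f} | {'*' * c}")
    return counts

# Hypothetical data for illustration
x = list(range(1, 11))
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2, 19.1, 20.9]
scatter, residual_plot, res = plot_data(x, y)
for xi, ri in residual_plot:
    print(f"x = {xi:2d}   residual = {ri:+.3f}")  # hope for no pattern, no megaphone
counts = text_histogram(res)                    # hope for a roughly normal shape
```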
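We conduct a hypothesis test on bivariate data to determine whether there is a linear relationship between the two variables. To do this, we test whether the true slope ($\beta_1$) equals zero; a true slope of zero would mean that $Y$ does not change linearly with $X$. The appropriate hypotheses for this test are:

$$H_0: \beta_1 = 0$$
$$H_a: \beta_1 \neq 0$$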
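Software reports the test statistic, degrees of freedom, and p-value for this test. As a sketch of the usual formulas: $t = (b_1 - 0)/SE(b_1)$, with $SE(b_1) = \sqrt{MSE / \sum (X_i - \bar{X})^2}$, $MSE = SSE/(n-2)$, and $df = n - 2$. Because the Python standard library has no $t$ distribution, this sketch approximates the p-value by integrating the $t$ density with Simpson's rule; that numerical method is our own choice for illustration, not how any particular package does it. Function names and data are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] and the symmetry of T."""
    a = abs(t)
    if a == 0:
        return 1.0
    steps = 2 * max(1000, math.ceil(a * 100))   # even number, step <= 0.005
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                        # P(0 <= T <= |t|)
    return max(0.0, 1 - 2 * area)

def slope_t_test(xs, ys):
    """Test H0: beta1 = 0 against Ha: beta1 != 0; return (b1, t, df, p)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    if sse == 0:
        raise ValueError("points lie exactly on a line; t is undefined")
    df = n - 2
    se_b1 = math.sqrt(sse / df / s_xx)   # standard error of the slope
    t = (b1 - 0) / se_b1                 # hypothesized slope under H0 is 0
    return b1, t, df, two_sided_p_value(t, df)

# Hypothetical data for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, t, df, p = slope_t_test(x, y)
print(f"b1 = {b1:.3f}, t = {t:.2f}, df = {df}, p-value = {p:.2g}")
```

For comparison, SciPy's `scipy.stats.linregress(x, y)` reports the slope together with a two-sided `pvalue` for this same null hypothesis.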
For bivariate inference, we use software to calculate the sample coefficients, residuals, test statistic, p-value, and confidence intervals for the true linear regression coefficients.
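The confidence interval for the slope has the form point estimate ± margin of error, that is $b_1 \pm t^* \cdot SE(b_1)$ with $df = n - 2$. The sketch below finds $t^*$ by bisection on a Simpson's-rule approximation of the $t$ distribution; this numerical approach is an illustrative assumption (software uses its own routines), and the function names and data are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def central_area(a, df):
    """P(0 <= T <= a) for a >= 0, by Simpson's rule."""
    if a == 0:
        return 0.0
    steps = 2 * max(1000, math.ceil(a * 100))   # even number, step <= 0.005
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def t_critical(confidence, df):
    """t* such that P(-t* <= T <= t*) = confidence, found by bisection."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    target = confidence / 2               # required area between 0 and t*
    lo, hi = 0.0, 1.0
    while central_area(hi, df) < target:  # widen the bracket until it holds t*
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if central_area(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_confidence_interval(xs, ys, confidence=0.95):
    """Return (b1, margin of error, (lower, upper)) for the true slope beta1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx                      # point estimate
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    se_b1 = math.sqrt(sse / df / s_xx)    # standard error of the slope
    margin = t_critical(confidence, df) * se_b1
    return b1, margin, (b1 - margin, b1 + margin)

# Hypothetical data for illustration
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, margin, (ci_low, ci_high) = slope_confidence_interval(x, y, 0.95)
print(f"b1 = {b1:.3f} +/- {margin:.3f}  ->  ({ci_low:.3f}, {ci_high:.3f})")
```

If the interval does not contain 0, that agrees with rejecting $H_0: \beta_1 = 0$ at the matching α-level.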