Summary
Difficulty: ★★★★☆
Covers: Simple linear regression, Regression vs correlation, Regression line and prediction equation, Residuals and prediction error, Explained variance and R-squared, Hypothesis testing of the slope, Regression assumptions, Running and interpreting regression in Stata
What is Regression?
Simple linear regression predicts a numeric outcome (Y / DV) from one numeric predictor (X / IV).
- Correlation: describes the relationship between X and Y
- Regression: uses X to predict Y (and explains variation in Y)
Note: Regression is an extension of correlation.
Regression ≠ Causation
Regression is usually used in non-experimental (correlational) designs, so:
- If X predicts Y, that does not prove X causes Y
- Causation needs appropriate design (often experimental + converging evidence)
Regression vs Correlation
- Correlation (r): describes the strength and direction of a linear relationship
- Regression: uses that relationship to predict Y from X
In simple linear regression there is exactly one predictor (X) and one outcome (Y).
When Do We Use Regression?
Common designs:
- Cross-sectional surveys (measure many variables once)
- Longitudinal studies (use earlier measures to predict later outcomes)
The Regression Line + Equation
Regression finds the line of best fit on a scatterplot.
Regression equation
ŷ = a + bX
| Symbol | Meaning | Plain English |
|---|---|---|
| ŷ | predicted Y | predicted score on the outcome |
| a | intercept (alpha) | predicted Y when X = 0 |
| b | slope (beta) | change in predicted Y for a 1-unit increase in X |
Slope interpretation example:
If b=0.5, then every +1 in X predicts +0.5 in Y.
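As a quick numeric sketch of the prediction equation (the intercept and slope values here are hypothetical, chosen only for illustration):

```python
# Hypothetical intercept and slope (illustrative values, not from real data)
a = 2.0   # intercept: predicted Y when X = 0
b = 0.5   # slope: change in predicted Y per 1-unit increase in X

def predict(x):
    """Prediction equation: y-hat = a + b*X."""
    return a + b * x

print(predict(0))   # intercept alone -> 2.0
print(predict(1))   # one unit of X adds b -> 2.5
print(predict(10))  # 2.0 + 0.5*10 -> 7.0
```

Each extra unit of X shifts the prediction by exactly b, which is why the slope is the key quantity to interpret.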
Important: What the Slope (b) Tells You
The slope (b) is the key result.
- Positive b → higher X predicts higher Y
- Negative b → higher X predicts lower Y
Example:
- b = 0.26 → every extra 1 unit of X predicts +0.26 units in Y
What are Residuals? (Errors)
A residual is the difference between:
- observed Y
- predicted Y
Regression uses least squares to fit the line: it minimises the sum of the squared residuals.
Small residuals → good prediction
Large residuals → poor prediction
| Residual sign | Means |
|---|---|
| Positive | point is above the regression line |
| Negative | point is below the regression line |
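A minimal sketch of how residuals fall out of a least-squares fit, using a tiny made-up dataset (all numbers hypothetical):

```python
# Tiny made-up dataset (hypothetical values)
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx            # 0.8 for this data
a = y_bar - b * x_bar    # 1.5 for this data

# Residual = observed Y - predicted Y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals)                 # positive -> above the line, negative -> below
print(round(sum(residuals), 9))  # least squares forces residuals to sum to ~0
```

Note that the residuals always sum to (approximately) zero for a least-squares line with an intercept; what varies is how large they are individually.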
Variance + R² (Why Regression Works)
Regression explains variance in Y.
R-squared (R²)
R² = (variance explained by the model) / (total variance in Y)
- R² ranges from 0 to 1
- Often expressed as a percentage
Examples:
- R² = .03 → 3% of variance explained (small)
- R² = .25 → 25% of variance explained (large)
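Continuing the same kind of tiny made-up dataset, R² can be computed as explained variance over total variance, or equivalently 1 minus unexplained over total (a sketch, not Stata output):

```python
# Tiny made-up dataset (hypothetical values) and its least-squares fit
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

ss_total = sum((y - y_bar) ** 2 for y in ys)                    # total variation in Y
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
r_squared = 1 - ss_resid / ss_total

print(round(r_squared, 4))  # ~0.64: the model explains ~64% of the variance in Y
```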
Hypothesis Testing in Regression
In regression, we test only the slope.
Hypotheses
- H₀: b=0 (X does not predict Y)
- H₁: b ≠ 0 (X does predict Y)
Decision rule:
- p < .05 → significant predictor
- p ≥ .05 → not significant
The test statistic is: t = b / SE(b)
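A sketch of the slope test on the same tiny made-up dataset: SE(b) is estimated from the residual variance (with n − 2 degrees of freedom), then t = b / SE(b). The p-value lookup needs a t-distribution table or a stats library, so it is omitted here:

```python
import math

# Tiny made-up dataset (hypothetical values) and its least-squares fit
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
a = y_bar - b * x_bar

# Residual variance uses n - 2 degrees of freedom (two estimated parameters: a and b)
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
se_b = math.sqrt((ss_resid / (n - 2)) / sxx)

t = b / se_b
print(round(t, 3))  # the t statistic Stata reports in the coefficient table
```

With n − 2 degrees of freedom, this t statistic is compared against the t-distribution to get the p-value shown in the regression output.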
Assumption Checks
You only need to check three things:
- Relationship between X and Y looks roughly linear
- Residuals are approximately normal
- Residuals show constant spread (no clear pattern)
If these look reasonable → interpret results.
Running Regression in Stata (Commands)
regress y x
Visual check
graph twoway (scatter y x) (lfit y x)
Residual checks
predict r, residual
histogram r
swilk r
rvfplot, yline(0)
Reading Stata Output
| Output piece | What it tells you |
|---|---|
| b (coefficient) | direction and size of prediction |
| p-value | is X a significant predictor? |
| R² | how much variance in Y is explained |
Using the Regression Equation to Predict
Once you have a and b, you can predict Y for any X: ŷ = a + bX
This is how regression is used for:
- prediction
- policy decisions
- real-world forecasting
How to Write the Result For Reports
Significant
X significantly predicted Y, b = __, p = __, explaining __% of the variance (R²).
Not significant
X did not significantly predict Y, b = __, p = __.