ENGR200 Term Project — Excel-Only (GPA Demo)

Instructor: Dr. Maggie Foley • Course: Engineering Statistics • Individual Assignment • This page shows every step in Excel only. Students follow this workflow, then scale to their own approved dataset.

0) Copy the GPA dataset

Paste into Excel at A1. You’ll have columns A:F.
Expected quick checks (for this demo dataset):
Mean GPA ≈ 3.322; Mean StudyHours ≈ 10.55; Mean Attendance ≈ 87.05; Mean SAT ≈ 1231.5.

1) Descriptive statistics (mean, median, stdev, count)

Assume numeric columns in A:D = GPA, StudyHours, Attendance, SAT (rows 2..21).

P2 (Count GPA): =COUNT($A$2:$A$21) P3 (Mean GPA): =AVERAGE($A$2:$A$21) P4 (Median GPA): =MEDIAN($A$2:$A$21) P5 (StdDev GPA): =STDEV.S($A$2:$A$21) Q2 (Count Study): =COUNT($B$2:$B$21) Q3 (Mean Study): =AVERAGE($B$2:$B$21) Q4 (Median Study): =MEDIAN($B$2:$B$21) Q5 (StdDev Study): =STDEV.S($B$2:$B$21) R2 (Count Attend): =COUNT($C$2:$C$21) R3 (Mean Attend): =AVERAGE($C$2:$C$21) R4 (Median Attend): =MEDIAN($C$2:$C$21) R5 (StdDev Attend): =STDEV.S($C$2:$C$21) S2 (Count SAT): =COUNT($D$2:$D$21) S3 (Mean SAT): =AVERAGE($D$2:$D$21) S4 (Median SAT): =MEDIAN($D$2:$D$21) S5 (StdDev SAT): =STDEV.S($D$2:$D$21)
Expected: Mean GPA ≈ 3.322, Median ≈ 3.27, SD ≈ 0.323.

2) Histograms (Toolpak or FREQUENCY)

Toolpak: Data → Data Analysis → Histogram → Input Range: A2:A21 → Output Range: choose a blank area → Chart Output.

No Toolpak: set bin edges (e.g., 2.6, 2.8, 3.0, …, 4.0) in a column, then use array formula FREQUENCY.

(Assume bins in U2:U10; put this in V2 and confirm as dynamic array in 365; older Excel: Ctrl+Shift+Enter) V2: =FREQUENCY($A$2:$A$21,$U$2:$U$10) Then insert a Column Chart on U2:V10.

3) Scatter plots (GPA vs predictors)

Insert → Scatter → select (B2:B21 vs A2:A21) for GPA vs StudyHours; repeat for Attendance and SAT. Add trendline if desired.

4) Correlation matrix (numeric)

Use CORREL across A:D.

(Place headers somewhere: GPA, Study, Attend, SAT in X1:AA1 and again in W2:W5; then:) X2: =CORREL($A$2:$A$21,$A$2:$A$21) (GPA,GPA) Y2: =CORREL($A$2:$A$21,$B$2:$B$21) (GPA,Study) Z2: =CORREL($A$2:$A$21,$C$2:$C$21) (GPA,Attend) AA2:=CORREL($A$2:$A$21,$D$2:$D$21) (GPA,SAT) ... fill symmetrical for Study, Attend, SAT rows
Expected (approx): corr(GPA,Study) ≈ 0.991; corr(GPA,Attend) ≈ 0.965; corr(GPA,SAT) ≈ 0.982.

5) Simple Linear Regression: GPA ~ StudyHours

Two options: (A) quick functions, or (B) full matrix method.

A) Quick functions

Slope (β1): =SLOPE($A$2:$A$21,$B$2:$B$21) Intercept (β0): =INTERCEPT($A$2:$A$21,$B$2:$B$21) R^2: =RSQ($A$2:$A$21,$B$2:$B$21) ŷ for each i: =$Intercept$ + $Slope$*B2 Residual e_i: =A2 - ŷ MSE: =SUMSQ(e_range)/(COUNT(e_range)-2) SE(β1): =SQRT(MSE/SUMXMY2(B2:B21,AVERAGE(B2:B21))) t(β1): =Slope/SE(β1) p(β1 two-sided): =T.DIST.2T(ABS(t), COUNT(e_range)-2)
Expected (approx): β0 ≈ 2.4141, β1 ≈ 0.08605, R² ≈ 0.9827, t(β1) ≈ 31.98 (p < 1e−12).

B) Matrix method (teaches mechanics)

Let X be [1, Study] in G2:H21, Y=A2:A21. G2: =1 (fill to G21) H2: =B2 (fill to H21) β̂ (in N2:N3): =LET(X,$G$2:$H$21,Y,$A$2:$A$21, MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)), MMULT(TRANSPOSE(X),Y))) ŷ (O2:O21): =MMULT($G$2:$H$21,$N$2:$N$3) e (P2:P21): =A2-O2 SSE (P23): =SUMSQ(P2:P21) df (P24): =ROWS(A2:A21)-ROWS(N2:N3) MSE (P25): =P23/P24 (X′X)^{-1} (Q2:R3): =MINVERSE(MMULT(TRANSPOSE($G$2:$H$21),$G$2:$H$21)) SE β̂_j (S2): =SQRT($P$25*INDEX($Q$2:$R$3,ROW($A1),ROW($A1))) (fill down) t (T2): =N2/S2 (fill down) p (U2): =T.DIST.2T(ABS(T2),$P$24) (fill down)

6) Multiple Linear Regression (≥4 predictors)

Y = GPA. Predictors: StudyHours (B), Attendance (C), SAT (D), dummies for Major (E) and Gender (F). Baselines: STEM and F.

Build design matrix X (7 columns)

G2: =1 H2: =B2 (StudyHours) I2: =C2 (Attendance) J2: =D2 (SAT) K2: =--($E2="Business") (Major_Business dummy) L2: =--($E2="Arts") (Major_Arts dummy) M2: =--($F2="M") (Gender_M dummy) ↓ Fill G2:M2 down to row 21

Estimate β̂ = (X′X)^{-1}X′Y and full inference

β̂ (N2:N8): =LET(X,$G$2:$M$21,Y,$A$2:$A$21, MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)), MMULT(TRANSPOSE(X),Y))) ŷ (O2:O21): =MMULT($G$2:$M$21,$N$2:$N$8) e (P2:P21): =A2-O2 SSE (P23): =SUMSQ(P2:P21) df (P24): =ROWS(A2:A21)-ROWS(N2:N8) MSE (P25): =P23/P24 (X′X)^{-1} (Q2:W8): =MINVERSE(MMULT(TRANSPOSE($G$2:$M$21),$G$2:$M$21)) SE β̂_j (R2): =SQRT($P$25*INDEX($Q$2:$W$8,ROW($A1),ROW($A1))) ← put in R2; fill down to R8 t (S2): =N2/R2 ← fill down p (T2): =T.DIST.2T(ABS(S2),$P$24) ← fill down α (U1): 0.05 CI lo (U2): =N2 - T.INV.2T($U$1,$P$24)*R2 ← fill down CI hi (V2): =N2 + T.INV.2T($U$1,$P$24)*R2 ← fill down ȳ (W10): =AVERAGE($A$2:$A$21) SST (W11): =SUMPRODUCT(($A$2:$A$21-$W$10)^2) SSR (W12): =$W$11-$P$23 #slopes (W13): =COLUMNS($G$2:$M$2)-1 R² (W14): =$W$12/$W$11 Adj R² (W15): =1-(1-$W$14)*(ROWS($A$2:$A$21)-1)/$P$24 F (W16): =($W$12/$W$13)/$P$25 F p (W17): =F.DIST.RT($W$16,$W$13,$P$24)
Expected (approx):
β̂: Intercept ≈ 1.15486, Study ≈ 0.05222, Attend ≈ 0.01263, SAT ≈ 0.000435, Biz ≈ -0.05425, Arts ≈ 0.01571, Male ≈ -0.01290.
R² ≈ 0.99197, adj-R² ≈ 0.98826, df ≈ 13, MSE ≈ 0.001225.

7) OPTIONAL — Residual analysis, leverage & Cook’s D Optional

Recommended for distinction/curiosity. Not required for the baseline project grade. Use if you want to investigate outliers, influence, or normality more rigorously.

A) Leverage hᵢ, standardized residuals, Cook’s D (click to expand)
h_i (AC2): =INDEX(MMULT(MMULT($G2:$M2,$Q$2:$W$8),TRANSPOSE($G2:$M2)),1,1) std res (AD2): =$P2/SQRT($P$25*(1-AC2)) Cook D (AE2): =($P2^2/($P$25*COLUMNS($G$2:$M$2)))*(AC2/(1-AC2)^2) flag line (AF1):=4/ROWS($A$2:$A$21) ↓ Fill AC2:AE2 to row 21
Rule of thumb: flag Cook’s D > 4/n (≈ 0.20) or > 1, then investigate data quality/context.
B) QQ plot (click to expand)
Sort residuals into AG2:AG21 (smallest→largest). Rank (AH2): =ROW()-1 p_i (AI2): =(AH2-0.5)/ROWS($AG$2:$AG$21) z_i (AJ2): =NORM.S.INV(AI2) Plot (z_i on x, sorted residuals on y) → check near-straight line.

8) Multicollinearity (VIF)

VIFj = 1/(1 − R²j) from regressing Xj on the other predictors (with intercept). Quick via LINEST on auxiliaries:

Study (AA2): =1/(1-INDEX(LINEST($H$2:$H$21,CHOOSE({1,2,3,4,5},$I$2:$I$21,$J$2:$J$21,$K$2:$K$21,$L$2:$L$21,$M$2:$M$21),TRUE,TRUE),3,1)) Attend (AA3): =1/(1-INDEX(LINEST($I$2:$I$21,CHOOSE({1,2,3,4,5},$H$2:$H$21,$J$2:$J$21,$K$2:$K$21,$L$2:$L$21,$M$2:$M$21),TRUE,TRUE),3,1)) SAT (AA4): =1/(1-INDEX(LINEST($J$2:$J$21,CHOOSE({1,2,3,4,5},$H$2:$H$21,$I$2:$I$21,$K$2:$K$21,$L$2:$L$21,$M$2:$M$21),TRUE,TRUE),3,1)) Biz (AA5): =1/(1-INDEX(LINEST($K$2:$K$21,CHOOSE({1,2,3,4,5},$H$2:$H$21,$I$2:$I$21,$J$2:$J$21,$L$2:$L$21,$M$2:$M$21),TRUE,TRUE),3,1)) Arts (AA6): =1/(1-INDEX(LINEST($L$2:$L$21,CHOOSE({1,2,3,4,5},$H$2:$H$21,$I$2:$I$21,$J$2:$J$21,$K$2:$K$21,$M$2:$M$21),TRUE,TRUE),3,1)) Male (AA7): =1/(1-INDEX(LINEST($M$2:$M$21,CHOOSE({1,2,3,4,5},$H$2:$H$21,$I$2:$I$21,$J$2:$J$21,$K$2:$K$21,$L$2:$L$21),TRUE,TRUE),3,1))
Expected (approx): VIFs — Study ≈ 38.0, Attend ≈ 29.0, SAT ≈ 37.4, Biz ≈ 2.25, Arts ≈ 6.99, Male ≈ 1.29. High VIFs on numeric X indicate strong collinearity.

9) What to submit (Term Project)

  1. Excel workbook with all sheets for steps 1–6 (required). Step 7 (residuals/Cook’s/QQ) is optional—include if you use it.
  2. 5–8 page Word report: dataset/variables, methods, key tables/figures (with captions), findings, assumptions/limits.
  3. Interpretation: one numeric and one dummy coefficient; discuss multicollinearity; comment on residuals if you did step 7.

Tip: Master the GPA demo first; then repeat the same steps on your approved dataset.

10) (Optional) Analysis Toolpak shortcuts

11) Why Excel here? When to use Stata / R / Python instead?

Excel is for learning transparency. You see every step (build X, compute β̂, residuals, MSE, SE/t/p, CI/PI, VIF). For real work, use Stata / R / Python (faster, robust/clustered SE, better diagnostics, reproducible scripts/notebooks). This page is a demonstration so you can produce every result by hand in Excel, then you can switch to those packages confidently.