ENGR200 Term Project — Excel Demo (Statics Grade)
Instructor: Dr. Maggie Foley • Course: Engineering Statistics • Individual Assignment.
This page mirrors the Term Project Guide using the official dataset
ENGR200_Term_Project_Data.xlsx (Statics course). It shows how to do the project
in Excel using built-in functions and (optionally) the Analysis Toolpak.
You may choose either:
(A) Analysis Toolpak for descriptive stats & regression, or
(B) Excel functions/matrix formulas as shown here.
If you use Excel functions/matrix, you must submit the Excel file (see Section 9).
0) Open the Statics dataset
Download and open the Excel file ENGR200_Term_Project_Data.xlsx from Blackboard. You should see 150 students with the following columns:
- A: Student_ID
- B: HS_GPA
- C: SAT_Math
- D: SAT_Verbal
- E: Study_Hours_per_week
- F: Attendance_%
- G: Class_Participation
- H: Statics_Grade (your Y)
Snippet of the first few rows (for a quick sanity check only):
Mean HS_GPA ≈ 3.17 • Mean Study_Hours_per_week ≈ 11.61
Mean Attendance_% ≈ 89.79 • Mean Statics_Grade ≈ 83.54.
If your means are wildly different, your ranges are probably wrong.
1) Descriptive statistics (mean, median, stdev, count)
Compute these for every numeric variable B:H. Example layout below uses rows 2..151 as data and puts summaries in columns P:V.
2) Histograms (Toolpak or FREQUENCY)
Make at least one histogram for Statics_Grade and one for a predictor (e.g., HS_GPA or SAT_Math).
Option A — Analysis Toolpak:
Data → Data Analysis → Histogram
• Input Range: for Statics_Grade use $H$2:$H$151
• Bin Range: choose a column of bin cut points (e.g., 50, 60, 70, …, 100)
• Check Labels if you included the header
• Choose an Output Range and check Chart Output.
Option B — FREQUENCY (no Toolpak):
Put Statics_Grade bins in U2:U12 (e.g., 50, 60, 65, …, 100), then use FREQUENCY.
3) Scatter plots (Statics_Grade vs predictors)
Make scatterplots with Statics_Grade (H) as the vertical axis and a predictor as the horizontal axis. Examples:
- Statics_Grade vs HS_GPA (B)
- Statics_Grade vs SAT_Math (C)
- Statics_Grade vs SAT_Verbal (D)
- Statics_Grade vs Study_Hours_per_week (E)
One method: select the X and Y columns (e.g., E1:E151 and H1:H151 while holding Ctrl), then Insert → Scatter → first option. Add a trendline if you want a visual slope.
4) Correlation matrix (numeric variables)
Build a correlation matrix for numeric variables B:H using CORREL. Example layout: headers B:H across the top and down the left; fill with CORREL formulas.
corr(Statics, SAT_Verbal) ≈ 0.14 • corr(Statics, Study_Hours) ≈ −0.10 • corr(Statics, Attendance) ≈ 0.09
All are fairly weak in this demo dataset.
5) Simple Linear Regression: Statics_Grade ~ Study_Hours
Here we regress Y = Statics_Grade (H) on X = Study_Hours_per_week (E). We use direct Excel functions for β, R², and t.
A) Direct Excel functions (recommended)
β₀ ≈ 85.80 • β₁ (Study_Hours) ≈ −0.195
R² ≈ 0.010 (almost no explanatory power) • t(β₁) ≈ −1.24, p ≈ 0.22.
B) Toolpak alternative (no extra formulas)
Data → Data Analysis → Regression
• Input Y Range: $H$2:$H$151 (Statics_Grade)
• Input X Range: $E$2:$E$151 (Study_Hours_per_week)
• Check Labels, choose Output Range, check Residuals if desired
→ Toolpak will give β₀, β₁, SE, t, p, R² directly.
6) Multiple Linear Regression (Statics_Grade on several predictors)
Now build a multiple regression model with Y = Statics_Grade (H) and several X’s: HS_GPA (B), SAT_Math (C), SAT_Verbal (D), Study_Hours (E), Attendance_% (F), Class_Participation (G). You may choose a subset that makes sense for your project.
Option 1 — Analysis Toolpak (easiest, strongly recommended)
1. Data → Data Analysis → Regression
2. Input Y Range: $H$2:$H$151 (Statics_Grade)
3. Input X Range: $B$2:$G$151 (all six predictors) or a subset
4. Check Labels; choose an Output Range on a new sheet (e.g., "MLR_Toolpak")
5. Under Residuals, check Residuals (and optionally Residual Plots)
6. Click OK — you now have β’s, SE, t, p, R², Adjusted R², F, etc.
If you choose this route, you can interpret coefficients and model fit directly from the Toolpak regression table. See Section 9 for what to submit.
Option 2 — Excel functions + matrix method (for transparency)
This option reproduces Toolpak’s regression using matrix formulas. It’s more work but shows the mechanics. Use if you want to see the “inside” of OLS or if Toolpak is unavailable.
6.1 Build the design matrix X (7 columns)
6.2 Estimate β̂, ŷ, residuals, R², F, etc.
6.3 Standard errors, t-tests, and confidence intervals
β̂: Intercept ≈ 57.6, HS_GPA ≈ 4.10, SAT_Math ≈ 0.018, SAT_Verbal ≈ 0.037,
Study_Hours ≈ −0.162, Attendance ≈ 0.144, Participation ≈ 0.180.
R² ≈ 0.206, adj-R² ≈ 0.174, df error ≈ 143, MSE ≈ 93.20.
7) OPTIONAL — Residual analysis, leverage & Cook’s D Optional
Optional but great for students aiming for distinction. Use these only if you want to check for outliers/influential points and normality more rigorously.
A) Leverage hᵢ, standardized residuals, Cook’s D (click to expand)
B) QQ plot of residuals (click to expand)
8) Multicollinearity (VIF)
VIFj = 1/(1 − R²j) where R²j comes from regressing Xj on the other predictors (with an intercept). We can use LINEST for each predictor in turn.
so multicollinearity is not a serious issue in the full model.
Special Note — Using SPSS to Get VIF
Excel’s Analysis Toolpak does not report VIF directly. You can either use the Excel VIF formulas in this section, or you can run the same regression in SPSS and let SPSS calculate VIF for you.
SPSS menu path for VIF:
Practical shortcut: run the full model in Excel, then replicate the same Y and X’s in SPSS once, just to read off the Tolerance and VIF values and copy them into your report.
9) What to submit (Term Project)
-
Word report (5–8 pages) — required for everyone.
- Describe the dataset and selected variables.
- Summarize descriptive stats, histograms, correlation matrix.
- Present your regression model(s) with tables (coefficients, SE, t, p, R², F).
- Interpret one numeric coefficient and (if you used any) one dummy coefficient.
- Discuss multicollinearity using your VIFs.
- Comment briefly on residuals/diagnostics if you used Section 7.
-
If you used the Analysis Toolpak for regression/descriptives (Regression, Descriptive
Statistics, Histogram):
- Include the original Toolpak output tables/figures (pasted or screenshot) in your Word report.
- You do not have to submit the Excel file, but you may attach it if you wish.
-
If you used Excel functions / matrix formulas (SLOPE, INTERCEPT, RSQ, CORREL, MMULT, MINVERSE, etc.):
- Submit your Excel workbook showing all steps (descriptives, histograms, scatterplots, simple regression, multiple regression, VIF, and any diagnostics you used).
- Submit your Word report as above.
Practical strategy: master this Statics demo first (follow steps 1–8), then repeat the same workflow on your own approved model for the project.
10) (Optional) Toolpak shortcuts vs functions
- Descriptive Statistics (Toolpak): quick table of mean, median, SD, min, max.
- Histogram (Toolpak): automatic bins + chart.
- Regression (Toolpak): β, SE, t, p, R², adjusted R², ANOVA all at once.
- Functions path: COUNT, AVERAGE, MEDIAN, STDEV.S, CORREL, SLOPE, INTERCEPT, RSQ, T.DIST.2T, F.DIST.RT, MMULT, MINVERSE, etc., give the same results but let you see each step.
11) Why Excel here? When to use Stata / R / Python instead?
Excel is for learning transparency. In this project you see every step: building X, computing β̂, residuals, MSE, standard errors, t-tests, confidence intervals, and VIF. For serious professional work, you would normally use Stata, R, or Python: they handle big datasets, robust/clustered SEs, better graphics, and fully reproducible code. This assignment is a bridge so that when you move to those tools, you already understand exactly what each command is doing.