ENGR200 Term Project — Excel ToolPak Version (Statics & GPA Dataset)

Instructor: Dr. Maggie Foley • Course: Engineering Statistics • Individual Assignment.
This page shows how to do the project using the Excel Analysis ToolPak with the full ENGR200 dataset (150 students). Practice here, then repeat the same steps on your own approved dataset.

0) Open the ENGR200 project dataset

Download the Excel file ENGR200_Term_Project_Data.xlsx from Blackboard / course website and open the sheet ENGR200_Term_Project_Data_GPA.

Columns (Excel layout): A: Student_ID (1–150, ID only — do NOT use as X or Y) B: HS_GPA (high school GPA, 0–4 scale) C: SAT_Math (200–800) D: SAT_Verbal (200–800) E: Study_Hours_per_week (self-reported hours per week) F: Attendance_% (% of Statics classes attended) G: Class_Participation (0–10, instructor rating) H: Statics_Grade (0–100, numeric final course grade)

Quick data sanity checks (for this file):
• n = 150 students (rows 2–151).
• Mean HS_GPA ≈ 3.17, Min ≈ 2.15, Max = 4.00.
• Mean Statics_Grade ≈ 83.54, Min ≈ 69.0, Max = 100.0.
If your numbers are very different, you probably filtered or copied something incorrectly.

From now on we treat Statics_Grade (H) as a natural response variable Y and the academic/effort variables (B–G) as predictors X. For your own project dataset, you may choose a different Y.

1) Descriptive statistics (ToolPak)

Goal: get a table of mean, median, std dev, min, max, etc. for the numeric variables (HS_GPA, SAT, Study, Attendance, Participation, Statics_Grade).

Turn on the Analysis ToolPak if needed: File → Options → Add-ins → Manage: “Excel Add-ins” → Go… → check Analysis ToolPak → OK.
On the Data tab, click Data Analysis… → choose Descriptive Statistics → OK.
Input Range: $B$1:$H$151 (HS_GPA through Statics_Grade, including headers in row 1).
Grouped by: Columns. Check Labels in first row.
Output options: choose New Worksheet Ply (e.g., name it “Descriptives_ToolPak”) and check Summary statistics. Then click OK.

Expected means (approximately):
• HS_GPA ≈ 3.17    • SAT_Math ≈ 653.7    • SAT_Verbal ≈ 638.4
• Study_Hours_per_week ≈ 10.3    • Attendance_% ≈ 86.4
• Class_Participation ≈ 7.06    • Statics_Grade ≈ 83.54
(Small rounding differences are OK.)

You may also want to copy a few key numbers (mean, median, std dev) into a clean summary table that you will paste into your Word report.

2) Histograms for GPA and Statics grades

Goal: visualize the distribution of HS_GPA and Statics_Grade. Are they symmetric? Skewed? Any clear cutoffs?

A) Histogram for HS_GPA (ToolPak)

Choose some bin cutpoints (for example) in an empty column:
2.0, 2.3, 2.6, 2.9, 3.2, 3.5, 3.8, 4.1.
Data → Data Analysis → Histogram → OK.
Input Range: $B$2:$B$151 (HS_GPA values).
Bin Range: the cells where you typed the GPA bin edges. Check Chart Output.
Choose an output location or New Worksheet Ply → OK. Format the column chart nicely and label axes.

B) Histogram for Statics_Grade (ToolPak)

Pick bins such as: 65, 70, 75, 80, 85, 90, 95, 100.
Repeat the same Histogram steps using Input Range $H$2:$H$151.

In your report, comment on the shape: “Statics grades are roughly [symmetric / right-skewed / left-skewed], with most students between ~[80–90], etc.”

3) Scatter plots (Statics_Grade vs predictors)

Goal: see visually how Statics_Grade relates to the predictors. We’ll make at least three scatterplots:

Statics_Grade vs HS_GPA
Statics_Grade vs Study_Hours_per_week
Statics_Grade vs Attendance_% (or SAT_Verbal)

Select the Y column and X column together (e.g., click H1, then Ctrl-click B1 to select both headers; then drag to select down to row 151).
Insert → Scatter → first option (Scatter with only markers).
Add a trendline if you like: click a point → right-click → “Add Trendline” → Linear; check “Display Equation on chart” and “Display R-squared on chart”.

Save each scatter plot in a clean worksheet (or as separate charts) with clear titles and labeled axes. You will paste a few into your Word report.

4) Correlation matrix (ToolPak)

Goal: compute the correlation matrix for HS_GPA, SAT_Math, SAT_Verbal, Study_Hours_per_week, Attendance_%, Class_Participation, and Statics_Grade.

Data → Data Analysis → Correlation → OK.
Input Range: $B$1:$H$151 (HS_GPA through Statics_Grade, with labels).
Grouped by: Columns. Check Labels in first row.
Output: New Worksheet Ply (e.g., “Corr_ToolPak”) → OK.

Sample correlations (approximate) with Statics_Grade (H):
• corr(Statics, HS_GPA) ≈ 0.05 (very weak)
• corr(Statics, SAT_Math) ≈ −0.03 (very weak)
• corr(Statics, SAT_Verbal) ≈ 0.14 (still weak)
• corr(Statics, Study_Hours) ≈ 0.03
• corr(Statics, Attendance_%) ≈ 0.08
• corr(Statics, Class_Participation) ≈ 0.01
None of these are large; this is a fairly “messy” real-world-style dataset.

Also look at correlations among the X’s. Here they are all small (about −0.1 to 0.2), so we do not expect serious multicollinearity.

5) Simple Linear Regression (ToolPak): Statics_Grade ~ HS_GPA

Goal: run a simple regression using the ToolPak with Statics_Grade as Y and HS_GPA as X.

Data → Data Analysis → Regression → OK.
Input Y Range: $H$1:$H$151 (Statics_Grade).
Input X Range: $B$1:$B$151 (HS_GPA only).
Check Labels. Leave “Constant is Zero” unchecked.
Output: New Worksheet Ply (e.g., “SLR_GPA”) → OK.

Expected (approximate) ToolPak output for this file:
Model: Statics_Grade = β₀ + β₁·HS_GPA + ε
• β₀ (Intercept) ≈ 80.59
• β₁ (HS_GPA) ≈ 0.93 (p ≈ 0.57)
• R² ≈ 0.002 (almost no variation explained)
Interpretation: in this particular dataset, high school GPA by itself is not a strong predictor of Statics grade.

In your report, write one short paragraph explaining what β₀ and β₁ mean in context, and whether the slope is statistically significant (look at its p-value).

6) Multiple Linear Regression (ToolPak): Statics_Grade ~ HS_GPA + SAT + Study + Attendance + Participation

Now we let Statics_Grade depend on all the academic/effort variables at once.

Data → Data Analysis → Regression → OK.
Input Y Range: $H$1:$H$151 (Statics_Grade).
Input X Range: $B$1:$G$151 (HS_GPA, SAT_Math, SAT_Verbal, Study_Hours_per_week, Attendance_%, Class_Participation).
Check Labels. Output: New Worksheet Ply (e.g., “MLR_ToolPak”) → OK.

Expected overall fit (approximate):
• R² ≈ 0.054, Adjusted R² ≈ 0.014 (very small; predictors explain only ~5% of variation).
• n = 150; df for error ≈ 143 (150 − 7 parameters).

Expected coefficients (rough ballpark):
(Model: Statics_Grade = β₀ + β₁·HS_GPA + β₂·SAT_Math + β₃·SAT_Verbal + β₄·Study + β₅·Attendance + β₆·Participation + ε)
• β₀ (Intercept) ≈ 78.3 (significant)
• β₁ (HS_GPA) ≈ 0.49 (p large, not significant)
• β₂ (SAT_Math) ≈ −0.007 (p large)
• β₃ (SAT_Verbal) ≈ 0.012 (p large)
• β₄ (Study_Hours_per_week) ≈ 0.021 (p large)
• β₅ (Attendance_%) ≈ 0.021 (p large)
• β₆ (Class_Participation) ≈ −0.060 (p large)

Conclusion for this dataset: none of the individual slopes are strongly significant; this is what “noisy” real-world data often looks like.

In your report, you might say something like: “When all predictors are included simultaneously, the model explains only about 5% of the variation in Statics grades; none of the slopes are statistically significant at the 5% level. This suggests that other unmeasured factors (e.g., study quality, prior coursework, motivation) may be driving performance.”

7) Multicollinearity check (simple version)

For this project, you do not need to compute full VIF formulas by hand. A simpler check:

Look at the correlation matrix among the predictors (HS_GPA, SAT_Math, SAT_Verbal, Study, Attendance, Participation).
If many correlations are above about |0.7|, multicollinearity is a concern.
In this dataset, all predictor–predictor correlations are small (about −0.1 to 0.2).

If you do compute VIFs (optional), they are all about 1.0–1.1 for this dataset, which is essentially “no multicollinearity”.

For your own dataset in the term project, you should at least inspect the correlation matrix and comment on whether multicollinearity seems to be a problem.

Special Note — Using SPSS to Get VIF

Excel’s Analysis Toolpak does not report VIF directly. You can either use the Excel VIF formulas in this section, or you can run the same regression in SPSS and let SPSS calculate VIF for you.

SPSS menu path for VIF:

1. Analyze → Regression → Linear… 2. In the Linear Regression dialog: • Move your Y (e.g., Statics_Grade) into the Dependent box. • Move your predictors (e.g., HS_GPA, SAT_Math, SAT_Verbal, Study_Hours_per_week, Attendance_%, Class_Participation) into Independent(s). • Make sure Method is: Enter. 3. Click Statistics… • Under Regression Coefficients / Model, check: Estimates and Model fit. • Under Collinearity diagnostics, check: Collinearity diagnostics. • Click Continue. 4. Click OK. 5. In the Output Viewer, open the Coefficients table and look at the right-most columns: Tolerance and VIF (one row per predictor).

Practical shortcut: run the full model in Excel, then replicate the same Y and X’s in SPSS once, just to read off the Tolerance and VIF values and copy them into your report.

8) What to submit (Term Project structure)

Excel workbook including:
- Original data sheet (either this ENGR200 file or your own approved dataset).
- ToolPak Descriptive Statistics output.
- Histograms and scatter plots (can be on one or more chart sheets).
- ToolPak Correlation matrix output.
- ToolPak Regression output for:
  - At least one simple linear regression (Y on one X).
  - At least one multiple regression (Y on several X’s).
5–8 page Word report that includes:
- Brief description of the dataset and variables (what is Y, what are the X’s).
- Clean tables of descriptive statistics (mean, median, std dev) and the correlation matrix.
- At least two histograms and two scatter plots with captions.
- Regression results: one simple and one multiple model (state the model, report key coefficients, R², and main conclusions).
- Interpretation of at least one numeric slope in context (“holding other variables fixed…”).
- Short comment on multicollinearity (based on correlation matrix and/or VIF, if you compute it).

Tip: Use the Statics/GPA example here to debug your Excel workflow. Once it works and your numbers match the “Expected” boxes roughly, repeat the exact same steps on your own project dataset.

9) Why ToolPak here? How it relates to the Excel-functions version

The Excel ToolPak version is meant for speed and for students who want a clear click-path: Descriptive Stats → Histogram → Correlation → Regression. You still need to know how to read the output: interpret coefficients, p-values, R², and check the model assumptions with plots.

The separate Excel-functions/matrix version of this term project shows what is happening under the hood (design matrix X, β̂ = (X′X)⁻¹X′Y, SSE, MSE, standard errors, t, p, etc.). Together, the two pages give you both the mechanics (functions) and the practical workflow (ToolPak).