Lab 3 - Coffee ratings

STA 210 - Spring 2022

Welcome

Goals

Meet your team!
Ice breaker / get to know your team
Team agreement
Lab 3 - Coffee ratings

Meet your team!

Click here to find your team.
Sit with your team.

Icebreaker

Quick introductions: Name and hometown
Choose a reporter: The person whose birthday is closest to January 31
Identify 8 things everyone in the group has in common:
- Not clothes, e.g., we’re all wearing shoes!
- Not body parts, e.g., we all have a nose!

05:00

Team name + agreement

Come up with a team name. You can’t have the same name as another group in the class, so be creative!
- Your TA will get your team name by the end of lab.
Fill out the team agreement. The goals of the agreement are to…
- Gain a common understanding of the team’s goals and expectations for collaboration
- Make a plan for team communication
- Make a plan for working outside of lab

Team workflow

Only one team member should type at a time. There are markers in today’s lab to help you determine whose turn it is to type.
- Every team member should still be engaged in discussion for all questions, even if it’s not your turn type.
Don’t forget to pull to get your teammates’ updates before making changes to the .qmd file.
Only one submission for the team on Gradescope.

Team workflow, in action

Complete the “Workflow: Using Git and GitHub as a team” section of the lab in your teams.
When done, pause and wait for your TA to walk you through the rest of the slides before continuing to the following section.

10:00

Tips for working in a team

Do not pressure each other to finish early; use the time wisely to really learn the material and produce a quality report.
The labs are structured to help you learn the steps of a data analysis. Do not split up the lab among the team members; work on it together in its entirety.
Everyone has something to contribute! Use the lab groups as an opportunity to share ideas and learn from each other.

Model diagnostics

The data

The data + an outlier

The data + influential point

Influential point

An observation is influential if removing it substantially changes the coefficients of the regression model.

Influential points

Influential points have a large impact on the coefficients and standard errors used for inference
These points can sometimes be identified in a scatterplot if there is only one predictor variable, this is often not the case when there are multiple predictors
We will use measures to quantify an individual observation’s influence on the regression model: leverage, standardized residuals, and Cook’s distance

Remember `augment()`?

mtcars_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ disp, data = mtcars)

augment(mtcars_fit$fit)

# A tibble: 32 × 9
   .rownames           mpg  disp .fitted .resid   .hat .sigma .cooksd .std.resid
   <chr>             <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
 1 Mazda RX4          21    160     23.0 -2.01  0.0418   3.29 8.65e-3     -0.630
 2 Mazda RX4 Wag      21    160     23.0 -2.01  0.0418   3.29 8.65e-3     -0.630
 3 Datsun 710         22.8  108     25.1 -2.35  0.0629   3.28 1.87e-2     -0.746
 4 Hornet 4 Drive     21.4  258     19.0  2.43  0.0328   3.27 9.83e-3      0.761
 5 Hornet Sportabout  18.7  360     14.8  3.94  0.0663   3.22 5.58e-2      1.25 
 6 Valiant            18.1  225     20.3 -2.23  0.0313   3.28 7.82e-3     -0.696
 7 Duster 360         14.3  360     14.8 -0.462 0.0663   3.31 7.70e-4     -0.147
 8 Merc 240D          24.4  147.    23.6  0.846 0.0461   3.30 1.72e-3      0.267
 9 Merc 230           22.8  141.    23.8 -0.997 0.0482   3.30 2.50e-3     -0.314
10 Merc 280           19.2  168.    22.7 -3.49  0.0396   3.24 2.48e-2     -1.10 
# … with 22 more rows

Model diagnostics

Use the augment() function to output statistics that can be used to diagnose the model, along with the predicted values and residuals:

outcome and predictor variables in the model
.fitted: predicted values
.se.fit: standard errors of predicted values
.resid: residuals
.hat: leverage
.sigma: estimate of residual standard deviation when the corresponding observation is dropped from model
.cooksd: Cook’s distance
.std.resid: standardized residuals

Lab 3 - Coffee ratings

Welcome

Goals

Meet your team!

Icebreaker

Team name + agreement

Team workflow

Team workflow, in action

Tips for working in a team

Model diagnostics

The data

The data + an outlier

The data + influential point

Influential point

Influential points

Remember augment()?

Model diagnostics

Remember `augment()`?