```
library(tidyverse)
library(tidymodels)
library(knitr)
```

# Lab 5 - General Social Survey

## Introduction

In today’s lab you will analyze data from the General Social Survey.

### Learning goals

By the end of the lab you will be able to…

- Use logistic regression to explore the relationship between a binary response variable and multiple predictor variables
- Conduct exploratory data analysis for logistic regression
- Interpret coefficients of logistic regression model

## Getting started

- A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
- Go to the sta210-s22 organization on GitHub. Click on the repo with the prefix
**lab-5**. It contains the starter documents you need to complete the lab. - Each person on the team should clone the repository and open a new project in RStudio. Throughout the lab, each person should get a chance to make commits and push to the repo.

## Packages

The following packages are used in the lab.

## Exercises

The goal of today’s lab is to use the GSS to examine the relationship between US adults’ political views and attitudes towards government spending on mass transportation projects.

### Part I: Exploratory data analysis

Let’s begin by making a binary variable for respondents’ views on spending on mass transportation. Create a new variable that is equal to “1” if a respondent said spending on mass transportation is about right and “0” otherwise. Then make a plot of the new variable, using informative labels for each category.

Recode

`polviews`

so it is a factor with levels that are in an order that is consistent with question on the survey.*Note how the categories are spelled in the data.*- Make a plot of the distribution of
`polviews`

. - Which political view occurs most frequently in this data set?

- Make a plot of the distribution of
Make a plot displaying the relationship between satisfaction with mass transportation spending and political views. Use the plot to describe the relationship the two variables.

We’d like to use

`age`

as a quantitative variable in your model; however, it is currently a character data type because some observations are coded as`"89 or older"`

.- Recode
`age`

so that is a numeric variable.*Note: Before making the variable numeric, you will need to replace the values*`"89 or older"`

with a single value. - Then plot the distribution of age.

- Recode

### Part II: Logistic regression model

Briefly explain why we should use a logistic regression model to predict the odds a randomly selected person is satisfied with spending on mass transportation.

Let’s start by fitting a model using the demographic factors -

`age`

,`sex`

,`sei10`

, and`region`

- to predict the odds a person is satisfied with spending on mass transportation. Make any necessary adjustments to the variables so the intercept will have a meaningful interpretation. Neatly display the model.Interpret the intercept in the context of the data.

Consider the relationship between age and one’s opinion about spending on mass transportation. Interpret the coefficient of age in terms of the

**odds**of being satisfied with spending on mass transportation.

## Submission

To submit your assignment:

- Go to http://www.gradescope.com and click
*Log in*in the top right corner. - Click
*School Credentials*➡️*Duke NetID*and log in using your NetID credentials. - Click on your
*STA 210*course. - Click on the assignment, and you’ll be prompted to submit it.
- Mark the pages associated with each exercise. All of the pages of your lab should be associated with at least one question (i.e., should be “checked”).
- Select the first page of your PDF submission to be associated with the
*“Workflow & formatting”*section.

## Grading

Total points available: 50 points.

Component | Points |
---|---|

Ex 1 - 10 | 45 |

Workflow & formatting | 5^{1} |

## Footnotes

The “Workflow & formatting” grade is to assess the reproducible workflow. This includes having at least 3 informative commit messages and updating the name and date in the YAML.↩︎