Important

Go to the course GitHub organization and locate the repo titled `ae-7-exam-2-review-YOUR_GITHUB_USERNAME` to get started.

## Packages

```r
library(tidyverse)
library(tidymodels)
library(knitr)
library(openintro)

# fix data!
loans_full_schema <- droplevels(loans_full_schema)
```

## Goal

Create a model for predicting `interest_rate`.

## View data

Note the dimensions of the data and the variable names. Review the data dictionary.

``# add code here``
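One way to approach this step, as a minimal sketch. It assumes the packages from the Packages section are installed and that `loans_full_schema` is the dataset shipped with `openintro`.

```r
library(tidyverse)
library(openintro)

# Column names, types, and the first few values of each variable
glimpse(loans_full_schema)

# Number of rows and number of columns
dim(loans_full_schema)

# Variable names only
names(loans_full_schema)
```

The data dictionary is available in the package help, for example with `?loans_full_schema`.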

## Split data into training and testing

Split your data into testing and training sets.

``# add code here``
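A possible sketch of the split using `rsample` (loaded with `tidymodels`). The seed value, the 75/25 proportion, and the object names `loans_split`, `loans_train`, and `loans_test` are assumptions made for illustration, not requirements of the exercise.

```r
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)

# Set a seed so the random split is reproducible (the value is arbitrary)
set.seed(210)

# Hold out 25% of the rows for testing (proportion is an assumption)
loans_split <- initial_split(loans_full_schema, prop = 0.75)
loans_train <- training(loans_split)
loans_test  <- testing(loans_split)

# Check how many rows ended up in each set
nrow(loans_train)
nrow(loans_test)
```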

## Write the model

Write the model for predicting interest rate (`interest_rate`) from debt to income ratio (`debt_to_income`), the term of loan (`term`), the number of inquiries (credit checks) into the applicant’s credit during the last 12 months (`inquiries_last_12m`), whether there are any bankruptcies listed in the public record for this applicant (`bankrupt`), and the type of application (`application_type`). The model should allow for the effect of debt to income ratio on interest rate to vary by application type.
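One way the model could be written, as a sketch. It assumes that `term`, `bankrupt`, and `application_type` each enter as a single indicator variable (two levels each); a variable with more levels would need one term per non-baseline level. The $\beta$ and $\epsilon$ notation is also an assumption.

$$
\begin{aligned}
\text{interest\_rate} = {} & \beta_0 + \beta_1\,\text{debt\_to\_income} + \beta_2\,\text{term} + \beta_3\,\text{inquiries\_last\_12m} \\
& + \beta_4\,\text{bankrupt} + \beta_5\,\text{application\_type} \\
& + \beta_6\,\text{debt\_to\_income} \times \text{application\_type} + \epsilon
\end{aligned}
$$

The interaction term with coefficient $\beta_6$ is what lets the slope of `debt_to_income` differ by application type.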

## Exploration

Explore characteristics of the variables you’ll use for the model using the training data only.

``# add code here``
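A few exploratory views that might be useful here, sketched under assumptions. The split is repeated so the chunk runs on its own; the seed, the proportion, and the object names (`loans_train`, `p_rate`, `p_dti`) are hypothetical.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)

# Recreate a training set (seed and proportion are arbitrary choices)
set.seed(210)
loans_train <- training(initial_split(loans_full_schema, prop = 0.75))

# Numeric summaries of the variables used in the model
loans_train %>%
  select(interest_rate, debt_to_income, term,
         inquiries_last_12m, public_record_bankrupt, application_type) %>%
  summary()

# Distribution of the response
p_rate <- ggplot(loans_train, aes(x = interest_rate)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Interest rate", y = "Count")
p_rate

# Relationship between debt to income ratio and interest rate,
# by application type (rows with missing values are dropped from the plot)
p_dti <- ggplot(loans_train,
                aes(x = debt_to_income, y = interest_rate,
                    color = application_type)) +
  geom_point(alpha = 0.3, na.rm = TRUE) +
  labs(x = "Debt to income ratio", y = "Interest rate",
       color = "Application type")
p_dti
```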

## Specify model

Specify a linear regression model. Call it `office_spec`.

``# add code here``
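A minimal sketch of the specification with `parsnip`, assuming the `"lm"` engine is the intended one (the instructions only say linear regression):

```r
library(tidymodels)

# Linear regression fit by ordinary least squares via stats::lm()
office_spec <- linear_reg() %>%
  set_engine("lm")

office_spec
```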

## Create recipe

• Predict `interest_rate` from `debt_to_income`, `term`, `inquiries_last_12m`, `public_record_bankrupt`, and `application_type`.
• Mean center `debt_to_income`.
• Make `term` a factor.
• Create a new variable: `bankrupt` that takes on the value “no” if `public_record_bankrupt` is 0 and the value “yes” if `public_record_bankrupt` is 1 or higher. Then, remove `public_record_bankrupt`.
• Interact `application_type` with `debt_to_income`.
• Create dummy variables where needed and drop any zero variance variables.

``# add code here``
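The steps listed above can be sketched roughly as follows. This is one possible recipe, not the only valid one. The split is repeated so the chunk runs on its own, and the names `loans_train`, `loans_rec`, and `baked` are hypothetical. Note one deliberate reordering: dummy variables are created before the interaction, because `step_interact()` is intended for numeric columns.

```r
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)

# Recreate a training set (seed and proportion are arbitrary choices)
set.seed(210)
loans_train <- training(initial_split(loans_full_schema, prop = 0.75))

loans_rec <- recipe(
  interest_rate ~ debt_to_income + term + inquiries_last_12m +
    public_record_bankrupt + application_type,
  data = loans_train
) %>%
  # Mean center debt_to_income
  step_center(debt_to_income) %>%
  # Treat term as a factor
  step_mutate(term = factor(term)) %>%
  # "no" if there are 0 bankruptcies on record, "yes" if 1 or more
  step_mutate(
    bankrupt = factor(if_else(public_record_bankrupt == 0, "no", "yes"))
  ) %>%
  step_rm(public_record_bankrupt) %>%
  # Dummy variables first, so the interaction uses numeric columns
  step_dummy(all_nominal_predictors()) %>%
  step_interact(terms = ~ starts_with("application_type"):debt_to_income) %>%
  # Drop any predictors with zero variance
  step_zv(all_predictors())

# Inspect the processed training data to check the result
baked <- loans_rec %>%
  prep() %>%
  bake(new_data = NULL)
glimpse(baked)
```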

## Create workflow

Create the workflow that brings together the model specification and recipe.

``# add code here``
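A sketch of how the workflow could be assembled. The setup, specification, and recipe are repeated so the chunk runs on its own; `loans_train`, `loans_rec`, and `loans_wflow` are hypothetical names, and the recipe shown is one possible version.

```r
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)
set.seed(210)  # arbitrary seed
loans_train <- training(initial_split(loans_full_schema, prop = 0.75))

# Model specification
office_spec <- linear_reg() %>%
  set_engine("lm")

# Recipe (one possible version)
loans_rec <- recipe(
  interest_rate ~ debt_to_income + term + inquiries_last_12m +
    public_record_bankrupt + application_type,
  data = loans_train
) %>%
  step_center(debt_to_income) %>%
  step_mutate(term = factor(term)) %>%
  step_mutate(
    bankrupt = factor(if_else(public_record_bankrupt == 0, "no", "yes"))
  ) %>%
  step_rm(public_record_bankrupt) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact(terms = ~ starts_with("application_type"):debt_to_income) %>%
  step_zv(all_predictors())

# Bundle the model specification and the recipe
loans_wflow <- workflow() %>%
  add_model(office_spec) %>%
  add_recipe(loans_rec)

loans_wflow
```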

## Cross validation

Conduct 10-fold cross validation.

``# add code here``
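The resampling object itself could be created as sketched below. Fitting the model to the folds additionally needs the workflow from the previous step, passed to `fit_resamples()`. The seed and the names `loans_train` and `folds` are assumptions.

```r
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)

# Recreate a training set (seed and proportion are arbitrary choices)
set.seed(210)
loans_train <- training(initial_split(loans_full_schema, prop = 0.75))

# Randomly partition the training data into 10 folds
set.seed(210)
folds <- vfold_cv(loans_train, v = 10)

folds
```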

## Summarize CV metrics

Summarize metrics from your CV resamples.

``# add code here``
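An end-to-end sketch of fitting the workflow to the folds and summarizing the results with `collect_metrics()`. All earlier steps are repeated so the chunk runs on its own; the seed and the names `loans_train`, `loans_rec`, `loans_wflow`, `folds`, `loans_fit_rs`, and `cv_metrics` are hypothetical.

```r
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)
set.seed(210)  # arbitrary seed
loans_train <- training(initial_split(loans_full_schema, prop = 0.75))

office_spec <- linear_reg() %>%
  set_engine("lm")

loans_rec <- recipe(
  interest_rate ~ debt_to_income + term + inquiries_last_12m +
    public_record_bankrupt + application_type,
  data = loans_train
) %>%
  step_center(debt_to_income) %>%
  step_mutate(term = factor(term)) %>%
  step_mutate(
    bankrupt = factor(if_else(public_record_bankrupt == 0, "no", "yes"))
  ) %>%
  step_rm(public_record_bankrupt) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact(terms = ~ starts_with("application_type"):debt_to_income) %>%
  step_zv(all_predictors())

loans_wflow <- workflow() %>%
  add_model(office_spec) %>%
  add_recipe(loans_rec)

# 10-fold cross validation on the training data
set.seed(210)
folds <- vfold_cv(loans_train, v = 10)
loans_fit_rs <- loans_wflow %>%
  fit_resamples(folds)

# Average RMSE and R-squared across the 10 folds
cv_metrics <- collect_metrics(loans_fit_rs)
cv_metrics

# Fold-by-fold values, to see how much the metrics vary
collect_metrics(loans_fit_rs, summarize = FALSE)
```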

Why are we focusing on R-squared and RMSE instead of adjusted R-squared, AIC, and BIC?