AE 7: Exam 2 Review
Packages
library(tidyverse)
library(tidymodels)
library(knitr)
library(openintro)
# fix data! drop unused factor levels
loans_full_schema <- droplevels(loans_full_schema)
Goal
Create a model for predicting interest_rate.
View data
Note the dimensions of the data and the variable names. Review the data dictionary.
# add code here
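One possible way to inspect the data, sketched with base R and dplyr helpers. These calls are a suggestion, not the expected answer; the data dictionary is the help page that openintro ships with the data set.

```r
library(tidyverse)
library(openintro)

# Same clean-up as in the Packages section: drop unused factor levels
loans_full_schema <- droplevels(loans_full_schema)

dim(loans_full_schema)      # number of rows and columns
names(loans_full_schema)    # variable names
glimpse(loans_full_schema)  # type and first values of each variable

# Data dictionary: open the help page (interactive sessions only)
# ?loans_full_schema
```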
Split data into training and testing
Split your data into testing and training sets.
# add code here
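A minimal sketch of the split using rsample. The seed value and the object names loans_split, loans_train, and loans_test are assumptions chosen for illustration; initial_split() is left at its default proportion.

```r
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)

# Assumed seed, only so the split is reproducible
set.seed(210)

# Default split: 75% training, 25% testing
loans_split <- initial_split(loans_full_schema)
loans_train <- training(loans_split)
loans_test  <- testing(loans_split)

nrow(loans_train)
nrow(loans_test)
```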
Write the model
Write the model for predicting interest rate (interest_rate) from debt to income ratio (debt_to_income), the term of the loan (term), the number of inquiries (credit checks) into the applicant’s credit during the last 12 months (inquiries_last_12m), whether there are any bankruptcies listed in the public record for this applicant (bankrupt), and the type of application (application_type). The model should allow for the effect of debt to income ratio on interest rate to vary by application type.
Add model here
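One way the statistical model could be written is sketched below. It assumes that term has the two levels 36 and 60 with 36 as the baseline, that bankrupt has baseline "no", and that application_type has the levels individual and joint with individual as the baseline; check these against the data before using it.

```latex
\begin{aligned}
\text{interest\_rate} = {} & \beta_0
  + \beta_1\,\text{debt\_to\_income}
  + \beta_2\,\text{term}_{60}
  + \beta_3\,\text{inquiries\_last\_12m} \\
& + \beta_4\,\text{bankrupt}_{\text{yes}}
  + \beta_5\,\text{application\_type}_{\text{joint}} \\
& + \beta_6\,\text{debt\_to\_income} \times \text{application\_type}_{\text{joint}}
  + \epsilon
\end{aligned}
```

The interaction term is what lets the slope of debt_to_income differ by application type: under these assumptions the slope is \beta_1 for individual applications and \beta_1 + \beta_6 for joint applications.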
Exploration
Explore characteristics of the variables you’ll use for the model using the training data only.
# add code here
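A few plots and counts that could serve as a starting point, shown as a sketch. The split is recreated with an assumed seed so the block runs without the earlier steps; the choice of plots is illustrative only.

```r
library(tidyverse)
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)
set.seed(210)  # assumed seed
loans_train <- training(initial_split(loans_full_schema))

# Distribution of the response
ggplot(loans_train, aes(x = interest_rate)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Interest rate", y = "Count")

# Response against debt to income ratio, separated by application type
ggplot(loans_train,
       aes(x = debt_to_income, y = interest_rate, color = application_type)) +
  geom_point(alpha = 0.3) +
  labs(x = "Debt to income ratio", y = "Interest rate",
       color = "Application type")

# Counts for the categorical and discrete predictors
term_counts <- loans_train |> count(term, application_type)
term_counts
loans_train |> count(public_record_bankrupt)
loans_train |> count(inquiries_last_12m)
```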
Specify model
Specify a linear regression model. Call it office_spec.
# add code here
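A minimal sketch of the specification with parsnip, using the lm engine (the default engine for linear_reg(), set explicitly here for clarity):

```r
library(tidymodels)

# Linear regression fitted by ordinary least squares via stats::lm()
office_spec <- linear_reg() |>
  set_engine("lm")

office_spec
```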
Create recipe
- Predict interest_rate from debt_to_income, term, inquiries_last_12m, public_record_bankrupt, and application_type.
- Mean center debt_to_income.
- Make term a factor.
- Create a new variable: bankrupt that takes on the value “no” if public_record_bankrupt is 0 and the value “yes” if public_record_bankrupt is 1 or higher. Then, remove public_record_bankrupt.
- Interact application_type with debt_to_income.
- Create dummy variables where needed and drop any zero variance variables.
# add code here
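The steps above could be sketched as the recipe below. The name loans_rec and the seed are assumptions. One deliberate difference from the order of the list: the dummy variables are created before the interaction, because step_interact() is designed to work on numeric columns, so the interaction is formed from the application_type dummy column.

```r
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)
set.seed(210)  # assumed seed
loans_train <- training(initial_split(loans_full_schema))

loans_rec <- recipe(
  interest_rate ~ debt_to_income + term + inquiries_last_12m +
    public_record_bankrupt + application_type,
  data = loans_train
) |>
  # Mean center debt_to_income
  step_center(debt_to_income) |>
  # Treat the loan term as a factor
  step_mutate(term = factor(term)) |>
  # bankrupt is "no" for 0 bankruptcies and "yes" for 1 or more
  step_mutate(
    bankrupt = factor(if_else(public_record_bankrupt == 0, "no", "yes"))
  ) |>
  step_rm(public_record_bankrupt) |>
  # Dummy variables for term, bankrupt, and application_type
  step_dummy(all_nominal_predictors()) |>
  # Interaction between application type and debt to income ratio
  step_interact(terms = ~ starts_with("application_type"):debt_to_income) |>
  # Drop predictors with zero variance
  step_zv(all_predictors())

# Check the columns the recipe produces on the training data
baked_train <- loans_rec |> prep() |> bake(new_data = NULL)
names(baked_train)
```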
Create workflow
Create the workflow that brings together the model specification and recipe.
# add code here
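A sketch of the workflow. The specification and recipe are repeated in compact form only so that this block runs by itself; loans_rec, loans_wflow, and the seed are assumed names and values.

```r
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)
set.seed(210)  # assumed seed
loans_train <- training(initial_split(loans_full_schema))

office_spec <- linear_reg() |> set_engine("lm")

loans_rec <- recipe(
  interest_rate ~ debt_to_income + term + inquiries_last_12m +
    public_record_bankrupt + application_type,
  data = loans_train
) |>
  step_center(debt_to_income) |>
  step_mutate(
    term = factor(term),
    bankrupt = factor(if_else(public_record_bankrupt == 0, "no", "yes"))
  ) |>
  step_rm(public_record_bankrupt) |>
  step_dummy(all_nominal_predictors()) |>
  step_interact(terms = ~ starts_with("application_type"):debt_to_income) |>
  step_zv(all_predictors())

# Bundle the model specification and the recipe
loans_wflow <- workflow() |>
  add_model(office_spec) |>
  add_recipe(loans_rec)

loans_wflow
```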
Cross validation
Conduct 10-fold cross validation.
# add code here
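A sketch of 10-fold cross validation with vfold_cv() and fit_resamples(). The earlier objects are rebuilt so the block is self-contained; the object names and the seed are assumptions.

```r
library(tidymodels)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)
set.seed(210)  # assumed seed
loans_train <- training(initial_split(loans_full_schema))

loans_rec <- recipe(
  interest_rate ~ debt_to_income + term + inquiries_last_12m +
    public_record_bankrupt + application_type,
  data = loans_train
) |>
  step_center(debt_to_income) |>
  step_mutate(
    term = factor(term),
    bankrupt = factor(if_else(public_record_bankrupt == 0, "no", "yes"))
  ) |>
  step_rm(public_record_bankrupt) |>
  step_dummy(all_nominal_predictors()) |>
  step_interact(terms = ~ starts_with("application_type"):debt_to_income) |>
  step_zv(all_predictors())

loans_wflow <- workflow() |>
  add_model(linear_reg() |> set_engine("lm")) |>
  add_recipe(loans_rec)

# Split the training data into 10 folds
set.seed(210)
loans_folds <- vfold_cv(loans_train, v = 10)

# Fit the workflow on each set of 9 folds and assess on the held-out fold
loans_fit_rs <- loans_wflow |>
  fit_resamples(resamples = loans_folds)

loans_fit_rs
```

The recipe is re-estimated inside each fold, so quantities such as the mean used for centering come only from that fold's analysis data.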
Summarize CV metrics
Summarize metrics from your CV resamples.
# add code here
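A sketch of how the resampling metrics can be collected. To keep the block short and runnable without the earlier steps, it uses a reduced stand-in workflow (a formula with only debt_to_income, application_type, and their interaction) rather than the full recipe; in the exercise, the same collect_metrics() calls would be applied to the resampling results from the full workflow. All object names and the seed are assumptions.

```r
library(tidymodels)
library(knitr)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)
set.seed(210)  # assumed seed
loans_train <- training(initial_split(loans_full_schema))
loans_folds <- vfold_cv(loans_train, v = 10)

# Reduced stand-in workflow, NOT the full model from the exercise
standin_wflow <- workflow() |>
  add_formula(interest_rate ~ debt_to_income * application_type) |>
  add_model(linear_reg() |> set_engine("lm"))

standin_fit_rs <- fit_resamples(standin_wflow, resamples = loans_folds)

# RMSE and R-squared averaged over the 10 folds
cv_metrics <- collect_metrics(standin_fit_rs)
kable(cv_metrics, digits = 3)

# Per-fold values, useful for seeing fold-to-fold variability
cv_by_fold <- collect_metrics(standin_fit_rs, summarize = FALSE)
cv_by_fold
```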
Why are we focusing on R-squared and RMSE instead of adjusted R-squared, AIC, and BIC?
[Add response here]
Next steps…
Depending on time, either
- Create a workflow for another model with a new recipe (omitting the interaction variable), conduct CV, do model selection between these two, and then interpret the coefficients for the selected model.
- Or interpret the coefficients for the one model you fit.
Make sure to interpret the intercept and slope coefficient for at least one numerical, one categorical, and one interaction predictor.
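For the interpretation step, the coefficient table can be obtained by fitting the chosen workflow to the full training set and calling tidy(). The sketch below again uses a reduced stand-in workflow (formula interface, two predictors plus their interaction) so that it runs by itself; the names and the seed are assumptions, and the exercise's own workflow would take the place of standin_wflow.

```r
library(tidymodels)
library(knitr)
library(openintro)

loans_full_schema <- droplevels(loans_full_schema)
set.seed(210)  # assumed seed
loans_train <- training(initial_split(loans_full_schema))

# Reduced stand-in workflow, NOT the full model from the exercise
standin_wflow <- workflow() |>
  add_formula(interest_rate ~ debt_to_income * application_type) |>
  add_model(linear_reg() |> set_engine("lm"))

# Fit on the entire training set, then extract the coefficient table
standin_fit <- fit(standin_wflow, data = loans_train)
coef_table <- tidy(standin_fit)
kable(coef_table, digits = 3)
```

In a table like this, the interaction row is the difference in the debt_to_income slope between the non-baseline and baseline application types.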