library(ISLR)
library(tibble)
Default = ISLR::Default
Illustrating Classification
Today’s example will build on material from the Principles lecture (earlier this week).
Predicting Defaults
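Today, we will continue to use the ISLR data on defaults (Default), a simulated data set of 10,000 customers with the variables default, student, balance, and income.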
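Before the breakouts, a quick inspection of the data may help. This is a minimal sketch that assumes the ISLR package is installed. It only looks at the data and changes nothing, but it shows how unbalanced the two classes of default are.

```r
library(ISLR)

# Load the data; default and student are No/Yes factors,
# balance and income are numeric
Default <- ISLR::Default
str(Default)

# How many customers default? The classes are very unbalanced,
# which matters when choosing a classification cutoff later
table(Default$default)
prop.table(table(Default$default))
```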
In our first breakout:
1. Clean the data so that the default column is a binary indicator for default.
2. Build a logistic model to predict default using any combination of variables and interactions in the data. For now, just use your best judgement for choosing the variables and interactions.
3. Use a Bayes classifier cutoff of 0.50 to generate your classifier output. Do you need to alter the cutoff?
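The three steps of the first breakout can be sketched as follows. This is a minimal sketch, assuming the ISLR package is installed. The model formula (balance, student, their interaction, and income) is only an illustrative choice, not a recommended model, and the names fit, p_hat, y_hat, and conf_mat are hypothetical.

```r
library(ISLR)

Default <- ISLR::Default

# Step 1: recode default from a No/Yes factor into a 0/1 indicator
Default$default <- as.integer(Default$default == "Yes")

# Step 2: logistic regression. balance * student expands to
# balance + student + balance:student; this formula is only an example
fit <- glm(default ~ balance * student + income,
           data = Default, family = binomial)

# Step 3: classify as 1 when the predicted P(default) exceeds 0.50
p_hat <- predict(fit, type = "response")
y_hat <- as.integer(p_hat > 0.5)

# Confusion matrix: rows are predictions, columns are the observed outcome
conf_mat <- table(predicted = factor(y_hat, levels = 0:1),
                  actual    = factor(Default$default, levels = 0:1))
conf_mat

accuracy    <- mean(y_hat == Default$default)
sensitivity <- conf_mat["1", "1"] / sum(conf_mat[, "1"])  # share of defaulters caught
specificity <- conf_mat["0", "0"] / sum(conf_mat[, "0"])  # share of non-defaulters cleared
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```

Because only about 3% of customers default, accuracy at the 0.50 cutoff can look high even when sensitivity is low, so it is worth comparing all three numbers before deciding whether to move the cutoff.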
Back in class, let’s look at how we did. What variables were most useful in explaining default?
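One rough way to compare variables is to rank the absolute z-statistics of the fitted coefficients. This sketch assumes the ISLR package is installed and uses an additive model in balance, income, and student (an illustrative assumption, since interactions make main effects harder to read); the names fit, z, and z_sorted are hypothetical. A large |z| means a coefficient is estimated far from zero relative to its standard error; it is not a complete measure of predictive usefulness.

```r
library(ISLR)

Default <- ISLR::Default
Default$default <- as.integer(Default$default == "Yes")

# Illustrative additive logistic model (an assumption; use your own)
fit <- glm(default ~ balance + income + student,
           data = Default, family = binomial)

# z-statistic for each coefficient, dropping the intercept,
# ordered from largest to smallest in absolute value
z <- coef(summary(fit))[-1, "z value"]
z_sorted <- z[order(abs(z), decreasing = TRUE)]
z_sorted
```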
In our second breakout, we will create an ROC curve manually. To do this:
1. Take your model from the first breakout and, using a loop (or sapply), step through a large number of possible classification cutoffs ranging from 0 to 1.
2. For each cutoff, generate a confusion matrix with accuracy, sensitivity, and specificity.
3. Combine the cutoffs with the sensitivity and specificity results and make an ROC plot. Use ggplot for your plot and map the color aesthetic to the cutoff value.
4. Calculate the AUC (the area under the curve). This is a little tricky but can be done with your data.
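The second breakout can be sketched as below, assuming the ISLR and ggplot2 packages are installed. It refits an illustrative model in place of yours, sweeps 1,001 evenly spaced cutoffs with sapply, plots the ROC curve with the color mapped to the cutoff, and computes the AUC with the trapezoidal rule, which is one common way (not the only one) to integrate a curve known only at a set of points. The helper classify_at and the objects roc, fpr, tpr, and auc are hypothetical names.

```r
library(ISLR)
library(ggplot2)

Default <- ISLR::Default
Default$default <- as.integer(Default$default == "Yes")

# Illustrative model; swap in your own from the first breakout
fit <- glm(default ~ balance * student + income,
           data = Default, family = binomial)
p_hat <- predict(fit, type = "response")
y <- Default$default

# For one cutoff: classify, then compute accuracy, sensitivity, specificity
classify_at <- function(cutoff) {
  y_hat <- as.integer(p_hat > cutoff)
  tp <- sum(y_hat == 1 & y == 1)
  tn <- sum(y_hat == 0 & y == 0)
  fp <- sum(y_hat == 1 & y == 0)
  fn <- sum(y_hat == 0 & y == 1)
  c(cutoff      = cutoff,
    accuracy    = (tp + tn) / length(y),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Step through many cutoffs; t() turns the result into one row per cutoff
cutoffs <- seq(0, 1, by = 0.001)
roc <- as.data.frame(t(sapply(cutoffs, classify_at)))

# ROC plot: true positive rate against false positive rate, colored by cutoff;
# the dashed diagonal is what a classifier no better than chance would trace
ggplot(roc, aes(x = 1 - specificity, y = sensitivity, color = cutoff)) +
  geom_point(size = 0.5) +
  geom_abline(linetype = "dashed") +
  labs(x = "False positive rate (1 - specificity)",
       y = "True positive rate (sensitivity)",
       color = "Cutoff")

# AUC by the trapezoidal rule: sort the points by false positive rate,
# then add up the trapezoids between consecutive points
fpr <- 1 - roc$specificity
tpr <- roc$sensitivity
ord <- order(fpr, tpr)
fpr <- fpr[ord]
tpr <- tpr[ord]
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

Using the sorted unique values of p_hat as cutoffs, instead of a fixed grid, traces the empirical ROC curve exactly. That can matter when many predicted probabilities bunch near 0, as they tend to with a rare outcome like default.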