# Lab: Discriminant Functions

## R-squared

This question requires some on-your-own reading. Linear regression models are often evaluated using metrics such as r-squared, otherwise known as the coefficient of determination. Read about it here.

The basic idea behind sums of squares is:

- Find the difference between one point and a reference value (like the mean of the data set). Repeat for all points.
- Square each difference.
- Add all squared differences together.
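The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name `sum_of_squares` and the sample numbers are made up for this sketch and are not taken from `cars.csv`.

```python
def sum_of_squares(values, reference):
    """Sum of squared differences between each value and a reference.

    `reference` may be a single number (e.g. the mean) or a sequence
    holding one reference value per point (e.g. model predictions).
    """
    if isinstance(reference, (int, float)):
        # Broadcast a single reference value to every point.
        reference = [reference] * len(values)
    # Steps 1-3: take each difference, square it, add everything up.
    return sum((v - r) ** 2 for v, r in zip(values, reference))


# Made-up numbers for illustration only (not from cars.csv).
points = [2.0, 4.0, 6.0]
mean = sum(points) / len(points)
ss = sum_of_squares(points, mean)
print(ss)  # 8.0
```

Passing a list of predictions as `reference` instead of the mean gives the residual version of the same quantity.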

For the following questions, use this formula for R-squared:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

- $SS_{res}$ is the sum of squares between the actual y values and the predicted y values.
- $SS_{tot}$ is the sum of squares between the actual y values and the mean of the y values.

By dividing the two, you get an idea of how far the actual points are from your prediction line, compared to how far each point is from the average of all actual points (a flat horizontal line). It tells you how “good”, or how predictive, your model is. The lower the SS_res, the higher the resulting r-squared.
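As a rough sketch of that comparison, the snippet below computes R-squared as one minus the ratio SS_res / SS_tot, which is the usual definition of the coefficient of determination. The function name `r_squared` and the sample values are assumptions made for this example; they do not come from `cars.csv`.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: actual values vs. model predictions.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # Total sum of squares: actual values vs. their own mean.
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only (not from cars.csv).
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

A model that predicts every point exactly gives SS_res = 0 and therefore an R-squared of 1, while a model that always predicts the mean gives an R-squared of 0.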

Assume that you have a model that makes the predictions found in the file cars.csv. Use whatever tool you like to answer the following questions.

##### Question 1:

What is SS_res for the data above?

##### Question 2:

What is SS_tot for the data above?

##### Question 3:

What is the R-squared for the data above?

## Logistic Regression

Let’s say you induce a logistic regression model to predict whether someone will default on a loan.

The main effect of `dti` (debt-to-income ratio), the main effect of `grade`, and the interaction of `dti` and `grade` were modeled to predict the likelihood of defaulting. An “interaction” is just like a regular feature, except that the interaction “weight” is assigned to the product of two features. In the case below, it in effect gives a different slope for `dti` to individuals in different grades. The “main effect” of `dti` is specified, as is the “main effect” for the different grades, along with an “interaction effect” for `dti:grade`.

If you are curious: in an R formula, `y ~ dti * grade` expands to `y ~ dti + grade + dti:grade`, where a colon specifies an interaction.
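To make the main-effect and interaction terms concrete, here is a minimal sketch of how such a model combines them into a single value. Every coefficient below is hypothetical and chosen only for illustration; none of them are the values from the lab's model output. The sketch also assumes dummy (treatment) coding with grade A as the reference level, and it relies on the standard fact that a logistic regression's direct output is on the log-odds scale.

```python
import math

# Hypothetical coefficients for illustration only; these are NOT the
# values from the lab's model output.
INTERCEPT = -2.0
B_DTI = 0.5                           # main effect of dti (slope for reference grade A)
B_GRADE = {"A": 0.0, "B": 0.4}        # main effect of grade (A is the assumed reference)
B_DTI_GRADE = {"A": 0.0, "B": 0.25}   # dti:grade interaction (extra dti slope per grade)


def linear_predictor(dti, grade):
    """Log-odds of default: intercept + main effects + interaction term."""
    return (
        INTERCEPT
        + B_DTI * dti
        + B_GRADE[grade]
        + B_DTI_GRADE[grade] * dti  # interaction weight times the product term
    )


def to_probability(log_odds):
    """Logistic (sigmoid) function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


y_a = linear_predictor(2.0, "A")
y_b = linear_predictor(2.0, "B")
print(y_a, to_probability(y_a))
print(y_b, to_probability(y_b))
```

With these made-up numbers, the effective `dti` slope is 0.5 for grade A and 0.5 + 0.25 = 0.75 for grade B, which is what "a different slope per grade" means in practice.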

The model output provides the following summary:

*dti* is a ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.

*loan-status* is a binary feature coded as: 0 = Fully Paid and 1 = Charged off (default).

*grade* is the Lending Club assigned loan grade: “A” is good, “G” is bad, etc.

#### Question 4:

Refer to the model output above. What value would this model directly predict (y) for someone with a dti of .94 and a grade of D?

`y = 2.057`

`y = 2.027`

`y = -1.490`

`y = 2.028`

#### Question 5:

Refer to the model output above. What value would this model predict for someone with a dti of .3 and a grade of A?

`y = -3.502`

`y = 0.014`

`y = 0.048`

`y = -3.469`

#### Question 6:

Assume that the model above makes a direct prediction of y = .2. What would this be in terms of probability?

`p = .200`

`p = .4`

`p = .550`

`p = .450`