Because many people are interpreting logistic regressions, I wanted to quickly recap how to do that here. I strongly recommend this page at UCLA that covers both how to interpret logistic regression and how to create predicted probabilities with R. In this document, I'll show how to do it both manually and with R, with a bit less detail and complication than they do.

First things first, let's fit a simple logistic regression on whether or not a car has greater-than-average gas mileage.

mako.cars <- mtcars
mako.cars$mpg.gtavg <- mako.cars$mpg > mean(mako.cars$mpg)
m <- glm(mpg.gtavg ~ hp + gear, data=mako.cars, family=binomial("logit"))
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(m)
## 
## Call:
## glm(formula = mpg.gtavg ~ hp + gear, family = binomial("logit"), 
##     data = mako.cars)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.56988  -0.00001   0.00000   0.00843   1.53326  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  26.7692    17.5568   1.525   0.1273  
## hp           -0.3387     0.1974  -1.716   0.0862 .
## gear          3.2290     2.6792   1.205   0.2281  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.8601  on 31  degrees of freedom
## Residual deviance:  5.8697  on 29  degrees of freedom
## AIC: 11.87
## 
## Number of Fisher Scoring iterations: 10

As you can see, there are no particularly well-estimated effects (i.e., the standard errors are quite large). The warning about fitted probabilities of 0 or 1 suggests quasi-complete separation, which also helps explain those enormous standard errors. That said, we can try to interpret the coefficients for pedagogical purposes in any case.

Interpreting Coefficients

Interpreting coefficients directly in logistic regression is relatively straightforward. If we look at the effect of the variable hp, we can see that the coefficient is -0.3387. Coefficients in logistic regression are log odds ratios. The easiest way to interpret these is to pass the coefficient to the exp() function to turn it into a normal odds ratio.

For example, if we look at the estimate for hp we would get:

exp(-0.3387)
## [1] 0.7126962

Interpretation: In this case, we would say that a one unit increase in horsepower is associated with odds of being above average in mileage that are 0.71 times as large (i.e., 71% as large). That’s a big effect!
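If you want the odds ratios for all of the coefficients at once, you can exponentiate the entire coefficient vector. A minimal sketch using the fitted model m from above:

exp(coef(m))  # odds ratios for (Intercept), hp, and gear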

Predicted Probabilities By Hand

Because odds are not very easy to interpret for folks who do not come from gambling families, I always suggest that folks interpret logistic regression in terms of probabilities instead of just in terms of odds ratios.

Converting logistic regression results into probabilities is a bit more complicated. The idea of predicted probabilities is basically that you plug numbers into your fitted logistic regression and report the probabilities that your model predicts for what we might think of as hypothetical (or prototypical) individuals.

The standard logistic function is:

\[\frac{1}{1+e^{-k}}\]

In other words, to turn the results from our logistic regression above back into a probability, we essentially plug the model in for \(k\):

\[\frac{1}{1+e^{-(\beta_0 + \beta_1\mathrm{hp} + \beta_2\mathrm{gear})}}\]

In our fitted model above, we know that \(\beta_0 = 26.7692\), \(\beta_1 = -0.3387\), \(\beta_2 = 3.2290\). In order to convert these back to probabilities, we use the logistic function. There’s good information on how this works in practice in the Wikipedia article on logistic regression.
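To make the arithmetic below a bit less error-prone, it can also help to write the logistic function as a small R helper. This is just a sketch of the equation above (the name invlogit is my own):

invlogit <- function(k) {
    1 / (1 + exp(-k))  # the standard logistic function
}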

For this example, let's create predicted probabilities for two prototypical cars, both with three gears: one with 100 horsepower and one with 120 horsepower. First, let's plug in the numbers:

26.7692 + (-0.3387 * 100) + (3.2290 * 3) # a car with 100 hp and 3 gears
## [1] 2.5862
26.7692 + (-0.3387 * 120) + (3.2290 * 3) # a car with 120 hp and 3 gears
## [1] -4.1878

These numbers reflect \(k\) in the logistic function above. We can now plug them into that equation (remembering that the exponent is \(-k\), so we multiply by -1):

1/(1+exp(-1*2.5862))
## [1] 0.9299681
1/(1+exp(-1*-4.1878))
## [1] 0.01495267

Interpretation: In other words, our model predicts that a car with three gears and 100 horsepower will have above average mileage 93% of the time, while a car with three gears and 120 horsepower will have above average mileage only 1.5% of the time.

Creating predicted probabilities with predict() in R

You can do the same thing in R using the predict() function. To do so, we make a dataset that includes the individuals we would like to predict for. For example, if I wanted to represent the two individuals above, I could do:

prototypical.cars <- data.frame(gear=3, hp=c(100, 120))
prototypical.cars
##   gear  hp
## 1    3 100
## 2    3 120

If the model had more variables, I would need columns for each of them. In general, I hold all of my control variables at their median sample values for all of my prototypical individuals.
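For illustration only (our actual model m has no other covariates), if the model had also included wt as a control, the prototypical data might look like:

data.frame(gear=3, hp=c(100, 120), wt=median(mako.cars$wt))  # wt held at its sample median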

I can now use the predict() function to generate predictions for these prototypical cars. We use the option type="response" to have it give us predicted probabilities:

predict(m, prototypical.cars, type="response")
##         1         2 
## 0.9296423 0.0148648

These numbers are extremely similar to the ones we computed by hand. They're not exactly the same because of the rounding in our manual calculations above.
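If you want to verify this by hand, you can redo the manual calculation with the full-precision coefficients from coef(m) instead of the rounded values printed in the summary. A quick sketch, using the invlogit() helper sketched earlier:

b <- coef(m)
k <- b["(Intercept)"] + b["hp"] * c(100, 120) + b["gear"] * 3
invlogit(k)  # should match the predict() output exactly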