Because many people are interpreting logistic regressions, I wanted to quickly recap how to do that here. I strongly recommend this page at UCLA that covers both how to interpret logistic regression and how to create predicted probabilities in R. In this document, I'll show how to do both manually and with R, with a bit less detail and complication than they do.

First things first, let's fit a simple logistic regression predicting whether or not a car has greater-than-average gas mileage.

```
mako.cars <- mtcars
mako.cars$mpg.gtavg <- mako.cars$mpg > mean(mako.cars$mpg)
m <- glm(mpg.gtavg ~ hp + gear, data=mako.cars, family=binomial("logit"))
```

`## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred`

`summary(m)`

```
##
## Call:
## glm(formula = mpg.gtavg ~ hp + gear, family = binomial("logit"),
## data = mako.cars)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.56988 -0.00001 0.00000 0.00843 1.53326
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 26.7692 17.5568 1.525 0.1273
## hp -0.3387 0.1974 -1.716 0.0862 .
## gear 3.2290 2.6792 1.205 0.2281
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 43.8601 on 31 degrees of freedom
## Residual deviance: 5.8697 on 29 degrees of freedom
## AIC: 11.87
##
## Number of Fisher Scoring iterations: 10
```

As you can see, there are no particularly well-estimated effects (i.e., the standard errors are quite large). That said, we can try to interpret the coefficients for pedagogical purposes in any case.

Interpreting coefficients directly in logistic regression is relatively straightforward. If we look at the effect of the variable `hp`, we can see that the coefficient is -0.3387. Coefficients in logistic regression are *logged odds ratios*. The easiest way to interpret them is to pass the coefficient to the `exp()` function, which turns it back into a normal odds ratio.

For example, if we look at the estimate for `hp`, we would get:

`exp(-0.3387)`

`## [1] 0.712696`

**Interpretation:** In this case, we would say that a one-unit increase in horsepower is associated with odds of being above average in mileage that are 0.71 times as large (i.e., 71% as large). That's a big effect!
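To get odds ratios for every term at once, you can exponentiate the whole coefficient vector. A minimal sketch (refitting the same model so the snippet stands alone):

```r
# Refit the model from above, then exponentiate the coefficients
# to convert logged odds ratios into plain odds ratios.
mako.cars <- mtcars
mako.cars$mpg.gtavg <- mako.cars$mpg > mean(mako.cars$mpg)
m <- glm(mpg.gtavg ~ hp + gear, data = mako.cars, family = binomial("logit"))

exp(coef(m))  # odds ratios; the "hp" entry is roughly 0.71
```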

Because odds are not very easy to interpret for folks who do not come from gambling families, I always suggest that folks interpret logistic regression in terms of probabilities instead of just odds ratios.

Converting from logistic regression coefficients to probabilities is a bit more involved. The idea of predicted probabilities is basically that you plug numbers into your fitted logistic regression and report the probabilities that your model predicts for what we might think of as hypothetical (or prototypical) individuals.

The standard logistic function is:

\[\frac{1}{1+e^{-k}}\]

In other words, to turn the results from our logistic regression above back into a probability, we essentially plug the model in for \(k\):

\[\frac{1}{1+e^{-1(\beta_0 + \beta_1\mathrm{hp} + \beta_2\mathrm{gear})}}\]

In our fitted model above, we know that \(\beta_0 = 26.7692\), \(\beta_1 = -0.3387\), \(\beta_2 = 3.2290\). In order to convert these back to probabilities, we use the logistic function. There’s good information on how this works in practice in the Wikipedia article on logistic regression.
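R also ships this function as `plogis()` (the CDF of the standard logistic distribution), so you can check hand computations against it. A quick sketch:

```r
# The standard logistic function 1/(1 + exp(-k)), written by hand,
# alongside R's built-in equivalent plogis().
logistic <- function(k) 1 / (1 + exp(-k))

logistic(0)                                   # 0.5: log-odds of zero is a coin flip
all.equal(logistic(2.5862), plogis(2.5862))   # TRUE: the built-in matches
```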

For this example, let's create predicted probabilities for two prototypical cars, both with three gears: one with 100 horsepower and one with 120 horsepower. First, let's plug in the numbers:

`26.7692 + (-0.3387 * 100) + (3.2290 * 3) # a car with 100 hp and 3 gears`

`## [1] 2.5862`

`26.7692 + (-0.3387 * 120) + (3.2290 * 3) # a car with 120 hp and 3 gears`

`## [1] -4.1878`

These numbers are the values of \(k\) in the first equation above. We can now plug them into that equation (and remember that \(k\) enters the exponent with a *negative* sign):

`1/(1+exp(-1*2.5862))`

`## [1] 0.9299681`

`1/(1+exp(-1*-4.1878))`

`## [1] 0.01495267`

**Interpretation:** In other words, our model predicts that a car with three gears and 100 horsepower will have above-average mileage 93% of the time, while a car with three gears and 120 horsepower will have above-average mileage only 1.5% of the time.

You can do the same thing in R using the `predict()` function. To do so, we make a dataset that describes the individuals we would like to predict for. For example, to represent the two cars above, I could do:

```
prototypical.cars <- data.frame(gear=3, hp=c(100, 120))
prototypical.cars
```

```
## gear hp
## 1 3 100
## 2 3 120
```

If I had more variables, I would need a column for each of them. In general, I hold all of my control variables at their median sample values for all of my prototypical individuals.
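For example, if car weight were also in the model, I might build the prototypical data like this (a sketch; `wt` is a hypothetical extra control here, not part of the model fit above):

```r
# Sketch: vary hp across the values of interest while holding a
# hypothetical control variable (wt, car weight) at its sample median.
prototypical.cars <- data.frame(gear = 3,
                                hp = c(100, 120),
                                wt = median(mtcars$wt))
prototypical.cars
```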

I can now use the `predict()` function to generate predicted values. We use the option `type="response"` to have it give us predicted probabilities:

`predict(m, prototypical.cars, type="response")`

```
## 1 2
## 0.9296423 0.0148648
```

These numbers look extremely similar to the ones we computed by hand. They're not exactly the same because we used rounded coefficients in the hand calculation above.
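To confirm that rounding is the only difference, we can redo the hand calculation with the full-precision coefficients from the fitted model; this should match `predict()` essentially exactly. A sketch:

```r
# Refit the model, then compute the linear predictor k with the
# unrounded coefficients and push it through the logistic function.
mako.cars <- mtcars
mako.cars$mpg.gtavg <- mako.cars$mpg > mean(mako.cars$mpg)
m <- glm(mpg.gtavg ~ hp + gear, data = mako.cars, family = binomial("logit"))

b <- coef(m)
k <- b["(Intercept)"] + b["hp"] * c(100, 120) + b["gear"] * 3
by.hand <- 1 / (1 + exp(-k))

cars <- data.frame(gear = 3, hp = c(100, 120))
all.equal(unname(by.hand),
          unname(predict(m, cars, type = "response")))  # TRUE
```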