Zer0 inflated Models for Inpatient Admissions

We consider Unicare - Study vs all other members, using cohort, as well as select other covariates. Even in the event ignorability, balance, and other assumptions, they often result in reduced bias and efficiency (lower variance) of estimates.

Regression Models

  • Negative binomial regression

  • Zero inflated Poisson, Zero negative binomial

  • Bayesian versions of the above

Modeling Hospital Admissions counts

We consider admissions occurring within 270 day window of observation.

cohort count
control 881
ibis 82

Admissions counts

Overall

mean variance
0.372 1.05

By cohort

cohort mean variance
control 0.390 1.120
ibis 0.171 0.217

With overall variance not equal to the mean, a Poisson model is not appropriate. We consider a negative binomial model, which allows for overdispersion. But note that the dispersion parameter is a global parameter, and not adjusted for cohorts separately. That said, as it is a function of the mean, the variance given by the model will be adjusted for cohort.

Covariates, multicolinearity

Perhaps surprisingly, age and chronic conditions covariates do not exhibit high multicolinearity. This means not only that there are no high pairwise correlations, but that, for example, any one of them is not explainable by the others- regressing any one on the others would results in a low R squared value.

As an illustration, with condition count included, the covariates are completely colinear.

Low multicolinearity is a desirable feature for generalized linear models as it results in more stable coefficient estimates.

Condition number

We find the condition number of the model matrix with age and other covariates. This the square root of the ratio of max and min eigen values of the correlation matrix, and is thus a ratio of the standard deviations along components of the data with max variation and min variation,

\(\sqrt{\frac{\lambda_{max}}{\lambda_{min}}}\) = 2.303

As a rule of thumb, condition number > 30 is indication of high multicolinearity.

Correlations

Selecting covariates with low multicolinearity is also more straightforward, as simply selecting those most highly correlated with the outcome is more justifiable than in the case of high multicolinearity.

condition correlation
atrial_fibrillation 0.2136
chf 0.2113
age 0.1311
parkinsons 0.0792
chd 0.0733
copd 0.0712
lung_cancer 0.0701
anemia 0.0641
dementia 0.0634
endometrial_cancer 0.0489
urologic_cancer 0.0484
hypothyroidism 0.0449
alzheimers 0.0446
pneumonia 0.0405
kidney_disease 0.0356
hip_fracture 0.0345
heart_attack 0.0327
cataract 0.0286
anxiety 0.0257
mild_cognitive_impairment 0.0190
asthma 0.0180
hypertension 0.0092
glaucoma 0.0069
depression 0.0011
stroke -0.0036
osteoporosis -0.0066
hyperlipidemia -0.0140
hyperplasia -0.0205
diabetes -0.0207
breast_cancer -0.0215
arthritis -0.0233
prostate_cancer -0.0425
colorectal_cancer -0.0426

We select age, atrial_fibrillation, and chf, variously in the models below.

Admissions counts; negative binomial

We first compare intercept only and cohort models

\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 \end{aligned} \]

term estimate p.value exp_estimate
(Intercept) -0.99 0 0.372

\(\theta\): 0.229

\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i \end{aligned} \]

term estimate p.value exp_estimate
(Intercept) -0.940 0.000 0.390
cohortibis -0.827 0.022 0.437

\(\theta\): 0.235

Model comparison

Negative binomial with vs without cohort as predictor


Likelihood ratio test, ANOVA, p-value on coefficient all equivalent

term estimate std.error statistic p.value
(Intercept) -0.940 0.088 -10.69 0.000
cohortibis -0.827 0.362 -2.29 0.022

Bootstraps/predictive check, NB model for admissions counts

Model

\[\textrm{counts}_i \sim NB(\mu_i, \theta)\] \[\log \mu_i = \beta_0 + \beta_1 \textrm{cohort}\]

Observed vs expected admissions counts; negative binomial

for the control cohort

inpatient_count frequency prevalence expected prob
0 701 0.797 699.208 0.795
1 99 0.112 102.524 0.117
2 47 0.053 39.528 0.045
3 16 0.018 18.388 0.021
4 9 0.010 9.286 0.011
5 2 0.002 4.912 0.006
6 3 0.003 2.676 0.003
7 0 0.000 1.488 0.002
8 0 0.000 0.841 0.001
9 3 0.003 0.480 0.001
10 0 0.000 0.277 0.000

For the ibis cohort

inpatient_count frequency prevalence expected prob
0 71 0.866 72.126 0.880
1 8 0.098 7.130 0.087
2 3 0.037 1.853 0.023
3 0 0.000 0.581 0.007
4 0 0.000 0.198 0.002
5 0 0.000 0.071 0.001
6 0 0.000 0.026 0.000
7 0 0.000 0.010 0.000
8 0 0.000 0.004 0.000
9 0 0.000 0.001 0.000
10 0 0.000 0.001 0.000

Zero inflated negative binomial

This example for illustration only

\[ \begin{aligned} \textrm{counts}_i &\sim \text{ZINB}(\pi_i, \mu_i, \theta) \\ \log\left( \frac{\pi_i}{1 - \pi_i} \right) &= \gamma_0 + \gamma_1 \, \textrm{cohort}_i \\ \log(\mu_i) &= \beta_0 + \beta_1 \, \textrm{cohort}_i \end{aligned} \]

term estimate
count_(Intercept) -0.685
count_cohortibis -1.082
zero_(Intercept) -1.234
zero_cohortibis -6.463

\(\theta\): 0.331

This example for illustration only The coeffs are not significant, but see more models below.

Zero inflation probabilities

  • control: \(logit^{-1}\) -1.234 = 0.225

  • ibis: \(logit^{-1}\) (-1.234 + -6.463) = 4.539^{-4}

Count mean for ibis also reduced by factor of exp(-1.082) = 0.339.

Observed vs expected admissions counts; zero inflated negative binomial

for the control cohort

inpatient_count frequency prevalence expected prob
0 701 0.797 700.203 0.796
1 99 0.112 100.212 0.114
2 47 0.053 40.264 0.046
3 16 0.018 18.889 0.021
4 9 0.010 9.498 0.011
5 2 0.002 4.968 0.006
6 3 0.003 2.665 0.003
7 0 0.000 1.455 0.002
8 0 0.000 0.805 0.001
9 3 0.003 0.450 0.001
10 0 0.000 0.254 0.000

For the ibis cohort

inpatient_count frequency prevalence expected prob
0 71 0.866 71.456 0.871
1 8 0.098 8.044 0.098
2 3 0.037 1.823 0.022
3 0 0.000 0.482 0.006
4 0 0.000 0.137 0.002
5 0 0.000 0.040 0.000
6 0 0.000 0.012 0.000
7 0 0.000 0.004 0.000
8 0 0.000 0.001 0.000
9 0 0.000 0.000 0.000
10 0 0.000 0.000 0.000

More models; summary and comparisons

Below we fit more models and summarize eight models. All the models use cohort, chf, and atrial_fibrillation as covariates.

  • zero inflated vs not

  • poisson vs negative binomial distribution

  • with and without age as a covariate.

Zero inflated age

Two of the models include age as a covariate for the zero inflation part of the model. We highlight those here. The full summary of all the models follows.

Zero inflated Poisson with age zero inflation

term component estimate p.value
(Intercept) count 1.7435 0.0000
cohortibis count -0.7188 0.0265
age count -0.0246 0.0001
chf count 0.4645 0.0013
atrial_fibrillation count 0.4299 0.0030
(Intercept) zero 4.2559 0.0000
age zero -0.0483 0.0000

Zero inflated negative binomial with age zero inflation

term component estimate p.value
(Intercept) count 1.2582 0.0722
cohortibis count -0.6441 0.0640
age count -0.0287 0.0017
chf count 0.7563 0.0002
atrial_fibrillation count 0.7098 0.0004
Log(theta) count -0.3909 0.2399
(Intercept) zero 4.2462 0.0000
age zero -0.0656 0.0000

Coefficients for all models

Poisson and negative binomial; no zero inflation

model (Intercept) cohortibis chf atrial_fibrillation age
Pois1 -1.31 -0.754 0.671 0.697 NA
Pois2 -1.52 -0.765 0.662 0.674 0.0030
NB1 -1.32 -0.644 0.676 0.689 NA
NB -1.04 -0.616 0.703 0.727 -0.0042

Poisson and negative binomial; no zero inflation - long format with p-values

term estimate pval model
(Intercept) -1.3142 0.0000 Pois1
cohortibis -0.7539 0.0059 Pois1
chf 0.6713 0.0000 Pois1
atrial_fibrillation 0.6968 0.0000 Pois1
(Intercept) -1.5165 0.0000 Pois2
cohortibis -0.7655 0.0053 Pois2
age 0.0030 0.5509 Pois2
chf 0.6621 0.0000 Pois2
atrial_fibrillation 0.6739 0.0000 Pois2
(Intercept) -1.3199 0.0000 NB1
cohortibis -0.6439 0.0641 NB1
chf 0.6763 0.0009 NB1
atrial_fibrillation 0.6891 0.0006 NB1
(Intercept) -1.0430 0.0496 NB
cohortibis -0.6159 0.0782 NB
age -0.0042 0.5893 NB
chf 0.7028 0.0006 NB
atrial_fibrillation 0.7274 0.0005 NB

Zero inflated Poisson and negative binomial.

term component estimate p.value model
(Intercept) count 0.0868 0.4623 ZIP1
cohortibis count -0.7174 0.0274 ZIP1
chf count 0.3070 0.0297 ZIP1
atrial_fibrillation count 0.2890 0.0388 ZIP1
(Intercept) zero 0.8746 0.0000 ZIP1
(Intercept) count -1.3200 0.0000 ZINB1
cohortibis count -0.6429 0.0708 ZINB1
chf count 0.6763 0.0009 ZINB1
atrial_fibrillation count 0.6894 0.0005 ZINB1
Log(theta) count -1.2508 0.0000 ZINB1
(Intercept) zero -9.8760 0.8917 ZINB1
(Intercept) count 0.8308 0.0553 ZIP2
cohortibis count -0.6484 0.0495 ZIP2
age count -0.0109 0.0759 ZIP2
chf count 0.3806 0.0095 ZIP2
atrial_fibrillation count 0.3694 0.0118 ZIP2
(Intercept) zero 0.8842 0.0000 ZIP2
(Intercept) count -1.0428 0.0467 ZINB2
cohortibis count -0.6159 0.0872 ZINB2
age count -0.0042 0.5897 ZINB2
chf count 0.7028 0.0008 ZINB2
atrial_fibrillation count 0.7274 0.0006 ZINB2
Log(theta) count -1.2521 0.0000 ZINB2
(Intercept) zero -9.0452 0.8551 ZINB2
(Intercept) count 1.7435 0.0000 ZIP3
cohortibis count -0.7188 0.0265 ZIP3
age count -0.0246 0.0001 ZIP3
chf count 0.4645 0.0013 ZIP3
atrial_fibrillation count 0.4299 0.0030 ZIP3
(Intercept) zero 4.2559 0.0000 ZIP3
age zero -0.0483 0.0000 ZIP3
(Intercept) count 1.2582 0.0722 ZINB3
cohortibis count -0.6441 0.0640 ZINB3
age count -0.0287 0.0017 ZINB3
chf count 0.7563 0.0002 ZINB3
atrial_fibrillation count 0.7098 0.0004 ZINB3
Log(theta) count -0.3909 0.2399 ZINB3
(Intercept) zero 4.2462 0.0000 ZINB3
age zero -0.0656 0.0000 ZINB3

Fit

model logLik df AIC BIC
Pois1 -839 4 1685 1705
Pois2 -839 5 1687 1711
NB1 -706 5 1421 1445
NB2 -705 6 1423 1452
ZIP1 -742 5 1494 1518
ZINB1 -706 6 1423 1452
ZIP2 -740 6 1493 1522
ZINB2 -705 7 1425 1459
ZIP3 -726 7 1467 1501
ZINB3 -698 8 1412 1451

Bayesian zero inflated Poisson regression

Bayesian models often offer some advantages. We get an entire distribution for every parameter modeled-not just point estimates and confidence intervals. We can also get distributions for any statistic.

We consider just one model here.

Model

\[ \begin{aligned} \textrm{counts}_i & \sim ZIP(\pi_i, \mu_i) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i + \beta_2\textrm{chf}_i + \beta_3\textrm{atrial\fibrillation}_i\\ \log \frac{\pi_i}{1 - \pi_i} & = \gamma_0 \\ \end{aligned} \]

Model results; compare frequentist ZIP model

Bayes

component term Estimate Est.Error Q2.5 Q97.5
count Intercept 0.118 0.111 -0.105 0.330
zero zi_Intercept 0.938 0.110 0.716 1.149
count cohortibis -0.517 0.271 -1.057 0.003
count chf 0.286 0.133 0.026 0.549
count atrial_fibrillation 0.259 0.133 0.000 0.519

Frequentist

component term Estimate Std. Error
count (Intercept) 0.087 0.118
count cohortibis -0.717 0.325
count chf 0.307 0.141
count atrial_fibrillation 0.289 0.140
zero (Intercept) 0.884 0.115

Posterior distributions

100 draws from the posterior predictive distribution

Post predictive checks

We can check how the distribution of proportions for different values of inpatient_count compares with those in the data.

Math notes

Linear models

Continuous response \(Y\) (eg, heart rate) as a function of predictors; aka covariates, \(X_j\) (eg, cholesterol, smoker)

\[Y \sim N(\mu , \sigma^2)\] where the conditional mean is a linear function of the predictors

\[\mu = E(Y |X ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]

  • \(\beta_j\) are estimated by minimizing the sum of squares errors, \[\sum_i (y_i - (\beta_0 - \sum_j \beta_j x_{ij}))^2 = ||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2_2\]

  • We can interpret the coefficients directly as effect sizes.

Generalized linear models

We model a function of the conditional mean as a linear function

\[g(\mu ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]

\(g\) is called a link function.

Example: Logistic regression

\[Y \sim \textrm{Bernoulli}(\mu )\] Coin toss with probability \(\mu\) of heads, \(1 - \mu\) of tails.

\[g(\mu ) = \rm{logit} \ \mu := \log \left(\frac{\mu}{1 - \mu}\right) = X \boldsymbol{\beta}\]

\[\mu = \rm{logit}^{-1}(X \boldsymbol{\beta})\]

Likelihood

To find \(\beta_j\), we typically maximize the (log) likelihood function.

For logistic regression, with \(n\) independent observations \(y_i =\) 0 or 1, the joint probability is

\[p({\bf{y}} | X_i) = \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}\]

The log likelihood is then

\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n y_i \log(\mu_i) + (1 - y_i)\log (1 - \mu_i)\]

Where are the \(\beta\)’s?

Recall that \(\mu_i = \rm{logit}^{-1}(X_i \boldsymbol{\beta})\))

Poisson regression

Counts \(Y\) modeled as

\[ p(Y = y; \mu) = \frac{\mu^y e^{-\mu }}{y!} \] with “canonical” link

\[ g(\mu ) = \log(\mu ) = X \boldsymbol{\beta} \]

Likelihood

\[logLik(\boldsymbol{\boldsymbol{\beta}}) = \sum_{i=1}^n \left( y_i \log \mu_i - \mu_i - \log (y_i!) \right)\]

Fun fact

For a linear model, the minimum sum of squares estimate is the same as the maximum likelihood estimate.

Models

  • Zero inflated Poisson regression

  • Zero inflated negative binomial regression

  • Bayes versions of the above

Negative binomial distribution

Often used on overdispersed data: \(\mu < \sigma^2\).

\[p(Y = y; \mu, \theta) = \binom{y + \theta - 1}{\theta} \left(\frac{\mu}{\mu + \theta}\right)^y \left(\frac{\theta}{\mu + \theta}\right)^\theta\]

  • Dispersion parameter \(\theta > 0\); fixed- does not vary with covariates like \(\mu\).

  • \(\theta\) is estimated by the model

Conditional mean \(E(Y|X) = \mu\). We again model with log link

\[\log \mu = X \boldsymbol{\beta}\]

Variance

\[Var(Y|X) = \mu + \mu^2/\theta\]

\(\theta\) part of the model fit - unlike \(\sigma^2\) in linear regression.

Zero inflated Poisson regression

Define \(Z \sim Bernoulli(\pi)\) - a coin flip with probability \(\pi\) of heads.

\[ Y = \begin{cases} 0 & \text{if} \ Z = 1 \\ \sim Pois(\mu) & \text{if} \ Z = 0 \end{cases} \]

  • \(Z\) models the zero-inflated portion - the “extra” zeros.

  • The additional count data, including zeros, is modeled with the Poisson distribution with mean \(\mu\).

\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]

(Recall Poisson distribution is \(P(y; \mu) = \mu^y e^{-\mu}/y!\))

  • \(\log \frac{\pi}{1 -\pi} =\) linear function of some covariates

  • \(\log(\mu) =\) linear function of some covariates

Zero inflated Poisson regression

We again estimate the coefficients using log likelihood.

\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]

Likelihood function

\[ \begin{align*} logLik(\mathbf{\beta}, \mathbf{\gamma}) = \sum_{i=1}^n \Big(& \mathbf{1}_{\{y_i = 0\}} \cdot \log \left( \pi_i + (1 - \pi_i) e^{-\mu_i} \right) \\ & + \mathbf{1}_{\{y_i > 0\}} \cdot \left( \log(1 - \pi_i) + y_i \log(\mu_i) - \mu_i - \log(y_i!) \right) \Big) \end{align*} \]

Bayesian statistical modeling

Starts with Bayes’ Rule

The conditional probability of \(B_k\) given \(A\) is

\[\begin{aligned} P(B_k \ | \ A) & = \frac{P(A \ | \ B_k) P(B_k)}{P(A)} \\ & = \frac{P(A \ | \ B_k) P(B_k)}{\sum_i P(A \ | \ B_i) P(B_i)} \end{aligned}\]

where \(A\) and \(B_j\) are events with \(B_j\) a partition of the sample space. The vertical bar indicates conditional probability

The standard disease/positive test example

\[P(D | +) = \frac{P(+ | D) \ P(D)}{P(+ | D) P(D) + P(+ | \bar{D}) P(\bar{D})} \]

Bayes’ Rule for probability densities for model parameters

\[p(\boldsymbol{\theta} | y, X) = \frac{p(\boldsymbol{\theta}) p(y | \boldsymbol{\theta}, X)}{\int p(y | \boldsymbol{\theta}, X) p(\boldsymbol{\theta}) d\boldsymbol{\theta}}\]

  • random vector of model parameters \(\boldsymbol{\theta}\) (our \(\beta\)’s and \(\gamma\)’s)

  • fixed predictor data \(X\) and response data \(y\)

  • \(p(\boldsymbol{\theta})\) is the prior distribution for the parameters

  • \(p(y | \boldsymbol{\theta}, X)\) is the likelihood of the data given the parameters \(\theta\) - same as the likelihood function in frequentist statistics.

  • \(p(\theta | x)\) is the posterior distribution for the parameters \(\theta\).

The denominator is an integration over the parameter space; typically intractable \(\Rightarrow\) Markov Chain Monte Carlo (MCMC) methods to approximate the posterior distribution.

Posterior predictive distribution

Once you have the \(\theta\) posterior, you can compute the posterior predictive distributions of a new observation \(x_0\), given the data \(X\).

\[\begin{align*} p(x_0|X) & = \int p(x_0, \theta | X)\, d\theta \\ \\ & = \int p(x_0| \theta, X) \, p(\theta |X) , d\theta \\ \\ & = \int p(x_0|\theta) \, p(\theta | X) \, d\theta\\ \end{align*}\]

Can also get a distribution for any statistic of interest.

Bayesian inference

  • Parameters are regarded as random variables, data fixed

  • We get an entire distribution for every parameter, and hence for any statistic we wish to compute

  • No p-values, no hypothesis testing!

Example

  • \(\pi\) is the probability of heads in a single coin toss,

  • data is the sequence of heads and tails in 100 tosses

By contrast, our maximum likelihood estimate of \(\pi\) would be the peak of the likelihood, which is the same as the posterior for flat prior.