cohort | count |
---|---|
control | 881 |
ibis | 82 |
Zero inflated Models for Inpatient Admissions
We consider Unicare Study members vs all other members, using cohort
as a predictor, as well as select other covariates. Even when ignorability, balance, and other assumptions do not fully hold, these models often result in reduced bias and improved efficiency (lower variance) of estimates.
Regression Models
Negative binomial regression
Zero inflated Poisson, zero inflated negative binomial
Bayesian versions of the above
Modeling Hospital Admissions counts
We consider admissions occurring within a 270-day window of observation.
Admissions counts
Overall
mean | variance |
---|---|
0.372 | 1.05 |
By cohort
cohort | mean | variance |
---|---|---|
control | 0.390 | 1.120 |
ibis | 0.171 | 0.217 |
With the overall variance not equal to the mean, a Poisson model is not appropriate. We instead consider a negative binomial model, which allows for overdispersion. Note that the dispersion parameter \(\theta\) is global and is not estimated separately for each cohort. That said, since the model variance is a function of the mean, the variance implied by the model still differs by cohort.
Admissions counts; negative binomial
\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 \end{aligned} \]
term | estimate | p.value | exp_estimate |
---|---|---|---|
(Intercept) | -0.99 | 0 | 0.372 |
\(\theta\): 0.229
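As a quick sanity check (a pure-Python sketch using the rounded estimates above), the intercept-only fit recovers the observed mean, and the negative binomial variance \(\mu + \mu^2/\theta\) comes out close to the observed 1.05:

```python
import math

# Rounded estimates from the intercept-only NB fit above
beta0 = -0.99   # intercept, log scale
theta = 0.229   # dispersion parameter

mu = math.exp(beta0)        # model-implied mean admissions
var = mu + mu**2 / theta    # NB variance: mu + mu^2 / theta

print(mu, var)   # mean near 0.372, variance near 1
```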
\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i \end{aligned} \]
term | estimate | p.value | exp_estimate |
---|---|---|---|
(Intercept) | -0.940 | 0.000 | 0.390 |
cohortibis | -0.827 | 0.022 | 0.437 |
\(\theta\): 0.235
Model comparison
Negative binomial with vs without cohort
as predictor
Likelihood ratio test, ANOVA, and the p-value on the coefficient are all equivalent here.
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.940 | 0.088 | -10.69 | 0.000 |
cohortibis | -0.827 | 0.362 | -2.29 | 0.022 |
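The exponentiated coefficients can be checked against the by-cohort means reported earlier; a sketch using the rounded estimates, so agreement is only approximate:

```python
import math

# Rounded coefficients from the NB-with-cohort fit above
b0, b1 = -0.940, -0.827

mu_control = math.exp(b0)          # control cohort mean (observed: 0.390)
rate_ratio = math.exp(b1)          # multiplicative effect of the ibis cohort
mu_ibis = mu_control * rate_ratio  # ibis cohort mean (observed: 0.171)

print(mu_control, rate_ratio, mu_ibis)
```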
Bootstraps/predictive check, NB model for admissions counts
Model
\[\textrm{counts}_i \sim NB(\mu_i, \theta)\] \[\log \mu_i = \beta_0 + \beta_1 \textrm{cohort}_i\]
Observed vs expected admissions counts; negative binomial
for the control cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 701 | 0.797 | 699.208 | 0.795 |
1 | 99 | 0.112 | 102.524 | 0.117 |
2 | 47 | 0.053 | 39.528 | 0.045 |
3 | 16 | 0.018 | 18.388 | 0.021 |
4 | 9 | 0.010 | 9.286 | 0.011 |
5 | 2 | 0.002 | 4.912 | 0.006 |
6 | 3 | 0.003 | 2.676 | 0.003 |
7 | 0 | 0.000 | 1.488 | 0.002 |
8 | 0 | 0.000 | 0.841 | 0.001 |
9 | 3 | 0.003 | 0.480 | 0.001 |
10 | 0 | 0.000 | 0.277 | 0.000 |
For the ibis cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 71 | 0.866 | 72.126 | 0.880 |
1 | 8 | 0.098 | 7.130 | 0.087 |
2 | 3 | 0.037 | 1.853 | 0.023 |
3 | 0 | 0.000 | 0.581 | 0.007 |
4 | 0 | 0.000 | 0.198 | 0.002 |
5 | 0 | 0.000 | 0.071 | 0.001 |
6 | 0 | 0.000 | 0.026 | 0.000 |
7 | 0 | 0.000 | 0.010 | 0.000 |
8 | 0 | 0.000 | 0.004 | 0.000 |
9 | 0 | 0.000 | 0.001 | 0.000 |
10 | 0 | 0.000 | 0.001 | 0.000 |
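The prob columns above can be reproduced from the negative binomial pmf. A minimal pure-Python sketch using the rounded estimates (so values agree only to about three decimals):

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf, parameterized by mean mu and dispersion theta."""
    return math.exp(
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + y * math.log(mu / (mu + theta))
        + theta * math.log(theta / (mu + theta))
    )

theta = 0.235
mu_control = math.exp(-0.940)           # control cohort mean
mu_ibis = math.exp(-0.940 - 0.827)      # ibis cohort mean

p0_control = nb_pmf(0, mu_control, theta)  # table shows 0.795
p0_ibis = nb_pmf(0, mu_ibis, theta)        # table shows 0.880
print(p0_control, p0_ibis)
```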
Zero inflated negative binomial
\[ \begin{aligned} \textrm{counts}_i &\sim \text{ZINB}(\pi_i, \mu_i, \theta) \\ \log\left( \frac{\pi_i}{1 - \pi_i} \right) &= \gamma_0 + \gamma_1 \, \textrm{cohort}_i \\ \log(\mu_i) &= \beta_0 + \beta_1 \, \textrm{cohort}_i \end{aligned} \]
term | estimate |
---|---|
count_(Intercept) | -0.685 |
count_cohortibis | -1.082 |
zero_(Intercept) | -1.234 |
zero_cohortibis | -6.463 |
\(\theta\): 0.331
Zero inflated negative binomial
Zero inflation probabilities
control: \(\textrm{logit}^{-1}(-1.234) = 0.225\)
ibis: \(\textrm{logit}^{-1}(-1.234 - 6.463) = 4.54 \times 10^{-4}\)
The count mean for ibis is also reduced, by a factor of \(\exp(-1.082) = 0.339\).
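These quantities are straightforward to verify numerically; a small sketch using the rounded coefficients above:

```python
import math

def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1 / (1 + math.exp(-x))

# Zero-inflation coefficients from the ZINB fit above
pi_control = inv_logit(-1.234)           # near 0.225
pi_ibis = inv_logit(-1.234 - 6.463)      # near 4.54e-4
rate_factor = math.exp(-1.082)           # count-mean factor for ibis, near 0.339

print(pi_control, pi_ibis, rate_factor)
```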
Observed vs expected admissions counts; zero inflated negative binomial
for the control cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 701 | 0.797 | 700.203 | 0.796 |
1 | 99 | 0.112 | 100.212 | 0.114 |
2 | 47 | 0.053 | 40.264 | 0.046 |
3 | 16 | 0.018 | 18.889 | 0.021 |
4 | 9 | 0.010 | 9.498 | 0.011 |
5 | 2 | 0.002 | 4.968 | 0.006 |
6 | 3 | 0.003 | 2.665 | 0.003 |
7 | 0 | 0.000 | 1.455 | 0.002 |
8 | 0 | 0.000 | 0.805 | 0.001 |
9 | 3 | 0.003 | 0.450 | 0.001 |
10 | 0 | 0.000 | 0.254 | 0.000 |
For the ibis cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 71 | 0.866 | 71.456 | 0.871 |
1 | 8 | 0.098 | 8.044 | 0.098 |
2 | 3 | 0.037 | 1.823 | 0.022 |
3 | 0 | 0.000 | 0.482 | 0.006 |
4 | 0 | 0.000 | 0.137 | 0.002 |
5 | 0 | 0.000 | 0.040 | 0.000 |
6 | 0 | 0.000 | 0.012 | 0.000 |
7 | 0 | 0.000 | 0.004 | 0.000 |
8 | 0 | 0.000 | 0.001 | 0.000 |
9 | 0 | 0.000 | 0.000 | 0.000 |
10 | 0 | 0.000 | 0.000 | 0.000 |
Model comparison
Coefficients
Poisson and negative binomial; no zero inflation
model | (Intercept) | cohortibis | chf | atrial_fibrillation | age |
---|---|---|---|---|---|
Pois1 | -1.31 | -0.754 | 0.671 | 0.697 | NA |
Pois2 | -1.52 | -0.765 | 0.662 | 0.674 | 0.0030 |
NB1 | -1.32 | -0.644 | 0.676 | 0.689 | NA |
NB2 | -1.04 | -0.616 | 0.703 | 0.727 | -0.0042 |
Poisson and negative binomial; no zero inflation - long format with p-values
term | estimate | pval | model |
---|---|---|---|
(Intercept) | -1.3142 | 0.0000 | Pois1 |
cohortibis | -0.7539 | 0.0059 | Pois1 |
chf | 0.6713 | 0.0000 | Pois1 |
atrial_fibrillation | 0.6968 | 0.0000 | Pois1 |
(Intercept) | -1.5165 | 0.0000 | Pois2 |
cohortibis | -0.7655 | 0.0053 | Pois2 |
age | 0.0030 | 0.5509 | Pois2 |
chf | 0.6621 | 0.0000 | Pois2 |
atrial_fibrillation | 0.6739 | 0.0000 | Pois2 |
(Intercept) | -1.3199 | 0.0000 | NB1 |
cohortibis | -0.6439 | 0.0641 | NB1 |
chf | 0.6763 | 0.0009 | NB1 |
atrial_fibrillation | 0.6891 | 0.0006 | NB1 |
(Intercept) | -1.0430 | 0.0496 | NB2 |
cohortibis | -0.6159 | 0.0782 | NB2 |
age | -0.0042 | 0.5893 | NB2 |
chf | 0.7028 | 0.0006 | NB2 |
atrial_fibrillation | 0.7274 | 0.0005 | NB2 |
Zero inflated Poisson and negative binomial.
term | component | estimate | p.value | model |
---|---|---|---|---|
(Intercept) | count | 0.0868 | 0.4623 | ZIP1 |
cohortibis | count | -0.7174 | 0.0274 | ZIP1 |
chf | count | 0.3070 | 0.0297 | ZIP1 |
atrial_fibrillation | count | 0.2890 | 0.0388 | ZIP1 |
(Intercept) | zero | 0.0868 | 0.4623 | ZIP1 |
cohortibis | zero | -0.7174 | 0.0274 | ZIP1 |
chf | zero | 0.3070 | 0.0297 | ZIP1 |
atrial_fibrillation | zero | 0.2890 | 0.0388 | ZIP1 |
(Intercept) | count | -1.3200 | 0.0000 | ZINB1 |
cohortibis | count | -0.6429 | 0.0708 | ZINB1 |
chf | count | 0.6763 | 0.0009 | ZINB1 |
atrial_fibrillation | count | 0.6894 | 0.0005 | ZINB1 |
Log(theta) | count | -1.2508 | 0.0000 | ZINB1 |
(Intercept) | zero | -1.3200 | 0.0000 | ZINB1 |
cohortibis | zero | -0.6429 | 0.0708 | ZINB1 |
chf | zero | 0.6763 | 0.0009 | ZINB1 |
atrial_fibrillation | zero | 0.6894 | 0.0005 | ZINB1 |
Log(theta) | zero | -1.2508 | 0.0000 | ZINB1 |
(Intercept) | count | 0.8308 | 0.0553 | ZIP2 |
cohortibis | count | -0.6484 | 0.0495 | ZIP2 |
age | count | -0.0109 | 0.0759 | ZIP2 |
chf | count | 0.3806 | 0.0095 | ZIP2 |
atrial_fibrillation | count | 0.3694 | 0.0118 | ZIP2 |
(Intercept) | zero | 0.8308 | 0.0553 | ZIP2 |
cohortibis | zero | -0.6484 | 0.0495 | ZIP2 |
age | zero | -0.0109 | 0.0759 | ZIP2 |
chf | zero | 0.3806 | 0.0095 | ZIP2 |
atrial_fibrillation | zero | 0.3694 | 0.0118 | ZIP2 |
(Intercept) | count | -1.0428 | 0.0467 | ZINB2 |
cohortibis | count | -0.6159 | 0.0872 | ZINB2 |
age | count | -0.0042 | 0.5897 | ZINB2 |
chf | count | 0.7028 | 0.0008 | ZINB2 |
atrial_fibrillation | count | 0.7274 | 0.0006 | ZINB2 |
Log(theta) | count | -1.2521 | 0.0000 | ZINB2 |
(Intercept) | zero | -1.0428 | 0.0467 | ZINB2 |
cohortibis | zero | -0.6159 | 0.0872 | ZINB2 |
age | zero | -0.0042 | 0.5897 | ZINB2 |
chf | zero | 0.7028 | 0.0008 | ZINB2 |
atrial_fibrillation | zero | 0.7274 | 0.0006 | ZINB2 |
Log(theta) | zero | -1.2521 | 0.0000 | ZINB2 |
Fit
model | logLik | df | AIC | BIC |
---|---|---|---|---|
Pois1 | -839 | 4 | 1685 | 1705 |
Pois2 | -839 | 5 | 1687 | 1711 |
NB1 | -706 | 5 | 1421 | 1445 |
NB2 | -705 | 6 | 1423 | 1452 |
ZIP1 | -742 | 5 | 1494 | 1518 |
ZINB1 | -706 | 6 | 1423 | 1452 |
ZIP2 | -740 | 6 | 1493 | 1522 |
ZINB2 | -705 | 7 | 1425 | 1459 |
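Since \(\textrm{AIC} = 2\,\textrm{df} - 2\,\textrm{logLik}\), the table can be spot-checked directly (the log likelihoods are rounded to integers, so the recomputed values agree only to within rounding):

```python
# (logLik, df, AIC) rows from the fit table above
fits = {
    "Pois1": (-839, 4, 1685),
    "Pois2": (-839, 5, 1687),
    "NB1":   (-706, 5, 1421),
    "NB2":   (-705, 6, 1423),
    "ZIP1":  (-742, 5, 1494),
    "ZINB1": (-706, 6, 1423),
    "ZIP2":  (-740, 6, 1493),
    "ZINB2": (-705, 7, 1425),
}
for name, (loglik, df, aic_table) in fits.items():
    aic = 2 * df - 2 * loglik   # AIC = 2k - 2 logLik
    print(name, aic, aic_table)
```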
Bayesian models
Bayesian zero inflated Poisson regression
Model
\[ \begin{aligned} \textrm{counts}_i & \sim ZIP(\pi_i, \mu_i) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i \\ \log \frac{\pi_i}{1 - \pi_i} & = \gamma_0 + \gamma_1 \textrm{condition\_count}_i\\ \end{aligned} \]
Model results; compare frequentist ZIP model
Bayes
component | term | Estimate | Est.Error | Q2.5 | Q97.5 |
---|---|---|---|---|---|
count | Intercept | 0.118 | 0.111 | -0.105 | 0.330 |
zero | zi_Intercept | 0.938 | 0.110 | 0.716 | 1.149 |
count | cohortibis | -0.517 | 0.271 | -1.057 | 0.003 |
count | chf | 0.286 | 0.133 | 0.026 | 0.549 |
count | atrial_fibrillation | 0.259 | 0.133 | 0.000 | 0.519 |
Frequentist
component | term | Estimate | Std. Error |
---|---|---|---|
count | (Intercept) | 0.087 | 0.118 |
count | cohortibis | -0.717 | 0.325 |
count | chf | 0.307 | 0.141 |
count | atrial_fibrillation | 0.289 | 0.140 |
zero | (Intercept) | 0.884 | 0.115 |
Posterior distributions
100 draws from the posterior predictive distribution
Posterior predictive checks
We can check how the distribution of proportions for different values of inpatient_count compares with those in the data.
Math notes
Linear models
Continuous response \(Y\) (eg, heart rate) as a function of predictors, aka covariates, \(X_j\) (eg, cholesterol, smoker)
\[Y \sim N(\mu , \sigma^2)\] where the conditional mean is a linear function of the predictors
\[\mu = E(Y |X ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]
\(\beta_j\) are estimated by minimizing the sum of squared errors, \[\sum_i \left(y_i - \left(\beta_0 + \sum_j \beta_j x_{ij}\right)\right)^2 = ||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2_2\]
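For a single predictor this minimization has a closed form; a minimal sketch on small hypothetical data:

```python
# Least-squares fit for a single predictor (hypothetical data: y = 1 + 2x)
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

# Closed-form estimates minimizing the sum of squared errors
beta1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
beta0 = ybar - beta1 * xbar

print(beta0, beta1)   # recovers intercept 1.0 and slope 2.0
```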
We can interpret the coefficients directly as effect sizes.
Generalized linear models
We model a function of the conditional mean as a linear function
\[g(\mu ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]
\(g\) is called a link function.
Example: Logistic regression
\[Y \sim \textrm{Bernoulli}(\mu )\] Coin toss with probability \(\mu\) of heads, \(1 - \mu\) of tails.
\[g(\mu ) = \rm{logit} \ \mu := \log \left(\frac{\mu}{1 - \mu}\right) = X \boldsymbol{\beta}\]
\[\mu = \rm{logit}^{-1}(X \boldsymbol{\beta})\]
Likelihood
To find \(\beta_j\), we typically maximize the (log) likelihood function.
For logistic regression, with \(n\) independent observations \(y_i =\) 0 or 1, the joint probability is
\[p(\mathbf{y} \mid X) = \prod_{i=1}^n \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}\]
The log likelihood is then
\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n y_i \log(\mu_i) + (1 - y_i)\log (1 - \mu_i)\]
Where are the \(\beta\)’s?
Recall that \(\mu_i = \rm{logit}^{-1}(X_i \boldsymbol{\beta})\)
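Putting the pieces together, the log likelihood can be evaluated directly; a minimal sketch with hypothetical data and coefficients:

```python
import math

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

def logistic_loglik(beta0, beta1, x, y):
    """Log likelihood for logistic regression with one predictor."""
    ll = 0.0
    for xi, yi in zip(x, y):
        mu = inv_logit(beta0 + beta1 * xi)          # mu_i = logit^{-1}(X_i beta)
        ll += yi * math.log(mu) + (1 - yi) * math.log(1 - mu)
    return ll

# Hypothetical data; with beta = 0, every mu_i = 0.5, so logLik = n * log(0.5)
x = [0.1, 0.5, 1.2, 2.0]
y = [0, 0, 1, 1]
print(logistic_loglik(0.0, 0.0, x, y))
```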
Poisson regression
Counts \(Y\) modeled as
\[ p(Y = y; \mu) = \frac{\mu^y e^{-\mu }}{y!} \] with “canonical” link
\[ g(\mu ) = \log(\mu ) = X \boldsymbol{\beta} \]
Likelihood
\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n \left( y_i \log \mu_i - \mu_i - \log (y_i!) \right)\]
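A direct translation of this log likelihood (with \(\log(y_i!)\) computed via the log-gamma function, since \(\log(y!) = \textrm{lgamma}(y+1)\)):

```python
import math

def poisson_loglik(mu, y):
    """Poisson log likelihood for observations y with fitted means mu."""
    return sum(
        yi * math.log(mi) - mi - math.lgamma(yi + 1)   # log(y!) = lgamma(y+1)
        for mi, yi in zip(mu, y)
    )

# Single observation y = 0 with mu = 1: logLik = 0 - 1 - 0 = -1
print(poisson_loglik([1.0], [0]))
```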
Fun fact
For a linear model, the minimum sum of squares estimate is the same as the maximum likelihood estimate.
Models
Zero inflated Poisson regression
Zero inflated negative binomial regression
Bayes versions of the above
Negative binomial distribution
Often used for overdispersed data: \(\mu < \sigma^2\).
\[p(Y = y; \mu, \theta) = \binom{y + \theta - 1}{y} \left(\frac{\mu}{\mu + \theta}\right)^y \left(\frac{\theta}{\mu + \theta}\right)^\theta\]
Dispersion parameter \(\theta > 0\) is global; it does not vary with covariates as \(\mu\) does.
\(\theta\) is estimated by the model
Conditional mean \(E(Y|X) = \mu\). We again model with log link
\[\log \mu = X \boldsymbol{\beta}\]
Variance
\[Var(Y|X) = \mu + \mu^2/\theta\]
\(\theta\) is estimated as part of the likelihood fit, unlike \(\sigma^2\) in linear regression.
Zero inflated Poisson regression
Define \(Z \sim Bernoulli(\pi)\) - a coin flip with probability \(\pi\) of heads.
\[ Y = \begin{cases} 0 & \text{if} \ Z = 1 \\ \sim Pois(\mu) & \text{if} \ Z = 0 \end{cases} \]
\(Z\) models the zero-inflated portion - the “extra” zeros.
The remaining count data, including additional zeros, are modeled with a Poisson distribution with mean \(\mu\).
\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]
(Recall Poisson distribution is \(P(y; \mu) = \mu^y e^{-\mu}/y!\))
\(\log \frac{\pi}{1 -\pi} =\) linear function of some covariates
\(\log(\mu) =\) linear function of some covariates
Zero inflated Poisson regression
We again estimate the coefficients using log likelihood.
\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]
Likelihood function
\[ \begin{align*} logLik(\mathbf{\beta}, \mathbf{\gamma}) = \sum_{i=1}^n \Big(& \mathbf{1}_{\{y_i = 0\}} \cdot \log \left( \pi_i + (1 - \pi_i) e^{-\mu_i} \right) \\ & + \mathbf{1}_{\{y_i > 0\}} \cdot \left( \log(1 - \pi_i) + y_i \log(\mu_i) - \mu_i - \log(y_i!) \right) \Big) \end{align*} \]
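A direct translation of the ZIP pmf and log likelihood; a sketch with constant \(\mu\) and \(\pi\) rather than covariate-dependent ones, for illustration:

```python
import math

def zip_pmf(y, mu, pi):
    """Zero inflated Poisson pmf."""
    pois = math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))  # Poisson(mu) pmf
    if y == 0:
        return pi + (1 - pi) * pois   # structural zeros plus Poisson zeros
    return (1 - pi) * pois

def zip_loglik(mu, pi, ys):
    """ZIP log likelihood, constant mu and pi across observations."""
    return sum(math.log(zip_pmf(y, mu, pi)) for y in ys)

# pi = 0.3 structural zeros on top of Poisson(1) counts
p0 = zip_pmf(0, mu=1.0, pi=0.3)   # equals 0.3 + 0.7 * exp(-1)
print(p0, zip_loglik(1.0, 0.3, [0, 0, 1, 2]))
```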
Bayesian statistical modeling
Starts with Bayes’ Rule
The conditional probability of \(B_k\) given \(A\) is
\[\begin{aligned} P(B_k \ | \ A) & = \frac{P(A \ | \ B_k) P(B_k)}{P(A)} \\ & = \frac{P(A \ | \ B_k) P(B_k)}{\sum_i P(A \ | \ B_i) P(B_i)} \end{aligned}\]
where \(A\) and the \(B_j\) are events, with \(\{B_j\}\) a partition of the sample space. The vertical bar indicates conditional probability.
The standard disease/positive test example
\[P(D | +) = \frac{P(+ | D) \ P(D)}{P(+ | D) P(D) + P(+ | \bar{D}) P(\bar{D})} \]
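Plugging in hypothetical numbers (1% prevalence, 95% sensitivity, 95% specificity; these are illustrative and not from the study data) shows how counterintuitive the result can be:

```python
# Bayes' Rule for the disease / positive-test example (hypothetical values)
p_d = 0.01           # P(D): prevalence
p_pos_d = 0.95       # P(+ | D): sensitivity
p_pos_not_d = 0.05   # P(+ | not D): 1 - specificity

p_d_pos = (p_pos_d * p_d) / (p_pos_d * p_d + p_pos_not_d * (1 - p_d))
print(p_d_pos)   # about 0.16: most positives are false positives
```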
Bayes’ Rule for probability densities for model parameters
\[p(\boldsymbol{\theta} | y, X) = \frac{p(\boldsymbol{\theta}) p(y | \boldsymbol{\theta}, X)}{\int p(y | \boldsymbol{\theta}, X) p(\boldsymbol{\theta}) d\boldsymbol{\theta}}\]
random vector of model parameters \(\boldsymbol{\theta}\) (our \(\beta\)’s and \(\gamma\)’s)
fixed predictor data \(X\) and response data \(y\)
\(p(\boldsymbol{\theta})\) is the prior distribution for the parameters
\(p(y | \boldsymbol{\theta}, X)\) is the likelihood of the data given the parameters \(\theta\) - same as the likelihood function in frequentist statistics.
\(p(\boldsymbol{\theta} \mid y, X)\) is the posterior distribution for the parameters.
The denominator is an integration over the parameter space; typically intractable \(\Rightarrow\) Markov Chain Monte Carlo (MCMC) methods to approximate the posterior distribution.
Posterior predictive distribution
Once you have the posterior for \(\theta\), you can compute the posterior predictive distribution of a new observation \(x_0\), given the data \(X\).
\[\begin{align*} p(x_0|X) & = \int p(x_0, \theta | X)\, d\theta \\ & = \int p(x_0| \theta, X) \, p(\theta |X) \, d\theta \\ & = \int p(x_0|\theta) \, p(\theta | X) \, d\theta \end{align*}\]
Can also get a distribution for any statistic of interest.
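For the coin-toss setting, this integral can be approximated numerically (hypothetical data: 60 heads in 100 tosses, flat prior):

```python
# Posterior predictive probability of heads on the next toss, via
# p(x0 | X) = integral of p(x0 | theta) p(theta | X) d(theta).
# Flat prior, hypothetical data: 60 heads in 100 tosses.
h, n = 60, 100
G = 10_000  # grid points for a simple midpoint (Riemann) approximation

def post(theta):
    # Unnormalized posterior: likelihood * flat prior
    return theta**h * (1 - theta) ** (n - h)

grid = [(i + 0.5) / G for i in range(G)]
norm = sum(post(t) for t in grid)
p_heads = sum(t * post(t) for t in grid) / norm   # E(theta | data)

print(p_heads)   # close to (h+1)/(n+2), the Beta posterior mean
```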
Bayesian inference
Parameters are regarded as random variables, data fixed
We get an entire distribution for every parameter, and hence for any statistic we wish to compute
No p-values, no hypothesis testing!
Example
\(\pi\) is the probability of heads in a single coin toss,
data is the sequence of heads and tails in 100 tosses
By contrast, our maximum likelihood estimate of \(\pi\) would be the peak of the likelihood, which coincides with the peak (mode) of the posterior under a flat prior.
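A small sketch of this equivalence (hypothetical data: 60 heads in 100 tosses):

```python
# With a flat prior the posterior is proportional to the likelihood,
# so the posterior mode equals the maximum likelihood estimate h/n.
h, n = 60, 100   # hypothetical: 60 heads in 100 tosses

def likelihood(pi):
    return pi**h * (1 - pi) ** (n - h)

grid = [i / 1000 for i in range(1, 1000)]
pi_map = max(grid, key=likelihood)   # grid-search posterior mode (flat prior)
pi_mle = h / n                       # maximum likelihood estimate

print(pi_map, pi_mle)   # both 0.6
```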