cohort | count |
---|---|
control | 881 |
ibis | 82 |
Zero inflated Models for Inpatient Admissions
We consider Unicare Study members vs all other members, using cohort
as a predictor, as well as select other covariates. Even when ignorability, balance, and other assumptions do not fully hold, these models often result in reduced bias and improved efficiency (lower variance) of estimates.
Regression Models
Negative binomial regression
Zero inflated Poisson, zero inflated negative binomial
Bayesian versions of the above
Modeling Hospital Admissions counts
We consider admissions occurring within a 270-day window of observation.
Admissions counts
Overall
mean | variance |
---|---|
0.372 | 1.05 |
By cohort
cohort | mean | variance |
---|---|---|
control | 0.390 | 1.120 |
ibis | 0.171 | 0.217 |
With the overall variance not equal to the mean, a Poisson model is not appropriate. We instead consider a negative binomial model, which allows for overdispersion. Note that the dispersion parameter \(\theta\) is global and is not estimated separately for each cohort. That said, since the model variance is a function of the mean, the variance implied by the model still differs by cohort.
Admissions counts; negative binomial
\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 \end{aligned} \]
term | estimate | p.value | exp_estimate |
---|---|---|---|
(Intercept) | -0.99 | 0 | 0.372 |
\(\theta\): 0.229
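As a quick sanity check (a pure-Python sketch using the rounded estimates above), the intercept-only fit recovers the observed mean, and the negative binomial variance \(\mu + \mu^2/\theta\) comes out close to the observed 1.05:

```python
import math

# Rounded estimates from the intercept-only NB fit above
beta0 = -0.99   # intercept, log scale
theta = 0.229   # dispersion parameter

mu = math.exp(beta0)        # model-implied mean admissions
var = mu + mu**2 / theta    # NB variance: mu + mu^2 / theta

print(mu, var)   # mean near 0.372, variance near 1
```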
\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i \end{aligned} \]
term | estimate | p.value | exp_estimate |
---|---|---|---|
(Intercept) | -0.940 | 0.000 | 0.390 |
cohortibis | -0.827 | 0.022 | 0.437 |
\(\theta\): 0.235
Model comparison
Negative binomial with vs without cohort
as predictor
Likelihood ratio test, ANOVA, and the p-value on the coefficient are all equivalent here.
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.940 | 0.088 | -10.69 | 0.000 |
cohortibis | -0.827 | 0.362 | -2.29 | 0.022 |
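The exponentiated coefficients can be checked against the by-cohort means reported earlier; a sketch using the rounded estimates, so agreement is only approximate:

```python
import math

# Rounded coefficients from the NB-with-cohort fit above
b0, b1 = -0.940, -0.827

mu_control = math.exp(b0)          # control cohort mean (observed: 0.390)
rate_ratio = math.exp(b1)          # multiplicative effect of the ibis cohort
mu_ibis = mu_control * rate_ratio  # ibis cohort mean (observed: 0.171)

print(mu_control, rate_ratio, mu_ibis)
```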
Bootstraps/predictive check, NB model for admissions counts
Model
\[\textrm{counts}_i \sim NB(\mu_i, \theta)\] \[\log \mu_i = \beta_0 + \beta_1 \textrm{cohort}_i\]
Observed vs expected admissions counts; negative binomial
for the control cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 701 | 0.797 | 699.208 | 0.795 |
1 | 99 | 0.112 | 102.524 | 0.117 |
2 | 47 | 0.053 | 39.528 | 0.045 |
3 | 16 | 0.018 | 18.388 | 0.021 |
4 | 9 | 0.010 | 9.286 | 0.011 |
5 | 2 | 0.002 | 4.912 | 0.006 |
6 | 3 | 0.003 | 2.676 | 0.003 |
7 | 0 | 0.000 | 1.488 | 0.002 |
8 | 0 | 0.000 | 0.841 | 0.001 |
9 | 3 | 0.003 | 0.480 | 0.001 |
10 | 0 | 0.000 | 0.277 | 0.000 |
For the ibis cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 71 | 0.866 | 72.126 | 0.880 |
1 | 8 | 0.098 | 7.130 | 0.087 |
2 | 3 | 0.037 | 1.853 | 0.023 |
3 | 0 | 0.000 | 0.581 | 0.007 |
4 | 0 | 0.000 | 0.198 | 0.002 |
5 | 0 | 0.000 | 0.071 | 0.001 |
6 | 0 | 0.000 | 0.026 | 0.000 |
7 | 0 | 0.000 | 0.010 | 0.000 |
8 | 0 | 0.000 | 0.004 | 0.000 |
9 | 0 | 0.000 | 0.001 | 0.000 |
10 | 0 | 0.000 | 0.001 | 0.000 |
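The prob columns above can be reproduced from the negative binomial pmf. A minimal pure-Python sketch using the rounded estimates (so values agree only to about three decimals):

```python
import math

def nb_pmf(y, mu, theta):
    """Negative binomial pmf, parameterized by mean mu and dispersion theta."""
    return math.exp(
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + y * math.log(mu / (mu + theta))
        + theta * math.log(theta / (mu + theta))
    )

theta = 0.235
mu_control = math.exp(-0.940)           # control cohort mean
mu_ibis = math.exp(-0.940 - 0.827)      # ibis cohort mean

p0_control = nb_pmf(0, mu_control, theta)  # table shows 0.795
p0_ibis = nb_pmf(0, mu_ibis, theta)        # table shows 0.880
print(p0_control, p0_ibis)
```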
Zero inflated negative binomial
\[ \begin{aligned} \textrm{counts}_i &\sim \text{ZINB}(\pi_i, \mu_i, \theta) \\ \log\left( \frac{\pi_i}{1 - \pi_i} \right) &= \gamma_0 + \gamma_1 \, \textrm{cohort}_i \\ \log(\mu_i) &= \beta_0 + \beta_1 \, \textrm{cohort}_i \end{aligned} \]
term | estimate |
---|---|
count_(Intercept) | -0.685 |
count_cohortibis | -1.082 |
zero_(Intercept) | -1.234 |
zero_cohortibis | -6.463 |
\(\theta\): 0.331
Zero inflated negative binomial
Zero inflation probabilities
control: \(\textrm{logit}^{-1}(-1.234) = 0.225\)
ibis: \(\textrm{logit}^{-1}(-1.234 - 6.463) = 4.54 \times 10^{-4}\)
The count mean for ibis is also reduced, by a factor of \(\exp(-1.082) = 0.339\).
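These quantities are straightforward to verify numerically; a small sketch using the rounded coefficients above:

```python
import math

def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1 / (1 + math.exp(-x))

# Zero-inflation coefficients from the ZINB fit above
pi_control = inv_logit(-1.234)           # near 0.225
pi_ibis = inv_logit(-1.234 - 6.463)      # near 4.54e-4
rate_factor = math.exp(-1.082)           # count-mean factor for ibis, near 0.339

print(pi_control, pi_ibis, rate_factor)
```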
Observed vs expected admissions counts; zero inflated negative binomial
for the control cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 701 | 0.797 | 700.203 | 0.796 |
1 | 99 | 0.112 | 100.212 | 0.114 |
2 | 47 | 0.053 | 40.264 | 0.046 |
3 | 16 | 0.018 | 18.889 | 0.021 |
4 | 9 | 0.010 | 9.498 | 0.011 |
5 | 2 | 0.002 | 4.968 | 0.006 |
6 | 3 | 0.003 | 2.665 | 0.003 |
7 | 0 | 0.000 | 1.455 | 0.002 |
8 | 0 | 0.000 | 0.805 | 0.001 |
9 | 3 | 0.003 | 0.450 | 0.001 |
10 | 0 | 0.000 | 0.254 | 0.000 |
For the ibis cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 71 | 0.866 | 71.456 | 0.871 |
1 | 8 | 0.098 | 8.044 | 0.098 |
2 | 3 | 0.037 | 1.823 | 0.022 |
3 | 0 | 0.000 | 0.482 | 0.006 |
4 | 0 | 0.000 | 0.137 | 0.002 |
5 | 0 | 0.000 | 0.040 | 0.000 |
6 | 0 | 0.000 | 0.012 | 0.000 |
7 | 0 | 0.000 | 0.004 | 0.000 |
8 | 0 | 0.000 | 0.001 | 0.000 |
9 | 0 | 0.000 | 0.000 | 0.000 |
10 | 0 | 0.000 | 0.000 | 0.000 |
Model comparison
Coefficients
Poisson and negative binomial; no zero inflation
model | (Intercept) | cohortibis | chf | atrial_fibrillation | age |
---|---|---|---|---|---|
Pois1 | -1.31 | -0.754 | 0.671 | 0.697 | NA |
Pois2 | -1.52 | -0.765 | 0.662 | 0.674 | 0.0030 |
NB1 | -1.32 | -0.644 | 0.676 | 0.689 | NA |
NB2 | -1.04 | -0.616 | 0.703 | 0.727 | -0.0042 |
Poisson and negative binomial; no zero inflation - long format with p-values
term | estimate | pval | model |
---|---|---|---|
(Intercept) | -1.3142 | 0.0000 | Pois1 |
cohortibis | -0.7539 | 0.0059 | Pois1 |
chf | 0.6713 | 0.0000 | Pois1 |
atrial_fibrillation | 0.6968 | 0.0000 | Pois1 |
(Intercept) | -1.5165 | 0.0000 | Pois2 |
cohortibis | -0.7655 | 0.0053 | Pois2 |
age | 0.0030 | 0.5509 | Pois2 |
chf | 0.6621 | 0.0000 | Pois2 |
atrial_fibrillation | 0.6739 | 0.0000 | Pois2 |
(Intercept) | -1.3199 | 0.0000 | NB1 |
cohortibis | -0.6439 | 0.0641 | NB1 |
chf | 0.6763 | 0.0009 | NB1 |
atrial_fibrillation | 0.6891 | 0.0006 | NB1 |
(Intercept) | -1.0430 | 0.0496 | NB2 |
cohortibis | -0.6159 | 0.0782 | NB2 |
age | -0.0042 | 0.5893 | NB2 |
chf | 0.7028 | 0.0006 | NB2 |
atrial_fibrillation | 0.7274 | 0.0005 | NB2 |
Zero inflated Poisson and negative binomial.
term | component | estimate | p.value | model |
---|---|---|---|---|
(Intercept) | count | 0.0868 | 0.4623 | ZIP1 |
cohortibis | count | -0.7174 | 0.0274 | ZIP1 |
chf | count | 0.3070 | 0.0297 | ZIP1 |
atrial_fibrillation | count | 0.2890 | 0.0388 | ZIP1 |
(Intercept) | zero | 0.0868 | 0.4623 | ZIP1 |
cohortibis | zero | -0.7174 | 0.0274 | ZIP1 |
chf | zero | 0.3070 | 0.0297 | ZIP1 |
atrial_fibrillation | zero | 0.2890 | 0.0388 | ZIP1 |
(Intercept) | count | -1.3200 | 0.0000 | ZINB1 |
cohortibis | count | -0.6429 | 0.0708 | ZINB1 |
chf | count | 0.6763 | 0.0009 | ZINB1 |
atrial_fibrillation | count | 0.6894 | 0.0005 | ZINB1 |
Log(theta) | count | -1.2508 | 0.0000 | ZINB1 |
(Intercept) | zero | -1.3200 | 0.0000 | ZINB1 |
cohortibis | zero | -0.6429 | 0.0708 | ZINB1 |
chf | zero | 0.6763 | 0.0009 | ZINB1 |
atrial_fibrillation | zero | 0.6894 | 0.0005 | ZINB1 |
Log(theta) | zero | -1.2508 | 0.0000 | ZINB1 |
(Intercept) | count | 0.8308 | 0.0553 | ZIP2 |
cohortibis | count | -0.6484 | 0.0495 | ZIP2 |
age | count | -0.0109 | 0.0759 | ZIP2 |
chf | count | 0.3806 | 0.0095 | ZIP2 |
atrial_fibrillation | count | 0.3694 | 0.0118 | ZIP2 |
(Intercept) | zero | 0.8308 | 0.0553 | ZIP2 |
cohortibis | zero | -0.6484 | 0.0495 | ZIP2 |
age | zero | -0.0109 | 0.0759 | ZIP2 |
chf | zero | 0.3806 | 0.0095 | ZIP2 |
atrial_fibrillation | zero | 0.3694 | 0.0118 | ZIP2 |
(Intercept) | count | -1.0428 | 0.0467 | ZINB2 |
cohortibis | count | -0.6159 | 0.0872 | ZINB2 |
age | count | -0.0042 | 0.5897 | ZINB2 |
chf | count | 0.7028 | 0.0008 | ZINB2 |
atrial_fibrillation | count | 0.7274 | 0.0006 | ZINB2 |
Log(theta) | count | -1.2521 | 0.0000 | ZINB2 |
(Intercept) | zero | -1.0428 | 0.0467 | ZINB2 |
cohortibis | zero | -0.6159 | 0.0872 | ZINB2 |
age | zero | -0.0042 | 0.5897 | ZINB2 |
chf | zero | 0.7028 | 0.0008 | ZINB2 |
atrial_fibrillation | zero | 0.7274 | 0.0006 | ZINB2 |
Log(theta) | zero | -1.2521 | 0.0000 | ZINB2 |
Fit
model | logLik | df | AIC | BIC |
---|---|---|---|---|
Pois1 | -839 | 4 | 1685 | 1705 |
Pois2 | -839 | 5 | 1687 | 1711 |
NB1 | -706 | 5 | 1421 | 1445 |
NB2 | -705 | 6 | 1423 | 1452 |
ZIP1 | -742 | 5 | 1494 | 1518 |
ZINB1 | -706 | 6 | 1423 | 1452 |
ZIP2 | -740 | 6 | 1493 | 1522 |
ZINB2 | -705 | 7 | 1425 | 1459 |
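Since \(\textrm{AIC} = 2\,\textrm{df} - 2\,\textrm{logLik}\), the table can be spot-checked directly (the log likelihoods are rounded to integers, so the recomputed values agree only to within rounding):

```python
# (logLik, df, AIC) rows from the fit table above
fits = {
    "Pois1": (-839, 4, 1685),
    "Pois2": (-839, 5, 1687),
    "NB1":   (-706, 5, 1421),
    "NB2":   (-705, 6, 1423),
    "ZIP1":  (-742, 5, 1494),
    "ZINB1": (-706, 6, 1423),
    "ZIP2":  (-740, 6, 1493),
    "ZINB2": (-705, 7, 1425),
}
for name, (loglik, df, aic_table) in fits.items():
    aic = 2 * df - 2 * loglik   # AIC = 2k - 2 logLik
    print(name, aic, aic_table)
```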
Bayesian models
Bayesian zero inflated Poisson regression
Model
\[ \begin{aligned} \textrm{counts}_i & \sim ZIP(\pi_i, \mu_i) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i \\ \log \frac{\pi_i}{1 - \pi_i} & = \gamma_0 + \gamma_1 \textrm{condition\_count}_i\\ \end{aligned} \]
Model results; compare frequentist ZIP model
Bayes
component | term | Estimate | Est.Error | Q2.5 | Q97.5 |
---|---|---|---|---|---|
count | Intercept | 0.118 | 0.111 | -0.105 | 0.330 |
zero | zi_Intercept | 0.938 | 0.110 | 0.716 | 1.149 |
count | cohortibis | -0.517 | 0.271 | -1.057 | 0.003 |
count | chf | 0.286 | 0.133 | 0.026 | 0.549 |
count | atrial_fibrillation | 0.259 | 0.133 | 0.000 | 0.519 |
Frequentist
component | term | Estimate | Std. Error |
---|---|---|---|
count | (Intercept) | 0.087 | 0.118 |
count | cohortibis | -0.717 | 0.325 |
count | chf | 0.307 | 0.141 |
count | atrial_fibrillation | 0.289 | 0.140 |
zero | (Intercept) | 0.884 | 0.115 |
Posterior distributions
100 draws from the posterior predictive distribution
Posterior predictive checks
We can check how the distribution of proportions for different values of inpatient_count compares with those in the data.
Math notes
Linear models
Continuous response \(Y\) (eg, heart rate) as a function of predictors, aka covariates, \(X_j\) (eg, cholesterol, smoker)
\[Y \sim N(\mu , \sigma^2)\] where the conditional mean is a linear function of the predictors
\[\mu = E(Y |X ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]
\(\beta_j\) are estimated by minimizing the sum of squared errors, \[\sum_i \left(y_i - \left(\beta_0 + \sum_j \beta_j x_{ij}\right)\right)^2 = ||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2_2\]
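For a single predictor this minimization has a closed form; a minimal sketch on small hypothetical data:

```python
# Least-squares fit for a single predictor (hypothetical data: y = 1 + 2x)
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

# Closed-form estimates minimizing the sum of squared errors
beta1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
beta0 = ybar - beta1 * xbar

print(beta0, beta1)   # recovers intercept 1.0 and slope 2.0
```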
We can interpret the coefficients directly as effect sizes.
Generalized linear models
We model a function of the conditional mean as a linear function
\[g(\mu ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]
\(g\) is called a link function.
Example: Logistic regression
\[Y \sim \textrm{Bernoulli}(\mu )\] Coin toss with probability \(\mu\) of heads, \(1 - \mu\) of tails.
\[g(\mu ) = \rm{logit} \ \mu := \log \left(\frac{\mu}{1 - \mu}\right) = X \boldsymbol{\beta}\]
\[\mu = \rm{logit}^{-1}(X \boldsymbol{\beta})\]
Likelihood
To find \(\beta_j\), we typically maximize the (log) likelihood function.
For logistic regression, with \(n\) independent observations \(y_i =\) 0 or 1, the joint probability is
\[p(\mathbf{y} \mid X) = \prod_{i=1}^n \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}\]
The log likelihood is then
\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n y_i \log(\mu_i) + (1 - y_i)\log (1 - \mu_i)\]
Where are the \(\beta\)’s?
Recall that \(\mu_i = \rm{logit}^{-1}(X_i \boldsymbol{\beta})\)
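Putting the pieces together, the log likelihood can be evaluated directly; a minimal sketch with hypothetical data and coefficients:

```python
import math

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

def logistic_loglik(beta0, beta1, x, y):
    """Log likelihood for logistic regression with one predictor."""
    ll = 0.0
    for xi, yi in zip(x, y):
        mu = inv_logit(beta0 + beta1 * xi)          # mu_i = logit^{-1}(X_i beta)
        ll += yi * math.log(mu) + (1 - yi) * math.log(1 - mu)
    return ll

# Hypothetical data; with beta = 0, every mu_i = 0.5, so logLik = n * log(0.5)
x = [0.1, 0.5, 1.2, 2.0]
y = [0, 0, 1, 1]
print(logistic_loglik(0.0, 0.0, x, y))
```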
Poisson regression
Counts \(Y\) modeled as
\[ p(Y = y; \mu) = \frac{\mu^y e^{-\mu }}{y!} \] with “canonical” link
\[ g(\mu ) = \log(\mu ) = X \boldsymbol{\beta} \]
Likelihood
\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n \left( y_i \log \mu_i - \mu_i - \log (y_i!) \right)\]
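A direct translation of this log likelihood (with \(\log(y_i!)\) computed via the log-gamma function, since \(\log(y!) = \textrm{lgamma}(y+1)\)):

```python
import math

def poisson_loglik(mu, y):
    """Poisson log likelihood for observations y with fitted means mu."""
    return sum(
        yi * math.log(mi) - mi - math.lgamma(yi + 1)   # log(y!) = lgamma(y+1)
        for mi, yi in zip(mu, y)
    )

# Single observation y = 0 with mu = 1: logLik = 0 - 1 - 0 = -1
print(poisson_loglik([1.0], [0]))
```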
Fun fact
For a linear model, the minimum sum of squares estimate is the same as the maximum likelihood estimate.
Models
Zero inflated Poisson regression
Zero inflated negative binomial regression
Bayes versions of the above
Negative binomial distribution
Often used for overdispersed data: \(\mu < \sigma^2\).
\[p(Y = y; \mu, \theta) = \binom{y + \theta - 1}{y} \left(\frac{\mu}{\mu + \theta}\right)^y \left(\frac{\theta}{\mu + \theta}\right)^\theta\]
Dispersion parameter \(\theta > 0\) is global; it does not vary with covariates as \(\mu\) does.
\(\theta\) is estimated by the model
Conditional mean \(E(Y|X) = \mu\). We again model with log link
\[\log \mu = X \boldsymbol{\beta}\]
Variance
\[Var(Y|X) = \mu + \mu^2/\theta\]
\(\theta\) is estimated as part of the likelihood fit, unlike \(\sigma^2\) in linear regression.
Zero inflated Poisson regression
Define \(Z \sim Bernoulli(\pi)\) - a coin flip with probability \(\pi\) of heads.
\[ Y = \begin{cases} 0 & \text{if} \ Z = 1 \\ \sim Pois(\mu) & \text{if} \ Z = 0 \end{cases} \]
\(Z\) models the zero-inflated portion - the “extra” zeros.
The remaining count data, including additional zeros, are modeled with a Poisson distribution with mean \(\mu\).
\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]
(Recall Poisson distribution is \(P(y; \mu) = \mu^y e^{-\mu}/y!\))
\(\log \frac{\pi}{1 -\pi} =\) linear function of some covariates
\(\log(\mu) =\) linear function of some covariates
Zero inflated Poisson regression
We again estimate the coefficients using log likelihood.
\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]
Likelihood function
\[ \begin{align*} logLik(\mathbf{\beta}, \mathbf{\gamma}) = \sum_{i=1}^n \Big(& \mathbf{1}_{\{y_i = 0\}} \cdot \log \left( \pi_i + (1 - \pi_i) e^{-\mu_i} \right) \\ & + \mathbf{1}_{\{y_i > 0\}} \cdot \left( \log(1 - \pi_i) + y_i \log(\mu_i) - \mu_i - \log(y_i!) \right) \Big) \end{align*} \]
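A direct translation of the ZIP pmf and log likelihood; a sketch with constant \(\mu\) and \(\pi\) rather than covariate-dependent ones, for illustration:

```python
import math

def zip_pmf(y, mu, pi):
    """Zero inflated Poisson pmf."""
    pois = math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))  # Poisson(mu) pmf
    if y == 0:
        return pi + (1 - pi) * pois   # structural zeros plus Poisson zeros
    return (1 - pi) * pois

def zip_loglik(mu, pi, ys):
    """ZIP log likelihood, constant mu and pi across observations."""
    return sum(math.log(zip_pmf(y, mu, pi)) for y in ys)

# pi = 0.3 structural zeros on top of Poisson(1) counts
p0 = zip_pmf(0, mu=1.0, pi=0.3)   # equals 0.3 + 0.7 * exp(-1)
print(p0, zip_loglik(1.0, 0.3, [0, 0, 1, 2]))
```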
Bayesian statistical modeling
Starts with Bayes’ Rule
The conditional probability of \(B_k\) given \(A\) is
\[\begin{aligned} P(B_k \ | \ A) & = \frac{P(A \ | \ B_k) P(B_k)}{P(A)} \\ & = \frac{P(A \ | \ B_k) P(B_k)}{\sum_i P(A \ | \ B_i) P(B_i)} \end{aligned}\]
where \(A\) and the \(B_j\) are events, with \(\{B_j\}\) a partition of the sample space. The vertical bar indicates conditional probability.
The standard disease/positive test example
\[P(D | +) = \frac{P(+ | D) \ P(D)}{P(+ | D) P(D) + P(+ | \bar{D}) P(\bar{D})} \]
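Plugging in hypothetical numbers (1% prevalence, 95% sensitivity, 95% specificity; these are illustrative and not from the study data) shows how counterintuitive the result can be:

```python
# Bayes' Rule for the disease / positive-test example (hypothetical values)
p_d = 0.01           # P(D): prevalence
p_pos_d = 0.95       # P(+ | D): sensitivity
p_pos_not_d = 0.05   # P(+ | not D): 1 - specificity

p_d_pos = (p_pos_d * p_d) / (p_pos_d * p_d + p_pos_not_d * (1 - p_d))
print(p_d_pos)   # about 0.16: most positives are false positives
```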
Bayes’ Rule for probability densities for model parameters
\[p(\boldsymbol{\theta} | y, X) = \frac{p(\boldsymbol{\theta}) p(y | \boldsymbol{\theta}, X)}{\int p(y | \boldsymbol{\theta}, X) p(\boldsymbol{\theta}) d\boldsymbol{\theta}}\]
random vector of model parameters \(\boldsymbol{\theta}\) (our \(\beta\)’s and \(\gamma\)’s)
fixed predictor data \(X\) and response data \(y\)
\(p(\boldsymbol{\theta})\) is the prior distribution for the parameters
\(p(y | \boldsymbol{\theta}, X)\) is the likelihood of the data given the parameters \(\theta\) - same as the likelihood function in frequentist statistics.
\(p(\boldsymbol{\theta} \mid y, X)\) is the posterior distribution for the parameters.
The denominator is an integration over the parameter space; typically intractable \(\Rightarrow\) Markov Chain Monte Carlo (MCMC) methods to approximate the posterior distribution.
Posterior predictive distribution
Once you have the posterior for \(\theta\), you can compute the posterior predictive distribution of a new observation \(x_0\), given the data \(X\).
\[\begin{align*} p(x_0|X) & = \int p(x_0, \theta | X)\, d\theta \\ & = \int p(x_0| \theta, X) \, p(\theta |X) \, d\theta \\ & = \int p(x_0|\theta) \, p(\theta | X) \, d\theta \end{align*}\]
Can also get a distribution for any statistic of interest.
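For the coin-toss setting, this integral can be approximated numerically (hypothetical data: 60 heads in 100 tosses, flat prior):

```python
# Posterior predictive probability of heads on the next toss, via
# p(x0 | X) = integral of p(x0 | theta) p(theta | X) d(theta).
# Flat prior, hypothetical data: 60 heads in 100 tosses.
h, n = 60, 100
G = 10_000  # grid points for a simple midpoint (Riemann) approximation

def post(theta):
    # Unnormalized posterior: likelihood * flat prior
    return theta**h * (1 - theta) ** (n - h)

grid = [(i + 0.5) / G for i in range(G)]
norm = sum(post(t) for t in grid)
p_heads = sum(t * post(t) for t in grid) / norm   # E(theta | data)

print(p_heads)   # close to (h+1)/(n+2), the Beta posterior mean
```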
Bayesian inference
Parameters are regarded as random variables, data fixed
We get an entire distribution for every parameter, and hence for any statistic we wish to compute
No p-values, no hypothesis testing!
Example
\(\pi\) is the probability of heads in a single coin toss,
data is the sequence of heads and tails in 100 tosses
By contrast, our maximum likelihood estimate of \(\pi\) would be the peak of the likelihood, which coincides with the peak (mode) of the posterior under a flat prior.
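A small sketch of this equivalence (hypothetical data: 60 heads in 100 tosses):

```python
# With a flat prior the posterior is proportional to the likelihood,
# so the posterior mode equals the maximum likelihood estimate h/n.
h, n = 60, 100   # hypothetical: 60 heads in 100 tosses

def likelihood(pi):
    return pi**h * (1 - pi) ** (n - h)

grid = [i / 1000 for i in range(1, 1000)]
pi_map = max(grid, key=likelihood)   # grid-search posterior mode (flat prior)
pi_mle = h / n                       # maximum likelihood estimate

print(pi_map, pi_mle)   # both 0.6
```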