Zer0 inflated Models for Inpatient Admissions

We consider Unicare - Study vs all other members, using cohort, as well as select other covariates. Even in the event ignorability, balance, and other assumptions, they often result in reduced bias and efficiency (lower variance) of estimates.

Regression Models

Negative binomial regression
Zero inflated Poisson, Zero negative binomial
Bayesian versions of the above

Modeling Hospital Admissions counts

We consider admissions occurring within 270 day window of observation.

cohort	count
control	881
ibis	82

Admissions counts

Overall

mean	variance
0.372	1.05

By cohort

cohort	mean	variance
control	0.390	1.120
ibis	0.171	0.217

With overall variance not equal to the mean, a Poisson model is not appropriate. We consider a negative binomial model, which allows for overdispersion. But note that the dispersion parameter is a global parameter, and not adjusted for cohorts separately. That said, as it is a function of the mean, the variance given by the model will be adjusted for cohort.

Covariates, multicolinearity

Perhaps surprisingly, age and chronic conditions covariates do not exhibit high multicolinearity. This means not only that there are no high pairwise correlations, but that, for example, any one of them is not explainable by the others- regressing any one on the others would results in a low R squared value.

As an illustration, with condition count included, the covariates are completely colinear.

Low multicolinearity is a desirable feature for generalized linear models as it results in more stable coefficient estimates.

Condition number

We find the condition number of the model matrix with age and other covariates. This the square root of the ratio of max and min eigen values of the correlation matrix, and is thus a ratio of the standard deviations along components of the data with max variation and min variation,

\(\sqrt{\frac{\lambda_{max}}{\lambda_{min}}}\) = 2.303

As a rule of thumb, condition number > 30 is indication of high multicolinearity.

Correlations

Selecting covariates with low multicolinearity is also more straightforward, as simply selecting those most highly correlated with the outcome is more justifiable than in the case of high multicolinearity.

condition	correlation
atrial_fibrillation	0.2136
chf	0.2113
age	0.1311
parkinsons	0.0792
chd	0.0733
copd	0.0712
lung_cancer	0.0701
anemia	0.0641
dementia	0.0634
endometrial_cancer	0.0489
urologic_cancer	0.0484
hypothyroidism	0.0449
alzheimers	0.0446
pneumonia	0.0405
kidney_disease	0.0356
hip_fracture	0.0345
heart_attack	0.0327
cataract	0.0286
anxiety	0.0257
mild_cognitive_impairment	0.0190
asthma	0.0180
hypertension	0.0092
glaucoma	0.0069
depression	0.0011
stroke	-0.0036
osteoporosis	-0.0066
hyperlipidemia	-0.0140
hyperplasia	-0.0205
diabetes	-0.0207
breast_cancer	-0.0215
arthritis	-0.0233
prostate_cancer	-0.0425
colorectal_cancer	-0.0426

We select age, atrial_fibrillation, and chf, variously in the models below.

Admissions counts; negative binomial

We first compare intercept only and cohort models

\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 \end{aligned} \]

term	estimate	p.value	exp_estimate
(Intercept)	-0.99	0	0.372

\(\theta\): 0.229

\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i \end{aligned} \]

term	estimate	p.value	exp_estimate
(Intercept)	-0.940	0.000	0.390
cohortibis	-0.827	0.022	0.437

\(\theta\): 0.235

Model comparison

Negative binomial with vs without cohort as predictor

Likelihood ratio test, ANOVA, p-value on coefficient all equivalent

term	estimate	std.error	statistic	p.value
(Intercept)	-0.940	0.088	-10.69	0.000
cohortibis	-0.827	0.362	-2.29	0.022

Bootstraps/predictive check, NB model for admissions counts

Model

\[\textrm{counts}_i \sim NB(\mu_i, \theta)\] \[\log \mu_i = \beta_0 + \beta_1 \textrm{cohort}\]

Observed vs expected admissions counts; negative binomial

for the control cohort

inpatient_count	frequency	prevalence	expected	prob
0	701	0.797	699.208	0.795
1	99	0.112	102.524	0.117
2	47	0.053	39.528	0.045
3	16	0.018	18.388	0.021
4	9	0.010	9.286	0.011
5	2	0.002	4.912	0.006
6	3	0.003	2.676	0.003
7	0	0.000	1.488	0.002
8	0	0.000	0.841	0.001
9	3	0.003	0.480	0.001
10	0	0.000	0.277	0.000

For the ibis cohort

inpatient_count	frequency	prevalence	expected	prob
0	71	0.866	72.126	0.880
1	8	0.098	7.130	0.087
2	3	0.037	1.853	0.023
3	0	0.000	0.581	0.007
4	0	0.000	0.198	0.002
5	0	0.000	0.071	0.001
6	0	0.000	0.026	0.000
7	0	0.000	0.010	0.000
8	0	0.000	0.004	0.000
9	0	0.000	0.001	0.000
10	0	0.000	0.001	0.000

Zero inflated negative binomial

This example for illustration only

\[ \begin{aligned} \textrm{counts}_i &\sim \text{ZINB}(\pi_i, \mu_i, \theta) \\ \log\left( \frac{\pi_i}{1 - \pi_i} \right) &= \gamma_0 + \gamma_1 \, \textrm{cohort}_i \\ \log(\mu_i) &= \beta_0 + \beta_1 \, \textrm{cohort}_i \end{aligned} \]

term	estimate
count_(Intercept)	-0.685
count_cohortibis	-1.082
zero_(Intercept)	-1.234
zero_cohortibis	-6.463

\(\theta\): 0.331

This example for illustration only The coeffs are not significant, but see more models below.

Zero inflation probabilities

control: \(logit^{-1}\) -1.234 = 0.225
ibis: \(logit^{-1}\) (-1.234 + -6.463) = 4.539^{-4}

Count mean for ibis also reduced by factor of exp(-1.082) = 0.339.

Observed vs expected admissions counts; zero inflated negative binomial

for the control cohort

inpatient_count	frequency	prevalence	expected	prob
0	701	0.797	700.203	0.796
1	99	0.112	100.212	0.114
2	47	0.053	40.264	0.046
3	16	0.018	18.889	0.021
4	9	0.010	9.498	0.011
5	2	0.002	4.968	0.006
6	3	0.003	2.665	0.003
7	0	0.000	1.455	0.002
8	0	0.000	0.805	0.001
9	3	0.003	0.450	0.001
10	0	0.000	0.254	0.000

For the ibis cohort

inpatient_count	frequency	prevalence	expected	prob
0	71	0.866	71.456	0.871
1	8	0.098	8.044	0.098
2	3	0.037	1.823	0.022
3	0	0.000	0.482	0.006
4	0	0.000	0.137	0.002
5	0	0.000	0.040	0.000
6	0	0.000	0.012	0.000
7	0	0.000	0.004	0.000
8	0	0.000	0.001	0.000
9	0	0.000	0.000	0.000
10	0	0.000	0.000	0.000

More models; summary and comparisons

Below we fit more models and summarize eight models. All the models use cohort, chf, and atrial_fibrillation as covariates.

zero inflated vs not
poisson vs negative binomial distribution
with and without age as a covariate.

Zero inflated age

Two of the models include age as a covariate for the zero inflation part of the model. We highlight those here. The full summary of all the models follows.

Zero inflated Poisson with age zero inflation

term	component	estimate	p.value
(Intercept)	count	1.7435	0.0000
cohortibis	count	-0.7188	0.0265
age	count	-0.0246	0.0001
chf	count	0.4645	0.0013
atrial_fibrillation	count	0.4299	0.0030
(Intercept)	zero	4.2559	0.0000
age	zero	-0.0483	0.0000

Zero inflated negative binomial with age zero inflation

term	component	estimate	p.value
(Intercept)	count	1.2582	0.0722
cohortibis	count	-0.6441	0.0640
age	count	-0.0287	0.0017
chf	count	0.7563	0.0002
atrial_fibrillation	count	0.7098	0.0004
Log(theta)	count	-0.3909	0.2399
(Intercept)	zero	4.2462	0.0000
age	zero	-0.0656	0.0000

Coefficients for all models

Poisson and negative binomial; no zero inflation

model	(Intercept)	cohortibis	chf	atrial_fibrillation	age
Pois1	-1.31	-0.754	0.671	0.697	NA
Pois2	-1.52	-0.765	0.662	0.674	0.0030
NB1	-1.32	-0.644	0.676	0.689	NA
NB	-1.04	-0.616	0.703	0.727	-0.0042

Poisson and negative binomial; no zero inflation - long format with p-values

term	estimate	pval	model
(Intercept)	-1.3142	0.0000	Pois1
cohortibis	-0.7539	0.0059	Pois1
chf	0.6713	0.0000	Pois1
atrial_fibrillation	0.6968	0.0000	Pois1
(Intercept)	-1.5165	0.0000	Pois2
cohortibis	-0.7655	0.0053	Pois2
age	0.0030	0.5509	Pois2
chf	0.6621	0.0000	Pois2
atrial_fibrillation	0.6739	0.0000	Pois2
(Intercept)	-1.3199	0.0000	NB1
cohortibis	-0.6439	0.0641	NB1
chf	0.6763	0.0009	NB1
atrial_fibrillation	0.6891	0.0006	NB1
(Intercept)	-1.0430	0.0496	NB
cohortibis	-0.6159	0.0782	NB
age	-0.0042	0.5893	NB
chf	0.7028	0.0006	NB
atrial_fibrillation	0.7274	0.0005	NB

Zero inflated Poisson and negative binomial.

term	component	estimate	p.value	model
(Intercept)	count	0.0868	0.4623	ZIP1
cohortibis	count	-0.7174	0.0274	ZIP1
chf	count	0.3070	0.0297	ZIP1
atrial_fibrillation	count	0.2890	0.0388	ZIP1
(Intercept)	zero	0.8746	0.0000	ZIP1
(Intercept)	count	-1.3200	0.0000	ZINB1
cohortibis	count	-0.6429	0.0708	ZINB1
chf	count	0.6763	0.0009	ZINB1
atrial_fibrillation	count	0.6894	0.0005	ZINB1
Log(theta)	count	-1.2508	0.0000	ZINB1
(Intercept)	zero	-9.8760	0.8917	ZINB1
(Intercept)	count	0.8308	0.0553	ZIP2
cohortibis	count	-0.6484	0.0495	ZIP2
age	count	-0.0109	0.0759	ZIP2
chf	count	0.3806	0.0095	ZIP2
atrial_fibrillation	count	0.3694	0.0118	ZIP2
(Intercept)	zero	0.8842	0.0000	ZIP2
(Intercept)	count	-1.0428	0.0467	ZINB2
cohortibis	count	-0.6159	0.0872	ZINB2
age	count	-0.0042	0.5897	ZINB2
chf	count	0.7028	0.0008	ZINB2
atrial_fibrillation	count	0.7274	0.0006	ZINB2
Log(theta)	count	-1.2521	0.0000	ZINB2
(Intercept)	zero	-9.0452	0.8551	ZINB2
(Intercept)	count	1.7435	0.0000	ZIP3
cohortibis	count	-0.7188	0.0265	ZIP3
age	count	-0.0246	0.0001	ZIP3
chf	count	0.4645	0.0013	ZIP3
atrial_fibrillation	count	0.4299	0.0030	ZIP3
(Intercept)	zero	4.2559	0.0000	ZIP3
age	zero	-0.0483	0.0000	ZIP3
(Intercept)	count	1.2582	0.0722	ZINB3
cohortibis	count	-0.6441	0.0640	ZINB3
age	count	-0.0287	0.0017	ZINB3
chf	count	0.7563	0.0002	ZINB3
atrial_fibrillation	count	0.7098	0.0004	ZINB3
Log(theta)	count	-0.3909	0.2399	ZINB3
(Intercept)	zero	4.2462	0.0000	ZINB3
age	zero	-0.0656	0.0000	ZINB3

Fit

model	logLik	df	AIC	BIC
Pois1	-839	4	1685	1705
Pois2	-839	5	1687	1711
NB1	-706	5	1421	1445
NB2	-705	6	1423	1452
ZIP1	-742	5	1494	1518
ZINB1	-706	6	1423	1452
ZIP2	-740	6	1493	1522
ZINB2	-705	7	1425	1459
ZIP3	-726	7	1467	1501
ZINB3	-698	8	1412	1451

Bayesian zero inflated Poisson regression

Bayesian models often offer some advantages. We get an entire distribution for every parameter modeled-not just point estimates and confidence intervals. We can also get distributions for any statistic.

We consider just one model here.

Model

\[ \begin{aligned} \textrm{counts}_i & \sim ZIP(\pi_i, \mu_i) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i + \beta_2\textrm{chf}_i + \beta_3\textrm{atrial\fibrillation}_i\\ \log \frac{\pi_i}{1 - \pi_i} & = \gamma_0 \\ \end{aligned} \]

Model results; compare frequentist ZIP model

Bayes

component	term	Estimate	Est.Error	Q2.5	Q97.5
count	Intercept	0.118	0.111	-0.105	0.330
zero	zi_Intercept	0.938	0.110	0.716	1.149
count	cohortibis	-0.517	0.271	-1.057	0.003
count	chf	0.286	0.133	0.026	0.549
count	atrial_fibrillation	0.259	0.133	0.000	0.519

Frequentist

component	term	Estimate	Std. Error
count	(Intercept)	0.087	0.118
count	cohortibis	-0.717	0.325
count	chf	0.307	0.141
count	atrial_fibrillation	0.289	0.140
zero	(Intercept)	0.884	0.115

Posterior distributions

100 draws from the posterior predictive distribution

Post predictive checks

We can check how the distribution of proportions for different values of inpatient_count compares with those in the data.

Math notes

Linear models

Continuous response \(Y\) (eg, heart rate) as a function of predictors; aka covariates, \(X_j\) (eg, cholesterol, smoker)

\[Y \sim N(\mu , \sigma^2)\] where the conditional mean is a linear function of the predictors

\[\mu = E(Y |X ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]

\(\beta_j\) are estimated by minimizing the sum of squares errors, \[\sum_i (y_i - (\beta_0 - \sum_j \beta_j x_{ij}))^2 = ||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2_2\]
We can interpret the coefficients directly as effect sizes.

Generalized linear models

We model a function of the conditional mean as a linear function

\[g(\mu ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]

\(g\) is called a link function.

Example: Logistic regression

\[Y \sim \textrm{Bernoulli}(\mu )\] Coin toss with probability \(\mu\) of heads, \(1 - \mu\) of tails.

\[g(\mu ) = \rm{logit} \ \mu := \log \left(\frac{\mu}{1 - \mu}\right) = X \boldsymbol{\beta}\]

\[\mu = \rm{logit}^{-1}(X \boldsymbol{\beta})\]

Likelihood

To find \(\beta_j\), we typically maximize the (log) likelihood function.

For logistic regression, with \(n\) independent observations \(y_i =\) 0 or 1, the joint probability is

\[p({\bf{y}} | X_i) = \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}\]

The log likelihood is then

\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n y_i \log(\mu_i) + (1 - y_i)\log (1 - \mu_i)\]

Where are the \(\beta\)’s?

Recall that \(\mu_i = \rm{logit}^{-1}(X_i \boldsymbol{\beta})\))

Poisson regression

Counts \(Y\) modeled as

\[ p(Y = y; \mu) = \frac{\mu^y e^{-\mu }}{y!} \] with “canonical” link

\[ g(\mu ) = \log(\mu ) = X \boldsymbol{\beta} \]

Likelihood

\[logLik(\boldsymbol{\boldsymbol{\beta}}) = \sum_{i=1}^n \left( y_i \log \mu_i - \mu_i - \log (y_i!) \right)\]

Fun fact

For a linear model, the minimum sum of squares estimate is the same as the maximum likelihood estimate.

Models

Zero inflated Poisson regression
Zero inflated negative binomial regression
Bayes versions of the above

Negative binomial distribution

Often used on overdispersed data: \(\mu < \sigma^2\).

\[p(Y = y; \mu, \theta) = \binom{y + \theta - 1}{\theta} \left(\frac{\mu}{\mu + \theta}\right)^y \left(\frac{\theta}{\mu + \theta}\right)^\theta\]

Dispersion parameter \(\theta > 0\); fixed- does not vary with covariates like \(\mu\).
\(\theta\) is estimated by the model

Conditional mean \(E(Y|X) = \mu\). We again model with log link

\[\log \mu = X \boldsymbol{\beta}\]

Variance

\[Var(Y|X) = \mu + \mu^2/\theta\]

\(\theta\) part of the model fit - unlike \(\sigma^2\) in linear regression.

Zero inflated Poisson regression

Define \(Z \sim Bernoulli(\pi)\) - a coin flip with probability \(\pi\) of heads.

\[ Y = \begin{cases} 0 & \text{if} \ Z = 1 \\ \sim Pois(\mu) & \text{if} \ Z = 0 \end{cases} \]

\(Z\) models the zero-inflated portion - the “extra” zeros.
The additional count data, including zeros, is modeled with the Poisson distribution with mean \(\mu\).

\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]

(Recall Poisson distribution is \(P(y; \mu) = \mu^y e^{-\mu}/y!\))

\(\log \frac{\pi}{1 -\pi} =\) linear function of some covariates
\(\log(\mu) =\) linear function of some covariates

Zero inflated Poisson regression

We again estimate the coefficients using log likelihood.

\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]

Likelihood function

\[ \begin{align*} logLik(\mathbf{\beta}, \mathbf{\gamma}) = \sum_{i=1}^n \Big(& \mathbf{1}_{\{y_i = 0\}} \cdot \log \left( \pi_i + (1 - \pi_i) e^{-\mu_i} \right) \\ & + \mathbf{1}_{\{y_i > 0\}} \cdot \left( \log(1 - \pi_i) + y_i \log(\mu_i) - \mu_i - \log(y_i!) \right) \Big) \end{align*} \]

Bayesian statistical modeling

Starts with Bayes’ Rule

The conditional probability of \(B_k\) given \(A\) is

\[\begin{aligned} P(B_k \ | \ A) & = \frac{P(A \ | \ B_k) P(B_k)}{P(A)} \\ & = \frac{P(A \ | \ B_k) P(B_k)}{\sum_i P(A \ | \ B_i) P(B_i)} \end{aligned}\]

where \(A\) and \(B_j\) are events with \(B_j\) a partition of the sample space. The vertical bar indicates conditional probability

The standard disease/positive test example

\[P(D | +) = \frac{P(+ | D) \ P(D)}{P(+ | D) P(D) + P(+ | \bar{D}) P(\bar{D})} \]

Bayes’ Rule for probability densities for model parameters

\[p(\boldsymbol{\theta} | y, X) = \frac{p(\boldsymbol{\theta}) p(y | \boldsymbol{\theta}, X)}{\int p(y | \boldsymbol{\theta}, X) p(\boldsymbol{\theta}) d\boldsymbol{\theta}}\]

random vector of model parameters \(\boldsymbol{\theta}\) (our \(\beta\)’s and \(\gamma\)’s)
fixed predictor data \(X\) and response data \(y\)
\(p(\boldsymbol{\theta})\) is the prior distribution for the parameters
\(p(y | \boldsymbol{\theta}, X)\) is the likelihood of the data given the parameters \(\theta\) - same as the likelihood function in frequentist statistics.
\(p(\theta | x)\) is the posterior distribution for the parameters \(\theta\).

The denominator is an integration over the parameter space; typically intractable \(\Rightarrow\) Markov Chain Monte Carlo (MCMC) methods to approximate the posterior distribution.

Posterior predictive distribution

Once you have the \(\theta\) posterior, you can compute the posterior predictive distributions of a new observation \(x_0\), given the data \(X\).

\[\begin{align*} p(x_0|X) & = \int p(x_0, \theta | X)\, d\theta \\ \\ & = \int p(x_0| \theta, X) \, p(\theta |X) , d\theta \\ \\ & = \int p(x_0|\theta) \, p(\theta | X) \, d\theta\\ \end{align*}\]

Can also get a distribution for any statistic of interest.

Bayesian inference

Parameters are regarded as random variables, data fixed
We get an entire distribution for every parameter, and hence for any statistic we wish to compute
No p-values, no hypothesis testing!

Example

\(\pi\) is the probability of heads in a single coin toss,
data is the sequence of heads and tails in 100 tosses

By contrast, our maximum likelihood estimate of \(\pi\) would be the peak of the likelihood, which is the same as the posterior for flat prior.