cohort | count |
---|---|
control | 881 |
ibis | 82 |
Zer0 inflated Models for Inpatient Admissions
We consider Unicare - Study
vs all other members, using cohort
, as well as select other covariates. Even in the event ignorability, balance, and other assumptions, they often result in reduced bias and efficiency (lower variance) of estimates.
Regression Models
Negative binomial regression
Zero inflated Poisson, Zero negative binomial
Bayesian versions of the above
Modeling Hospital Admissions counts
We consider admissions occurring within 270 day window of observation.
Admissions counts
Overall
mean | variance |
---|---|
0.372 | 1.05 |
By cohort
cohort | mean | variance |
---|---|---|
control | 0.390 | 1.120 |
ibis | 0.171 | 0.217 |
With overall variance not equal to the mean, a Poisson model is not appropriate. We consider a negative binomial model, which allows for overdispersion. But note that the dispersion parameter is a global parameter, and not adjusted for cohorts separately. That said, as it is a function of the mean, the variance given by the model will be adjusted for cohort.
Covariates, multicolinearity
Perhaps surprisingly, age and chronic conditions covariates do not exhibit high multicolinearity. This means not only that there are no high pairwise correlations, but that, for example, any one of them is not explainable by the others- regressing any one on the others would results in a low R squared value.
As an illustration, with condition count
included, the covariates are completely colinear.
Low multicolinearity is a desirable feature for generalized linear models as it results in more stable coefficient estimates.
Condition number
We find the condition number of the model matrix with age and other covariates. This the square root of the ratio of max and min eigen values of the correlation matrix, and is thus a ratio of the standard deviations along components of the data with max variation and min variation,
\(\sqrt{\frac{\lambda_{max}}{\lambda_{min}}}\) = 2.303
As a rule of thumb, condition number > 30 is indication of high multicolinearity.
Correlations
Selecting covariates with low multicolinearity is also more straightforward, as simply selecting those most highly correlated with the outcome is more justifiable than in the case of high multicolinearity.
condition | correlation |
---|---|
atrial_fibrillation | 0.2136 |
chf | 0.2113 |
age | 0.1311 |
parkinsons | 0.0792 |
chd | 0.0733 |
copd | 0.0712 |
lung_cancer | 0.0701 |
anemia | 0.0641 |
dementia | 0.0634 |
endometrial_cancer | 0.0489 |
urologic_cancer | 0.0484 |
hypothyroidism | 0.0449 |
alzheimers | 0.0446 |
pneumonia | 0.0405 |
kidney_disease | 0.0356 |
hip_fracture | 0.0345 |
heart_attack | 0.0327 |
cataract | 0.0286 |
anxiety | 0.0257 |
mild_cognitive_impairment | 0.0190 |
asthma | 0.0180 |
hypertension | 0.0092 |
glaucoma | 0.0069 |
depression | 0.0011 |
stroke | -0.0036 |
osteoporosis | -0.0066 |
hyperlipidemia | -0.0140 |
hyperplasia | -0.0205 |
diabetes | -0.0207 |
breast_cancer | -0.0215 |
arthritis | -0.0233 |
prostate_cancer | -0.0425 |
colorectal_cancer | -0.0426 |
We select age
, atrial_fibrillation
, and chf
, variously in the models below.
Admissions counts; negative binomial
We first compare intercept only and cohort models
\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 \end{aligned} \]
term | estimate | p.value | exp_estimate |
---|---|---|---|
(Intercept) | -0.99 | 0 | 0.372 |
\(\theta\): 0.229
\[ \begin{aligned} {\rm{count}}_i & \sim NB(\mu_i, \theta) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i \end{aligned} \]
term | estimate | p.value | exp_estimate |
---|---|---|---|
(Intercept) | -0.940 | 0.000 | 0.390 |
cohortibis | -0.827 | 0.022 | 0.437 |
\(\theta\): 0.235
Model comparison
Negative binomial with vs without cohort
as predictor
Likelihood ratio test, ANOVA, p-value on coefficient all equivalent
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -0.940 | 0.088 | -10.69 | 0.000 |
cohortibis | -0.827 | 0.362 | -2.29 | 0.022 |
Bootstraps/predictive check, NB model for admissions counts
Model
\[\textrm{counts}_i \sim NB(\mu_i, \theta)\] \[\log \mu_i = \beta_0 + \beta_1 \textrm{cohort}\]
Observed vs expected admissions counts; negative binomial
for the control cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 701 | 0.797 | 699.208 | 0.795 |
1 | 99 | 0.112 | 102.524 | 0.117 |
2 | 47 | 0.053 | 39.528 | 0.045 |
3 | 16 | 0.018 | 18.388 | 0.021 |
4 | 9 | 0.010 | 9.286 | 0.011 |
5 | 2 | 0.002 | 4.912 | 0.006 |
6 | 3 | 0.003 | 2.676 | 0.003 |
7 | 0 | 0.000 | 1.488 | 0.002 |
8 | 0 | 0.000 | 0.841 | 0.001 |
9 | 3 | 0.003 | 0.480 | 0.001 |
10 | 0 | 0.000 | 0.277 | 0.000 |
For the ibis cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 71 | 0.866 | 72.126 | 0.880 |
1 | 8 | 0.098 | 7.130 | 0.087 |
2 | 3 | 0.037 | 1.853 | 0.023 |
3 | 0 | 0.000 | 0.581 | 0.007 |
4 | 0 | 0.000 | 0.198 | 0.002 |
5 | 0 | 0.000 | 0.071 | 0.001 |
6 | 0 | 0.000 | 0.026 | 0.000 |
7 | 0 | 0.000 | 0.010 | 0.000 |
8 | 0 | 0.000 | 0.004 | 0.000 |
9 | 0 | 0.000 | 0.001 | 0.000 |
10 | 0 | 0.000 | 0.001 | 0.000 |
Zero inflated negative binomial
This example for illustration only
\[ \begin{aligned} \textrm{counts}_i &\sim \text{ZINB}(\pi_i, \mu_i, \theta) \\ \log\left( \frac{\pi_i}{1 - \pi_i} \right) &= \gamma_0 + \gamma_1 \, \textrm{cohort}_i \\ \log(\mu_i) &= \beta_0 + \beta_1 \, \textrm{cohort}_i \end{aligned} \]
term | estimate |
---|---|
count_(Intercept) | -0.685 |
count_cohortibis | -1.082 |
zero_(Intercept) | -1.234 |
zero_cohortibis | -6.463 |
\(\theta\): 0.331
This example for illustration only The coeffs are not significant, but see more models below.
Zero inflation probabilities
control: \(logit^{-1}\) -1.234 = 0.225
ibis: \(logit^{-1}\) (-1.234 + -6.463) = 4.539^{-4}
Count mean for ibis also reduced by factor of exp(-1.082) = 0.339.
Observed vs expected admissions counts; zero inflated negative binomial
for the control cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 701 | 0.797 | 700.203 | 0.796 |
1 | 99 | 0.112 | 100.212 | 0.114 |
2 | 47 | 0.053 | 40.264 | 0.046 |
3 | 16 | 0.018 | 18.889 | 0.021 |
4 | 9 | 0.010 | 9.498 | 0.011 |
5 | 2 | 0.002 | 4.968 | 0.006 |
6 | 3 | 0.003 | 2.665 | 0.003 |
7 | 0 | 0.000 | 1.455 | 0.002 |
8 | 0 | 0.000 | 0.805 | 0.001 |
9 | 3 | 0.003 | 0.450 | 0.001 |
10 | 0 | 0.000 | 0.254 | 0.000 |
For the ibis cohort
inpatient_count | frequency | prevalence | expected | prob |
---|---|---|---|---|
0 | 71 | 0.866 | 71.456 | 0.871 |
1 | 8 | 0.098 | 8.044 | 0.098 |
2 | 3 | 0.037 | 1.823 | 0.022 |
3 | 0 | 0.000 | 0.482 | 0.006 |
4 | 0 | 0.000 | 0.137 | 0.002 |
5 | 0 | 0.000 | 0.040 | 0.000 |
6 | 0 | 0.000 | 0.012 | 0.000 |
7 | 0 | 0.000 | 0.004 | 0.000 |
8 | 0 | 0.000 | 0.001 | 0.000 |
9 | 0 | 0.000 | 0.000 | 0.000 |
10 | 0 | 0.000 | 0.000 | 0.000 |
More models; summary and comparisons
Below we fit more models and summarize eight models. All the models use cohort
, chf
, and atrial_fibrillation
as covariates.
zero inflated vs not
poisson vs negative binomial distribution
with and without
age
as a covariate.
Zero inflated age
Two of the models include age
as a covariate for the zero inflation part of the model. We highlight those here. The full summary of all the models follows.
Zero inflated Poisson with age zero inflation
term | component | estimate | p.value |
---|---|---|---|
(Intercept) | count | 1.7435 | 0.0000 |
cohortibis | count | -0.7188 | 0.0265 |
age | count | -0.0246 | 0.0001 |
chf | count | 0.4645 | 0.0013 |
atrial_fibrillation | count | 0.4299 | 0.0030 |
(Intercept) | zero | 4.2559 | 0.0000 |
age | zero | -0.0483 | 0.0000 |
Zero inflated negative binomial with age zero inflation
term | component | estimate | p.value |
---|---|---|---|
(Intercept) | count | 1.2582 | 0.0722 |
cohortibis | count | -0.6441 | 0.0640 |
age | count | -0.0287 | 0.0017 |
chf | count | 0.7563 | 0.0002 |
atrial_fibrillation | count | 0.7098 | 0.0004 |
Log(theta) | count | -0.3909 | 0.2399 |
(Intercept) | zero | 4.2462 | 0.0000 |
age | zero | -0.0656 | 0.0000 |
Coefficients for all models
Poisson and negative binomial; no zero inflation
model | (Intercept) | cohortibis | chf | atrial_fibrillation | age |
---|---|---|---|---|---|
Pois1 | -1.31 | -0.754 | 0.671 | 0.697 | NA |
Pois2 | -1.52 | -0.765 | 0.662 | 0.674 | 0.0030 |
NB1 | -1.32 | -0.644 | 0.676 | 0.689 | NA |
NB | -1.04 | -0.616 | 0.703 | 0.727 | -0.0042 |
Poisson and negative binomial; no zero inflation - long format with p-values
term | estimate | pval | model |
---|---|---|---|
(Intercept) | -1.3142 | 0.0000 | Pois1 |
cohortibis | -0.7539 | 0.0059 | Pois1 |
chf | 0.6713 | 0.0000 | Pois1 |
atrial_fibrillation | 0.6968 | 0.0000 | Pois1 |
(Intercept) | -1.5165 | 0.0000 | Pois2 |
cohortibis | -0.7655 | 0.0053 | Pois2 |
age | 0.0030 | 0.5509 | Pois2 |
chf | 0.6621 | 0.0000 | Pois2 |
atrial_fibrillation | 0.6739 | 0.0000 | Pois2 |
(Intercept) | -1.3199 | 0.0000 | NB1 |
cohortibis | -0.6439 | 0.0641 | NB1 |
chf | 0.6763 | 0.0009 | NB1 |
atrial_fibrillation | 0.6891 | 0.0006 | NB1 |
(Intercept) | -1.0430 | 0.0496 | NB |
cohortibis | -0.6159 | 0.0782 | NB |
age | -0.0042 | 0.5893 | NB |
chf | 0.7028 | 0.0006 | NB |
atrial_fibrillation | 0.7274 | 0.0005 | NB |
Zero inflated Poisson and negative binomial.
term | component | estimate | p.value | model |
---|---|---|---|---|
(Intercept) | count | 0.0868 | 0.4623 | ZIP1 |
cohortibis | count | -0.7174 | 0.0274 | ZIP1 |
chf | count | 0.3070 | 0.0297 | ZIP1 |
atrial_fibrillation | count | 0.2890 | 0.0388 | ZIP1 |
(Intercept) | zero | 0.8746 | 0.0000 | ZIP1 |
(Intercept) | count | -1.3200 | 0.0000 | ZINB1 |
cohortibis | count | -0.6429 | 0.0708 | ZINB1 |
chf | count | 0.6763 | 0.0009 | ZINB1 |
atrial_fibrillation | count | 0.6894 | 0.0005 | ZINB1 |
Log(theta) | count | -1.2508 | 0.0000 | ZINB1 |
(Intercept) | zero | -9.8760 | 0.8917 | ZINB1 |
(Intercept) | count | 0.8308 | 0.0553 | ZIP2 |
cohortibis | count | -0.6484 | 0.0495 | ZIP2 |
age | count | -0.0109 | 0.0759 | ZIP2 |
chf | count | 0.3806 | 0.0095 | ZIP2 |
atrial_fibrillation | count | 0.3694 | 0.0118 | ZIP2 |
(Intercept) | zero | 0.8842 | 0.0000 | ZIP2 |
(Intercept) | count | -1.0428 | 0.0467 | ZINB2 |
cohortibis | count | -0.6159 | 0.0872 | ZINB2 |
age | count | -0.0042 | 0.5897 | ZINB2 |
chf | count | 0.7028 | 0.0008 | ZINB2 |
atrial_fibrillation | count | 0.7274 | 0.0006 | ZINB2 |
Log(theta) | count | -1.2521 | 0.0000 | ZINB2 |
(Intercept) | zero | -9.0452 | 0.8551 | ZINB2 |
(Intercept) | count | 1.7435 | 0.0000 | ZIP3 |
cohortibis | count | -0.7188 | 0.0265 | ZIP3 |
age | count | -0.0246 | 0.0001 | ZIP3 |
chf | count | 0.4645 | 0.0013 | ZIP3 |
atrial_fibrillation | count | 0.4299 | 0.0030 | ZIP3 |
(Intercept) | zero | 4.2559 | 0.0000 | ZIP3 |
age | zero | -0.0483 | 0.0000 | ZIP3 |
(Intercept) | count | 1.2582 | 0.0722 | ZINB3 |
cohortibis | count | -0.6441 | 0.0640 | ZINB3 |
age | count | -0.0287 | 0.0017 | ZINB3 |
chf | count | 0.7563 | 0.0002 | ZINB3 |
atrial_fibrillation | count | 0.7098 | 0.0004 | ZINB3 |
Log(theta) | count | -0.3909 | 0.2399 | ZINB3 |
(Intercept) | zero | 4.2462 | 0.0000 | ZINB3 |
age | zero | -0.0656 | 0.0000 | ZINB3 |
Fit
model | logLik | df | AIC | BIC |
---|---|---|---|---|
Pois1 | -839 | 4 | 1685 | 1705 |
Pois2 | -839 | 5 | 1687 | 1711 |
NB1 | -706 | 5 | 1421 | 1445 |
NB2 | -705 | 6 | 1423 | 1452 |
ZIP1 | -742 | 5 | 1494 | 1518 |
ZINB1 | -706 | 6 | 1423 | 1452 |
ZIP2 | -740 | 6 | 1493 | 1522 |
ZINB2 | -705 | 7 | 1425 | 1459 |
ZIP3 | -726 | 7 | 1467 | 1501 |
ZINB3 | -698 | 8 | 1412 | 1451 |
Bayesian zero inflated Poisson regression
Bayesian models often offer some advantages. We get an entire distribution for every parameter modeled-not just point estimates and confidence intervals. We can also get distributions for any statistic.
We consider just one model here.
Model
\[ \begin{aligned} \textrm{counts}_i & \sim ZIP(\pi_i, \mu_i) \\ \log \mu_i & = \beta_0 + \beta_1 \textrm{cohort}_i + \beta_2\textrm{chf}_i + \beta_3\textrm{atrial\fibrillation}_i\\ \log \frac{\pi_i}{1 - \pi_i} & = \gamma_0 \\ \end{aligned} \]
Model results; compare frequentist ZIP model
Bayes
component | term | Estimate | Est.Error | Q2.5 | Q97.5 |
---|---|---|---|---|---|
count | Intercept | 0.118 | 0.111 | -0.105 | 0.330 |
zero | zi_Intercept | 0.938 | 0.110 | 0.716 | 1.149 |
count | cohortibis | -0.517 | 0.271 | -1.057 | 0.003 |
count | chf | 0.286 | 0.133 | 0.026 | 0.549 |
count | atrial_fibrillation | 0.259 | 0.133 | 0.000 | 0.519 |
Frequentist
component | term | Estimate | Std. Error |
---|---|---|---|
count | (Intercept) | 0.087 | 0.118 |
count | cohortibis | -0.717 | 0.325 |
count | chf | 0.307 | 0.141 |
count | atrial_fibrillation | 0.289 | 0.140 |
zero | (Intercept) | 0.884 | 0.115 |
Posterior distributions
100 draws from the posterior predictive distribution
Post predictive checks
We can check how the distribution of proportions for different values of inpatient_count
compares with those in the data.
Math notes
Linear models
Continuous response \(Y\) (eg, heart rate) as a function of predictors; aka covariates, \(X_j\) (eg, cholesterol, smoker)
\[Y \sim N(\mu , \sigma^2)\] where the conditional mean is a linear function of the predictors
\[\mu = E(Y |X ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]
\(\beta_j\) are estimated by minimizing the sum of squares errors, \[\sum_i (y_i - (\beta_0 - \sum_j \beta_j x_{ij}))^2 = ||\mathbf{y} - \mathbf{X}\boldsymbol{\beta}||^2_2\]
We can interpret the coefficients directly as effect sizes.
Generalized linear models
We model a function of the conditional mean as a linear function
\[g(\mu ) = \beta_0 + \sum_{j=1}^p \beta_j X_j = X \boldsymbol{\beta}\]
\(g\) is called a link function.
Example: Logistic regression
\[Y \sim \textrm{Bernoulli}(\mu )\] Coin toss with probability \(\mu\) of heads, \(1 - \mu\) of tails.
\[g(\mu ) = \rm{logit} \ \mu := \log \left(\frac{\mu}{1 - \mu}\right) = X \boldsymbol{\beta}\]
\[\mu = \rm{logit}^{-1}(X \boldsymbol{\beta})\]
Likelihood
To find \(\beta_j\), we typically maximize the (log) likelihood function.
For logistic regression, with \(n\) independent observations \(y_i =\) 0 or 1, the joint probability is
\[p({\bf{y}} | X_i) = \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}\]
The log likelihood is then
\[logLik(\boldsymbol{\beta}) = \sum_{i=1}^n y_i \log(\mu_i) + (1 - y_i)\log (1 - \mu_i)\]
Where are the \(\beta\)’s?
Recall that \(\mu_i = \rm{logit}^{-1}(X_i \boldsymbol{\beta})\))
Poisson regression
Counts \(Y\) modeled as
\[ p(Y = y; \mu) = \frac{\mu^y e^{-\mu }}{y!} \] with “canonical” link
\[ g(\mu ) = \log(\mu ) = X \boldsymbol{\beta} \]
Likelihood
\[logLik(\boldsymbol{\boldsymbol{\beta}}) = \sum_{i=1}^n \left( y_i \log \mu_i - \mu_i - \log (y_i!) \right)\]
Fun fact
For a linear model, the minimum sum of squares estimate is the same as the maximum likelihood estimate.
Models
Zero inflated Poisson regression
Zero inflated negative binomial regression
Bayes versions of the above
Negative binomial distribution
Often used on overdispersed data: \(\mu < \sigma^2\).
\[p(Y = y; \mu, \theta) = \binom{y + \theta - 1}{\theta} \left(\frac{\mu}{\mu + \theta}\right)^y \left(\frac{\theta}{\mu + \theta}\right)^\theta\]
Dispersion parameter \(\theta > 0\); fixed- does not vary with covariates like \(\mu\).
\(\theta\) is estimated by the model
Conditional mean \(E(Y|X) = \mu\). We again model with log link
\[\log \mu = X \boldsymbol{\beta}\]
Variance
\[Var(Y|X) = \mu + \mu^2/\theta\]
\(\theta\) part of the model fit - unlike \(\sigma^2\) in linear regression.
Zero inflated Poisson regression
Define \(Z \sim Bernoulli(\pi)\) - a coin flip with probability \(\pi\) of heads.
\[ Y = \begin{cases} 0 & \text{if} \ Z = 1 \\ \sim Pois(\mu) & \text{if} \ Z = 0 \end{cases} \]
\(Z\) models the zero-inflated portion - the “extra” zeros.
The additional count data, including zeros, is modeled with the Poisson distribution with mean \(\mu\).
\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]
(Recall Poisson distribution is \(P(y; \mu) = \mu^y e^{-\mu}/y!\))
\(\log \frac{\pi}{1 -\pi} =\) linear function of some covariates
\(\log(\mu) =\) linear function of some covariates
Zero inflated Poisson regression
We again estimate the coefficients using log likelihood.
\[ p(Y = y; \mu, \pi) = \begin{cases} \pi + (1 - \pi) e^{-\mu} & \text{if } y = 0 \\ (1 - \pi) \frac{\mu^y e^{-\mu}}{y!} & \text{if } y > 0 \end{cases} \]
Likelihood function
\[ \begin{align*} logLik(\mathbf{\beta}, \mathbf{\gamma}) = \sum_{i=1}^n \Big(& \mathbf{1}_{\{y_i = 0\}} \cdot \log \left( \pi_i + (1 - \pi_i) e^{-\mu_i} \right) \\ & + \mathbf{1}_{\{y_i > 0\}} \cdot \left( \log(1 - \pi_i) + y_i \log(\mu_i) - \mu_i - \log(y_i!) \right) \Big) \end{align*} \]
Bayesian statistical modeling
Starts with Bayes’ Rule
The conditional probability of \(B_k\) given \(A\) is
\[\begin{aligned} P(B_k \ | \ A) & = \frac{P(A \ | \ B_k) P(B_k)}{P(A)} \\ & = \frac{P(A \ | \ B_k) P(B_k)}{\sum_i P(A \ | \ B_i) P(B_i)} \end{aligned}\]
where \(A\) and \(B_j\) are events with \(B_j\) a partition of the sample space. The vertical bar indicates conditional probability
The standard disease/positive test example
\[P(D | +) = \frac{P(+ | D) \ P(D)}{P(+ | D) P(D) + P(+ | \bar{D}) P(\bar{D})} \]
Bayes’ Rule for probability densities for model parameters
\[p(\boldsymbol{\theta} | y, X) = \frac{p(\boldsymbol{\theta}) p(y | \boldsymbol{\theta}, X)}{\int p(y | \boldsymbol{\theta}, X) p(\boldsymbol{\theta}) d\boldsymbol{\theta}}\]
random vector of model parameters \(\boldsymbol{\theta}\) (our \(\beta\)’s and \(\gamma\)’s)
fixed predictor data \(X\) and response data \(y\)
\(p(\boldsymbol{\theta})\) is the prior distribution for the parameters
\(p(y | \boldsymbol{\theta}, X)\) is the likelihood of the data given the parameters \(\theta\) - same as the likelihood function in frequentist statistics.
\(p(\theta | x)\) is the posterior distribution for the parameters \(\theta\).
The denominator is an integration over the parameter space; typically intractable \(\Rightarrow\) Markov Chain Monte Carlo (MCMC) methods to approximate the posterior distribution.
Posterior predictive distribution
Once you have the \(\theta\) posterior, you can compute the posterior predictive distributions of a new observation \(x_0\), given the data \(X\).
\[\begin{align*} p(x_0|X) & = \int p(x_0, \theta | X)\, d\theta \\ \\ & = \int p(x_0| \theta, X) \, p(\theta |X) , d\theta \\ \\ & = \int p(x_0|\theta) \, p(\theta | X) \, d\theta\\ \end{align*}\]
Can also get a distribution for any statistic of interest.
Bayesian inference
Parameters are regarded as random variables, data fixed
We get an entire distribution for every parameter, and hence for any statistic we wish to compute
No p-values, no hypothesis testing!
Example
\(\pi\) is the probability of heads in a single coin toss,
data is the sequence of heads and tails in 100 tosses
By contrast, our maximum likelihood estimate of \(\pi\) would be the peak of the likelihood, which is the same as the posterior for flat prior.