Ecological Fallacy, Federated Analysis, COVID-19, Methodology
Multi-site medical studies and federated analysis networks routinely pool site-level summaries to draw conclusions about individual patients. We test this practice empirically and show when it breaks.
Ecological Bias in Multi-Site COVID-19 Studies: Two Failure Modes in 42 Million Patient Records
Using 42 million individual COVID-19 patient records from two countries, we identify two failure modes of ecological bias. First, the signal disappears: a site-level analysis of 38 million CDC records fails to detect the relationship between age and mortality (p = 0.12), while the individual-level analysis finds a 9.9-fold odds ratio (p < 10^-300). Second, the signal distorts: a site-level analysis of 4 million Mexican records detects a significant relationship (p < 0.0001), but the ecological effect size bears no resemblance to the individual-level truth. When applied to the 4CE consortium’s 22,000-patient neurological COVID-19 dataset (21 hospitals, 6 countries), meta-analytic methods diverge by 350-fold on the pooled effect (I² = 99.8%), and a multivariate consistency test detects coordinated cross-site heterogeneity (z = 25, p < 0.0001) that standard tests miss entirely.
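The first failure mode can be illustrated with a small simulation. The sketch below uses synthetic data only; it is not the study's data or its analysis pipeline. The function names, site counts, baseline spread, and age-mix range are all hypothetical choices, and the Mantel-Haenszel estimator is used here as a simple stand-in for an individual-level analysis. The assumed scenario is one where every site has a strong within-site age effect, but sites differ far more in baseline mortality than in age mix.

```python
import numpy as np


def simulate_sites(n_sites=50, n_per_site=2000, true_or=9.9, seed=0):
    """Simulate patient-level records for several sites (synthetic data).

    All parameter values are illustrative assumptions, not study values;
    true_or merely echoes the odds ratio quoted in the text.
    """
    rng = np.random.default_rng(seed)
    # Share of older patients varies only slightly between sites.
    p_old = rng.uniform(0.28, 0.32, size=n_sites)
    # Baseline log-odds of death varies widely and is unrelated to age mix.
    base = rng.normal(-4.0, 0.8, size=n_sites)
    old = rng.random((n_sites, n_per_site)) < p_old[:, None]
    logit = base[:, None] + np.log(true_or) * old
    died = rng.random((n_sites, n_per_site)) < 1.0 / (1.0 + np.exp(-logit))
    return old, died


def mantel_haenszel_or(old, died):
    """Individual-level odds ratio, stratified by site (Mantel-Haenszel)."""
    a = (old & died).sum(axis=1)     # older, died
    b = (old & ~died).sum(axis=1)    # older, survived
    c = (~old & died).sum(axis=1)    # younger, died
    d = (~old & ~died).sum(axis=1)   # younger, survived
    n = a + b + c + d
    return (a * d / n).sum() / (b * c / n).sum()


def site_level_correlation(old, died):
    """Ecological view: correlate each site's age share with its death rate."""
    share_old = old.mean(axis=1)
    death_rate = died.mean(axis=1)
    return np.corrcoef(share_old, death_rate)[0, 1]


if __name__ == "__main__":
    old, died = simulate_sites()
    print("individual-level OR (Mantel-Haenszel):", mantel_haenszel_or(old, died))
    print("site-level correlation:", site_level_correlation(old, died))
```

Under these assumptions the stratified individual-level estimate should land near the simulated odds ratio, while the site-level correlation is driven mostly by between-site baseline differences, so it carries little information about the within-site age effect. Widening the range of `p_old` or shrinking the spread of `base` changes how much of the signal survives aggregation.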