Challenges in reproducing results from publicly available data: an example of sexual orientation and cardiovascular disease risk


BACKGROUND: Replication is a vital part of the research process and has recently received considerable attention. Analyses using publicly available data should, if adequately described, be reproducible without assistance from the original investigators. Using data from the US National Health and Nutrition Examination Survey (NHANES), a recent study reported a statistically significant difference in cardiovascular disease risk comparing subgroups of sexual minority men. We attempted to reproduce these findings and assessed whether the results were robust to alternative analytic strategies and assumptions.

METHODS: We used the exclusion criteria and coding strategy described in the original paper to construct our analytical data set. Sampling weights were constructed in accordance with NHANES analytical guidelines. We estimated crude and covariate-adjusted associations between sexual orientation and vascular age using the regression models specified in the original report. We also conducted a series of sensitivity analyses to improve on the original findings.

RESULTS: Our replication attempt was partially successful: we replicated the general trends reported in the original analysis, but not identical effect estimates. Importantly, we identified a potential misapplication of the Framingham Risk Score; correcting for this increased the probability that the reported null hypothesis test was a type I error.

CONCLUSIONS: This paper supports the recent calls for greater transparency and improved reporting in research. Even with a publicly available and well-documented data source, we were unable to exactly replicate another study’s original findings. Our sensitivity analyses revealed key issues in the original analysis and demonstrate the scientific importance of research replication.

J Epidemiol Community Health
Sam Harper
Associate Professor of Epidemiology

My research interests include impact evaluation, reproducible research, and social epidemiology.