Gang Li, Laura Raffield, Mark Logue, Mark W Miller, Hudson P Santos Jr, T Michael O’Shea, Rebecca C Fry, Yun Li
Epigenetics. 2020 Oct 4;1-11.
Here, we present CpG impUtation Ensemble (CUE), which leverages multiple statistical and modern machine learning methods to impute from the Illumina HumanMethylation450 (HM450) BeadChip to the Illumina HumanMethylationEPIC (HM850) BeadChip. Data were analyzed from two cohorts with methylation measured by both HM450 and HM850: the ELGAN study (n = 127, placenta) and the VA Boston PTSD genetics repository (n = 144, whole blood). Cross-validation results show that CUE achieves the lowest predicted root-mean-square error (RMSE) (0.026 in PTSD) and the highest accuracy (99.97% in PTSD) compared with the five methods tested: k-nearest-neighbors, logistic regression, penalized functional regression, random forest, and XGBoost. Finally, among all 339,033 HM850-only CpG sites shared between ELGAN and PTSD, CUE successfully imputed 289,604 (85.4%) sites, where success is defined as RMSE < 0.05 and accuracy > 95% in PTSD. In summary, CUE is a valuable tool for imputing CpG methylation from the HM450 to the HM850 platform.
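The per-site success criterion (RMSE < 0.05 and accuracy > 95%) can be illustrated with a minimal Python sketch. The abstract does not define accuracy, so the definition used here is an assumption made only for illustration: agreement of methylated/unmethylated calls after dichotomizing beta values at 0.5. The function names and toy values are hypothetical and are not taken from the CUE software.

```python
import math


def rmse(observed, imputed):
    """Root-mean-square error between measured and imputed beta values."""
    if len(observed) != len(imputed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, imputed)]
    return math.sqrt(sum(squared) / len(squared))


def accuracy(observed, imputed, cutoff=0.5):
    """Fraction of samples whose dichotomized call agrees.

    ASSUMPTION: the cutoff-based definition is illustrative only;
    the abstract does not say how accuracy is computed.
    """
    if len(observed) != len(imputed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    agree = sum((o >= cutoff) == (p >= cutoff) for o, p in zip(observed, imputed))
    return agree / len(observed)


def well_imputed(observed, imputed, max_rmse=0.05, min_accuracy=0.95):
    """Apply the thresholds quoted in the abstract to a single CpG site."""
    return (rmse(observed, imputed) < max_rmse
            and accuracy(observed, imputed) > min_accuracy)


# Toy beta values for one CpG site across four samples (invented data).
observed = [0.10, 0.82, 0.45, 0.91]
imputed = [0.12, 0.80, 0.47, 0.90]

site_rmse = rmse(observed, imputed)
site_accuracy = accuracy(observed, imputed)
site_ok = well_imputed(observed, imputed)
```

In this toy case the site passes both thresholds. Applying such a check to every HM850-only site and counting the passes would give a proportion analogous to the reported 289,604 / 339,033 ≈ 85.4%.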