Tuesday, September 19, 2017
3:30 pm - 4:30 pm
701 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104
Title: An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis
Abstract: Dramatic improvements in data collection and acquisition technologies over the past two decades have enabled scientists to collect vast amounts of health-related data in biomedical studies. If analyzed properly, these data can help improve contemporary healthcare services, from diagnosis to prevention to personalized treatment, and can also provide insights toward reducing healthcare costs. However, biomedical data can be rather complex, often characterized by some mixture of missing data, high dimensionality, heterogeneity, high variety, high volume, high velocity, etc. Analyzing these data poses many challenges for existing methods. Toward an efficient use of biomedical complex data, we propose an imputation-consistency (IC) algorithm as a general algorithm for high-dimensional missing data problems. The IC algorithm works by iterating between an imputation step and a consistency step. At the imputation step, the missing data are imputed conditional on the observed data and the current estimate of the parameters; at the consistency step, a consistent estimate is found for the minimizer of a Kullback-Leibler divergence defined on the pseudo-complete data. The consistency of the averaged IC estimate is established under quite general conditions. Then, under the principles of conditioning and consistency, we extend the IC algorithm to address some other challenges encountered in biomedical complex data analysis. In particular, we propose some highly efficient algorithms that address the heterogeneity and high-dimensionality issues encountered in biomarker identification and eQTL analysis.
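The alternation between the two steps can be illustrated with a toy sketch. It is not the speaker's implementation and it is low-dimensional. It assumes a bivariate Gaussian model in which x is always observed and y is missing completely at random. In that setting the consistency step is taken to be plain maximum likelihood on the pseudo-complete data, which minimizes the empirical Kullback-Leibler divergence within the Gaussian family. In the high-dimensional problems the talk targets, some regularized consistent estimator would presumably be needed instead. All function names and simulation settings below are hypothetical.

```python
import random


def make_toy_data(n=2000, missing_rate=0.3, seed=1):
    """Simulate (x, y) pairs; y is replaced by None completely at random.

    Hypothetical settings: x ~ N(1, 1), y = 2 + 0.8 * (x - 1) + N(0, 0.6^2).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(1.0, 1.0)
        y = 2.0 + 0.8 * (x - 1.0) + rng.gauss(0.0, 0.6)
        data.append((x, None if rng.random() < missing_rate else y))
    return data


def fit_gaussian(pairs):
    """Consistency step (toy version): Gaussian MLE on complete pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs) / n
    syy = sum((y - my) ** 2 for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs) / n
    return mx, my, sxx, syy, sxy


def impute(data, theta, rng):
    """Imputation step: draw each missing y from its conditional given x and theta."""
    mx, my, sxx, syy, sxy = theta
    slope = sxy / sxx
    cond_sd = max(syy - sxy * slope, 1e-12) ** 0.5
    return [
        (x, y if y is not None else rng.gauss(my + slope * (x - mx), cond_sd))
        for x, y in data
    ]


def ic_estimate(data, n_iter=200, burn_in=50, seed=0):
    """Alternate imputation and consistency steps; average post-burn-in estimates."""
    rng = random.Random(seed)
    # Initialize from the complete cases only.
    theta = fit_gaussian([(x, y) for x, y in data if y is not None])
    kept = []
    for t in range(n_iter):
        pseudo_complete = impute(data, theta, rng)  # imputation step
        theta = fit_gaussian(pseudo_complete)       # consistency step
        if t >= burn_in:
            kept.append(theta)
    # Averaged estimate of (mean_x, mean_y, var_x, var_y, cov_xy).
    return tuple(sum(values) / len(kept) for values in zip(*kept))
```

Calling `ic_estimate(make_toy_data())` returns the averaged estimate of the two means, two variances, and the covariance. The averaging over iterations mirrors the "averaged IC estimate" mentioned in the abstract; the burn-in length and iteration count are arbitrary choices for this sketch.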