GGEB Dissertation Defense - Rong Ma

Monday, June 7, 2021
1:30 pm - 2:30 pm
Virtual
With rapid technological advancements in data collection and processing, large-scale and complex datasets are now widely available in diverse research fields such as genomics, metabolomics and microbiomics. The analysis of large datasets with complex structures poses significant challenges and calls for new theory and methodologies. In this talk, I will address two sets of high-dimensional statistical problems motivated by applications in such data-driven interdisciplinary research.

In the first part of the talk, motivated by the ubiquity of high-dimensional datasets with binary outcomes, I will introduce computationally efficient and theoretically justified procedures for large-scale statistical inference in high-dimensional logistic regression models. Underlying our proposed methods are novel bias-correction techniques for inferring low-dimensional components or functionals of high-dimensional objects. We show empirically the effectiveness and stability of our methods in extracting useful information in real applications, especially in metabolomic association analysis and in the analysis of genetic relatedness between phenotypes.

In the second part of the talk, I will discuss statistical problems motivated by important questions in large-scale human microbiome and metagenomic research. We propose a generic permuted monotone matrix model and develop new principles, theory and methods for inferring the underlying model parameters. An efficient spectral approach is introduced to address these problems, and its performance is rigorously justified by statistical decision theory. The methods are applied to a real dataset to compare the growth rates of gut bacteria between inflammatory bowel disease patients and normal controls.