Tuesday, November 28, 2017
3:30 pm - 4:30 pm
701 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104
Title: Statistical Methods in Microbiome Data Analysis
Abstract: Microbiome studies involve new computational and statistical challenges because microbiome data are highly sparse, skewed, over-dispersed, and high-dimensional. I will present two methods that take these characteristics into account: 1) a GLM-based latent variable ordination method and 2) compositional mediation analysis.
1) GLM-based latent variable ordination method: Distance-based ordination methods, such as principal coordinate analysis, cannot distinguish between a location effect (i.e., a difference in mean) and a dispersion effect (i.e., a difference in variation) when the dispersion effect is strong. In other words, a distance-based ordination method may falsely display a location effect in the presence of a strong dispersion effect. To resolve this potential problem, we propose a zero-inflated quasi-Poisson factor model as an ordination method; its estimated factors are used to display the similarity of samples.
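To make the distance-based setting concrete, the following is a minimal sketch of principal coordinate analysis on simulated counts. It is not the proposed factor model. The simulation design (negative binomial counts, two groups sharing the same taxon means but differing in dispersion), the Bray-Curtis dissimilarity, and all function names are illustrative assumptions that do not come from the abstract.

```python
import numpy as np

def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (samples)."""
    n = counts.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            d[i, j] = d[j, i] = num / den if den > 0 else 0.0
    return d

def pcoa(dist, n_axes=2):
    """Classical PCoA: double-center the squared distances, then eigendecompose."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_axes]  # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    # Negative eigenvalues (non-Euclidean dissimilarities) are clipped to zero.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

rng = np.random.default_rng(0)
n_per_group, n_taxa = 30, 50
taxon_mean = rng.gamma(shape=2.0, scale=10.0, size=n_taxa)  # assumed means

def nb_counts(mean, size, n):
    """Negative binomial counts with mean `mean` and variance mean + mean**2 / size."""
    p = size / (size + mean)
    return rng.negative_binomial(size, p, size=(n, mean.size))

# Same means in both groups (no location effect); group B is far more dispersed.
group_a = nb_counts(taxon_mean, size=10.0, n=n_per_group)
group_b = nb_counts(taxon_mean, size=0.5, n=n_per_group)
counts = np.vstack([group_a, group_b])

coords = pcoa(bray_curtis(counts), n_axes=2)
centroid_a = coords[:n_per_group].mean(axis=0)
centroid_b = coords[n_per_group:].mean(axis=0)
print("group A centroid:", centroid_a)
print("group B centroid:", centroid_b)
```

Plotting `coords` colored by group is one way to inspect how the two groups are displayed even though the simulated means are identical.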
2) Compositional mediation analysis: The causal mediation model has been extended to incorporate nonlinearity, treatment-mediator interaction, and multiple mediators. These models, however, are not directly applicable when the mediators are compositional. We propose a causal, compositional mediation model that uses the algebra for compositions in the simplex space and the characteristics of compositional data in high-dimensional settings. We show that the estimator of the total mediation effect has a causal interpretation under the potential outcomes framework. The methods involve a novel integration of statistical methods from high-dimensional regression analysis, compositional data analysis, and causal inference.
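As background on the algebra for compositions, the sketch below implements the standard Aitchison operations on the simplex: closure, perturbation, powering, and the centered log-ratio transform. The abstract does not say which operations the proposed model uses, so the selection and the function names here are assumptions made for illustration only.

```python
import numpy as np

def closure(x):
    """Rescale a vector of positive parts so that it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def perturb(x, y):
    """Perturbation: the simplex analogue of vector addition."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))

def power(x, a):
    """Powering: the simplex analogue of scalar multiplication."""
    return closure(np.asarray(x, dtype=float) ** a)

def clr(x):
    """Centered log-ratio transform (requires strictly positive parts)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

x = closure([10.0, 30.0, 60.0])
y = closure([2.0, 1.0, 1.0])

shifted = perturb(x, y)
scaled = power(x, 2.0)
print("x perturbed by y:", shifted)
print("x raised to power 2:", scaled)
```

Under the centered log-ratio transform, perturbation becomes ordinary addition and powering becomes ordinary scalar multiplication, which is why regression-style models can be written on the simplex with these operations.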