Tuesday, February 18, 2020
3:30 pm - 4:30 pm
701 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104
Title: A Bayesian statistical framework for microbiome sequence data analysis
Abstract: The human microbiome consists of trillions of cells and collectively affects host health. Recently, advances in next-generation sequencing technology have enabled the high-throughput profiling of metagenomes and accelerated the study of the microbiome. However, microbiome count data are high-dimensional and usually suffer from uneven sampling depth, over-dispersion, and zero-inflation. In this talk, I introduce a general Bayesian framework to properly model microbiome count data. It is composed of two hierarchical levels. The first level is a multivariate count-generating process that can incorporate multiple choices of statistical distributions (e.g., the zero-inflated negative binomial model to account for the skewness and excess zeros of the data), provide model-based normalization through prior distributions with stochastic constraints, and utilize phylogenetic tree information via a Markov random field prior. The second level can be customized for specific microbiome data analysis tasks. I will present use cases for this framework in microbiome differential abundance analysis, integrative analysis, and microbial network analysis, using simulations and real datasets.
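To make the three data issues named in the abstract concrete (uneven sampling depth, over-dispersion, and excess zeros), here is a minimal sketch of a zero-inflated negative binomial (ZINB) count-generating process. It is an illustration only, not the speaker's model. It assumes a simple mean structure in which the expected count for sample i and taxon j is a per-sample size factor s_i times a per-taxon abundance alpha_j, with a per-taxon NB dispersion phi_j and structural-zero probability pi_j. The function name and all parameter names are hypothetical. The sketch covers only forward simulation of the first level. It includes no priors, no stochastic normalization constraints, no Markov random field over the phylogeny, and no posterior inference.

```python
import numpy as np


def simulate_zinb_counts(size_factors, mean_abundance, dispersion, zero_prob, seed=0):
    """Simulate an n x p count matrix from a zero-inflated negative binomial.

    Hypothetical parameterization (an assumption, not the talk's model):
      size_factors   : length-n array s_i, per-sample scaling (sequencing depth)
      mean_abundance : length-p array alpha_j, per-taxon abundance
      dispersion     : length-p array phi_j, NB size; smaller means more over-dispersion
      zero_prob      : length-p array pi_j, probability of a structural zero
    Returns the counts and a boolean mask marking structural zeros.
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(size_factors, dtype=float)[:, None]        # shape (n, 1)
    alpha = np.asarray(mean_abundance, dtype=float)[None, :]  # shape (1, p)
    phi = np.asarray(dispersion, dtype=float)[None, :]        # shape (1, p)
    pi = np.asarray(zero_prob, dtype=float)[None, :]          # shape (1, p)

    # NB mean for every sample/taxon pair, shape (n, p)
    mu = s * alpha

    # NumPy's negative_binomial(n, p) has mean n * (1 - p) / p,
    # so choosing p = phi / (phi + mu) gives mean mu and variance mu + mu^2 / phi.
    p_success = phi / (phi + mu)
    counts = rng.negative_binomial(phi, p_success)  # broadcasts to (n, p)

    # Zero-inflation: with probability pi_j the count is forced to zero
    structural_zero = rng.random(mu.shape) < pi
    counts[structural_zero] = 0
    return counts, structural_zero


# Usage: three samples with uneven depth, two taxa (one of them zero-inflated)
counts, zeros = simulate_zinb_counts(
    size_factors=[0.5, 1.0, 2.0],
    mean_abundance=[10.0, 5.0],
    dispersion=[2.0, 2.0],
    zero_prob=[0.0, 0.5],
)
print(counts)
```

In this sketch the size factors shift each sample's expected counts up or down, mimicking uneven sequencing depth. A small phi_j inflates the variance well above the Poisson level, and pi_j adds zeros beyond those the NB would produce on its own.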