Tuesday, December 5, 2017
3:30 pm - 4:30 pm
701 Blockley Hall
Abstract: Understanding the functional effects of genetic variants in non-coding regions of the genome is a difficult problem, but one with major implications for advancing the field of complex trait genetics. Distinguishing the variants that play a regulatory role in the tissue or cell type under study from the vast majority of non-functional variants can help identify the variants most likely to be causal for a trait of interest. Experimental assays (such as massively parallel reporter assays and CRISPR/Cas9-mediated in situ saturating mutagenesis) are continually improving, but they remain laborious and can be applied to only a relatively modest number of variants. In this talk I will first discuss an unsupervised approach, based on a latent Dirichlet allocation model, that predicts functional effects in a cell type/tissue-specific manner. I will then introduce extensions to the setting where high-quality, experimentally derived labels are available for a small to modest number of variants. In particular, I will describe a semi-supervised approach that predicts the functional consequences of non-coding genetic variants by jointly using experimentally confirmed regulatory variants (labeled variants), millions of unlabeled variants genome-wide, and more than a thousand cell type/tissue-specific functional annotations on each variant. Through applications to several experimental datasets, including variants validated by massively parallel reporter assays and sets of eQTLs and dsQTLs, I demonstrate that the proposed methods significantly improve prediction accuracy compared to existing functional prediction methods, both at the organism level and at the tissue/cell type level. I will end with an application to a Metabochip dataset of 12,281 individuals, illustrating how an integrative analysis using such functional predictions can help discover genes associated with lipid phenotypes.