Abstract:
Gene expression is a readily observed quantification of transcriptional activity and cellular state that enables the
recovery of the relationships between regulators and their target genes. Reconstructing transcriptional regulatory
networks from gene expression data is a problem that has attracted much attention, but previous work often makes
the simplifying (but unrealistic) assumption that regulator activity is represented by mRNA levels. We use a latent
tree graphical model to analyze gene expression without relying on transcription factor expression as a proxy for
regulator activity. The latent tree model is a type of Markov random field that includes both observed gene
variables and latent (hidden) variables, which factorize on a Markov tree. Through efficient unsupervised learning
approaches, we determine which groups of genes are co-regulated by hidden regulators and the activity levels of
those regulators. Post-processing annotates many of these discovered latent variables as specific transcription
factors or groups of transcription factors. Other latent variables do not necessarily represent physical regulators
but instead reveal hidden structure in the gene expression such as shared biological function. We apply the latent
tree graphical model to a yeast stress response dataset. In addition to novel predictions, such as
condition-specific binding of the transcription factor Msn4, our model recovers many known aspects of the yeast regulatory
network. These include groups of co-regulated genes, condition-specific regulator activity, and combinatorial
regulation among transcription factors. The latent tree graphical model is a general approach for analyzing gene
expression data that requires no prior knowledge of which regulators exist, how active they are, or where
transcription factors physically bind. Consequently, it is promising for studying expression datasets in species and
conditions where these types of information are not available or not reliable.
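
As a brief illustration of what "factorize on a Markov tree" means, the standard factorization of a tree-structured Markov random field can be written as follows. The notation is an assumption introduced here, not taken from the abstract: x denotes the observed gene variables, h the latent variables, z = (x, h) all nodes together, and T = (V, E) the tree with node set V and edge set E.

```latex
% Joint distribution over all nodes z = (x, h) of a tree T = (V, E):
% a product of node marginals and pairwise terms along the edges only.
p(x, h) \;=\; \prod_{i \in V} p(z_i)
        \prod_{(i,j) \in E} \frac{p(z_i, z_j)}{p(z_i)\, p(z_j)}

% The distribution of the observed expression data marginalizes out
% the latent variables:
p(x) \;=\; \sum_{h} p(x, h)
```

Because the pairwise terms run only over the edges of the tree, dependencies among genes that share a latent neighbor are mediated by that latent variable, which is what allows groups of co-regulated genes to be read off the learned structure.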