Abstract:
We study classification and regression problems in lung tumors where high-throughput gene expression is measured at multiple levels: epigenetic, transcript and protein. We uncover the correlates of smoking and gender-specificity in lung tumors. The genes indicative of smoking level, gender and survival rate differ from one level to another. We also carry out an integrative analysis, selecting features from the pool of all three levels of features. Our results show that the epigenetic information in DNA methylation is a better marker for smoking status than gene expression at either the transcript or the protein level. Further, and surprisingly, integrative analysis using multi-level gene expression offers no significant advantage over the individual levels in the classification and survival prediction problems considered.