Abstract:
Purpose: Evolving technologies allow human molecular data to be measured
at a broad scale. Researchers use these data extensively in many studies,
and they help advance the medical field. The transcriptome, proteome,
metabolome, and epigenome are a few examples of such molecular data. This
study uses transcriptome data from COVID-19 patients to uncover the genes
that are dysregulated in SARS-CoV-2 infection.
Method: The selected genes are used in machine learning models to predict
various phenotypes of these patients. Ten phenotypes are studied using
classification models, including time since onset, COVID-19 status, the
connection between age and COVID-19, hospitalization status, and ICU status.
In addition, this study compares the molecular characterization of COVID-19
patients with that of patients with other respiratory diseases.
Results: Gene ontology analysis of the selected features shows that they are
highly related to viral infection. Features are selected using two methods,
and each method's selected features are used separately to classify patients
with six different machine learning algorithms. For each selected phenotype,
the results are compared to find the best prediction model.
Conclusion: Although there are no significant differences between the
feature selection methods, random forest and SVM perform very well
across all the phenotype studies.