Abstract:
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible
coronavirus that has caused a deadly global pandemic. The WHO has reported that it spreads through contact,
droplet, airborne, fomite, fecal-oral, bloodborne, mother-to-child, and animal-to-human transmission. Hence, viral shedding
has a major impact on the course of the pandemic. This study uses transcriptome data of coronavirus disease 2019 (COVID-19)
patients to predict prolonged viral shedding in each patient. The prediction starts with the
transcriptome features alone: the lowest root mean squared error (16.3±3.3) was obtained with the top 25 features
selected by a forward feature selection algorithm combined with a linear regression model. To assess the impact of a few non-molecular features on this prediction, they were then added to the model one at a time alongside the selected transcriptome features. However, the results show that those features have no impact on the prediction of prolonged viral
shedding. The study further predicts the day since onset in the same way. Here too, the top 25 transcriptome features selected by forward feature selection give comparably good accuracy (0.74±0.1). However, the best accuracy (0.78±0.1) was obtained with the top 20 features ranked by
feature importance using an SVM. Moreover, adding non-molecular features has a large impact on the results obtained with
mutual-information-selected features in this prediction.
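
The forward feature selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic stand-ins for transcriptome features, the helper names `cv_rmse` and `forward_select` are hypothetical, and 5-fold cross-validated root mean squared error is an assumed selection criterion.

```python
import numpy as np


def cv_rmse(X, y, n_splits=5, seed=0):
    """Mean k-fold cross-validated RMSE of an ordinary least-squares fit."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        # Prepend an intercept column, then solve the least-squares problem.
        A_train = np.column_stack([np.ones(train_mask.sum()), X[train_mask]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        resid = y[test_idx] - A_test @ coef
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(scores))


def forward_select(X, y, k):
    """Greedily add the feature that most reduces cross-validated RMSE."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = min(remaining, key=lambda j: cv_rmse(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic stand-in data (assumption): 60 samples, 30 features,
# with the target driven by features 2 and 7 only.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 30))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + rng.normal(scale=0.1, size=60)

chosen = forward_select(X, y, k=2)
print("selected feature indices:", sorted(chosen))
print("cross-validated RMSE:", cv_rmse(X[:, chosen], y))
```

In the setting of the abstract, `k` would be 25 and `y` the viral shedding duration. Non-molecular features could be tested by appending each one as an extra column to the selected features and comparing the resulting error.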