Abstract:
Timely identification of pathogens is vital for controlling diseases
effectively and avoiding antimicrobial resistance. Non-invasive
point-of-care diagnostic tools are increasingly used to identify
pathogens and are especially helpful in rural
areas. Machine learning approaches have been widely applied to
biological markers for predicting diseases and pathogens. However,
few studies in the literature have used volatile
organic compounds (VOCs) as non-invasive biological markers to
identify bacterial pathogens. Furthermore, there is no comprehensive
study investigating the effect of different distance and similarity
metrics on pathogen classification based on VOC data. In this study,
we compared various non-Euclidean distance and similarity metrics
with the Euclidean metric to identify the VOCs that contribute
significantly to pathogen prediction. In addition, we used the backward
feature elimination (BFE) method to select the best set of features.
The dataset used in our experiments was compiled from
publications that appeared between 1977 and 2016 and consisted of
associations between 703 VOCs and 11 pathogens. We performed
an extensive set of experiments with five different distance metrics in
both uniform and weighted settings. The experiments
showed that pathogens can be predicted with 78.6% accuracy from 68
of the 703 VOCs using a k-nearest neighbour
classifier with the Sorensen distance metric.
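
To make the classification setup concrete, the following is a minimal sketch of a k-nearest neighbour classifier that uses the Sorensen distance, with a uniform and a weighted voting option. It is an illustration under stated assumptions, not the implementation used in the study: the distance is taken in its common form, sum(|x_i - y_i|) / sum(x_i + y_i) (also known as the Bray-Curtis distance), the weighted variant is assumed to mean inverse-distance voting, and the function names, toy data, and pathogen labels are hypothetical.

```python
from collections import defaultdict


def sorensen_distance(x, y):
    """Sorensen (Bray-Curtis) distance between two non-negative vectors."""
    denom = sum(a + b for a, b in zip(x, y))
    if denom == 0:
        return 0.0  # both vectors are all zeros; treat them as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / denom


def knn_predict(train_x, train_y, query, k=3, weighted=False):
    """Predict a label by voting among the k nearest training samples.

    weighted=False: every neighbour casts one vote (uniform).
    weighted=True: votes are scaled by inverse distance (an assumption
    about what "weighted" means; the abstract does not specify it).
    """
    neighbours = sorted(
        (sorensen_distance(x, query), label)
        for x, label in zip(train_x, train_y)
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbours:
        votes[label] += 1.0 / (dist + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


# Toy data, invented for illustration: rows are samples, columns are
# presence (1) or absence (0) flags for four hypothetical VOCs.
train_x = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
train_y = ["pathogen_A", "pathogen_A", "pathogen_B", "pathogen_B"]
query = [1, 1, 0, 1]

print(knn_predict(train_x, train_y, query, k=3))
print(knn_predict(train_x, train_y, query, k=3, weighted=True))
```

For binary presence/absence vectors such as these, this distance equals one minus the Sorensen-Dice similarity coefficient, so samples are compared by the VOCs they share rather than by geometric distance.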