Abstract:
Predicting Antimicrobial Resistance (AMR) from genomic data has important implications for human and animal healthcare, especially given its potential for more rapid diagnostics and informed treatment choices. With recent advances in sequencing technologies, applying machine learning techniques to AMR prediction has shown promising results. Despite this, the literature lacks methodologies suitable for multi-drug AMR prediction, especially where samples with missing labels exist. To address this shortcoming, we introduce a Rectified Classifier Chain (RCC) method for predicting multi-drug resistance. This RCC method was tested using annotated features of genomic sequences and compared with similar multi-label classification methodologies. We found that our RCC model with an eXtreme Gradient Boosting (XGBoost) base model outperformed the second-best model, an XGBoost-based binary relevance model, by 3.3% in Hamming accuracy and 7.8% in F1-score. Additionally, we note that machine learning models applied to AMR prediction in the literature are typically unsuitable for identifying biomarkers informative of their decisions; in this study, we show that biomarkers contributing to AMR prediction can also be identified using the proposed RCC method. We expect this can facilitate genome annotation and pave the path towards identifying new biomarkers indicative of AMR.
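To make the two reported metrics concrete, the following minimal sketch shows one way Hamming accuracy and F1-score could be computed for multi-drug predictions when some labels are missing. It is an illustration under stated assumptions, not the evaluation code used in this study: the function names, the use of `None` to mark a missing label, the choice to skip missing labels when scoring, and micro-averaging for F1 are all assumptions, and the toy data are invented.

```python
def observed_pairs(y_true, y_pred):
    """Yield (true, predicted) pairs, skipping entries whose true label is missing.

    Assumption: a missing drug label is encoded as None.
    """
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t is not None:
                yield t, p


def masked_hamming_accuracy(y_true, y_pred):
    """Fraction of observed sample-drug labels that are predicted correctly."""
    pairs = list(observed_pairs(y_true, y_pred))
    if not pairs:
        raise ValueError("no observed labels to score")
    return sum(t == p for t, p in pairs) / len(pairs)


def masked_micro_f1(y_true, y_pred):
    """Micro-averaged F1 over observed labels (1 = resistant, 0 = susceptible)."""
    pairs = list(observed_pairs(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 3 isolates x 3 drugs, None marks a missing label.
y_true = [[1, 0, None],
          [0, None, 1],
          [1, 1, 0]]
y_pred = [[1, 0, 1],
          [1, 0, 1],
          [1, 0, 0]]

hamming = masked_hamming_accuracy(y_true, y_pred)  # 5 of 7 observed labels correct
f1 = masked_micro_f1(y_true, y_pred)               # tp=3, fp=1, fn=1
```

Scoring only observed labels keeps isolates with partial resistance profiles in the evaluation instead of discarding them, which is one plausible way to handle the missing-label setting described above.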