Please use this identifier to cite or link to this item: http://localhost:8080/xmlui/handle/123456789/464
Full metadata record
DC Field | Value | Language
dc.contributor.author | Taiwo, O. O | -
dc.contributor.author | Kasali, F. A | -
dc.contributor.author | Akinyemi, I. O | -
dc.contributor.author | Kuyoro, S. O | -
dc.contributor.author | Awodele, D. D | -
dc.contributor.author | Ogbaro, D. D | -
dc.contributor.author | Olaniyan, T. S | -
dc.date.accessioned | 2022-07-04T13:11:12Z | -
dc.date.available | 2022-07-04T13:11:12Z | -
dc.date.issued | 2019-04 | -
dc.identifier.citation | Taiwo, O. O., Kasali, F. A., Akinyemi, I. O., Kuyoro, S. O., Awodele, O., Ogbaro, D. D. and Olaniyan, T. S. (2019), Stratification of Chronic Myeloid Leukemia Cancer Dataset into Risk Groups using Four Machine Learning Algorithms with Minimal Loss Function, Afr. J. MIS, Vol. 1, Issue 2, pp. 1-18 | en_US
dc.identifier.uri | http://localhost:8080/xmlui/handle/123456789/464 | -
dc.description.abstract | Chronic Myeloid Leukemia (CML) has been stratified into risk groups using scoring systems, but these systems are limited by overfitting the data. Machine Learning (ML) algorithms have been used to extract meaningful information from the datasets, but the loss function (empirical risk) of the algorithms was not considered to determine the risk incurred in adopting the algorithms for stratification. In this paper, a secondary dataset of 1640 CML patients between 2003 and 2017 was collected from Obafemi Awolowo University Teaching Hospitals Complex, Ile-Ife, Osun State, Nigeria. An experimental analysis was performed in Waikato Environment for Knowledge Analysis (WEKA) 3.8.0 using basophil count and spleen size values on four ML algorithms (BayesNet, Multilayer Perceptron, Projective Adaptive Resonance Theory (PART) and Logistic Regression) to identify low- and high-risk patients. Holdout and 10-fold cross-validation techniques were used to evaluate the performance of the algorithms on correctly classified instances, time to learn, kappa statistic, sensitivity and specificity. On these performance metrics, Logistic Regression and PART performed better than the other algorithms used in this study in stratifying patients’ risk groups. Afterwards, the loss functions of the two algorithms were determined by finding the difference between the true output and the predicted output. The results of the loss function of the Logistic Regression algorithm for low and high risk in holdout and 10-fold cross-validation showed 0.22%, 1.40% and -0.22%, -0.02% respectively. Similarly, the PART algorithm yielded -1.58%, 1.40% and -0.22%, -0.26%. From the findings, the Logistic Regression algorithm had the minimum non-negative loss function in the holdout technique and was used in the developed model to stratify CML into risk groups. Therefore, determining the loss function of the algorithms minimizes the empirical risk and as such plays a significant role in producing optimum and faster results for accurate stratification. | en_US
dc.description.sponsorship | Taiwo, O.O., Kasali, F.A., Akinyemi, I.O., Kuyoro, S.O., Awodele, D.D., Ogbaro, D.D. and Olaniyan, T.S. | en_US
dc.language.iso | en | en_US
dc.publisher | Afr. J. MIS | en_US
dc.relation.ispartofseries | 1;2 | -
dc.subject | Classification algorithm, Data stratification, Empirical risk minimization, Loss function, Machine learning | en_US
dc.title | Stratification of Chronic Myeloid Leukemia Cancer Dataset into Risk Groups using Four Machine Learning Algorithms with Minimal Loss Function | en_US
dc.type | Article | en_US
Appears in Collections: Computer Science

Files in This Item:
File | Description | Size | Format
taiwoetal2019pdf.pdf | | 1.16 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.