DSpace Repository

Stratification of Chronic Myeloid Leukemia Cancer Dataset into Risk Groups using Four Machine Learning Algorithms with Minimal Loss Function


dc.contributor.author Taiwo, O. O
dc.contributor.author Kasali, F. A
dc.contributor.author Akinyemi, I. O
dc.contributor.author Kuyoro, S. O
dc.contributor.author Awodele, D. D
dc.contributor.author Ogbaro, D. D
dc.contributor.author Olaniyan, T. S
dc.date.accessioned 2022-07-04T13:11:12Z
dc.date.available 2022-07-04T13:11:12Z
dc.date.issued 2019-04
dc.identifier.citation Taiwo, O. O., Kasali, F. A., Akinyemi, I. O., Kuyoro, S. O., Awodele, O., Ogbaro, D. D. and Olaniyan, T. S. (2019). Stratification of Chronic Myeloid Leukemia Cancer Dataset into Risk Groups using Four Machine Learning Algorithms with Minimal Loss Function. Afr. J. MIS, Vol. 1, Issue 2, pp. 1-18. en_US
dc.identifier.uri http://localhost:8080/xmlui/handle/123456789/464
dc.description.abstract Chronic Myeloid Leukemia (CML) has been stratified into risk groups using scoring systems, but these systems are limited by overfitting. Machine Learning (ML) algorithms have been used to extract meaningful information from such datasets, but the loss function (empirical risk) of the algorithms was not considered when determining the risk incurred in adopting them for stratification. In this paper, a secondary dataset of 1640 CML patients, covering 2003 to 2017, was collected from Obafemi Awolowo University Teaching Hospitals Complex, Ile-Ife, Osun State, Nigeria. An experimental analysis was performed in Waikato Environment for Knowledge Analysis (WEKA) 3.8.0, using basophil count and spleen size values with four ML algorithms (BayesNet, Multilayer Perceptron, Projective Adaptive Resonance Theory (PART) and Logistic Regression), to identify low and high risk patients. Holdout and 10-fold cross-validation techniques were used to evaluate the algorithms on correctly classified instances, time to learn, kappa statistic, sensitivity and specificity. On these metrics, Logistic Regression and PART performed better than the other algorithms used in this study in stratifying patients into risk groups. The loss functions of these two algorithms were then determined by finding the difference between the true output and the predicted output. The loss function results for Logistic Regression for low and high risk in holdout and 10-fold cross-validation were 0.22%, 1.40% and -0.22%, -0.02% respectively. Similarly, PART yielded -1.58%, 1.40% and -0.22%, -0.26%. From these findings, Logistic Regression had the minimum non-negative loss function under the holdout technique and was used in the developed model to stratify CML patients into their risk groups.
Therefore, determining the loss function of the algorithms minimizes the empirical risk and so plays a significant role in producing optimal and faster results for accurate stratification. en_US
dc.description.sponsorship Taiwo, O.O., Kasali, F.A., Akinyemi, I.O., Kuyoro, S.O., Awodele, D.D., Ogbaro, D.D. and Olaniyan, T.S. en_US
dc.language.iso en en_US
dc.publisher Afr. J. MIS en_US
dc.relation.ispartofseries 1;2
dc.subject Classification algorithm, Data stratification, Empirical risk minimization, Loss function, Machine learning en_US
dc.title Stratification of Chronic Myeloid Leukemia Cancer Dataset into Risk Groups using Four Machine Learning Algorithms with Minimal Loss Function en_US
dc.type Article en_US
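
The evaluation quantities named in the abstract (correctly classified instances, kappa statistic, sensitivity, specificity, and a signed difference between true and predicted output) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the study used WEKA, not Python; the function names and the ten toy labels are hypothetical; and the signed loss is interpreted as the gap, in percentage points, between the true and predicted share of each risk class, which the record does not confirm as the authors' exact formula.

```python
# Illustrative sketch only: hypothetical helper names and toy labels,
# not the authors' WEKA workflow or their patient data.

def signed_loss_percent(y_true, y_pred, label):
    """Signed gap (percentage points) between the true and predicted share of `label`.

    Assumed reading of "difference between the true output and the predicted output".
    """
    n = len(y_true)
    true_share = sum(1 for y in y_true if y == label) / n
    pred_share = sum(1 for y in y_pred if y == label) / n
    return 100.0 * (true_share - pred_share)


def binary_metrics(y_true, y_pred, positive="high"):
    """Accuracy, sensitivity, specificity and Cohen's kappa for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    n = len(pairs)

    accuracy = (tp + tn) / n          # share of correctly classified instances
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate

    # Agreement expected by chance, from the row and column marginals.
    p_expected = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (accuracy - p_expected) / (1.0 - p_expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Toy labels (hypothetical): 6 low-risk and 4 high-risk patients.
y_true = ["low"] * 6 + ["high"] * 4
y_pred = ["low"] * 4 + ["high"] * 2 + ["high"] * 3 + ["low"] * 1

metrics = binary_metrics(y_true, y_pred, positive="high")
loss_low = signed_loss_percent(y_true, y_pred, "low")
loss_high = signed_loss_percent(y_true, y_pred, "high")

print(metrics)
print(loss_low, loss_high)
```

Under this reading, a positive value means the classifier predicts the class less often than it truly occurs, and a negative value means it predicts the class more often, which is one way signed percentages like those quoted in the abstract could arise.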

