Epidemiol Health System J. 2024;11(1): 7-12.
doi: 10.34172/ehsj.26085

Original Article

Survivability Prediction of Breast Cancer Patients Using Three Data Mining Methods: A Comparative Study

Maryam Jalali 1, Navid Reza Ghasemi 2, Samane Nematolahi 3*, Najaf Zare 4

1 Colorectal Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
2 Project Managers at Gas Company, Bam, Kerman, Iran
3 Noncommunicable Diseases Research Center, Bam University of Medical Sciences, Bam, Kerman, Iran
4 Infertility Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
*Corresponding Author: Samane Nematolahi, Email: samanematolahi@yahoo.com


Background and aims: Breast cancer (BC) is the leading cause of mortality among women. Early diagnosis is crucial for effective treatment. This study applied data mining methods that generate classification rules and identify the prognostic factors that most influence the survival time of BC patients.

Methods: The dataset consisted of 1574 women diagnosed between January 2002 and December 2012 at the Cancer Registry Center of Nemazi Hospital in Fars Province, Iran. Patients were classified based on prognostic factors using three popular data mining methods: decision tree (J48), Naïve Bayes (NB), and nominal logistic regression (NLR). The Weka software was used to compare these methods in terms of sensitivity, specificity, and accuracy. The outcome of the study was the median survival time, which was categorized into three classes.
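As an illustration of the three comparison metrics for a three-class outcome, the following minimal Python sketch computes them from a confusion matrix. The abstract does not state how per-class values were combined, so the one-vs-rest macro-averaging used here is an assumption, the function names are hypothetical, and the counts are invented for the example rather than taken from the study.

```python
def per_class_metrics(cm):
    """One-vs-rest sensitivity and specificity for each class.

    cm[i][j] is the number of patients whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # class k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp     # another class, predicted as k
        tn = total - tp - fn - fp
        results.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        })
    return results


def summarize(cm):
    """Macro-averaged sensitivity and specificity, plus overall accuracy.

    Macro-averaging is an assumption; the source does not specify it.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = per_class_metrics(cm)
    return {
        "sensitivity": sum(m["sensitivity"] for m in per_class) / n,
        "specificity": sum(m["specificity"] for m in per_class) / n,
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
    }


# Hypothetical counts for three survival classes (not data from the study).
example_cm = [
    [50, 30, 20],
    [25, 60, 15],
    [10, 20, 70],
]
summary = summarize(example_cm)
print(summary)
```

The sketch assumes every class has at least one true and one non-member case; otherwise the divisions would fail.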

Results: In total, 212 women (13.5%), with a mean age of 49.74 years, died of BC. Overall survival rates at 2, 3, 5, and 10 years were 0.98, 0.94, 0.87, and 0.76, respectively. The mean and median survival times were 4.81 and 4.27 years, respectively. Sensitivity, specificity, and accuracy were 0.480, 0.570, and 0.572 for J48; 0.483, 0.610, and 0.584 for NB; and 0.488, 0.584, and 0.579 for NLR, respectively. Further, J48 showed that the Nottingham Prognostic Index (NPI) was the most influential prognostic factor.

Conclusion: This paper sought to improve the accuracy of BC classification using data mining methods. Comparing multiple prediction models provided insight into the relative predictive abilities of different data mining methods. The results suggested NB as the best classifier due to its higher accuracy and specificity. Finally, J48 identified the NPI as the most influential prognostic factor.
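The ranking behind this conclusion can be checked directly from the figures reported in the Results. The short Python sketch below tabulates those figures and picks the top method per metric; the dictionary layout and the helper name `best_by` are illustrative choices, not part of the study.

```python
# Metrics as reported in the Results section of the abstract.
reported = {
    "J48": {"sensitivity": 0.480, "specificity": 0.570, "accuracy": 0.572},
    "NB":  {"sensitivity": 0.483, "specificity": 0.610, "accuracy": 0.584},
    "NLR": {"sensitivity": 0.488, "specificity": 0.584, "accuracy": 0.579},
}


def best_by(metric):
    """Return the name of the method with the highest value for a metric."""
    return max(reported, key=lambda name: reported[name][metric])


best = {m: best_by(m) for m in ("sensitivity", "specificity", "accuracy")}
for metric, method in best.items():
    print(f"{metric}: {method} ({reported[method][metric]:.3f})")
```

On these figures, NB leads on specificity and accuracy, while NLR has a marginally higher sensitivity; the differences between methods are small.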

Submitted: 14 Nov 2023
Accepted: 24 Jan 2024
ePublished: 29 Mar 2024