Abstract
Background: Breast cancer is a leading cause of cancer mortality among women, and early diagnosis is crucial for effective treatment. This study applied data mining methods that provide classification rules and identify influential prognostic factors for the survival time of breast cancer patients. Methods: The dataset consisted of 1574 women diagnosed between January 2002 and December 2012 at the Cancer Registry Center of Nemazi Hospital in Fars Province, Iran. We classified patients based on prognostic factors using three popular data mining methods: decision tree (J48), Naïve Bayes (NB), and nominal logistic regression (NLR). The Weka software was used to compare these methods in terms of sensitivity, specificity, and accuracy. The outcome of the study was the median survival time, which was categorized into three classes. Results: In total, 212 women (13.5%) died. The mean age was 49.74 years. Overall survival rates at 2, 3, 5, and 10 years were 0.98, 0.94, 0.87, and 0.76, respectively. The mean and median survival times were 4.81 and 4.27 years, respectively. Sensitivity, specificity, and accuracy were 0.480, 0.570, and 0.572 for J48; 0.483, 0.610, and 0.584 for NB; and 0.488, 0.584, and 0.579 for NLR, respectively. J48 showed that the Nottingham Prognostic Index (NPI) was the most influential prognostic factor. Conclusion: This study aimed to improve the accuracy of breast cancer survival classification using data mining methods. Comparing multiple prediction models gave insight into the relative predictive ability of the different methods. The results suggested NB as the best classifier because of its higher accuracy and specificity. J48 identified the NPI as the most influential prognostic factor.
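The three evaluation metrics reported above can be illustrated for a three-class outcome with a short sketch. The abstract does not state how per-class values were combined, so the code below assumes a one-vs-rest calculation with a simple (macro) average across classes; Weka's summary output may use a class-weighted average instead. The function name `multiclass_metrics` and the confusion matrix `example_cm` are hypothetical and are not taken from the study data.

```python
def multiclass_metrics(cm):
    """Compute macro-averaged sensitivity and specificity, plus accuracy.

    cm[i][j] is the number of cases whose true class is i and whose
    predicted class is j. Macro averaging is an assumption of this sketch.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    sensitivities, specificities = [], []
    for k in range(n):
        tp = cm[k][k]                                # class k predicted as k
        fn = sum(cm[k]) - tp                         # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp    # other classes predicted as k
        tn = total - tp - fn - fp                    # everything else
        sensitivities.append(tp / (tp + fn) if tp + fn else 0.0)
        specificities.append(tn / (tn + fp) if tn + fp else 0.0)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return {
        "sensitivity": sum(sensitivities) / n,
        "specificity": sum(specificities) / n,
        "accuracy": accuracy,
    }


# Illustrative confusion matrix for three survival-time classes (made-up counts).
example_cm = [
    [50, 30, 20],
    [25, 60, 15],
    [20, 20, 60],
]
metrics = multiclass_metrics(example_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With more than two classes, sensitivity and specificity are defined per class by treating that class as "positive" and all others as "negative", which is why an averaging rule has to be chosen before single summary figures like those in the Results can be reported.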