Improvement of prognostic model for breast cancer recurrence by data mining methods

Seyed Mohammad Saleh Hadavi,1,*

1. Department of Computer Engineering, Shiraz University of Technology, Shiraz, Iran



In Iran, 10,000 people are diagnosed with breast cancer every year, half of whom are under the age of 50. Recurrence of this cancer is common in Iran. One reason for its incidence in the community is the lack of timely and correct diagnosis of the disease. Nowadays, all patient information is recorded in computer files. With data mining techniques, recurrence of the disease in the rest of the body can be predicted correctly. With this knowledge, it may be possible to prevent breast cancer recurrence. The purpose of this study is to provide a reliable and accurate prediction of the possibility of breast cancer recurrence using data mining techniques.


In this study, the medical records of 400 breast cancer patients, covering 10 functional characteristics over 5 years, were included in the model. SPSS Modeler software was used to build a prognostic model for breast cancer recurrence. After the data had been accurately identified, the k-nearest neighbors algorithm, a Bayesian network and a neural network were applied for model fitting.
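As a rough illustration of one of the three methods, the following is a minimal nearest-neighbor classification sketch in plain Python. It does not reproduce the SPSS Modeler workflow used in the study; the two features, the toy records, and the choice of k = 3 are all hypothetical and are not study data.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among the k closest records.

    `train` is a list of (features, label) pairs, where features are
    equal-length numeric tuples.
    """
    # Order the training records by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    # Count the labels of the k nearest records and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy records (NOT study data):
# (degree of malignancy, number of involved nodes) -> outcome.
toy_train = [
    ((1, 0), "no recurrence"),
    ((1, 1), "no recurrence"),
    ((2, 1), "no recurrence"),
    ((3, 4), "recurrence"),
    ((3, 6), "recurrence"),
    ((2, 5), "recurrence"),
]

prediction = knn_predict(toy_train, (3, 5), k=3)
print(prediction)
```

In practice the features would need to be scaled to comparable ranges before distances are computed, and k would be chosen by validation rather than fixed in advance.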


The models produced by the Bayesian network, k-nearest neighbors and neural network methods were compared. The results showed that the prediction accuracy of the Bayesian network model was 81.818%. The prediction accuracies of the k-nearest neighbors and neural network models were 76.224% and 75.524%, respectively. It was also found that, among the functional characteristics, degree of malignancy had the greatest effect on cancer recurrence and lymph node capsule involvement had the least.
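Accuracy here is the share of cases classified correctly. As a side check on the arithmetic, each of the three reported percentages is reproduced by a whole number of correct predictions out of 143 cases. The evaluation-set size of 143 and the counts below are inferred from the reported figures; they are assumptions and are not stated in the abstract.

```python
def accuracy_percent(correct, total):
    """Return classification accuracy as a percentage, rounded to three decimals."""
    return round(100 * correct / total, 3)


# Assumed evaluation-set size and correct counts (inferred, not reported in the source).
ASSUMED_TOTAL = 143
assumed_correct = {
    "Bayesian network": 117,
    "k-nearest neighbors": 109,
    "neural network": 108,
}

accuracies = {
    name: accuracy_percent(count, ASSUMED_TOTAL)
    for name, count in assumed_correct.items()
}
# The model with the highest accuracy under these assumed counts.
best_model = max(accuracies, key=accuracies.get)

for name, value in accuracies.items():
    print(f"{name}: {value}%")
```

Under these assumed counts, the gap between the Bayesian network and the other two models corresponds to eight or nine additional correctly classified cases.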


The Bayesian network model has the lowest error rate and the highest accuracy in predicting breast cancer recurrence compared with the other models. The neural network method has the maximum coverage. This study showed that the Bayesian network model can accurately predict breast cancer recurrence in the shortest possible time.


Breast cancer, data mining, prognostic model