Classification of genes related to infectious disease: a new hybrid method for highly imbalanced data sets

Sima Soltani,¹,* Javad Sadri,² Mehrdad Jalali³

1. Department of Computer Engineering, Islamic Azad University, Mashhad branch
2. Department of Computer Science & Software Engineering, Faculty of Engineering and Computer Science, Concordia University
3. Department of Computer Engineering, Islamic Azad University, Mashhad branch

Abstract

Introduction

Exploring gene function is of great importance in health science research. Developing gene classifiers that predict accurately is therefore a crucial and desirable research goal.

Methods

In this paper, we introduce a hybrid model for separating 24 genes related to infectious disease from a large number of unrelated genes. This is a highly imbalanced data set, in which one class has far fewer instances than the other. When a data set is imbalanced, minority-class samples tend to be misclassified because the classifier does not learn the true class boundaries correctly. Our model therefore applies clustering to undersample the negative genes and the SMOTE oversampling method to increase the number of positive gene samples. We select a decision tree model for classification and combine an ensemble of classifiers by majority voting.
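The three steps named above (cluster-based undersampling, SMOTE-style oversampling, majority voting) can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the clustering algorithm, how representatives are picked, or any parameter values, so the use of k-means, the "keep the sample nearest each centroid" rule, and the names `cluster_undersample`, `smote_like` and `majority_vote` are all assumptions.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster_undersample(majority, k, iters=10, seed=0):
    """Assumed undersampling rule: run a small k-means on the majority
    class and keep the real sample nearest to each of the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(majority, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in majority:
            idx = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[idx].append(p)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return [min(majority, key=lambda p: dist2(p, c)) for c in centroids]


def smote_like(minority, n_new, k=3, seed=0):
    """SMOTE idea: each synthetic point lies on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = sorted((p for p in minority if p is not x),
                        key=lambda p: dist2(x, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment x -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def majority_vote(predictions):
    """Combine one label list per classifier into one label per sample.
    Ties (possible with an even number of voters) go to the smallest label."""
    return [max(sorted(set(votes)), key=votes.count)
            for votes in zip(*predictions)]
```

In a full pipeline, the reduced negative set and the enlarged positive set would be merged to train each classifier, and `majority_vote` would then combine the classifiers' predicted labels. Using an odd number of classifiers avoids ties in the binary case.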

Results

We succeeded in building a classifier that handles a large, highly imbalanced data set with 81.12% accuracy, 79% sensitivity and 89% specificity. The model could also be applied to similar data sets.
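For reference, the three reported measures follow from the counts of a binary confusion matrix. The helper below is a minimal sketch using the standard definitions; the function name `binary_metrics` and the counts in the usage note are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: positive (disease-related) genes classified correctly/incorrectly
    tn/fp: negative (unrelated) genes classified correctly/incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,     # share of all genes labelled correctly
        "sensitivity": tp / (tp + fn),     # share of positive genes recovered
        "specificity": tn / (tn + fp),     # share of negative genes rejected
    }
```

For example, `binary_metrics(tp=8, fn=2, tn=90, fp=10)` uses made-up counts for ten positive and one hundred negative genes. On imbalanced data, accuracy is dominated by the majority class, which is why sensitivity and specificity are reported alongside it.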

Conclusion

Our simulation study shows that the proposed approach improves classification performance compared with similar approaches in the literature. Furthermore, the results indicate that the SMOTE method is suitable for reducing the error rate.

Keywords

Gene classification, imbalanced data set, cluster-based undersampling, SMOTE, ensemble, decision tree

International Congress on Biomedicine
