Communications on Applied Electronics
Foundation of Computer Science (FCS), NY, USA
Volume 5 - Number 7
Year of Publication: 2016
Authors: Varsha Babar, Roshani Ade |
10.5120/cae2016652323 |
Varsha Babar, Roshani Ade. A Novel Approach for Handling Imbalanced Data in Medical Diagnosis using Undersampling Technique. Communications on Applied Electronics. 5, 7 (Jul 2016), 36-42. DOI=10.5120/cae2016652323
The imbalanced learning problem is becoming ubiquitous in data mining applications. A data set is called imbalanced when its samples are unequally distributed among the classes. When a highly imbalanced data set is given to a classifier, the classifier may misclassify the rare samples of the minority class. Several undersampling and oversampling methods have been proposed to deal with this kind of imbalance. However, many undersampling techniques do not consider the distribution of information among the classes, and some oversampling techniques lead to overfitting or cause overgeneralization. This paper proposes an MLP-based undersampling technique (MLPUS) that preserves the distribution of information while undersampling. The technique uses stochastic measure evaluation to identify important samples among both the majority and the minority samples. Experiments are performed on five real-world data sets to evaluate the performance of the proposed work.
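To make the notions of class imbalance and undersampling concrete, the following is a minimal sketch of a plain random undersampling baseline. It is not the MLPUS technique proposed in the paper: it discards majority samples uniformly at random and so ignores the distribution of information, which is the shortcoming the abstract attributes to many undersampling techniques. The function names `imbalance_ratio` and `random_undersample` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return (largest class size) / (smallest class size)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class.

    Illustrative baseline only (an assumption, not the paper's method):
    samples are removed uniformly at random, without regard to how
    informative each sample is.
    """
    rng = random.Random(seed)
    target = min(Counter(labels).values())

    # Group the samples by their class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    kept_samples, kept_labels = [], []
    for y, xs in by_class.items():
        # Keep the smallest class whole; subsample the larger ones.
        kept = xs if len(xs) == target else rng.sample(xs, target)
        kept_samples.extend(kept)
        kept_labels.extend([y] * len(kept))
    return kept_samples, kept_labels


# Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
samples = list(range(100))
labels = [0] * 90 + [1] * 10
balanced_samples, balanced_labels = random_undersample(samples, labels)
```

On this toy data the imbalance ratio drops from 9 to 1, with every minority sample kept and 80 majority samples discarded at random. An informed method such as the one the paper proposes would instead choose which samples to keep according to a measure of their importance.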