
Classification of Imbalanced Data of Medical Diagnosis using Sampling Techniques

Varsha Babar. Published in Information Sciences.

Communications on Applied Electronics
Year of Publication: 2021
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: Varsha Babar
10.5120/cae2021652883

Varsha Babar. Classification of Imbalanced Data of Medical Diagnosis using Sampling Techniques. Communications on Applied Electronics 7(36):7-12, May 2021. BibTeX

@article{10.5120/cae2021652883,
	author = {Varsha Babar},
	title = {Classification of Imbalanced Data of Medical Diagnosis using Sampling Techniques},
	journal = {Communications on Applied Electronics},
	issue_date = {May 2021},
	volume = {7},
	number = {36},
	month = {May},
	year = {2021},
	issn = {2394-4714},
	pages = {7-12},
	numpages = {6},
	url = {http://www.caeaccess.org/archives/volume7/number36/881-2021652883},
	doi = {10.5120/cae2021652883},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

When there is a large imbalance between the ratios of two classes in a classification problem, the classifier tends to favor instances of the majority class, and it becomes difficult for the classifier to learn the minority class samples. Either undersampling or oversampling is typically used to address this imbalance, but most undersampling techniques do not consider the distribution of information among the classes, while oversampling techniques can lead to overfitting of the trained model. To resolve this issue, undersampling and oversampling can be integrated. Majority class samples can be undersampled using a new approach, namely an MLP-based undersampling technique (MLPUS), while the Majority Weighted Minority Oversampling Technique (MWMOTE) can be used to generate synthetic samples for the minority class. The main objective is to handle the imbalanced classification problem occurring in the medical diagnosis of rare diseases while combining the benefits of both undersampling and oversampling. Experiments are performed on seven real-world data sets to evaluate the proposed framework's performance.
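The hybrid idea in the abstract can be illustrated with a minimal sketch. Note the sketch below is not the paper's MLPUS or MWMOTE: it substitutes plain random undersampling of the majority class and SMOTE-style nearest-neighbour interpolation for the minority class, purely to show how the two directions of resampling combine into one balanced training set. The function name `hybrid_resample` and the midpoint target size are illustrative choices, not from the paper.

```python
import numpy as np

def hybrid_resample(X, y, rng=None):
    """Balance a binary data set by undersampling the majority class and
    generating SMOTE-style synthetic minority samples.

    Simplified stand-in for the paper's MLPUS + MWMOTE combination:
    random undersampling plus nearest-neighbour interpolation.
    """
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    maj, mino = classes[np.argmax(counts)], classes[np.argmin(counts)]
    X_maj, X_min = X[y == maj], X[y == mino]

    # Undersample the majority class down to the midpoint of the two sizes.
    target = (len(X_maj) + len(X_min)) // 2
    keep = rng.choice(len(X_maj), size=target, replace=False)
    X_maj = X_maj[keep]

    # Oversample the minority class up to the same target by interpolating
    # each picked sample toward its nearest neighbour (the core SMOTE idea).
    synth = []
    for _ in range(target - len(X_min)):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        dist[i] = np.inf                      # exclude the sample itself
        j = np.argmin(dist)                   # nearest minority neighbour
        gap = rng.random()                    # random point on the segment
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    if synth:
        X_min = np.vstack([X_min, np.asarray(synth)])

    X_bal = np.vstack([X_maj, X_min])
    y_bal = np.concatenate([np.full(len(X_maj), maj),
                            np.full(len(X_min), mino)])
    return X_bal, y_bal
```

For a 90/10 split, both classes end up with 50 samples: the majority class is cut in half while the minority class is grown with 40 synthetic points, so neither the information loss of pure undersampling nor the overfitting risk of pure oversampling is taken to its extreme.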

References

  1. P. M. Murphy and D. W. Aha, UCI Repository of Machine Learning Databases, Dept. of Information and Computer Science, Univ. of California, Irvine, CA, 1994.
  2. M. Kubat, R. C. Holte, and S. Matwin, Machine Learning for the Detection of Oil Spills in Satellite Radar Images, Machine Learning, vol. 30, no. 2/3, pp. 195-215, 1998.
  3. D. Lewis and J. Catlett, Heterogeneous Uncertainty Sampling for Supervised Learning, Proc. Int'l Conf. Machine Learning, pp. 148-156, 1994.
  4. N. Japkowicz, C. Myers, and M. Gluck, A Novelty Detection Approach to Classification, Proc. 14th Joint Conf. Artificial Intelligence, pp. 518-523, 1995.
  5. C. X. Ling and C. Li, Data Mining for Direct Marketing: Problems and Solutions, Proc. Int'l Conf. Knowledge Discovery and Data Mining, pp. 73-79, 1998.
  6. Wing W. Y. Ng, Junjie Hu, Daniel S. Yeung, Shaohua Yin, and Fabio Roli, Diversified Sensitivity-Based Undersampling for Imbalance Classification Problems, IEEE Trans. Cybernetics, vol. 45, no. 11, Nov. 2015.
  7. Sukarna Barua, Md. Monirul Islam, and Xin Yao, MWMOTE: Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning, IEEE Trans. Knowledge and Data Engineering, vol. 26, no. 2, Feb. 2014.
  8. Varsha S. Babar and Roshani Ade, A Review on Imbalanced Learning Methods, International Journal of Computer Applications (IJCA), ISSN 0975-8887, Dec. 2015.
  9. Xingyi Liu, Cost-Sensitive Decision Tree with Missing Values and Multiple Cost Scales, Int'l Joint Conf. on Artificial Intelligence, 2009.
  10. Jing Zhang, Xindong Wu, and Victor S. Sheng, Active Learning with Imbalanced Multiple Noisy Labeling, IEEE Trans. on Cybernetics, vol. 45, no. 5, May 2015.
  11. ZhiQiang Zeng and ShunZhi Zhu, A Kernel-Based Sampling to Train SVM with Imbalanced Data Set, Conference Anthology, IEEE, Jan. 2013.
  12. H. He and E. A. Garcia, Learning from Imbalanced Data, IEEE Trans. Knowledge and Data Eng., vol. 21, no. 9, pp. 1263-1284, Sept. 2009.
  13. X. Y. Liu, J. Wu, and Z. H. Zhou, Exploratory Undersampling for Class Imbalance Learning, Proc. Int'l Conf. Data Mining, pp. 965-969, 2006.
  14. J. Zhang and I. Mani, KNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction, Proc. Int'l Conf. Machine Learning, Workshop on Learning from Imbalanced Data Sets, 2003.
  15. M. Kubat and S. Matwin, Addressing the Curse of Imbalanced Training Sets: One-Sided Selection, Proc. Int'l Conf. Machine Learning, pp. 179-186, 1997.
  16. Victor H. Barella, Eduardo P. Costa, and Andre C. P. L. F. Carvalho, ClusterOSS: A New Undersampling Method for Imbalanced Learning.
  17. H. He, Self-Adaptive Systems for Machine Intelligence, Wiley, Aug. 2011.
  18. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, SMOTE: Synthetic Minority Oversampling Technique, J. Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
  19. H. Han, W. Y. Wang, and B. H. Mao, Borderline-SMOTE: A New Oversampling Method in Imbalanced Data Sets Learning, Proc. Int'l Conf. Intelligent Computing, pp. 878-887, 2005.
  20. C. Bunkhumpornpat, K. Sinapiromsaran, and C. Lursinsap, Safe-Level-SMOTE: Safe-Level Synthetic Minority Oversampling Technique for Handling the Class Imbalanced Problem, in Advances in Knowledge Discovery and Data Mining, Berlin, Germany: Springer, pp. 475-482, 2009.
  21. T. Maciejewski and J. Stefanowski, Local Neighbourhood Extension of SMOTE for Mining Imbalanced Data, in Proc. IEEE Symp. Comput. Intell. Data Min. (CIDM), Paris, France, pp. 104-111, 2011.
  22. E. Ramentol, Y. Caballero, R. Bello, and F. Herrera, SMOTE-RSB*: A Hybrid Preprocessing Approach Based on Oversampling and Undersampling for High Imbalanced Data Sets Using SMOTE and Rough Sets Theory, Knowl. Inf. Syst., vol. 33, no. 2, pp. 245-265, 2012.
  23. Reshma C. Bhagat and Sachin S. Patil, Enhanced SMOTE Algorithm for Classification of Imbalanced Big Data Using Random Forest, IEEE International Advance Computing Conference (IACC), 2015.
  24. H. He, Y. Bai, E. A. Garcia, and S. Li, ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning, Proc. Int'l Joint Conf. Neural Networks, pp. 1322-1328, 2008.
  25. S. Chen, H. He, and E. A. Garcia, RAMOBoost: Ranked Minority Oversampling in Boosting, IEEE Trans. Neural Networks, vol. 21, pp. 1624-1642.

Keywords

Sampling technique, Imbalance Data, MLPUS, MWMOTE, Ensemble Technique