
Data Deduplication: Its Significant Effect on Network Intrusion Dataset

Aladesote O. Isaiah, Adetunji A. Ademola. Published in Security.

Communications on Applied Electronics
Year of Publication: 2019
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: Aladesote O. Isaiah, Adetunji A. Ademola
DOI: 10.5120/cae2019652845

Aladesote O Isaiah and Adetunji A Ademola. Data Deduplication: Its Significant Effect on Network Intrusion Dataset. Communications on Applied Electronics 7(32):21-26, December 2019.

BibTeX

@article{10.5120/cae2019652845,
	author = {Aladesote O. Isaiah and Adetunji A. Ademola},
	title = {Data Deduplication: Its Significant Effect on Network Intrusion Dataset},
	journal = {Communications on Applied Electronics},
	issue_date = {December 2019},
	volume = {7},
	number = {32},
	month = {Dec},
	year = {2019},
	issn = {2394-4714},
	pages = {21-26},
	numpages = {6},
	url = {http://www.caeaccess.org/archives/volume7/number32/864-2019652845},
	doi = {10.5120/cae2019652845},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

This research work applied feature extraction techniques to the NSL-KDD dataset. Using deduplication software written in C++, duplicate records of four attack types (DoS, R2L, Probing and U2R) were removed. Among the DoS attacks, Mailbomb had the highest percentage reduction rate (98.63%) and Apache2 the lowest (40.30%). For R2L, Snmpgetattack had the highest reduction (92.70%), while Ftp_write showed no reduction at all. Under Probing, Nmap had the highest reduction rate (93.15%) and Mscan the lowest (60.84%). For U2R, Sqlattack had the highest reduction rate (50%). A Wilcoxon sign test was used to test the significance of the deduplication, and the results revealed that all attack types except U2R had a significant reduction rate at the 5% level.
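
The deduplication step and the percentage reduction rate described in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' C++ software. It assumes that a record counts as a duplicate only when its attack label and every feature value match exactly, and that the reduction rate is (records before - records after) / records before × 100. The function names `deduplicate_by_attack` and `reduction_rate` and the sample records are hypothetical and are not real NSL-KDD rows.

```python
from collections import Counter


def deduplicate_by_attack(records):
    """Remove exact duplicate records and count them per attack label.

    `records` is an iterable of (label, features) pairs, where `features`
    is a hashable tuple of field values (hypothetical input format).
    Returns (unique_records, before_counts, after_counts).
    """
    seen = set()
    unique = []
    before = Counter()  # records per label before deduplication
    after = Counter()   # records per label after deduplication
    for label, features in records:
        before[label] += 1
        key = (label, features)  # exact match on label plus all feature values
        if key not in seen:
            seen.add(key)
            unique.append((label, features))  # keep the first occurrence only
            after[label] += 1
    return unique, before, after


def reduction_rate(before, after):
    """Percentage of records removed: (before - after) / before * 100."""
    if before == 0:
        raise ValueError("no records for this attack type")
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Illustrative records only; real NSL-KDD rows have many more fields.
    sample = [
        ("mailbomb", ("0", "tcp", "smtp")),
        ("mailbomb", ("0", "tcp", "smtp")),
        ("mailbomb", ("0", "tcp", "smtp")),
        ("mailbomb", ("1", "tcp", "smtp")),
        ("ftp_write", ("5", "tcp", "ftp")),
    ]
    unique, before, after = deduplicate_by_attack(sample)
    for label in before:
        rate = reduction_rate(before[label], after[label])
        print(label, before[label], after[label], round(rate, 2))
    # prints "mailbomb 4 2 50.0" then "ftp_write 1 1 0.0"
```

Keying on the full feature tuple keeps the first occurrence of each record and preserves input order. Which NSL-KDD columns enter the key (for example, whether the difficulty-level column is included) is not stated in the abstract and would change the resulting counts.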


Keywords

Deduplication, extraction techniques, attack types, Wilcoxon sign test, NSL-KDD