Efficient Model to Reduce False Positives using Outliers Detection in Big Data

Esraa Samir Ahmed, Laila A. Abd-Elmegid, Hala Abdel-Galil. Published in Information Sciences.

Communications on Applied Electronics
Year of Publication: 2018
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: Esraa Samir Ahmed, Laila A. Abd-Elmegid, Hala Abdel-Galil
DOI: 10.5120/cae2018652802

Esraa Samir Ahmed, Laila A. Abd-Elmegid and Hala Abdel-Galil. Efficient Model to Reduce False Positives using Outliers Detection in Big Data. Communications on Applied Electronics 7(24):11-15, December 2018.

BibTeX:

@article{10.5120/cae2018652802,
	author = {Esraa Samir Ahmed and Laila A. Abd-Elmegid and Hala Abdel-Galil},
	title = {Efficient Model to Reduce False Positives using Outliers Detection in Big Data},
	journal = {Communications on Applied Electronics},
	issue_date = {December 2018},
	volume = {7},
	number = {24},
	month = {Dec},
	year = {2018},
	issn = {2394-4714},
	pages = {11-15},
	numpages = {5},
	url = {http://www.caeaccess.org/archives/volume7/number24/840-2018652802},
	doi = {10.5120/cae2018652802},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Emerging fields such as the Internet of Things (IoT), sensor data, mobile computing, and social media are producing new forms and sources of data with distinct characteristics. Big data is the term used to describe such data. Big data analytics applies advanced analytic techniques to very large data sets collected from different sources in different formats. Mining anomalies from big data is a powerful task used mainly in critical systems, so applications that work with big data require an efficient outlier detection system. Such a system must be designed to cope with the distinct features of big data: volume, speed, complexity, and variety. The huge number of outliers detected in big data is a barrier to their diagnosis and analysis, and because analyzing each anomaly is costly, an outlier detector needs to be as accurate as possible. Minimizing false positive alerts is a key factor in increasing the accuracy of the detector. This paper presents a new model for reducing false positive alerts in outlier detection. The proposed model uses cluster analysis with the DBSCAN algorithm to highlight outliers, then validates these outliers with a Support Vector Machine (SVM) classifier to reduce false positive alerts. The experimental study demonstrates the efficiency of the proposed model, with a reported accuracy of 99%.
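The two-stage idea in the abstract (DBSCAN flags candidate outliers, an SVM then validates them to cut false positives) can be illustrated with the minimal pure-Python sketch below. It is not the authors' implementation, whose details are not given in the abstract (the paper's keywords mention Spark). Assumptions made here: DBSCAN noise points (label -1) are taken as the candidate outliers; the SVM is a linear model trained with a Pegasos-style sub-gradient solver with no bias term, on a hypothetical labeled history of normal and anomalous records; all function names, the parameters `eps`, `min_pts`, `lam`, `epochs`, and every data point are illustrative.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Classic DBSCAN: returns a cluster id per point, or -1 for noise (candidate outliers)."""
    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned (or a border point): do not expand
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep growing the cluster
    return labels


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style sub-gradient descent for a linear SVM (hinge loss, L2 penalty).
    Assumption: no bias term, so features are taken to be roughly centered on the
    decision boundary. Labels: +1 = anomaly, -1 = normal."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wk * xk for wk, xk in zip(w, x))
            w = [(1 - eta * lam) * wk for wk in w]  # shrink from the L2 regularizer
            if margin < 1:  # hinge loss is active: move toward this example
                w = [wk + eta * label * xk for wk, xk in zip(w, x)]
    return w


def detect_and_validate(points, eps, min_pts, w):
    """Stage 1: DBSCAN noise points become candidate outliers.
    Stage 2: the SVM keeps candidates it scores as anomalous; the rest are
    treated as false positives and filtered out."""
    labels = dbscan(points, eps, min_pts)
    confirmed, rejected = [], []
    for p, lab in zip(points, labels):
        if lab != -1:
            continue
        score = sum(wk * pk for wk, pk in zip(w, p))
        if score > 0:
            confirmed.append(p)
        else:
            rejected.append(p)
    return confirmed, rejected


# Hypothetical data: a dense "normal" cluster plus two isolated observations.
observations = [(-4, 0), (-4, 1), (-3, 0), (-3, 1), (-3.5, 0.5), (-1, 0.2), (3, 0.5)]
# Hypothetical labeled history for training the SVM (-1 = normal, +1 = anomaly).
X_train = [(-2, 0.5), (-3, -1), (-1.5, 1), (-2.5, 0), (2, -0.5), (3, 1), (1.5, -1), (2.5, 0)]
y_train = [-1, -1, -1, -1, 1, 1, 1, 1]

w = train_linear_svm(X_train, y_train)
confirmed, rejected = detect_and_validate(observations, eps=1.0, min_pts=3, w=w)
print("confirmed anomalies:", confirmed)  # [(3, 0.5)]
print("filtered false positives:", rejected)  # [(-1, 0.2)]
```

In this toy run, DBSCAN flags both isolated points as noise, but the SVM scores (-1, 0.2) as normal, so it is dropped as a false positive while (3, 0.5) is kept. On real data, a kernel SVM or a library implementation, with `eps` and `min_pts` tuned to the data set, would likely be a better fit than this linear sketch.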

Keywords

Big Data; DBSCAN; False Positive; Spark; SVM