Segmentation Accuracy for Offline Arabic Handwritten Recognition Based on Bounding Box Algorithm

Ismail A. Humied. Published in Algorithms.

Communications on Applied Electronics
Year of Publication: 2016
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: Ismail A. Humied
DOI: 10.5120/cae2016652364

Ismail A. Humied. Segmentation Accuracy for Offline Arabic Handwritten Recognition Based on Bounding Box Algorithm. Communications on Applied Electronics 5(9):20-30, September 2016.

BibTeX

@article{10.5120/cae2016652364,
	author = {Ismail A. Humied},
	title = {Segmentation Accuracy for Offline Arabic Handwritten Recognition Based on Bounding Box Algorithm},
	journal = {Communications on Applied Electronics},
	issue_date = {September 2016},
	volume = {5},
	number = {9},
	month = {Sep},
	year = {2016},
	issn = {2394-4714},
	pages = {20-30},
	numpages = {11},
	url = {http://www.caeaccess.org/archives/volume5/number9/649-2016652364},
	doi = {10.5120/cae2016652364},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Character segmentation plays an important role in Arabic optical character recognition (OCR) systems, because incorrectly segmented letters lead to unrecognized characters. The accuracy of character recognition depends mainly on the segmentation algorithm used. Off-line Arabic handwriting presents unique technical challenges and has been addressed more recently than other domains. Many segmentation algorithms for off-line Arabic handwriting recognition have been proposed and applied to various types of word images. This paper proposes a modified segmentation algorithm based on bounding boxes to improve segmentation accuracy. It uses two main stages: a preprocessing stage and a segmentation stage. The preprocessing stage applies a set of methods that retain the shape of the character: noise removal, binarization, skew correction, thinning and slant correction. The segmentation stage applies the modified bounding box algorithm. This algorithm performs a distance analysis on the bounding boxes of two types of connected components (CCs): main CCs and auxiliary CCs. The modified algorithm operates according to three cases. Cut points are also determined using structural features to segment the characters. The modified bounding box algorithm has been successfully tested on 450 images of handwritten Arabic words. The results were very promising, indicating the efficiency of the proposed approach.
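The abstract describes the segmentation stage only at a high level. The Python sketch below is a minimal illustration of bounding-box analysis on connected components. It is not the author's algorithm. It works as follows:

- It labels the 8-connected components of a binarized (ideally thinned) word image and computes their bounding boxes.
- It splits the components into main and auxiliary CCs using a hypothetical size threshold (`ratio`).
- It attaches each auxiliary CC (a dot or hamza) to the main CC with the smallest horizontal gap.
- It proposes candidate cut columns where a main CC's vertical projection falls to one pixel.

All function names, the 0.25 ratio and the one-pixel rule are assumptions made for illustration. The paper's three distance cases and its structural features are not reproduced.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Box:
    """Bounding box of one connected component (inclusive coordinates)."""
    top: int
    left: int
    bottom: int
    right: int
    pixels: int  # number of foreground pixels in the component


def connected_components(img):
    """Label 8-connected foreground pixels (value 1) with BFS and
    return one Box per component, in raster-scan order."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append(Box(top, left, bottom, right, count))
    return boxes


def split_main_auxiliary(boxes, ratio=0.25):
    """Hypothetical rule: a CC is 'main' if it has at least `ratio` times
    the pixels of the largest CC; smaller CCs (dots, hamza) are auxiliary."""
    largest = max(b.pixels for b in boxes)
    main = [b for b in boxes if b.pixels >= ratio * largest]
    aux = [b for b in boxes if b.pixels < ratio * largest]
    return main, aux


def horizontal_gap(a, b):
    """Number of empty columns between two boxes (0 if they overlap in x)."""
    return max(0, max(a.left, b.left) - min(a.right, b.right) - 1)


def attach_auxiliaries(main, aux):
    """Attach each auxiliary CC to the main CC with the smallest horizontal
    gap; return (main, [auxiliaries]) pairs ordered right-to-left."""
    groups = [[] for _ in main]
    for a in aux:
        nearest = min(range(len(main)), key=lambda i: horizontal_gap(main[i], a))
        groups[nearest].append(a)
    order = sorted(range(len(main)), key=lambda i: main[i].right, reverse=True)
    return [(main[i], groups[i]) for i in order]


def candidate_cut_columns(img, box, max_ink=1):
    """Hypothetical heuristic for a thinned image: columns inside a main CC
    whose ink count is between 1 and `max_ink` (a thin baseline ligature)
    are candidate cut points, listed right-to-left. Ink from other CCs that
    falls inside the box is counted too; a fuller version would mask it."""
    cuts = []
    for x in range(box.right, box.left - 1, -1):
        ink = sum(img[y][x] for y in range(box.top, box.bottom + 1))
        if 0 < ink <= max_ink:
            cuts.append(x)
    return cuts
```

Usage: pass a binarized, thinned word image as a list of 0/1 rows to `attach_auxiliaries(*split_main_auxiliary(connected_components(img)))`. This returns the word parts right-to-left, each with its dots attached. `candidate_cut_columns` can then be run on each main box. The BFS uses an explicit queue rather than recursion, so large components do not hit Python's recursion limit.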

Keywords

Arabic OCR, Off-line Handwriting Segmentation, Connected Components, Pattern Recognition