Design of parallelized HDWT Hardware Architecture with an Increased Throughput

James Ntaganda. Published in Parallel Computing.

Communications on Applied Electronics
Year of Publication: 2016
Publisher: Foundation of Computer Science (FCS), NY, USA
Authors: James Ntaganda

James Ntaganda. Design of parallelized HDWT Hardware Architecture with an Increased Throughput. Communications on Applied Electronics 5(2):22-27, May 2016.

BibTeX

	author = {James Ntaganda},
	title = {Design of parallelized HDWT Hardware Architecture with an Increased Throughput},
	journal = {Communications on Applied Electronics},
	issue_date = {May 2016},
	volume = {5},
	number = {2},
	month = {May},
	year = {2016},
	issn = {2394-4714},
	pages = {22-27},
	numpages = {6},
	url = {},
	doi = {10.5120/cae2016652225},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Applications based on the Discrete Wavelet Transform (DWT), such as multimedia CODECs, require intensive computation. With modern technologies, hardware-software co-design has become common practice. Such designs are typically realized as a System on Chip (SoC), in which designers dedicate hardware modules to computationally intensive functions. Because the main program code calls these modules routinely, they are known as hardware accelerators. Compared with simpler operations such as addition, multiplication takes longer, consumes more processing power, and uses more hardware resources. All other factors being equal, any technique that reduces the number of multiplications is therefore considered a better approach in hardware design. This paper builds on our previously published concept, in which we exploited the simplicity of the Haar function and proposed a recursive structure for a non-normalized Haar Discrete Wavelet Transform. We implemented it as a multiplier-less Haar DWT (HDWT) hardware module that can spatially transform 16x16 arrays. In that earlier work, however, most of the processing nodes remained idle, so throughput was low. To increase parallelization, and hence throughput, this paper proposes a “mirror-reflection” approach. It reduces the number of redundant and idle processing nodes in the circuit module and makes the circuitry scalable. With this approach, circuit redundancy was reduced and circuit utilization efficiency increased from 47% to 69%. The scalable hardware module is implemented on an FPGA.
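As a software illustration of the multiplier-less idea described in the abstract, the sketch below computes a non-normalized Haar transform using only additions and subtractions, with a one-bit right shift in the inverse. It is a minimal sketch under stated assumptions, not the paper's hardware architecture: the function names (`haar_1d`, `inverse_haar_1d`, `haar_2d`), the coefficient ordering, and the row-then-column 2-D scheme are choices made here for illustration, and the "mirror-reflection" scheduling is not modelled.

```python
def haar_1d(values):
    """Non-normalized Haar transform of a power-of-two-length list.

    Each level replaces adjacent pairs (a, b) with a sum a + b and a
    difference a - b, then recurses on the sums. No multiplications
    are used, only additions and subtractions.
    """
    out = list(values)
    n = len(out)
    while n > 1:
        evens, odds = out[0:n:2], out[1:n:2]
        sums = [a + b for a, b in zip(evens, odds)]
        diffs = [a - b for a, b in zip(evens, odds)]
        out[:n] = sums + diffs  # sums first, then this level's details
        n = len(sums)
    return out


def inverse_haar_1d(coeffs):
    """Invert haar_1d. s + d = 2a and s - d = 2b, so a 1-bit shift is exact."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        pairs = []
        for s, d in zip(out[:n], out[n:2 * n]):
            pairs.append((s + d) >> 1)  # recovers a
            pairs.append((s - d) >> 1)  # recovers b
        out[:2 * n] = pairs
        n += n
    return out


def haar_2d(block):
    """Apply haar_1d to every row, then to every column (assumed ordering)."""
    rows = [haar_1d(row) for row in block]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    # Hypothetical 16x16 test block, matching the array size named in the abstract.
    block = [[16 * r + c for c in range(16)] for r in range(16)]
    coeffs = haar_2d(block)
    print(coeffs[0][0] == sum(map(sum, block)))  # top-left coefficient is the block sum
```

Because the transform is left non-normalized, integer inputs stay integers at every level, which is what makes an adder-and-shifter-only datapath plausible; the cost is that coefficient magnitudes grow with each level, so a hardware version would need wider words at later stages.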


  1. James Ntaganda. 2015. Design and Implementation of a Relaxed Haar Discreet Wavelet Transform: Hardware module for Multimedia Compression. IOSR Journal of VLSI and Signal Processing (IOSR-JVSP).
  2. Chih-Hsien Hsia, et al. 2011. Memory-efficient architecture of 2-D lifting-based discrete wavelet transform. Journal of the Chinese Institute of Engineers.
  3. Eric J. Stollnitz, Tony D. DeRose, and David H. Salesin. 1995. Wavelets for computer graphics: A primer, part 1. IEEE Computer Graphics and Applications.
  4. A. Jensen and A. la Cour-Harbo. 2001. Ripples in Mathematics, the Discrete Wavelet Transform. Springer-Verlag.
  5. Jason Spielfogel. Why we like MJPEG compression. [Accessed on 4th May, 2015].
  6. Dragomir El Mezeni et al. 2010. JPEG-XR encoder implementation on a heterogeneous multiprocessor system. 5th European Conference on Circuits and Systems for Communications (ECCSC'10), Belgrade, Serbia.
  7. Cristian Perra. 2013. Re-encoding JPEG images for smart phone applications. 21st Telecommunications Forum TELFOR, IEEE.
  8. Koichi Hattori, Hiroshi Tsutsui, et al. 2009. A High-Throughput Pipelined Architecture for JPEG XR Encoding. IEEE Conference Publications.
  9. Harish Yagain, Srinivas Donapati. 2011. Addressing the Interoperability Issues While Using JPEG-XR. International Symposium on Electronic System Design.
  10. C. A. Christopoulos, T. Ebrahimi, and A. N. Skodras. JPEG2000: The New Still Picture Compression Standard. Media Lab, Ericsson Research, Ericsson Radio Systems AB, S-16480 Stockholm, Sweden.
  11. Saini, S.; Mahajan, A.; Mandalika. 2010. Implementation of Low Power FFT Structure using a Method Based on Conditionally Coded Blocks. IEEE Conference Publications.
  12. John E. Shore. 1973. On the Application of Haar Functions. Concise Papers, IEEE Transactions on Communications.
  13. Iain Richardson. The new standard for video coding released by ISO MPEG and ITU-T VCEG.
  14. Zunera Idrees, Eliza Hashemiaghjekandi. Image Compression by Using Haar Wavelet Transform and Singular Value Decomposition. School of Computer Science, Physics and Mathematics, Linnaeus University.
  15. Viktor K. Prasanna, Viktor K. Prasanna. 2014. Energy- and area-efficient parameterized lifting-based 2-D DWT architecture on FPGA. IEEE.
  16. M. Nagabushanam, P. Cyril Prasanna Raj, S. Ramachandran. 2011. Design and FPGA implementation of modified Distributive Arithmetic based DWT-IDWT processor for image compression. Communications and Signal Processing (ICCSP), 2011 International Conference.
  17. Parvatham Vijay, Seetharaman Gopalakrishnan. 2013. Implementation of One Level 2D DWT Using Multiplier Less Modified Flipping Architecture. 2013 7th Asia Modelling Symposium, Hong Kong.
  18. Khamees Khalaf Hasan, Umi Kalthum Ngah, Mohd Fadzli Mohd Salleh. 2013. Low complexity image compression architecture based on lifting wavelet transform and embedded hierarchical structures. Control System, Computing and Engineering (ICCSCE), 2013 IEEE International Conference.


Haar DWT, Image Compression, Hardware Implementation, Mirror-reflection, FPGA