Various commercial video surveillance systems have become popular in recent years. The increasing demand for video surveillance systems that support high resolutions and operate on battery power makes it important to reduce power consumption and prolong battery lifetimes. The H.264/AVC and HEVC video coding standards are widely used in video surveillance systems due to their high compression efficiency; however, the high computational complexity of these standards leads to considerable power consumption. This paper proposes a novel low-power surveillance video coding system that reduces both the internal power consumed inside the video encoder chip and the external power consumed by the memory outside the chip.

In recent years, emerging display-based consumer products such as AR/VR headsets and automotive video systems have required higher communication bandwidth due to higher bits per pixel and refresh rates. Many video compression techniques have therefore been actively studied to support this bandwidth in a limited communication environment, and the Video Electronics Standards Association (VESA) has standardized Display Stream Compression (DSC), which provides visually lossless video quality while preserving low power consumption and implementation cost. In this paper, we describe the detailed design of a DSC decoder and optimize the line buffer, which occupies most of the decoder's resources in terms of power and area; our optimization method thus saves a large share of the resources used by the DSC decoder. The proposed decoder was functionally verified on an FPGA-based platform and synthesized with a 65 nm standard cell library. The performance analysis showed that the optimized decoder used only 62 mW to decode each frame of FHD video, reducing power consumption by 37.0% and area by 39.9% compared with the original design; our technology would therefore be an attractive solution for developers of emerging consumer products.

Integer motion estimation (IME), a key component of a video encoder, removes temporal redundancies by searching for the best integer motion vectors for the dynamically partitioned blocks of a macroblock (MB). Huge memory bandwidth requirements and heavy computational resource demands are two key bottlenecks in IME engine design, especially for large search window (SW) cases. In this paper, a three-level pipelined VLSI architecture is proposed that efficiently integrates reference data sharing search (RDSS) into a multi-resolution motion estimation algorithm (MMEA). First, the hardware-friendly MMEA algorithm is mapped onto a three-level pipelined architecture with negligible coding-quality loss. Second, sub-sampled RDSS coupled with Level C+ is adopted to reduce on-chip memory and bandwidth at the coarsest and middle levels. Data sharing between IME and fractional motion estimation (FME) is achieved by loading only a local predictive SW at the finest level. Finally, the three levels are parallelized and pipelined to guarantee both the gradual refinement of MMEA and high hardware utilization. Only 320 processing elements (PEs) within 550 cycles are required for the IME search when the SW is set to 256 × 256. Our architecture achieves real-time processing at a working frequency of 134.6 MHz, with 135 K gates and 8.93 KB of on-chip memory. Experimental results show that the proposed architecture reaches a good balance among complexity, on-chip memory, bandwidth, and data-flow regularity.
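The coarse-to-fine idea behind multi-resolution motion estimation can be illustrated in software. The sketch below is not the paper's MMEA or its RDSS hardware mapping; it is a minimal three-level pyramid block-matching search (SAD cost, exhaustive refinement around the motion vector scaled up from the level below), assuming 2×2 averaging between levels and a small per-level search radius:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def downsample(frame):
    """Halve resolution by 2x2 averaging (one pyramid level)."""
    h, w = frame.shape
    return frame[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def full_search(cur, ref, bx, by, bs, center, radius):
    """Exhaustive SAD search around `center` within +/- radius pixels."""
    block = cur[by:by + bs, bx:bx + bs]
    best, best_mv = None, center
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= ref.shape[1] - bs and 0 <= y <= ref.shape[0] - bs:
                cost = sad(block, ref[y:y + bs, x:x + bs])
                if best is None or cost < best:
                    best, best_mv = cost, (dx, dy)
    return best_mv

def multires_me(cur, ref, bx, by, bs=16, levels=3, radius=4):
    """Estimate the MV at the coarsest level, then refine it (scaled
    by 2) at each finer level -- the gradual-refinement pattern that a
    pipelined multi-resolution ME engine exploits."""
    pyr_cur, pyr_ref = [cur], [ref]
    for _ in range(levels - 1):
        pyr_cur.append(downsample(pyr_cur[-1]))
        pyr_ref.append(downsample(pyr_ref[-1]))
    mv = (0, 0)
    for lvl in reversed(range(levels)):  # coarsest -> finest
        s = 2 ** lvl
        mv = full_search(pyr_cur[lvl], pyr_ref[lvl],
                         bx // s, by // s, max(bs // s, 4), mv, radius)
        if lvl:  # scale the MV up for the next finer level
            mv = (mv[0] * 2, mv[1] * 2)
    return mv
```

Because each level only refines within a small radius, the total number of candidate positions grows with the logarithm of the displacement rather than with the full SW area, which is what makes large search windows tractable in hardware.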
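To see why the line buffer dominates a DSC decoder's area and power, a back-of-envelope sizing helps. The helpers below are hypothetical and deliberately simplified: they assume the decoder stores one full line of reconstructed pixels for above-line prediction, and ignore the additional state (rate buffer, index color history, reduced-precision storage) that a real DSC implementation manages:

```python
def line_buffer_bits(width, components=3, bits_per_component=8):
    """Raw storage for one line of reconstructed pixels
    (simplified model: width x components x bit depth)."""
    return width * components * bits_per_component

def line_buffer_kb(width, components=3, bits_per_component=8):
    """Same quantity expressed in kilobytes."""
    return line_buffer_bits(width, components, bits_per_component) / 8 / 1024

# One FHD line (1920 pixels, 3 components, 8 bits each):
# 1920 * 3 * 8 = 46080 bits = 5.625 KB per stored line
```

Since this storage scales linearly with display width, shaving even a few bits per stored sample (the kind of line-buffer optimization described above) translates directly into area and power savings at 4K and 8K widths.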