"Modeling and Real-Time Embedded Implementation of an H.263+ Decoder"
Literature Survey
Jianlan Song – Qian Wang
March 13, 1997
Table of contents
0. Abstract
1. Introduction
2. Brief Specification of H.263
3. Improvements of H.263+ over H.263
4. Performance and quality enhancement of H.263 codec
5. TMS320C54x features
6. TMS320C54x Optimizing C Compiler/Assembler/Linker
7. Future direction
8. References
0. Abstract
H.263+ is the second version of H.263, which is a provisional ITU-T standard. It provides twelve new negotiable modes. These modes improve compression performance, allow the use of scalable bit streams, enhance performance over packet-switched networks, support custom picture sizes and clock frequencies, and provide supplemental display and external usage capabilities. The TMS320C54x generation of DSPs combines high performance, a large degree of parallelism, and a specialized instruction set to implement a variety of complex algorithms and applications effectively. This literature survey prepares for implementing the H.263+ decoder on a ‘C54x DSP.
1. Introduction
In the past few years, there has been significant interest in digital video applications. Digital video compression is one of the key issues in video coding, enabling efficient interchange and distribution of visual information. Several successful standards have emerged, e.g. ITU-T H.261 and H.263, and ISO/IEC MPEG-1, MPEG-2 and MPEG-4.
H.263 is a provisional ITU-T standard designed for low bit-rate communication. Version 1 of the international standard ITU-T H.263, entitled "Line Transmission of Non-Telephone Signals, Video Coding for Low Bit Rate Communication", provides better picture quality at low bit rates than H.261, with little additional complexity. H.263+ is version 2 of H.263. The objective of H.263+ is to broaden the range of applications and to improve compression efficiency.
2. Brief Specification of H.263
An outline block diagram of the codec is given in figure 1.
The H.263 video standard is based on techniques common to many current video coding standards. Motion-compensated prediction, usually based on simple Block Matching Algorithms (BMAs), first removes temporal redundancies. The decoder has motion compensation capability, allowing optional incorporation of this technique in the coder. DCT-based algorithms are then used for residual coding. The quantized DCT coefficients, motion vectors, and headers are entropy coded using Variable Length Codes (VLCs). H.263 supports five standardized picture formats: sub-QCIF, QCIF, CIF, 4CIF and 16CIF. The luminance component of the picture is sampled at these resolutions, while the chrominance components, Cb and Cr, are downsampled by 2 in both the horizontal and vertical directions.
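As a rough illustration of the picture formats and chrominance downsampling described above, the following C sketch tabulates the five standard luminance resolutions and derives the chrominance plane size, the frame buffer size, and the macroblock count. The type and function names are hypothetical, and 8-bit samples and 16x16 macroblocks are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one standardized picture format. */
typedef struct {
    const char *name;
    int luma_width;   /* luminance samples per line */
    int luma_height;  /* luminance lines per picture */
} PictureFormat;

static const PictureFormat H263_FORMATS[] = {
    { "sub-QCIF",  128,   96 },
    { "QCIF",      176,  144 },
    { "CIF",       352,  288 },
    { "4CIF",      704,  576 },
    { "16CIF",    1408, 1152 },
};

/* Cb and Cr are downsampled by 2 in each direction. */
int chroma_width(const PictureFormat *f)
{
    assert(f != NULL);
    return f->luma_width / 2;
}

int chroma_height(const PictureFormat *f)
{
    assert(f != NULL);
    return f->luma_height / 2;
}

/* Bytes for one frame (Y + Cb + Cr), assuming one byte per sample. */
size_t frame_bytes(const PictureFormat *f)
{
    size_t luma = (size_t)f->luma_width * (size_t)f->luma_height;
    size_t chroma = (size_t)chroma_width(f) * (size_t)chroma_height(f);
    return luma + 2 * chroma;
}

/* Number of 16x16 luminance macroblocks in the picture. */
int macroblock_count(const PictureFormat *f)
{
    assert(f != NULL);
    return (f->luma_width / 16) * (f->luma_height / 16);
}
```

Under these assumptions a QCIF frame needs 38016 bytes and contains 99 macroblocks, which gives a first estimate of the frame memory a decoder has to reserve per reference picture.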
In addition to the core encoding and decoding algorithms described above, H.263 includes four negotiable advanced coding modes: Unrestricted Motion Vector mode, Syntax-based Arithmetic Coding mode, Advanced Prediction mode, and PB-frames mode.
3. Improvements of H.263+ over H.263
H.263+ offers many improvements over H.263. It allows the use of a wide range of custom source formats, whereas H.263 supports only five video source formats, each defining picture size, picture shape, and clock frequency. Moreover, picture size, aspect ratio, and clock frequency can be specified as part of the H.263+ bit stream. Another major improvement of H.263+ over H.263 is scalability, which can improve the delivery of video information in error-prone, packet-lossy, or heterogeneous environments by making multiple display rates, bit rates, and resolutions available at the decoder.
H.263+ provides 12 new optional modes: Advanced Intra Coding mode, Deblocking Filter mode, Slice Structured mode, Supplemental Enhancement Information mode, Improved PB-frames mode, Reference Picture Selection mode, Temporal, SNR, and Spatial Scalability mode, Reference Picture Resampling mode, Reduced Resolution Update mode, Independently Segmented Decoding mode, Alternative Inter VLC mode, and Modified Quantization mode. It also modifies the Unrestricted Motion Vector mode of H.263. These modes improve compression performance, allow the use of scalable bit streams, enhance performance over packet-switched networks, support custom picture sizes and clock frequencies, and provide supplemental display and external usage capabilities.
4. Performance and quality enhancement of H.263 codec
The H.263 coder suffers from "blocking" effects when operating at very low bit rates. In [2], the authors analyzed the performance of the H.263 coder at constant bit rates and frame rates. When operating at a very low bit rate of 8 kb/s at 7.5 frames/s, the H.263 coder produces usable images, but with many visual artifacts, the so-called "blocking" effects. The artifacts result from coarse quantization of the high-frequency DCT coefficients to maintain a low bit rate, which creates discontinuities along the block boundaries. The visual quantization matrix shown in figure 3 is applied to both luminance and chrominance components to provide a nonlinear quantization process in which each DCT coefficient has its own effective step size. This process is reversed in the decoder. The visual degradations can thus be greatly reduced by employing visual quantization, which is easy to implement and does not increase the bit rate. Simulation results showed an overall improvement in Peak Signal-to-Noise Ratio (PSNR) values and, more significantly, a large reduction in the "blocking" effects.
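The idea of giving each DCT coefficient its own effective step size can be sketched in C as follows. This is only an illustration under stated assumptions, not the scheme of [2]: the weight matrix values are not reproduced here (the caller supplies them), the normalization of weights by 16 is an assumption, and a plain uniform rounding quantizer is used instead of the exact H.263 quantization rule. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_COEFFS 64  /* one 8x8 DCT block */

/* Effective step size for one coefficient: the uniform step 2*qp scaled
 * by weight/16. The division by 16 is an assumption of this sketch. */
static int effective_step(int qp, uint8_t weight)
{
    int step = (2 * qp * weight) / 16;
    return step > 0 ? step : 1;  /* avoid a zero step size */
}

/* Encoder side: map each coefficient to a level using its own step. */
void weighted_quantize(const int16_t *coeff, const uint8_t *weight,
                       int qp, int16_t *level)
{
    assert(qp >= 1 && qp <= 31);  /* quantizer parameter range in H.263 */
    for (int i = 0; i < BLOCK_COEFFS; i++) {
        int step = effective_step(qp, weight[i]);
        int c = coeff[i];
        int mag = c < 0 ? -c : c;
        int q = (mag + step / 2) / step;  /* round to nearest level */
        level[i] = (int16_t)(c < 0 ? -q : q);
    }
}

/* Decoder side: reverse the process with the same weights. */
void weighted_dequantize(const int16_t *level, const uint8_t *weight,
                         int qp, int16_t *recon)
{
    assert(qp >= 1 && qp <= 31);
    for (int i = 0; i < BLOCK_COEFFS; i++) {
        int step = effective_step(qp, weight[i]);
        recon[i] = (int16_t)(level[i] * step);
    }
}
```

A larger weight at a given position means a coarser step for that coefficient, so the weights decide where quantization error is placed. The decoder only needs the same weight table and one extra multiply per coefficient.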
In a real-time implementation, a channel buffer is required in order to transmit the variable-rate compressed bit stream over a fixed-rate channel. To prevent the buffer from overflowing or underflowing, a buffer control mechanism must be used to regulate the fluctuation of the output bits. For low bit-rate video compression, the size of the buffer needs to be scaled down in order to limit the delay to an acceptable value. However, because the frame rate is considerably reduced to give each frame a sufficient number of bits for reasonable picture quality, the average number of bits per frame is relatively large compared to the buffer size. This implies that the buffer is less effective and that buffer regulation is more difficult.
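The buffer behaviour described above can be modelled with a few lines of C. This is a minimal sketch of a generic fixed-rate channel buffer, not the control algorithm of any cited work; the structure, field, and function names are hypothetical, and the channel is assumed to drain a constant number of bits per frame interval (for example, roughly 1067 bits per frame at 8 kb/s and 7.5 frames/s).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BUF_OK, BUF_OVERFLOW, BUF_UNDERFLOW } BufferStatus;

/* Hypothetical model of the encoder-side channel buffer. */
typedef struct {
    long capacity_bits;           /* buffer size */
    long fullness_bits;           /* bits currently waiting for the channel */
    long channel_bits_per_frame;  /* bits drained per frame interval */
} ChannelBuffer;

/* Add one coded frame, then let the channel drain for one frame interval.
 * Reports overflow if the frame did not fit, or underflow if the channel
 * ran out of bits to send. */
BufferStatus buffer_push_frame(ChannelBuffer *buf, long frame_bits)
{
    BufferStatus status = BUF_OK;

    assert(buf != NULL && frame_bits >= 0);

    buf->fullness_bits += frame_bits;
    if (buf->fullness_bits > buf->capacity_bits) {
        buf->fullness_bits = buf->capacity_bits;  /* excess bits are lost */
        status = BUF_OVERFLOW;
    }

    buf->fullness_bits -= buf->channel_bits_per_frame;
    if (buf->fullness_bits < 0) {
        buf->fullness_bits = 0;  /* channel idles (or must be stuffed) */
        if (status == BUF_OK) {
            status = BUF_UNDERFLOW;
        }
    }
    return status;
}
```

A rate controller would call such a function once per coded frame and adjust the quantizer whenever the fullness approaches either limit. With a small buffer and large frames, a single frame can swing the fullness across most of the range, which is the difficulty noted above.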
In [3], the authors applied a new buffer control algorithm, based on bit allocation, to a modified version of the H.263 algorithm for very low bit-rate video coding. The algorithm uses bit allocation to determine the quantization scale factors of the coder so that a given target bit rate is met. The salient features of the scheme are that i) the quantization scale factors are determined using information from the whole picture; ii) it has precise control of the buffer; and iii) it tries to allocate the given number of bits as efficiently as possible in a rate-distortion sense. Simulation results show that the modified H.263 codec has better visual quality at comparable PSNR values.
5. TMS320C54x features
As we will implement the H.263+ decoder on a TMS320C54x DSP chip, this section describes the TI ‘C54x family.
The ‘C54x DSPs are optimized for wireless communications terminals and base stations. They have specialized features such as a Viterbi accelerator for reducing the cost of the Viterbi "butterfly update" and three power-down modes for extended battery life in mobile applications.
The ‘C54x fixed-point digital signal processors are built on an advanced modified Harvard architecture that has one program memory bus and three data memory buses. The core’s key features include a 17-bit × 17-bit multiplier (16-bit signed or unsigned), a dedicated 40-bit ALU for increased parallelism, two 40-bit accumulators, and a compare, select, and store unit (Viterbi accelerator). The ‘C54x uses a highly specialized dual-operand instruction set, which is the basis of the operational flexibility and speed of these DSPs. The ‘C54x also includes eight auxiliary registers and a software stack to support a highly optimized C compiler. The devices lower power consumption, reduce chip count, and enable system cost savings for communications applications.
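To make the role of the multiplier and the 40-bit accumulators concrete, the following portable C sketch emulates a multiply-accumulate loop on 16-bit operands with a 40-bit accumulator held in a 64-bit integer. It is a model for reasoning about word lengths, not ‘C54x code: the function names are hypothetical, and clamping to the 40-bit range is an assumption of this sketch rather than a statement about the device's overflow modes.

```c
#include <assert.h>
#include <stdint.h>

/* Range of a 40-bit two's-complement accumulator (32 bits + 8 guard bits). */
#define ACC40_MAX ((int64_t)0x7FFFFFFFFF)     /*  2^39 - 1 */
#define ACC40_MIN (-(int64_t)0x8000000000)    /* -2^39     */

/* One multiply-accumulate step: acc += a * b, clamped to 40 bits. */
int64_t mac40(int64_t acc, int16_t a, int16_t b)
{
    acc += (int64_t)((int32_t)a * (int32_t)b);
    if (acc > ACC40_MAX) acc = ACC40_MAX;
    if (acc < ACC40_MIN) acc = ACC40_MIN;
    return acc;
}

/* Dot product of two 16-bit vectors using the emulated accumulator. */
int64_t dot_product_40(const int16_t *x, const int16_t *h, int n)
{
    int64_t acc = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        acc = mac40(acc, x[i], h[i]);
    }
    return acc;
}

/* Convert a Q30 accumulator value (product of two Q15 numbers) back to
 * Q15 with saturation. Assumes >> on a negative value is arithmetic,
 * which holds on common compilers. */
int16_t acc_to_q15(int64_t acc)
{
    int64_t v = acc >> 15;
    if (v > 32767) v = 32767;
    if (v < -32768) v = -32768;
    return (int16_t)v;
}
```

The 8 guard bits above the 32-bit product are what allow long sums: 256 products of full-scale 16-bit operands still fit in 40 bits, so inner loops such as filters or the IDCT butterflies can accumulate without intermediate scaling.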
6. TMS320C54x Optimizing C Compiler/Assembler/Linker
The TI optimizing C compilers translate ANSI-standard C language files into highly efficient TMS320 assembly language source files, which are then input to the TMS320 assembler/linker. The TI C compilers are complemented by the standard TMS320 programmer’s interface for debugging C and assembly source code. The C compilers produce a rich set of debugging information, which allows source-level debugging in C to enhance productivity and shorten the application development cycle.
The TMS320C54x Optimizing C Compiler/Assembler/Linker was designed with three major efficiency goals in mind: i) produce compiled general-purpose C code that approaches the performance of hand-coded assembly language; ii) provide a simple and accessible programming interface to the C run-time environment so that critical DSP algorithms demanding extreme performance can be implemented in assembly language; and iii) establish a comprehensive, easy-to-use tool set for the development of high-performance DSP applications in C. Additional key features of the TMS320 C code generation tools include an ANSI-standard run-time support library; ROM-able, relocatable, and re-entrant code; and a C shell program that facilitates one-step translation from C source to executable code.
7. Future direction
In this literature survey, we have defined our problem and explored how others have approached it. The next step involves the modeling and implementation of the real-time H.263+ decoder on a ‘C54x DSP.
8. References
[1] Guy Cote, Berna Erol, Michael Gallant, and Faouzi Kossentini, "H.263+:
Video Coding at Low Bit Rates,".............
[2] K. N. Ngan, D. Chai, A. Millin, "Very low bit rate video coding using
H.263 coder," IEEE Trans. on Circuits and Systems for Video Technology,
Vol. 6, No. 3, pp. 308-312, 1996.
[3] K. T. Ng, S. C. Chan, T. S. Ng, "A modified H.263 algorithm using bit
allocation buffer control algorithm," Proc. of 1997 IEEE International
Symposium on Circuits and Systems. Circuits and Systems in the Information
Age. ISCAS '97, vol. 2, pp. 1389-1392.
[4] ITU Telecom. Standardization Sector of ITU, "Video Coding for Low Bitrate
Communication," Draft ITU-T Recommendation H.263, March 1996.