Tuesday, September 21, 2010

H.264 - Overview

H.264 Video Coding - Overview


H.264/MPEG-4 Part 10 or AVC (Advanced Video Coding) is a standard for video compression. The final drafting work on the first version of the standard was completed in May 2003.
H.264/MPEG-4 AVC is a block-oriented, motion-compensation-based codec standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG), as the product of a partnership effort known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard (formally, ISO/IEC 14496-10 - MPEG-4 Part 10, Advanced Video Coding) are jointly maintained so that they have identical technical content. H.264 is used in applications such as Blu-ray Disc players, videos from YouTube and the iTunes Store, web software such as Adobe Flash Player and Microsoft Silverlight, broadcast services for DVB and SBTVD, direct-broadcast satellite television services, cable television services, and real-time videoconferencing.
The block diagram of the H.264 codec is shown in Figure 1. The Encoder [Figure 1(a)] includes two dataflow paths, a "forward" path (left to right, shown in blue) and a "reconstruction" path (right to left, shown in magenta). The dataflow path in the Decoder [Figure 1(b)] is shown from right to left to illustrate the similarities between the Encoder and the Decoder.

Figure 1: Block diagram of H.264 Coding


Encoder: Forward Path

An input frame Fn is presented for encoding. The frame is processed in units of a macroblock (corresponding to 16x16 pixels in the original image). Each macroblock is encoded in intra or inter mode; in either case, a prediction macroblock P is formed based on a reconstructed frame.
Intra coding uses various spatial prediction modes to reduce spatial redundancy within a single picture, and intra-coded pictures provide access points to the coded sequence where decoding can begin correctly. In intra mode, P is formed from samples in the current frame Fn that have previously been encoded, decoded and reconstructed (uF'n in the Figures; note that the unfiltered samples are used to form P).
Inter coding (predictive or bi-predictive) is typically more efficient: each block of samples is predicted from one or more reference frames using motion vectors. In the Figures, the reference frame is shown as the previously encoded frame F'n-1; however, the prediction for each macroblock may be formed from one or two past or future frames (in time order) that have already been encoded and reconstructed.
The prediction P is subtracted from the current macroblock to produce a residual (difference) macroblock Dn. The residual is transformed with a block transform (an integer approximation of the DCT) to remove spatial correlation, and the transform coefficients are quantized to give X, a set of quantized transform coefficients. These coefficients are reordered and entropy coded using context-adaptive variable length coding (CAVLC) or context-adaptive binary arithmetic coding (CABAC). The entropy-coded coefficients, together with the side information required to decode the macroblock (such as the macroblock prediction mode, quantizer step size, and motion vector information describing how the macroblock was motion-compensated), form the compressed bitstream. This bitstream is passed to a Network Abstraction Layer (NAL) for transmission or storage.
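The transform, quantization and reordering steps can be illustrated with a minimal Python sketch for one 4x4 residual block. It is a simplification under stated assumptions, not an encoder: 8-bit samples, the 4x4 integer core transform with its scaling folded into the quantizer multipliers (MF) usually quoted for H.264, a rounding offset of 2^qbits/3 for intra blocks (2^qbits/6 for inter), no separate DC transform for 16x16 intra or chroma blocks, and a frame (not field) zigzag scan. Quantization is an encoder choice, so these offsets are not mandated by the standard. Helper names such as encode_block and position_class are hypothetical.

```python
# Sketch of the forward path for one 4x4 block. Assumptions: 8-bit samples,
# encoder-chosen rounding offsets, no special DC handling. Names are hypothetical.

# 4x4 forward integer core transform matrix.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]

# Quantizer multipliers indexed by QP % 6, then by coefficient position class:
# class 0 = (0,0),(0,2),(2,0),(2,2); class 1 = (1,1),(1,3),(3,1),(3,3); class 2 = the rest.
MF = [
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
]

# Zigzag scan order for a 4x4 frame block, as (row, col) pairs.
ZIGZAG = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
          (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def position_class(i, j):
    """Map a coefficient position to its MF column."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def forward_transform(x):
    """W = CF * X * CF^T; the remaining scaling is folded into MF."""
    return matmul(matmul(CF, x), transpose(CF))


def quantize(w, qp, intra=True):
    """|Z| = (|W| * MF + f) >> qbits, with the sign of W restored."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // 3 if intra else (1 << qbits) // 6
    z = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            mf = MF[qp % 6][position_class(i, j)]
            level = (abs(w[i][j]) * mf + f) >> qbits
            z[i][j] = level if w[i][j] >= 0 else -level
    return z


def encode_block(current, prediction, qp, intra=True):
    """Dn = current - P, then transform, quantize and zigzag-reorder."""
    residual = [[current[i][j] - prediction[i][j] for j in range(4)]
                for i in range(4)]
    z = quantize(forward_transform(residual), qp, intra)
    return [z[i][j] for i, j in ZIGZAG]


if __name__ == "__main__":
    pred = [[100] * 4 for _ in range(4)]
    flat = [[110] * 4 for _ in range(4)]
    ramp = [[97, 99, 101, 103] for _ in range(4)]
    print(encode_block(flat, pred, qp=0))   # only the DC level is non-zero
    print(encode_block(ramp, pred, qp=0))   # energy lands in the first row
```

For a flat residual of 10 the scanned list is 64 followed by fifteen zeros at QP 0, but only 2 at QP 28, so the coarser quantizer keeps far less precision. For the horizontal ramp (residual rows of -3, -1, 1, 3) the only non-zero levels, -14 and -2, sit at scan positions 1 and 6, leaving a long tail of zeros that CAVLC or CABAC can code cheaply.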


Encoder: Reconstruction Path

The encoder also decodes the quantized coefficients X in order to reconstruct a frame that can be used to predict further macroblocks. The coefficients X are re-scaled (Q⁻¹) and inverse transformed (T⁻¹) to produce a difference macroblock Dn'. This is not identical to the original difference macroblock Dn: quantization is lossy, so Dn' is a distorted version of Dn. The prediction macroblock P is added to Dn' to create a reconstructed macroblock uF'n (a distorted version of the original macroblock). A deblocking filter is applied to reduce blocking distortion at block boundaries, and the reconstructed reference frame F'n is created from a series of these filtered macroblocks.
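The re-scaling (Q⁻¹) and inverse transform (T⁻¹) steps can be illustrated with a short Python sketch for one 4x4 block. It assumes 8-bit samples and uses the rescaling factors (V) and the add-and-shift inverse core transform usually quoted for the 4x4 path of H.264, with a final rounding division by 64. The separate DC paths for 16x16 intra and chroma blocks, and the deblocking filter, are left out. Names such as rescale and reconstruct_block are hypothetical.

```python
# Sketch of the reconstruction path for one 4x4 block. Assumptions: 8-bit
# samples, no special DC handling, deblocking not shown. Names are hypothetical.

# Rescaling factors V indexed by QP % 6, then by coefficient position class:
# class 0 = (0,0),(0,2),(2,0),(2,2); class 1 = (1,1),(1,3),(3,1),(3,3); class 2 = the rest.
V = [
    (10, 16, 13),
    (11, 18, 14),
    (13, 20, 16),
    (14, 23, 18),
    (16, 25, 20),
    (18, 29, 23),
]


def position_class(i, j):
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def rescale(z, qp):
    """Q^-1: W' = Z * V * 2^(QP // 6)."""
    return [[z[i][j] * V[qp % 6][position_class(i, j)] * (1 << (qp // 6))
             for j in range(4)] for i in range(4)]


def inverse_1d(d):
    """One 4-point inverse core transform using only adds and shifts."""
    e0 = d[0] + d[2]
    e1 = d[0] - d[2]
    e2 = (d[1] >> 1) - d[3]
    e3 = d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform(w):
    """T^-1: transform rows, then columns, then divide by 64 with rounding."""
    rows = [inverse_1d(r) for r in w]
    cols = [inverse_1d([rows[i][j] for i in range(4)]) for j in range(4)]
    # cols[j][i] holds the residual sample at row i, column j.
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


def reconstruct_block(z, prediction, qp):
    """uF'n = clip(P + Dn') to the 8-bit range, before deblocking."""
    d = inverse_transform(rescale(z, qp))
    return [[min(255, max(0, prediction[i][j] + d[i][j])) for j in range(4)]
            for i in range(4)]
```

A lone DC level of 64 at QP 0 rescales to a flat residual of 10, so a flat prediction of 100 becomes 110 everywhere. At QP 28 a DC level of 2 rescales to a flat residual of 8: each quantizer step is worth much more, and values between steps are lost, which is why Dn' only approximates Dn. Python's >> on negative integers rounds toward minus infinity, matching the arithmetic shift the integer transform relies on.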


Figure 2: Typical Structure of an H.264/AVC Encoder


Decoder

The decoder receives a compressed bitstream from the NAL. The data elements are entropy decoded and reordered to produce a set of quantized coefficients X. These are rescaled (inverse quantized) and inverse transformed to give Dn' (identical to the Dn' produced in the Encoder's reconstruction path). Using the header information decoded from the bitstream, the decoder creates a prediction macroblock P, identical to the original prediction P formed in the encoder. P is added to Dn' to produce uF'n, which is filtered to create the decoded macroblock F'n.
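Entropy decoding is the decoder's first step. As one small example, many H.264 header syntax elements (the ue(v) and se(v) descriptors in the standard) are coded with Exp-Golomb codes: N zero bits, a 1, then N information bits, giving codeNum = 2^N - 1 + info. The Python sketch below assumes the NAL unit payload has already had its emulation-prevention bytes removed; BitReader and its methods are hypothetical names, not part of any library, and CAVLC coefficient tables and CABAC are not covered.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over an RBSP
    (emulation-prevention bytes assumed already removed)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit offset into data

    def read_bit(self) -> int:
        bit = (self.data[self.pos >> 3] >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb: count leading zeros, then read that many info bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)

    def read_se(self) -> int:
        """Signed Exp-Golomb: map codeNum 0, 1, 2, 3, 4, ... to 0, 1, -1, 2, -2, ..."""
        k = self.read_ue()
        return (k + 1) // 2 if k % 2 else -(k // 2)
```

The bit string 1 010 011 00100 00101 (bytes 0xA6 0x42 0x80 after zero padding) decodes to codeNums 0, 1, 2, 3, 4 when read as ue(v), or to 0, 1, -1, 2, -2 when read as se(v). Short codes go to small values, which suits header fields whose common values are small.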
