Transformation and Quantization
Both source pictures and prediction residuals contain significant spatial redundancy. H.264 removes this redundancy with a block-based transform. After inter prediction from previously decoded samples in other pictures, or intra (spatial) prediction from previously decoded samples within the current picture, the resulting prediction residual is split into 4 x 4 or 8 x 8 blocks. These blocks are converted into the transform domain, where they are quantized.
H.264 uses an adaptive transform block size: 4 x 4, and 8 x 8 in the High Profiles only. Previous video coding standards used the 8 x 8 DCT. The smaller block size leads to a significant reduction in ringing artifacts. In addition, the 4 x 4 integer transform can be computed without multiplications, using only additions, subtractions, and shifts.
For improved compression efficiency, H.264 also employs a hierarchical transform structure, in which the DC coefficients of neighboring 4 x 4 transforms for the luma signals are grouped into 4 x 4 blocks and transformed again by the Hadamard transform.
For blocks with mostly flat pel values, the transform DC coefficients of neighboring blocks are strongly correlated. The standard therefore specifies two further transforms: the 4 x 4 Hadamard transform for luma DC coefficients, in the Intra 16 x 16 mode only, and the 2 x 2 Hadamard transform for chroma DC coefficients in the 4:2:0 format.
Some applications need a quantization step size small enough to raise PSNR to levels that can be considered visually lossless. To support this, H.264 extends the range of quantization step sizes by two additional octaves. It redefines the tables so that the quantization parameter QP ranges from 0 to 51.
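The following Python sketch shows how the 52 step sizes relate to QP. It assumes the commonly tabulated base values for QP 0 to 5 (Qstep = 1.0 at QP = 4) and the rule that Qstep doubles for every increase of 6 in QP. The names `QSTEP_BASE` and `qstep` are illustrative and do not come from the standard.

```python
# Sketch: quantizer step size Qstep as a function of QP (0..51).
# Base values for QP 0..5 as commonly tabulated; Qstep doubles every 6 QP.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp: int) -> float:
    """Return the quantizer step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return QSTEP_BASE[qp % 6] * (1 << (qp // 6))


for qp in (0, 4, 10, 28, 51):
    print(qp, qstep(qp))  # 0.625, 1.0, 2.0, 16.0, 224.0
```

Under these assumptions the 52 step sizes span 0.625 to 224.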
In general, a transform followed by quantization requires many multiplications, which makes implementation costly. To simplify implementation, H.264 modifies the exact transform so that its core needs no multiplications. The remaining scaling is merged into the quantization. The result is a combined process of integer forward transform, post-scaling, and quantization.
The steps below cover two directions. For encoding, they are the forward integer transform, post-scaling, and quantization. For decoding, they are inverse quantization, pre-scaling, and the inverse integer transform.
- Encoding Process
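Step 1: Forward Integer Transform
For the DCT of a 4 x 4 block of input luma data F, the exact formula is

$$X = A F A^{T}, \qquad A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix} \tag{1}$$

where the variables a, b, c are

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{1}{2}}\cos\frac{\pi}{8}, \qquad c = \sqrt{\frac{1}{2}}\cos\frac{3\pi}{8}.$$

To simplify the implementation of the transform, the ratio c/b (about 0.414) is approximated by 0.5, that is, c = b/2. To preserve orthonormality, b must then also be modified so that every row of A keeps unit norm:

$$a = \frac{1}{2}, \qquad b = \sqrt{\frac{2}{5}}, \qquad c = \frac{b}{2}.$$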
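The sketch below checks the approximation numerically. It prints the exact ratio c/b, then confirms that with c = b/2 and b = sqrt(2/5) the rows of A are still orthonormal, so A·Aᵀ is the identity. It uses plain Python lists, so no external library is assumed.

```python
import math

# Exact DCT basis values from equation (1).
a = 0.5
b_exact = math.sqrt(0.5) * math.cos(math.pi / 8)
c_exact = math.sqrt(0.5) * math.cos(3 * math.pi / 8)
print(f"exact c/b = {c_exact / b_exact:.4f}")  # tan(pi/8), about 0.4142

# H.264 approximation: c = b/2, with b rescaled so each row keeps unit norm.
b = math.sqrt(2 / 5)
c = b / 2
A = [
    [a, a, a, a],
    [b, c, -c, -b],
    [a, -a, -a, a],
    [c, -b, b, -c],
]

# A * A^T: inner products of every pair of rows (identity if orthonormal).
G = [[sum(x * y for x, y in zip(r1, r2)) for r2 in A] for r1 in A]
for row in G:
    print([round(v, 12) + 0.0 for v in row])  # + 0.0 turns -0.0 into 0.0
```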
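Multiplication in the transform process is avoided by integrating it with the quantization. Equation (1) is therefore rewritten as

$$X = \left( C_f F C_f^{T} \right) \otimes SF, \qquad
C_f = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix}, \qquad
SF = \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix}$$

The symbol ⊗ denotes element-by-element multiplication of the corresponding matrices.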
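The sketch below checks the factorization. It assumes the modified a, b, c values and an arbitrary sample block `F`. It computes the transform directly with A, then compares the result with the integer core transform followed by element-wise post-scaling. In the code, the integer core matrix is called `CF`. The integer stage involves only the values ±1 and ±2; all non-integer scaling sits in SF.

```python
import math

# Modified basis values (c = b/2, b = sqrt(2/5)).
a = 0.5
b = math.sqrt(2 / 5)
c = b / 2
A = [[a, a, a, a], [b, c, -c, -b], [a, -a, -a, a], [c, -b, b, -c]]

# Integer core matrix and post-scaling matrix SF.
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]
row_scale = [a, b / 2, a, b / 2]  # A equals diag(row_scale) * CF
SF = [[row_scale[i] * row_scale[j] for j in range(4)] for i in range(4)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(m):
    return [list(r) for r in zip(*m)]


F = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]  # arbitrary sample block

exact = matmul(matmul(A, F), transpose(A))                       # A F A^T
W = matmul(matmul(CF, F), transpose(CF))                         # integer-only stage
X = [[W[i][j] * SF[i][j] for j in range(4)] for i in range(4)]   # element-wise post-scaling

print(max(abs(X[i][j] - exact[i][j]) for i in range(4) for j in range(4)))  # rounding-level error
```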
In step 1, only the integer transform C_f F C_f^T is computed, without the multiplication by SF. The term SF is combined with quantization in step 2.
Step 2: Post-Scaling and Quantization
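The transformed and quantized signal Y is obtained by post-scaling with SF and quantizing with step size Qstep:

$$Y = \operatorname{round}\left( \frac{\left( C_f F C_f^{T} \right) \otimes SF}{Q_{step}} \right)$$

The rounding is applied element by element. H.264 defines a total of 52 values for Qstep, indexed by the quantization parameter QP from 0 to 51.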
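Practical encoders usually avoid dividing by Qstep. Instead, they fold SF/Qstep into an integer multiplier MF followed by a right shift of qbits = 15 + floor(QP/6). The sketch below follows that style, as used in the JM reference software. Several parts of it are assumptions of that formulation rather than requirements of the standard:

- the MF table;
- the rounding offsets (one third of a step for intra, one sixth for inter);
- the helper names, which are hypothetical.

```python
# Integer multipliers MF ~= 2**qbits * SF / Qstep, per QP % 6, for the three
# position classes: (a^2 positions, b^2/4 positions, ab/2 positions).
MF = [
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
]
CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def forward_core(block):
    """Integer core transform CF * X * CF^T."""
    t = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(t[i][k] * CF[j][k] for k in range(4)) for j in range(4)] for i in range(4)]


def mf_at(qp, i, j):
    """Pick the multiplier for position (i, j) according to the SF pattern."""
    m = MF[qp % 6]
    if i % 2 == 0 and j % 2 == 0:
        return m[0]
    if i % 2 == 1 and j % 2 == 1:
        return m[1]
    return m[2]


def quantize(w, qp, intra=True):
    """Post-scale and quantize with a multiply, an offset and a right shift."""
    qbits = 15 + qp // 6
    f = (1 << qbits) // (3 if intra else 6)  # rounding offset (JM-style assumption)
    z = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            level = (abs(w[i][j]) * mf_at(qp, i, j) + f) >> qbits
            z[i][j] = level if w[i][j] >= 0 else -level
    return z


flat = [[10] * 4 for _ in range(4)]
print(quantize(forward_core(flat), qp=4))  # only the DC level (40) is nonzero
```

Take a flat block of value 10 at QP 4, where Qstep = 1. The integer DC coefficient is 160, and the level is 40. This agrees with round(160 · 0.25 / 1), where 0.25 = a² is the SF entry at the DC position.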
- Decoding Process
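Step 1: Inverse Quantization and Pre-Scaling
The received signal Y is scaled by Qstep and by an inverse scaling matrix SF_i. This performs the inverse quantization together with part of the inverse transform:

$$W' = Q_{step} \cdot \left( Y \otimes SF_i \right), \qquad
SF_i = \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix}$$

Step 2: Inverse Integer Transform
The reconstructed residual block is

$$F' = C_i^{T} \, W' \, C_i, \qquad
C_i = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & \frac{1}{2} & -\frac{1}{2} & -1 \\ 1 & -1 & -1 & 1 \\ \frac{1}{2} & -1 & 1 & -\frac{1}{2} \end{bmatrix}$$

where the factors of 1/2 are implemented as right shifts.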
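On the decoder side, the combined inverse-quantization factor is typically tabulated as small integers V. Each V is Qstep times the inverse scaling factor, pre-multiplied by 64. The factor 64 is removed after the inverse transform by (x + 32) >> 6. The sketch below assumes the commonly used V table and implements the inverse integer transform as a row pass followed by a column pass of adds and shifts. The helper names are hypothetical.

```python
# Rescaling factors V ~= 64 * Qstep * (inverse scale), per QP % 6, for the
# position classes (a^2 positions, b^2 positions, ab positions).
V = [
    (10, 16, 13),
    (11, 18, 14),
    (13, 20, 16),
    (14, 23, 18),
    (16, 25, 20),
    (18, 29, 23),
]


def v_at(qp, i, j):
    v = V[qp % 6]
    if i % 2 == 0 and j % 2 == 0:
        return v[0]
    if i % 2 == 1 and j % 2 == 1:
        return v[1]
    return v[2]


def rescale(z, qp):
    """Inverse quantization: level * V, doubled for every 6 QP steps."""
    return [[z[i][j] * v_at(qp, i, j) << (qp // 6) for j in range(4)] for i in range(4)]


def inverse_1d(d):
    """1-D inverse core transform; the halves are right shifts."""
    e0 = d[0] + d[2]
    e1 = d[0] - d[2]
    e2 = (d[1] >> 1) - d[3]
    e3 = d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_core(w):
    rows = [inverse_1d(r) for r in w]                   # horizontal pass
    cols = [inverse_1d(list(c)) for c in zip(*rows)]    # vertical pass, one column at a time
    # cols[j][i] is row i, column j; remove the factor 64 with rounding.
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


levels = [[40, 0, 0, 0], [0] * 4, [0] * 4, [0] * 4]
print(inverse_core(rescale(levels, qp=4)))  # a flat block of 10s
```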
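Additionally, for the luma DC coefficients in the Intra 16 x 16 mode, a 2-D Hadamard transform is applied to the 4 x 4 array W_D of DC coefficients, one from each 4 x 4 block. This makes the transform hierarchical:

$$Y_D = H \, W_D \, H, \qquad H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}$$

This holds up to a normalization factor that is absorbed into the quantization.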
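Below is a minimal sketch of the unnormalized 2-D Hadamard transform, applied to the 4 x 4 array of luma DC coefficients. The sample DC values are made up to mimic a nearly flat macroblock. Normalization, which is folded into quantization, is omitted.

```python
H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def hadamard_4x4(dc):
    """Unnormalized 2-D Hadamard transform H * DC * H (H is symmetric)."""
    return matmul(matmul(H4, dc), H4)


# Hypothetical DC coefficients of the sixteen 4x4 luma blocks of one macroblock.
dc = [[160, 160, 164, 164],
      [160, 160, 164, 164],
      [156, 156, 160, 160],
      [156, 156, 160, 160]]
print(hadamard_4x4(dc))  # [[2560, -32, 0, 0], [32, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Only three of the sixteen outputs are nonzero for this input. That is why the second-stage transform pays off when neighboring DC values are correlated. Since H·H = 4I, applying the function twice returns 16 times the input.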
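For chroma DC coefficients in the 4:2:0 format, the 2 x 2 array W_D of DC coefficients is transformed with

$$Y_D = H_2 \, W_D \, H_2, \qquad H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

For the 4:2:2 and 4:4:4 formats, the Hadamard block size is increased to match the larger array of chroma DC coefficients.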
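For the 4:2:0 chroma DC case, the transform is small enough to write out by hand. The hypothetical helper below computes the unnormalized product H2 · D · H2 for a 2 x 2 array D of chroma DC coefficients.

```python
def hadamard_2x2(dc):
    """Unnormalized H2 * D * H2 with H2 = [[1, 1], [1, -1]] for a 2x2 chroma DC array D."""
    (p, q), (r, s) = dc
    return [[p + q + r + s, p - q + r - s],
            [p + q - r - s, p - q - r + s]]


print(hadamard_2x2([[1, 2], [3, 4]]))  # [[10, -2], [-4, 0]]
```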
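The following integer DCT matrix is applied to 8 x 8 luma blocks. It is defined only in the High Profiles:

$$T_8 = \frac{1}{8}\begin{bmatrix}
8 & 8 & 8 & 8 & 8 & 8 & 8 & 8 \\
12 & 10 & 6 & 3 & -3 & -6 & -10 & -12 \\
8 & 4 & -4 & -8 & -8 & -4 & 4 & 8 \\
10 & -3 & -12 & -6 & 6 & 12 & 3 & -10 \\
8 & -8 & -8 & 8 & 8 & -8 & -8 & 8 \\
6 & -12 & 3 & 10 & -10 & -3 & 12 & -6 \\
4 & -8 & 8 & -4 & -4 & 8 & -8 & 4 \\
3 & -6 & 10 & -12 & 12 & -10 & 6 & -3
\end{bmatrix}$$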
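The sketch below takes the rows of the 8 x 8 integer transform as commonly published for FRExt, with entries scaled by 8 so that they are integers. It checks two properties:

- The rows are mutually orthogonal.
- The squared row norms differ (512, 578, 320). So, as in the 4 x 4 case, per-position scaling has to be folded into quantization.

The standard itself specifies the inverse transform as a butterfly, so treat this matrix form as illustrative.

```python
# Rows of the 8x8 integer transform, scaled by 8 so all entries are integers.
T8 = [
    [8, 8, 8, 8, 8, 8, 8, 8],
    [12, 10, 6, 3, -3, -6, -10, -12],
    [8, 4, -4, -8, -8, -4, 4, 8],
    [10, -3, -12, -6, 6, 12, 3, -10],
    [8, -8, -8, 8, 8, -8, -8, 8],
    [6, -12, 3, 10, -10, -3, 12, -6],
    [4, -8, 8, -4, -4, 8, -8, 4],
    [3, -6, 10, -12, 12, -10, 6, -3],
]

# Inner products of every pair of rows.
gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in T8] for r1 in T8]
off_diag = [gram[i][j] for i in range(8) for j in range(8) if i != j]
print("max |off-diagonal|:", max(abs(v) for v in off_diag))  # 0: rows are orthogonal
print("squared row norms:", [gram[i][i] for i in range(8)])  # unequal, so scaling is needed
```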
Note that chroma components are always transformed with the 4 x 4 IntDCT only.
FRExt (the Fidelity Range Extensions) specifies default perceptual weighting matrices for 4 x 4 and 8 x 8 IntDCT coefficients. A scaling matrix that reflects visual perception is simply a set of multipliers applied during inverse quantization. Weighting matrices can be customized separately for each of the following:

- 4x4 Intra Y
- 4x4 Intra Cb and Cr
- 4x4 Inter Y
- 4x4 Inter Cb and Cr
- 8x8 Intra Y
- 8x8 Inter Y

Besides the 8 x 8 transform and the scaling matrices, FRExt added:

- entropy-coded transform-bypass lossless macroblocks;
- a residual color transform for efficient RGB coding without color-format conversion loss or bit expansion;
- nine intra 8 x 8 prediction modes.
The encoder can design and use customized perceptual scaling matrices, which are sent to the decoder at the sequence or picture level. The default scaling matrix for 8 x 8 IntDCT coefficients is shown below.
FRExt defines two scans for the 8 x 8 transform. They are similar to the 4 x 4 scans and are likewise switched between frame and field coding. The scan orders follow decreasing coefficient variance, so that zero-valued coefficients are grouped into long runs, mostly toward the end of the scan.
If a macroblock is compressed using the 4x4 transform in the field mode, the scanning order of the coefficients is modified to be more efficient for field scanning as shown in Fig. 4b – reflecting the decreased correlation of the source data in the vertical dimension. For other block sizes (8x8 for luma, 2x2 for 4:2:0 chroma DC, and 2x4 for 4:2:2 chroma DC), the same basic concepts apply, with scan orders specified for each block size.
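The small sketch below shows the two 4 x 4 scan orders, written as raster indices (row · 4 + column). The tables reproduce the 4 x 4 zig-zag (frame) and field scan orders. The block contents and the helper names are hypothetical.

```python
# Scan position k visits the coefficient at raster index ORDER[k].
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]
FIELD_4X4 = [0, 4, 1, 8, 12, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]


def scan(block, order):
    """Serialize a 4x4 block of levels in the given scan order."""
    flat = [v for row in block for v in row]
    return [flat[k] for k in order]


def trailing_zeros(seq):
    """Count the zeros after the last nonzero level."""
    n = 0
    for v in reversed(seq):
        if v != 0:
            break
        n += 1
    return n


# Hypothetical quantized block with its energy down the first column, as is
# typical when vertical correlation is weak (field macroblocks).
blk = [[7, 2, 0, 0], [5, 0, 0, 0], [3, 0, 0, 0], [1, 0, 0, 0]]
for name, order in (("zig-zag", ZIGZAG_4X4), ("field", FIELD_4X4)):
    s = scan(blk, order)
    print(name, s, "trailing zeros:", trailing_zeros(s))
```

For this input, the field scan leaves 11 trailing zeros, versus 6 for the zig-zag scan. The significant part of the sequence is therefore shorter.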
- Quantization Scaling Matrices
The High Profiles support perceptually based quantization scaling matrices, similar in concept to those used in MPEG-2. The encoder can specify a scaling factor for each transform coefficient position, according to the frequency that the coefficient represents. The decoder applies this factor during inverse quantization. This lets the encoder optimize subjective quality according to the sensitivity of the human visual system, which is less sensitive to coding errors in high-frequency transform coefficients. It typically does not improve objective fidelity as measured by mean-squared error (or, equivalently, PSNR), but it does improve subjective fidelity, which is really the more important criterion. Default values for the quantization scaling matrices are specified in the standard. The encoder can instead choose to use customized values by sending a representation of those values at the sequence or picture level.
For the quantization scaling, (8) and (9) are modified as follows.
where w_ij is the scaling factor for the coefficient at position (i, j).
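The sketch below shows one way the weighting can enter the decoder. It is modeled on the 4 x 4 residual scaling in later editions of the H.264 specification. There, the per-position factor is w_ij times a normalization factor v, taken from the same kind of table as in the unweighted case, and w_ij = 16 means no weighting. The function name is hypothetical, and the sketch handles a single coefficient.

```python
def scale_coefficient(c: int, w: int, v: int, qp: int) -> int:
    """Rescale one 4x4 coefficient level c (hypothetical helper).

    w  -- weighting-matrix entry w_ij (16 means no weighting)
    v  -- normalization factor for this position and QP % 6
    qp -- quantization parameter
    """
    level_scale = w * v
    if qp >= 24:
        return (c * level_scale) << (qp // 6 - 4)
    # For small QP, divide by 2**(4 - QP//6) with rounding instead.
    return (c * level_scale + (1 << (3 - qp // 6))) >> (4 - qp // 6)


print(scale_coefficient(3, 16, 16, 4))  # 48: same as 3 * 16 without weighting
print(scale_coefficient(3, 32, 16, 4))  # 96: a doubled weight doubles the output
```

With w = 16, the result reduces to c · v · 2^floor(QP/6), which is the unweighted rescaling.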
- Transform Coding
The transmission order of all coefficients is shown in Figure 1. If the macroblock is predicted using the intra prediction type INTRA_16×16, the block labeled "−1" is transmitted first; it contains the DC coefficients of all blocks of the luminance component. Afterwards, the blocks labeled "0"–"25" are transmitted. Blocks "0"–"15" contain the coefficients of the luminance blocks (only their AC coefficients in the INTRA_16×16 case). Finally, blocks "16" and "17" contain the DC coefficients and blocks "18"–"25" the AC coefficients of the chrominance components.
Compared to a DCT, the 4 x 4 integer transforms and the Hadamard transforms use only integers ranging from −2 to 2 in their transform matrices (see eqns 6, 11, 12). This allows the transform and the inverse transform to be computed in 16-bit arithmetic using only low-complexity shift, add, and subtract operations.
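To make the 16-bit claim concrete, the sketch below implements the forward 4 x 4 core transform as the add/subtract/shift butterfly commonly used in encoders. It checks the butterfly against the plain matrix product. It then evaluates a worst-case 9-bit residual block, with values of ±255, an assumption for 8-bit video. The largest output magnitude is 255 · 6 · 6 = 9180, comfortably inside the signed 16-bit range. The function names are hypothetical.

```python
def forward_1d(x):
    """1-D forward core transform with adds, subtracts and shifts only."""
    p0, p1 = x[0] + x[3], x[1] + x[2]
    p2, p3 = x[1] - x[2], x[0] - x[3]
    return [p0 + p1, (p3 << 1) + p2, p0 - p1, p3 - (p2 << 1)]


def forward_core(block):
    rows = [forward_1d(r) for r in block]               # block * CF^T
    cols = [forward_1d(list(c)) for c in zip(*rows)]    # CF * (block * CF^T), by columns
    return [list(r) for r in zip(*cols)]


CF = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]


def forward_matrix(block):
    """Reference: CF * X * CF^T as plain matrix products."""
    t = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(t[i][k] * CF[j][k] for k in range(4)) for j in range(4)] for i in range(4)]


# Worst case for output (1, 1): signs aligned with row 2 of CF.
s = [1, 1, -1, -1]
worst = [[255 * s[k] * s[l] for l in range(4)] for k in range(4)]
out = forward_core(worst)
print(max(abs(v) for row in out for v in row))  # 9180, well below 2**15
```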