FAQs: Transformation and Quantization
1) What type of transformation is used in H.264?
Ans: H.264 uses an adaptive transform block size of 4 x 4 or 8 x 8 (High Profiles only), whereas previous video coding standards used the 8 x 8 DCT. The smaller block size leads to a significant reduction in ringing artifacts. For improved compression efficiency, H.264 also employs a hierarchical transform structure, in which the DC coefficients of neighboring 4 x 4 luma transforms are grouped into a 4 x 4 block and transformed again by the Hadamard transform.
2) Why is the Hadamard transform necessary, and when is it applied?
Ans: For blocks with mostly flat pel values, there is significant correlation among the transform DC coefficients of neighboring blocks. Therefore, the standard specifies a 4 x 4 Hadamard transform for luma DC coefficients in 16 x 16 Intra mode only, and a 2 x 2 Hadamard transform for chroma DC coefficients in 4:2:0 format. For 4:2:2 and 4:4:4 formats, the Hadamard block size is increased to reflect the enlarged block.
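To illustrate why the second-stage transform helps in flat regions, here is a minimal Python sketch. The DC values are hypothetical, and any normalization or scaling that a real codec applies at this stage is left out, so this shows only the energy compaction, not the exact arithmetic of the standard.

```python
# 4x4 Hadamard matrix: every entry is +1 or -1, and the matrix is symmetric.
H2 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def hadamard_dc(dc):
    """Unnormalized 2-D Hadamard transform H2 * dc * H2 (H2 is symmetric)."""
    return matmul(matmul(H2, dc), H2)

# Hypothetical DC coefficients of the 16 luma blocks of a flat macroblock.
flat_dc = [[40] * 4 for _ in range(4)]
out = hadamard_dc(flat_dc)
for row in out:
    print(row)
```

With sixteen identical DC inputs, only the top-left output is nonzero, so a single value represents the whole macroblock's DC information.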
3) What is the advantage of a 4x4 transform block size?
Ans: The size of H.264 transforms is mainly 4×4, in special cases 2×2 (8x8 in High Profile). This smaller block size of 4×4 instead of 8×8 enables the encoder to better adapt the prediction error coding to the boundaries of moving objects, to match the transform block size with the smallest block size of the motion compensation, and to generally better adapt the transform to the local prediction error signal.
4) How is the transformation and quantization procedure modified to avoid complex computation?
Ans: To simplify implementation, the exact transform is replaced by an integer approximation whose core needs no multiplications, and the scaling factors that this core leaves out are folded into the quantizer. Encoding therefore consists of the modified forward integer transform followed by post-scaling and quantization; decoding consists of inverse quantization and pre-scaling followed by the inverse integer transform.
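The split into an integer core transform plus a combined scaling and quantization step can be sketched in Python as follows. This is a floating-point illustration only: practical implementations replace the divisions with integer multiplication factors and shifts. The function names and the step size passed in as qstep are assumptions made for this example.

```python
import math

# Forward core transform matrix (integer approximation of the 4x4 DCT).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(r) for r in zip(*m)]

# Row norms of CF. Dividing coefficient (i, j) by NORMS[i] * NORMS[j]
# is the scaling that the integer core leaves out.
NORMS = [math.sqrt(sum(v * v for v in row)) for row in CF]

def forward(block, qstep):
    """Integer core transform, then post-scaling and quantization in one step."""
    w = matmul(matmul(CF, block), transpose(CF))
    return [[round(w[i][j] / (NORMS[i] * NORMS[j] * qstep)) for j in range(4)]
            for i in range(4)]

def inverse(levels, qstep):
    """Inverse quantization and pre-scaling in one step, then inverse transform."""
    y = [[levels[i][j] * qstep / (NORMS[i] * NORMS[j]) for j in range(4)]
         for i in range(4)]
    x = matmul(matmul(transpose(CF), y), CF)
    return [[round(v) for v in row] for row in x]

# Hypothetical residual block and step size.
residual = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
levels = forward(residual, 0.625)
print(inverse(levels, 0.625))
```

Only the matrix products with CF touch the sample data before scaling, and those involve nothing beyond the integers 1 and 2.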
5) Does H.264 support perceptual-based quantization scaling matrices, and do they change the quantization procedure?
Ans: Yes. The High Profiles support perceptual-based quantization scaling matrices, the same concept used in MPEG-2. The encoder can specify a matrix of scaling factors, one for each transform coefficient frequency, which the decoder uses in inverse quantization scaling. This allows subjective quality to be optimized according to the sensitivity of the human visual system, which is less sensitive to coding errors in high-frequency transform coefficients. It typically does not improve objective fidelity as measured by mean-squared error (or, equivalently, PSNR), but it does improve subjective fidelity, which is the more important criterion.
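As a rough sketch of the idea, frequency-dependent weights can be modeled as a per-coefficient multiplier on the quantization step. The weight values, the step size, and the function names below are hypothetical. The convention that a weight of 16 means "no change" follows the flat scaling list of H.264, but the arithmetic is simplified floating point rather than the integer procedure of the standard.

```python
# Flat matrix: every frequency uses the same effective step size.
FLAT = [[16] * 4 for _ in range(4)]

# Hypothetical perceptual weights that grow with horizontal + vertical frequency.
PERCEPTUAL = [[16 + 4 * (i + j) for j in range(4)] for i in range(4)]

def quantize(coeffs, qstep, weights):
    """Quantize each coefficient with an effective step of qstep * weight / 16."""
    return [[round(coeffs[i][j] / (qstep * weights[i][j] / 16))
             for j in range(4)] for i in range(4)]

def dequantize(levels, qstep, weights):
    """Decoder side: rescale each level with the same per-frequency weight."""
    return [[levels[i][j] * qstep * weights[i][j] / 16
             for j in range(4)] for i in range(4)]

coeffs = [[100] * 4 for _ in range(4)]
print(quantize(coeffs, 10, FLAT)[3][3], quantize(coeffs, 10, PERCEPTUAL)[3][3])
```

The high-frequency coefficient is quantized more coarsely under the perceptual weights, while the DC coefficient is treated the same in both cases.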
6) Is there a default scaling matrix defined in H.264?
7) Are there any ways to develop dynamic matrices?
Ans: Yes. By defining two QPs and using a logarithmic, Laplacian, or other distribution, we can generate perceptual-based quantization scaling matrices.
8) How many transforms are applied in H.264?
Ans: Three different types of transforms are used.
The first type is applied to all samples of all prediction error blocks of the luminance component Y and also for all blocks of both chrominance components Cb and Cr regardless of whether motion compensated prediction or intra prediction was used. The size of this transform is 4×4. Its transform matrix H1 is shown in Figure 1.
If the macroblock is predicted using the type INTRA_16×16, the second transform, a Hadamard transform with matrix H2 (see Figure 1), is applied in addition to the first one. It transforms all 16 DC coefficients of the already transformed blocks of the luminance signal. The size of this transform is also 4×4.
The third transform is also a Hadamard transform but of size 2×2. It is used for the transform of the 4 DC coefficients of each chrominance component. Its matrix H3 is shown in Figure 1.
Figure 1: Matrices of 3 Different transforms applied in H.264/AVC
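The three transforms and the way they are chained can be sketched in Python. The matrices below are the standard 4×4 core transform, the 4×4 Hadamard, and the 2×2 Hadamard matrices. The function names are made up for this example, and scaling and quantization are omitted.

```python
H1 = [[1, 1, 1, 1], [2, 1, -1, -2], [1, -1, -1, 1], [1, -2, 2, -1]]  # 4x4 core
H2 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]  # 4x4 Hadamard
H3 = [[1, 1], [1, -1]]                                               # 2x2 Hadamard

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def transform(h, x):
    """2-D transform h * x * h^T."""
    return matmul(matmul(h, x), transpose(h))

def luma_intra16x16(mb):
    """mb is a 16x16 residual macroblock (list of rows).

    Returns the Hadamard-transformed DC block and a dict mapping each
    (block_row, block_col) to its 4x4 coefficients with the DC position zeroed.
    """
    dc = [[0] * 4 for _ in range(4)]
    ac = {}
    for by in range(4):
        for bx in range(4):
            blk = [row[4 * bx:4 * bx + 4] for row in mb[4 * by:4 * by + 4]]
            w = transform(H1, blk)   # first transform, every 4x4 block
            dc[by][bx] = w[0][0]     # collect the DC coefficient
            w[0][0] = 0              # DC travels in the separate DC block
            ac[(by, bx)] = w
    return transform(H2, dc), ac     # second transform on the 16 DC values

# Third transform: 2x2 Hadamard on four hypothetical chroma DC coefficients.
chroma_dc = transform(H3, [[1, 2], [3, 4]])
print(chroma_dc)
```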
9) What is the transmission order in H.264 for INTRA_16×16?
Ans: The transmission order of all coefficients is shown in Figure 2. If the macroblock is predicted using the intra prediction type INTRA_16×16, the block with the label "−1" is transmitted first. This block contains the DC coefficients of all blocks of the luminance component. Afterwards, all blocks labeled "0"–"25" are transmitted, where blocks "0"–"15" comprise all AC coefficients of the blocks of the luminance component. Finally, blocks "16" and "17" comprise the DC coefficients and blocks "18"–"25" the AC coefficients of the chrominance components.
Figure 2: The Transmission Order of all Coefficients
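The ordering described above can be written out as a small Python helper. The function name and the text descriptions are illustrative. For macroblocks that are not INTRA_16×16, the sketch assumes that blocks "0"–"15" carry all luma coefficients, which the answer above does not state explicitly.

```python
def transmission_order(intra16x16):
    """Return the list of (block label, content) pairs in transmission order."""
    order = []
    if intra16x16:
        order.append((-1, "luma DC"))        # sent first, INTRA_16x16 only
    # Assumption: without the separate DC block, 0..15 hold all luma coefficients.
    luma_kind = "luma AC" if intra16x16 else "luma"
    order += [(n, luma_kind) for n in range(0, 16)]
    order += [(n, "chroma DC") for n in (16, 17)]
    order += [(n, "chroma AC") for n in range(18, 26)]
    return order

for label, kind in transmission_order(True):
    print(label, kind)
```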
10) Compared to previous coding algorithms, is the H.264 transformation technique really better?
Ans: Yes. In contrast to a DCT, all applied integer transforms have only integer entries ranging from −2 to 2 in the transform matrix (see Figure 1). This allows the transform and the inverse transform to be computed in 16-bit arithmetic using only low-complexity shift, add, and subtract operations.
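A minimal Python sketch of the 4×4 core transform written with only additions, subtractions, and left shifts by one bit is shown here. The butterfly grouping is one common way to arrange the operations and is not taken from the text above; the function names are made up for this example.

```python
def core_1d(x0, x1, x2, x3):
    """1-D 4-point core transform using only add, subtract, and shift."""
    s03 = x0 + x3
    d03 = x0 - x3
    s12 = x1 + x2
    d12 = x1 - x2
    y0 = s03 + s12            # row [1,  1,  1,  1]
    y1 = (d03 << 1) + d12     # row [2,  1, -1, -2]
    y2 = s03 - s12            # row [1, -1, -1,  1]
    y3 = d03 - (d12 << 1)     # row [1, -2,  2, -1]
    return (y0, y1, y2, y3)

def core_2d(block):
    """2-D transform: 1-D transform on every row, then on every column."""
    rows = [core_1d(*r) for r in block]
    cols = [core_1d(*c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back to row order

# Hypothetical 4x4 residual block.
block = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
for row in core_2d(block):
    print(row)
```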
In the case of a Hadamard transform, only add and subtract operations are necessary. Furthermore, because only integer operations are used, mismatches of the inverse transform between encoder and decoder are completely avoided, which was not the case in former standards and caused problems.
All coefficients are quantized by a scalar quantizer. The quantization step size is selected by a quantization parameter QP, which can take 52 different values. The step size doubles with each increment of QP by 6, so an increment of QP by 1 increases the step size by approximately 12% and reduces the required data rate by roughly the same proportion.
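The relationship between QP and the step size can be sketched as a small lookup. The six base step sizes for QP 0 to 5 are the values commonly tabulated for H.264. They are not given in the text above, so treat them as an assumption of this example, along with the function name.

```python
# Base step sizes for QP = 0..5 (assumed, commonly tabulated values).
BASE_STEPS = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]

def qstep(qp):
    """Quantization step size for QP in 0..51; it doubles for every +6 in QP."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_STEPS[qp % 6] * (1 << (qp // 6))

for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

Consecutive QP values differ in step size by a factor that averages 2^(1/6), or about 1.12, which is where the figure of roughly 12% per QP step comes from.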