H.264 Video Codec - B Slice
Bidirectional prediction reduces temporal redundancy very efficiently by drawing on more than one reference picture. As in earlier standards with B pictures, the bidirectional mode uses two motion vectors, representing two estimates of the motion, for each macroblock partition or sub-macroblock partition. In H.264 the two vectors can refer to any reference picture, in the future or the past in display order. Again, the Levels definition constrains the number of reference pictures that can be used for motion estimation. In earlier standards, one prediction signal is derived from the subsequent inter picture and the other from a previous picture. A weighted average of the pixel values in the reference pictures is then used as the predictor for each sample.
H.264 generalizes this concept and supports not only forward/backward prediction pairs but also forward/forward and backward/backward pairs. Two forward references can benefit motion-compensated prediction of a region just before a scene change, and two backward references can do the same just after a scene change. In contrast to previous standards, bi-directionally predictive-coded slices may also be used as references for inter coding of other pictures.
H.264 also introduces the direct mode, which does not require motion side information to be transmitted but derives the reference picture, block size, and motion vector data from the subsequent inter picture. Weighted prediction is also added to handle gradual transitions from scene to scene.
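As a minimal sketch of the bi-predictive averaging described above, the following Python snippet combines two motion-compensated prediction blocks with equal weights and rounding. The function name bipred_average and the sample values are hypothetical and chosen only for illustration; sub-sample interpolation, motion search, and the residual are left out.

```python
def bipred_average(pred_l0, pred_l1):
    """Average two prediction blocks sample by sample, rounding half up."""
    return [
        [(a + b + 1) >> 1 for a, b in zip(row0, row1)]
        for row0, row1 in zip(pred_l0, pred_l1)
    ]


# Two hypothetical 2x2 luma prediction blocks (8-bit samples), one fetched
# from each reference picture after motion compensation.
pred_l0 = [[100, 102], [98, 96]]
pred_l1 = [[110, 103], [90, 97]]

predictor = bipred_average(pred_l0, pred_l1)
print(predictor)  # [[105, 103], [94, 97]]
```

The same function applies whether the two blocks come from a forward/backward, forward/forward, or backward/backward pair, because only the fetched samples enter the average.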
- Direct Mode
In this mode the motion vectors for a macroblock are not explicitly sent. In the slice header, the encoder specifies whether the decoder derives the motion vectors by scaling the motion vector of the co-located macroblock in another reference picture (temporal) or by inferring motion from spatially neighboring regions (spatial). This feature allows B-frames to use "predicted" motion vectors instead of actually coding each block's motion. This saves some bitrate and improves compression, so it is recommended to always keep this setting enabled. There are four different modes available:
None: Disabled. For testing only. Not recommended.
Auto: Let the encoder decide the optimal setting for each frame. Highly recommended for all rate-control (RC) modes, but more efficient in Two Pass mode.
Temporal: Enforce prediction from neighboring frames.
Spatial: Enforce prediction from neighboring blocks within the current frame (usually preferred over Temporal).
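The two derivations behind the Temporal and Spatial settings can be sketched as follows. This is a simplified illustration under stated assumptions, not the normative procedure: the function names and the sample vectors are hypothetical, the temporal case uses plain rounding instead of the fixed-point scaling of the standard, and the spatial case shows only a component-wise median of three neighboring vectors, leaving out reference index selection and the check for stationary co-located blocks.

```python
def temporal_direct_mv(mv_col, t_cur, t_ref0, t_ref1):
    """Scale the co-located motion vector by the ratio of temporal distances.

    mv_col is the vector of the co-located block in the picture at t_ref1,
    pointing to the picture at t_ref0. Times are display-order positions.
    """
    tb = t_cur - t_ref0   # distance from the past reference to the current picture
    td = t_ref1 - t_ref0  # distance between the two reference pictures
    mv_l0 = tuple(round(c * tb / td) for c in mv_col)    # forward vector
    mv_l1 = tuple(f - c for f, c in zip(mv_l0, mv_col))  # backward vector
    return mv_l0, mv_l1


def median3(a, b, c):
    """Return the middle value of three numbers."""
    return sorted((a, b, c))[1]


def spatial_direct_mv(mv_a, mv_b, mv_c):
    """Predict a vector as the component-wise median of three neighbors."""
    return (
        median3(mv_a[0], mv_b[0], mv_c[0]),
        median3(mv_a[1], mv_b[1], mv_c[1]),
    )


# Temporal: the current B picture sits halfway between its two references.
mv_l0, mv_l1 = temporal_direct_mv(mv_col=(8, -4), t_cur=2, t_ref0=0, t_ref1=4)
print(mv_l0, mv_l1)  # (4, -2) (-4, 2)

# Spatial: hypothetical vectors of the left, top, and top-right neighbors.
print(spatial_direct_mv((4, 0), (6, -2), (5, 8)))  # (5, 0)
```

In both cases the decoder can repeat the same computation from data it already has, which is why no motion vector needs to be transmitted for the block.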
- Weighted Prediction
Earlier standards apply equal weights to the reference pictures, i.e., the prediction signal is obtained by averaging the reference signals with equal weights. Gradual transitions from scene to scene, however, need different weights. Such transitions are very common in movies, for example a fade to black (the luma samples of the scene gradually approach zero and the chroma samples gradually approach 128) or a fade from black.
H.264 uses a weighted prediction method for macroblocks of P slices and B slices. For a B slice, the prediction signal p is obtained from two reference signals, r1 and r2, with different weights:
p = w1 × r1 + w2 × r2 (1)
where w1 and w2 are the weights. They are determined in one of two ways, explicit or implicit. In explicit mode, the factors are chosen by the encoder and transmitted in the slice header. In implicit mode, the factors are calculated from the temporal distance between the pictures: the larger weight is applied to the reference picture that is temporally closer to the current picture, and the smaller weight to the one that is farther away.
Weighted prediction comes at the cost of some encoding speed. Since weighted B-frames generally improve visual quality, it is recommended to always keep this setting enabled, unless encoding speed is more important than quality.
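Equation (1) and the implicit derivation of the weights can be sketched as follows. This is a floating-point illustration only, assuming the weights follow linear interpolation over display time so that the nearer reference receives the larger weight; the standard itself works with integer weights and, in explicit mode, additional offsets. The function names, time positions, and sample values are hypothetical.

```python
def implicit_weights(t_cur, t_r1, t_r2):
    """Derive (w1, w2) from temporal distances; the weights sum to 1.

    t_r1 and t_r2 are the display-order positions of the two reference
    pictures, t_cur is the position of the current picture.
    """
    w2 = (t_cur - t_r1) / (t_r2 - t_r1)
    w1 = 1.0 - w2
    return w1, w2


def weighted_bipred(r1, r2, w1, w2):
    """Evaluate p = w1 * r1 + w2 * r2, rounded and clipped to 8 bits."""
    p = int(w1 * r1 + w2 * r2 + 0.5)
    return max(0, min(255, p))


# Implicit mode: the current picture is one step after r1 and three steps
# before r2, so r1 is the nearer reference.
w1, w2 = implicit_weights(t_cur=1, t_r1=0, t_r2=4)
print(w1, w2)                            # 0.75 0.25
print(weighted_bipred(100, 60, w1, w2))  # 90

# Explicit mode: the encoder picks the weights itself, for example to
# follow a fade to black, and signals them in the slice header.
print(weighted_bipred(100, 60, 0.5, 0.25))  # 65
```

With equal distances to both references the implicit weights reduce to 0.5 each, which is the plain average used when weighted prediction is off.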
Depending on the test case, B-frames increase the data transfer complexity by 11 to 29%. B-frames also have a significant effect on decoding time: introducing a first B-frame costs an extra 50% for very low bit-rate video and 20 to 35% for medium and high bit-rate video. The extra time required by a second B-frame is much lower (a few percent).