Multiview Video Coding
The need for multiview video coding is driven by two recent technological developments: new 3D video/display technologies and the growing use of multi-camera arrays. A variety of companies are starting to produce 3D displays that do not require glasses and can be viewed by multiple people simultaneously. The immersive experience provided by these 3D displays is compelling and has the potential to create a growing market for 3D video and hence for multiview video compression. Furthermore, even with 2D displays, multi-camera arrays are increasingly being used to capture a scene from many angles. The resulting multiview data sets allow the viewer to observe a scene from any viewpoint and serve as another application of multiview video compression.
In July 2008, MPEG officially approved an amendment of the ITU-T Rec. H.264 & ISO/IEC 14496-10 Advanced Video Coding (AVC) standard on Multiview Video Coding (MVC). MVC is an extension of the AVC/H.264 standard that provides an efficient compressed representation of sequences captured simultaneously from multiple cameras by exploiting correlation among neighboring camera views. MVC is intended for encoding stereoscopic (two-view) video, as well as free viewpoint television and multi-view 3D television applications. The Stereo High profile was standardized in June 2009; the profile is based on the MVC toolset and is used in stereoscopic Blu-ray 3D releases.
An MVC stream is backward compatible with H.264/AVC, which allows older devices and software to decode stereoscopic video streams by decoding the base view and ignoring the additional information for the second view. 3D video (3DV) and free viewpoint video (FVV) are new types of visual media that expand the user's experience beyond what is offered by 2D video. 3DV offers a 3D depth impression of the observed scenery, while FVV allows for an interactive selection of viewpoint and direction within a certain operating range. A common element of 3DV and FVV systems is the use of multiple views of the same scene that are transmitted to the user.
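As a rough illustration of the backward compatibility mentioned above, the sketch below filters a list of NAL units down to those a single-view H.264/AVC decoder would process. It assumes that each NAL unit is given as raw bytes starting with the one-byte NAL header, and it relies on the NAL unit types that MVC adds to H.264 (14 for the prefix NAL unit, 15 for the subset sequence parameter set, 20 for the coded slice extension). The function names and the sample stream are hypothetical and chosen only for this example; a real decoder does considerably more than this.

```python
# NAL unit types that MVC adds on top of single-view H.264/AVC:
# 14 = prefix NAL unit, 15 = subset sequence parameter set,
# 20 = coded slice extension (carries non-base view data).
MVC_ONLY_NAL_TYPES = {14, 15, 20}


def nal_unit_type(nal: bytes) -> int:
    """Return nal_unit_type, the low 5 bits of the first NAL header byte."""
    return nal[0] & 0x1F


def base_view_only(nal_units):
    """Keep only the NAL units a single-view decoder would process."""
    return [n for n in nal_units if nal_unit_type(n) not in MVC_ONLY_NAL_TYPES]


# Hypothetical stream (header bytes only): SPS (7), PPS (8), subset SPS (15),
# base-view IDR slice (5), non-base view slice extension (20).
stream = [bytes([0x67]), bytes([0x68]), bytes([0x6F]), bytes([0x65]), bytes([0x74])]
legacy_view = base_view_only(stream)
print([nal_unit_type(n) for n in legacy_view])
```

The point of the sketch is that the base view travels in ordinary AVC NAL units, so dropping the MVC-specific ones leaves a stream that an older decoder can still handle.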
The overall structure of MVC defining the interfaces is illustrated in the figure below. The encoder receives N temporally synchronized video streams and generates one bitstream. The decoder receives the bitstream, decodes and outputs the N video signals.
Fig 1: Multiview Video Coding (MVC)
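The interface described above (N synchronized inputs, one bitstream, N outputs) can be sketched as a toy multiplexer. This is only an illustration of the data flow, not of compression: the "pictures" are plain strings, nothing is actually encoded, and the function names `encode_multiview` and `decode_multiview` are hypothetical. The sketch assumes the pictures of all views for one time instant are grouped together in the stream.

```python
from typing import List, Sequence, Tuple


def encode_multiview(views: Sequence[Sequence[str]]) -> List[Tuple[int, int, str]]:
    """Interleave N synchronized views into one stream, one time instant at a time."""
    lengths = {len(v) for v in views}
    if len(lengths) != 1:
        raise ValueError("views must be temporally synchronized (equal length)")
    stream = []
    for t in range(lengths.pop()):
        # All views of time instant t are emitted before moving to t + 1.
        for view_id, view in enumerate(views):
            stream.append((t, view_id, view[t]))
    return stream


def decode_multiview(stream, num_views: int) -> List[List[str]]:
    """Split the single stream back into N per-view output sequences."""
    outputs: List[List[str]] = [[] for _ in range(num_views)]
    for _t, view_id, picture in stream:
        outputs[view_id].append(picture)
    return outputs


views = [["a0", "a1"], ["b0", "b1"], ["c0", "c1"]]
bitstream = encode_multiview(views)
recovered = decode_multiview(bitstream, num_views=len(views))
```

Here `recovered` contains the same three sequences that went in, mirroring the N-in, N-out interface of the figure.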
Multiview video contains a large number of inter-view statistical dependencies, since all cameras capture the same scene from different viewpoints. Therefore, combined temporal and inter-view prediction is the key to efficient MVC. As illustrated in the figure below, a picture from a given camera can be predicted not only from temporally related pictures of the same camera but also from pictures of neighboring cameras.
Fig 2: Temporal/inter-view prediction structure for MVC
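To make the combined prediction idea concrete, the following minimal sketch lists candidate reference pictures for a picture identified by its view index and time index. It is a simplification under stated assumptions: temporal candidates are just the immediately preceding and following pictures of the same view, and inter-view candidates are the pictures of the directly neighboring views at the same time instant. The function name and parameters are hypothetical, and actual MVC prediction structures are more elaborate and must also respect coding order.

```python
def reference_candidates(view, t, num_views, num_pictures):
    """List (view, time, kind) candidate references for picture (view, t)."""
    refs = []
    # Temporal candidates: same camera, neighboring time instants.
    for dt in (-1, 1):
        if 0 <= t + dt < num_pictures:
            refs.append((view, t + dt, "temporal"))
    # Inter-view candidates: neighboring cameras, same time instant.
    for dv in (-1, 1):
        if 0 <= view + dv < num_views:
            refs.append((view + dv, t, "inter-view"))
    return refs


# Picture of the middle camera (view 1) at time 2, in a 3-camera, 5-picture setup.
for ref in reference_candidates(view=1, t=2, num_views=3, num_pictures=5):
    print(ref)
```

A picture at the edge of the camera array or at the start of the sequence simply has fewer candidates, which the bounds checks handle.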
- Stereoscopic Video
- 3D Video/Display
- Free Viewpoint Video
- Multiview Coding Structures
- Multiview Coding Tools
- Inter-view Prediction
- Illumination Compensation
- Disparity, Depth and 3D Geometry Coding
- View Interpolation Prediction
- Scene Analysis and View Synthesis
- Random Access Aspects in MVC
- Multiview Communication Systems
- Multiview Coding Applications
- Performance and Complexity Issues in MVC