Quality Assessment in the context of FTV: challenges and first answers


FTV is a step towards interactivity, which calls for addressing Quality of Experience (QoE). Image and video quality is, however, only one component of QoE. A large body of work and standards exists on how to measure image and video quality, including recent work for 3DTV, from both the subjective and the objective points of view. Nevertheless, FTV raises new issues, mainly because the use cases cannot be ignored. One aspect of particular interest is the affordance of navigation from one view to another: how can it be tested, and what types of artefacts can the technology introduce? (citation not found: Bosc_2011_3)(Zhang 2014)(Lee 2011)

Navigation and FTV: generating new perceptual artefacts

Here, one or two figures should illustrate multiview content (texture plus depth) over time, together with view synthesis and the transmission scheme, highlighting the many navigation paths that can be generated across these dimensions.
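To give an idea of how quickly the number of possible navigation paths grows, the following sketch counts view trajectories in a deliberately simplified, hypothetical navigation model: the content has `num_views` views and `num_frames` time instants, and at each new frame the viewer either stays on the current view or moves to an adjacent one. The function name and the movement rule are assumptions made for illustration only; real FTV systems may allow larger jumps or free-viewpoint positions between cameras.

```python
def count_navigation_paths(num_views: int, num_frames: int) -> int:
    """Count view trajectories in a hypothetical navigation model.

    Assumption (illustrative only): at each new frame the viewer stays on
    the current view or moves to the left or right neighbouring view.
    """
    if num_views <= 0 or num_frames <= 0:
        return 0
    # ways[v] = number of trajectories ending on view v at the current frame;
    # any view may serve as the starting point.
    ways = [1] * num_views
    for _ in range(num_frames - 1):
        new_ways = [0] * num_views
        for v in range(num_views):
            for step in (-1, 0, 1):  # move left, stay, move right
                u = v + step
                if 0 <= u < num_views:
                    new_ways[u] += ways[v]
        ways = new_ways
    return sum(ways)
```

Even in this restricted model the count grows roughly geometrically with the number of frames, so only a small, carefully chosen subset of paths can realistically be included in a subjective test.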

Towards new protocols to test the effect of coding/transmission and rendering technology on quality of Navigation

Work carried out at EPFL and IRCCyN stresses this limitation, which acts as a limiting factor of QoE (Bosc 2013) (Bosc 2011).

Objective measures of quality of navigation: not yet there

There is currently an increasing need for objective measures of the quality of navigation, but few solutions have been devised so far. Preliminary studies have analyzed the behavior of state-of-the-art 2D full-reference metrics (e.g. PSNR, SSIM) applied to FVV content. As expected, these metrics fail to predict the MOS, mainly because of the artifacts that are typical of this kind of content, as explained previously. A detailed study of this issue is presented in [1], where the correlation between the DMOS collected in subjective experiments and the scores predicted by 2D image quality metrics is analyzed. The analysis shows that none of the considered 2D image quality metrics reliably predicts the MOS, and that the correlation increases slightly when content characteristics are taken into account. A similar conclusion is reached in [4], where the impact of the synthesis process is evaluated under stereoscopic conditions and compared with the monoscopic case. The work presented in [3] provides guidelines for defining new quality metrics for the assessment of synthesized 3D views. In particular, the performance of twelve 2D image quality metrics is analyzed on a database built from three multiview sequences processed with seven DIBR algorithms to create new viewpoints. Subjective experiments with 43 observers were carried out to evaluate the ability of these metrics to predict the MOS. Here too, the correlation is poor and the results depend on the sequence content. Based on these results, the authors propose improvements to be considered in the design of new quality metrics, focusing in particular on the location of the artifacts created by the synthesis process.
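The correlation analyses mentioned above are typically reported with the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC) between subjective scores and metric outputs. The following is a minimal NumPy sketch of both; the function names are hypothetical, and it omits the nonlinear (e.g. logistic) mapping that is commonly fitted before computing PLCC, so it is not necessarily the exact procedure used in the cited studies.

```python
import numpy as np


def average_ranks(x):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    n = len(x)
    i = 0
    while i < n:
        j = i
        # extend j over the run of values tied with sorted_x[i]
        while j + 1 < n and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # sorted positions i..j share the mean of ranks i+1..j+1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def plcc(subjective, predicted):
    """Pearson linear correlation between subjective scores and metric outputs."""
    return float(np.corrcoef(subjective, predicted)[0, 1])


def srocc(subjective, predicted):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(subjective), average_ranks(predicted))
```

Note that DMOS grows with impairment while PSNR or SSIM grow with quality, so a good metric yields a strongly negative correlation with DMOS; comparisons are therefore usually made on absolute values.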
In fact, humans are more sensitive to disocclusions, especially because they appear along contours. A first attempt to apply these findings to SSIM yields an increased correlation with the MOS. For this reason, the research community has started moving towards quality metrics that take into account the characteristics of FVV content. In [1], a full-reference image quality metric for synthesized 3D views, 3DswIM, is presented. It takes DIBR-specific distortions into account and is based on the analysis of statistical features extracted from the wavelet decomposition of the synthesized and original images. The reported experiments show improved performance with respect to 2D image quality metrics. The authors of [9] propose a new metric, MW-PSNR, based on a multi-resolution image decomposition, to specifically address artifacts along edges. This is made possible by non-linear morphological filters, which preserve geometric information across resolution levels. The same authors (citation not found) present another multi-scale metric, MP-PSNR, based on the MSE computed on the subbands of a morphological pyramid. Both metrics correlate better with the MOS than 2D image quality metrics. Another metric, 3VQM, specifically designed for 3D videos generated by DIBR, is presented in (citation not found: SOHL). 3VQM combines three distortion measures: spatial outliers (to account for spatial inconsistencies), temporal outliers (to account for temporal inconsistencies), and temporal inconsistency (to account for fast-changing disparities). This metric also shows good correlation with the DMOS.
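The idea behind pyramid-based measures such as MP-PSNR can be illustrated with a simplified sketch: both images are decomposed with morphological (min/max based) filters into detail bands plus a coarse approximation, the MSE is computed band by band, and the pooled MSE is converted to a PSNR. Everything below is an assumption made for illustration (3x3 open-close low-pass filter, nearest-neighbour expansion, plain averaging of band MSEs, hypothetical function names); the published MP-PSNR differs in its filters, upsampling and the selection and pooling of pyramid levels.

```python
import numpy as np


def _min_max_filter(img, op):
    """3x3 grey-level erosion (op=np.min) or dilation (op=np.max)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # stack the 9 shifted copies covering each pixel's 3x3 neighbourhood
    shifts = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return op(np.stack(shifts), axis=0)


def morph_lowpass(img):
    """Assumed low-pass: grey-level opening followed by closing (3x3 square)."""
    opened = _min_max_filter(_min_max_filter(img, np.min), np.max)
    return _min_max_filter(_min_max_filter(opened, np.max), np.min)


def morph_pyramid_bands(img, levels):
    """Detail bands of a simplified morphological pyramid plus the coarsest level."""
    current = np.asarray(img, dtype=float)
    bands = []
    for _ in range(levels):
        reduced = morph_lowpass(current)[::2, ::2]
        # nearest-neighbour expansion back to the size of the current level
        expanded = np.repeat(np.repeat(reduced, 2, axis=0), 2, axis=1)
        expanded = expanded[:current.shape[0], :current.shape[1]]
        bands.append(current - expanded)
        current = reduced
    bands.append(current)  # coarsest approximation
    return bands


def pyramid_psnr(reference, synthesized, levels=3, peak=255.0):
    """PSNR computed from the mean of per-band MSEs (illustrative pooling)."""
    ref_bands = morph_pyramid_bands(reference, levels)
    syn_bands = morph_pyramid_bands(synthesized, levels)
    mse = np.mean([np.mean((r - s) ** 2) for r, s in zip(ref_bands, syn_bands)])
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))
```

Because the min/max filters do not blur edges the way linear filters do, distortions located along contours, such as those produced by disocclusion filling, remain visible in the detail bands instead of being smoothed away.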
From this analysis of the state of the art on objective metrics for FVV content, it is evident that there is an urgent need for new metrics specifically designed to account for the artifacts that occur during the view synthesis process.

Datasets to follow up

In this context, a crucial issue is that only a small number of datasets are available for quality assessment in the field of FTV. More specifically ...... The available datasets are listed below, and their main characteristics are summarized in Table \ref{tab:datasets}:

  • DIBR Images (citation not found: DIBRImage): three multiview plus depth (MVD) sequences are considered: Book Arrival (1024x768 pixels, 16 cameras with 6.5 cm spacing), Lovebird1 (1024x768 pixels, 12 cameras with 3.5 cm spacing) and Newspaper (1024x768 pixels, 9 cameras with 5 cm spacing). Seven DIBR algorithms are used to create four new viewpoints for each sequence, and key frames extracted from the resulting sequences are included in the dataset. Absolute Category Rating (ACR) and pair comparison were used to collect Mean Opinion Scores (MOS);

  • DIBR Videos (citation not found: DIBRVideo): 102 video sequences, each 6 s long, with 1024x768 pixel resolution and frame rates between 15 and 30 frames per second. The sequences are obtained from three multiview plus depth videos processed with seven DIBR algorithms to generate four new viewpoints for each sequence. Absolute Category Rating with Hidden Reference (ACR-HR) was used for MOS collection;

  • MCL 3D Database (Song 2015): this database contains 693 stereoscopic image pairs. Nine image-plus-depth sources are first selected, and a DIBR technique is used to render stereoscopic image pairs. The distortions, applied to either the texture image or the depth image before rendering, include Gaussian blur, additive white noise, downsampling blur, JPEG and JPEG-2000 (JP2K) compression, and transmission errors;

  • SIAT Synthesized Video Quality Database: 10 MVD sequences and for each sequence, 14 different texture/depth quantization combinations were used to generate the texture/depth view pairs with compr