Proc. IEEE Int. Conference on Image Processing, Sep. 27-30, 2015, Quebec City, Canada.
Paper won a Top 10% Paper Award.

Full-Reference Visual Quality Assessment for Synthetic Images: A Subjective Study

Debarati Kundu and Brian L. Evans

Embedded Signal Processing Laboratory, Wireless Networking and Communications Group, The University of Texas at Austin, Austin, TX 78712 USA
debarati@utexas.edu - bevans@ece.utexas.edu

Paper Draft - Poster (PDF) - Poster (PowerPoint) - Table I: Correlation Scores - ESPL Synthetic Image Database

Abstract

Measuring visual quality, as perceived by human observers, is becoming increasingly important in the many applications in which humans are the ultimate consumers of visual information. Significant progress has been made over several decades in assessing the subjective quality of natural images, such as those taken by optical cameras. To aid in the benchmarking of objective image quality assessment (IQA) algorithms, many natural image databases have been annotated with subjective ratings of the images by human observers. Similar information, however, is not readily available for synthetic images commonly found in video games and animated movies. In this paper, our primary contributions are

conducting subjective tests on our publicly available ESPL Synthetic Image Database, and
evaluating the performance of more than 20 full reference IQA algorithms for natural images on the synthetic image database.

The ESPL Synthetic Image Database contains 500 distorted images (20 distorted images for each of the 25 original images) in 1920 x 1080 format. After collecting 26000 individual human ratings, we compute the differential mean opinion score (DMOS) for each image to evaluate IQA algorithm performance.
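As a rough illustration of how individual ratings can be turned into per-image DMOS values and then used to score an IQA algorithm, here is a minimal Python sketch. It assumes a common recipe (per-subject difference scores, per-subject z-score normalization, averaging across subjects, then Spearman rank correlation against the algorithm's predictions). The paper's exact processing, including any outlier rejection or rescaling, is not described on this page, and the function and variable names below are hypothetical.

```python
from statistics import mean, pstdev


def dmos(ratings, reference_of):
    """Simplified DMOS (assumed recipe, not necessarily the paper's).

    ratings:      {subject: {image_id: raw_score}}
    reference_of: {distorted_image_id: reference_image_id}
    """
    z_by_image = {img: [] for img in reference_of}
    for scores in ratings.values():
        # Difference score: reference rating minus distorted rating,
        # so a larger value means a larger perceived quality loss.
        diffs = {img: scores[ref] - scores[img]
                 for img, ref in reference_of.items()
                 if img in scores and ref in scores}
        if len(diffs) < 2:
            continue
        mu, sigma = mean(diffs.values()), pstdev(diffs.values())
        if sigma == 0:
            continue  # subject gave no usable spread of scores
        # Z-scores remove each subject's own offset and scale.
        for img, d in diffs.items():
            z_by_image[img].append((d - mu) / sigma)
    return {img: mean(z) for img, z in z_by_image.items() if z}


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

With DMOS defined this way, a larger value indicates worse perceived quality, so an algorithm whose output increases with quality would be expected to show a strongly negative rank correlation with DMOS.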

Questions and Answers

The following is a summary of the questions that arose during the poster presentation by Debarati Kundu and her answers:

What method did you use for subjective evaluation?
We used the Single Stimulus Continuous Quality Scale method. The sequence of images was randomized for every session and every subject. A short training phase preceded the testing phase. In the testing phase, some images shown at the beginning of the session were repeated at the end, without informing the observer, to allow time for the scores to stabilize.
Why do you think the Structural Similarity (SSIM) index performs so well on your database?
This is because our images are lightly distorted compared to those in other standard natural image databases such as LIVE and TID. Many observers found the lightly blurred image to be more visually acceptable than the corresponding pristine image. This "inversion" of the scores led us to conjecture that visual difference does not always correspond to visual annoyance, especially for synthetic images, which are subjected to a higher degree of cinematographic processing (for animation sequences).
What differences did you observe in the statistical properties of natural and synthetic scenes?
We found that the empirical distributions of the pixels in synthetic scenes, in both the spatial and transform domains, can be modeled by Generalized Gaussian distributions, with shape and scale parameters that differ somewhat from those of natural scenes. The exact degree of this difference can be found in our 2014 Asilomar paper.
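To make the last answer concrete, the sketch below estimates the shape and scale parameters of a zero-mean Generalized Gaussian distribution from samples by moment matching. This is a common estimator, shown here only as an illustration; it is an assumption that it resembles the fitting used in the Asilomar paper, and the function names are hypothetical. The density assumed is f(x) = beta / (2 alpha Gamma(1/beta)) exp(-(|x| / alpha)^beta), for which (E|x|)^2 / E[x^2] = Gamma(2/beta)^2 / (Gamma(1/beta) Gamma(3/beta)).

```python
import math


def ggd_ratio(beta):
    """Theoretical (E|x|)^2 / E[x^2] of a zero-mean GGD with shape beta."""
    return math.gamma(2 / beta) ** 2 / (math.gamma(1 / beta) * math.gamma(3 / beta))


def fit_ggd(samples):
    """Return (beta, alpha) for zero-mean samples via moment matching.

    Assumes the samples are already mean-removed (for example,
    normalized pixel values or transform coefficients).
    """
    n = len(samples)
    m1 = sum(abs(x) for x in samples) / n   # first absolute moment
    m2 = sum(x * x for x in samples) / n    # second moment
    target = m1 * m1 / m2
    # ggd_ratio increases with beta, so bisection finds the match.
    lo, hi = 0.1, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if ggd_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2
    # Scale follows from E[x^2] = alpha^2 * Gamma(3/beta) / Gamma(1/beta).
    alpha = math.sqrt(m2 * math.gamma(1 / beta) / math.gamma(3 / beta))
    return beta, alpha
```

A shape parameter near 2 corresponds to a Gaussian and a value near 1 to a Laplacian, so comparing the fitted (beta, alpha) pairs of natural and synthetic images is one simple way to quantify the kind of difference described in the answer.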

COPYRIGHT NOTICE: All the documents on this server have been submitted by their authors to scholarly journals or conferences as indicated, for the purpose of non-commercial dissemination of scientific work. The manuscripts are put on-line to facilitate this purpose. These manuscripts are copyrighted by the authors or the journals in which they were published. You may copy a manuscript for scholarly, non-commercial purposes, such as research or instruction, provided that you agree to respect these copyrights.

Last Updated 10/03/15.