Blind visual quality assessment on 360° Video based on progressive learning

Related tags

Deep LearningProVQA
Overview

Blind visual quality assessment on omnidirectional or 360 video (ProVQA)

Blind VQA for 360° Video via Progressively Learning from Pixels, Frames and Video

This repository contains the official PyTorch implementation of the following paper:

Blind VQA for 360° Video via Progressively Learning from Pixels, Frames and Video
Li Yang, Mai Xu, ShengXi Li, YiChen Guo and Zulin Wang (School of Electronic and Information Engineering, Beihang University)
Paper link: https://arxiv.org/abs/2111.09503
Abstract: Blind visual quality assessment (BVQA) on 360° video plays a key role in optimizing immersive multimedia systems. When assessing the quality of 360° video, human tends to perceive its quality degradation from the viewport-based spatial distortion of each spherical frame to motion artifact across adjacent frames, ending with the video-level quality score, i.e., a progressive quality assessment paradigm. However, the existing BVQA approaches for 360° video neglect this paradigm. In this paper, we take into account the progressive paradigm of human perception towards spherical video quality, and thus propose a novel BVQA approach (namely ProVQA) for 360° video via progressively learning from pixels, frames and video. Corresponding to the progressive learning of pixels, frames and video, three sub-nets are designed in our ProVQA approach, i.e., the spherical perception aware quality prediction (SPAQ), motion perception aware quality prediction (MPAQ) and multi-frame temporal non-local (MFTN) sub-nets. The SPAQ sub-net first models the spatial quality degradation based on spherical perception mechanism of human. Then, by exploiting motion cues across adjacent frames, the MPAQ sub-net properly incorporates motion contextual information for quality assessment on 360° video. Finally, the MFTN sub-net aggregates multi-frame quality degradation to yield the final quality score, via exploring long-term quality correlation from multiple frames. The experiments validate that our approach significantly advances the state-of-the-art BVQA performance on 360° video over two datasets, the code of which has been public in \url{https://github.com/yanglixiaoshen/ProVQA.}
Note: Since this paper is under review, you can first ask for the paper from me to ease the implementation of this project but you have no rights to use this paper in any purpose. Unauthorized use of this article for all activities will be investigated for legal responsibility. Contact me for accessing my paper (Email: [email protected])

Preparation

Requriments

First, download the conda environment of ProVQA from ProVQA_dependency and install my conda enviroment <envs> in Linux sys (Ubuntu 18.04+); Then, activate <envs> by running the following command:

conda env create -f ProVQA_environment.yaml

Second, install all dependencies by running the following command:

pip install -r ProVQA_environment.txt

If the above installation don't work, you can download the environment file with .tar.gz format. Then, unzip the file into a directory (e.g., pro_env) in your home directiory and activate the environment every time before you run the code.

source activate /home/xxx/pro_env

Implementation

The architecture of the proposed ProVQA is shown in the following figure, which contains four novel modules, i.e., SPAQ, MPAQ, MFTN and AQR.

Dataset

We trained our ProVQA on the large-scale 360° VQA dataset VQA-ODV, which includes 540 impaired 360° videos deriving from 60 reference 360° videos under equi-rectangular projection (ERP) (Training set: 432-Testing set:108). Besides, we also evaluate the performance of our ProVQA over 144 distorted 360° videos in BIT360 dataset.

Training the ProVQA

Our network is implemented based on the PyTorch framework, and run on two NVIDIA Tesla V100 GPUs with 32G memory. The number of sampled frames is 6 and the batch size is 3 per GPU for each iteration. The training set of VQA-ODV dataset has been packed as an LMDB file ODV-VQA_Train, which is used in our approach.

First, to run the training code as follows,

CUDA_VISIBLE_DEVICES=0,1 python ./train.py -opt ./options/train/bvqa360_240hz.yaml

Note that all the settings of dataset, training implementation and network can be found in "bvqa360_240hz.yaml". You can modify the settings to satisfy your experimental environment, for example, the dataset path should be modified to be your sever path. For the final BVQA result, we choose the trained model at iter=26400, which can be download at saved_model. Moreover, the corresponding training state can be obtained at saved_optimized_state.

Testing the ProVQA

Download the saved_model and put it in your own experimental directory. Then, run the following code for evaluating the BVQA performance over the testing set ODV-VQA_TEST. Note that all the settings of testing set, testing implementation and results can be found in "test_bvqa360_OURs.yaml". You can modify the settings to satisfy your experimental environment.

CUDA_VISIBLE_DEVICES=0 python ./test.py -opt ./options/test/test_bvqa360_OURs.yaml

The test results of predicted quality scores of all test 360° Video frames can be found in All_frame_scores and latter you should run the following code to generate the final 108 scores corresponding to 108 test 360° Videos, which can be downloaded from predicted_DMOS.

python ./evaluate.py

Evaluate BVQA performance

We have evaluate the BVQA performance for 360° Videos by 5 general metrics: PLCC, SROCC, KROCC, RMSE and MAE. we employ a 4-order logistic function for fitting the predicted quality scores to their corresponding ground truth, such that the fitted scores have the same scale as the ground truth DMOS gt_dmos. Note that the fitting procedure are conducted on our and all compared approaches. Run the code bvqa360_metric in the following command :

./bvqa360_metric1.m

As such, you can get the final results of PLCC=0.9209, SROCC=0.9236, KROCC=0.7760, RMSE=4.6165 and MAE=3.1136. The following tables shows the comparison on BVQA performance between our and other 13 approaches, over VQA-ODV and BIT360 dataset.

Tips

(1) We have summarized the information about how to run the compared algorithms in details, which can be found in the file "compareAlgoPreparation.txt".
(2) The details about the pre-processing on the ODV-VQA dataset and BIT360 dataset can be found in the file "pre_process_dataset.py".

Citation

If this repository can offer you help in your research, please cite the paper:

@misc{yang2021blind,
      title={Blind VQA on 360{\deg} Video via Progressively Learning from Pixels, Frames and Video}, 
      author={Li Yang and Mai Xu and Shengxi Li and Yichen Guo and Zulin Wang},
      year={2021},
      eprint={2111.09503},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

  1. https://github.com/xinntao/EDVR
  2. https://github.com/AlexHex7/Non-local_pytorch
  3. https://github.com/ChiWeiHsiao/SphereNet-pytorch

Please enjoy it and best wishes. Plese contact with me if you have any questions about the ProVQA approach.

My email address is 13021041[at]buaa[dot]edu[dot]cn

[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021. Ayan Kumar Bhunia, Pinaki nath Chowdhury, Yongxin Yan

Ayan Kumar Bhunia 44 Dec 12, 2022
Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286

Pytorch-DPPO Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286 Using PPO with clip loss (from https

Alexis David Jacq 163 Dec 26, 2022
SAMO: Streaming Architecture Mapping Optimisation

SAMO: Streaming Architecture Mapping Optimiser The SAMO framework provides a method of optimising the mapping of a Convolutional Neural Network model

Alexander Montgomerie-Corcoran 20 Dec 10, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 08, 2023
[CVPR2021 Oral] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers This is the official PyTorch implementation and models for UP-DETR paper: @a

dddzg 430 Dec 23, 2022
Simple image captioning model - CLIP prefix captioning.

CLIP prefix captioning. Inference Notebook: 🥳 New: 🥳 Our technical papar is finally out! Official implementation for the paper "ClipCap: CLIP Prefix

688 Jan 04, 2023
Myia prototyping

Myia Myia is a new differentiable programming language. It aims to support large scale high performance computations (e.g. linear algebra) and their g

Mila 456 Nov 07, 2022
Code for paper "Which Training Methods for GANs do actually Converge? (ICML 2018)"

GAN stability This repository contains the experiments in the supplementary material for the paper Which Training Methods for GANs do actually Converg

Lars Mescheder 885 Jan 01, 2023
This is the official code for the paper "Ad2Attack: Adaptive Adversarial Attack for Real-Time UAV Tracking".

Ad^2Attack:Adaptive Adversarial Attack on Real-Time UAV Tracking Demo video 📹 Our video on bilibili demonstrates the test results of Ad^2Attack on se

Intelligent Vision for Robotics in Complex Environment 10 Nov 07, 2022
Animatable Neural Radiance Fields for Modeling Dynamic Human Bodies

To make the comparison with Animatable NeRF easier on the Human3.6M dataset, we save the quantitative results at here, which also contains the results of other methods, including Neural Body, D-NeRF,

ZJU3DV 359 Jan 08, 2023
ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data

ARKitScenes This repo accompanies the research paper, ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D

Apple 371 Jan 05, 2023
This program writes christmas wish programmatically. It is using turtle as a pen pointer draw christmas trees and stars.

Introduction This is a simple program is written in python and turtle library. The objective of this program is to wish merry Christmas programmatical

Gunarakulan Gunaretnam 1 Dec 25, 2021
Pytorch implementation of

EfficientTTS Unofficial Pytorch implementation of "EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture"(arXiv). Disclaimer: Somebo

Liu Songxiang 109 Nov 16, 2022
Official implementation of Protected Attribute Suppression System, ICCV 2021

Official implementation of Protected Attribute Suppression System, ICCV 2021

Prithviraj Dhar 6 Jan 01, 2023
PyTorch 1.0 inference in C++ on Windows10 platforms

Serving PyTorch Models in C++ on Windows10 platforms How to use Prepare Data examples/data/train/ - 0 - 1 . . . - n examples/data/test/

Henson 88 Oct 15, 2022
The codebase for our paper "Generative Occupancy Fields for 3D Surface-Aware Image Synthesis" (NeurIPS 2021)

Generative Occupancy Fields for 3D Surface-Aware Image Synthesis (NeurIPS 2021) Project Page | Paper Xudong Xu, Xingang Pan, Dahua Lin and Bo Dai GOF

xuxudong 97 Nov 10, 2022
DeFMO: Deblurring and Shape Recovery of Fast Moving Objects (CVPR 2021)

Evaluation, Training, Demo, and Inference of DeFMO DeFMO: Deblurring and Shape Recovery of Fast Moving Objects (CVPR 2021) Denys Rozumnyi, Martin R. O

Denys Rozumnyi 139 Dec 26, 2022
MohammadReza Sharifi 27 Dec 13, 2022
Finding Donors for CharityML

Finding-Donors-for-CharityML - Investigated factors that affect the likelihood of charity donations being made based on real census data.

Moamen Abdelkawy 1 Dec 30, 2021
Contains a bunch of different python programm tasks

py_tasks Contains a bunch of different python programm tasks Armstrong.py - calculate Armsrong numbers in range from 0 to n with / without cache and c

Dmitry Chmerenko 1 Dec 17, 2021