Blind visual quality assessment on 360° Video based on progressive learning

Related tags

Deep LearningProVQA
Overview

Blind visual quality assessment on omnidirectional or 360 video (ProVQA)

Blind VQA for 360° Video via Progressively Learning from Pixels, Frames and Video

This repository contains the official PyTorch implementation of the following paper:

Blind VQA for 360° Video via Progressively Learning from Pixels, Frames and Video
Li Yang, Mai Xu, ShengXi Li, YiChen Guo and Zulin Wang (School of Electronic and Information Engineering, Beihang University)
Paper link: https://arxiv.org/abs/2111.09503
Abstract: Blind visual quality assessment (BVQA) on 360° video plays a key role in optimizing immersive multimedia systems. When assessing the quality of 360° video, human tends to perceive its quality degradation from the viewport-based spatial distortion of each spherical frame to motion artifact across adjacent frames, ending with the video-level quality score, i.e., a progressive quality assessment paradigm. However, the existing BVQA approaches for 360° video neglect this paradigm. In this paper, we take into account the progressive paradigm of human perception towards spherical video quality, and thus propose a novel BVQA approach (namely ProVQA) for 360° video via progressively learning from pixels, frames and video. Corresponding to the progressive learning of pixels, frames and video, three sub-nets are designed in our ProVQA approach, i.e., the spherical perception aware quality prediction (SPAQ), motion perception aware quality prediction (MPAQ) and multi-frame temporal non-local (MFTN) sub-nets. The SPAQ sub-net first models the spatial quality degradation based on spherical perception mechanism of human. Then, by exploiting motion cues across adjacent frames, the MPAQ sub-net properly incorporates motion contextual information for quality assessment on 360° video. Finally, the MFTN sub-net aggregates multi-frame quality degradation to yield the final quality score, via exploring long-term quality correlation from multiple frames. The experiments validate that our approach significantly advances the state-of-the-art BVQA performance on 360° video over two datasets, the code of which has been public in \url{https://github.com/yanglixiaoshen/ProVQA.}
Note: Since this paper is under review, you can first ask for the paper from me to ease the implementation of this project but you have no rights to use this paper in any purpose. Unauthorized use of this article for all activities will be investigated for legal responsibility. Contact me for accessing my paper (Email: [email protected])

Preparation

Requriments

First, download the conda environment of ProVQA from ProVQA_dependency and install my conda enviroment <envs> in Linux sys (Ubuntu 18.04+); Then, activate <envs> by running the following command:

conda env create -f ProVQA_environment.yaml

Second, install all dependencies by running the following command:

pip install -r ProVQA_environment.txt

If the above installation don't work, you can download the environment file with .tar.gz format. Then, unzip the file into a directory (e.g., pro_env) in your home directiory and activate the environment every time before you run the code.

source activate /home/xxx/pro_env

Implementation

The architecture of the proposed ProVQA is shown in the following figure, which contains four novel modules, i.e., SPAQ, MPAQ, MFTN and AQR.

Dataset

We trained our ProVQA on the large-scale 360° VQA dataset VQA-ODV, which includes 540 impaired 360° videos deriving from 60 reference 360° videos under equi-rectangular projection (ERP) (Training set: 432-Testing set:108). Besides, we also evaluate the performance of our ProVQA over 144 distorted 360° videos in BIT360 dataset.

Training the ProVQA

Our network is implemented based on the PyTorch framework, and run on two NVIDIA Tesla V100 GPUs with 32G memory. The number of sampled frames is 6 and the batch size is 3 per GPU for each iteration. The training set of VQA-ODV dataset has been packed as an LMDB file ODV-VQA_Train, which is used in our approach.

First, to run the training code as follows,

CUDA_VISIBLE_DEVICES=0,1 python ./train.py -opt ./options/train/bvqa360_240hz.yaml

Note that all the settings of dataset, training implementation and network can be found in "bvqa360_240hz.yaml". You can modify the settings to satisfy your experimental environment, for example, the dataset path should be modified to be your sever path. For the final BVQA result, we choose the trained model at iter=26400, which can be download at saved_model. Moreover, the corresponding training state can be obtained at saved_optimized_state.

Testing the ProVQA

Download the saved_model and put it in your own experimental directory. Then, run the following code for evaluating the BVQA performance over the testing set ODV-VQA_TEST. Note that all the settings of testing set, testing implementation and results can be found in "test_bvqa360_OURs.yaml". You can modify the settings to satisfy your experimental environment.

CUDA_VISIBLE_DEVICES=0 python ./test.py -opt ./options/test/test_bvqa360_OURs.yaml

The test results of predicted quality scores of all test 360° Video frames can be found in All_frame_scores and latter you should run the following code to generate the final 108 scores corresponding to 108 test 360° Videos, which can be downloaded from predicted_DMOS.

python ./evaluate.py

Evaluate BVQA performance

We have evaluate the BVQA performance for 360° Videos by 5 general metrics: PLCC, SROCC, KROCC, RMSE and MAE. we employ a 4-order logistic function for fitting the predicted quality scores to their corresponding ground truth, such that the fitted scores have the same scale as the ground truth DMOS gt_dmos. Note that the fitting procedure are conducted on our and all compared approaches. Run the code bvqa360_metric in the following command :

./bvqa360_metric1.m

As such, you can get the final results of PLCC=0.9209, SROCC=0.9236, KROCC=0.7760, RMSE=4.6165 and MAE=3.1136. The following tables shows the comparison on BVQA performance between our and other 13 approaches, over VQA-ODV and BIT360 dataset.

Tips

(1) We have summarized the information about how to run the compared algorithms in details, which can be found in the file "compareAlgoPreparation.txt".
(2) The details about the pre-processing on the ODV-VQA dataset and BIT360 dataset can be found in the file "pre_process_dataset.py".

Citation

If this repository can offer you help in your research, please cite the paper:

@misc{yang2021blind,
      title={Blind VQA on 360{\deg} Video via Progressively Learning from Pixels, Frames and Video}, 
      author={Li Yang and Mai Xu and Shengxi Li and Yichen Guo and Zulin Wang},
      year={2021},
      eprint={2111.09503},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

  1. https://github.com/xinntao/EDVR
  2. https://github.com/AlexHex7/Non-local_pytorch
  3. https://github.com/ChiWeiHsiao/SphereNet-pytorch

Please enjoy it and best wishes. Plese contact with me if you have any questions about the ProVQA approach.

My email address is 13021041[at]buaa[dot]edu[dot]cn

Mememoji - A facial expression classification system that recognizes 6 basic emotions: happy, sad, surprise, fear, anger and neutral.

a project built with deep convolutional neural network and ❤️ Table of Contents Motivation The Database The Model 3.1 Input Layer 3.2 Convolutional La

Jostine Ho 761 Dec 05, 2022
Trading Strategies for Freqtrade

Freqtrade Strategies Strategies for Freqtrade, developed primarily in a partnership between @werkkrew and @JimmyNixx from the Freqtrade Discord. Use t

Bryan Chain 242 Jan 07, 2023
Pure python implementation reverse-mode automatic differentiation

MiniGrad A minimal implementation of reverse-mode automatic differentiation (a.k.a. autograd / backpropagation) in pure Python. Inspired by Andrej Kar

Kenny Song 76 Sep 12, 2022
PAIRED in PyTorch 🔥

PAIRED This codebase provides a PyTorch implementation of Protagonist Antagonist Induced Regret Environment Design (PAIRED), which was first introduce

UCL DARK Lab 46 Dec 12, 2022
JupyterLite demo deployed to GitHub Pages 🚀

JupyterLite Demo JupyterLite deployed as a static site to GitHub Pages, for demo purposes. ✨ Try it in your browser ✨ ➡️ https://jupyterlite.github.io

JupyterLite 223 Jan 04, 2023
Neural Caption Generator with Attention

Neural Caption Generator with Attention Tensorflow implementation of "Show

Taeksoo Kim 510 Nov 30, 2022
[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision Kehong Gong*, Bingbing Li*, Jianfeng Zhang*, Ta

256 Dec 28, 2022
A curated list of resources for Image and Video Deblurring

A curated list of resources for Image and Video Deblurring

Subeesh Vasu 1.7k Jan 01, 2023
Pytorch implementation of TailCalibX : Feature Generation for Long-tail Classification

TailCalibX : Feature Generation for Long-tail Classification by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi [arXiv] [

Rahul Vigneswaran 34 Jan 02, 2023
Reimplementation of the paper "Attention, Learn to Solve Routing Problems!" in jax/flax.

JAX + Attention Learn To Solve Routing Problems Reinplementation of the paper Attention, Learn to Solve Routing Problems! using Jax and Flax. Fully su

Gabriela Surita 7 Dec 01, 2022
Continual Learning of Electronic Health Records (EHR).

Continual Learning of Longitudinal Health Records Repo for reproducing the experiments in Continual Learning of Longitudinal Health Records (2021). Re

Jacob 7 Oct 21, 2022
FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

FairEdit Relevent Publication FairEdit: Preserving Fairness in Graph Neural Networks through Greedy Graph Editing

5 Feb 04, 2022
Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

WSDEC This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos. Description Repo directories ./: global conf

Melon(Xuguang Duan) 96 Nov 01, 2022
A task Provided by A respective Artenal Ai and Ml based Company to complete it

A task Provided by A respective Alternal Ai and Ml based Company to complete it .

Parth Madan 1 Jan 25, 2022
CIFAR-10_train-test - training and testing codes for dataset CIFAR-10

CIFAR-10_train-test - training and testing codes for dataset CIFAR-10

Frederick Wang 3 Apr 26, 2022
Code Repository for The Kaggle Book, Published by Packt Publishing

The Kaggle Book Data analysis and machine learning for competitive data science Code Repository for The Kaggle Book, Published by Packt Publishing "Lu

Packt 1.6k Jan 07, 2023
Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction

Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction. arxiv This repository contains python scripts for tr

12 Dec 12, 2022
Second-order Attention Network for Single Image Super-resolution (CVPR-2019)

Second-order Attention Network for Single Image Super-resolution (CVPR-2019) "Second-order Attention Network for Single Image Super-resolution" is pub

516 Dec 28, 2022
PyTorch implementation of our paper How robust are discriminatively trained zero-shot learning models?

How robust are discriminatively trained zero-shot learning models? This repository contains the PyTorch implementation of our paper How robust are dis

Mehmet Kerim Yucel 5 Feb 04, 2022
GPOEO is a micro-intrusive GPU online energy optimization framework for iterative applications

GPOEO GPOEO is a micro-intrusive GPU online energy optimization framework for iterative applications. We also implement ODPP [1] as a comparison. [1]

瑞雪轻飏 8 Sep 10, 2022