A JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short.

BraVe

The model provided in this package was implemented based on the internal model that was used to compute results for the accompanying paper. It achieves comparable results on the evaluation tasks when evaluated side-by-side. Not all details are guaranteed to be identical though, and some results may differ from those given in the paper. In particular, this implementation does not provide the option to train with optical flow.

We provide a selection of pretrained checkpoints in the table below, which can be evaluated directly against HMDB 51 with the evaluation tools in this package. These are exactly the checkpoints that were used to produce the numbers in the accompanying paper; they were not trained with the trainer given in this package. For details on training a model with this package, please see the end of this readme.

In the table below, the different configurations are represented by using e.g. V/A for video (narrow view) to audio (broad view), or V/F for a narrow view containing video, and a broad view containing optical flow.

The backbone in each case is TSMResnet, with a given width multiplier (please see the accompanying paper for further details). For all of the given numbers below, the SVM regularization constant used is 0.0001. For HMDB 51, the average is given in brackets, followed by the top-1 percentages for each of the splits.

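| Views | Architecture | HMDB51 | UCF-101 | K600 | Trained with this package | Checkpoint |
|-------|--------------|--------|---------|------|---------------------------|------------|
| V/AF | TSM (1X) | (69.2%) 71.307%, 68.497%, 67.843% | 92.9% | 69.2% | No | download |
| V/AF | TSM (2X) | (69.9%) 72.157%, 68.432%, 69.02% | 93.2% | 70.2% | No | download |
| V/A | TSM (1X) | (69.4%) 70.131%, 68.889%, 69.085% | 93.0% | 70.6% | No | download |
| V/VVV | TSM (1X) | (65.4%) 66.797%, 63.856%, 65.425% | 92.6% | 70.8% | No | download |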
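
The bracketed HMDB 51 figures are the mean of the three per-split top-1 accuracies. As a quick sanity check, the following minimal sketch recomputes them from the table values. The dictionary layout and the `mean_top1` helper are illustrative only and are not part of this package.

```python
# Per-split HMDB 51 top-1 accuracies (%), copied from the table above.
hmdb51_splits = {
    "V/AF TSM (1X)": [71.307, 68.497, 67.843],
    "V/AF TSM (2X)": [72.157, 68.432, 69.02],
    "V/A TSM (1X)": [70.131, 68.889, 69.085],
    "V/VVV TSM (1X)": [66.797, 63.856, 65.425],
}


def mean_top1(per_split):
    """Average the per-split accuracies, rounded to one decimal place."""
    return round(sum(per_split) / len(per_split), 1)


averages = {name: mean_top1(values) for name, values in hmdb51_splits.items()}

for name, average in averages.items():
    print(f"{name}: {average}%")
```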

Reproducing results from the paper

This package provides everything needed to evaluate the above checkpoints against HMDB 51. It supports Python 3.7 and above.

To get started, we recommend using a clean virtualenv. You may then install the brave package directly from GitHub using:

  pip install git+https://github.com/deepmind/brave.git

A pre-processed version of the HMDB 51 dataset can be downloaded using the following command. It requires that both ffmpeg and unrar are available. The following will download the dataset to /tmp/hmdb51/, but any other location would also work.

  python -m brave.download_hmdb --output_dir /tmp/hmdb51/

To evaluate a checkpoint downloaded from the above table, the following may be used. The dataset shards arguments should be set to match the paths used above.

  python -m brave.evaluate_video_embeddings \
    --checkpoint_path <path/to/downloaded/checkpoint>.npy \
    --train_dataset_shards '/tmp/hmdb51/split_1/train/*' \
    --test_dataset_shards '/tmp/hmdb51/split_1/test/*' \
    --svm_regularization 0.0001 \
    --batch_size 8
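
To evaluate all three splits in one go, the command can be generated per split. The sketch below only builds the argument lists; it uses the flags shown in the command, while the `evaluation_command` helper and the checkpoint path are hypothetical. It also assumes that the remaining splits are stored under `split_2` and `split_3`, following the `split_1` layout.

```python
import shlex


def evaluation_command(checkpoint_path, dataset_root, split,
                       svm_regularization=0.0001, batch_size=8):
    """Build the argv list for evaluating one HMDB 51 split.

    Assumes split directories are named split_<n>, as split_1 is.
    """
    split_dir = f"{dataset_root.rstrip('/')}/split_{split}"
    return [
        "python", "-m", "brave.evaluate_video_embeddings",
        "--checkpoint_path", checkpoint_path,
        # The glob patterns are passed through unexpanded, exactly as
        # the quoted arguments are in the shell command.
        "--train_dataset_shards", f"{split_dir}/train/*",
        "--test_dataset_shards", f"{split_dir}/test/*",
        "--svm_regularization", str(svm_regularization),
        "--batch_size", str(batch_size),
    ]


# Hypothetical checkpoint location; replace with the downloaded file.
commands = [
    evaluation_command("/path/to/checkpoint.npy", "/tmp/hmdb51/", split)
    for split in (1, 2, 3)
]

for command in commands:
    print(" ".join(shlex.quote(arg) for arg in command))
```

Each list can then be passed to `subprocess.run(command, check=True)` to run the evaluations one after another.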

Note that any of the three splits can be evaluated by changing the dataset split paths. To run this efficiently on a GPU, it is also necessary to install a jaxlib build that matches your CUDA and Python versions. For example, the following installs jaxlib with support for CUDA 10.1 on Linux (this particular wheel targets Python 3.9); other precompiled packages may be found through the JAX documentation.

  pip install https://storage.googleapis.com/jax-releases/cuda101/jaxlib-0.1.69+cuda101-cp39-none-manylinux2010_x86_64.whl
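
After installing jaxlib, it can be worth confirming that JAX actually picks up the GPU before starting a long evaluation. The following is a minimal sketch using `jax.devices()`; the `describe_jax_backend` helper is a hypothetical name and is not part of this package.

```python
def describe_jax_backend():
    """Return a short description of the devices JAX will use."""
    try:
        import jax
    except ImportError:
        return "jax is not installed"
    devices = jax.devices()
    # The platform is "gpu" when a CUDA-enabled jaxlib is in use,
    # and "cpu" when JAX has fallen back to the CPU.
    platform = devices[0].platform
    names = ", ".join(str(device) for device in devices)
    return f"platform={platform} devices=[{names}]"


print(describe_jax_backend())
```

If the reported platform is `cpu` on a machine with a GPU, the installed jaxlib wheel most likely does not match the local CUDA or Python version.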

Depending on the available GPU memory, the batch_size parameter may be increased to improve throughput or decreased to reduce the memory required.

Training a network

This package may also be used to train a model from scratch using jaxline. To try this, first ensure the configuration is set appropriately by modifying brave/config.py. At minimum, you will need to choose an appropriate global batch size (the default setting of 512 is likely too large for any single-machine training setup). In addition, a value must be set for dataset_shards. This should contain the paths of the tfrecord files containing the serialized training data.
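
When picking these two values, a small helper script can be useful. The sketch below assumes that the global batch is split evenly across devices, as is common in data-parallel training; this is an assumption and has not been checked against brave/config.py. The helper names and the shard pattern are hypothetical.

```python
import glob


def per_device_batch_size(global_batch_size, num_devices):
    """Split a global batch evenly across devices (assumed behaviour)."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    if global_batch_size % num_devices != 0:
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible "
            f"by {num_devices} devices")
    return global_batch_size // num_devices


def expand_shards(pattern):
    """Expand a glob pattern into a sorted list of tfrecord paths."""
    return sorted(glob.glob(pattern))


# The default global batch of 512 over 8 devices gives 64 per device.
print(per_device_batch_size(512, 8))

# Hypothetical shard location; an empty list means nothing matched.
shards = expand_shards("/path/to/training/shards/*.tfrecord")
print(f"found {len(shards)} shard(s)")
```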

For details on checkpointing and distributing computation, see the jaxline documentation.

As with evaluation, the correct jaxlib package must be installed to enable training on a GPU.

Training may then be launched using:

  python -m brave.experiment --config=brave/config.py

Training datasets

This model is able to read data stored in the format specified by DMVR. For an example of writing training data in the correct format see the code in dataset/fixtures.py, which is used to write the test fixtures used in the tests for this package.

Running the tests

After checking out this code locally, you may run the package tests using:

  pip install -e .
  pytest brave

We recommend doing this from a clean virtual environment.

Citing this work

If you use this code (or any derived code), data or these models in your work, please cite the relevant accompanying paper.

@misc{recasens2021broaden,
      title={Broaden Your Views for Self-Supervised Video Learning},
      author={Adrià Recasens and Pauline Luc and Jean-Baptiste Alayrac and Luyu Wang and Ross Hemsley and Florian Strub and Corentin Tallec and Mateusz Malinowski and Viorica Patraucean and Florent Altché and Michal Valko and Jean-Bastien Grill and Aäron van den Oord and Andrew Zisserman},
      year={2021},
      eprint={2103.16559},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Disclaimer

This is not an official Google product.

Owner
DeepMind