Bridging Vision and Language Model

Related tags

Deep LearningBriVL
Overview

BriVL

BriVL (Bridging Vision and Language Model) 是首个中文通用图文多模态大规模预训练模型。BriVL模型在图文检索任务上有着优异的效果,超过了同期其他常见的多模态预训练模型(例如UNITER、CLIP)。

BriVL论文:WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

适用场景

适用场景示例:图像检索文本、文本检索图像、图像标注、图像零样本分类、作为其他下游多模态任务的输入特征等。

技术特色

  1. BriVL使用对比学习算法将图像和文本映射到了同一特征空间,可用于弥补图像特征和文本特征之间存在的隔阂。
  2. 基于视觉-语言弱相关的假设,除了能理解对图像的描述性文本外,也可以捕捉图像和文本之间存在的抽象联系。
  3. 图像编码器和文本编码器可分别独立运行,有利于实际生产环境中的部署。

下载专区

模型 语言 参数量(单位:亿) 文件(file)
BriVL-1.0 中文 10亿 BriVL-1.0-5500w.tar

使用BriVL

搭建环境

# 环境要求
lmdb==0.99
timm==0.4.12
easydict==1.9
pandas==1.2.4
jsonlines==2.0.0
tqdm==4.60.0
torchvision==0.9.1
numpy==1.20.2
torch==1.8.1
transformers==4.5.1
msgpack_numpy==0.4.7.1
msgpack_python==0.5.6
Pillow==8.3.1
PyYAML==5.4.1

配置要求在requirements.txt中,可使用下面的命令:

pip install -r requirements.txt

特征提取与计算检索结果

cd evaluation/
bash test_xyb.sh

数据解释

现已放入3个图文对示例:

./data/imgs  # 放入图像
./data/jsonls # 放入图文对描述

引用BriVL

@article{DBLP:journals/corr/abs-2103-06561,
  author    = {Yuqi Huo and
               Manli Zhang and
               Guangzhen Liu and
               Haoyu Lu and
               Yizhao Gao and
               Guoxing Yang and
               Jingyuan Wen and
               Heng Zhang and
               Baogui Xu and
               Weihao Zheng and
               Zongzheng Xi and
               Yueqian Yang and
               Anwen Hu and
               Jinming Zhao and
               Ruichen Li and
               Yida Zhao and
               Liang Zhang and
               Yuqing Song and
               Xin Hong and
               Wanqing Cui and
               Dan Yang Hou and
               Yingyan Li and
               Junyi Li and
               Peiyu Liu and
               Zheng Gong and
               Chuhao Jin and
               Yuchong Sun and
               Shizhe Chen and
               Zhiwu Lu and
               Zhicheng Dou and
               Qin Jin and
               Yanyan Lan and
               Wayne Xin Zhao and
               Ruihua Song and
               Ji{-}Rong Wen},
  title     = {WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training},
  journal   = {CoRR},
  volume    = {abs/2103.06561},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.06561},
  archivePrefix = {arXiv},
  eprint    = {2103.06561},
  timestamp = {Tue, 03 Aug 2021 12:35:30 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2103-06561.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Owner
Wudao is a large-scale pre-training model project initiated by BAAI, aiming to break through the core technology and promote the development of AGI.
This is an implementation of PIFuhd based on Pytorch

Open-PIFuhd This is a unofficial implementation of PIFuhd PIFuHD: Multi-Level Pixel-Aligned Implicit Function forHigh-Resolution 3D Human Digitization

Lingteng Qiu 235 Dec 19, 2022
Repository for the AugmentedPCA Python package.

Overview This Python package provides implementations of Augmented Principal Component Analysis (AugmentedPCA) - a family of linear factor models that

Billy Carson 6 Dec 07, 2022
Real Time Object Detection and Classification using Yolo Algorithm.

Real time Object detection & Classification using YOLO algorithm. Real Time Object Detection and Classification using Yolo Algorithm. What is Object D

Ketan Chawla 1 Apr 17, 2022
Skyformer: Remodel Self-Attention with Gaussian Kernel and Nystr\"om Method (NeurIPS 2021)

Skyformer This repository is the official implementation of Skyformer: Remodel Self-Attention with Gaussian Kernel and Nystr"om Method (NeurIPS 2021).

Qi Zeng 46 Sep 20, 2022
[CVPR 2022 Oral] Versatile Multi-Modal Pre-Training for Human-Centric Perception

Versatile Multi-Modal Pre-Training for Human-Centric Perception Fangzhou Hong1  Liang Pan1  Zhongang Cai1,2,3  Ziwei Liu1* 1S-Lab, Nanyang Technologic

Fangzhou Hong 96 Jan 03, 2023
Who calls the shots? Rethinking Few-Shot Learning for Audio (WASPAA 2021)

rethink-audio-fsl This repo contains the source code for the paper "Who calls the shots? Rethinking Few-Shot Learning for Audio." (WASPAA 2021) Table

Yu Wang 34 Dec 24, 2022
Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch

Segformer - Pytorch Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch. Install $ pip install segformer-pytorch

Phil Wang 208 Dec 25, 2022
Invert and perturb GAN images for test-time ensembling

GAN Ensembling Project Page | Paper | Bibtex Ensembling with Deep Generative Views. Lucy Chai, Jun-Yan Zhu, Eli Shechtman, Phillip Isola, Richard Zhan

Lucy Chai 93 Dec 08, 2022
A collection of resources and papers on Diffusion Models, a darkhorse in the field of Generative Models

This repository contains a collection of resources and papers on Diffusion Models and Score-based Models. If there are any missing valuable resources

5.1k Jan 08, 2023
Implementation of Auto-Conditioned Recurrent Networks for Extended Complex Human Motion Synthesis

acLSTM_motion This folder contains an implementation of acRNN for the CMU motion database written in Pytorch. See the following links for more backgro

Yi_Zhou 61 Sep 07, 2022
Implementation of GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation (ICLR 2022).

GeoDiff: a Geometric Diffusion Model for Molecular Conformation Generation [OpenReview] [arXiv] [Code] The official implementation of GeoDiff: A Geome

Minkai Xu 155 Dec 26, 2022
PFLD pytorch Implementation

PFLD-pytorch Implementation of PFLD A Practical Facial Landmark Detector by pytorch. 1. install requirements pip3 install -r requirements.txt 2. Datas

zhaozhichao 669 Jan 02, 2023
An end-to-end project on customer segmentation

End-to-end Customer Segmentation Project Note: This project is in progress. Tools Used in This Project Prefect: Orchestrate workflows hydra: Manage co

Ocelot Consulting 8 Oct 06, 2022
Introduction to Statistics and Basics of Mathematics for Data Science - The Hacker's Way

HackerMath for Machine Learning “Study hard what interests you the most in the most undisciplined, irreverent and original manner possible.” ― Richard

Amit Kapoor 1.4k Dec 22, 2022
Software Platform for solving and manipulating multiparametric programs in Python

PPOPT Python Parametric OPtimization Toolbox (PPOPT) is a software platform for solving and manipulating multiparametric programs in Python. This pack

10 Sep 13, 2022
For storing the complete exploration of Visual Question Answering for our B.Tech Project

Multi-Image vqa @authors: Akhilesh, Janhavi, Harsh Paper summary, Ideas tried and their corresponding results: on wiki Other discussions: on discussio

Harsh Raj 3 Jun 16, 2022
Equivariant layers for RC-complement symmetry in DNA sequence data

Equi-RC Equivariant layers for RC-complement symmetry in DNA sequence data This is a repository that implements the layers as described in "Reverse-Co

7 May 19, 2022
The all new way to turn your boring vector meshes into the new fad in town; Voxels!

Voxelator The all new way to turn your boring vector meshes into the new fad in town; Voxels! Notes: I have not tested this on a rotated mesh. With fu

6 Feb 03, 2022
PyTorch implementation of our method for adversarial attacks and defenses in hyperspectral image classification.

Self-Attention Context Network for Hyperspectral Image Classification PyTorch implementation of our method for adversarial attacks and defenses in hyp

22 Dec 02, 2022
A Quick and Dirty Progressive Neural Network written in TensorFlow.

prog_nn .▄▄ · ▄· ▄▌ ▐ ▄ ▄▄▄· ▐ ▄ ▐█ ▀. ▐█▪██▌•█▌▐█▐█ ▄█▪ •█▌▐█ ▄▀▀▀█▄▐█▌▐█▪▐█▐▐▌ ██▀

SynPon 53 Dec 12, 2022