Bridging Vision and Language Model

BriVL

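BriVL (Bridging Vision and Language Model) is the first general-purpose, large-scale pre-trained image-text multimodal model for Chinese. BriVL performs strongly on image-text retrieval, outperforming other common multimodal pre-trained models from the same period (for example, UNITER and CLIP).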

BriVL paper: WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training

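Use cases

Example use cases: retrieving text with an image query, retrieving images with a text query, image annotation, zero-shot image classification, and providing input features for other downstream multimodal tasks.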
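Most of these use cases reduce to comparing an image feature with text features in the shared space. The sketch below ranks candidate texts for one image by cosine similarity. It is only an illustration: `l2_normalize`, `rank_by_cosine`, and the 3-dimensional toy vectors are hypothetical stand-ins for real encoder outputs, and nothing here calls BriVL's own code.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def rank_by_cosine(query, candidates):
    """Return (indices sorted from most to least similar, similarities)."""
    sims = l2_normalize(candidates) @ l2_normalize(query)
    return np.argsort(-sims), sims

# Toy features standing in for one image and three candidate texts.
image_feat = np.array([1.0, 0.0, 0.0])
text_feats = np.array([
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
])

order, sims = rank_by_cosine(image_feat, text_feats)
print(order, np.round(sims, 3))
```

Text-to-image retrieval swaps the roles of query and candidates. Zero-shot classification can be viewed as the same ranking with one candidate text per class name.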

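Technical features

  1. BriVL uses a contrastive learning algorithm to map images and text into the same feature space, which helps bridge the gap between image features and text features.
  2. Because it assumes that vision and language are only weakly correlated, it can capture abstract associations between images and text in addition to understanding text that literally describes an image.
  3. The image encoder and the text encoder can run independently of each other, which simplifies deployment in production environments.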
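To make the contrastive idea concrete, here is a minimal sketch of a generic symmetric InfoNCE loss over a batch of paired features. It is an assumption-laden illustration, not BriVL's training objective: the function name `symmetric_info_nce`, the temperature of 0.07, and the random toy features are all hypothetical, and the paper's actual formulation should be consulted for the real method.

```python
import numpy as np

def symmetric_info_nce(img, txt, temperature=0.07):
    """Average of image-to-text and text-to-image InfoNCE losses.

    img, txt: (batch, dim) arrays; row i of img is paired with row i of txt.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) scaled cosine similarities

    def cross_entropy_diag(l):
        # Softmax cross-entropy where the correct class of row i is column i.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

rng = np.random.default_rng(0)
txt = rng.normal(size=(4, 8))
noisy_img = txt + 0.01 * rng.normal(size=(4, 8))  # nearly aligned pairs

aligned = symmetric_info_nce(noisy_img, txt)
shuffled = symmetric_info_nce(txt[::-1], txt)  # pairs deliberately mismatched
print(aligned, shuffled)
```

The loss should come out lower for the aligned pairs than for the mismatched ones, which is the signal that pulls matching images and texts together in the shared space.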

Downloads

| Model | Language | Parameters | File |
| --- | --- | --- | --- |
| BriVL-1.0 | Chinese | 1 billion | BriVL-1.0-5500w.tar |

Using BriVL

Environment setup

# Requirements
lmdb==0.99
timm==0.4.12
easydict==1.9
pandas==1.2.4
jsonlines==2.0.0
tqdm==4.60.0
torchvision==0.9.1
numpy==1.20.2
torch==1.8.1
transformers==4.5.1
msgpack_numpy==0.4.7.1
msgpack_python==0.5.6
Pillow==8.3.1
PyYAML==5.4.1

These dependencies are listed in requirements.txt and can be installed with the following command:

pip install -r requirements.txt
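Because every dependency is pinned to an exact version, it can help to confirm that the installed packages match after installation. The following is a small sketch using only the standard library. `parse_pins` and `check_pins` are hypothetical helper names, and the two example pins are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins

def check_pins(pins):
    """Return {name: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches

example_lines = ["# Requirements", "torch==1.8.1", "timm==0.4.12", ""]
pins = parse_pins(example_lines)
for name, (pinned, installed) in check_pins(pins).items():
    print(f"{name}: pinned {pinned}, installed {installed or 'not installed'}")
```

To check the full list, pass the lines of requirements.txt to `parse_pins` instead of `example_lines`.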

Feature extraction and computing retrieval results

cd evaluation/
bash test_xyb.sh
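The internals of the evaluation script are not shown here. As a rough illustration of how retrieval results are commonly scored, the sketch below computes Recall@K from a query-by-candidate similarity matrix. `recall_at_k` is a hypothetical helper, the toy matrix is made up, and the metric is an assumption rather than a description of what test_xyb.sh reports.

```python
import numpy as np

def recall_at_k(sim, k):
    """Fraction of queries whose correct candidate appears in the top k.

    sim[i, j] is the similarity of query i and candidate j; the correct
    candidate for query i is assumed to be candidate i.
    """
    topk = np.argsort(-sim, axis=1)[:, :k]
    targets = np.arange(sim.shape[0])[:, None]
    hits = (topk == targets).any(axis=1)
    return float(hits.mean())

# Toy similarity matrix for three image queries and three candidate texts.
sim = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.8],
    [0.1, 0.7, 0.6],
])

print(recall_at_k(sim, 1), recall_at_k(sim, 2))
```

In this toy matrix only the first query ranks its own candidate first, while all three place it within the top two.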

Data description

Three example image-text pairs are already included:

./data/imgs  # images
./data/jsonls # image-text pair descriptions

Citing BriVL

@article{DBLP:journals/corr/abs-2103-06561,
  author    = {Yuqi Huo and
               Manli Zhang and
               Guangzhen Liu and
               Haoyu Lu and
               Yizhao Gao and
               Guoxing Yang and
               Jingyuan Wen and
               Heng Zhang and
               Baogui Xu and
               Weihao Zheng and
               Zongzheng Xi and
               Yueqian Yang and
               Anwen Hu and
               Jinming Zhao and
               Ruichen Li and
               Yida Zhao and
               Liang Zhang and
               Yuqing Song and
               Xin Hong and
               Wanqing Cui and
               Dan Yang Hou and
               Yingyan Li and
               Junyi Li and
               Peiyu Liu and
               Zheng Gong and
               Chuhao Jin and
               Yuchong Sun and
               Shizhe Chen and
               Zhiwu Lu and
               Zhicheng Dou and
               Qin Jin and
               Yanyan Lan and
               Wayne Xin Zhao and
               Ruihua Song and
               Ji{-}Rong Wen},
  title     = {WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training},
  journal   = {CoRR},
  volume    = {abs/2103.06561},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.06561},
  archivePrefix = {arXiv},
  eprint    = {2103.06561},
  timestamp = {Tue, 03 Aug 2021 12:35:30 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2103-06561.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Owner
Wudao is a large-scale pre-training model project initiated by BAAI, aiming to break through the core technology and promote the development of AGI.