CLIPfa: Connecting Farsi Text and Images

Overview

OpenAI released the paper Learning Transferable Visual Models From Natural Language Supervision, in which they present the CLIP (Contrastive Language–Image Pre-training) model. This model is trained to connect text and images by matching their corresponding vector representations using a contrastive learning objective. CLIP consists of two separate models, a vision encoder and a text encoder, which were trained on a whopping 400 million images and their corresponding captions. We have trained a Farsi (Persian) version of OpenAI's CLIP on a dataset of 400,000 (image, text) pairs. We used Farahani's RoBERTa-fa as the text encoder and the ViT from the original CLIP as the vision encoder, and fine-tuned both.
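To make the contrastive objective more concrete, here is a minimal sketch of a symmetric contrastive loss over a batch of matched (image, text) embeddings. It is written in plain Python to stay dependency-free and is not the training code of this repository; the function names and the temperature value of 0.07 are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Scale a vector to unit length so that dot products become cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    # Numerically stable -log(softmax(logits)[target]).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    # image_embs[i] and text_embs[i] are assumed to describe the same pair.
    images = [normalize(v) for v in image_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(images)
    # sim[i][j]: temperature-scaled cosine similarity of image i and text j.
    sim = [[dot(img, txt) / temperature for txt in texts] for img in images]
    # Image-to-text direction: row i should assign the most mass to column i.
    loss_i2t = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    # Text-to-image direction: column j should assign the most mass to row j.
    loss_t2i = sum(
        cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (loss_i2t + loss_t2i) / 2

# Toy 2-D embeddings: aligned pairs give a lower loss than swapped pairs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Minimizing this loss pulls each image embedding toward its own caption and pushes it away from the other captions in the batch, which is what lets the two encoders share one vector space.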

It should be noted that only 400K pairs were used for this training, whereas 400 million pairs were used for the original CLIP. In addition, training the original CLIP took 30 days across 592 V100 GPUs.

How to use?

Both models generate vectors with 768 dimensions.

from PIL import Image
from transformers import CLIPVisionModel, RobertaModel, AutoTokenizer, CLIPFeatureExtractor
# download pre-trained models
vision_encoder = CLIPVisionModel.from_pretrained('SajjadAyoubi/clip-fa-vision')
preprocessor = CLIPFeatureExtractor.from_pretrained('SajjadAyoubi/clip-fa-vision')
text_encoder = RobertaModel.from_pretrained('SajjadAyoubi/clip-fa-text')
tokenizer = AutoTokenizer.from_pretrained('SajjadAyoubi/clip-fa-text')
# define input image and input text
text = 'something'
image = Image.open('my_favorite_image.jpg')
# compute embeddings
text_embedding = text_encoder(**tokenizer(text, return_tensors='pt')).pooler_output
image_embedding = vision_encoder(**preprocessor(image, return_tensors='pt')).pooler_output
# both embeddings have the same shape
assert text_embedding.shape == image_embedding.shape

Demo:

The following are just some use cases of CLIPfa on 25K Unsplash images:

  • use pip install -q git+https://github.com/sajjjadayobi/clipfa.git
from clipfa import CLIPDemo
demo = CLIPDemo(vision_encoder, text_encoder, tokenizer)
demo.compute_text_embeddings(['گاو' ,'اسب' ,'ماهی'])
demo.compute_image_embeddings(test_df.image_path.to_list())

Image Search:

demo.image_search(query='غروب خورشید')

demo.image_search(query='جنگل در زمستان برفی')
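Conceptually, a search like this ranks precomputed image embeddings by their similarity to the embedding of the text query. The sketch below illustrates that idea with cosine similarity on toy 2-D vectors; it is not the implementation inside CLIPDemo, and the `search` function, the file names, and the vectors are all hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def search(query_emb, image_embs, top_k=3):
    # image_embs maps an image path to its precomputed embedding.
    scored = [(cosine(query_emb, emb), path) for path, emb in image_embs.items()]
    # Highest similarity first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in scored[:top_k]]

# Toy index: real embeddings would have 768 dimensions.
index = {
    'sunset.jpg': [0.9, 0.1],
    'forest.jpg': [0.1, 0.9],
    'beach.jpg': [0.7, 0.3],
}
print(search([1.0, 0.0], index, top_k=2))
```

Because the image embeddings are computed once up front, each query only costs one text-encoder pass plus a similarity scan over the index.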

Analogy:

demo.anology('sunset.jpg', additional_text='دریا')

demo.anology('sunset.jpg', additional_text='برف')

Zero Shot Image Classification:

demo.zero_shot(image_path='apples.jpg')
  • Provided labels with their probability (%) for each image:
    Image 1: گاو: 36, ماهی: 22, اسب: 42
    Image 2: گاو: 41, ماهی: 23, اسب: 36
    Image 3: گاو: 26, ماهی: 45, اسب: 27
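One common way to turn image-to-label similarities into probabilities like the ones above is a softmax over scaled similarity scores. The following is a minimal sketch under that assumption; it may differ from what CLIPDemo does internally, and the `zero_shot_probs` name, the scale factor of 100, and the similarity values are hypothetical. The ASCII labels stand in for the Persian labels used in the demo.

```python
import math

def zero_shot_probs(similarities, scale=100.0):
    # similarities maps each candidate label to the cosine similarity
    # between the image embedding and that label's text embedding.
    logits = [scale * s for s in similarities.values()]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(similarities, exps)}

# Hypothetical similarities for a single image.
probs = zero_shot_probs({'cow': 0.21, 'horse': 0.25, 'fish': 0.20})
best = max(probs, key=probs.get)
print(best, round(sum(probs.values()), 6))
```

The probabilities always sum to one across the provided labels, so they are only meaningful relative to that label set.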

Online Demo: CLIPfa at Huggingface 🤗 spaces

We used a small set of images (25K) to keep this app almost real-time, but it's obvious that the quality of image search depends heavily on the size of the image database.

Dataset: 400K

We started with this question: how much does the original CLIP model depend on its large training dataset, which contains a wide range of concepts? Our model shows that it is possible to reach an acceptable result with only a small amount of data, even though it may not have seen enough concepts and subjects to be used widely. Our model was trained on a dataset gathered from different sources such as Flickr30k, MS-COCO 2017, Google CC3M, and others. We translated these datasets into Persian with a tool we built ourselves. Using Google Translate and a multilingual similarity check, we built an automatic translator that takes a list of English captions and keeps only the best translations.

  • Note: We used img2dataset, a great tool for downloading large-scale image datasets such as MS-COCO. It can download, resize, and package 100M URLs in 20 hours on one machine, and it also supports saving captions for URL+caption datasets.
  • coco-flickr-fa 130K on Kaggle
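The translate-then-filter step described in this section can be sketched as follows. This is only an illustration of the idea, not the tool we used: `filter_translations`, the `similarity` callback, the threshold of 0.8, and the toy scores are all hypothetical. In a real pipeline the callback would compare multilingual sentence embeddings of the English caption and its Persian translation.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (English caption, Persian translation)

def filter_translations(
    pairs: Iterable[Pair],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,
) -> List[Pair]:
    # Keep a translation only if it stays semantically close to its source.
    kept = []
    for english, persian in pairs:
        if similarity(english, persian) >= threshold:
            kept.append((english, persian))
    return kept

# Toy stand-in for a multilingual similarity model: precomputed scores.
toy_scores = {
    ('a dog on the beach', 'سگی در ساحل'): 0.93,
    ('a red car', 'یک درخت بلند'): 0.18,
}

def toy_similarity(english: str, persian: str) -> float:
    return toy_scores[(english, persian)]

good_pairs = filter_translations(toy_scores.keys(), toy_similarity)
print(len(good_pairs))
```

Raising the threshold trades dataset size for translation quality, which matters when the dataset is as small as a few hundred thousand pairs.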

Training:

The training code can be used with any dataset with little change. CLIPfa can be trained with other encoders as long as they have the same hidden size at the last layer. In this notebook I used the training code to train a small CLIP on the translated Flickr30k dataset.
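Before swapping in different encoders, it can help to check the hidden-size requirement up front. The sketch below assumes Hugging Face style models that expose `config.hidden_size`; the `check_compatible` helper is hypothetical and not part of the training code, and the stand-in objects only mimic that one attribute so the example runs without downloading any model.

```python
from types import SimpleNamespace

def check_compatible(vision_encoder, text_encoder):
    # Both encoders must produce pooled outputs of the same width,
    # otherwise their embeddings cannot be compared directly.
    vision_size = vision_encoder.config.hidden_size
    text_size = text_encoder.config.hidden_size
    if vision_size != text_size:
        raise ValueError(
            f'hidden sizes differ: vision={vision_size}, text={text_size}'
        )
    return vision_size

def fake_encoder(hidden_size):
    # Stand-in for a real model; only carries config.hidden_size.
    return SimpleNamespace(config=SimpleNamespace(hidden_size=hidden_size))

print(check_compatible(fake_encoder(768), fake_encoder(768)))
```

With real models, the same call would take the loaded vision and text encoders instead of the stand-ins.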

Citation: ↩️

If you have a technical question regarding the model, code, or publication, create an issue in the repository. We haven't published any papers on this work. However, if you use it, please cite us properly with an entry like the one below.

@misc{ParsBigBird,
  author          = {Sajjad Ayoubi and Navid Kanaani},
  title           = {CLIPfa: Connecting Farsi Text and Images},
  year            = 2021,
  publisher       = {GitHub},
  journal         = {GitHub repository},
  howpublished    = {\url{https://github.com/SajjjadAyobi/CLIPfa}},
}

Made with ❤️ in my basement 🤫
