Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"

Last update: Oct 08, 2022

Related tags

Computer Vision CSCBLI

Overview

CSCBLI

Code for our ACL Findings 2021 paper,
"Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction".

Requirements

python >= 3.6
numpy >= 1.9.0
pytorch >= 1.0

Supervised

How to train

CUDA_VISIBLE_DEVICES=0 python train.py --src_lang $lg --tgt_lang en\
        --static_src_emb_path $ssemb --static_tgt_emb_path $stemb\
        --context_src_emb_path $csemb --context_tgt_emb_path $ctemb\
        --train_data_path $data_path --save_path $save_path

--static_src_emb_path   aligned source static embedding path 
--static_tgt_emb_path   aligned target static embedding path
--context_src_emb_path  source context embedding path
--context_tgt_emb_path  target context embedding path

How to Test

CUDA_VISIBLE_DEVICES=0 python test_on_all_word.py --src_lang $lg\
        --tgt_lang en --model_path $model_path\
        --dict_path $dict_path\
        --vecmap_context_src_emb_path $vcpath\
        --vecmap_context_tgt_emb_path $vspath\
        --vecmap

--vecmap_context_src_emb_path aligned source context embedding path
--vecmap_context_tgt_emb_path aligned target context embedding path
--vecmap use interpolation method, else unified method

Unsupervised

How to train

lg=ar
CUDA_VISIBLE_DEVICES=0 python train.py --src_lang en --tgt_lang $lg\
  --static_src_emb_path $ssemb --static_tgt_emb_path $stemb\
  --context_src_emb_path $csemb --context_tgt_emb_path $ctemb\
   --save_path $save_path

--static_src_emb_path   aligned source static embedding path 
--static_tgt_emb_path   aligned target static embedding path
--context_src_emb_path  source context embedding path
--context_tgt_emb_path  target context embedding path

How to Test

src=ar
tgt=en
model_path=../checkpoints/$src-$tgt-add_orign_nw.pkl_last
CUDA_VISIBLE_DEVICES=0 python test.py  --model_path $model_path \
        --dict_path ../$src-$tgt.5000-6500.txt  --mode v2 \
        --src_lang $src --tgt_lang $tgt  \
        --reload_src_ctx   $path1 \
        --reload_tgt_ctx   $path2 --lambda_w1 0.11

--mode type    use v1 for unified method and v2 for interpolated 
--lambda_w1    the weight for interpolation
--reload_src_ctx   aligned source context embedding
--reload_tgt_ctx   aligned targte context embedding

Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"

Related tags

Overview

CSCBLI

Requirements

Supervised

How to train

How to Test

Unsupervised

How to train

How to Test

Owner

Jinpeng Zhang

Official code for :rocket: Unsupervised Change Detection of Extreme Events Using ML On-Board :rocket:

"Very simple but works well" Computer Vision based ID verification solution provided by LibraX.

An Implementation of the FOTS: Fast Oriented Text Spotting with a Unified Network

A curated list of promising OCR resources

This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vectors.

SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition

A selectional auto-encoder approach for document image binarization

Opencv face recognition desktop application

Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB

Convert PDF/Image to TXT using EasyOcr - the best OCR engine available!

天池2021"全球人工智能技术创新大赛"【赛道一】：医学影像报告异常检测 - 第三名解决方案

A novel region proposal network for more general object detection ( including scene text detection ).

A Screen Translator/OCR Translator made by using Python and Tesseract, the user interface are made using Tkinter. All code written in python.

The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Fun program to overlay a mask to yourself using a webcam

Layout Analysis Evaluator for the ICDAR 2017 competition on Layout Analysis for Challenging Medieval Manuscripts

Awesome multilingual OCR toolkits based on PaddlePaddle （practical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices）

QED-C: The Quantum Economic Development Consortium provides these computer programs and software for use in the fields of quantum science and engineering.

This is a c++ project deploying a deep scene text reading pipeline with tensorflow. It reads text from natural scene images. It uses frozen tensorflow graphs. The detector detect scene text locations. The recognizer reads word from each detected bounding box.

Automatically resolve RidderMaster based on TensorFlow & OpenCV