Partially offline multi-language translator built upon Huggingface transformers.

Overview

Translate

Command-line interface to translation pipelines, powered by Huggingface transformers. This tool can download translation models, and then using them to translate sentences offline. By default, tries using models from Helsinki-NLP (each model is about 300MB large).

Install

$ git clone https://github.com/Teuze/translate
$ cd translate
$ pip3 install --user -r requirements.py

If you want to be able to use this script from anywhere in your system, you can symlink or copy the translate script file into one of your path folders, like for example $HOME/.local/bin.

Usage

Listing available and installed translation models :

$ # Also available on https://huggingface.co/models
$ ./translate model list online | less
$ ./translate model list local | less

Downloading models :

$ ./translate download model "Helsinki-NLP/opus-mt-en-es"
$ ./translate download model "Helsinki-NLP/opus-mt-fr-en"

Using models to translate from CLI arguments or from standard input :

$ ./translate text -e "Helsinki-NLP/opus-mt-en-es" "Hello World!"
¡Hola Mundo!
$ echo "Ceci est une phrase d'exemple simple" | ./translate text -s fr -t en
This is a simple example sentence
Owner
Richard Jarry
Richard Jarry
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar provides

ASYML 2.3k Jan 07, 2023
Syntax-aware Multi-spans Generation for Reading Comprehension (TASLP 2022)

SyntaxGen Syntax-aware Multi-spans Generation for Reading Comprehension (TASLP 2022) In this repo, we upload all the scripts for this work. Due to siz

Zhuosheng Zhang 3 Jun 13, 2022
Beautiful visualizations of how language differs among document types.

Scattertext 0.1.0.0 A tool for finding distinguishing terms in corpora and displaying them in an interactive HTML scatter plot. Points corresponding t

Jason S. Kessler 2k Dec 27, 2022
Codes to pre-train Japanese T5 models

t5-japanese Codes to pre-train a T5 (Text-to-Text Transfer Transformer) model pre-trained on Japanese web texts. The model is available at https://hug

Megagon Labs 37 Dec 25, 2022
🗣️ NALP is a library that covers Natural Adversarial Language Processing.

NALP: Natural Adversarial Language Processing Welcome to NALP. Have you ever wanted to create natural text from raw sources? If yes, NALP is for you!

Gustavo Rosa 21 Aug 12, 2022
A toolkit for document-level event extraction, containing some SOTA model implementations

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker Source code for ACL-IJCNLP 2021 Long paper: Document-le

84 Dec 15, 2022
RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2

RoNER RoNER is a Named Entity Recognition model based on a pre-trained BERT transformer model trained on RONECv2. It is meant to be an easy to use, hi

Stefan Dumitrescu 9 Nov 07, 2022
Topic Modelling for Humans

gensim – Topic Modelling in Python Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Targ

RARE Technologies 13.8k Jan 02, 2023
Graphical user interface for Argos Translate

Argos Translate GUI Website | GitHub | PyPI Graphical user interface for Argos Translate. Install pip3 install argostranslategui

Argos Open Tech 16 Dec 07, 2022
Write Alphabet, Words and Sentences with your eyes.

The-Next-Gen-AI-Eye-Writer The Eye tracking Technique has become one of the most popular techniques within the human and computer interaction era, thi

Rohan Kasabe 2 Apr 05, 2022
scikit-learn wrappers for Python fastText.

skift scikit-learn wrappers for Python fastText. from skift import FirstColFtClassifier df = pandas.DataFrame([['woof', 0], ['meow', 1]], colu

Shay Palachy 233 Sep 09, 2022
Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together

SpeechMix Explore different way to mix speech model(wav2vec2, hubert) and nlp model(BART,T5,GPT) together. Introduction For the same input: from datas

Eric Lam 31 Nov 07, 2022
A curated list of efficient attention modules

awesome-fast-attention A curated list of efficient attention modules

Sepehr Sameni 891 Dec 22, 2022
Code for ACL 2020 paper "Rigid Formats Controlled Text Generation"

SongNet SongNet: SongCi + Song (Lyrics) + Sonnet + etc. @inproceedings{li-etal-2020-rigid, title = "Rigid Formats Controlled Text Generation",

Piji Li 212 Dec 17, 2022
A highly sophisticated sequence-to-sequence model for code generation

CoderX A proof-of-concept AI system by Graham Neubig (June 30, 2021). About CoderX CoderX is a retrieval-based code generation AI system reminiscent o

Graham Neubig 39 Aug 03, 2021
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 15.3k Dec 30, 2022
A minimal code for fairseq vq-wav2vec model inference.

vq-wav2vec inference A minimal code for fairseq vq-wav2vec model inference. Runs without installing the fairseq toolkit and its dependencies. Usage ex

Vladimir Larin 7 Nov 15, 2022
2021海华AI挑战赛·中文阅读理解·技术组·第三名

文字是人类用以记录和表达的最基本工具,也是信息传播的重要媒介。透过文字与符号,我们可以追寻人类文明的起源,可以传播知识与经验,读懂文字是认识与了解的第一步。对于人工智能而言,它的核心问题之一就是认知,而认知的核心则是语义理解。

21 Dec 26, 2022
Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources (NAACL-2021).

Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources Description This is the repository for the paper Unifying Cross-

Sapienza NLP group 16 Sep 09, 2022
Extract rooms type, door, neibour rooms, rooms corners nad bounding boxes, and generate graph from rplan dataset

Housegan-data-reader House-GAN++ (data-reader) Code and instructions for converting rplan dataset (raster images) to housegan++ data format. House-GAN

Sepid Hosseini 13 Nov 24, 2022