Easy to use, state-of-the-art Neural Machine Translation for 100+ languages

Overview

EasyNMT - Easy to use, state-of-the-art Neural Machine Translation

This package provides easy-to-use, state-of-the-art machine translation for more than 100 languages. The highlights of this package are:

  • Easy installation and usage: Use state-of-the-art machine translation with 3 lines of code
  • Automatic download of pre-trained machine translation models
  • Translation between 150+ languages
  • Automatic language detection for 170+ languages
  • Sentence and document translation
  • Multi-GPU and multi-process translation

At the moment, we provide the Opus-MT, mBART50 and M2M_100 models, which are described under Available Models.

Installation

You can install the package via:

pip install -U easynmt

The models are based on PyTorch. If you have a GPU available, install PyTorch with GPU support first. If you use Windows and have issues with the installation, see the related issue in the issue tracker for a solution.

Usage

The usage is simple:

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

#Translate a single sentence to German
print(model.translate('This is a sentence we want to translate to German', target_lang='de'))

#Translate several sentences to German
sentences = ['You can define a list with sentences.',
             'All sentences are translated to your target language.',
             'Note, you could also mix the languages of the sentences.']
print(model.translate(sentences, target_lang='de'))

Document Translation

The available models are based on the Transformer architecture, which provides state-of-the-art translation quality. However, the input length is limited to 512 word pieces for the opus-mt model and 1024 word pieces for the M2M models.

The translate() method performs automatic sentence splitting so that longer documents can also be translated:

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

document = """Berlin is the capital and largest city of Germany by both area and population.[6][7] Its 3,769,495 inhabitants as of 31 December 2019[2] make it the most-populous city of the European Union, according to population within city limits.[8] The city is also one of Germany's 16 federal states. It is surrounded by the state of Brandenburg, and contiguous with Potsdam, Brandenburg's capital. The two cities are at the center of the Berlin-Brandenburg capital region, which is, with about six million inhabitants and an area of more than 30,000 km2,[9] Germany's third-largest metropolitan region after the Rhine-Ruhr and Rhine-Main regions. Berlin straddles the banks of the River Spree, which flows into the River Havel (a tributary of the River Elbe) in the western borough of Spandau. Among the city's main topographical features are the many lakes in the western and southeastern boroughs formed by the Spree, Havel, and Dahme rivers (the largest of which is Lake Müggelsee). Due to its location in the European Plain, Berlin is influenced by a temperate seasonal climate. About one-third of the city's area is composed of forests, parks, gardens, rivers, canals and lakes.[10] The city lies in the Central German dialect area, the Berlin dialect being a variant of the Lusatian-New Marchian dialects.

First documented in the 13th century and at the crossing of two important historic trade routes,[11] Berlin became the capital of the Margraviate of Brandenburg (1417–1701), the Kingdom of Prussia (1701–1918), the German Empire (1871–1918), the Weimar Republic (1919–1933), and the Third Reich (1933–1945).[12] Berlin in the 1920s was the third-largest municipality in the world.[13] After World War II and its subsequent occupation by the victorious countries, the city was divided; West Berlin became a de facto West German exclave, surrounded by the Berlin Wall (1961–1989) and East German territory.[14] East Berlin was declared capital of East Germany, while Bonn became the West German capital. Following German reunification in 1990, Berlin once again became the capital of all of Germany.

Berlin is a world city of culture, politics, media and science.[15][16][17][18] Its economy is based on high-tech firms and the service sector, encompassing a diverse range of creative industries, research facilities, media corporations and convention venues.[19][20] Berlin serves as a continental hub for air and rail traffic and has a highly complex public transportation network. The metropolis is a popular tourist destination.[21] Significant industries also include IT, pharmaceuticals, biomedical engineering, clean tech, biotechnology, construction and electronics."""

#Translate the document to German
print(model.translate(document, target_lang='de'))

The function breaks down the document into sentences and then translates the sentences individually using the specified model.
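
The sentence-by-sentence approach described above can be sketched as follows. This is a minimal illustration and not EasyNMT's implementation: split_sentences uses a naive regular expression, translate_document and the translate_sentence callback are hypothetical names, and a stand-in translator is used so that the snippet runs without downloading a model.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]


def translate_document(document: str,
                       translate_sentence: Callable[[str], str]) -> str:
    """Translate each paragraph sentence by sentence and re-join the result."""
    translated_paragraphs = []
    for paragraph in document.split('\n\n'):
        sentences = split_sentences(paragraph)
        translated = [translate_sentence(s) for s in sentences]
        translated_paragraphs.append(' '.join(translated))
    return '\n\n'.join(translated_paragraphs)


# Stand-in for a real model call (assumption, for illustration only).
def fake_translate(sentence: str) -> str:
    return '[de] ' + sentence


print(translate_document('Berlin is a city. It is large.\n\nIt has many lakes.',
                         fake_translate))
```

Because every sentence is translated on its own, each input stays well below the model's word-piece limit, at the cost of losing context across sentence boundaries.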

Automatic Language Detection

You can set source_lang in the translate method to define the source language. If source_lang is not set, fastText is used to determine the source language automatically. This also allows you to pass a list of sentences or documents written in different languages:

from easynmt import EasyNMT
model = EasyNMT('opus-mt')

#Translate several sentences to English
sentences = ['Dies ist ein Satz in Deutsch.',   #This is a German sentence
             '这是一个中文句子',    #This is a Chinese sentence
             'Esta es una oración en español.'] #This is a Spanish sentence
print(model.translate(sentences, target_lang='en'))
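
One way to picture how a mixed-language list can be handled: detect the language of each sentence, translate sentences of the same language together, and put the results back in input order. The sketch below assumes exactly that flow. The names translate_mixed, detect_language and translate_batch are hypothetical, and the toy detector and translator are stand-ins for fastText and a real model, so this does not describe EasyNMT's internals.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def translate_mixed(sentences: List[str],
                    detect_language: Callable[[str], str],
                    translate_batch: Callable[[List[str], str], List[str]]) -> List[str]:
    """Group sentences by detected language, translate per group, keep order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        groups[detect_language(sentence)].append(idx)

    output = [''] * len(sentences)
    for lang, indices in groups.items():
        translated = translate_batch([sentences[i] for i in indices], lang)
        for i, text in zip(indices, translated):
            output[i] = text
    return output


# Toy stand-ins so the example runs without any model (assumptions).
def toy_detect(sentence: str) -> str:
    return 'de' if 'ist' in sentence.split() else 'es'


def toy_translate(batch: List[str], source_lang: str) -> List[str]:
    return [f'({source_lang}->en) {s}' for s in batch]


print(translate_mixed(['Dies ist ein Satz.', 'Esta es una oración.', 'Das ist gut.'],
                      toy_detect, toy_translate))
```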

Available Models

The following models are currently available. They provide translations between 150+ languages.

Model        | Reference         | #Languages | Size   | Speed GPU (Sentences/Sec on V100) | Speed CPU (Sentences/Sec) | Comment
opus-mt      | Helsinki-NLP      | 186        | 300 MB | 53                                | 6                         | Individual models (~300 MB) per translation direction
mbart50_m2m  | Facebook Research | 52         | 1.2 GB | 35                                | 0.9                       |
m2m_100_418M | Facebook Research | 100        | 0.9 GB | 39                                | 1.1                       |
m2m_100_1.2B | Facebook Research | 100        | 2.4 GB | 23                                | 0.5                       |
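
To get a feel for the throughput figures in the table, the following sketch converts them into a rough running time for a given number of sentences. The numbers are copied from the table; estimate_seconds is a hypothetical helper, and real throughput will depend on hardware, sentence length and batch size.

```python
# Sentences per second, copied from the table: (GPU on V100, CPU).
SPEED = {
    'opus-mt': (53, 6),
    'mbart50_m2m': (35, 0.9),
    'm2m_100_418M': (39, 1.1),
    'm2m_100_1.2B': (23, 0.5),
}


def estimate_seconds(model_name: str, num_sentences: int, device: str = 'gpu') -> float:
    """Rough running time: number of sentences divided by sentences/sec."""
    gpu_speed, cpu_speed = SPEED[model_name]
    speed = gpu_speed if device == 'gpu' else cpu_speed
    return num_sentences / speed


for name in SPEED:
    minutes = estimate_seconds(name, 10_000, device='cpu') / 60
    print(f'{name}: about {minutes:.0f} min for 10,000 sentences on CPU')
```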

Translation Quality

A comparison of translation quality across the models will be added here soon. So far, my subjective impression is that opus-mt and m2m_100_1.2B yield the best translations.

Opus-MT

We provide a wrapper for the pre-trained models from Opus-MT.

Opus-MT provides 1200+ different translation models, each capable of translating in one direction (e.g. from German to English). Each model is about 300 MB in size.

Supported languages: aav, aed, af, alv, am, ar, art, ase, az, bat, bcl, be, bem, ber, bg, bi, bn, bnt, bzs, ca, cau, ccs, ceb, cel, chk, cpf, crs, cs, csg, csn, cus, cy, da, de, dra, ee, efi, el, en, eo, es, et, eu, euq, fi, fj, fr, fse, ga, gaa, gil, gl, grk, guw, gv, ha, he, hi, hil, ho, hr, ht, hu, hy, id, ig, ilo, is, iso, it, ja, jap, ka, kab, kg, kj, kl, ko, kqn, kwn, kwy, lg, ln, loz, lt, lu, lua, lue, lun, luo, lus, lv, map, mfe, mfs, mg, mh, mk, mkh, ml, mos, mr, ms, mt, mul, ng, nic, niu, nl, no, nso, ny, nyk, om, pa, pag, pap, phi, pis, pl, pon, poz, pqe, pqw, prl, pt, rn, rnd, ro, roa, ru, run, rw, sal, sg, sh, sit, sk, sl, sm, sn, sq, srn, ss, ssp, st, sv, sw, swc, taw, tdt, th, ti, tiv, tl, tll, tn, to, toi, tpi, tr, trk, ts, tum, tut, tvl, tw, ty, tzo, uk, umb, ur, ve, vi, vsl, wa, wal, war, wls, xh, yap, yo, yua, zai, zh, zne

Usage:

from easynmt import EasyNMT
model = EasyNMT('opus-mt', max_loaded_models=10)

The system will automatically detect the suitable Opus-MT model and load it. With the optional parameter max_loaded_models you can specify the maximum number of models that are loaded simultaneously. If you then translate with an unseen language direction, the oldest model is unloaded and the new model is loaded.
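
The eviction behaviour described above can be sketched with an ordered dictionary. This is an illustration only: ModelCache and its loader callback are hypothetical names, and "oldest" is interpreted here as least recently used, which is an assumption rather than a statement about EasyNMT's code.

```python
from collections import OrderedDict
from typing import Any, Callable


class ModelCache:
    """Keep at most max_loaded_models models; evict the least recently used."""

    def __init__(self, loader: Callable[[str], Any], max_loaded_models: int = 10):
        self.loader = loader
        self.max_loaded_models = max_loaded_models
        self.models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, direction: str) -> Any:
        if direction in self.models:
            # Mark as most recently used.
            self.models.move_to_end(direction)
            return self.models[direction]
        if len(self.models) >= self.max_loaded_models:
            # Unload the model that has gone unused the longest.
            self.models.popitem(last=False)
        self.models[direction] = self.loader(direction)
        return self.models[direction]


cache = ModelCache(loader=lambda d: f'model<{d}>', max_loaded_models=2)
cache.get('de-en')
cache.get('en-de')
cache.get('de-en')   # 'de-en' becomes the most recently used entry
cache.get('fr-en')   # cache is full, so 'en-de' is evicted
print(list(cache.models))
```

A small limit keeps memory use bounded, while a larger one avoids reloading models when many language directions alternate.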

mBART_50

We provide a wrapper for the mBART50 model from Facebook, which is able to translate between any pair of 50+ languages.

Usage:

from easynmt import EasyNMT
model = EasyNMT('mbart50_m2m')

Supported languages: af, ar, az, bn, cs, de, en, es, et, fa, fi, fr, gl, gu, he, hi, hr, id, it, ja, ka, kk, km, ko, lt, lv, mk, ml, mn, mr, my, ne, nl, pl, ps, pt, ro, ru, si, sl, sv, sw, ta, te, th, tl, tr, uk, ur, vi, xh, zh

M2M_100

We provide a wrapper for the M2M 100 model from Facebook, which is able to translate between any pair of 100 languages.

Supported languages: af, am, ar, ast, az, ba, be, bg, bn, br, bs, ca, ceb, cs, cy, da, de, el, en, es, et, fa, ff, fi, fr, fy, ga, gd, gl, gu, ha, he, hi, hr, ht, hu, hy, id, ig, ilo, is, it, ja, jv, ka, kk, km, kn, ko, lb, lg, ln, lo, lt, lv, mg, mk, ml, mn, mr, ms, my, ne, nl, no, ns, oc, or, pa, pl, ps, pt, ro, ru, sd, si, sk, sl, so, sq, sr, ss, su, sv, sw, ta, th, tl, tn, tr, uk, ur, uz, vi, wo, xh, yi, yo, zh, zu

At the moment, we provide wrappers for two M2M 100 models:

  • m2m_100_418M: M2M model with 418 million parameters (0.9 GB)
  • m2m_100_1.2B: M2M model with 1.2 billion parameters (2.4 GB)

Usage:

from easynmt import EasyNMT
model = EasyNMT('m2m_100_418M')   #or: EasyNMT('m2m_100_1.2B') 

You can find more information here. Note: the M2M model with 12 billion parameters is currently not supported.

As soon as you call EasyNMT('m2m_100_418M') / EasyNMT('m2m_100_1.2B'), the respective model is downloaded and cached locally.

Author

Contact person: Nils Reimers; [email protected]

https://www.ukp.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue, if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software to encourage future research.

Comments
  • Missing supported translate pair with M2M_100 model

    Hi, I found that M2M_100 supports direct translation between any pair of its 100 languages (9900 pairs). But when I use EasyNMT with the M2M_100 model, it doesn't support all of these pairs.

    Example: EasyNMT can't translate directly from 'th' (Thai) to 'en' (English), although the M2M_100 model does support this pair.

    And when I tried to use HuggingFace to translate directly between Thai and English, it worked perfectly.

    Can you please solve the problem? By the way, thank you for creating EasyNMT.

    opened by nguyenhuuthuat09 12
  • Can't access other models in docker image

    Hi,

    I'm sorry for this noobish question/issue and maybe it is easy to resolve (I'm not experienced with docker). I've built a web app which uses EasyNMT in the back via the docker images and REST. When translating from Romanian to German I noticed that the docker image is only using the opus model, which does not provide this language direction. But when executing the "/model_name" request it shows me only "opus" as part of the docker image.

    So how can I get the other models? I have 3 docker images of easynmt (one with 7.7gb, one with 6.02 and one with 3.8 gb size) but it seems none of them contains the other models. Am I doing something wrong here? And also when they are part of the image, is there some kind of auto selection if a language is not available in one of the packages?

    I installed the docker images via the "build-docker-hub.sh" file.

    Best regards, André

    opened by 4quen 5
  • Library not translating, just returning input

    Hello I am running the following code

    from easynmt import EasyNMT
    
    model = EasyNMT('opus-mt')
    
    print(model.translate("停", target_lang='en'))
    

    The result of the code is just "停", which is exactly the same as the input. How can I fix this?

    opened by geekjr 5
  • Can this project support num-beams in opus-mt model ?

    I found a similar project called ktrain that supports this, located at https://github.com/amaiya/ktrain/blob/5c9c6b333115be44433639c4bc4c091bd79ab65c/ktrain/text/translation/core.py. It would also be interesting to have some accuracy measurements to support the conclusion. Could multilingual sentence embeddings help here?

    opened by svjack 5
  • AttributeError: 'float' object has no attribute 'split'

    Hi Team, I have a question. I am trying to translate a column which has blanks in between. I am using EasyNMT and it's giving an error. Won't it work if there are blank or missing values between the rows of a column?

    Thanks Srinivas

    opened by sriprad 4
  • Sending large documents for translation with GET endpoint can sometimes result in URL parser error

    With large documents (several thousand characters), I see a crash in the URL parser. Tried with different source and target languages. Also, for the same source text, the translation may sometimes succeed, but then fail again. For example, for the sample request URL given below, the URL parser may sometimes succeed and sometimes fail with exception.

    I'm using the last 1.x commit of EasyNMT (specifically commit 61fcf7154f01f56c02be6d30b1c5d0921b91aa2e) as it has better benchmarks than 2.x for fairseq models, but I believe the same issue should be there for the latest version too as I don't think the URL parser would have changed. I'm using the m2m_100_418M model with a T4 GPU if that matters at all.

    EasyNMT error logs:

    [2021-05-25 14:58:29 +0000] [16] [WARNING] Invalid HTTP request received.
    Traceback (most recent call last):
      File "httptools/parser/parser.pyx", line 245, in httptools.parser.parser.cb_on_url
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 216, in on_url
        parsed_url = httptools.parse_url(url)
      File "httptools/parser/parser.pyx", line 468, in httptools.parser.parser.parse_url
    httptools.parser.errors.HttpParserInvalidURLError: invalid url b'ividual+level+we+need+to+take+up+responsibility+to+curb+the+spread+of+this+virus%2C%22+said+Dr+Rao%0D%0A%0D%0APromotedListen+to+the+latest+songs%2C+only+on+JioSaavn.com%0D%0A%0D%0AWhen+it+comes+to+Bengaluru%2C+Dr+Rao+said+the+lockdown+had+reduced+the+number+of+emergency+oxygen+requirements+and+the+panic.+%22That+is+because+the+virus+has+stopped+moving+because+we+have+stopped+moving%2C%22+he+said.+%22Generally%2C+as+a+rule%2C+a+health+care+system+will+not+be+able+to+cope+with+a+sudden+rise+in+numbers%2C+emergency+oxygen+requirements+or+health+care.+The+other+big+concern+is+trained+manpower.%22%0D%0A%0D%0AComments%0D%0AMucormycosis%2C+commonly+known+as+Black+Fungus%2C+is+also+on+the+rise+in+the+state.+Dr+Rao+said%3A+%22At+HCG+we+are+treating+30+cases+and+the+number+is+on+the+rise.+In+Karnataka%2C+currently%2C+it+must+be+about+700+cases.+It+looks+like+an+epidemic+within+a+pandemic+at+this+juncture.+We+need+to+understand+the+source+of+this+infection%2C+have+early+detection+and+treatment.+A+committee+will+give+a+clear+strategy+for+the+state.+We+don%27t+need+to+scare+people+about+black+fungus%2C+we+need+to+create+awareness.+What+we+have+seen+in+the+patients+-+they+have+all+been+Covid+positive%2C+most+have+been+given+steroids%2C+majority+had+high+sugar.+30+to+40+per+cent+had+been+given+oxygen+and+most+important+-+none+of+them+had+been+vaccinated.%22%0D%0A%0D%0A'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 167, in data_received
        self.parser.feed_data(data)
      File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
    httptools.parser.errors.HttpParserCallbackError: the on_url callback failed
    [2021-05-25 14:58:29 +0000] [18] [WARNING] Invalid HTTP request received.
    Traceback (most recent call last):
      File "httptools/parser/parser.pyx", line 245, in httptools.parser.parser.cb_on_url
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 216, in on_url
        parsed_url = httptools.parse_url(url)
      File "httptools/parser/parser.pyx", line 468, in httptools.parser.parser.parse_url
    httptools.parser.errors.HttpParserInvalidURLError: invalid url b'irus+is+now+being+reported+more+from+rural+Karnataka+with+often+a+weak+health+infrastructure.%0D%0A%0D%0ADr+Vishal+Rao+of+the+HCG+hospitals+and+a+member+of+the+Karnataka+Covid+task+force+said%2C+%22It+is+going+to+be+an+uphill+task+as+we+move+towards+the+districts+as+the+health+care+systems+get+overburdened+there.+Even+the+oxygen+management.+In+cities%2C+we+have+the+privilege+that+oxygen+comes+to+the+doorstep+of+the+hospital.+Whereas+in+villages+and+districts%2C+hospitals+have+to+carry+their+cylinders+to+refill+them.+Public+health+experts+and+virologists+are+repeatedly+trying+to+enhance+the+surveillance+in+villages+to+ensure+we+are+better+prepared+in+villages.+This+is+the+time+to+ramp+up+the+preparation+for+villages.%22%0D%0A%0D%0AHe+also+said+that+the+lockdown+%22definitely+had+a+very+significant+impact%22+on+the+daily+infections.+%22From+50%2C000+cases+everyday%2C+today+we+are+at+around+20%2C000+odd+cases.It+is+not+a+reassurance+that+once+the+lockdown+is+lifted%2C+we+will+continue+to+have+these+low+numbers.+But+what+is+of+concern+is+that+the+positivity+rate+still+sticks+at+around+20+per+cent+and+the+mortality+has+jumped+to+about+2+per+cent.+We+need+to+understand+that+when+the+waves+flatten%2C+it+is+not+that+the+virus+is+taking+rest.+It+is+a+socio-economic+virus+and+the+more+we+improve+interactions+without+safety%2C+we+are+going+to+explode+and+expand+the+spread+of+this+virus.+At+an+ind'
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/site-packages/uvicorn/protocols/http/httptools_impl.py", line 167, in data_received
        self.parser.feed_data(data)
      File "httptools/parser/parser.pyx", line 193, in httptools.parser.parser.HttpParser.feed_data
    httptools.parser.errors.HttpParserCallbackError: the on_url callback failed
    

    Full request URL path + query params:

    /translate?beam_size=2&source_lang=en&target_lang=de&text=Bengaluru%3A+Karnataka+-+one+of+the+worst+hit+states+in+the+country+in+the+second+wave+of+COVID-19%2C+has+been+witnessing+a+slump+in+the+new+case+numbers+over+the+last+few+weeks.+The+authorities%2C+however%2C+are+of+the+view+that+It+is+far+too+early+to+relax.%0D%0AThe+number+of+COVID-19+outnumbered+fresh+infections+in+Karnataka+yet+again+on+Tuesday%2C+as+the+state+reported+38%2C224+discharges+and+22%2C758+new+cases.+Of+the+new+cases+reported+today%2C+6%2C243+were+from+Bengaluru.%0D%0A%0D%0A%22If+you+look+at+the+numbers%2C+it+has+been+reducing+very+drastically.+Except+for+a+few+districts+where+the+numbers+are+not+coming+down.+In+most+of+the+districts+and+Bengaluru%2C+the+numbers+have+come+down.+The+number+should+come+down+drastically+so+that+we+can+unlock+from+the+lockdown%2C%22+Deputy+Chief+Minister+Dr+Ashwath+Narayan+told+NDTV.%0D%0A%0D%0AThe+state+is+in+the+middle+of+a+strict+shutdown.+But+that+doesn%27t+mean+any+major+reduction+in+the+demand+for+oxygen+in+Bengaluru+as+ICU+beds+still+remain+full.%0D%0A%0D%0A%22Since+the+number+has+come+down+very+drastically+-+now+it+is+5%2C000+odd+cases+in+Bengaluru+%28daily+infections%29+-+from+when+it+had+almost+reached+25%2C000%2C+it+is+a+great+relief.+When+it+comes+to+ICU+or+ventilator%2C+however%2C+there+is+still+a+lot+of+demand%2C%22+he+said.%0D%0A%0D%0AAn+extra+concern+is+that+the+virus+is+now+being+reported+more+from+rural+Karnataka+with+often+a+weak+health+infrastructure.%0D%0A%0D%0ADr+Vishal+Rao+of+the+HCG+hospitals+and+a+member+of+the+Karnataka+Covid+task+force+said%2C+%22It+is+going+to+be+an+uphill+task+as+we+move+towards+the+districts+as+the+health+care+systems+get+overburdened+there.+Even+the+oxygen+management.+In+cities%2C+we+have+the+privilege+that+oxygen+comes+to+the+doorstep+of+the+hospital.+Whereas+in+villages+and+districts%2C+hospitals+have+to+carry+their+cylinders+to+refill+them.+Public+health+experts+and+virologists+are+repeatedly+trying+to+enhance
+the+surveillance+in+villages+to+ensure+we+are+better+prepared+in+villages.+This+is+the+time+to+ramp+up+the+preparation+for+villages.%22%0D%0A%0D%0AHe+also+said+that+the+lockdown+%22definitely+had+a+very+significant+impact%22+on+the+daily+infections.+%22From+50%2C000+cases+everyday%2C+today+we+are+at+around+20%2C000+odd+cases.It+is+not+a+reassurance+that+once+the+lockdown+is+lifted%2C+we+will+continue+to+have+these+low+numbers.+But+what+is+of+concern+is+that+the+positivity+rate+still+sticks+at+around+20+per+cent+and+the+mortality+has+jumped+to+about+2+per+cent.+We+need+to+understand+that+when+the+waves+flatten%2C+it+is+not+that+the+virus+is+taking+rest.+It+is+a+socio-economic+virus+and+the+more+we+improve+interactions+without+safety%2C+we+are+going+to+explode+and+expand+the+spread+of+this+virus.+At+an+individual+level+we+need+to+take+up+responsibility+to+curb+the+spread+of+this+virus%2C%22+said+Dr+Rao%0D%0A%0D%0APromotedListen+to+the+latest+songs%2C+only+on+JioSaavn.com%0D%0A%0D%0AWhen+it+comes+to+Bengaluru%2C+Dr+Rao+said+the+lockdown+had+reduced+the+number+of+emergency+oxygen+requirements+and+the+panic.+%22That+is+because+the+virus+has+stopped+moving+because+we+have+stopped+moving%2C%22+he+said.+%22Generally%2C+as+a+rule%2C+a+health+care+system+will+not+be+able+to+cope+with+a+sudden+rise+in+numbers%2C+emergency+oxygen+requirements+or+health+care.+The+other+big+concern+is+trained+manpower.%22%0D%0A%0D%0AComments%0D%0AMucormycosis%2C+commonly+known+as+Black+Fungus%2C+is+also+on+the+rise+in+the+state.+Dr+Rao+said%3A+%22At+HCG+we+are+treating+30+cases+and+the+number+is+on+the+rise.+In+Karnataka%2C+currently%2C+it+must+be+about+700+cases.+It+looks+like+an+epidemic+within+a+pandemic+at+this+juncture.+We+need+to+understand+the+source+of+this+infection%2C+have+early+detection+and+treatment.+A+committee+will+give+a+clear+strategy+for+the+state.+We+don%27t+need+to+scare+people+about+black+fungus%2C+we+need+to+create+awareness.+What+we+have+seen+in+the+patients+-+they+have+al
l+been+Covid+positive%2C+most+have+been+given+steroids%2C+majority+had+high+sugar.+30+to+40+per+cent+had+been+given+oxygen+and+most+important+-+none+of+them+had+been+vaccinated.%22%0D%0A%0D%0ADelhi+received+144.8+mm+rainfall+in+May+this+year%2C+the+highest+for+the+month+in+13+years%2C+according+to+the+India+Meteorological+Department+%28IMD%29.%0D%0A%22No+rain+is+predicted+in+the+next+four+to+five+days.+So%2C+this+is+the+highest+rainfall+in+May+since+2008%2C%22+Kuldeep+Srivastava%2C+the+head+of+the+IMD%27s+regional+forecasting+centre%2C+said+today.%0D%0A%0D%0AThe+Safdarjung+Observatory%2C+considered+the+official+marker+for+the+city%2C+had+recorded+21.1+mm+rainfall+last+year%2C+26.9+mm+in+2019+and+24.2+mm+in+2018.%0D%0A%0D%0AIt+had+gauged+40.5+mm+precipitation+in+2017%3B+24.3+mm+in+2016%3B+3.1+mm+in+2015+and+100.2+mm+in+2014%2C+according+to+IMD+data.%0D%0A%0D%0A
    
    opened by AgrimPrasad 3
  • Docker image easynmt/api:2.0-cpu crashes when trying to run on mac

    Running this on a 2017 Macbook. Docker image easynmt/api:2.0-cpu fails to start with exceptions, while easynmt/api:1.1-cpu was running fine with the same docker run command previously.

    docker run -p 24081:80 -v /Users/agrim/Downloads/easynmt-models:/cache easynmt/api:2.0-cpu
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 92, in __init__
        self.translator = module_class(easynmt_path=model_path, **self.config['model_args'])
    KeyError: 'model_args'
    Checking for script in /app/prestart.sh
    There is no script /app/prestart.sh
    [2021-04-27 14:38:22 +0000] [13] [INFO] Starting gunicorn 20.1.0
    [2021-04-27 14:38:22 +0000] [12] [INFO] Starting gunicorn 20.1.0
    [2021-04-27 14:38:22 +0000] [12] [INFO] Listening at: http://0.0.0.0:8080 (12)
    [2021-04-27 14:38:22 +0000] [13] [INFO] Listening at: http://0.0.0.0:80 (13)
    [2021-04-27 14:38:22 +0000] [13] [INFO] Using worker: uvicorn.workers.UvicornWorker
    [2021-04-27 14:38:22 +0000] [12] [INFO] Using worker: uvicorn.workers.UvicornWorker
    [2021-04-27 14:38:22 +0000] [17] [INFO] Booting worker with pid: 17
    [2021-04-27 14:38:22 +0000] [18] [INFO] Booting worker with pid: 18
    [2021-04-27 14:38:22 +0000] [19] [INFO] Booting worker with pid: 19
    [2021-04-27 14:38:24 +0000] [19] [INFO] Started server process [19]
    [2021-04-27 14:38:24 +0000] [17] [INFO] Started server process [17]
    [2021-04-27 14:38:24 +0000] [17] [INFO] Waiting for application startup.
    [2021-04-27 14:38:24 +0000] [19] [INFO] Waiting for application startup.
    [2021-04-27 14:38:24 +0000] [17] [INFO] Application startup complete.
    [2021-04-27 14:38:24 +0000] [19] [INFO] Application startup complete.
    {"loglevel": "info", "workers": "1", "bind": "0.0.0.0:8080", "graceful_timeout": 120, "timeout": 120, "keepalive": 5, "errorlog": "-", "accesslog": "-", "host": "0.0.0.0", "port": "8080"}
    Booted as backend: True
    Load model: opus-mt
    [2021-04-27 14:38:25 +0000] [18] [ERROR] Exception in worker process
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
        worker.init_process()
      File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 63, in init_process
        super(UvicornWorker, self).init_process()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
        self.load_wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
        self.wsgi = self.app.wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
        self.callable = self.load()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
        return self.load_wsgiapp()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
        return util.import_app(self.app_uri)
      File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
        mod = importlib.import_module(module)
      File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 783, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/app/main.py", line 36, in <module>
        model = EasyNMT(model_name, load_translator=IS_BACKEND, **model_args)
      File "/usr/local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 92, in __init__
        self.translator = module_class(easynmt_path=model_path, **self.config['model_args'])
    KeyError: 'model_args'
    [2021-04-27 14:38:25 +0000] [18] [INFO] Worker exiting (pid: 18)
    [2021-04-27 14:38:25 +0000] [12] [INFO] Shutting down: Master
    [2021-04-27 14:38:25 +0000] [12] [INFO] Reason: Worker failed to boot.
    {"loglevel": "info", "workers": "1", "bind": "0.0.0.0:8080", "graceful_timeout": 120, "timeout": 120, "keepalive": 5, "errorlog": "-", "accesslog": "-", "host": "0.0.0.0", "port": "8080"}
    One of the processes has already exited.
    
    opened by AgrimPrasad 3
  • No module named 'easynmt.models.OpusMT' in PyCharm

    Hello, I'm running this simple code in pycharm

    from easynmt import EasyNMT
    model = EasyNMT("opus-mt")
    
    print(model.translate("Hi", target_lang="fr"))

    and it gives me this error

    Traceback (most recent call last):
      File "H:/Documents/Python/Random.py", line 2, in <module>
        model = EasyNMT("opus-mt")
      File "C:\Users\Jerz King\AppData\Local\Programs\Python\Python37\lib\site-packages\easynmt\EasyNMT.py", line 69, in __init__
        module_class = import_from_string(self.config['model_class'])
      File "C:\Users\Jerz King\AppData\Local\Programs\Python\Python37\lib\site-packages\easynmt\util.py", line 56, in import_from_string
        module = importlib.import_module(module_path)
      File "C:\Users\Jerz King\AppData\Local\Programs\Python\Python37\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
      File "<frozen importlib._bootstrap>", line 983, in _find_and_load
      File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
    ModuleNotFoundError: No module named 'easynmt.models.OpusMT'

    I had the same error addressed here when installing easynmt, and followed the steps, nothing happened... how do I fix this?

    opened by Lorddickenstein 3
  • OSError after a few translations

    Hi and thanks for the cool library!

    I want to include the translation function in one of my data pipelines that loops over thousands of text snippets. Without the GPU support and on Windows I was following the instructions in the other issue and successfully added the function.

    from easynmt import EasyNMT
    model = EasyNMT('opus-mt')
    

    and I translate with:

    language = detect_langs(text)
    for each_lang in language:
       if (each_lang.lang != "en"):
          translated_text = model.translate(text, target_lang='en')
    

    where text is a string. However, after a few translations (2-3) I always run into this error:

    OSError: Can't load tokenizer for 'Helsinki-NLP/opus-mt-ia-en'. Make sure that:
    - 'Helsinki-NLP/opus-mt-ia-en' is a correct model identifier listed on 'https://huggingface.co/models'
    

    Any idea what the problem could be?

    opened by jonas-nothnagel 3
  • MBart50Converter requires the protobuf library but it was not found in your environment.

    I tried to use the docker image with the model mbart50_m2m. Command:

    docker run --env EASYNMT_MODEL=mbart50_m2m --env TIMEOUT=600 --env MAX_WORKERS_FRONTEND=1 -p 24080:80 easynmt/api:2.0-cpu

    It exited with this trace:

    [2022-05-26 12:48:19 +0000] [36] [ERROR] Exception in worker process
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
        worker.init_process()
      File "/usr/local/lib/python3.8/site-packages/uvicorn/workers.py", line 63, in init_process
        super(UvicornWorker, self).init_process()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
        self.load_wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
        self.wsgi = self.app.wsgi()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
        self.callable = self.load()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
        return self.load_wsgiapp()
      File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
        return util.import_app(self.app_uri)
      File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
        mod = importlib.import_module(module)
      File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
      File "<frozen importlib._bootstrap>", line 991, in _find_and_load
      File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 783, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/app/main.py", line 36, in <module>
        model = EasyNMT(model_name, load_translator=IS_BACKEND, **model_args)
      File "/usr/local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 92, in __init__
        self.translator = module_class(easynmt_path=model_path, **self.config['model_args'])
      File "/usr/local/lib/python3.8/site-packages/easynmt/models/AutoModel.py", line 32, in __init__
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, **self.tokenizer_args)
      File "/usr/local/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 407, in from_pretrained
        return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1709, in from_pretrained
        return cls._from_pretrained(
      File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1781, in _from_pretrained
        tokenizer = cls(*init_inputs, **init_kwargs)
      File "/usr/local/lib/python3.8/site-packages/transformers/models/mbart/tokenization_mbart50_fast.py", line 128, in __init__
        super().__init__(
      File "/usr/local/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 99, in __init__
        fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
      File "/usr/local/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 708, in convert_slow_tokenizer
        return converter_class(transformer_tokenizer).converted()
      File "/usr/local/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 301, in __init__
        requires_protobuf(self)
      File "/usr/local/lib/python3.8/site-packages/transformers/file_utils.py", line 574, in requires_protobuf
        raise ImportError(PROTOBUF_IMPORT_ERROR.format(name))
    ImportError: 
    MBart50Converter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
    installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
    that match your environment.
    

    Is anybody working on the Docker images who could fix this?

    opened by TheMY3 2
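The ImportError above is raised when the `protobuf` package cannot be found in the environment. A minimal sketch for checking this from inside the container before loading the model is shown below. The helper name `has_module` is hypothetical and not part of EasyNMT; it only uses the standard library. If the check fails, installing `protobuf` in the image (for example via pip) is the likely remedy, though that has not been verified against this Docker image.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the module `name` can be located in this environment.

    Hypothetical helper for diagnosis only; not part of EasyNMT.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. `google`) is missing.
        return False


def protobuf_available() -> bool:
    """The protobuf distribution is imported as `google.protobuf`."""
    return has_module("google.protobuf")
```

Calling `protobuf_available()` in the same Python environment that runs the API shows whether the tokenizer conversion step can find the library.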
  • Issues installing sentencepiece and fasttext dependencies on Windows and Mac

    Trying to install EasyNMT 2.0.1 on Windows 10 and normal Python 3.10.0 installation (not Anaconda)

    A colleague said he had the same issue on his Mac.

    Building wheels for collected packages: fasttext, sentencepiece
      Building wheel for fasttext (setup.py) ... error
      error: subprocess-exited-with-error
    
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [52 lines of output]
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\dist.py:738: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
            warnings.warn(
          running bdist_wheel
          running build
          running build_py
          creating build
          creating build\lib.win-amd64-3.10
          creating build\lib.win-amd64-3.10\fasttext
          copying python\fasttext_module\fasttext\FastText.py -> build\lib.win-amd64-3.10\fasttext
          copying python\fasttext_module\fasttext\__init__.py -> build\lib.win-amd64-3.10\fasttext
          creating build\lib.win-amd64-3.10\fasttext\util
          copying python\fasttext_module\fasttext\util\util.py -> build\lib.win-amd64-3.10\fasttext\util
          copying python\fasttext_module\fasttext\util\__init__.py -> build\lib.win-amd64-3.10\fasttext\util
          creating build\lib.win-amd64-3.10\fasttext\tests
          copying python\fasttext_module\fasttext\tests\test_configurations.py -> build\lib.win-amd64-3.10\fasttext\tests
          copying python\fasttext_module\fasttext\tests\test_script.py -> build\lib.win-amd64-3.10\fasttext\tests
          copying python\fasttext_module\fasttext\tests\__init__.py -> build\lib.win-amd64-3.10\fasttext\tests
          running build_ext
          building 'fasttext_pybind' extension
          creating build\temp.win-amd64-3.10
          creating build\temp.win-amd64-3.10\Release
          creating build\temp.win-amd64-3.10\Release\python
          creating build\temp.win-amd64-3.10\Release\python\fasttext_module
          creating build\temp.win-amd64-3.10\Release\python\fasttext_module\fasttext
          creating build\temp.win-amd64-3.10\Release\python\fasttext_module\fasttext\pybind
          creating build\temp.win-amd64-3.10\Release\src
          "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include -Isrc -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tppython/fasttext_module/fasttext/pybind/fasttext_pybind.cc /Fobuild\temp.win-amd64-3.10\Release\python/fasttext_module/fasttext/pybind/fasttext_pybind.obj /EHsc /DVERSION_INFO=\\\"0.9.2\\\"
          fasttext_pybind.cc
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(171): error C2065: 'ssize_t': undeclared identifier
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(171): error C2672: 'pybind11::init': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2974: 'pybind11::init': invalid template argument for 'CFunc', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1702): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2974: 'pybind11::init': invalid template argument for 'Func', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1697): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2974: 'pybind11::init': invalid template argument for 'Args', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1690): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(171): error C2672: 'pybind11::class_<fasttext::Vector>::def': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(170): error C2780: 'pybind11::class_<fasttext::Vector> &pybind11::class_<fasttext::Vector>::def(const char *,Func &&,const Extra &...)': expects 3 arguments - 1 provided
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1416): note: see declaration of 'pybind11::class_<fasttext::Vector>::def'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2065: 'ssize_t': undeclared identifier
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2065: 'ssize_t': undeclared identifier
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2672: 'pybind11::init': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2974: 'pybind11::init': invalid template argument for 'CFunc', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1702): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2974: 'pybind11::init': invalid template argument for 'Func', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1697): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2974: 'pybind11::init': invalid template argument for 'Args', type expected
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1690): note: see declaration of 'pybind11::init'
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(185): error C2672: 'pybind11::class_<fasttext::DenseMatrix>::def': no matching overloaded function found
          python/fasttext_module/fasttext/pybind/fasttext_pybind.cc(182): error C2780: 'pybind11::class_<fasttext::DenseMatrix> &pybind11::class_<fasttext::DenseMatrix>::def(const char *,Func &&,const Extra &...)': expects 3 arguments - 1 provided
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\pybind11\include\pybind11\pybind11.h(1416): note: see declaration of 'pybind11::class_<fasttext::DenseMatrix>::def'
          error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for fasttext
      Running setup.py clean for fasttext
      Building wheel for sentencepiece (setup.py) ... error
      error: subprocess-exited-with-error
    
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [22 lines of output]
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\dist.py:738: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
            warnings.warn(
          running bdist_wheel
          running build
          running build_py
          creating build
          creating build\lib.win-amd64-3.10
          creating build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/__init__.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_model_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          running build_ext
          building 'sentencepiece._sentencepiece' extension
          creating build\temp.win-amd64-3.10
          creating build\temp.win-amd64-3.10\Release
          creating build\temp.win-amd64-3.10\Release\src
          creating build\temp.win-amd64-3.10\Release\src\sentencepiece
          "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpsrc/sentencepiece/sentencepiece_wrap.cxx /Fobuild\temp.win-amd64-3.10\Release\src/sentencepiece/sentencepiece_wrap.obj /MT /I..\build\root\include
          cl : Command line warning D9025 : overriding '/MD' with '/MT'
          sentencepiece_wrap.cxx
          src/sentencepiece/sentencepiece_wrap.cxx(2809): fatal error C1083: Cannot open include file: 'sentencepiece_processor.h': No such file or directory
          error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for sentencepiece
      Running setup.py clean for sentencepiece
    Failed to build fasttext sentencepiece
    Installing collected packages: sentencepiece, fasttext, EasyNMT
      Running setup.py install for sentencepiece ... error
      error: subprocess-exited-with-error
    
      × Running setup.py install for sentencepiece did not run successfully.
      │ exit code: 1
      ╰─> [24 lines of output]
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\dist.py:738: UserWarning: Usage of dash-separated 'description-file' will not be supported in future versions. Please use the underscore name 'description_file' instead
            warnings.warn(
          running install
          C:\Users\zackp\.virtualenvs\autosa-AbZZnTol\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
            warnings.warn(
          running build
          running build_py
          creating build
          creating build\lib.win-amd64-3.10
          creating build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/__init__.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_model_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          copying src\sentencepiece/sentencepiece_pb2.py -> build\lib.win-amd64-3.10\sentencepiece
          running build_ext
          building 'sentencepiece._sentencepiece' extension
          creating build\temp.win-amd64-3.10
          creating build\temp.win-amd64-3.10\Release
          creating build\temp.win-amd64-3.10\Release\src
          creating build\temp.win-amd64-3.10\Release\src\sentencepiece
          "C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\zackp\.virtualenvs\autosa-AbZZnTol\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\include -IC:\Users\zackp\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.28.29910\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" /EHsc /Tpsrc/sentencepiece/sentencepiece_wrap.cxx /Fobuild\temp.win-amd64-3.10\Release\src/sentencepiece/sentencepiece_wrap.obj /MT /I..\build\root\include
          cl : Command line warning D9025 : overriding '/MD' with '/MT'
          sentencepiece_wrap.cxx
          src/sentencepiece/sentencepiece_wrap.cxx(2809): fatal error C1083: Cannot open include file: 'sentencepiece_processor.h': No such file or directory
          error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure
    
    × Encountered error while trying to install package.
    ╰─> sentencepiece
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for output from the failure.
    
    opened by ZackPlauche 2
  • Is there randomness in translation or does every translation lead to the exact same output?

    Thanks a lot for creating this great package! Question: will every translation of the same input always lead to exactly the same output, or is there some randomness involved (e.g. through beam search) that requires setting a seed for full reproducibility? I've discussed this with colleagues, and it seems that some beam search algorithms are stochastic (i.e. they introduce randomness) while others are not. Which one is used here? If a stochastic algorithm is used, how would we set a seed to ensure reproducibility?

    opened by MoritzLaurer 0
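As background for the question above: plain beam search without sampling is a deterministic procedure, because at every step it keeps the top-scoring hypotheses and involves no random draws. The toy sketch below illustrates this with a hand-written next-token table. Everything in it (`toy_next_token_probs`, `beam_search`, the vocabulary) is hypothetical and is not EasyNMT's implementation. Whether a given EasyNMT model enables sampling during generation is not stated here and would need to be checked. Numerical differences between hardware can also shift scores in practice.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def toy_next_token_probs(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical stand-in for a translation model's next-token distribution."""
    if not prefix:
        return {"das": 0.6, "dies": 0.4}
    if prefix[-1] in ("das", "dies"):
        return {"ist": 0.7, "war": 0.3}
    return {"<eos>": 1.0}


def beam_search(
    next_probs: Callable[[Prefix], Dict[str, float]],
    beam_size: int = 2,
    max_len: int = 5,
) -> List[str]:
    """Deterministic beam search: no sampling, ties broken by token order."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Prefix, float]] = []
        for tokens, score in beams:
            if tokens and tokens[-1] == "<eos>":
                # Finished hypotheses are carried over unchanged.
                candidates.append((tokens, score))
                continue
            for tok, p in next_probs(tokens).items():
                candidates.append((tokens + (tok,), score + math.log(p)))
        # Highest log-probability first; equal scores are ordered by tokens.
        candidates.sort(key=lambda c: (-c[1], c[0]))
        beams = candidates[:beam_size]
        if all(t and t[-1] == "<eos>" for t, _ in beams):
            break
    return list(beams[0][0])
```

Running `beam_search(toy_next_token_probs)` repeatedly returns the same token list each time, since nothing in the loop depends on a random number generator.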
  • EasyNMT

    Hi, thank you for this useful library. I tried to install it on my machine and it gave me this error. Any help, please?

        print(model.translate('This is a sentence we want to translate to German', target_lang='de'))
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 154, in translate
        raise e
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 149, in translate
        translated = self.translate(**method_args)
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 181, in translate
        translated_sentences = self.translate_sentences(splitted_sentences, target_lang=target_lang, source_lang=source_lang, show_progress_bar=show_progress_bar, beam_size=beam_size, batch_size=batch_size, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/EasyNMT.py", line 278, in translate_sentences
        output.extend(self.translator.translate_sentences(sentences_sorted[start_idx:start_idx+batch_size], source_lang=source_lang, target_lang=target_lang, beam_size=beam_size, device=self.device, **kwargs))
      File "/home/karima/.local/lib/python3.8/site-packages/easynmt/models/OpusMT.py", line 49, in translate_sentences
        translated = model.generate(**inputs, num_beams=beam_size, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/transformers/generation_utils.py", line 1182, in generate
        model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
      File "/home/karima/.local/lib/python3.8/site-packages/transformers/generation_utils.py", line 525, in _prepare_encoder_decoder_kwargs_for_generation
        model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/transformers/models/marian/modeling_marian.py", line 749, in forward
        inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
        return F.embedding(
      File "/home/karima/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    opened by abidikarima 0
  • How to run test_translation_speed.py

    The EasyNMT Docker install is working fine through HTTP requests. Now I'd like to run some benchmarks. How do you run /examples/test_translation_speed.py?

    opened by JohnWinner 2
  • Workflow for large datasets

    Hi! I was wondering whether there is a workflow available for large datasets. I am trying to translate a large number of tweets using Pandas and Python.

    Best, Daniel

    opened by viajerus 0
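One possible workflow for the question above, sketched under assumptions: split the texts into fixed-size chunks and translate one chunk at a time, so that progress can be saved between chunks and a failure does not lose all work. The names `chunked`, `translate_in_chunks` and `fake_translate` are hypothetical and not part of EasyNMT. The translator is passed in as a callable so that the sketch runs without downloading a model.

```python
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_chunks(
    texts: List[str],
    translate_fn: Callable[[List[str]], List[str]],
    chunk_size: int = 1000,
) -> List[str]:
    """Translate `texts` chunk by chunk, preserving the input order."""
    results: List[str] = []
    for chunk in chunked(texts, chunk_size):
        # A real workflow could write each chunk's output to disk here.
        results.extend(translate_fn(chunk))
    return results


def fake_translate(batch: List[str]) -> List[str]:
    """Stand-in translator used only to make the sketch runnable."""
    return [text.upper() for text in batch]
```

With EasyNMT, `translate_fn` could be something like `lambda batch: model.translate(batch, target_lang='en')`, and with Pandas the input list could come from `df['text'].tolist()`, where the column name `text` is an assumption.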
  • Enable manually specifying the desired OPUS model?

    I really like the library, great work! Is there a way to manually specify a specific OPUS model? For example EasyNMT with OPUS currently does not support English as source and Portuguese as target language because it tries to download 'opus-mt-en-pt' by default, which does not exist. There is, however, an en2pt model on the hub now (https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-pt) with a slightly different name. I don't know how to tell EasyNMT to take this specific model instead of throwing the following error:

    OSError: Helsinki-NLP/opus-mt-en-pt is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

    opened by MoritzLaurer 2
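A minimal sketch of the idea raised above: keep a small override table for language pairs whose checkpoint does not follow the default `Helsinki-NLP/opus-mt-{src}-{tgt}` naming seen in the error message. `resolve_opus_model`, `DEFAULT_TEMPLATE` and `OVERRIDES` are hypothetical names, and this is not a feature of EasyNMT itself.

```python
from typing import Dict, Optional, Tuple

# Default naming pattern, taken from the error message in the issue.
DEFAULT_TEMPLATE = "Helsinki-NLP/opus-mt-{src}-{tgt}"

# Hypothetical override table for pairs with a differently named checkpoint.
OVERRIDES: Dict[Tuple[str, str], str] = {
    ("en", "pt"): "Helsinki-NLP/opus-mt-tc-big-en-pt",
}


def resolve_opus_model(
    src: str,
    tgt: str,
    overrides: Optional[Dict[Tuple[str, str], str]] = None,
) -> str:
    """Return the model identifier to load for a source/target pair."""
    table = OVERRIDES if overrides is None else overrides
    return table.get((src, tgt), DEFAULT_TEMPLATE.format(src=src, tgt=tgt))
```

The resolved name could then be tried with `EasyNMT(translator=models.AutoModel(name))`, the pattern the v2.0.0 release notes show for loading other Hugging Face models. Whether this particular checkpoint loads that way has not been tested here.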
Releases(v2.0.0)
  • v2.0.0(Apr 26, 2021)

    mbart50 & m2m models now use huggingface transformers

    In version 1, the mbart50 & m2m models required the fairseq library. This caused several issues: fairseq cannot be used on Windows, multi-processing did not work with fairseq models, and loading and using the models was quite complicated.

    With this release, the fairseq dependency is removed and mbart50 / m2m models are loaded with huggingface transformers version >= 4.4.0

    From a user perspective, no changes should be visible. From a developer perspective, this simplifies the architecture of EasyNMT and allows new features to be integrated more easily.

    Saving models

    Models can now be saved to disc by calling:

    model.save(output_path)
    

    Models can be loaded from disc by calling:

    model = EasyNMT(output_path)
    

    Loading models from the Hugging Face model hub

    Loading any Hugging Face translation model is now simple: pass the name or the model path to the following code:

    from easynmt import EasyNMT, models
    article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
    pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate 
    sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""
    
    model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro')) 
    print(model.translate(article, source_lang='en_XX', target_lang='ro_RO'))
    

    This loads the facebook/mbart-large-en-ro model from the model hub.

    Note: Models might use different language codes, e.g. the mbart model uses 'en_XX' instead of 'en' and 'ro_RO' instead of 'ro'. To make the language codes consistent, you can pass a lang_map:

    from easynmt import EasyNMT, models
    
    article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
    pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate 
    sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""
    
    output_path = 'output/mbart-large-en-ro'
    model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro', lang_map={'en': 'en_XX', 'ro': 'ro_RO'}))
    
    #Save the model to disc
    model.save(output_path)
    
    # Load the model from disc
    model = EasyNMT(output_path)
    print(model.translate(article,  target_lang='ro'))
    
  • v1.1.0(Mar 15, 2021)

    This release brings several improvements and is the first step towards the release of a Docker Image + REST API.

    Improvements:

    • Docker REST API: We have published Docker images for a REST API that allows easy usage of EasyNMT. Just run the Docker image and start translating using REST API calls.
    • Google Colab REST API Hosting: We have published a Colab notebook that shows how to wrap EasyNMT in a REST API and host it on Google Colab with a free GPU. Useful if you need to translate large amounts of text.
    • Long sentences are translated first: Sentences are sorted before they are translated in order to waste minimal time with padding tokens. In the previous version, the shortest sentences were translated first and then later the longer sentences. Now the order is reversed. This has several advantages: If an OOM happens, it happens at the start of the translation process and not at the end. Also, the estimate from the progress bar is more accurate as the longest and slowest sentences are now translated first.
    • Improve language detection: Automatic language detection is still an issue, especially for mixed languages. Language detection is now performed at the document level and not at the sentence level. If you need sentence-level language detection, you can set document_language_detection=False for the translate method. Also, text is now lowercased before the language is detected (the language detection scripts had issues with all-uppercase text).
    • Max length parameter: When you create your model like this: model = EasyNMT(model_name, max_length=100), all sentences with more than 100 word pieces will be truncated to at most 100 word pieces. This can prevent OOM errors caused by overly long sentences.
    • Load model without translator: If you just want to use the language detection methods, you can now load your model like model = EasyNMT(model_name, load_translator=False). This will prevent the loading of the translation engine.
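The "long sentences are translated first" behavior described in the list above can be sketched as follows. This is an illustration under assumptions, not EasyNMT's actual code: it uses character length as the sort key (the library's real key is not stated here), and `translate_longest_first` and the `translate_batch` callable are hypothetical names.

```python
from typing import Callable, List


def translate_longest_first(
    sentences: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 16,
) -> List[str]:
    """Translate in batches, longest sentences first, and restore input order."""
    # Indices sorted by descending length, so similar lengths share a batch
    # (less padding) and any out-of-memory error shows up in the first batch.
    order = sorted(range(len(sentences)), key=lambda i: len(sentences[i]), reverse=True)
    results: List[str] = [""] * len(sentences)
    for start in range(0, len(order), batch_size):
        batch_idx = order[start:start + batch_size]
        outputs = translate_batch([sentences[i] for i in batch_idx])
        # Write each output back to the position of its source sentence.
        for i, out in zip(batch_idx, outputs):
            results[i] = out
    return results
```

Because results are written back by original index, callers receive translations in the same order as the input even though the batches were processed longest first.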

    Roadmap

    • As soon as Hugging Face transformers v4.4.0 is released, the dependency on fairseq can be removed, as the mBART50 and m2m models will be available in transformers. This will make installation on a Windows machine possible.
  • v1.0.2(Jan 29, 2021)

    fastText is used for automatic language detection, as it provides the highest speed and best accuracy.

    However, it can be complicated to install it on Windows as it requires a C/C++ compiler.

    This release adds two alternative language identifiers:

    • langid (https://github.com/saffsd/langid.py) - Can be installed via pip install langid
    • langdetect - Can be installed via pip install langdetect

    If fastText is not available, langid / langdetect will be used as alternative language detection methods.

    For installation on Windows, you can run the following commands:

    pip install --no-deps easynmt
    pip install tqdm transformers numpy nltk sentencepiece langid 
    

    Further, you have to install pytorch as described here: https://pytorch.org/get-started/locally/

    If you want to install fastText on Windows, I can recommend this link: https://anaconda.org/conda-forge/fasttext

  • v1.0.1(Jan 27, 2021)

    fastText language detection did not work well if the text was in UPPERCASE.

    Adding lower() to the string before the language identification step significantly improved the performance.

  • v1.0.0(Jan 27, 2021)

Owner
Ubiquitous Knowledge Processing Lab