Python bindings to the Dutch NLP tool Frog (PoS tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)

Overview
Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Frog for Python

This is a Python binding to the Natural Language Processing suite Frog. Frog is intended for Dutch and performs part-of-speech tagging, lemmatisation, morphological analysis, named entity recognition, shallow parsing, and dependency parsing. The tool itself is implemented in C++ (http://ilk.uvt.nl/frog).

Installation

Easy

For easy installation, please use our LaMachine distribution.

Manual

  • Make sure to first install Frog and all its dependencies
  • Install Cython if not yet available on your system: $ sudo apt-get install cython cython3 (Debian/Ubuntu, may differ for others)
  • Run: $ sudo python setup.py install

Usage

Example:

from __future__ import print_function, unicode_literals #to make this work on Python 2 as well as Python 3

import frog

frog = frog.Frog(frog.FrogOptions(parser=False))
output = frog.process_raw("Dit is een test")
print("RAW OUTPUT=",output)
output = frog.process("Dit is nog een test.")
print("PARSED OUTPUT=",output)

Output:

RAW OUTPUT= 1   Dit     dit     [dit]   VNW(aanw,pron,stan,vol,3o,ev)   0.777085        O       B-NP
2       is      zijn    [zijn]  WW(pv,tgw,ev)   0.999891        O       B-VP
3       een     een     [een]   LID(onbep,stan,agr)     0.999113        O       B-NP
4       test    test    [test]  N(soort,ev,basis,zijd,stan)     0.789112        O       I-NP


PARSED OUTPUT= [{'chunker': 'B-NP', 'index': '1', 'lemma': 'dit', 'ner':
'O', 'pos': 'VNW(aanw,pron,stan,vol,3o,ev)', 'posprob': 0.777085, 'text':
'Dit', 'morph': '[dit]'}, {'chunker': 'B-VP', 'index': '2', 'lemma':
'zijn', 'ner': 'O', 'pos': 'WW(pv,tgw,ev)', 'posprob': 0.999966, 'text':
'is', 'morph': '[zijn]'}, {'chunker': 'B-NP', 'index': '3', 'lemma': 'nog',
'ner': 'O', 'pos': 'BW()', 'posprob': 0.99982, 'text': 'nog', 'morph':
'[nog]'}, {'chunker': 'I-NP', 'index': '4', 'lemma': 'een', 'ner': 'O',
'pos': 'LID(onbep,stan,agr)', 'posprob': 0.995781, 'text': 'een', 'morph':
'[een]'}, {'chunker': 'I-NP', 'index': '5', 'lemma': 'test', 'ner': 'O',
'pos': 'N(soort,ev,basis,zijd,stan)', 'posprob': 0.903055, 'text': 'test',
'morph': '[test]'}, {'chunker': 'O', 'index': '6', 'eos': True, 'lemma':
'.', 'ner': 'O', 'pos': 'LET()', 'posprob': 1.0, 'text': '.', 'morph':
'[.]'}]
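
The list of dictionaries returned by process() can be post-processed with ordinary Python. As a minimal sketch, the helper below groups tokens into chunks using the B-/I-/O tags in the 'chunker' field. The name group_chunks is hypothetical and not part of python-frog; the sample tokens are taken from the parsed output above, reduced to the two fields the helper needs.

```python
def group_chunks(tokens):
    """Group Frog token dicts into (chunk_type, [words]) pairs using IOB tags.

    Hypothetical helper, not part of python-frog. It only relies on the
    'text' and 'chunker' keys shown in the parsed output.
    """
    chunks = []
    for token in tokens:
        tag = token.get("chunker", "O")
        if tag.startswith("B-"):
            # A B- tag always opens a new chunk.
            chunks.append((tag[2:], [token["text"]]))
        elif tag.startswith("I-") and chunks and chunks[-1][0] == tag[2:]:
            # An I- tag continues the previous chunk of the same type.
            chunks[-1][1].append(token["text"])
        elif tag.startswith("I-"):
            # Stray I- tag without a matching open chunk: start a new one.
            chunks.append((tag[2:], [token["text"]]))
        else:
            # "O": the token is outside any chunk.
            chunks.append((None, [token["text"]]))
    return chunks


# Tokens from the parsed output above, trimmed to the relevant fields.
sample = [
    {"text": "Dit", "chunker": "B-NP"},
    {"text": "is", "chunker": "B-VP"},
    {"text": "nog", "chunker": "B-NP"},
    {"text": "een", "chunker": "I-NP"},
    {"text": "test", "chunker": "I-NP"},
    {"text": ".", "chunker": "O"},
]

for chunk_type, words in group_chunks(sample):
    print(chunk_type, " ".join(words))
```

With real data, the same function can be called directly on the return value of process().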

Available keyword arguments for FrogOptions:

  • tok - True/False - Do tokenisation? (default: True)
  • lemma - True/False - Do lemmatisation? (default: True)
  • morph - True/False - Do morphological analysis? (default: True)
  • daringmorph - True/False - Do morphological analysis in new experimental style? (default: False)
  • mwu - True/False - Do Multi Word Unit detection? (default: True)
  • chunking - True/False - Do Chunking/Shallow parsing? (default: True)
  • ner - True/False - Do Named Entity Recognition? (default: True)
  • parser - True/False - Do Dependency Parsing? (default: False)
  • xmlin - True/False - Input is FoLiA XML (default: False)
  • xmlout - True/False - Output is FoLiA XML (default: False)
  • docid - str - Document ID (for FoLiA)
  • numThreads - int - Number of threads to use (default: unset, unlimited)
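
To catch misspelled option names before Frog is loaded, the keyword arguments can be assembled and checked in plain Python first. This is only a sketch: frog_option_kwargs and FROG_OPTION_DEFAULTS are hypothetical names, and the defaults are copied from the list above rather than read from the library.

```python
# Defaults as listed above. docid and numThreads have no listed default
# value, so they are only passed on when given explicitly.
FROG_OPTION_DEFAULTS = {
    "tok": True,
    "lemma": True,
    "morph": True,
    "daringmorph": False,
    "mwu": True,
    "chunking": True,
    "ner": True,
    "parser": False,
    "xmlin": False,
    "xmlout": False,
}
OPTIONAL_KEYS = {"docid", "numThreads"}


def frog_option_kwargs(**overrides):
    """Return a dict of FrogOptions keyword arguments (hypothetical helper)."""
    unknown = set(overrides) - set(FROG_OPTION_DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(
            "Unknown FrogOptions keyword(s): " + ", ".join(sorted(unknown))
        )
    kwargs = dict(FROG_OPTION_DEFAULTS)
    kwargs.update(overrides)
    return kwargs


print(frog_option_kwargs(ner=False, numThreads=2))

# With python-frog installed, the result could be used as:
#   options = frog.FrogOptions(**frog_option_kwargs(ner=False, numThreads=2))
```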

You can specify a Frog configuration file explicitly as second argument upon instantiation, otherwise the default one is used:

frog = frog.Frog(frog.FrogOptions(parser=False), "/path/to/your/frog.cfg")

A third parameter, a dictionary, can be used to override specific configuration values (same syntax as Frog's --override option). Leave the second parameter empty if you want to load the default configuration:

frog = frog.Frog(frog.FrogOptions(parser=False), "", { "tokenizer.rulesFile": "tokconfig-nld-twitter" })
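
The three ways of instantiating Frog shown above (options only, options plus configuration file, options plus empty path plus overrides) can be wrapped in one function. The sketch below assumes nothing beyond those three call forms; load_frog is a hypothetical name, and frog is imported inside the function so that the module defining the helper can be loaded on machines where python-frog is not installed.

```python
def load_frog(config_path="", overrides=None, **option_kwargs):
    """Create a Frog instance (hypothetical convenience wrapper).

    config_path: path to a frog.cfg, or "" for the default configuration.
    overrides:   optional dict of configuration overrides.
    option_kwargs: keyword arguments passed on to FrogOptions.
    """
    import frog  # needs python-frog and Frog itself to be installed

    options = frog.FrogOptions(**option_kwargs)
    if overrides:
        # An empty config_path keeps the default configuration.
        return frog.Frog(options, config_path, overrides)
    if config_path:
        return frog.Frog(options, config_path)
    return frog.Frog(options)


# Example calls (not executed here, since they require Frog):
#   tagger = load_frog(parser=False)
#   tagger = load_frog("/path/to/your/frog.cfg", parser=False)
#   tagger = load_frog(overrides={"tokenizer.rulesFile": "tokconfig-nld-twitter"},
#                      parser=False)
```

Binding the result to a name other than frog avoids shadowing the module, so further instances can still be created later.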

FoLiA support

Frog supports output in the FoLiA XML format (set FrogOptions(xmlout=True)), as well as FoLiA input (set FrogOptions(xmlin=True)). The FoLiA format exposes more details about the linguistic annotation in a more structured and more formal way.

Whenever FoLiA output is requested, the process() method will return an instance of folia.Document, which is provided by the FoLiApy library. This loads the entire FoLiA document in memory and allows you to inspect it in any way you see fit. Extensive documentation for this library can be found here: http://folia.readthedocs.io/

An example can be found below:

from frog import Frog, FrogOptions

frog = Frog(FrogOptions(parser=True,xmlout=True))
output = frog.process("Dit is een FoLiA test.")
#output is now an instance of folia.Document (provided by the FoLiApy library) rather than a list of dictionaries
print("FOLIA OUTPUT AS RAW XML=")
print(output.xmlstring())

print("Inspecting FoLiA output (just a small example):")
for word in output.words():
    print(word.text() + " " + word.pos() + " " + word.lemma())
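
If only a few fields are needed, the XML string can also be read with the Python standard library instead of the FoLiA object model. The sketch below is an assumption-laden illustration: the SAMPLE document is a hand-written, heavily simplified FoLiA fragment (not actual Frog output, which carries IDs, declarations and more annotation layers), and words_from_folia is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = {"f": "http://ilk.uvt.nl/folia"}

# Hand-written, simplified fragment for illustration only.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <text>
    <s>
      <w><t>Dit</t><pos class="VNW(aanw,pron,stan,vol,3o,ev)"/><lemma class="dit"/></w>
      <w><t>is</t><pos class="WW(pv,tgw,ev)"/><lemma class="zijn"/></w>
    </s>
  </text>
</FoLiA>"""


def words_from_folia(xml_string):
    """Return (text, pos, lemma) tuples for every <w> element."""
    root = ET.fromstring(xml_string)
    rows = []
    for w in root.iter("{http://ilk.uvt.nl/folia}w"):
        # Only direct children are inspected, so text nested deeper
        # inside the word element is ignored.
        text = w.findtext("f:t", default="", namespaces=FOLIA_NS)
        pos = w.find("f:pos", FOLIA_NS)
        lemma = w.find("f:lemma", FOLIA_NS)
        rows.append((
            text,
            pos.get("class") if pos is not None else None,
            lemma.get("class") if lemma is not None else None,
        ))
    return rows


for text, pos, lemma in words_from_folia(SAMPLE):
    print(text, pos, lemma)
```

The FoLiA library remains the better choice for anything beyond flat token attributes, since it understands the full annotation structure.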
Comments
  • Wheel failing to build in Docker container: `#include "frog/Frog.h"`

    I'm trying to set up Frog 'from scratch' in a more pared-down Docker image (LaMachine is great, but at 3.76GB it's by far our largest image). While Frog seems to build just fine itself (I've included my Dockerfile below), pip install python-frog is still complaining that it can't find frog/Frog.h.

    Here's the complete error log:

    collecting python-frog
      Using cached python-frog-0.3.3.tar.gz
    Building wheels for collected packages: python-frog
      Running setup.py bdist_wheel for python-frog ... error
      Complete output from command /usr/local/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-hqzo6jl_/python-frog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmp61scgxewpip-wheel- --python-tag cp36:
      /usr/local/lib/python3.6/distutils/extension.py:131: UserWarning: Unknown Extension options: 'pyrex_gdb'
        warnings.warn(msg)
      running bdist_wheel
      running build
      running build_ext
      cythoning frog_wrapper.pyx to frog_wrapper.cpp
      building 'frog' extension
      creating build
      creating build/temp.linux-x86_64-3.6
      gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/root/local/include/ -I/usr/include/ -I/usr/include/libxml2 -I/usr/local/include/ -I/usr/local/include/python3.6m -c frog_wrapper.cpp -o build/temp.linux-x86_64-3.6/frog_wrapper.o --std=c++0x
      cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
      frog_wrapper.cpp:550:23: fatal error: frog/Frog.h: No such file or directory
       #include "frog/Frog.h"
                             ^
      compilation terminated.
      error: command 'gcc' failed with exit status 1
    
      ----------------------------------------
      Failed building wheel for python-frog
      Running setup.py clean for python-frog
    Failed to build python-frog
    Installing collected packages: python-frog
      Running setup.py install for python-frog ... error
        Complete output from command /usr/local/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-hqzo6jl_/python-frog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-v38ixlqg-record/install-record.txt --single-version-externally-managed --compile:
        /usr/local/lib/python3.6/distutils/extension.py:131: UserWarning: Unknown Extension options: 'pyrex_gdb'
          warnings.warn(msg)
        running install
        running build
        running build_ext
        skipping 'frog_wrapper.cpp' Cython extension (up-to-date)
        building 'frog' extension
        creating build
        creating build/temp.linux-x86_64-3.6
        gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/root/local/include/ -I/usr/include/ -I/usr/include/libxml2 -I/usr/local/include/ -I/usr/local/include/python3.6m -c frog_wrapper.cpp -o build/temp.linux-x86_64-3.6/frog_wrapper.o --std=c++0x
        cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
        frog_wrapper.cpp:550:23: fatal error: frog/Frog.h: No such file or directory
         #include "frog/Frog.h"
                               ^
        compilation terminated.
        error: command 'gcc' failed with exit status 1
    
        ----------------------------------------
    Command "/usr/local/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-hqzo6jl_/python-frog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-v38ixlqg-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-hqzo6jl_/python-frog/
    

    And here's my Dockerfile, just in case I'm still missing a dependency (pretty sure there's some redundancy with all the cython installations). The parent image is Debian-based, and gcc is present.

    FROM python:latest
    
    RUN apt-get update && \
        apt-get install -y libfolia-dev \
                           libticcutils2-dev \
                           ucto \
                           timbl \
                           timblserver \
                           mbt \
                           libicu-dev \
                           libxml2-dev \
                           frog \
                           frogdata \
                           python-dev \
                           build-essential \
                           cython3 \
                           cython 
    
    RUN pip install cython \ 
                    python-frog \
    

    Any chance you can tell me what I should be doing differently here?

    Thanks in advance!

    opened by lemontheme 11
  • python-frog stops working after LaMachine update


    After updating LaMachine, python-frog does not work anymore:

    $ python
    Python 2.7.12+ (default, Sep  1 2016, 20:27:38) 
    [GCC 6.2.0 20160927] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import frog
    INFO:rdflib:RDFLib Version: 4.2.2
    >>> frog = frog.Frog(frog.FrogOptions(parser=False), "/home/rahiel/.virtualenvs/lamachine/etc/frog/frog.cfg")
    20170202:161840:305:mblem:Initiating lemmatizer...
    20170202:161840:306:mbma-:Initiating morphological analyzer...
    20170202:161840:310:tok-:Initiating tokeniser...
    20170202:161840:310:tok-:Cannot read Tokeniser settingsfile tokconfig-nl
    20170202:161840:310:tok-:Unsupported language? (Did you install the uctodata package?)
    20170202:161840:649:mwu-:initiating mwuChunker...
    20170202:161840:649:mwu-:read mwus /home/rahiel/.virtualenvs/lamachine/etc/frog//Frog.mwu.1.0
    20170202:161840:801:iob-mbt-:  Reading the lexicon from: /home/rahiel/.virtualenvs/lamachine/etc/frog/chunker.train.lex.ambi.05 (78570 words).
    20170202:161840:806:iob-mbt-:  Read frequent words list from: /home/rahiel/.virtualenvs/lamachine/etc/frog/chunker.train.top1000 (1000 words).
    20170202:161840:806:iob-mbt-:  Reading case-base for known words from: /home/rahiel/.virtualenvs/lamachine/etc/frog/chunker.train.known.dddwfWawa... 
    20170202:161841:023:ner-mbt-:  Reading the lexicon from: /home/rahiel/.virtualenvs/lamachine/etc/frog/ner.data.lex.ambi.05 (73735 words).
    20170202:161841:027:ner-mbt-:  Read frequent words list from: /home/rahiel/.virtualenvs/lamachine/etc/frog/ner.data.top1000 (1000 words).
    20170202:161841:028:ner-mbt-:  Reading case-base for known words from: /home/rahiel/.virtualenvs/lamachine/etc/frog/ner.data.known.ddwdwfWawawaa... 
    20170202:161841:033:iob-mbt-:  case-base for known words read.
    20170202:161841:033:iob-mbt-:  Reading case-base for unknown words from: /home/rahiel/.virtualenvs/lamachine/etc/frog/chunker.train.unknown.chnppddwFawsss... 
    20170202:161841:070:ner-mbt-:  case-base for known words read.
    20170202:161841:070:ner-mbt-:  Reading case-base for unknown words from: /home/rahiel/.virtualenvs/lamachine/etc/frog/ner.data.unknown.chnppddwdwFawawaasss... 
    20170202:161841:372:pos-tagger-mbt-:  Reading the lexicon from: /home/rahiel/.virtualenvs/lamachine/etc/frog/Frog.mbt.1.0.lex.ambi.05 (229170 words).
    20170202:161841:373:pos-tagger-mbt-:  Read frequent words list from: /home/rahiel/.virtualenvs/lamachine/etc/frog/Frog.mbt.1.0.top500 (500 words).
    20170202:161841:374:pos-tagger-mbt-:  Reading case-base for known words from: /home/rahiel/.virtualenvs/lamachine/etc/frog/Frog.mbt.1.0.known.dddwfWawa... 
    20170202:161841:868:iob-mbt-:  case-base for unknown word read
    20170202:161841:868:iob-mbt-:  Sentence delimiter set to '<utt>'
    20170202:161841:868:iob-mbt-:  Beam size = 1
    20170202:161841:868:iob-mbt-:  Known Tree, Algorithm = IGTREE
    20170202:161841:868:iob-mbt-:  Unknown Tree, Algorithm = IB1
    20170202:161841:868:iob-mbt-:
    20170202:161842:124:ner-mbt-:  case-base for unknown word read
    20170202:161842:124:ner-mbt-:  Sentence delimiter set to 'EL'
    20170202:161842:124:ner-mbt-:  Beam size = 1
    20170202:161842:124:ner-mbt-:  Known Tree, Algorithm = IGTREE
    20170202:161842:124:ner-mbt-:  Unknown Tree, Algorithm = TRIBL
    20170202:161842:124:ner-mbt-:
    20170202:161842:348:pos-tagger-mbt-:  case-base for known words read.
    20170202:161842:348:pos-tagger-mbt-:  Reading case-base for unknown words from: /home/rahiel/.virtualenvs/lamachine/etc/frog/Frog.mbt.1.0.unknown.chnppdddwFawasss... 
    20170202:161843:027:pos-tagger-mbt-:  case-base for unknown word read
    20170202:161843:027:pos-tagger-mbt-:  Sentence delimiter set to '<utt>'
    20170202:161843:027:pos-tagger-mbt-:  Beam size = 1
    20170202:161843:027:pos-tagger-mbt-:  Known Tree, Algorithm = IGTREE
    20170202:161843:027:pos-tagger-mbt-:  Unknown Tree, Algorithm = IB1
    20170202:161843:027:pos-tagger-mbt-:
    20170202:161843:027:Initialization failed for: [tokenizer] 
    terminate called after throwing an instance of 'std::runtime_error'
      what():  Frog init failed
    Aborted
    

    Frog itself works fine:

    $ frog
    
    frog 0.13.7 (c) CLTS, ILK 1998 - 2017
    CLST  - Centre for Language and Speech Technology,Radboud University
    ILK   - Induction of Linguistic Knowledge Research Group,Tilburg University
    based on [ucto 0.9.6, libfolia 1.6, timbl 6.4.8, ticcutils 0.14, mbt 3.2.16]
    frog-:config read from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/frog.cfg
    frog-:configuration version = 0.12
    frog-mblemfrog-mblemfrog-mblemfrog-mblem:Initiating lemmatizer...
    frog-mbma-:Initiating morphological analyzer...
    frog-tok-:Initiating tokeniser...
    frog-tok-:tokconfig-nld: version=0.2
    frog-mwu-:initiating mwuChunker...
    frog-mwu-:read mwus /home/rahiel/.virtualenvs/lamachine/share/frog/nld//Frog.mwu.1.0
    frog-parser-:initiating parser ... 
    frog-parser-:reading /home/rahiel/.virtualenvs/lamachine/share/frog/nld//Frog.mbdp.1.0.pairs.sampled.ibase
    frog-iob-mbt-:  Reading the lexicon from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/chunker.train.lex.ambi.05 (78570 words).
    frog-iob-mbt-:  Read frequent words list from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/chunker.train.top1000 (1000 words).
    frog-iob-mbt-:  Reading case-base for known words from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/chunker.train.known.dddwfWawa... 
    frog-iob-mbt-:  case-base for known words read.
    frog-iob-mbt-:  Reading case-base for unknown words from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/chunker.train.unknown.chnppddwFawsss... 
    frog-ner-mbt-:  Reading the lexicon from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/ner.data.lex.ambi.05 (73735 words).
    frog-ner-mbt-:  Read frequent words list from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/ner.data.top1000 (1000 words).
    frog-ner-mbt-:  Reading case-base for known words from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/ner.data.known.ddwdwfWawawaa... 
    frog-ner-mbt-:  case-base for known words read.
    frog-ner-mbt-:  Reading case-base for unknown words from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/ner.data.unknown.chnppddwdwFawawaasss... 
    frog-pos-tagger-mbt-:  Reading the lexicon from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/Frog.mbt.1.0.lex.ambi.05 (229170 words).
    frog-pos-tagger-mbt-:  Read frequent words list from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/Frog.mbt.1.0.top500 (500 words).
    frog-pos-tagger-mbt-:  Reading case-base for known words from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/Frog.mbt.1.0.known.dddwfWawa... 
    frog-iob-mbt-:  case-base for unknown word read
    frog-iob-mbt-:  Sentence delimiter set to '<utt>'
    frog-iob-mbt-:  Beam size = 1
    frog-iob-mbt-:  Known Tree, Algorithm = IGTREE
    frog-iob-mbt-:  Unknown Tree, Algorithm = IB1
    frog-iob-mbt-:
    frog-parser-:reading /home/rahiel/.virtualenvs/lamachine/share/frog/nld//Frog.mbdp.1.0.dir.ibase
    frog-ner-mbt-:  case-base for unknown word read
    frog-ner-mbt-:  Sentence delimiter set to 'EL'
    frog-ner-mbt-:  Beam size = 1
    frog-ner-mbt-:  Known Tree, Algorithm = IGTREE
    frog-ner-mbt-:  Unknown Tree, Algorithm = TRIBL
    frog-ner-mbt-:
    frog-pos-tagger-mbt-:  case-base for known words read.
    frog-pos-tagger-mbt-:  Reading case-base for unknown words from: /home/rahiel/.virtualenvs/lamachine/share/frog/nld/Frog.mbt.1.0.unknown.chnppdddwFawasss... 
    frog-parser-:reading /home/rahiel/.virtualenvs/lamachine/share/frog/nld//Frog.mbdp.1.0.rels.ibase
    frog-pos-tagger-mbt-:  case-base for unknown word read
    frog-pos-tagger-mbt-:  Sentence delimiter set to '<utt>'
    frog-pos-tagger-mbt-:  Beam size = 1
    frog-pos-tagger-mbt-:  Known Tree, Algorithm = IGTREE
    frog-pos-tagger-mbt-:  Unknown Tree, Algorithm = IB1
    frog-pos-tagger-mbt-:
    frog-:init Parse took: 3 seconds, 915 milliseconds and 844 microseconds
    frog-:Initialization done.
    frog> 
    

    Maybe some helpful information, after installation I saw in the tests:

    ---------------------------------------------------------
    [python] ucto:  FAILED! 
    ---------------------------------------------------------
    Details for failed test [python] ucto:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ImportError: /home/rahiel/.virtualenvs/lamachine/local/lib/python2.7/site-packages/ucto.so: undefined symbol: _ZN9Tokenizer14TokenizerClass14flushSentencesEi
    ---------------------------------------------------------
    

    And the LaMachine versions of packages:

    --------------------------------------------------------
    Outputting version information of all installed packages
    --------------------------------------------------------
    201702021240
    ticcutils=v0.14
    libfolia=v1.6
    uctodata=v0.4
    ucto=v0.9.6
    foliautils=v0.5
    timbl=v6.4.8
    timblserver=v1.11
    mbt=v3.2.16
    mbtserver=v0.11
    wopr=e55e0f38dc427cfd5fa69125ef2b57dc36dc7537
    frogdata=v0.13
    frog=v0.13.7
    ticcltools=e9b16130d31733f8034a8a32523d3eb17d6a0582
    toad=v0.3
    cython=v0.25.2
    numpy=v1.12.0
    ipython=v5.2.1
    scipy=v0.18.1
    matplotlib=v2.0.0
    lxml=v3.7.2
    scikit-learn=v0.18.1
    django=v1.10.5
    pycrypto=v2.6.1
    pandas=v0.19.2
    textblob=v0.11.1
    nltk=v3.2.2
    psutil=v5.1.0
    flask=v0.12
    requests=v2.13.0
    requests_toolbelt=v0.7.0
    requests_oauthlib=v0.7.0
    pynlpl=v1.1.3
    FoLiA-tools=v1.4.0.53
    foliadocserve=v0.5.1
    clam=v2.1.8
    FoLiA-Linguistic-Annotation-Tool=v0.7.0
    clamservices=v1.0
    LuigiNLP=v0.3
    python-uctov0.4.0
    python-timbl=v2016.06.02
    python-frog=v0.3.2
    colibri-core=v2.4.4
    gecco=v0.2.1
    

    (Thanks for maintaining LaMachine, I couldn't compile python-frog myself.)

    waiting 
    opened by rahiel 7
  • Update for latest development version of Frog


    There have been some changes in the Frog API, so the current git master tree of python-frog no longer compiles against Frog. The binding needs to adapt (arose from LanguageMachines/ucto#82).

    bug 
    opened by proycon 5
  • Installation fails on Mac


    [...]
    In file included from frog_wrapper.cpp:552:
    In file included from /usr/local/include/frog/FrogAPI.h:36:
    In file included from /usr/local/include/timbl/TimblAPI.h:35:
    In file included from /usr/local/include/timbl/Common.h:32:
    /Library/Developer/CommandLineTools/usr/include/c++/v1/cmath:313:9: error: no member named 'signbit' in the global namespace
    using ::signbit;
          ~~^
    /Library/Developer/CommandLineTools/usr/include/c++/v1/cmath:314:9: error: no member named 'fpclassify' in the global namespace
    using ::fpclassify;
          ~~^
    /Library/Developer/CommandLineTools/usr/include/c++/v1/cmath:315:9: error: no member named 'isfinite' in the global namespace; did
          you mean 'finite'?
    [...]
    

    See attached file for complete output.

    install-error.txt

    opened by gmjonker 5
  • Weird memory error with python-frog test as done by LaMachine


    This occurs only on ponyland (Python 3.5.4) so far, all the others are fine:

    $ python3 -c 'import frog'
    INFO:rdflib:RDFLib Version: 4.2.2
    *** Error in `python3': double free or corruption (fasttop): 0x0000000001b05d00 ***
    ======= Backtrace: =========
    /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x2b73773547e5]
    /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x2b737735d37a]
    /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x2b737736153c]
    /lib/x86_64-linux-gnu/libc.so.6(+0x39ff8)[0x2b7377316ff8]
    /lib/x86_64-linux-gnu/libc.so.6(+0x3a045)[0x2b7377317045]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf7)[0x2b73772fd837]
    python3(_start+0x29)[0x5d6049]
    

    with interactive mode, it doesn't happen!

    Correction: It does happen, but upon exiting the interpreter

    bug 
    opened by proycon 4
  • Installation in Docker (Debian-Buster) fails on Cython Language Setting


    In our development environment we have been having some success with following the build steps as outlined in #6. Since the libfrog-dev library is available for the upcoming release of Debian Buster, we were hoping to speed up our Docker builds by including it.

    However, after updating the dependencies to the relevant Buster releases, we get the following stacktrace:

    Failed to build python-frog
    Installing collected packages: python-frog
      Running setup.py install for python-frog: started
        Running setup.py install for python-frog: finished with status 'error'
        ERROR: Complete output from command /usr/bin/python3 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-defg937t/python-frog/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-tqy26wez/install-record.txt --single-version-externally-managed --compile:
        ERROR: /usr/lib/python3.7/distutils/extension.py:131: UserWarning: Unknown Extension options: 'pyrex_gdb'
          warnings.warn(msg)
        running install
        running build
        running build_ext
        cythoning frog_wrapper.pyx to frog_wrapper.cpp
        /usr/lib/python3/dist-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-defg937t/python-frog/frog_wrapper.pyx
          tree = Parsing.p_module(s, pxd, full_module_name)
        building 'frog' extension
        creating build
        creating build/temp.linux-x86_64-3.7
        x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/ -I/usr/include/libxml2 -I/usr/local/include/ -I/usr/include/python3.7m -c frog_wrapper.cpp -o build/temp.linux-x86_64-3.7/frog_wrapper.o --std=c++0x -D U_USING_ICU_NAMESPACE=1
        frog_wrapper.cpp: In function ‘int __pyx_pf_4frog_4Frog___init__(__pyx_obj_4frog_Frog*, __pyx_obj_4frog_FrogOptions*, PyObject*)’:
        frog_wrapper.cpp:3661:147: error: no matching function for call to ‘FrogAPI::FrogAPI(FrogOptions&, TiCC::Configuration&, TiCC::LogStream*, TiCC::LogStream*)’
           __pyx_v_self->capi = new FrogAPI(__pyx_v_options->capi, __pyx_v_self->configuration, (&__pyx_v_self->logstream), (&__pyx_v_self->debuglogstream));
                                                                                                                                                           ^
        In file included from frog_wrapper.cpp:640:
        /usr/include/frog/FrogAPI.h:98:3: note: candidate: ‘FrogAPI::FrogAPI(FrogOptions&, const TiCC::Configuration&, TiCC::LogStream*)’
           FrogAPI( FrogOptions&,
           ^~~~~~~
        /usr/include/frog/FrogAPI.h:98:3: note:   candidate expects 3 arguments, 4 provided
        /usr/include/frog/FrogAPI.h:96:7: note: candidate: ‘constexpr FrogAPI::FrogAPI(const FrogAPI&)’
         class FrogAPI {
               ^~~~~~~
        /usr/include/frog/FrogAPI.h:96:7: note:   candidate expects 1 argument, 4 provided
        error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
        ----------------------------------------
    ERROR: Command "/usr/bin/python3 -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-defg937t/python-frog/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-tqy26wez/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-defg937t/python-frog/
    ERROR: Service 'app' failed to build: The command '/bin/sh -c pip3 install python-frog' returned a non-zero code: 1
    

    It appears to be related to Cython opting for a python2 language setting - but I'm explicitly telling the entire stack to use Python3. Is this a setting inside of the python-frog library somewhere? Is anyone else running into this issue?

    opened by tsaltena 4
  • Outdated icu version


    (Moved from https://github.com/LanguageMachines/frog/issues/69)

    I got the following error when trying `from frog import Frog, FrogOptions` after updating icu to 64.1: `ImportError: libicuio.so.63: cannot open shared object file: No such file or directory`.

    Installing icu63 from the AUR (and updating its icu source to resolve the missing SSL certificate) fixed the problem.

    Being able to run with the latest icu would be nice.

    question 
    opened by cdfa 4
  • code from example.py in Jupyter gives XMLSyntaxError


    Hi, I just pulled and ran the Docker image with the webserver, and found Jupyter running on http://localhost:8080/lab. I pasted the code from example.py into a cell and was presented with the following exception:

    from __future__ import print_function, unicode_literals

    from frog import Frog, FrogOptions

    fx = Frog(FrogOptions(parser=True,xmlout=True))
    output = fx.process("Dit is een FoLiA test.")
    
    Traceback (most recent call last):
    
      File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 3296, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
    
      File "<ipython-input-1-21794a31267e>", line 6, in <module>
        output = fx.process("Dit is een FoLiA test.")
    
      File "frog_wrapper.pyx", line 190, in frog.Frog.process
    
      File "/usr/local/lib/python3.5/dist-packages/folia/main.py", line 7165, in __init__
        self.tree = xmltreefromstring(kwargs['string'])
    
      File "/usr/local/lib/python3.5/dist-packages/folia/main.py", line 522, in xmltreefromstring
        return ElementTree.parse(BytesIO(s), ElementTree.XMLParser(collect_ids=False))
    
      File "src/lxml/etree.pyx", line 3435, in lxml.etree.parse
    
      File "src/lxml/parser.pxi", line 1857, in lxml.etree._parseDocument
    
      File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
    
      File "src/lxml/parser.pxi", line 1765, in lxml.etree._parseDoc
    
      File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
    
      File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
    
      File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
    
      File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
    
      File "<string>", line 1
    XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
    

    Hellup? :)

    opened by JJWTimmer 2
  • Error when trying to install python-frog


    Hi, I'm getting the following error when I try to install python-frog:

    C:\Users\dst\Source\Repos\_WorkflowPatternFinder\WorkflowPatternFinder>pip install python-frog
    Collecting python-frog
      Using cached https://files.pythonhosted.org/packages/3f/2f/cb11ee8f282c3f85d30ba64ecb1c4ce501e1cac8fd6bed671b7382847c9c/python-frog-0.3.7.tar.gz
    Requirement already satisfied: Cython in c:\python34\lib\site-packages (from python-frog) (0.28.2)
    botocore 1.9.11 has requirement python-dateutil<2.7.0,>=2.1, but you'll have python-dateutil 2.7.0 which is incompatible.
    Installing collected packages: python-frog
      Running setup.py install for python-frog ... error
        Complete output from command c:\python34\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\dst\\AppData\\Local\\Temp\\pip-install-ha1roqvg\\python-frog\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\dst\AppData\Local\Temp\pip-record-vzin9j8h\install-record.txt --single-version-externally-managed --compile:
        c:\python34\lib\distutils\extension.py:132: UserWarning: Unknown Extension options: 'pyrex_gdb'
          warnings.warn(msg)
        running install
        running build
        running build_ext
        cythoning frog_wrapper.pyx to frog_wrapper.cpp
        building 'frog' extension
        creating build
        creating build\temp.win-amd64-3.4
        creating build\temp.win-amd64-3.4\Release
        c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I/usr/include/ -I/usr/include/libxml2 -I/usr/local/include/ -Ic:\python34\include -Ic:\python34\include /Tpfrog_wrapper.cpp /Fobuild\temp.win-amd64-3.4\Release\frog_wrapper.obj --std=c++0x "-D U_USING_ICU_NAMESPACE=1"
        cl : Command line warning D9002 : ignoring unknown option '--std=c++0x'
        frog_wrapper.cpp
        c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\Include\xlocale(323) : warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
        frog_wrapper.cpp(587) : fatal error C1083: Cannot open include file: 'libfolia/folia.h': No such file or directory
        error: command '"c:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\Bin\x86_amd64\cl.exe"' failed with exit status 2

    Does anyone know what I can do to fix this?

    opened by DStekel3 2
  • running python-frog on LaMachine

    Hi there, I was wondering if you could help me with an error I got while trying to run Frog on LaMachine. I can already run it manually from the VM command line like this: frog -file. However, since I want to run it on a large dataset, I need to frog one document at a time in a loop. I am trying to run my own frog.py, which contains the example code, on the VM, but I get the following error: error Caused by example code: code It looks like the VM is not able to find the module at all, and I have no idea why. Do you have any idea?

    help wanted 
    opened by Poezedoez 2
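
A minimal sketch of the "one document at a time" loop described in this issue, offered as an illustration rather than the maintainers' recommended approach. The helper names `frog_documents` and `run_with_frog` are hypothetical; only `frog.Frog`, `frog.FrogOptions(parser=False)` and `process` come from the README example. One thing that may be worth checking first: Python puts the script's own directory at the front of the module search path, so a script that is itself named frog.py will be picked up by `import frog` instead of the installed binding. Whether that is the cause here is an assumption, since the error text is not shown.

```python
from pathlib import Path


def frog_documents(paths, process):
    """Yield (path, analysis) pairs, handling one document at a time.

    `process` is any callable that maps a string to an analysis, for
    example the `process` method of a `frog.Frog` instance.
    """
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        yield path, process(text)


def run_with_frog(paths):
    """Hypothetical driver; assumes python-frog is importable here."""
    import frog  # imported lazily so the helper above works without Frog

    # Load the models once and reuse the same instance for every document.
    tagger = frog.Frog(frog.FrogOptions(parser=False))
    for path, tokens in frog_documents(paths, tagger.process):
        print(path, len(tokens), "tokens")
```

Saving this under a name other than frog.py (for instance run_frog.py) avoids the shadowing problem mentioned above.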
  • libfolia/foliautils.h: No such file or directory

    I'm trying to install python-frog manually on Debian Jessie and I'm getting the error below. I'm not sure whether it's a dependency problem on my side or a mistake somewhere in the code.

    Output:

    (thesis)[email protected]:~/projects/thesis/python-frog$ python setup.py install
    /usr/lib/python3.4/distutils/extension.py:132: UserWarning: Unknown Extension options: 'pyrex_gdb'
    warnings.warn(msg)
    running install
    running build
    running build_ext
    skipping 'frog_wrapper.cpp' Cython extension (up-to-date)
    building 'frog' extension
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/home/koen/venvs/thesis/include -I/home/koen/local/include/ -I/usr/include/ -I/usr/include/libxml2 -I/usr/local/include/ -I/usr/include/python3.4m -I/home/koen/venvs/thesis/include/python3.4m -c frog_wrapper.cpp -o build/temp.linux-x86_64-3.4/frog_wrapper.o --std=c++0x
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    frog_wrapper.cpp:257:33: fatal error: libfolia/foliautils.h: No such file or directory
    #include "libfolia/foliautils.h"
    ^
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

    opened by KDercksen 2
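
The compiler stops because it cannot find libfolia/foliautils.h in any directory passed with -I. A small sketch for checking this before rebuilding is given below; `find_header` and `INCLUDE_DIRS` are hypothetical names, and the directory list is simply copied from the -I flags in the gcc command of this report. It only tells you whether the header exists in those locations, not which libfolia version is required.

```python
import os


def find_header(header, include_dirs):
    """Return the first path where `header` exists under `include_dirs`, else None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None


# Include directories taken from the -I flags in the reported gcc command.
INCLUDE_DIRS = [
    "/home/koen/venvs/thesis/include",
    "/home/koen/local/include/",
    "/usr/include/",
    "/usr/local/include/",
]

# Prints the full path if the header is installed, or None if it is missing.
print(find_header("libfolia/foliautils.h", INCLUDE_DIRS))
```

If this prints None, the libfolia development headers are probably not installed in any of the searched locations, which would match the manual installation note about installing Frog and all its dependencies first.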
  • ImportError: libfrog.so.2: cannot open shared object file: No such file or directory

    This is the error I get on my Ubuntu 19.04 server. I installed it with: pip install python-frog

    from frog import Frog, FrogOptions
    ImportError: libfrog.so.2: cannot open shared object file: No such file or directory

    Any clue what is causing this?

    By the way, I already installed libfrog-dev, which did not solve the problem. It looks like that package installed libfrog.so rather than libfrog.so.2.

    question 
    opened by gevezex 4
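
The ImportError means the dynamic linker could not find a file with the exact name libfrog.so.2 when loading the extension module. A short diagnostic sketch using the standard `ctypes` module is shown below; `check_shared_library` is a hypothetical helper, and the snippet only reports what the linker can see. It does not fix a version mismatch between the compiled binding and the installed Frog library.

```python
import ctypes
import ctypes.util


def check_shared_library(soname):
    """Try to load `soname` through the dynamic linker; return (ok, message)."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return False, str(exc)
    return True, "loaded"


# Name under which the linker finds some version of libfrog, or None.
print(ctypes.util.find_library("frog"))

# Whether the exact soname from the ImportError can be loaded.
print(check_shared_library("libfrog.so.2"))
```

If the first line prints a different soname than libfrog.so.2, the binding was likely built against another Frog version than the one installed on the system.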
  • Ability to control logging/output

    When processing large corpora, frog produces a lot of output like this:

    20180711:191754:590:Wed Jul 11 19:17:54 2018 process 2 sentences
    20180711:191754:592:Wed Jul 11 19:17:54 2018 done with sentence[1]
    20180711:191754:607:Wed Jul 11 19:17:54 2018 done with sentence[2]
    

    This drowns out other logging, like progress information.

    It would be nice if the frog output/log level could be controlled.

    enhancement 
    opened by gmjonker 1
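
Until such an option exists, one possible workaround is to redirect the process-level file descriptor around the Frog call. The sketch below assumes that these messages are written by the C++ library to standard error (file descriptor 2), which this issue does not confirm; if they go to standard output, pass `fd=1` instead. `silence_fd` is a hypothetical helper, not part of python-frog.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def silence_fd(fd=2):
    """Temporarily redirect file descriptor `fd` (default: stderr) to os.devnull."""
    # Flush Python-level buffers so earlier output is not lost or reordered.
    sys.stdout.flush()
    sys.stderr.flush()
    saved = os.dup(fd)  # keep a copy of the original descriptor
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, fd)  # writes to `fd` are now discarded
        yield
    finally:
        os.dup2(saved, fd)  # restore the original descriptor
        os.close(devnull)
        os.close(saved)
```

Usage would look like `with silence_fd(): output = frog.process(text)`, with progress information printed outside the `with` block. Note that this also hides genuine error messages written to the same descriptor during the call.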
  • frog doesnt work in threading

    I tried running Frog in combination with multithreading, and it did not work. (We talked about this in the ponyland IRC.) A cleaned-up example of my script is attached; I hope it is clear. I later discovered a few more bugs in the example that I did not spot earlier, because I never got past the Frog part. I will upload a new example when everything is fixed. frogthreadxmpl.py.zip

    bug 
    opened by hdvos 0
Releases(v0.6.1)
Owner
Maarten van Gompel
Research software engineer - NLP - AI - 🐧 Linux & open-source enthusiast - 🐍 Python/ 🌊C/C++ / 🦀 Rust / 🐚 Shell - 🔐 Privacy, Security & Decentralisation