python-timbl, originally developed by Sander Canisius, is a Python extension module wrapping the full TiMBL C++ programming interface. With this module, all functionality exposed through the C++ interface is also available to Python scripts. Being able to access the API from Python greatly facilitates prototyping TiMBL-based applications.

Overview
Project Status: Active. The project has reached a stable, usable state and is being actively developed.

README: python-timbl

Authors: Sander Canisius, Maarten van Gompel
Contact: [email protected]
Web site: https://github.com/proycon/python-timbl/


This is the 2013 release by Maarten van Gompel, building on the 2006 release by Sander Canisius. For those used to the old library, there is one backwards-incompatible change: adapt your scripts to use import timblapi instead of import timbl, as the latter is now a higher-level interface.

Since 2020, python-timbl supports only Python 3; Python 2 support has been deprecated.

License

python-timbl is free software, distributed under the terms of the GNU General Public License. Please cite TiMBL in publications of research that uses TiMBL.

Installation

python-timbl is distributed as part of LaMachine (https://proycon.github.io/LaMachine), which significantly simplifies compilation and installation. The remainder of the instructions in this section refer to manual compilation and installation.

python-timbl depends on two external packages, which must be built and/or installed on your system before python-timbl can be built. The first is TiMBL itself: download its tarball from TiMBL's homepage and follow the installation instructions; users of recent Ubuntu/Debian versions will find timbl in their distribution's package repository. In the remainder of this section, it is assumed that $TIMBL_HEADERS points to the directory that contains timbl/TimblAPI.h, and $TIMBL_LIBS to the directory that contains the TiMBL libraries. Note that TiMBL itself has additional dependencies.

The second prerequisite is Boost.Python, a library that facilitates writing Python extension modules in C++. Many Linux distributions come with prebuilt packages of Boost.Python. If so, install this package; on Ubuntu/Debian this can be done as follows:

$ sudo apt-get install libboost-python-dev

If not, refer to the Boost installation instructions to build and install Boost.Python manually. In the remainder of this section, let $BOOST_HEADERS refer to the directory that contains the Boost header files, and $BOOST_LIBS to the directory that contains the Boost library files. If you installed Boost.Python with your distribution's package manager, these directories are probably /usr/include and /usr/lib respectively.
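If you are unsure which directories to use for $TIMBL_HEADERS and $BOOST_HEADERS, a small helper can probe a few candidate locations. This is a hypothetical sketch, not part of python-timbl: it assumes the TiMBL header directory is the one containing timbl/TimblAPI.h (as described above) and that the Boost header directory contains boost/python.hpp; the candidate paths are merely common defaults.

```python
import os
from typing import Iterable, Optional


def find_include_dir(header: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate directory that contains `header`, or None.

    `header` is a path relative to the include directory, e.g. "timbl/TimblAPI.h".
    """
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, header)):
            return directory
    return None


# Common default locations (an assumption; adjust for your system).
DEFAULT_CANDIDATES = [
    "/usr/include",
    "/usr/local/include",
    os.path.expanduser("~/.local/include"),
]

if __name__ == "__main__":
    print("TIMBL_HEADERS =", find_include_dir("timbl/TimblAPI.h", DEFAULT_CANDIDATES))
    print("BOOST_HEADERS =", find_include_dir("boost/python.hpp", DEFAULT_CANDIDATES))
```

A None result simply means the header was not found in any candidate directory; in that case set the variable by hand.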

If both prerequisites have been installed on your system, python-timbl can be obtained from GitHub:

$ git clone https://github.com/proycon/python-timbl.git
$ cd python-timbl

and can then be built and installed with the following command:

$ sudo python3 setup.py \
       build_ext --boost-include-dir=$BOOST_HEADERS \
                 --boost-library-dir=$BOOST_LIBS \
                 --timbl-include-dir=$TIMBL_HEADERS  \
                 --timbl-library-dir=$TIMBL_LIBS \
       install --prefix=/dir/to/install/in

This is the verbose variant; if the default locations are used, the following may suffice:

$ sudo python3 setup.py install

The --prefix option to the install command denotes the directory in which the module is to be installed. If you have the appropriate system permissions, you can leave out this option. The module will then be installed in the Python system tree. Otherwise, make sure that the installation directory is in the module search path of your Python system.
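If you script the build, the verbose invocation above can be assembled as an argument list and passed to subprocess.run(cmd, check=True) from inside the python-timbl checkout. The sketch below only uses the options shown above; the function name is hypothetical, and it leaves out --prefix when none is given, matching the advice about installing into the Python system tree.

```python
import shlex
from typing import List, Optional


def setup_command(boost_include: str, boost_lib: str,
                  timbl_include: str, timbl_lib: str,
                  prefix: Optional[str] = None) -> List[str]:
    """Assemble the verbose setup.py invocation as an argument list."""
    cmd = [
        "python3", "setup.py",
        "build_ext",
        f"--boost-include-dir={boost_include}",
        f"--boost-library-dir={boost_lib}",
        f"--timbl-include-dir={timbl_include}",
        f"--timbl-library-dir={timbl_lib}",
        "install",
    ]
    if prefix is not None:
        # Only needed when installing outside the Python system tree.
        cmd.append(f"--prefix={prefix}")
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(setup_command("/usr/include", "/usr/lib", "/usr/include", "/usr/lib")))
```

When you do pass a prefix, remember that the resulting site-packages directory must end up on sys.path, for example through the PYTHONPATH environment variable.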

Usage

python-timbl offers two interfaces to the TiMBL API: a low-level interface in the timblapi module, which closely mirrors the C++ library, and a high-level, object-oriented interface in the timbl module, which offers a TimblClassifier class.

timbl.TimblClassifier: High-level interface

The high-level interface features a TimblClassifier class, which can be used for training and testing classifiers. An example is provided in example.py; parts of it are discussed here.

After importing the timbl module, instantiate the classifier by passing it an identifier, which is used as the prefix for all files written, and a string containing options exactly as you would pass them to TiMBL:

import timbl
classifier = timbl.TimblClassifier("wsd-bank", "-a 0 -k 1" )

Normalization of the class distribution is enabled by default (regardless of the -G option to TiMBL); pass normalize=False to disable it.
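As a rough illustration of what normalizing a class distribution means, the sketch below scales raw class weights (for example, neighbour counts) so that they sum to 1. It is an assumption about the effect of the option, written with the standard library only, and not a copy of python-timbl's own implementation.

```python
from typing import Dict


def normalize_distribution(distribution: Dict[str, float]) -> Dict[str, float]:
    """Scale class weights so they sum to 1.

    An empty or all-zero distribution is returned unchanged (as a copy).
    """
    total = sum(distribution.values())
    if total == 0:
        return dict(distribution)
    return {label: weight / total for label, weight in distribution.items()}


if __name__ == "__main__":
    # Hypothetical raw counts: 1 neighbour votes EN, 19 vote S.
    print(normalize_distribution({"EN": 1, "S": 19}))
```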

Training instances can be added using the append(featurevector, classlabel) method:

classifier.append( (1,0,0), 'financial')
classifier.append( (0,1,0), 'furniture')
classifier.append( (0,0,1), 'geographic')
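TiMBL itself learns from plain-text instance files in which each line holds the feature values followed by the class label. The hypothetical helper below shows how a feature tuple and label could be serialized into such a line. The tab delimiter is an assumption (TiMBL accepts several input formats), and this is not a description of what append() writes internally; it mainly shows why feature values must not contain the delimiter.

```python
from typing import Sequence


def instance_line(features: Sequence[object], label: str, delimiter: str = "\t") -> str:
    """Serialize one instance as its feature values followed by the class label."""
    fields = [str(value) for value in features] + [label]
    for field in fields:
        # A field containing the delimiter would shift every following column.
        if delimiter in field:
            raise ValueError(f"field {field!r} contains the delimiter")
    return delimiter.join(fields)


if __name__ == "__main__":
    print(repr(instance_line((1, 0, 0), "financial")))
```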

Next, invoke the actual training. Note that at each step TiMBL may write considerable detail about what it is doing to standard error:

classifier.train()

The result of this training is an instance base, which you can save to a file and load again later:

classifier.save()

classifier = timbl.TimblClassifier("wsd-bank", "-a 0 -k 1" )
classifier.load()

The main advantage of the Python library is that you can classify instances on the fly: just pass a feature vector, and optionally a class label, to classify(featurevector, classlabel):

classlabel, distribution, distance = classifier.classify( (1,0,0) )
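To make the (classlabel, distribution, distance) triple concrete, here is a minimal pure-Python sketch of roughly what "-a 0 -k 1" computes on the toy data above: IB1 with an unweighted overlap metric, where every training instance at the nearest distance votes. This is a simplification for illustration only. Real TiMBL applies feature weighting (gain ratio by default) and resolves ties its own way, so its output can differ; the alphabetical tie-break below is an arbitrary choice made here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Instance = Tuple[Tuple[object, ...], str]


def overlap_distance(a: Sequence[object], b: Sequence[object]) -> int:
    """Number of feature positions whose values differ (unweighted overlap metric)."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify_1nn(training: List[Instance],
                 features: Sequence[object]) -> Tuple[str, Dict[str, int], int]:
    """Return (label, distribution, distance) using the nearest distance only (k=1)."""
    scored = [(overlap_distance(vector, features), label) for vector, label in training]
    best = min(distance for distance, _ in scored)
    # All training instances at the smallest distance contribute to the distribution.
    distribution = Counter(label for distance, label in scored if distance == best)
    top = max(distribution.values())
    # Tie-break alphabetically (an arbitrary choice for this sketch).
    label = min(l for l, count in distribution.items() if count == top)
    return label, dict(distribution), best


if __name__ == "__main__":
    training = [((1, 0, 0), "financial"), ((0, 1, 0), "furniture"), ((0, 0, 1), "geographic")]
    print(classify_1nn(training, (1, 1, 0)))
```

With this toy data, (1,1,0) is at distance 1 from both the financial and the furniture instance, so the sketch returns ('financial', {'financial': 1, 'furniture': 1}, 1), a tie between the two classes.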

You can also create a test file and test it all at once:

classifier = timbl.TimblClassifier("wsd-bank", "-a 0 -k 1")
classifier.load()
classifier.addinstance("testfile", (1,0,0), 'financial')  # addinstance() adds instances to external files (use append() for training)
classifier.addinstance("testfile", (0,1,0), 'furniture')
classifier.addinstance("testfile", (0,0,1), 'geographic')
classifier.addinstance("testfile", (1,1,0), 'geographic')  # this one will be wrongly classified as financial & furniture
classifier.test("testfile")

print("Accuracy:", classifier.getAccuracy())
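Accuracy is the proportion of test instances whose predicted class equals the gold class. The standard-library sketch below spells out that calculation; it does not claim anything about the scale (fraction or percentage) that getAccuracy() reports. The predictions in the usage line are hypothetical, chosen to match the comment that only the last test instance is misclassified.

```python
from typing import Sequence


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of instances whose predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute the accuracy of an empty test set")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    gold = ["financial", "furniture", "geographic", "geographic"]
    predicted = ["financial", "furniture", "geographic", "financial"]  # hypothetical
    print(accuracy(gold, predicted))
```

Three of the four hypothetical predictions match, so this prints 0.75.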

Real multithreading support

If you are writing a multithreaded Python application (i.e. using the threading module) and want to benefit from actual concurrency, side-stepping Python's Global Interpreter Lock, pass threading=True to the TimblClassifier constructor. Take care to instantiate TimblClassifier before starting your threads. You can then call TimblClassifier.classify() from within your threads; only this classify() method runs concurrently.

If you do not set this option, everything will still work fine, but you won't benefit from actual concurrency because of Python's Global Interpreter Lock.
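The pattern described above can be sketched with the standard library's concurrent.futures (chosen here for brevity; plain threading.Thread objects work the same way). The classifier is created once, before any threads start, and only its classify() method is called from the worker threads. StubClassifier is a hypothetical stand-in so that the sketch runs without TiMBL installed; in real use you would pass a timbl.TimblClassifier created with threading=True.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List, Sequence


def classify_concurrently(classifier: Any, vectors: Sequence[Sequence[object]],
                          workers: int = 4) -> List[Any]:
    """Call classifier.classify() on each vector from a pool of worker threads.

    The classifier must already exist (created before any threads start).
    Results come back in the same order as `vectors`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classifier.classify, vectors))


class StubClassifier:
    """Hypothetical stand-in for TimblClassifier so this sketch runs without TiMBL."""

    def classify(self, features):
        # Mimic the (classlabel, distribution, distance) return shape.
        label = "financial" if features[0] else "other"
        return label, {label: 1.0}, 0.0


if __name__ == "__main__":
    results = classify_concurrently(StubClassifier(), [(1, 0, 0), (0, 1, 0)], workers=2)
    print([label for label, _, _ in results])
```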

timblapi: Low-level interface

For documentation on the low-level timblapi interface, consult the TiMBL API guide. Although that guide describes the C++ interface to TiMBL, the C++ interface is similar enough to its Python binding for the guide to be a useful reference for python-timbl as well. For the most part, the Python interface follows the C++ version closely; the differences are listed below.

Naming style

In the C++ interface, method names are in UpperCamelCase; for example, Classify, SetOptions, etc. In contrast, the Python interface uses lowerCamelCase: classify, setOptions, etc.

Method overloading

TiMBL's Classify methods use C++ method overloading to provide three different kinds of output. Python has no method overloading, so python-timbl provides three differently named methods that mirror the functionality of the overloaded Classify method. The mapping is as follows:

    # bool TimblAPI::Classify(const std::string& Line,
    #                         std::string& result);
    #
    def TimblAPI.classify(line) -> bool, string

    #
    # bool TimblAPI::Classify(const std::string& Line,
    #                         std::string& result,
    #                         double& distance);
    #
    def TimblAPI.classify2(line) -> bool, string, distance

    #
    # bool TimblAPI::Classify(const std::string& Line,
    #                         std::string& result,
    #                         std::string& Distrib,
    #                         double& distance);
    #
    def TimblAPI.classify3(line, normalize=True, requireddepth=0) -> bool, string, dictionary, distance

    # Thread-safe version of classify3; releases and reacquires Python's Global Interpreter Lock
    def TimblAPI.classify3safe(line, normalize, requireddepth=0) -> bool, string, dictionary, distance

Note that in versions of python-timbl prior to 2015.08.12, classify3 returned a string representation of the distribution; it now returns an actual dictionary. When using classify3safe (the thread-safe version), ensure you first call initthreads after instantiating timblapi, and manually call the initthreading() method.
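Code written against older releases may still handle distributions in the old string form. The hypothetical helper below converts such a string, in the format TiMBL prints (for example "{ EN 0.0526316, S 0.947368 }"), into a dictionary like the one classify3 now returns. It assumes class labels contain neither whitespace nor commas.

```python
from typing import Dict


def parse_distribution(text: str) -> Dict[str, float]:
    """Parse a TiMBL distribution string such as "{ EN 0.0526316, S 0.947368 }".

    Assumes class labels contain no whitespace or commas.
    """
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"not a distribution string: {text!r}")
    distribution: Dict[str, float] = {}
    for item in body[1:-1].split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty distribution "{ }"
        label, weight = item.rsplit(None, 1)
        distribution[label] = float(weight)
    return distribution


if __name__ == "__main__":
    print(parse_distribution("{ EN 0.0526316, S 0.947368 }"))
```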

Python-only methods

Three TiMBL API methods print information to a standard C++ output stream object (ShowBestNeighbors, ShowOptions, ShowSettings). In the Python interface, these methods work only with Python (stream) objects that have a fileno method returning a valid file descriptor. Alternatively, three new methods are provided (bestNeighbo(u)rs, options, settings); these return the same information as a Python string.
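Before handing a stream to one of the Show methods, you can check whether it meets the fileno requirement. In this standard-library sketch (the helper name is hypothetical), in-memory streams such as io.StringIO fail the check, while real files opened with open() pass; for capturing output in memory, the string-returning methods are the simpler route.

```python
from typing import Any


def has_valid_fileno(stream: Any) -> bool:
    """Return True if `stream.fileno()` exists and returns a usable file descriptor."""
    try:
        fd = stream.fileno()
    except (AttributeError, OSError, ValueError):
        # No fileno attribute, io.UnsupportedOperation (a subclass of both
        # OSError and ValueError), or a closed file.
        return False
    return isinstance(fd, int) and fd >= 0


if __name__ == "__main__":
    import io

    print(has_valid_fileno(io.StringIO()))
```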

scikit-learn wrapper

A wrapper for use in scikit-learn has been added. It was designed for use in scikit-learn Pipeline objects. The wrapper is not finished and has so far only been tested on sparse data. Note that TiMBL does not work well with large numbers of features; it is suggested to reduce the number of features to fewer than 100 to keep system performance reasonable. Use on servers with large amounts of memory and many processing cores is advised.
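One simple, generic way to follow that advice is to keep only the highest-variance feature columns before training. This is a swapped-in, unsupervised technique shown purely for illustration, written with the standard library only; it is not part of the wrapper, and scikit-learn users would more likely put a selector such as VarianceThreshold or SelectKBest in the same Pipeline.

```python
from statistics import pvariance
from typing import List, Sequence


def top_k_features_by_variance(rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k columns with the highest variance, in column order."""
    if not rows:
        return []
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    # Stable sort: among equal variances, earlier columns are preferred.
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def project(rows: Sequence[Sequence[float]], indices: Sequence[int]) -> List[List[float]]:
    """Keep only the selected columns of each row."""
    return [[row[j] for j in indices] for row in rows]


if __name__ == "__main__":
    data = [[1, 5, 0], [1, 0, 0], [1, 10, 1]]
    keep = top_k_features_by_variance(data, 2)
    print(keep, project(data, keep))
```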


Comments
  • classify() method does not return correct distribution

    I'm using the LaMachine virtual environment on Ponyland

    classifier = timbl.TimblClassifier("pl_type.master", "-mO:I1 -k 5 -G 0")

    Should return probability distribution that adds up to 1, but ...

    classifier.classify(("administrateur", "n", "i", "=", "str", "a", "=", "t", "|", "r", "-", "-", "+", "r"))

    returns:

    {'EN': 1, 'S': 1}

    But the same classifier returns the correct distribution if the test() method is used instead:

    { EN 0.0526316, S 0.947368 }

    Labels: bug, question, ready. Opened by timjzee.
  • Compatibility with latest timbl broken!

    gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include -I/home/travis/virtualenv/python3.4.6/include -I/usr/include/libxml2 -I/opt/python/3.4.6/include/python3.4m -c src/timblapi.cc -o build/temp.linux-x86_64-3.4/src/timblapi.o
    cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
    In file included from /usr/include/c++/4.8/unordered_map:35:0,
                     from /home/travis/virtualenv/python3.4.6/include/timbl/Instance.h:35,
                     from /home/travis/virtualenv/python3.4.6/include/timbl/TimblAPI.h:38,
                     from src/timblapi.h:51,
                     from src/timblapi.cc:47:
    /usr/include/c++/4.8/bits/c++0x_warning.h:32:2: error: #error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
     #error This file requires compiler and library support for the \
      ^
    In file included from /home/travis/virtualenv/python3.4.6/include/timbl/TimblAPI.h:38:0,
                     from src/timblapi.h:51,
                     from src/timblapi.cc:47:
    /home/travis/virtualenv/python3.4.6/include/timbl/Instance.h:211:11: error: ‘unordered_map’ in namespace ‘std’ does not name a type
       typedef std::unordered_map< size_t, ValueClass *> IVCmaptype;
               ^
    /home/travis/virtualenv/python3.4.6/include/timbl/Instance.h:221:5: error: ‘IVCmaptype’ does not name a type
         IVCmaptype ValuesMap;
         ^
    error: command 'gcc' failed with exit status 1

    Labels: bug, PRIORITY. Opened by proycon.

Latest release: v2020.06.08
Owner: Maarten van Gompel