Machine learning models from Singapore's NLP research community

SG-NLP

Machine learning models from Singapore's natural language processing (NLP) research community.

sgnlp is a Python package that lets you get started quickly with various NLP models implemented using the PyTorch and Transformers frameworks.

We have an accompanying demo site where you can interact with our models and get a better understanding of how they work.

Installation

  • Python >= 3.8
pip install sgnlp
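
To confirm that an environment meets the requirement above, a small check along the following lines may help. It is only a sketch: the function name `check_environment` is hypothetical and not part of sgnlp, and it relies solely on the standard library.

```python
import sys
from importlib import metadata


def check_environment(package="sgnlp", min_python=(3, 8)):
    """Report whether Python is new enough and whether the package is installed.

    Hypothetical helper for illustration; it is not part of sgnlp.
    """
    try:
        installed_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        installed_version = None  # the package is not installed here
    return {
        "python_ok": sys.version_info >= min_python,
        "installed_version": installed_version,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `installed_version` comes back as `None`, running `pip install sgnlp` in the same environment should resolve it.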

Documentation

Visit our documentation for tutorials.

License

Code and models from this project are released under the MIT License unless otherwise stated. If a model's code is under a separate license, it can be found in the respective model's folder.

Comments
  • Change demo api to use gevent worker

    • Using multiple gunicorn workers of the default 'sync' type does not work on Kubernetes
    • Workers are constantly terminated by signal 9 (SIGKILL)
    • Try the gevent worker type to see whether it resolves the problem
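
A gunicorn configuration file is plain Python, so the switch to gevent workers could look roughly like the sketch below. The file name, bind address, worker count, and timeout are assumptions for illustration and are not taken from the project; `bind`, `worker_class`, `workers`, and `timeout` are standard gunicorn settings.

```python
# gunicorn.conf.py (hypothetical file name; all values are illustrative)
import multiprocessing

# Address the demo API listens on inside the container (assumed value).
bind = "0.0.0.0:8000"

# Use gevent workers instead of the default "sync" worker type.
worker_class = "gevent"

# Keep the worker count small; capped at 4 here as an arbitrary choice.
workers = min(4, multiprocessing.cpu_count())

# Seconds a worker may stay silent before gunicorn restarts it (assumed value).
timeout = 120
```

The gevent worker needs the gevent package installed alongside gunicorn. The file would then be passed with `gunicorn -c gunicorn.conf.py <module>:<app>`, where the module and application names depend on the API being served.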
    opened by jonheng 2
  • UFD use case tutorial and usability improvement

    • Added additional tutorial on how to use UFD to train and evaluate on custom dataset
    • Bug fix for UFD parse_args_and_load_config util function
    • Added feature to create folder if folder doesn't exist
    • Added some training argument parameters to the evaluation arguments to improve usability
    • Made caching optional
    • Added validation to make debugging easier
    • Added links to config file examples for reccon models
    opened by vincenttzc 1
  • Wrong assert comparison for SenticGCN dataclass

    This applies to the latest SenticGCN implementation on the Dev branch. In dataclass.py, the __post_init__ method of SenticGCNTrainArgs contains the following assertions:

    assert self.repeats > 1, "Repeats value must be at least 1."
    assert self.patience > 1, "Patience value must be at least 1." 
    

    The comparison operator should be >= instead.
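
A minimal sketch of the corrected check follows. `TrainArgsSketch` is a hypothetical, reduced stand-in for SenticGCNTrainArgs that carries only the two fields quoted above; the default values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrainArgsSketch:
    """Hypothetical reduced stand-in for SenticGCNTrainArgs (two fields only)."""

    repeats: int = 1   # default value is an assumption
    patience: int = 1  # default value is an assumption

    def __post_init__(self):
        # ">= 1" agrees with the messages: a value of exactly 1 is accepted.
        assert self.repeats >= 1, "Repeats value must be at least 1."
        assert self.patience >= 1, "Patience value must be at least 1."
```

With `>`, a value of 1 would be rejected even though the message says it is allowed; with `>=`, only values below 1 fail.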

    bug 
    opened by raymondng76 0
  • 47 centralized logging

    • Create a centralized logger for 'sgnlp' base logger
    • The 'sgnlp' logger is created from a JSON config and is initialized in the 'sgnlp' module's __init__.py
    • Replace all direct logging method calls with calls on a script-specific logger
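
The pattern described in these points can be sketched with the standard library's `logging.config.dictConfig`. The JSON content, the function name `setup_base_logger`, and the child logger name are assumptions for illustration; in the package the config would presumably be read from a JSON file rather than an inline string.

```python
import json
import logging
import logging.config

# Inline JSON keeps the sketch self-contained (the real config is assumed
# to live in a separate JSON file).
LOGGING_CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}
  },
  "handlers": {
    "console": {
      "class": "logging.StreamHandler",
      "formatter": "simple",
      "level": "INFO"
    }
  },
  "loggers": {
    "sgnlp": {"handlers": ["console"], "level": "INFO", "propagate": false}
  }
}
"""


def setup_base_logger(config_json=LOGGING_CONFIG_JSON):
    """Configure the 'sgnlp' base logger once, e.g. from the package __init__.py."""
    logging.config.dictConfig(json.loads(config_json))
    return logging.getLogger("sgnlp")


base_logger = setup_base_logger()

# Each script asks for its own child logger; records flow up to the
# handlers configured on the 'sgnlp' base logger.
script_logger = logging.getLogger("sgnlp.models.example")  # hypothetical name
script_logger.info("Script-specific logger is ready.")
```

In a real module the child logger would typically be obtained with `logging.getLogger(__name__)`, which yields a name under `sgnlp` automatically when the module lives inside the package.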
    opened by raymondng76 0
  • Add parent class for preprocessor

    • [x] Create a module named sgnlp.base
    • [x] Add abstractmethods for preprocess, save, load
    • [x] Add batch iteration to parent __call__
    • [x] Parent __call__ should return a dictionary
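
The checklist above could translate into a parent class roughly like the following. This is a sketch under assumptions, not the code in sgnlp.base: the class names, the `batch_size` parameter, and the toy subclass are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class PreprocessorBase(ABC):
    """Hypothetical parent class; subclasses implement preprocess, save, load."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size

    @abstractmethod
    def preprocess(self, batch: List[str]) -> Dict[str, List[Any]]:
        """Turn one batch of raw inputs into a dictionary of feature lists."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist any preprocessor state."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore preprocessor state."""

    def __call__(self, data: List[str]) -> Dict[str, List[Any]]:
        # Iterate over the input in batches and merge the per-batch
        # dictionaries into a single dictionary.
        output: Dict[str, List[Any]] = {}
        for start in range(0, len(data), self.batch_size):
            batch = data[start:start + self.batch_size]
            for key, values in self.preprocess(batch).items():
                output.setdefault(key, []).extend(values)
        return output


class WhitespacePreprocessor(PreprocessorBase):
    """Toy subclass used only to exercise the parent __call__."""

    def preprocess(self, batch):
        return {"tokens": [text.split() for text in batch]}

    def save(self, path):
        pass  # nothing to persist in this toy example

    def load(self, path):
        pass  # nothing to restore in this toy example


result = WhitespacePreprocessor(batch_size=2)(["a b", "c", "d e f"])
```

Keeping batching in the parent `__call__` means each subclass only has to describe how a single batch is processed.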
    enhancement 
    opened by jonheng 0
  • 46 senticgcn bugfix

    • Add multi-word aspect support
    • Update documentation to reflect multi-word support
    • Update unit tests
    • Update usage example to include multi-word support
    opened by raymondng76 0
  • Fix multi-word aspect issue with Sentic-GCN preprocessor

    The current implementation of the preprocessor matches a single aspect index when matching postprocessor output. The aspect index field of the process_input payload should be expanded to handle aspects that span multiple indexes.
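
To illustrate what multiple indexes per aspect means, the sketch below locates every token span that matches a possibly multi-word aspect. It is not the Sentic-GCN preprocessor: the function name, the whitespace tokenization, and the list-of-lists return shape are assumptions.

```python
from typing import List


def find_aspect_indexes(tokens: List[str], aspect: str) -> List[List[int]]:
    """Return one list of token indexes per occurrence of the aspect.

    Hypothetical helper; matching is case-insensitive on whitespace tokens.
    """
    aspect_tokens = aspect.lower().split()
    lowered = [token.lower() for token in tokens]
    width = len(aspect_tokens)
    matches: List[List[int]] = []
    if width == 0:
        return matches
    # Slide a window of the aspect's length across the sentence.
    for start in range(len(lowered) - width + 1):
        if lowered[start:start + width] == aspect_tokens:
            matches.append(list(range(start, start + width)))
    return matches


sentence = "The battery life is great but the screen is dim".split()
multi_word = find_aspect_indexes(sentence, "battery life")  # [[1, 2]]
single_word = find_aspect_indexes(sentence, "screen")       # [[7]]
```

A single-word aspect simply becomes a span of length one, so the same field shape covers both cases.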

    bug 
    opened by raymondng76 0
  • Add Sentic-GCN demo_api to SGNlp

    Close #43

    This pull request adds the Sentic-GCN demo_api models to sgnlp. It includes the following components:

    • model_card
    • api.py
    • dockerfiles
    • requirements.txt
    • usage.py
    opened by K-WeiMing 0
  • Add Sentic-GCN to SGNlp

    close #41

    This pull request adds the Sentic-GCN models to sgnlp. It includes the following components:

    • Models
    • Configs
    • Tokenizers
    • Embedding models
    • Trainer/Evaluator
    • Unit test
    • documentation

    It does not include demo_api, which is covered in a separate issue.

    opened by raymondng76 0
  • download_pretrained for demo API does not cache downloaded files/models

    To allow the containers to start up quicker, models and files were downloaded and cached during build time.

    A recent change in the Hugging Face transformers package has broken this functionality:

    • Released in v4.22.0
    • Issue

    Possible choices moving forward:

    • Write a simple caching utility function
    • Stick to versions of transformers before 4.22.0
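
The first option, a simple caching utility, might look something like the sketch below. The function name `cached_download`, the hashed file naming, and the injectable `fetch` argument are assumptions for illustration; nothing here is taken from sgnlp or transformers.

```python
import hashlib
import urllib.request
from pathlib import Path


def cached_download(url, cache_dir, fetch=urllib.request.urlretrieve):
    """Download url into cache_dir once and reuse the file afterwards.

    Hypothetical helper. `fetch` is any callable taking (url, filename);
    it defaults to urllib.request.urlretrieve.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Derive a stable file name from the URL, keeping the extension if any.
    suffix = Path(url.split("?")[0]).suffix
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    target = cache_dir / (digest + suffix)

    if not target.exists():
        # Download to a temporary name first so that an interrupted
        # download never leaves a partial file under the final name.
        partial = target.with_name(target.name + ".part")
        fetch(url, str(partial))
        partial.replace(target)
    return target
```

Calling this at image build time would populate the cache directory, and the same call at container start-up would then return the existing file without touching the network.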
    opened by jonheng 0
  • Add Stance Detection model

    opened by atenzer 0
Releases(v0.4.0)
Owner
AI Singapore | AI Makerspace
Growing local AI talent and empowering start-ups, SMEs, and enterprises with AI components, frameworks, platforms, and advisory services.