Machine learning models from Singapore's NLP research community

SG-NLP

Machine learning models from Singapore's natural language processing (NLP) research community.

sgnlp is a Python package that lets you get started quickly with various NLP models implemented using the PyTorch and Transformers frameworks.

We have an accompanying demo site where you can interact with our models and get a better understanding of how they work.

Installation

Requirements:

  • Python >= 3.8

Install with pip:

    pip install sgnlp

Documentation

Visit our documentation for tutorials.

License

Code and models from this project are released under the MIT License unless otherwise stated. If a model's code is under a separate license, it can be found in the respective model's folder.

Comments
  • Change demo api to use gevent worker

    • Using multiple gunicorn workers of the default type 'sync' does not work on Kubernetes
    • Workers are constantly terminated by signal 9
    • Try the gevent worker type to see whether it resolves the problem
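
The worker change described above could be expressed in a gunicorn configuration file. The following is a minimal sketch, assuming the demo API is served by gunicorn and the gevent package is installed. The file name, worker count, bind address, and timeout are illustrative assumptions, not values taken from the project.

```python
# gunicorn.conf.py (hypothetical file name and values, for illustration only)

# Use the gevent asynchronous worker instead of gunicorn's default "sync" worker.
worker_class = "gevent"

# Number of worker processes; assumed value, tune to the container's resource limits.
workers = 2

# Address the server listens on; assumed value.
bind = "0.0.0.0:8000"

# Seconds a worker may stay silent before gunicorn restarts it; assumed value.
timeout = 120
```

The same worker type can also be selected on the command line with gunicorn's `--worker-class gevent` option.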
    opened by jonheng 2
  • UFD use case tutorial and usability improvement

    • Added additional tutorial on how to use UFD to train and evaluate on custom dataset
    • Bug fix for UFD parse_args_and_load_config util function
    • Added feature to create folder if folder doesn't exist
    • Added some training argument parameters to the evaluation arguments to improve usability
    • Made caching optional
    • Added validation to make debugging easier
    • Added links to config file examples for reccon models
    opened by vincenttzc 1
  • Wrong assert comparison for SenticGCN dataclass

    This concerns the latest SenticGCN implementation on the dev branch. In dataclass.py, the __post_init__ method of SenticGCNTrainArgs contains the following assertions:

    assert self.repeats > 1, "Repeats value must be at least 1."
    assert self.patience > 1, "Patience value must be at least 1." 
    

    The comparison operator should be >= instead.
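
A minimal sketch of the corrected check is shown here. `TrainArgsSketch` is a hypothetical stand-in that keeps only the two fields mentioned in this issue; the real SenticGCNTrainArgs has other fields, and the defaults below are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrainArgsSketch:
    """Hypothetical stand-in for SenticGCNTrainArgs with only two fields."""

    repeats: int = 1   # assumed default
    patience: int = 1  # assumed default

    def __post_init__(self):
        # ">=" makes the check agree with the "at least 1" error messages,
        # so a value of exactly 1 is accepted.
        assert self.repeats >= 1, "Repeats value must be at least 1."
        assert self.patience >= 1, "Patience value must be at least 1."
```

With `>`, constructing the arguments with `repeats=1` would fail even though the message says that 1 is allowed.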

    bug 
    opened by raymondng76 0
  • 47 centralized logging

    • Create a centralized 'sgnlp' base logger
    • The 'sgnlp' logger is created from a JSON config and is initialized in the 'sgnlp' module's __init__.py
    • Replace all logging method calls in each script with calls on that script's own logger
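
The approach in the list above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's actual code: the JSON content, the `setup_base_logger` helper, and the `sgnlp.models.example` logger name are all hypothetical.

```python
import json
import logging
import logging.config

# Assumed JSON config; the project's real config file may differ.
LOGGING_CONFIG_JSON = """
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "default": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"}
  },
  "handlers": {
    "console": {"class": "logging.StreamHandler", "formatter": "default"}
  },
  "loggers": {
    "sgnlp": {"level": "INFO", "handlers": ["console"], "propagate": false}
  }
}
"""


def setup_base_logger(config_json: str = LOGGING_CONFIG_JSON) -> logging.Logger:
    """Configure the 'sgnlp' base logger from a JSON string (hypothetical helper)."""
    logging.config.dictConfig(json.loads(config_json))
    return logging.getLogger("sgnlp")


# Would run once, for example from the package's __init__.py.
base_logger = setup_base_logger()

# Each script then asks for its own logger. In a real module this would
# usually be logging.getLogger(__name__); the name here is made up.
logger = logging.getLogger("sgnlp.models.example")
logger.info("Messages from child loggers reach the base logger's handlers.")
```

Because logger names are hierarchical, any logger whose name starts with `sgnlp.` inherits the level and handlers configured on the base logger.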
    opened by raymondng76 0
  • Add parent class for preprocessor

    • [x] Create a module named sgnlp.base
    • [x] Add abstractmethods for preprocess, save, load
    • [x] Add batch iteration to parent __call__
    • [x] Parent __call__ should return a dictionary
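
The checklist above can be sketched as follows. This is a minimal illustration under assumptions: the class names, the `batch_size` parameter, and the merge strategy are hypothetical and are not taken from sgnlp.base.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence


class PreprocessorBase(ABC):
    """Hypothetical parent class; subclasses implement preprocess, save and load."""

    def __init__(self, batch_size: int = 2):
        self.batch_size = batch_size  # assumed parameter

    @abstractmethod
    def preprocess(self, batch: Sequence[Any]) -> Dict[str, List[Any]]:
        """Turn one batch of raw inputs into a dictionary of lists."""

    @abstractmethod
    def save(self, path: str) -> None:
        """Persist any preprocessor state."""

    @abstractmethod
    def load(self, path: str) -> None:
        """Restore previously saved state."""

    def __call__(self, data: Sequence[Any]) -> Dict[str, List[Any]]:
        # Iterate over the input in batches and merge the per-batch
        # dictionaries into a single dictionary.
        output: Dict[str, List[Any]] = {}
        for start in range(0, len(data), self.batch_size):
            batch = data[start:start + self.batch_size]
            for key, values in self.preprocess(batch).items():
                output.setdefault(key, []).extend(values)
        return output


class WhitespacePreprocessor(PreprocessorBase):
    """Toy subclass used only to exercise the parent class."""

    def preprocess(self, batch):
        return {"tokens": [text.split() for text in batch]}

    def save(self, path):
        pass  # nothing to persist in this toy example

    def load(self, path):
        pass
```

Calling `WhitespacePreprocessor(batch_size=2)(["a b", "c", "d e f"])` processes two batches and returns one dictionary with a single `"tokens"` key.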
    enhancement 
    opened by jonheng 0
  • 46 senticgcn bugfix

    • Add multi-word aspect support
    • Update documentation to reflect multi-word support
    • Update unit tests
    • Update usage example to include multi-word support
    opened by raymondng76 0
  • Fix multi-word aspect issue with Sentic-GCN preprocessor

    The current implementation of the preprocessor matches a single aspect index, which is later used to match the postprocessor output. The aspect index field in the process_input payload should be expanded to handle aspects that span multiple indexes.
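
One way to represent a multi-word aspect is as a list of token indexes instead of a single index. The sketch below illustrates the idea only; `find_aspect_spans` is a hypothetical helper, and the real preprocessor may tokenize and match differently.

```python
from typing import List


def find_aspect_spans(tokens: List[str], aspect_tokens: List[str]) -> List[List[int]]:
    """Return the token indexes of every occurrence of the aspect (hypothetical helper).

    Each occurrence is a list of consecutive indexes, so a single-word aspect
    yields one-element lists and a multi-word aspect yields longer ones.
    """
    spans: List[List[int]] = []
    width = len(aspect_tokens)
    if width == 0:
        return spans
    for start in range(len(tokens) - width + 1):
        if tokens[start:start + width] == aspect_tokens:
            spans.append(list(range(start, start + width)))
    return spans


tokens = "the battery life is great".split()
multi_word = find_aspect_spans(tokens, ["battery", "life"])
single_word = find_aspect_spans(tokens, ["great"])
```

Here `multi_word` covers indexes 1 and 2, while `single_word` covers index 4 only, so the same field can carry both cases.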

    bug 
    opened by raymondng76 0
  • Add Sentic-GCN demo_api to SGNlp

    Close #43

    This pull request adds the Sentic-GCN demo_api models to sgnlp. It includes the following components:

    • model_card
    • api.py
    • dockerfiles
    • requirements.txt
    • usage.py
    opened by K-WeiMing 0
  • Add Sentic-GCN to SGNlp

    close #41

    This pull request adds the Sentic-GCN models to sgnlp. It includes the following components:

    • Models
    • Configs
    • Tokenizers
    • Embedding models
    • Trainer/Evaluator
    • Unit test
    • documentation

    It does not include demo_api, which is covered in a separate issue ticket.

    opened by raymondng76 0
  • download_pretrained for demo API does not cache downloaded files/models

    To allow the containers to start up quicker, models and files were downloaded and cached during build time.

    Recent changes in the Hugging Face transformers package have broken this functionality:

    • Released in v4.22.0
    • Issue

    Possible choices moving forward:

    • Write a simple caching utility function
    • Stick to versions of transformers before 4.22.0
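
The first option could look roughly like the sketch below. It is not the project's implementation: `cached_download`, the cache layout, and the injectable `fetch` parameter are assumptions made for illustration, and only the standard library is used.

```python
import hashlib
from pathlib import Path
from typing import Callable, Union
from urllib.request import urlopen


def _http_fetch(url: str) -> bytes:
    """Download the raw bytes behind a URL."""
    with urlopen(url) as response:
        return response.read()


def cached_download(
    url: str,
    cache_dir: Union[str, Path],
    fetch: Callable[[str], bytes] = _http_fetch,
) -> Path:
    """Return a local path for url, downloading only on a cache miss (hypothetical helper)."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Name the cached file after a hash of the URL so that any URL maps to a
    # valid, stable file name.
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()

    if not target.exists():
        # Write to a temporary file first so that an interrupted download
        # does not leave a partial file that looks like a cache hit.
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(fetch(url))
        tmp.replace(target)
    return target
```

Running such a function at image build time with a cache directory inside the image would let the container find the files at start-up without downloading them again. The `fetch` parameter exists mainly so the function can be tested without network access.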
    opened by jonheng 0
  • Add Stance Detection model

    opened by atenzer 0
Releases (v0.4.0)
Owner
AI Singapore | AI Makerspace
Growing local AI talent and empowering start-ups, SMEs, and enterprises with AI components, frameworks, platforms, and advisory services.