AllenNLP integration for Shiba: Japanese CANINE model

Last update: Feb 16, 2022

Overview

AllenNLP integration for Shiba

allennlp-shiba-model is a Python library that provides AllenNLP integration for shiba-model.

SHIBA is an approximate reimplementation of CANINE [1] in raw PyTorch, pretrained on the Japanese Wikipedia corpus using random span masking. If you are unfamiliar with CANINE, you can think of it as a very efficient (approximately 4x as efficient) character-level BERT model. The name SHIBA comes from the Japanese dog breed of the same name.
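To make "character-level" concrete: CANINE-style models take Unicode code points as input instead of IDs from a subword vocabulary. The sketch below is only an illustration built on Python built-ins. `to_codepoints` and `from_codepoints` are hypothetical helper names, not part of shiba-model or this library, and a real tokenizer would also need to handle special tokens and padding, which are not shown.

```python
from typing import List


def to_codepoints(text: str) -> List[int]:
    """Map each character to its Unicode code point (one ID per character)."""
    return [ord(ch) for ch in text]


def from_codepoints(ids: List[int]) -> str:
    """Invert to_codepoints: rebuild the string from its code points."""
    return "".join(chr(i) for i in ids)


if __name__ == "__main__":
    sentence = "柴犬はかわいい"
    ids = to_codepoints(sentence)
    # One ID per character, with no vocabulary lookup involved.
    print(len(sentence), len(ids))
    # The mapping is lossless, so the original text can be recovered.
    print(from_codepoints(ids) == sentence)
```

Because the mapping needs no learned vocabulary, there are no out-of-vocabulary characters, at the cost of longer input sequences than a subword tokenizer would produce.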

Installation

Installing the library and dependencies is simple using pip.

pip install allennlp-shiba
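After installing, one way to confirm that the package is visible to the current interpreter is to query its metadata. This is a minimal sketch using only the standard library (Python 3.8+). `installed_version` is a hypothetical helper, and checking `allennlp` alongside it assumes AllenNLP is pulled in as a dependency.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "allennlp-shiba" is the name used in the pip command above;
    # "allennlp" is assumed to be installed as a dependency.
    for name in ("allennlp-shiba", "allennlp"):
        print(name, installed_version(name) or "not installed")
```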

Example

This library enables users to specify the SHIBA tokenizer, token indexer, and token embedder in a Jsonnet config file. Here is an example of the relevant parts of a Jsonnet config file:

{
    "dataset_reader": {
        "tokenizer": {
            "type": "shiba",
        },
        "token_indexers": {
            "tokens": {
                "type": "shiba",
            }
        },
    },
    "model": {
        "shiba_embedder": {
            "type": "basic",
            "token_embedders": {
                "shiba": {
                    "type": "shiba",
                    "eval_model": true,
                }
            }
        }
    }
}
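The same fragment can also be generated from Python, which may be convenient when configs are produced by a script. The sketch below assumes nothing beyond the standard library: it builds the fragment as a dictionary and writes it as JSON, which Jsonnet accepts because Jsonnet is a superset of JSON. `build_shiba_fragment` and the output file name `shiba_fragment.jsonnet` are hypothetical. Like the example above, this is only a fragment: a complete config would also need a dataset reader type, a model type, data paths, and a trainer.

```python
import json
from typing import Any, Dict


def build_shiba_fragment(eval_model: bool = True) -> Dict[str, Any]:
    """Build the SHIBA-related config fragment shown above as a plain dict."""
    return {
        "dataset_reader": {
            "tokenizer": {"type": "shiba"},
            "token_indexers": {"tokens": {"type": "shiba"}},
        },
        "model": {
            "shiba_embedder": {
                "type": "basic",
                "token_embedders": {
                    "shiba": {"type": "shiba", "eval_model": eval_model},
                },
            },
        },
    }


if __name__ == "__main__":
    fragment = build_shiba_fragment()
    # Plain JSON is valid Jsonnet, so the file can be used as a config fragment.
    with open("shiba_fragment.jsonnet", "w", encoding="utf-8") as f:
        json.dump(fragment, f, indent=4, ensure_ascii=False)
    print(json.dumps(fragment, indent=4))
```

Note that AllenNLP's `basic` text field embedder typically matches the keys under `token_embedders` against the keys under `token_indexers`, so in a full config the two names (`tokens` and `shiba` here) may need to agree. The sketch keeps them exactly as they appear in the example.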

Reference

Joshua Tanner and Masato Hagiwara (2021). SHIBA: Japanese CANINE model. GitHub repository.

Releases(v0.1.1)

v0.1.1(Jun 26, 2021)

v0.1.0(Jun 26, 2021)

v0.0.1(Jun 26, 2021)

Owner

Shunsuke KITADA
