Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

Last update: Dec 29, 2022

Overview

Python-Zhuyin (pyzhuyin): Zhuyin and Pinyin Conversion

Introduction

pyzhuyin is an open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo).

Installation

pip install pyzhuyin

Usage

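from pyzhuyin import pinyin_to_zhuyin, zhuyin_to_pinyin

# Pinyin (numeric tones) to zhuyin
assert pinyin_to_zhuyin("lu3") == "ㄌㄨˇ"
assert pinyin_to_zhuyin("dan4") == "ㄉㄢˋ"
# map() returns an iterator in Python 3, so wrap it in list() before comparing
assert list(map(pinyin_to_zhuyin, ["lu3", "dan4"])) == ["ㄌㄨˇ", "ㄉㄢˋ"]

# Zhuyin to pinyin; the neutral tone is written as 5
assert zhuyin_to_pinyin("ㄌㄩˊ") == "lü2"
assert zhuyin_to_pinyin("˙ㄗ") == "zi5"
# u_to_v=True writes ü as v in the output
assert list(map(lambda z: zhuyin_to_pinyin(z, u_to_v=True), ["ㄌㄩˊ", "˙ㄗ"])) == ["lv2", "zi5"]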
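For text with several syllables, the converters can be applied one syllable at a time. The sketch below shows one possible way to do that. `convert_syllables` is a hypothetical helper, not part of pyzhuyin: it accepts any syllable converter as a callable and, since the library is documented to raise ValueError for unknown syllables, passes such tokens through unchanged. A small stand-in converter built only from the examples above keeps the block runnable without the package installed.

```python
from typing import Callable, List

# Stand-in lookup built only from the README examples (an assumption for the demo);
# with pyzhuyin installed, pass pyzhuyin.pinyin_to_zhuyin instead of this stub.
_DEMO_TABLE = {"lu3": "ㄌㄨˇ", "dan4": "ㄉㄢˋ"}


def demo_pinyin_to_zhuyin(syllable: str) -> str:
    """Mimic the documented behaviour: raise ValueError for unknown syllables."""
    try:
        return _DEMO_TABLE[syllable]
    except KeyError:
        raise ValueError(f"no zhuyin found for {syllable!r}") from None


def convert_syllables(text: str, convert: Callable[[str], str]) -> List[str]:
    """Convert whitespace-separated syllables, keeping unconvertible tokens as-is."""
    result = []
    for token in text.split():
        try:
            result.append(convert(token))
        except ValueError:
            # Hypothetical policy: pass the original token through unchanged.
            result.append(token)
    return result


print(convert_syllables("lu3 dan4 ???", demo_pinyin_to_zhuyin))
```

With the real library, the call would presumably look like `convert_syllables("lu3 dan4", pinyin_to_zhuyin)`. Whether to skip, keep, or re-raise on unknown tokens is a design choice left to the caller.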

Testing

Run the following command at the root of the project to test the library:

python3 -m unittest
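When run without arguments, `python3 -m unittest` performs test discovery and picks up files matching `test*.py`. The sketch below shows what an additional test module might look like. The file name `test_conversion.py` and the class name are hypothetical and not taken from the project; the expected values come from the usage examples in this README. The tests are skipped when pyzhuyin is not importable.

```python
import importlib.util
import unittest

# True only when the pyzhuyin package is importable in this environment.
HAS_PYZHUYIN = importlib.util.find_spec("pyzhuyin") is not None


@unittest.skipUnless(HAS_PYZHUYIN, "pyzhuyin is not installed")
class ConversionTest(unittest.TestCase):
    """Hypothetical test case; save as test_conversion.py so discovery finds it."""

    def test_pinyin_to_zhuyin(self):
        from pyzhuyin import pinyin_to_zhuyin
        self.assertEqual(pinyin_to_zhuyin("lu3"), "ㄌㄨˇ")

    def test_zhuyin_to_pinyin(self):
        from pyzhuyin import zhuyin_to_pinyin
        self.assertEqual(zhuyin_to_pinyin("˙ㄗ"), "zi5")
```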

Notes

- Only numeric tones are supported for pinyin
  - e.g. "lu3" instead of "lǔ"
- The neutral tone is represented as 5
  - e.g. "˙ㄗ" -> "zi5"
- For pinyin_to_zhuyin:
  - raises ValueError if no corresponding zhuyin is found
  - internally converts every v to ü
- For zhuyin_to_pinyin:
  - raises ValueError if no corresponding pinyin is found
- Erhua (兒化音) is not supported because it cannot be represented in the zhuyin system as a single "combo" syllable
  - e.g. "公園兒" -> "gong1 yuanr2" -> "ㄍㄨㄥㄩㄢㄦˊ" (not allowed)
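The conventions in these notes can be illustrated with a small standalone sketch. It is not pyzhuyin's implementation: `split_numeric_tone` and `attach_tone` are hypothetical names, and leaving tone 1 unmarked is an assumption based on common zhuyin practice. The sketch only shows how a numeric tone is separated from a syllable, how v can stand in for ü, and why the neutral-tone dot appears before the syllable ("˙ㄗ") while the other marks follow it.

```python
import re
from typing import Tuple

# Zhuyin tone marks for numeric tones; tone 1 is left unmarked in this sketch.
TONE_MARKS = {1: "", 2: "ˊ", 3: "ˇ", 4: "ˋ", 5: "˙"}


def split_numeric_tone(pinyin: str) -> Tuple[str, int]:
    """Split 'lu3' into ('lu', 3); reject input without a trailing tone digit 1-5."""
    match = re.fullmatch(r"([a-zü]+)([1-5])", pinyin.lower())
    if match is None:
        # Diacritic forms such as 'lǔ' end up here, mirroring the numeric-only rule.
        raise ValueError(f"expected a numeric-tone syllable, got {pinyin!r}")
    # Treat 'v' as an ASCII stand-in for 'ü', as the notes describe.
    return match.group(1).replace("v", "ü"), int(match.group(2))


def attach_tone(zhuyin_base: str, tone: int) -> str:
    """Place the tone mark: the neutral-tone dot goes first, other marks go last."""
    mark = TONE_MARKS[tone]
    return mark + zhuyin_base if tone == 5 else zhuyin_base + mark


print(split_numeric_tone("lv2"))
print(attach_tone("ㄗ", 5))
```

Mapping the toneless base (for example "lu" to "ㄌㄨ") still needs a lookup table, which is the part the library supplies from its dictionary data.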

Data Sources

Ministry of Education, R.O.C. (中華民國教育部). Revised Mandarin Chinese Dictionary (《重編國語辭典修訂本》), version 2015_20210928.

URL: https://dict.revised.moe.edu.tw/

Licensed under CC BY-ND 3.0 TW

Author

Raymond Ku
