Implementation of the final project of the course DDA6309 Probabilistic Graphical Models

Last update: Dec 26, 2021

Overview

Task-aware Joint CWS and POS (TCwsPos)

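This repository contains the implementation of TCwsPos, a task-aware model for joint Chinese word segmentation (CWS) and part-of-speech (POS) tagging, developed as the final project of the course DDA6309 Probabilistic Graphical Models, The Chinese University of Hong Kong (Shenzhen).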

Please contact us at {pengsong,leqitian}@link.cuhk.edu.cn if you have any questions.

Requirements

Our code works with the following environment.

python=3.6
pytorch=1.1

Downloading BERT

In our paper, we use BERT (Devlin et al., 2019) as the encoder.

Please download the pre-trained BERT-Base Chinese model from Google or from HuggingFace. If you download it from Google, you need to convert the TensorFlow checkpoint to PyTorch format.
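Before pointing --bert_model at a directory, it can help to check that the download or conversion actually produced the files a PyTorch BERT loader usually expects. The Python sketch below is a hypothetical helper, not part of this repository. The file names are assumptions based on the common layout of HuggingFace releases and converted Google checkpoints (vocab.txt, config.json or bert_config.json, pytorch_model.bin); Google's original release ships TensorFlow bert_model.ckpt.* files instead, which the helper flags as still needing conversion.

```python
import os
from typing import List

# Hypothetical helper (not part of this repository): sanity-checks a --bert_model directory.
# Assumed layout: a vocabulary file, a config file, and PyTorch weights.
CONFIG_NAMES = ("config.json", "bert_config.json")  # transformers vs. older naming
WEIGHTS_NAME = "pytorch_model.bin"
VOCAB_NAME = "vocab.txt"


def check_bert_dir(path: str) -> List[str]:
    """Return a list of problems found in `path`; an empty list means it looks usable."""
    if not os.path.isdir(path):
        return ["not a directory: " + path]
    files = set(os.listdir(path))
    problems = []
    if VOCAB_NAME not in files:
        problems.append("missing " + VOCAB_NAME)
    if not any(name in files for name in CONFIG_NAMES):
        problems.append("missing config (one of: " + ", ".join(CONFIG_NAMES) + ")")
    if WEIGHTS_NAME not in files:
        # Google's release contains TensorFlow checkpoints (bert_model.ckpt.*) instead.
        if any(name.startswith("bert_model.ckpt") for name in files):
            problems.append("TensorFlow checkpoint found; convert it to " + WEIGHTS_NAME)
        else:
            problems.append("missing " + WEIGHTS_NAME)
    return problems
```

For example, `print(check_bert_dir("path/to/bert-base-chinese"))` (a placeholder path) prints an empty list when the directory looks complete, and a short list of missing pieces otherwise.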

Running on Sample Data

Run run_sample.sh to train a model on the small sample data under the sample_data folder.

Datasets

We use Universal Dependencies (UD) 2.4 in our paper.

To obtain and pre-process the data, go to the data_preprocessing directory and run getdata.sh. This script downloads and processes the official data from UD.

All processed data will appear in the data directory, organized by dataset; each dataset folder contains files with the same names as those in the sample_data directory.

Training and Testing

You can find the command lines to train and test the model on a specific dataset in run.sh.

Here are some important parameters:

--do_train: train the model
--do_test: test the model
--use_bert: use BERT as encoder
--bert_model: the directory of the pre-trained BERT model
--model_name: the name under which the trained model is saved
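The flags above follow the usual argparse pattern of boolean switches plus path and name options. The sketch below shows one way such a parser could be declared; it is an illustration only, and the repository's actual parser (and any other options it defines) may differ. The types, defaults, and the check that --use_bert is accompanied by --bert_model are assumptions.

```python
import argparse


# Hypothetical sketch: mirrors the flag names listed above; types and defaults are
# assumptions, not the repository's actual parser.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="TCwsPos training/testing (sketch)")
    parser.add_argument("--do_train", action="store_true", help="train the model")
    parser.add_argument("--do_test", action="store_true", help="test the model")
    parser.add_argument("--use_bert", action="store_true", help="use BERT as encoder")
    parser.add_argument("--bert_model", type=str, default=None,
                        help="directory of the pre-trained BERT model")
    parser.add_argument("--model_name", type=str, default=None,
                        help="name under which the trained model is saved")
    return parser


def validate(args: argparse.Namespace) -> None:
    # Assumed constraint: using BERT as the encoder requires a BERT directory.
    if args.use_bert and not args.bert_model:
        raise ValueError("--use_bert requires --bert_model")
```

Boolean switches (`store_true`) default to False, so a command only needs to list the stages it wants; see run.sh for the exact commands used with the real scripts.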

Predicting

run_sample.sh contains the command line to segment and tag the sentences in an input file (./sample_data/sentence.txt).

Here are some important parameters:

--do_predict: segment and tag the sentences using a pre-trained TCwsPos model.
--input_file: the file containing the sentences to be segmented and tagged, one sentence per line; see the sample input file (./sample_data/sentence.txt) for the format.
--output_file: the path of the output file. Words are separated by spaces, and each word's POS label is attached to it with an underscore ("_").
--eval_model: the pre-trained TCwsPos model used to segment and tag the sentences in the input file.
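The input and output formats described above are simple enough to produce and consume with a few lines of Python. The helpers below (write_input_file and parse_output_line are hypothetical names, not part of the repository) assume UTF-8 text, one raw sentence per line on input, and space-separated word_POS tokens on output.

```python
from typing import List, Tuple


# Hypothetical helpers based on the I/O format described above.
def write_input_file(path: str, sentences: List[str]) -> None:
    """Write raw sentences, one per line, in the format expected by --input_file."""
    with open(path, "w", encoding="utf-8") as f:
        for sent in sentences:
            f.write(sent.strip() + "\n")


def parse_output_line(line: str) -> List[Tuple[str, str]]:
    """Split one line of --output_file into (word, POS) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last underscore so a word that itself contains "_" stays intact.
        word, sep, tag = token.rpartition("_")
        if not sep:
            raise ValueError("token without POS label: " + repr(token))
        pairs.append((word, tag))
    return pairs
```

For instance, `parse_output_line("我_PRON 爱_VERB 北京_PROPN")` returns `[("我", "PRON"), ("爱", "VERB"), ("北京", "PROPN")]` (the tags here are illustrative). Joining the words with spaces gives the segmentation-only output, and joining them with no separator should reproduce the original sentence.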

To-do List

Regular maintenance

If you want us to implement any features, please leave a comment in the Issues section.


Owner

Peng
