Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Last update: Dec 30, 2022


Overview

PLBART

Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021.

Note: Detailed documentation is coming soon.

Pre-training data

PLBART is pre-trained on Java and Python functions and natural language descriptions collected from GitHub and Stack Overflow.
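PLBART is pre-trained with BART-style denoising autoencoding: the encoder reads a corrupted sequence and the decoder reconstructs the original. The Python sketch below illustrates three BART-style noising functions (token masking, token deletion, and span infilling) on a toy tokenized function. It is not the repository's implementation. The `<mask>` token name, the whitespace tokenization, the fixed noise ratios, the fixed span length (BART samples span lengths from a Poisson distribution), and all function names are assumptions made for illustration.

```python
import random
from typing import List, Tuple

MASK = "<mask>"  # hypothetical mask token; the real vocabulary may differ


def _num_to_corrupt(tokens: List[str], ratio: float) -> int:
    # Number of positions to corrupt, capped at the sequence length.
    return min(len(tokens), round(len(tokens) * ratio))


def token_mask(tokens: List[str], ratio: float, rng: random.Random) -> List[str]:
    """Replace a random subset of tokens with MASK; the length is preserved."""
    picked = set(rng.sample(range(len(tokens)), _num_to_corrupt(tokens, ratio)))
    return [MASK if i in picked else tok for i, tok in enumerate(tokens)]


def token_delete(tokens: List[str], ratio: float, rng: random.Random) -> List[str]:
    """Drop a random subset of tokens; the model must also infer where they were."""
    picked = set(rng.sample(range(len(tokens)), _num_to_corrupt(tokens, ratio)))
    return [tok for i, tok in enumerate(tokens) if i not in picked]


def span_infill(tokens: List[str], span_len: int, rng: random.Random) -> List[str]:
    """Replace one contiguous span of span_len tokens with a single MASK."""
    span_len = min(span_len, len(tokens))
    start = rng.randrange(len(tokens) - span_len + 1)
    return tokens[:start] + [MASK] + tokens[start + span_len:]


def make_pair(tokens: List[str], rng: random.Random) -> Tuple[List[str], List[str]]:
    """Return (noised encoder input, original decoder target) for one example."""
    noise = rng.choice([
        lambda t: token_mask(t, 0.25, rng),    # assumed ratio
        lambda t: token_delete(t, 0.25, rng),  # assumed ratio
        lambda t: span_infill(t, 3, rng),      # assumed fixed span length
    ])
    return noise(tokens), list(tokens)


if __name__ == "__main__":
    rng = random.Random(0)
    func = "def add ( a , b ) : return a + b".split()
    source, target = make_pair(func, rng)
    print("encoder input :", " ".join(source))
    print("decoder target:", " ".join(target))
```

Each call yields one training pair; the decoder target is always the uncorrupted sequence, so the same objective applies to code and to natural language text.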

Evaluation tasks

We evaluated PLBART on five tasks.

Code summarization [REF]
Code generation [REF]
Code translation [REF]
Clone detection [REF]
Vulnerability detection [REF]
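The five tasks fall into two families: code summarization, generation, and translation produce sequences, while clone detection and vulnerability detection are typically framed as binary classification. As a rough illustration of how such outputs can be scored, the Python sketch below computes exact match for generated sequences and accuracy/F1 for binary labels. These helpers are hypothetical and are not the repository's evaluation scripts; the whitespace normalization in `exact_match` is an assumption, and sequence tasks are usually also scored with BLEU-style metrics, which are omitted here.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference after collapsing whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0

    def norm(s: str) -> str:
        # Assumed normalization: ignore differences in spacing only.
        return " ".join(s.split())

    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of binary predictions that match the gold label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def binary_f1(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """F1 of the positive class (label 1), e.g. 'this pair is a clone'."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    pairs = list(zip(predictions, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match(["return a + b"], ["return  a + b"]))  # prints 1.0
    print(binary_f1([1, 1, 0, 0], [1, 0, 1, 0]))             # prints 0.5
```

F1 on the positive class is shown alongside accuracy because it is less sensitive to label imbalance, which is common when positive pairs or vulnerable functions are rare.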

Notes

We will publish the pre-trained PLBART checkpoint soon.
We list all the files in this repository here.

Acknowledgement

PLBART uses Fairseq, CodeXGLUE, and TransCoder. We thank the authors of these works for their contributions.

Citation

@inproceedings{ahmad2020summarization,
    author = {Ahmad, Wasi Uddin and Chakraborty, Saikat and Ray, Baishakhi and Chang, Kai-Wei},
    booktitle = {Proceedings of the 2021 Conference of the North {A}merican Chapter of the Association for Computational Linguistics},
    title = {Unified Pre-training for Program Understanding and Generation},
    year = {2021}
}


Owner

Wasi Ahmad
