Pytorch implementation of Integrating Tree Path in Transformer for Code Representation

Related tags

Deep LearningTPTrans
Overview

This is an official Pytorch implementation of the approaches proposed in:

Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin “Integrating Tree Path in Transformer for Code Representation”

which appeared at NeurIPS 2021[Paper Link][Poster][Slides].

In this paper, we investigate the interaction between the absolute and relative path encoding, and propose novel code representation model TPTrans and its variants, which introduce path encoding inductive bias into the attention module of Transformer and power Transformer to know the structure of source codes.

Please cite our paper if you use the model, experimental results, or our code in your own work.

1.1 Raw data

To run experiments with TPTrans and its variants, please first create datasets from raw code snippets of CodeSearchNet dataset. Download and unzip the raw jsonl data of CSN into the raw_data dir like that

├── raw_data        
│   ├── python         
│   │   ├── train    
│   │   │   ├── XXXX.jsonl...
│   │   ├── test    
│   │   ├── valid   
│   ├── ruby          
│   ├── go        
│   ├── javascript        

1.2 Tree-Sitter

The Tree-Sitter is a open-source parser for multi-language programming languages. Please install it and then download the grammer files into vendor dir for four different programming languages like that

├── vendor        
│   ├── tree-sitter-python  (from https://github.com/tree-sitter/tree-sitter-python)         
│   ├── tree-sitter-javascript  (from https://github.com/tree-sitter/tree-sitter-javascript)     
│   ├── tree-sitter-go  (from https://github.com/tree-sitter/tree-sitter-go)
│   ├── tree-sitter-ruby  (from https://github.com/tree-sitter/tree-sitter-ruby)

After that, run the multi_language_parse.py in parser dir to parse the raw code snippets into the data dir.

1.3 Training

After preprocessing, run the _main.py_ to train the model.

To run the TPTrans, please specify the relation_path=True and absolute_path=False.

To run the TPTrans-\alpha, please specify the relation_path=True and absolute_path=True.

For other command triggers, please refer the comment inline for details.

Contact If you have any questions, please contact me via email: [email protected] or open issue on Github.

Owner
Han Peng
Han Peng
CL-Gym: Full-Featured PyTorch Library for Continual Learning

CL-Gym: Full-Featured PyTorch Library for Continual Learning CL-Gym is a small yet very flexible library for continual learning research and developme

Iman Mirzadeh 36 Dec 25, 2022
Tensorflow 2 implementation of the paper: Learning and Evaluating Representations for Deep One-class Classification published at ICLR 2021

Deep Representation One-class Classification (DROC). This is not an officially supported Google product. Tensorflow 2 implementation of the paper: Lea

Google Research 137 Dec 23, 2022
Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which make them brittle against test cases outside the training distribution.

Ubiquitous Knowledge Processing Lab 22 Jan 02, 2023
KUIELAB-MDX-Net got the 2nd place on the Leaderboard A and the 3rd place on the Leaderboard B in the MDX-Challenge ISMIR 2021

KUIELAB-MDX-Net got the 2nd place on the Leaderboard A and the 3rd place on the Leaderboard B in the MDX-Challenge ISMIR 2021

IELab@ Korea University 74 Dec 28, 2022
AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation

AtlasNet [Project Page] [Paper] [Talk] AtlasNet: A Papier-Mâché Approach to Learning 3D Surface Generation Thibault Groueix, Matthew Fisher, Vladimir

577 Dec 17, 2022
A flexible ML framework built to simplify medical image reconstruction and analysis experimentation.

meddlr Getting Started Meddlr is a config-driven ML framework built to simplify medical image reconstruction and analysis problems. Installation To av

Arjun Desai 36 Dec 16, 2022
CTC segmentation python package

CTC segmentation CTC segmentation can be used to find utterances alignments within large audio files. This repository contains the ctc-segmentation py

Ludwig Kürzinger 217 Jan 04, 2023
DaReCzech is a dataset for text relevance ranking in Czech

Dataset DaReCzech is a dataset for text relevance ranking in Czech. The dataset consists of more than 1.6M annotated query-documents pairs,

Seznam.cz a.s. 8 Jul 26, 2022
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Code in both PyTorch and TensorFlow

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context This repository contains the code in both PyTorch and TensorFlow for our paper

Zhilin Yang 3.3k Jan 06, 2023
Implementations of polygamma, lgamma, and beta functions for PyTorch

lgamma Implementations of polygamma, lgamma, and beta functions for PyTorch. It's very hacky, but that's usually ok for research use. To build, run: .

Rachit Singh 24 Nov 09, 2021
Vector Quantization, in Pytorch

Vector Quantization - Pytorch A vector quantization library originally transcribed from Deepmind's tensorflow implementation, made conveniently into a

Phil Wang 665 Jan 08, 2023
This repository contains all the code and materials distributed in the 2021 Q-Programming Summer of Qode.

Q-Programming Summer of Qode This repository contains all the code and materials distributed in the Q-Programming Summer of Qode. If you want to creat

Sammarth Kumar 11 Jun 11, 2021
Real-time Object Detection for Streaming Perception, CVPR 2022

StreamYOLO Real-time Object Detection for Streaming Perception Jinrong Yang, Songtao Liu, Zeming Li, Xiaoping Li, Sun Jian Real-time Object Detection

Jinrong Yang 237 Dec 27, 2022
This repository is an implementation of paper : Improving the Training of Graph Neural Networks with Consistency Regularization

CRGNN Paper : Improving the Training of Graph Neural Networks with Consistency Regularization Environments Implementing environment: GeForce RTX™ 3090

THUDM 28 Dec 09, 2022
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)

TorchCAM: class activation explorer Simple way to leverage the class-specific activation of convolutional layers in PyTorch. Quick Tour Setting your C

F-G Fernandez 1.2k Dec 29, 2022
NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

NL-Augmenter 🦎 → 🐍 The NL-Augmenter is a collaborative effort intended to add transformations of datasets dealing with natural language. Transformat

684 Jan 09, 2023
Implementation of Squeezenet in pytorch, pretrained models on Cifar 10 data to come

Pytorch Squeeznet Pytorch implementation of Squeezenet model as described in https://arxiv.org/abs/1602.07360 on cifar-10 Data. The definition of Sque

gaurav pathak 86 Oct 28, 2022
A simple root calculater for python

Root A simple root calculater Usage/Examples python3 root.py 9 3 4 # Order: number - grid - number of decimals # Output: 2.08

Reza Hosseinzadeh 5 Feb 10, 2022
DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS) data.

DeepConsensus DeepConsensus uses gap-aware sequence transformers to correct errors in Pacific Biosciences (PacBio) Circular Consensus Sequencing (CCS)

Google 149 Dec 19, 2022