Implementation of the paper "Language-agnostic representation learning of source code from structure and context".

Overview

Code Transformer

This is an official PyTorch implementation of the CodeTransformer model proposed in:

D. Zügner, T. Kirschstein, M. Catasta, J. Leskovec, and S. Günnemann, “Language-agnostic representation learning of source code from structure and context”

which appeared at ICLR'2021.
An online demo is available at https://code-transformer.org.

[Paper (PDF) | Poster | Slides | Online Demo]

The CodeTransformer is a Transformer based architecture that jointly learns from source code (Context) and parsed abstract syntax trees (AST; Structure). It does so by linking source code tokens to AST nodes and using pairwise distances (e.g., Shortest Paths, PPR) between the nodes to represent the AST. This combined representation is processed in the model by adding the contributions of each distance type to the raw self-attention score between two input tokens (See the paper for more details).

Strengths of the CodeTransformer:

  • Outperforms other approaches on the source code summarization task.
  • Effectively leverages similarities among different programming languages when trained in a multi-lingual setting.
  • Produces useful embeddings that may be employed for other down-stream tasks such as finding similar code snippets across languages.

Cite

Please cite our paper if you use the model, experimental results, or our code in your own work:

@inproceedings{zuegner_code_transformer_2021,
title = {Language-Agnostic Representation Learning of Source Code from Structure and Context},
author = {Z{\"u}gner, Daniel and Kirschstein, Tobias and Catasta, Michele and Leskovec, Jure and G{\"u}nnemann, Stephan},
booktitle={International Conference on Learning Representations (ICLR)},
year = {2021} }

Table of Contents

1. Repository Setup

1.1. Data

To run experiments with the CodeTransformer you can either:

  1. Create a new dataset from raw code snippets
    or
  2. Download the already preprocessed datasets we conducted our experiments on

1.1.1. Raw Data

To use our pipeline to generate a new dataset for code summarization, a collection of methods in the target language is needed. In our experiments, we use the following unprocessed datasets:

Name Description Obtain from
code2seq We use java-small for Code Summarization as well as java-medium and java-large for Pretraining code2seq repository
CodeSearchNet (CSN) For our (multilingual) experiments on Python, JavaScript, Ruby and Go, we employ the raw data from the CSN challenge CodeSearchNet repository
java-pretrain For our Pretraining experiments, we compiled and deduplicated a large code method dataset based on java-small, java-medium and java-large.

1.1.2. Preprocessed Data

We make our preprocessed datasets available for a quick setup and easy reproducibility of our results:

Name Language(s) Based on Download
Python Python CSN python.tar.gz
JavaScript JavaScript CSN javascript.tar.gz
Ruby Ruby CSN ruby.tar.gz
Go Go CSN go.tar.gz
Multi-language Python, JavaScript, Ruby, Go CSN multi-language.tar.gz
java-small Java code2seq
java-pretrain Java code2seq Full dataset available on request due its enormous size
  • java-pretrain-vocabularies.tar.gz: Contains only the vocabularies from pretraining and can be used for fine-tuning the pretrained CT-LM-1 model on any other Java dataset

1.2. Example notebooks

The notebooks/ folder contains two example notebooks that showcase the CodeTransformer:

  1. interactive_prediction.ipynb: Lets you load any of the models and specify an arbitrary code snippet to get a real-time prediction for its method name. Also showcases stage 1 and stage 2 preprocessing.
  2. deduplicate_java_pretrain.ipynb: Explains how we deduplicated the large java-pretrain dataset that we created

1.3. Environment Variables

All environment variables (and thus external dependencies on the host machine) used in the project have to be specified in an .env configuration file. These have to be set to suit your local setup before anything can be run.
The .env.example file gives an example configuration. The actual configuration has to be put into ${HOME}/.config/code_transformer/.env.
Alternatively, you can also directly specify the paths as environment variables, e.g., by sourcing the .env file.

Variable (+ CODE_TRANSFORMER_ prefix) Description Preprocessing Training Inference/Evaluation
Mandatory
DATA_PATH Location for storing datasets X X X
BINARY_PATH Location for storing executables X - -
MODELS_PATH Location for storing model configs, snapshots and predictions - X X
LOGS_PATH Location for logging train metrics - X -
Optional
CSN_RAW_DATA_PATH Location of the downloaded raw CSN dataset files X - -
CODE2SEQ_RAW_DATA_PATH Location of the downloaded raw code2seq dataset files (Java classes) X - -
CODE2SEQ_EXTRACTED_METHODS_DATA_PATH Location of the code snippets extracted from the raw code2seq dataset with the JavaMethodExtractor X - -
DATA_PATH_STAGE_1 Location of stage 1 preprocessing result (parsed ASTs) X - -
DATA_PATH_STAGE_2 Location of stage 2 preprocessing result (computed distances) X X X
JAVA_EXECUTABLE Command for executing java on the machine X - -
JAVA_METHOD_EXTRACTOR_EXECUTABLE Path to the built .jar from the java-method-extractor submodule used for extracting methods from raw .java files X - -
JAVA_PARSER_EXECUTABLE Path to the built .jar from the java-parser submodule used for parsing Java ASTs X - -
SEMANTIC_EXECUTABLE Path to the built semantic executable used for parsing Python, JavaScript, Ruby and Go ASTs X - -

1.4. Repository Structure

├── notebooks               # Example notebooks that showcase the CodeTransformer
├── code_transformer        # Main python package containing most functionality
│   ├── configuration         # Dict-style Configurations of ML models
│   ├── experiments           # Experiment setups for running preprocessing or training
│   │   ├── mixins              # Lightweight experiment modules that can be easily 
│   │   │                       #   combined for different datasets and models
│   │   ├── code_transformer    # Experiment configs for training the CodeTransformer
│   │   ├── great               # Experiment configs for training GREAT
│   │   ├── xl_net              # Experiment configs for training XLNet
│   │   ├── preprocessing       # Implementation scripts for stage 1 and stage 2 preprocessing
│   │   │   ├── preprocess-1.py   # Parallelized stage 1 preprocessing (Generating of ASTs from methods + word counts)
│   │   │   └── preprocess-2.py   # Parallelized stage 2 preprocessing (Calculating of distances in AST + vocabulary)
│   │   ├── paper               # Train configs for reproducing results of all models mentioned in the paper
│   │   └── experiment.py       # Main experiment setup containing training loop, evaluation, 
│   │                           #   metrics, logging and loading of pretrained models
│   ├── modeling              # PyTorch implementations of the Code Transformer, 
│   │   │                     #   GREAT and XLNet with different heads
│   │   ├── code_transformer    # CodeTransformer implementation
│   │   ├── decoder             # Transformer Decoder implementation with Pointer Network
│   │   ├── great_transformer   # Adapted implementation of GREAT for code summarization
│   │   ├── xl_net              # Adapted implementation of XLNet for code summarization
│   │   ├── modelmanager        # Easy loading of stored model parameters
│   │   └── constants.py        # Several constants affecting preprocessing and vocabularies
│   ├── preprocessing         # Implementation of preprocessing pipeline + data loading
│   │   │                     #   modeling, e.g., special tokens or number of method name tokens
│   │   ├── pipeline            # Stage 1 and Stage 2 preprocessing of CodeSearchNet code snippets
│   │   │   ├── code2seq.py       # Adaptation of code2seq AST path walks for CSN datasets and languages
│   │   │   ├── filter.py         # Low-level textual code snippet processing used during stage 1 preprocessing
│   │   │   ├── stage1.py         # Applies filters to code snippets and calls semantic to generate ASTs
│   │   │   └── stage2.py         # Contains only definitions of training samples, actual
│   │   │                         #   graph distance calculation is contained in preprocessing/graph/distances.py
│   │   ├── datamanager         # Easy loading and storing of preprocessed code snippets
│   │   │   ├── c2s               # Loading of raw code2seq dataset files
│   │   │   ├── csn               # Loading and storing of CSN dataset files
│   │   │   └── preprocessed.py   # Dataset-agnostic loading of stage 1 and stage 2 preprocessed samples
│   │   ├── dataset             # Final task-specific preprocessing before data is fed into model, i.e.,
│   │   │   │                   #   python modules to be used with torch.utils.data.DataLoader
│   │   │   ├── base.py           # task-agnostic preprocessing such as mapping sequence tokens to graph nodes
│   │   │   ├── ablation.py       # only-AST ablation
│   │   │   ├── code_summarization.py # Main preprocessing for the Code Summarization task.
│   │   │   │                         #   Masking the function name in input, drop punctuation tokens
│   │   │   └── lm.py             # Language Modeling pretraining task. Generate permutations
│   │   ├── graph               # Algorithms on ASTs
│   │   │   ├── alg.py            # Graph distance metrics such as next siblings
│   │   │   ├── ast.py            # Generalized AST as graph that handles semantic and Java-parser ASTs.
│   │   │   │                     #   Allows assigning tokens that have no corresponding AST node
│   │   │   ├── binning.py        # Graph distance binning (equal and exponential)
│   │   │   ├── distances.py      # Higher level modularized distance and binning wrappers for use in preprocessing
│   │   │   └── transform.py      # Core of stage2 preprocessing that calculates general graph distances  
│   │   └── nlp                 # Algorithms on text
│   │       ├── javaparser.py     # Python wrapper of java-parser to generate ASTs from Java methods
│   │       ├── semantic.py       # Python wrapper of semantic parser to generate ASTs
│   │       │                     #   from languages supported by semantic
│   │       ├── text.py           # Simple text handling such as positions of words in documents
│   │       ├── tokenization.py   # Mostly wrapper around Pygments Tokenizer to tokenize (and sub-tokenize) code snippets
│   │       └── vocab.py          # Mapping of the most frequent tokens to IDs understandable for ML models
│   ├── utils
│   │   └── metrics.py        # Evaluation metrics. Mostly, different F1-scores
│   └── env.py               # Defines several environment variables such as paths to executables 
├── scripts               # (Python) scripts intended to be run from the command line "python -m scripts/{SCRIPT_NAME}"
│   ├── code2seq
│   │   ├── combine-vocabs-code2seq.sh    # Creates code2seq vocabularies for multi-language setting 
│   │   ├── preprocess-code2seq.py        # Preprocessing for code2seq (Generating of tree paths).
│   │   │                                 #   Works with any datasets created by preprocess-1.py
│   │   ├── preprocess-code2seq.sh        # Calls preprocess-code2seq.py and preprocess-code2seq-helper.py.
│   │   │                                 #   Generates everything else code2seq needs, such as vocabularies
│   │   └── preprocess-code2seq-helper.py # Copied from code2seq. Performs vocabulary generation and normalization of snippets
│   ├── evaluate.py                   # Loads a trained model and evaluates it on a specified dataset
│   ├── evaluate-multilanguage.py     # Loads a trained multi-language model and evaluates it on a multi-language database  
│   ├── deduplicate-java-pretrain.py  # De-duplicates a directory of .java files (used for java-pretrain)
│   ├── extract-java-methods.py       # Extracts Java methods from raw .java files to feed into stage 1 preprocessing
│   ├── run-experiment.py             # Parses a .yaml config file and starts training of a CodeTransformer/GREAT/XLNet model
│   └── run-preprocessing.py          # Parses a .yaml config file and starts stage 1 or stage 2 preprocessing
├── sub_modules             # Separate modules
│   ├── code2seq               # code2seq adaptation: Mainly modifies code2seq for multi-language setting
│   ├── java-method-extractor  # code2seq adaptation: Extracts Java methods from .java files as JSON 
│   └── java-parser            # Simple parser wrapper for generating Java ASTs
├── tests                   # Unit Tests for parts of the project
└── .env.example            # Example environment variables configuration file

2. Preprocessing

Code Transformer Overview

2.1. semantic

The first stage of our preprocessing pipeline makes use of semantic to generate ASTs from code snippet that are written in Python, JavaScript, Ruby or Go.
semantic is a command line tool written in Haskell that is capable of parsing source code in a variety of languages. The generated ASTs mostly share a common set of node types which is important for multi-lingual experiments. For Java, we employ a separate AST parser, as the language currently is not supported by semantic.
To obtain the ASTs, we rely on the --json-graph option that has been dropped temporarily from semantic. As such, the stage 1 preprocessing requires a semantic executable built from a revision before Mar 27, 2020. E.g., the revision 34ea0d1dd6.

To enable stage 1 preprocessing, you can either:

  1. Build semantic on your machine using a revision with the --json-graph option. We refer to the semantic documentation for build instructions.
    or
  2. Use the statically linked semantic executable that we built for our experiments: semantic.tar.gz

2.2. CSN Dataset

Download the raw CSN dataset files as described in the raw data section.

  1. Compute ASTs (stage 1)
     python -m scripts.run-preprocessing code_transformer/experiments/preprocessing/preprocess-1-csn.yaml {python|javascript|ruby|go} {train|valid|test} 
  2. Compute graph distances (stage 2)
    python -m scripts.run-preprocessing code_transformer/experiments/preprocessing/preprocess-2.yaml {python|javascript|ruby|go} {train|valid|test}

The .yaml files contain configurations for preprocessing (e.g., which distance metrics to use and how the vocabularies are generated).
It is important to run the preprocessing for the train partition first as statistics for generating the vocabulary that are needed for the other partitions are computed there.

2.3. code2seq Dataset

Download the raw code2seq dataset files (Java classes) as described in the raw data section.

  1. Extract methods from the raw Java class files via
    python -m scripts.extract-java-methods {java-small|java-medium|java-large}
  2. Compute ASTs (stage 1)
    python -m scripts.run-preprocessing code_transformer/experiments/preprocessing/preprocess-1.yaml {java-small|java-medium|java-large} {train|valid|test}
  3. Compute graph distances (stage 2)
    python -m scripts.run-preprocessing code_transformer/experiments/preprocessing/preprocess-2.yaml {java-small|java-medium|java-large} {train|valid|test}

The .yaml files contain configurations for preprocessing (e.g., which distance metrics to use and how the vocabularies are generated).
Ensure to run the preprocessing for the train partition first as statistics for generating the vocabulary that are needed for the other partitions are computed there.

2.4. Multilanguage Dataset

Builds upon the stage 1 CSN datasets computed as shown above. Datasets are then combined by simply running the stage 2 preprocessing with a comma-separated string containing the desired languages. For the experiments in our paper we combined Python, JavaScript, Ruby and Go:

python -m scripts.run-preprocessing code_transformer/experiments/preprocessing/preprocess-2.yaml python,javascript,ruby,go {train|valid|test}

3. Training

Similar to how preprocessing works, training is configured via .yaml files that describe the data representation that is used, the model hyperparameters and how training should go about.
Train metrics are logged to a tensorboard in LOGS_PATH.

3.1. Code Transformer

python -m scripts.run-experiment code_transformer/experiments/code_transformer/code_summarization.yaml

This will start training of a CodeTransformer model. The .yaml file contains model and training configurations (e.g., which dataset to use, model hyperparameters or when to store checkpoints).

3.2. GREAT

We adapted the Graph Relational Embedding Attention Transformer (GREAT) for comparison.

python -m scripts.run-experiment code_transformer/experiments/great/code_summarization.yaml

This will start training of a GREAT model. The .yaml file contains model and training configurations (e.g., which dataset to use, model hyperparameters or when to store checkpoints).

3.3. XLNet (No structure)

For comparison, we also employed an XLNet architecture that only learns from the source code token sequence.

python -m scripts.run-experiment code_transformer/experiments/xl_net/code_summarization.yaml

This will start training of a XLNet model. The .yaml file contains model and training configurations (e.g., which dataset to use, model hyperparameters or when to store checkpoints).

3.4. Language Modeling Pretraining

The performance of Transformer architectures can often be further improved by first pretraining the model on a language modeling task.
In our case, we make use of XL-Nets permutation based masked language modeling.
Language Modelling Pretraining can be run via:

python -m scripts.run-experiment code_transformer/experiments/code_transformer/language_modeling.yaml

The pretrained model can then be finetuned on the Code Summarization task using the regular training script as described above. The transfer_learning section in the .yaml configuration file is used to define the model and snapshot to be finetuned.

4. Evaluation

4.1. Single Language

python -m scripts.evaluate {code_transformer|great|xl_net} {run_id} {snapshot} {valid|test}

where run_id is the unique name of the run as printed during training. This also corresponds to the folder name that contains the respective stored snapshots of the model.
snapshot it the training iteration in which the snapshot was stored, e.g., 50000.

4.2. Multilanguage

python -m scripts.evaluate-multilanguage {code_transformer|great|xl_net} {run_id} {snapshot} {valid|test} [--filter-language {language}]

The --filter-language option can be used to run the evaluation only on one of the single languages that the respective multilanguage dataset is comprised of (used for CT-[11-14]).

5. Models from the Paper

5.1. Models trained on CSN Dataset

Name in Paper Run ID Snapshot Language Hyperparameters
Single Language
GREAT (Python) GT-1 350000 Python great_python.yaml
GREAT (Javascript) GT-2 60000 JavaScript great_javascript.yaml
GREAT (Ruby) GT-3 30000 Ruby great_ruby.yaml
GREAT (Go) GT-4 220000 Go great_go.yaml
Ours w/o structure (Python) XL-1 400000 Python xl_net_python.yaml
Ours w/o structure (Javascript) XL-2 260000 JavaScript xl_net_javascript.yaml
Ours w/o structure (Ruby) XL-3 60000 Ruby xl_net_ruby.yaml
Ours w/o structure (Go) XL-4 200000 Go xl_net_go.yaml
Ours w/o pointer net (Python) CT-1 280000 Python ct_no_pointer_python.yaml
Ours w/o pointer net (Javascript) CT-2 120000 JavaScript ct_no_pointer_javascript.yaml
Ours w/o pointer net (Ruby) CT-3 520000 Ruby ct_no_pointer_ruby.yaml
Ours w/o pointer net (Go) CT-4 320000 Go ct_no_pointer_go.yaml
Ours (Python) CT-5 500000 Python ct_python.yaml
Ours (Javascript) CT-6 90000 JavaScript ct_javascript.yaml
Ours (Ruby) CT-7 40000 Ruby ct_ruby.yaml
Ours (Go) CT-8 120000 Go ct_go.yaml
Multilanguage Models
Great (Multilang.) GT-5 320000 Python, JavaScript, Ruby and Go great_multilang.yaml
Ours w/o structure (Mult.) XL-5 420000 Python, JavaScript, Ruby and Go xl_net_multilang.yaml
Ours w/o pointer (Mult.) CT-9 570000 Python, JavaScript, Ruby and Go ct_no_pointer_multilang.yaml
Ours (Multilanguage) CT-10 650000 Python, JavaScript, Ruby and Go ct_multilang.yaml
Mult. Pretraining
Ours (Mult. + Finetune Python) CT-11 120000 Python, JavaScript, Ruby and Go ct_multilang.yamlct_multilang_python.yaml
Ours (Mult. + Finetune Javascript) CT-12 20000 Python, JavaScript, Ruby and Go ct_multilang.yamlct_multilang_javascript.yaml
Ours (Mult. + Finetune Ruby) CT-13 10000 Python, JavaScript, Ruby and Go ct_multilang.yamlct_multilang_ruby.yaml
Ours (Mult. + Finetune Go) CT-14 60000 Python, JavaScript, Ruby and Go ct_multilang.yamlct_multilang_go.yaml
Ours (Mult. + LM Pretrain) CT-15 280000 Python, JavaScript, Ruby and Go ct_multilang_lm.yamlct_multilang_lm_pretrain.yaml

5.2. Models trained on code2seq Dataset

Name in Paper Run ID Snapshot Language Hyperparameters
Without Pointer Net
Ours w/o structure XL-6 400000 Java xl_net_no_pointer_java_small.yaml
Ours w/o context CT-16 150000 Java ct_no_pointer_java_small_only_ast.yaml
Ours CT-17 410000 Java ct_no_pointer_java_small.yaml
With Pointer Net
GREAT GT-6 170000 Java great_java_small.yaml
Ours w/o structure XL-7 170000 Java xl_net_java_small.yaml
Ours w/o context CT-18 90000 Java ct_java_small_only_ast.yaml
Ours CT-19 250000 Java ct_java_small.yaml
Ours + Pretrain CT-20 30000 Java ct_java_pretrain_lm.yamlct_java_small_pretrain.yaml

5.3. Models from Ablation Studies

Name in Paper Run ID Snapshot Language Hyperparameters
Sibling Shortest Paths CT-21 310000 Java ct_java_small_ablation_only_sibling_sp.yaml
Ancestor Shortest Paths CT-22 250000 Java ct_java_small_ablation_only_ancestor_sp.yaml
Shortest Paths CT-23 190000 Java ct_java_small_ablation_only_shortest_paths.yaml
Personalized Page Rank CT-24 210000 Java ct_java_small_ablation_only_ppr.yaml

5.4. Download Models

We also make all our trained models that are mentioned in the paper available for easy reproducibility of our results:

Name Description Models Download
CSN Single Language All models trained on one of the Python, JavaScript, Ruby or Go datasets GT-[1-4], XL-[1-4], CT-[1-8] csn-single-language-models.tar.gz
CSN Multi-Language All models trained on the multi-language dataset + Pretraining GT-5, XL-5, CT-[9-15], CT-LM-2 csn-multi-language-models.tar.gz
code2seq All models trained on the code2seq java-small dataset + Pretraining XL-[6+7], GT-6, CT-[16-20], CT-LM-1 code2seq-models.tar.gz
Ablation The models trained for ablation purposes on java-small CT-[21-24] ablation-models.tar.gz

Once downloaded, you can test any of the above models in the interactive_prediction.ipynb notebook.

Owner
Daniel Zügner
PhD candidate at TU Munich. Machine learning for graphs.
Daniel Zügner
Offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation

Shunted Transformer This is the offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation by Sucheng Ren, Daquan Zhou, Shengf

156 Dec 27, 2022
The Wearables Development Toolkit - a development environment for activity recognition applications with sensor signals

Wearables Development Toolkit (WDK) The Wearables Development Toolkit (WDK) is a framework and set of tools to facilitate the iterative development of

Juan Haladjian 114 Nov 27, 2022
Improving Calibration for Long-Tailed Recognition (CVPR2021)

MiSLAS Improving Calibration for Long-Tailed Recognition Authors: Zhisheng Zhong, Jiequan Cui, Shu Liu, Jiaya Jia [arXiv] [slide] [BibTeX] Introductio

Jia Research Lab 116 Dec 20, 2022
Predicting Price of house by considering ,house age, Distance from public transport

House-Price-Prediction Predicting Price of house by considering ,house age, Distance from public transport, No of convenient stores around house etc..

Musab Jaleel 1 Jan 08, 2022
Static-test - A playground to play with ideas related to testing the comparability of the code

Static test playground ⚠️ The code is just an experiment. Compiles and runs on U

Igor Bogoslavskyi 4 Feb 18, 2022
code for Grapadora research paper experimentation

Road feature embedding selection method Code for research paper experimentation Abstract Traffic forecasting models rely on data that needs to be sens

Eric López Manibardo 0 May 26, 2022
The official homepage of the (outdated) COCO-Stuff 10K dataset.

COCO-Stuff 10K dataset v1.1 (outdated) Holger Caesar, Jasper Uijlings, Vittorio Ferrari Overview Welcome to official homepage of the COCO-Stuff [1] da

Holger Caesar 263 Dec 11, 2022
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

DALL-E in Pytorch Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch. It will also contain CLIP for ranking the ge

Phil Wang 5k Jan 04, 2023
Wider or Deeper: Revisiting the ResNet Model for Visual Recognition

ademxapp Visual applications by the University of Adelaide In designing our Model A, we did not over-optimize its structure for efficiency unless it w

Zifeng Wu 338 Dec 12, 2022
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral)

Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds (CVPR 2022, Oral) This is the official implementat

Yifan Zhang 259 Dec 25, 2022
This program automatically runs Python code copied in clipboard

CopyRun This program runs Python code which is copied in clipboard WARNING!! USE AT YOUR OWN RISK! NO GUARANTIES IF ANYTHING GETS BROKEN. DO NOT COPY

vertinski 4 Sep 10, 2021
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking (CVPR 2021) Pytorch implementation of the ArTIST motion model. In this repo

Fatemeh 38 Dec 12, 2022
Learning the Beauty in Songs: Neural Singing Voice Beautifier; ACL 2022 (Main conference); Official code

Learning the Beauty in Songs: Neural Singing Voice Beautifier Jinglin Liu, Chengxi Li, Yi Ren, Zhiying Zhu, Zhou Zhao Zhejiang University ACL 2022 Mai

Jinglin Liu 257 Dec 30, 2022
TLDR: Twin Learning for Dimensionality Reduction

TLDR (Twin Learning for Dimensionality Reduction) is an unsupervised dimensionality reduction method that combines neighborhood embedding learning with the simplicity and effectiveness of recent self

NAVER 105 Dec 28, 2022
PyTorch-Geometric Implementation of MarkovGNN: Graph Neural Networks on Markov Diffusion

MarkovGNN This is the official PyTorch-Geometric implementation of MarkovGNN paper under the title "MarkovGNN: Graph Neural Networks on Markov Diffusi

HipGraph: High-Performance Graph Analytics and Learning 6 Sep 23, 2022
This is a repository for a semantic segmentation inference API using the OpenVINO toolkit

BMW-IntelOpenVINO-Segmentation-Inference-API This is a repository for a semantic segmentation inference API using the OpenVINO toolkit. It's supported

BMW TechOffice MUNICH 34 Nov 24, 2022
Rethinking of Pedestrian Attribute Recognition: A Reliable Evaluation under Zero-Shot Pedestrian Identity Setting

Pytorch Pedestrian Attribute Recognition: A strong PyTorch baseline of pedestrian attribute recognition and multi-label classification.

Jian 79 Dec 18, 2022
Deep deconfounded recommender (Deep-Deconf) for paper "Deep causal reasoning for recommendations"

Deep Causal Reasoning for Recommender Systems The codes are associated with the following paper: Deep Causal Reasoning for Recommendations, Yaochen Zh

Yaochen Zhu 22 Oct 15, 2022
3DIAS: 3D Shape Reconstruction with Implicit Algebraic Surfaces (ICCV 2021)

3DIAS_Pytorch This repository contains the official code to reproduce the results from the paper: 3DIAS: 3D Shape Reconstruction with Implicit Algebra

Mohsen Yavartanoo 21 Dec 12, 2022
Easy genetic ancestry predictions in Python

ezancestry Easily visualize your direct-to-consumer genetics next to 2500+ samples from the 1000 genomes project. Evaluate the performance of a custom

Kevin Arvai 38 Jan 02, 2023