This code is the implementation of the paper "Coherence-Based Distributed Document Representation Learning for Scientific Documents".

Last update: Jan 11, 2022

Introduction

If you find this code useful, please cite the following paper:

@article{tan2022coherence,
  title = {Coherence-Based Distributed Document Representation Learning for Scientific Documents},
  author = {Tan, Shicheng and Zhao, Shu and Zhang, Yanping},
  journal = {arXiv},
  year = {2022},
  type = {Journal Article}
}

Run

Set up the environment (see requirements.txt)
Download the data. Link: https://pan.baidu.com/s/1EEJk0_P55Ov5ReXsmyVZPA Password: rkh0
python _av_CTE.py

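Run guide for the information retrieval data

Data processing (4 files): use "...data helper-IR.py" to produce 3 outputs: the intermediate file from raw-data processing, the corpus of that intermediate file, and the constructed dataset. Then use "_aj_get dataset corpus.py" to obtain the corpus of the constructed dataset.
Word embedding training (4 files): use "_ak_get word embedding.py" to train on the 2 corpora from the first step, producing 2 vocabularies and 2 word embedding files. For GloVe, remove the ".txt" file extension.
Run "_al_em-avg.py" 5 times to get 5 results: avg-word2vec, avg-word2vec(globe), avg-glove, avg-glove(globe), random embedding.
Run "_ac_tf-idf.py" to get a distance matrix and 1 result. The matrix is used by the CTE method.
Run the corresponding file once for each of LDA, doc2vec, BM25, LSI, GPT2, XLNet, GPT, Transformer-XL, and XLM to get 9 results.
Run "_ah_WMD.py" 4 times to get 4 results: WMD-word2vec, WMD-word2vec(globe), WMD-glove, WMD-glove(globe).
Run "_at_BERT.py" 2 times to get 2 results: BERT-Large uncased, BERT-Large uncased(wwm).
Run "_at_ELMo.py" 2 times to get 2 results: ELMo-Original(5.5B), ELMo-Original(5.5B, concatenated).
Run "_av_CTE.py" 13 times to get 13 results, one for each of the 13 base word embeddings (random embedding and so on).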
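
The avg-* baselines in the guide above presumably represent each document as the average of its word embeddings. The repository's own script is not reproduced here; the following is only a minimal sketch of that general technique, assuming tokenized documents and an in-memory embedding table. The names average_embedding and cosine_distance, and the toy vectors, are hypothetical and not taken from the repository.

```python
import math


def average_embedding(tokens, embeddings, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none match."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: an empty document is treated as maximally distant
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy embedding table (hypothetical values, not from the repository).
embeddings = {
    "graph": [1.0, 0.0],
    "neural": [0.0, 1.0],
    "network": [1.0, 1.0],
}
docs = [["graph", "network"], ["neural", "network"], ["unknown"]]

# One averaged vector per document, then a pairwise distance matrix.
doc_vectors = [average_embedding(d, embeddings, dim=2) for d in docs]
distances = [[cosine_distance(a, b) for b in doc_vectors] for a in doc_vectors]
```

In a retrieval setting, the rows of such a distance matrix could be sorted to rank candidate documents for each query document. Trained word2vec or GloVe vectors would replace the toy table.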

Owner

tsc
