# Graph Neural Topic Model (GNTM)

This is the PyTorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective".
## Requirements
- Python >= 3.6
- PyTorch == 1.6.0
- torch-geometric == 1.7.0
- torch-scatter == 2.0.6
- torch-sparse == 0.6.9
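The pinned versions above can be installed with pip. Note that this is only a sketch of a typical setup: `torch-scatter` and `torch-sparse` ship prebuilt wheels that must match your installed torch and CUDA versions, so consult the PyTorch Geometric installation guide for the wheel index appropriate to your platform.

```shell
# Install the pinned torch version first, since the extension
# packages below are built against a specific torch release.
pip install torch==1.6.0

# torch-scatter / torch-sparse may need a platform-specific wheel
# index (see the PyTorch Geometric installation docs).
pip install torch-scatter==2.0.6 torch-sparse==0.6.9
pip install torch-geometric==1.7.0
```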
## Dataset
Links to the datasets can be found below:

The GloVe word embeddings can be downloaded from this link.

The datasets and word embeddings should be placed according to the paths specified in `settings.py`.
## Usage
Before training GNTM, first preprocess the data with the following scripts (some parameters need to be adjusted for different datasets; see the descriptions in our paper):

```shell
cd dataPrepare
python preprocess.py
python graph_data.py
```
Example script to train GNTM:

```shell
python main.py \
    --device cuda:0 \
    --dataset News20 \
    --model GDGNNMODEL \
    --num_topic 20 \
    --num_epoch 400 \
    --ni 300 \
    --word \
    --taskid 0 \
    --nwindow 3
```
Here,

- `--dataset` specifies the dataset name; it currently supports `News20`, `TMN`, `BNC` and `Reuters` for 20 News Group, Tag My News, British National Corpus and Reuters, respectively.
- `--device` specifies the computation device, such as `cpu` or `cuda:0`.
- `--model` specifies the model to use; `GDGNNMODEL` corresponds to GNTM.
- `--num_topic` specifies the number of topics.
- `--num_epoch` specifies the maximum number of training epochs.
- `--ni` specifies the dimension of the word embeddings.
- `--taskid` corresponds to the random seed.
- `--nwindow` specifies the window size used to construct document graphs.
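As a rough illustration of what `--nwindow` controls, the sketch below builds window-based co-occurrence edges from a token sequence. This is a simplified stand-in written for this README, not the repository's actual `graph_data.py` logic; the function name and representation are illustrative only.

```python
from collections import Counter

def build_window_edges(tokens, window_size=3):
    """Count directed co-occurrence edges between each token and the
    tokens that follow it within a sliding window of `window_size`."""
    edges = Counter()
    for i, u in enumerate(tokens):
        # Pair token i with the next (window_size - 1) tokens.
        for j in range(i + 1, min(i + window_size, len(tokens))):
            edges[(u, tokens[j])] += 1
    return edges

# Example: with a window of 3, each token connects to its
# two successors, yielding the edges of a small document graph.
doc = ["graph", "neural", "topic", "model", "graph"]
print(build_window_edges(doc, window_size=3))
```

A larger `--nwindow` produces denser document graphs, capturing longer-range word dependencies at the cost of more edges per document.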
## Reference

If you find our methods or code helpful, please kindly cite the paper:

```bibtex
@inproceedings{shen2021topic,
  title={Topic Modeling Revisited: A Document Graph-based Neural Network Perspective},
  author={Shen, Dazhong and Qin, Chuan and Wang, Chao and Dong, Zheng and Zhu, Hengshu and Xiong, Hui},
  booktitle={Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS-2021)},
  year={2021}
}
```