# Graph Neural Topic Model (GNTM)

This is the PyTorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective".
## Requirements
- Python >= 3.6
- PyTorch == 1.6.0
- torch-geometric == 1.7.0
- torch-scatter == 2.0.6
- torch-sparse == 0.6.9
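The pinned versions above can be installed with pip. Note that this is only a sketch of a typical setup: `torch-scatter` and `torch-sparse` ship prebuilt wheels that must match your installed torch and CUDA versions, so consult the PyTorch Geometric installation guide for the wheel index appropriate to your platform.

```shell
# Install the pinned torch version first, since the extension
# packages below are built against a specific torch release.
pip install torch==1.6.0

# torch-scatter / torch-sparse may need a platform-specific wheel
# index (see the PyTorch Geometric installation docs).
pip install torch-scatter==2.0.6 torch-sparse==0.6.9
pip install torch-geometric==1.7.0
```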
## Dataset
Links to the datasets can be found below:

The GloVe word embeddings can be downloaded from this link.

The datasets and word embeddings should be placed according to the paths specified in `settings.py`.
## Usage
Before training GNTM, first preprocess the data with the following scripts (some parameters need to be adjusted for different datasets; see the descriptions in our paper):

```shell
cd dataPrepare
python preprocess.py
python graph_data.py
```
Example script to train GNTM:

```shell
python main.py \
    --device cuda:0 \
    --dataset News20 \
    --model GDGNNMODEL \
    --num_topic 20 \
    --num_epoch 400 \
    --ni 300 \
    --word \
    --taskid 0 \
    --nwindow 3
```
Here,

- `--dataset` specifies the dataset name; it currently supports `News20`, `TMN`, `BNC` and `Reuters` for 20 News Group, Tag My News, British National Corpus and Reuters, respectively.
- `--device` specifies the computation device, such as `cpu` or `cuda:0`.
- `--model` specifies the model to use; `GDGNNMODEL` corresponds to GNTM.
- `--num_topic` specifies the number of topics.
- `--num_epoch` specifies the maximum number of training epochs.
- `--ni` specifies the dimension of the word embeddings.
- `--taskid` corresponds to the random seed.
- `--nwindow` specifies the window size used to construct document graphs.
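As a rough illustration of what `--nwindow` controls, the sketch below builds window-based co-occurrence edges from a token sequence. This is a simplified stand-in written for this README, not the repository's actual `graph_data.py` logic; the function name and representation are illustrative only.

```python
from collections import Counter

def build_window_edges(tokens, window_size=3):
    """Count directed co-occurrence edges between each token and the
    tokens that follow it within a sliding window of `window_size`."""
    edges = Counter()
    for i, u in enumerate(tokens):
        # Pair token i with the next (window_size - 1) tokens.
        for j in range(i + 1, min(i + window_size, len(tokens))):
            edges[(u, tokens[j])] += 1
    return edges

# Example: with a window of 3, each token connects to its
# two successors, yielding the edges of a small document graph.
doc = ["graph", "neural", "topic", "model", "graph"]
print(build_window_edges(doc, window_size=3))
```

A larger `--nwindow` produces denser document graphs, capturing longer-range word dependencies at the cost of more edges per document.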
## Reference

If you find our methods or code helpful, please kindly cite the paper:

```bibtex
@inproceedings{shen2021topic,
  title={Topic Modeling Revisited: A Document Graph-based Neural Network Perspective},
  author={Shen, Dazhong and Qin, Chuan and Wang, Chao and Dong, Zheng and Zhu, Hengshu and Xiong, Hui},
  booktitle={Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS-2021)},
  year={2021}
}
```