CCGL: Contrastive Cascade Graph Learning
This repo provides a reference implementation of the Contrastive Cascade Graph Learning (CCGL) framework, as described in the paper:
CCGL: Contrastive Cascade Graph Learning
Xovee Xu, Fan Zhou, Kunpeng Zhang, and Siyuan Liu
Submitted for review
arXiv:2107.12576
Dataset
You can download all five datasets (Weibo, Twitter, ACM, APS, and DBLP) via any one of the following links:
| Google Drive | Dropbox | OneDrive | Tencent Drive | Baidu Netdisk |
|---|---|---|---|---|
|  |  |  |  | trqg |
Environmental Settings
Our experiments were conducted on Ubuntu 20.04 with a single NVIDIA 1080Ti GPU, 48GB RAM, and an Intel i7 8700K CPU. CCGL is implemented in Python 3.7 with TensorFlow 2.3, CUDA 10.1, and cuDNN 7.6.5.
Create a virtual environment and install GPU-support packages via Anaconda:
# create virtual environment
conda create --name=ccgl python=3.7 cudatoolkit=10.1 cudnn=7.6.5
# activate virtual environment
conda activate ccgl
# install other dependencies
pip install -r requirements.txt
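Once the environment is active, a quick sanity check can confirm that the interpreter and TensorFlow match the versions listed above. The following is a minimal sketch and not part of the repository: the helper names `version_matches` and `environment_report` are hypothetical, and the TensorFlow import is guarded so the script still runs when the package is missing.

```python
import sys


def version_matches(found: str, expected_prefix: str) -> bool:
    """Return True if `found` is `expected_prefix` or a patch release of it."""
    return found == expected_prefix or found.startswith(expected_prefix + ".")


def environment_report() -> dict:
    """Collect the Python version, TensorFlow version, and visible GPU count."""
    report = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "tensorflow": None,
        "gpus": 0,
    }
    try:
        import tensorflow as tf  # may be missing or broken in this environment
    except Exception:
        # Any import failure is treated as "TensorFlow not available".
        return report
    report["tensorflow"] = tf.__version__
    report["gpus"] = len(tf.config.list_physical_devices("GPU"))
    return report


if __name__ == "__main__":
    info = environment_report()
    print(info)
    # The expected versions below are the ones stated in this README.
    print("Python 3.7:", version_matches(info["python"], "3.7"))
    if info["tensorflow"] is not None:
        print("TensorFlow 2.3:", version_matches(info["tensorflow"], "2.3"))
```

A GPU count of 0 with TensorFlow installed usually points to a CUDA or cuDNN mismatch rather than a problem with the Python packages.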
Usage
Here we take the Weibo dataset as an example to demonstrate the usage.
Preprocess
Step 1: divide, filter, and generate labeled and unlabeled cascades:
cd ccgl
# labeled cascades
python src/gene_cas.py --input=./datasets/weibo/ --unlabel=False
# unlabeled cascades
python src/gene_cas.py --input=./datasets/weibo/ --unlabel=True
Step 2: augment both labeled and unlabeled cascades (here we use the AugSIM strategy):
python src/augmentor.py --input=./datasets/weibo/ --aug_strategy=AugSIM
Step 3: generate cascade embeddings:
python src/gene_emb.py --input=./datasets/weibo/ 
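The three preprocessing steps can also be chained from a small driver script. This sketch only assembles the commands shown above, with the same scripts and flags. The function names `build_preprocess_commands` and `run_preprocess` are hypothetical, and it assumes the script is launched from the ccgl directory so that the relative src/ paths resolve.

```python
import subprocess
import sys
from typing import List


def build_preprocess_commands(dataset_dir: str,
                              aug_strategy: str = "AugSIM") -> List[List[str]]:
    """Return the Step 1-3 commands, in order, as argv lists."""
    py = sys.executable  # reuse the interpreter of the active environment
    inp = "--input={}".format(dataset_dir)
    return [
        # Step 1: labeled, then unlabeled cascades
        [py, "src/gene_cas.py", inp, "--unlabel=False"],
        [py, "src/gene_cas.py", inp, "--unlabel=True"],
        # Step 2: augmentation
        [py, "src/augmentor.py", inp, "--aug_strategy={}".format(aug_strategy)],
        # Step 3: cascade embeddings
        [py, "src/gene_emb.py", inp],
    ]


def run_preprocess(dataset_dir: str, aug_strategy: str = "AugSIM") -> None:
    """Run each step in order and stop at the first failure."""
    for cmd in build_preprocess_commands(dataset_dir, aug_strategy):
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code
        subprocess.run(cmd, check=True)
```

Calling `run_preprocess("./datasets/weibo/")` issues the same four commands as the steps above and aborts if any of them fails, so a later step never runs on incomplete output.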
Pre-training
python src/pre_training.py --name=weibo-0 --input=./datasets/weibo/ --projection_head=4-1
The pre-trained model is saved under the name weibo-0.
Fine-tuning
python src/fine_tuning.py --name=weibo-0 --num=0 --input=./datasets/weibo/ --projection_head=4-1
Here we load the pre-trained model weibo-0 and save the teacher network as weibo-0-0.
Distillation
python src/distilling.py --name=weibo-0-0 --num=0 --input=./datasets/weibo/ --projection_head=4-1
Here we load the teacher network weibo-0-0 and save the student network as weibo-0-0-student-0.
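The three stages pass model names to one another, which is easy to get wrong when running several configurations. The sketch below derives the names from the single example above. The naming pattern is inferred from that example rather than confirmed against the source code, and the `StageNames` class and `stage_names` function are hypothetical.

```python
from typing import NamedTuple


class StageNames(NamedTuple):
    pretrained: str  # passed as --name to pre_training.py and fine_tuning.py
    teacher: str     # saved by fine_tuning.py, passed as --name to distilling.py
    student: str     # saved by distilling.py


def stage_names(pretrained: str, finetune_num: int = 0,
                distill_num: int = 0) -> StageNames:
    """Derive teacher and student names, assuming the pattern in the example."""
    teacher = "{}-{}".format(pretrained, finetune_num)
    student = "{}-student-{}".format(teacher, distill_num)
    return StageNames(pretrained, teacher, student)


names = stage_names("weibo-0")
print(names.teacher)  # weibo-0-0
print(names.student)  # weibo-0-0-student-0
```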
(Optional) Run the Base model
python src/base_model.py --input=./datasets/weibo/ 
CCGL model weights
We provide pre-trained, fine-tuned, and distilled CCGL model weights. Please see details in the following table.
| Model | Dataset | Label Fraction | Projection Head | MSLE | Weights |
|---|---|---|---|---|---|
| Pre-trained CCGL model |  | 100% | 4-1 | - | Download |
| Pre-trained CCGL model |  | 10% | 4-4 | - | Download |
| Pre-trained CCGL model |  | 1% | 4-3 | - | Download |
| Fine-tuned CCGL model |  | 100% | 4-1 | 2.70 | Download |
| Fine-tuned CCGL model |  | 10% | 4-4 | 2.87 | Download |
| Fine-tuned CCGL model |  | 1% | 4-3 | 3.30 | Download |
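The MSLE column reports mean squared logarithmic error, where lower is better. As a rough illustration of the metric, the sketch below uses the natural-log form log(1 + x), which matches the Keras `mean_squared_logarithmic_error` definition. Whether the paper uses this exact form or a different log base is an assumption here, and the function name `msle` is hypothetical.

```python
import math
from typing import Sequence


def msle(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared logarithmic error using log(1 + x) on both inputs."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(t) - math.log1p(p)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return sum(squared) / len(squared)


# Toy cascade sizes for illustration only, not results from the paper.
print(msle([10, 100, 1000], [12, 80, 1500]))
```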
Load weights into the model:
# construct model, carefully check projection head designs:
# use different number of Dense layers
...
# load weights for fine-tuning, distillation, or evaluation
model.load_weights(weight_path)
See src/fine_tuning.py and src/distilling.py for examples of loading weights.
Default hyper-parameter settings
Unless otherwise specified, we use the following default hyper-parameter settings.
| Param | Value | Param | Value | 
|---|---|---|---|
| Augmentation strength | 0.1 | Pre-training epochs | 30 | 
| Augmentation strategy | AugSIM | Projection Head (100%) | 4-1 | 
| Batch size | 64 | Projection Head (10%) | 4-4 | 
| Early stopping patience | 20 | Projection Head (1%) | 4-3 | 
| Embedding dimension | 64 | Model size | 128 (4x) | 
| Learning rate | 5e-4 | Temperature | 0.1 | 
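For scripted experiments it can help to keep these defaults in one place. The sketch below mirrors the table as a frozen dataclass. The class, field, and function names are hypothetical and do not necessarily correspond to the command-line flags or variable names used in the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCGLDefaults:
    """Default values copied from the hyper-parameter table."""
    aug_strength: float = 0.1
    aug_strategy: str = "AugSIM"
    batch_size: int = 64
    early_stopping_patience: int = 20
    embedding_dim: int = 64
    learning_rate: float = 5e-4
    pretraining_epochs: int = 30
    model_size: int = 128  # listed as "128 (4x)" in the table
    temperature: float = 0.1


# Default projection head for each label fraction, as listed in the table.
PROJECTION_HEADS = {"100%": "4-1", "10%": "4-4", "1%": "4-3"}


def projection_head_for(label_fraction: str) -> str:
    """Look up the default projection head for a label fraction such as '10%'."""
    try:
        return PROJECTION_HEADS[label_fraction]
    except KeyError:
        raise ValueError(
            "no default projection head for {!r}".format(label_fraction)
        ) from None


defaults = CCGLDefaults()
print(defaults.learning_rate, projection_head_for("10%"))
```

Freezing the dataclass prevents accidental mutation between runs; a variant configuration can be created with `dataclasses.replace(defaults, batch_size=32)`.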
Change Logs
- Jul 21, 2021: fix a bug and some annotations
Cite
If you find our paper and code useful for your research, please consider citing us:
@article{xu2021ccgl, 
  author = {Xovee Xu and Fan Zhou and Kunpeng Zhang and Siyuan Liu}, 
  title = {{CCGL}: Contrastive Cascade Graph Learning}, 
  journal = {arXiv:2107.12576},
  year = {2021}, 
}
We also have a survey paper you might be interested in:
@article{zhou2021survey,
  author = {Fan Zhou and Xovee Xu and Goce Trajcevski and Kunpeng Zhang}, 
  title = {A Survey of Information Cascade Analysis: Models, Predictions, and Recent Advances}, 
  journal = {ACM Computing Surveys (CSUR)}, 
  volume = {54},
  number = {2},
  year = {2021},
  articleno = {27},
  numpages = {36},
  doi = {10.1145/3433000},
}
Acknowledgment
We would like to thank Xiuxiu Qi, Ce Li, Qing Yang, and Wenxiong Li for sharing their computing resources and helping us test the code. We would also like to express our gratitude to the authors of SimCLR (and Sayak Paul), node2vec, DeepHawkes, and others for sharing their code and datasets.
Contact
For any questions, please open an issue or send an email to xovee at ieee.org