lauhaide/clads


This repository contains the data and code for our EMNLP 2021 paper "Models and Datasets for Cross-Lingual Summarisation". Please contact me at lperez@ed.ac.uk with any questions.

Please cite this paper if you use our code or data.

@InProceedings{clads-emnlp,
  author =      "Laura Perez-Beltrachini and Mirella Lapata",
  title =       "Models and Datasets for Cross-Lingual Summarisation",
  booktitle =   "Proceedings of The 2021 Conference on Empirical Methods in Natural Language Processing",
  year =        "2021",
  address =     "Punta Cana, Dominican Republic",
}

The XWikis Corpus

Our XWikis corpus is now on HuggingFace datasets. Follow this link to find all language subsets available for download. Thanks to Ronald Cardenas for helping to upload the corpus to HF, and to Huajian Zhang and Guangyu Li for adding the Chinese subsets.

The original XWikis corpus is available at XWikis Corpus.

Instructions to re-create our corpus and extract different languages are available here.

Cross-lingual Summarisation Code

Our code is based on Fairseq and mBART/mBART50. Our clone of Fairseq, with the code extensions that implement our models, is available here. Instructions to pre-process the data and to train and evaluate our models are available here.

Models' Outputs

About

XWikisCorpus, cross-lingual summarisation, multi-lingual summarisation, pre-trained language models, zero-shot and few-shot summarisation
