
SNCSE

SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples

This is the repository for SNCSE.

SNCSE aims to alleviate feature suppression in contrastive learning for unsupervised sentence embedding. In this setting, feature suppression means that a model fails to distinguish and decouple textual similarity from semantic similarity. As a result, it may overestimate the semantic similarity of any pair of sentences with similar text, regardless of the actual semantic difference between them, and it may underestimate the semantic similarity of pairs with few words in common. (Please refer to Section 5 of our paper for several examples and a detailed analysis.) To this end, we propose taking the negation of the original sentences as soft negative samples and introducing them into the traditional contrastive learning framework through a bidirectional margin loss (BML). The structure of SNCSE is as follows:

(Figure: the model structure of SNCSE)
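For intuition, a bidirectional margin loss of this kind can be sketched as below. This is a minimal illustration only: the sign convention, the margin values alpha and beta, and the way the term is combined with the standard contrastive (InfoNCE) objective follow the paper and train_SNCSE.sh, not this snippet.

import torch.nn.functional as F

def bidirectional_margin_loss(anchor, positive, soft_negative, alpha=0.1, beta=0.3):
    """Illustrative bidirectional margin loss (BML).

    delta is the gap between the anchor/soft-negative similarity and the
    anchor/positive similarity. The loss keeps delta inside [-beta, -alpha]:
    the soft negative should be less similar than the positive, but not by an
    arbitrarily large margin. alpha and beta here are placeholder values.
    """
    sim_pos = F.cosine_similarity(anchor, positive, dim=-1)
    sim_neg = F.cosine_similarity(anchor, soft_negative, dim=-1)
    delta = sim_neg - sim_pos
    return (F.relu(delta + alpha) + F.relu(-delta - beta)).mean()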

The performance of SNCSE on the STS task with different encoders is:

(Figure: STS results of SNCSE with different encoders)

To reproduce the above results, please download the files and unzip them to replace the original folder. Then download the models from Google or Baidu, modify the file path variables, and run:

python bert_prediction.py
python roberta_prediction.py
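As a rough illustration of what the prediction scripts do, the snippet below encodes two sentences with a downloaded checkpoint and compares them by cosine similarity. It assumes the checkpoint loads through Hugging Face transformers and uses the [CLS] vector as the sentence embedding; the actual path, pooling, and any prompt handling in bert_prediction.py may differ.

import torch
from transformers import AutoModel, AutoTokenizer

model_path = "/path/to/downloaded/SNCSE-bert-base-uncased"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path).eval()

sentences = ["A man is playing a guitar.", "A person plays an instrument."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")
with torch.no_grad():
    embeddings = model(**batch).last_hidden_state[:, 0]  # [CLS] representation

score = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"cosine similarity: {score.item():.4f}")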

To train SNCSE, please download the training file and put it at /SNCSE/data. You can either run:

python generate_soft_negative_samples.py

to generate soft negative samples, or use the file we provide at /Files/soft_negative_samples.txt. Then modify and run train_SNCSE.sh.
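For reference, rule-based negation of the kind used to build soft negatives might look like the sketch below, which inserts "not" after the first auxiliary or main verb found by spaCy. This is only an illustration under that assumption; generate_soft_negative_samples.py is the authoritative implementation, and the naive whitespace join here does not reproduce its detokenization.

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def negate(sentence: str) -> str:
    """Insert "not" after the first auxiliary or main verb, if one exists."""
    doc = nlp(sentence)
    words = [token.text for token in doc]
    for i, token in enumerate(doc):
        if token.pos_ in ("AUX", "VERB"):
            words.insert(i + 1, "not")
            break
    return " ".join(words)  # naive detokenization, for illustration only

# "A man is playing a guitar." -> "A man is not playing a guitar ."
print(negate("A man is playing a guitar."))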

To evaluate the checkpoints saved during training on the development set of the STS-B task, please run:

python bert_evaluation.py
python roberta_evaluation.py
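Evaluation on the STS-B development set of this sort typically scores each sentence pair by cosine similarity and reports the Spearman correlation against the gold annotations. The helper below sketches that computation; the encode callable and the loading of the development pairs are placeholders, and the scripts above remain the reference.

import torch.nn.functional as F
from scipy.stats import spearmanr

def stsb_dev_spearman(encode, sentence_pairs, gold_scores):
    """encode maps a list of sentences to a tensor of sentence embeddings."""
    emb1 = encode([s1 for s1, _ in sentence_pairs])
    emb2 = encode([s2 for _, s2 in sentence_pairs])
    sims = F.cosine_similarity(emb1, emb2, dim=-1).cpu().numpy()
    rho, _ = spearmanr(sims, gold_scores)  # Spearman's rank correlation
    return rho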

Feel free to contact the authors at wanghao2@sensetime.com with any questions.

Citation

@article{wang2022sncse,
  title={SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples},
  author={Wang, Hao and Li, Yangguang and Huang, Zhen and Dou, Yong and Kong, Lingpeng and Shao, Jing},
  journal={arXiv preprint arXiv:2201.05979},
  year={2022}
}
