SimCSE在中文任务上的简单实验

Related tags

MiscellaneousSimCSE
Overview

SimCSE 中文测试

SimCSE在常见中文数据集上的测试,包含ATECBQLCQMCPAWSXSTS-B共5个任务。

介绍

文件

- utils.py  工具函数
- eval.py  评测主文件

评测

命令格式:

python eval.py [model_type] [pooling] [task_name] [dropout_rate]

使用例子:

python eval.py BERT cls ATEC 0.3

其中四个参数必须传入,含义分别如下:

- model_type: 模型,必须是['BERT', 'RoBERTa', 'WoBERT', 'RoFormer', 'BERT-large', 'RoBERTa-large', 'SimBERT', 'SimBERT-tiny', 'SimBERT-small']之一;
- pooling: 池化方式,必须是['first-last-avg', 'last-avg', 'cls', 'pooler']之一;
- task_name: 评测数据集,必须是['ATEC', 'BQ', 'LCQMC', 'PAWSX', 'STS-B']之一;
- dropout_rate: 浮点数,dropout的比例,如果为0则不dropout;

环境

测试环境:tensorflow 1.14 + keras 2.3.1 + bert4keras 0.10.5,如果在其他环境组合下报错,请根据错误信息自行调整代码。

下载

Google官方的两个BERT模型:

关于语义相似度数据集,可以从数据集对应的链接自行下载,也可以从作者提供的百度云链接下载。

其中senteval_cn目录是评测数据集汇总,senteval_cn.zip是senteval目录的打包,两者下其一就好。

相关

交流

QQ交流群:808623966,微信群请加机器人微信号spaces_ac_cn

Owner
苏剑林(Jianlin Su)
科学爱好者
苏剑林(Jianlin Su)
Un Assistente Vocale scritto in Python e altamente personalizzabile

Un Assistente Vocale scritto in Python e altamente personalizzabile

Marco 2 May 06, 2022
Source code for Learn Programming: Python

This repository contains the source code of the game engine behind Learn Programming: Python. The two key files are game.py (the main source of the ga

Niema Moshiri 25 Apr 24, 2022
Fetch data from an excel file and create HTML file

excel-to-html Problem Statement! - Fetch data from excel file and create html file Excel.xlsx file contain the information.in multiple rows that is ne

Vivek Kashyap 1 Oct 25, 2021
eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard.

eyes is a Public Opinion Mining System focusing on taiwanese forums such as PTT, Dcard. Features 🔥 Article monitor: helps you capture the trend at a

Sean 116 Dec 29, 2022
Using graph_nets for pion classification and energy regression. Contributions from LLNL and LBNL

nbdev template Use this template to more easily create your nbdev project. If you are using an older version of this template, and want to upgrade to

3 Nov 23, 2022
Utility to play with ADCS, allows to request tickets and collect information about related objects

certi Utility to play with ADCS, allows to request tickets and collect information about related objects. Basically, it's the impacket copy of Certify

Eloy 185 Dec 29, 2022
Speed up your typing by some exercises in the multi-platform(Windows/Ubuntu).

Introduction This project purpose is speed up your typing by some exercises in the multi-platform(Windows/Ubuntu). Build Environment Software Environm

lyfer233 1 Mar 24, 2022
Repo with data from local elections in Portugal from 2009 to 2013

autarquicas - local elections in Portugal Repo with data from local elections in Portugal from 2009 to 2013 Objective To provide, to all, raw data fro

Jorge Gomes 6 Apr 06, 2022
Decipher using Markov Chain Monte Carlo

Decipher using Markov Chain Monte Carlo

Science étonnante 43 Dec 24, 2022
A code base for python programs the goal is to integrate all the useful and essential functions

Base Dev EN This GitHub will be available in French and English FR Ce GitHub sera disponible en français et en anglais Author License Screen EN 🇬🇧 D

Pikatsuto 1 Mar 07, 2022
You will need to install a few python packages for this one.

Features Bait support Auto repair will repair every 10 catches Anti detection (still a work in progress) but using random times and click positions Pr

12 Sep 21, 2022
Python: Wrangled and unpivoted gaming datasets. Tableau: created dashboards - Market Beacon and Player’s Shopping Guide.

Created two information products for GameStop. Using Python, wrangled and unpivoted datasets, and created Tableau dashboards.

Zinaida Dvoskina 2 Jan 29, 2022
B-Pkg is a simple tool in python for installing all basic package in termux

Basic-Pkg 👉🏻 Basic-Pkg 👈🏻 B-Pkg is a simple tool in python for installing all basic package in termux This is my first tool, I hope you will like

Macgaiver 3 Oct 21, 2021
python scripts and other files to generate induction encoder PCBs in Kicad

induction_encoder python scripts and other files to generate induction encoder PCBs in Kicad Targeting the Renesas IPS2200 encoder chips.

Taylor Alexander 8 Feb 16, 2022
Whatsapp Messenger master

Whatsapp Messenger master

Swarup Kharul 5 Nov 21, 2021
Xoroshiro-cairo - A xoroshiro128** pseudorandom number generator implementation in Cairo

xoroshiro-cairo A xoroshiro128** pseudorandom number generator implementation in

Milan Cermak 26 Oct 05, 2022
Pykeeb - A small Python script that prints out currently connected keyboards

pykeeb 🐍 ⌨️ A small Python script that detects and prints out currently connect

Jordan Duabe 1 May 08, 2022
Interactivity Lab: Household Pulse Explorable

Interactivity Lab: Household Pulse Explorable Goal: Build an interactive application that incorporates fundamental Streamlit components to offer a cur

1 Feb 10, 2022
Multi-Process / Censorship Detection

Multi-Process / Censorship Detection

Baris Dincer 2 Dec 22, 2021
🍞 Create dynamic spreadsheets with arbitrary layouts using Python

🍞 tartine What this is Installation Usage example Fetching some data Getting started Adding a header Linking more cells Cell formatting API reference

Max Halford 11 Apr 16, 2022