2021海华AI挑战赛·中文阅读理解·技术组·第三名

Overview

海华中文阅读理解比赛

队名:ATTOY排名:第三名

赛题背景

https://www.biendata.xyz/competition/haihua_2021

文字是人类用以记录和表达的最基本工具,也是信息传播的重要媒介。透过文字与符号,我们可以追寻人类文明的起源,可以传播知识与经验,读懂文字是认识与了解的第一步。对于人工智能而言,它的核心问题之一就是认知,而认知的核心则是语义理解。

机器阅读理解(Machine Reading Comprehension)是自然语言处理和人工智能领域的前沿课题,对于使机器拥有认知能力、提升机器智能水平具有重要价值,拥有广阔的应用前景。机器的阅读理解是让机器阅读文本,然后回答与阅读内容相关的问题,体现的是人工智能对文本信息获取、理解和挖掘的能力,在对话、搜索、问答、同声传译等领域,机器阅读理解可以产生的现实价值正在日益凸显,长远的目标则是能够为各行各业提供解决方案。

《2021海华AI挑战赛·中文阅读理解》大赛由中关村海华信息技术前沿研究院与清华大学交叉信息研究院联合主办,腾讯云计算协办。共设置题库16000条数据,总奖金池30万元,且腾讯云计算为中学组赛道提供独家算力资源支持。

本次比赛的数据来自小学/中高考语文阅读理解题库(其中,技术组的数据主要为中高考语文试题,中学组的数据主要来自小学语文试题)。相较于英文,中文阅读理解有着更多的歧义性和多义性,然而璀璨的中华文明得以绵延数千年,离不开每一个时代里努力钻研、坚守传承的人,这也正是本次大赛的魅力与挑战,让机器读懂文字,让机器学习文明。秉承着人才培养的初心,我们继续保留针对中学组以及技术组的两条平行赛道,科技创新,时代有我,期待你们的回响。

比赛任务

本次比赛技术组的数据来自中高考语文阅读理解题库。每条数据都包括一篇文章,至少一个问题和多个候选选项。参赛选手需要搭建模型,从候选选项中选出正确的一个。

2021海华AI挑战赛·中文阅读理解·技术组 第三名(ATTOY团队)解决方案

算法方案

1.预训练模型:MacBERT-Large

2.对抗训练

FreeLB ICLR 2020

3.知识蒸馏

Born Again Neural Networks ICML 2018

环境要求

tqdm==4.50.2 numpy==1.19.2 pandas==1.1.3 transformers==3.5.1 torch==1.7.0+cu110 scikit_learn==0.24.2

运行方法

bash bash.sh

超参数

FreeLB训练参数配置

'fold_num': 4, 
'seed': 42,
'model': 'hfl/chinese-macbert-large', 
'max_len': 512, 
'epochs': 12,
'train_bs': 4, 
'valid_bs': 4,
'lr': 2e-5,  
'lrSelf': 1e-4,  
'accum_iter': 8, 
'weight_decay': 1e-4, 
'adv_lr': 0.01,
'adv_norm_type': 'l2',
'adv_init_mag': 0.03,
'adv_max_norm': 1.0,
'ip': 2

EKD训练参数配置

'fold_num': 4, 
'seed': 42,
'model': 'hfl/chinese-macbert-large', 
'max_len': 256, 
'epochs': 12,
'train_bs': 4, 
'valid_bs': 4,
'lr': 2e-5,  
'lrSelf': 1e-4,  
'accum_iter': 8, 
'weight_decay': 1e-4, 
'adv_lr': 0.01,
'adv_norm_type': 'l2',
'adv_init_mag': 0.03,
'adv_max_norm': 1.0,
'ip': 2
Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated

Create a semantic search engine with a neural network (i.e. BERT) whose knowledge base can be updated. This engine can later be used for downstream tasks in NLP such as Q&A, summarization, generation

Diego 1 Mar 20, 2022
Conditional Transformer Language Model for Controllable Generation

CTRL - A Conditional Transformer Language Model for Controllable Generation Authors: Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong,

Salesforce 1.7k Dec 28, 2022
To create a deep learning model which can explain the content of an image in the form of speech through caption generation with attention mechanism on Flickr8K dataset.

To create a deep learning model which can explain the content of an image in the form of speech through caption generation with attention mechanism on Flickr8K dataset.

Ragesh Hajela 0 Feb 08, 2022
Pytorch version of BERT-whitening

BERT-whitening This is the Pytorch implementation of "Whitening Sentence Representations for Better Semantics and Faster Retrieval". BERT-whitening is

Weijie Liu 255 Dec 27, 2022
Finally decent dictionaries based on Wiktionary for your beloved eBook reader.

eBook Reader Dictionaries Finally, decent dictionaries based on Wiktionary for your beloved eBook reader. Dictionaries Catalan 🚧 Ελληνικά (help welco

Mickaël Schoentgen 163 Dec 31, 2022
Automated question generation and question answering from Turkish texts using text-to-text transformers

Turkish Question Generation Offical source code for "Automated question generation & question answering from Turkish texts using text-to-text transfor

Open Business Software Solutions 29 Dec 14, 2022
Knowledge Oriented Programming Language

KoPL: 面向知识的推理问答编程语言 安装 | 快速开始 | 文档 KoPL全称 Knowledge oriented Programing Language, 是一个为复杂推理问答而设计的编程语言。我们可以将自然语言问题表示为由基本函数组合而成的KoPL程序,程序运行的结果就是问题的答案。目前,

THU-KEG 62 Dec 12, 2022
A highly sophisticated sequence-to-sequence model for code generation

CoderX A proof-of-concept AI system by Graham Neubig (June 30, 2021). About CoderX CoderX is a retrieval-based code generation AI system reminiscent o

Graham Neubig 39 Aug 03, 2021
Command Line Text-To-Speech using Google TTS

cli-tts Thanks to gTTS by @pndurette! This is an interactive command line text-to-speech tool using Google TTS. Just type text and the voice will be p

ReekyStive 3 Nov 11, 2022
Some embedding layer implementation using ivy library

ivy-manual-embeddings Some embedding layer implementation using ivy library. Just for fun. It is based on NYCTaxiFare dataset from kaggle (cut down to

Ishtiaq Hussain 2 Feb 10, 2022
Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

anlp21 Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dba

David Bamman 48 Dec 06, 2022
Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Weitang Liu 1.6k Jan 03, 2023
Google's Meena transformer chatbot implementation

Here's my attempt at recreating Meena, a state of the art chatbot developed by Google Research and described in the paper Towards a Human-like Open-Domain Chatbot.

Francesco Pham 94 Dec 25, 2022
NAACL 2022: MCSE: Multimodal Contrastive Learning of Sentence Embeddings

MCSE: Multimodal Contrastive Learning of Sentence Embeddings This repository contains code and pre-trained models for our NAACL-2022 paper MCSE: Multi

Saarland University Spoken Language Systems Group 39 Nov 15, 2022
An ActivityWatch watcher to pose questions to the user and record her answers.

aw-watcher-ask An ActivityWatch watcher to pose questions to the user and record her answers. This watcher uses Zenity to present dialog boxes to the

Bernardo Chrispim Baron 33 Dec 03, 2022
Experiments in converting wikidata to ftm

FollowTheMoney / Wikidata mappings This repo will contain tools for converting Wikidata entities into FtM schema. Prefixes: https://www.mediawiki.org/

Friedrich Lindenberg 2 Nov 12, 2021
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022
Words_And_Phrases - Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours

Words_And_Phrases Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours Abbreviations Abbreviation

Subhadeep Mandal 1 Feb 01, 2022
PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop

molten A minimal, extensible, fast and productive API framework for Python 3. Changelog: https://moltenframework.com/changelog.html Community: https:/

3.2k Dec 28, 2022
Grover is a model for Neural Fake News -- both generation and detectio

Grover is a model for Neural Fake News -- both generation and detection. However, it probably can also be used for other generation tasks.

Rowan Zellers 856 Dec 24, 2022