Code for CPM-2 Pre-Train

Overview

CPM-2 Pre-Train

Pre-train CPM-2 此分支为110亿非 MoE 模型的预训练代码,MoE 模型的预训练代码请切换到 moe 分支

CPM-2技术报告请参考link

0 模型下载

请在智源资源下载页面进行申请,文件介绍如下:

文件名 描述 参数大小
100000.tar 纯中文模型 110亿
36000.tar 中英文双语模型 110亿
300000.tar 中英文MoE模型 1980亿

1 安装

可以直接拉取我们提供的 Docker 环境:

docker pull gyxthu17/cpm-2:1.0

2 数据

scripts/gen_data.sh 中给出了生成数据文件的脚本示例。该脚本将一个多行的纯文本文件(一个 document 一行)转化为二进制文件(会输出三个 .bin 和三个 .idx 文件),方便模型读取。

3 训练

首先需要将 WORKING_DIR 变量换成 CPM-2 目录的所在路径。调整 NUM_WORKERSNUM_GPUS_PER_WORKER 指定机器数量与每台机器的 GPU 设备数量。修改 ${WORKING_DIR}/src/configs/host_files/hostfile-cpm2 文件将其中的主机名称替换成每台机器的 IP 地址或者和 IP 地址相关联的主机名称。

运行命令:

cd src
bash scripts/pretrain_enc_dec.sh

4 引用

如果您使用了我们的代码,请您引用下面的文章。

@article{cpm-v2,
  title={CPM-2: Large-scale Cost-efficient Pre-trained Language Models},
  author={Zhang, Zhengyan and Gu, Yuxian and Han, Xu and Chen, Shengqi and Xiao, Chaojun and Sun, Zhenbo and Yao, Yuan and Qi, Fanchao and Guan, Jian and Ke, Pei and Cai, Yanzheng and Zeng, Guoyang and Tan, Zhixing and Liu, Zhiyuan and Huang, Minlie and Han, Wentao and Liu, Yang and Zhu, Xiaoyan and Sun, Maosong},
  year={2021}
}
Owner
Tsinghua AI
Tsinghua AI
LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021

LoFTR-with-train-script LoFTR:Detector-Free Local Feature Matching with Transformers CVPR 2021 (with train script --- unofficial ---). About Megadepth

Nan Xiaohu 15 Nov 04, 2022
Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021

LoFTR: Detector-Free Local Feature Matching with Transformers Project Page | Paper LoFTR: Detector-Free Local Feature Matching with Transformers Jiami

ZJU3DV 1.4k Jan 04, 2023
The repository offers the official implementation of our paper in PyTorch.

Cloth Interactive Transformer (CIT) Cloth Interactive Transformer for Virtual Try-On Bin Ren1, Hao Tang1, Fanyang Meng2, Runwei Ding3, Ling Shao4, Phi

Bingoren 49 Dec 01, 2022
Pairwise model for commonlit competition

Pairwise model for commonlit competition To run: - install requirements - create input directory with train_folds.csv and other competition data - cd

abhishek thakur 45 Aug 31, 2022
Contrastive Feature Loss for Image Prediction

Contrastive Feature Loss for Image Prediction We provide a PyTorch implementation of our contrastive feature loss presented in: Contrastive Feature Lo

Alex Andonian 44 Oct 05, 2022
[CVPR 2020] Transform and Tell: Entity-Aware News Image Captioning

Transform and Tell: Entity-Aware News Image Captioning This repository contains the code to reproduce the results in our CVPR 2020 paper Transform and

Alasdair Tran 85 Dec 13, 2022
Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019

Class-Balanced Loss Based on Effective Number of Samples Tensorflow code for the paper: Class-Balanced Loss Based on Effective Number of Samples Yin C

Yin Cui 546 Jan 08, 2023
Convolutional Neural Networks

Darknet Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation. D

Joseph Redmon 23.7k Jan 05, 2023
An Unsupervised Graph-based Toolbox for Fraud Detection

An Unsupervised Graph-based Toolbox for Fraud Detection Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates s

SafeGraph 99 Dec 11, 2022
A custom-designed Spider Robot trained to walk using Deep RL in a PyBullet Simulation

SpiderBot_DeepRL Title: Implementation of Single and Multi-Agent Deep Reinforcement Learning Algorithms for a Walking Spider Robot Authors(s): Arijit

Arijit Dasgupta 9 Jul 28, 2022
Code for the paper Open Sesame: Getting Inside BERT's Linguistic Knowledge.

Open Sesame This repository contains the code for the paper Open Sesame: Getting Inside BERT's Linguistic Knowledge. Credits We built the project on t

9 Jul 24, 2022
To propose and implement a multi-class classification approach to disaster assessment from the given data set of post-earthquake satellite imagery.

To propose and implement a multi-class classification approach to disaster assessment from the given data set of post-earthquake satellite imagery.

Kunal Wadhwa 2 Jan 05, 2022
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ⚖️ 🏆 🧑‍🎓 👩‍⚖️ Dataset Summary Inspired by the recent widespread use of th

95 Dec 08, 2022
Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs

Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs This repository contains code to accompany the paper "Hierarchical Clustering: O

3 Sep 25, 2022
A U-Net combined with a variational auto-encoder that is able to learn conditional distributions over semantic segmentations.

Probabilistic U-Net + **Update** + An improved Model (the Hierarchical Probabilistic U-Net) + LIDC crops is now available. See below. Re-implementatio

Simon Kohl 498 Dec 26, 2022
Open AI's Python library

OpenAI Python Library The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language. It incl

Pavan Ananth Sharma 3 Jul 10, 2022
BisQue is a web-based platform designed to provide researchers with organizational and quantitative analysis tools for 5D image data. Users can extend BisQue by implementing containerized ML workflows.

Overview BisQue is a web-based platform specifically designed to provide researchers with organizational and quantitative analysis tools for up to 5D

Vision Research Lab @ UCSB 26 Nov 29, 2022
Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion: A Machine Learning Library for Time Series Table of Contents Introduction Installation Documentation Getting Started Anomaly Detection Foreca

Salesforce 2.8k Dec 30, 2022
Codes and Data Processing Files for our paper.

Code Scripts and Processing Files for EEG Sleep Staging Paper 1. Folder Tree ./src_preprocess (data preprocessing files for SHHS and Sleep EDF) sleepE

Chaoqi Yang 18 Dec 12, 2022
TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning Authors: Yixuan Su, Fangyu Liu, Zaiqiao Meng, Lei Shu, Ehsan Shareghi, and Nig

Yixuan Su 79 Nov 04, 2022