Simple Annotated implementation of GPT-NeoX in PyTorch
This is a simpler implementation of GPT-NeoX in PyTorch. We have taken out several optimizations in GPT-NeoX for simplicity.
This is a simpler implementation of GPT-NeoX in PyTorch. We have taken out several optimizations in GPT-NeoX for simplicity.
ICE Tokenizer Token id [0, 20000) are image tokens. Token id [20000, 20100) are common tokens, mainly punctuations. E.g., icetk[20000] == 'unk', ice
GCRC GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Eva
About spellchecker.py Implementing a highly-accurate, brute-force, and dynamically programmed spellchecking program that utilizes the Damerau-Levensht
Leon Your open-source personal assistant. Website :: Documentation :: Roadmap :: Contributing :: Story 👋 Introduction Leon is an open-source personal
DL Translate A deep learning-based translation library built on Huggingface transformers and Facebook's mBART-Large 💻 GitHub Repository 📚 Documentat
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.
MONEYBALL - ChatBot Module: 4006CEM, Class: B, Group: 5 Contributors: Jonas Djondo Roshan Kc Cole Samson Daniel Rodrigues Ihteshaam Naseer Kind remind
Chinese Word Vectors 中文词向量 中文 This project provides 100+ Chinese Word Vectors (embeddings) trained with different representations (dense and sparse),
A cross platform OCR Library based on PaddleOCR & OnnxRuntime
In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutt
Graph Coloring - Weighted Vertex Coloring Problem This project proposes several local searches and an MCTS algorithm for the weighted vertex coloring
KurumiChatBot Just another Telegram AI chat bot written in Python using Pyrogram. A public running instance can be found on telegram as @TokisakiChatB
Rebiber: A tool for normalizing bibtex with official info. We often cite papers using their arXiv versions without noting that they are already PUBLIS
Introduction New: Interactive demo using Google Colaboratory can be found here TTS-Cube is an end-2-end speech synthesis system that provides a full p
HAN2HAN : Hangul Font Generation
toxic-spans-detection An attempt at the SemEval 2021 Task 5: Toxic Spans Detection. The Toxic Spans Detection task of SemEval2021 required participant
✨A Python framework to explore, label, and monitor data for NLP projects
Emotional phrase extractor Extract phrase in the given text that is used to express the sentiment. Capturing sentiment in language is important in the
This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".
Improving Transformer Models by Reordering their Sublayers This repository contains the code for running the character-level Sandwich Transformers fro