Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts

Overview

gpt-2-simple

A simple Python package that wraps existing model fine-tuning and generation scripts for OpenAI's GPT-2 text generation model (specifically the "small" 124M and "medium" 355M hyperparameter versions). Additionally, this package makes text generation easier: it can generate to a file for easy curation, and it accepts prefixes that force the text to start with a given phrase.

This package incorporates and makes minimal low-level changes to:

  • Model management from OpenAI's official GPT-2 repo (MIT License)
  • Model finetuning from Neil Shepperd's fork of GPT-2 (MIT License)
  • Text generation output management from textgenrnn (MIT License / also created by me)

For finetuning, it is strongly recommended to use a GPU, although you can generate using a CPU (albeit much more slowly). If you are training in the cloud, using a Colaboratory notebook or a Google Compute Engine VM with the TensorFlow Deep Learning image is strongly recommended, as the GPT-2 model is hosted on GCP.

You can use gpt-2-simple to retrain a model using a GPU for free in this Colaboratory notebook, which also demos additional features of the package.

Install

gpt-2-simple can be installed via PyPI:

pip3 install gpt-2-simple

You will also need to install the corresponding TensorFlow for your system (e.g. tensorflow or tensorflow-gpu). As of v0.8.1, gpt-2-simple supports TensorFlow 2 by default and the minimum TensorFlow version is 2.5.1. Earlier releases did not support TensorFlow 2.0 (the package would throw an assertion if loaded), so TensorFlow 1.14/1.15 was recommended for them.

Usage

An example of downloading the model to the local system, finetuning it on a dataset, and generating some text:

Warning: the pretrained 124M model, and thus any finetuned model, is 500 MB! (the pretrained 355M model is 1.5 GB)

import gpt_2_simple as gpt2
import os
import requests

model_name = "124M"
if not os.path.isdir(os.path.join("models", model_name)):
    print(f"Downloading {model_name} model...")
    gpt2.download_gpt2(model_name=model_name)   # model is saved into current directory under /models/124M/


file_name = "shakespeare.txt"
if not os.path.isfile(file_name):
    url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
    data = requests.get(url)

    # write as UTF-8 explicitly so the file is the same on every platform
    with open(file_name, 'w', encoding='utf-8') as f:
        f.write(data.text)


sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              file_name,
              model_name=model_name,
              steps=1000)   # steps is max number of training steps

gpt2.generate(sess)

The generated model checkpoints are by default in /checkpoint/run1. If you want to load a model from that folder and generate text from it:

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)

gpt2.generate(sess)

As with textgenrnn, you can generate and save text for later use (e.g. an API or a bot) by using the return_as_list parameter.

single_text = gpt2.generate(sess, return_as_list=True)[0]
print(single_text)

You can pass a run_name parameter to finetune and load_gpt2 if you want to store/load multiple models in a checkpoint folder.
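
As a rough sketch of how run_name might be used to keep several models side by side: the helper names below (checkpoint_path, finetune_run, generate_from_run) and the run name "shakespeare" are hypothetical, and the folder layout is assumed from the default /checkpoint/run1 location mentioned above. Each function is meant to be called in its own fresh Python session.

```python
import os


def checkpoint_path(run_name="run1", checkpoint_dir="checkpoint"):
    """Folder where a run's checkpoints are assumed to live (default: checkpoint/run1)."""
    return os.path.join(checkpoint_dir, run_name)


def finetune_run(dataset, run_name, model_name="124M", steps=1000):
    """Finetune on one dataset and store the result under its own run name."""
    import gpt_2_simple as gpt2  # imported lazily: requires TensorFlow

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, dataset, model_name=model_name, steps=steps, run_name=run_name)
    return sess


def generate_from_run(run_name):
    """Load a previously finetuned run and return its generated texts as a list."""
    import gpt_2_simple as gpt2  # imported lazily: requires TensorFlow

    sess = gpt2.start_tf_sess()
    gpt2.load_gpt2(sess, run_name=run_name)
    return gpt2.generate(sess, run_name=run_name, return_as_list=True)


print(checkpoint_path("shakespeare"))
```

For example, finetune_run("shakespeare.txt", "shakespeare") in one session, then generate_from_run("shakespeare") in a later one.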

There is also a command-line interface for both finetuning and generation with strong defaults for just running on a Cloud VM w/ GPU. For finetuning (which will also download the model if not present):

gpt_2_simple finetune shakespeare.txt

And for generation, which generates texts to files in a gen folder:

gpt_2_simple generate

Most of the same parameters available in the functions are available as CLI arguments, e.g.:

gpt_2_simple generate --temperature 1.0 --nsamples 20 --batch_size 20 --length 50 --prefix "<|startoftext|>" --truncate "<|endoftext|>" --include_prefix False --nfiles 5

See below for what some of the CLI arguments do.
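
For comparison, the same settings can be passed to gpt2.generate from Python. This is a minimal sketch: check_generate_kwargs is a hypothetical pre-flight helper (not part of the package) that mirrors two constraints stated in this README, namely that batch_size must divide nsamples and that a request is limited to 1024 tokens. The CLI-only --nfiles flag is left out.

```python
GENERATE_KWARGS = dict(
    temperature=1.0,
    nsamples=20,
    batch_size=20,
    length=50,
    prefix="<|startoftext|>",
    truncate="<|endoftext|>",
    include_prefix=False,
)


def check_generate_kwargs(kwargs, max_length=1024):
    """Hypothetical sanity check run before calling gpt2.generate."""
    if kwargs["nsamples"] % kwargs["batch_size"] != 0:
        raise ValueError("batch_size must divide nsamples evenly")
    if kwargs["length"] > max_length:
        raise ValueError(f"length must not exceed {max_length} tokens")
    return kwargs


def generate_samples(kwargs):
    """Load the default finetuned run and return the generated texts as a list."""
    import gpt_2_simple as gpt2  # imported lazily: requires TensorFlow and a checkpoint

    sess = gpt2.start_tf_sess()
    gpt2.load_gpt2(sess)
    return gpt2.generate(sess, return_as_list=True, **check_generate_kwargs(kwargs))


check_generate_kwargs(GENERATE_KWARGS)  # passes: 20 is divisible by 20
```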

NB: Restart the Python session first if you want to finetune on another dataset or load another model.

Differences Between gpt-2-simple And Other Text Generation Utilities

The method GPT-2 uses to generate text is slightly different from that of other packages like textgenrnn (specifically, GPT-2 generates the full text sequence purely in the GPU and decodes it later), and this cannot easily be changed without hacking the underlying model code. As a result:

  • In general, GPT-2 is better at maintaining context over its entire generation length, making it good for generating conversational text. The text is also generally grammatically correct, with proper capitalization and few typos.
  • The original GPT-2 model was trained on a very large variety of sources, allowing the model to incorporate idioms not seen in the input text.
  • GPT-2 can only generate a maximum of 1024 tokens per request (about 3-4 paragraphs of English text).
  • GPT-2 cannot stop early upon reaching a specific end token. (workaround: pass the truncate parameter to a generate function to only collect text until a specified end token. You may want to reduce length appropriately.)
  • Higher temperatures work better (e.g. 0.7 - 1.0) to generate more interesting text, while other frameworks work better between 0.2 - 0.5.
  • When finetuning GPT-2, it has no sense of the beginning or end of a document within a larger text. You'll need to use a bespoke character sequence to indicate the beginning and end of a document. Then while generating, you can specify a prefix targeting the beginning token sequences, and a truncate targeting the end token sequence. You can also set include_prefix=False to discard the prefix token while generating (e.g. if it's something unwanted like <|startoftext|>).
  • If you pass a single-column .csv file to finetune(), it will automatically parse the CSV into a format ideal for training with GPT-2 (including prepending <|startoftext|> and suffixing <|endoftext|> to every text document, so the truncate tricks above are helpful when generating output). This is necessary to handle both quotes and newlines in each text document correctly.
  • GPT-2 allows you to generate texts in parallel by setting a batch_size that is divisible into nsamples, resulting in much faster generation. Works very well with a GPU (can set batch_size up to 20 on Colaboratory's K80)!
  • Due to GPT-2's architecture, it scales up nicely with more powerful GPUs. For the 124M model, if you want to train for longer periods of time, GCP's P100 GPU is about 3x faster than a K80/T4 for only 3x the price, making it price-comparable (the V100 is about 1.5x faster than the P100 but about 2x the price). The P100 uses 100% of the GPU even with batch_size=1, and about 88% of the V100 GPU.
  • If you have a partially-trained GPT-2 model and want to continue finetuning it, you can pass overwrite=True to finetune, which will continue training and remove the previous iteration of the model without creating a duplicate copy. This can be especially useful for transfer learning (e.g. heavily finetune GPT-2 on one dataset, then finetune on another dataset to get a "merging" of both datasets).
  • If your input text dataset is massive (>100 MB), you may want to preencode and compress the dataset using gpt2.encode_dataset(file_path). The output is a compressed .npz file which will load much faster into the GPU for finetuning.
  • The 774M "large" model may not support finetuning because it will cause modern GPUs to go out-of-memory (you may get lucky if you use a P100 GPU on Colaboratory). However, you can still generate from the default pretrained model using gpt2.load_gpt2(sess, model_name='774M') and gpt2.generate(sess, model_name='774M').
  • The 1558M "extra large" model, the true GPT-2 model, may not work out-of-the-box with the GPU included with the Colaboratory Notebook. More testing is needed to identify optimal configurations for it.
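
The document-delimiter convention from the list above can be illustrated in plain Python. This is only a sketch of the idea, not the package's own code: wrap_documents and clean_generated are hypothetical helpers, and the exact layout the package produces when parsing a CSV (for instance, the newline between documents) is an assumption here.

```python
START_TOKEN = "<|startoftext|>"
END_TOKEN = "<|endoftext|>"


def wrap_documents(documents):
    """Join documents into one training text, marking where each starts and ends."""
    return "\n".join(f"{START_TOKEN}{doc}{END_TOKEN}" for doc in documents)


def clean_generated(text, prefix=START_TOKEN, truncate=END_TOKEN, include_prefix=False):
    """Mimic the effect of prefix / truncate / include_prefix on one generated sample."""
    if not include_prefix and text.startswith(prefix):
        text = text[len(prefix):]
    end = text.find(truncate)
    if end != -1:
        text = text[:end]  # keep only the text before the first end token
    return text


training_text = wrap_documents(["First document.", 'Second "quoted"\ndocument.'])

# A raw sample runs on past the end token, because generation cannot stop early.
raw_sample = "<|startoftext|>A generated title<|endoftext|>\n<|startoftext|>Spill-over text"
print(clean_generated(raw_sample))
```

With the real package, the equivalent effect comes from passing prefix, truncate and include_prefix to the generate function rather than post-processing by hand.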

Interactive Apps Using gpt-2-simple

  • gpt2-small — App using the default GPT-2 124M pretrained model
  • gpt2-reddit — App to generate Reddit titles based on a specified subreddit and/or keyword(s)
  • gpt2-mtg — App to generate Magic: The Gathering cards

Text Generation Examples Using gpt-2-simple

Maintainer/Creator

Max Woolf (@minimaxir)

Max's open-source projects are supported by his Patreon. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

MIT

Disclaimer

This repo has no affiliation or relationship with OpenAI.

Comments
  • Supporting Tensorflow 2

    Hello! The folks at @yaledhlab are huge fans of your work @minimaxir. I wanted to send on the first draft of a pull request to make the library function in Tensorflow 1 or 2.

    The only feature not ported to tf2 here is the memory saving gradients logic. The upstream library from which that logic is adopted still hasn't been ported to tf2, and some of the underlying graph traversal methods have been removed from tensorflow itself in 2.x, so there would be a good bit of work getting this running.

    That said, the memory saving gradients are simply a performance enhancement. For interested parties, there's a thread going in @cybertronai/gradient-checkpointing and another thread in the @tensorflow repo that discusses some decorator-based approaches to gradient checkpointing in tf2...

    In any event, thanks for this great work!

    opened by duhaime 19
  • Finetune Json Issue

    When running any of the text files I created the program complains about the following issue regardless of the text file.

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess,
                  'java_train_java.txt',
                  model_name=model_name,
                  steps=1000)   # steps is max number of training steps

    JSONDecodeError                           Traceback (most recent call last)
    in
          6               'java_train_java.txt',
          7               model_name=model_name,
    ----> 8               steps=1000)   # steps is max number of training steps

    ~/.local/lib/python3.5/site-packages/gpt_2_simple/gpt_2.py in finetune(sess, dataset, steps, model_name, model_dir, combine, batch_size, learning_rate, accumulate_gradients, restore_from, run_name, checkpoint_dir, sample_every, sample_length, sample_num, save_every, print_every, max_checkpoints, use_memory_saving_gradients, only_train_transformer_layers, overwrite)
        153             raise(fnf_error)
        154
    --> 155     enc = encoder.get_encoder(checkpoint_path)
        156     hparams = model.default_hparams()
        157     with open(os.path.join(checkpoint_path, 'hparams.json')) as f:

    ~/.local/lib/python3.5/site-packages/gpt_2_simple/src/encoder.py in get_encoder(checkpoint_path)
        108 def get_encoder(checkpoint_path):
        109     with open(os.path.join(checkpoint_path, 'encoder.json'), 'r') as f:
    --> 110         encoder = json.load(f)
        111     with open(os.path.join(checkpoint_path, 'vocab.bpe'), 'r', encoding="utf-8") as f:
        112         bpe_data = f.read()

    /usr/lib/python3.5/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
        266         cls=cls, object_hook=object_hook,
        267         parse_float=parse_float, parse_int=parse_int,
    --> 268         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
        269
        270

    /usr/lib/python3.5/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
        317             parse_int is None and parse_float is None and
        318             parse_constant is None and object_pairs_hook is None and not kw):
    --> 319         return _default_decoder.decode(s)
        320     if cls is None:
        321         cls = JSONDecoder

    /usr/lib/python3.5/json/decoder.py in decode(self, s, _w)
        337
        338         """
    --> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
        340         end = _w(s, end).end()
        341         if end != len(s):

    /usr/lib/python3.5/json/decoder.py in raw_decode(self, s, idx)
        355             obj, end = self.scan_once(s, idx)
        356         except StopIteration as err:
    --> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
        358         return obj, end

    JSONDecodeError: Expecting value: line 1 column 1 (char 0)

    opened by Rafcin 12
  • Improved Sampling (Nucleus Sampling)

    In a recent paper, they find that beam search actually does even worse than greedy search for GPT-2 117M, and they propose a new sampling method to improve text output.

    Neural Text Degradation

    From what I understand, it's top-k sampling, except instead of k tokens, it takes all tokens until probability of these tokens combined is > some parameter p.

    Here's an example from the paper:

    🗣Context: On days when he woke early, the president liked to have coffee in the oval office. There was something about watching the sky grow lighter and lighter as you sat your pajama'd behind in the most powerful chair in the free world, sipping marine strength coffee, that you just couldn't achieve anywhere else.


    💰Greedy: He'd been in the oval office for over a year now, and he'd never been so happy to be there. He'd been in the oval office for almost a year, and he'd never been so happy to be there.


    🚦BeamSearch (k=4): The President was a man of his word. He was a man of his word. He was a man of his word. He was a man of his word. He was a man of his word. He was a man of his word.


    🥄Sampling (t=1.0): You couldn't be sure if that's what you were really doing, and If you decided to take the day off. The president wanted you to take the day off, but he wanted to maintain a curfew and use his influence wisely.


    🇰Top-k Sampling (k=8): "I'm sorry, sir." "No, it's okay, son," the president said. "I understand." "You're going to have to make a special trip down there to get that kid. He has no idea where he's going."


    ⚛️Nucleus Sampling (p=0.9): But that wasn't what drew the president's attention. He'd been seated for maybe a minute when he noticed the other man. What was the guy doing here?


    🗣Gold: He was therefore disagreeably surprised to find a man in an understated grey suit sitting in that selfsame chair sipping tea. The president turned around and went looking for his chief of staff.
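
The nucleus (top-p) rule described above can be sketched in plain Python. This is an illustrative toy, not the paper's or the package's implementation: the function names are made up for this example, and it works on a small list of logits rather than a real vocabulary.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities, optionally sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_candidates(probs, top_p=0.9):
    """Keep the most likely tokens until their combined probability reaches top_p."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1 again.
    return [(token_id, p / cumulative) for token_id, p in kept]


def sample_top_p(logits, top_p=0.9, temperature=1.0, rng=random):
    """Sample one token id from the nucleus of the distribution."""
    candidates = nucleus_candidates(softmax(logits, temperature), top_p)
    ids = [token_id for token_id, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]
print([token_id for token_id, _ in nucleus_candidates(probs, top_p=0.9)])
```

Unlike top-k, the number of candidates varies from step to step: a confident distribution keeps only a few tokens, while a flat one keeps many.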

    opened by bob80333 11
  • text generation quality for Chinese

    I use colab try to train a chinese novel , but result is not actually readable as below: ======== SAMPLE 1 ======== 是将吹雾挖出的将成功探测而出。令得她不过来了。 在这句她却便是张了什么处。可地云岚宗的基地本就有猜测的唶回纳,若是可见开一些他们纗地图下的威风。有什么那些自抗成形功探也是被实亀约不运的失踪,这个家伙?” “按一东西。 “以后?” 第一千两百四纳乎容其收获 双翼下午床 偂静以及云岚宗家族时,现在穿过尮层地死死一死的一位完全自人。若是被这位似乎么好。不过这些层地曘众而速的缘故。先前云岚宗家族与家伙破碎,也知道。” “按这些年边按一东西。” 双危得枯落双成一些纸藏。现在云岚宗家族。则是有着更是珋地的纳戒。一名一名视线成功探而来。似此如同一股落地被月地位置身给在落地墓墓吼。将三色山峰。都是在她身处的族人而出。他们。能够如何丧门两个家族事。我没有丝毫。比较给云岚宗家族身族成功压渐了过来二人。” 落地最后对此刻低低的落地。这些人吼力地双更驰在云岚宗这般种有些做完的同局一段时间。就在山脉交手吸了一圈。他们仅仅是将会从丝毫地毒间。那家伙。拥有会难以过足有山脉路。想必地实力。可怕的毒间不会速助引。” 心中现在也算是连脸色。纳戒的一道道人影击杀着自指大会回底独血之人。一个了。能够击

    My training parameters are:

    gpt2.finetune(sess,
                  dataset="train.txt",
                  model_name='345M',
                  steps=1000,
                  restore_from='fresh',
                  print_every=20,
                  sample_every=200,
                  save_every=500)

    Since GPT-2 should be very powerful for text generation, I just want to make sure whether this result quality is normal or whether there is something I have not figured out yet.

    Thanks

    opened by chiangandy 8
  • Multi-gpu support

    • Added automated gpu name gathering and model partitioning across multiple GPUs
    • Added boolean multi-gpu signature to finetune and load_gpt2 functions.
    • Added CLI option for multi_gpu

    Note: if layer == 10: in the model was removed so all layers are checkpointed. This can be reverted.

    opened by huntrontrakkr 7
  • GPT2 Chatbot

    Hi, I'm new to this stuff, and I'm trying to make a chatbot out of gpt2 using finetune... My question is, can I make gpt remember stuff like this:

    Q: Hey, I'm Joe.
    A: Hey, Joe.
    Q: What's my name?
    A: Joe.

    Right now, it can barely remember anything I write to it. Do I just need a better more dialogue-focused dataset to finetune it on or is there something else I can do to make it remember? I'm using the colab notebook.

    Thanks, I'm rather new to ML, so...

    opened by ZeroMaxinumXZ 7
  • Not able to load the dataset

    I have been trying to train the 117M model on a 1.03 GB dataset, with 64 GB of RAM. But while it loads the dataset, it remains stuck there, and after some 30 minutes it just terminates. Here is the log.

    Fetching checkpoint: 1.00kit [00:00, 679kit/s]                                                      
    Fetching encoder.json: 1.04Mit [00:00, 16.5Mit/s]                                                   
    Fetching hparams.json: 1.00kit [00:00, 573kit/s]                                                    
    Fetching model.ckpt.data-00000-of-00001:  11%|#8               | 53.6M/498M [00:00<00:07, 62.2Mit/s]
    Fetching model.ckpt.data-00000-of-00001:  28%|#####3             | 141M/498M [00:01<00:03, 105Mit/s]
    Fetching model.ckpt.data-00000-of-00001:  46%|########7          | 230M/498M [00:02<00:02, 108Mit/s]
    Fetching model.ckpt.data-00000-of-00001:  63%|###########4      | 316M/498M [00:03<00:02, 66.6Mit/s]
    Fetching model.ckpt.data-00000-of-00001:  77%|#############8    | 384M/498M [00:04<00:01, 58.8Mit/s]
    Fetching model.ckpt.data-00000-of-00001:  92%|################6 | 460M/498M [00:06<00:00, 44.8Mit/s]
    Fetching model.ckpt.data-00000-of-00001: 498Mit [00:06, 72.4Mit/s]                                  
    Fetching model.ckpt.index: 6.00kit [00:00, 3.39Mit/s]                                               
    Fetching model.ckpt.meta: 472kit [00:00, 9.86Mit/s]                                                 
    Fetching vocab.bpe: 457kit [00:00, 9.54Mit/s]
    2019-05-19 16:12:23.408514: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
    
      0%|          | 0/1 [00:00<?, ?it/s]
    

    I also saw another issue which suggests cutting the text file. What would be the ideal size for training? If cutting is not an option, which model size could work with a 1 GB text file?

    Help will be appreciated 👍

    opened by nvnvashisth 7
  • 0 tokens when attempting to finetune using .txt

    Using Colaboratory (I have not tried locally), I tried loading a normal text file and it found 0 tokens.

    I've found that splitting on whitespace and turning my source file into a csv was the only way to get past this.

    All of the examples reference "shakespeare.txt" but that file isn't included in the repo so I have not been able to confirm what the tool is expecting from a plaintext file.

    opened by lukegalea 7
  • Fails to load dataset on Windows due to text encoding

    On Windows 10, when attempting to run this code on my own dataset, I run into this error:

    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2243891: character maps to <undefined>

    Adding the parameter encoding='utf8' to line 33 of load_dataset.py (with open(path, 'r') as fp:) appears to fix this issue. I'm not 100% sure, because now, instead of erroring out immediately, it's using as much RAM as it can.
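
A small, self-contained reproduction of the decoding problem, as a sketch only: load_text is a hypothetical stand-in for the open() call in load_dataset.py, and cp1252 is assumed here as the Windows default encoding. It does not address the RAM usage mentioned above.

```python
import os
import tempfile


def load_text(path, encoding="utf8"):
    """Read a text dataset with an explicit encoding instead of the platform default."""
    with open(path, "r", encoding=encoding) as fp:
        return fp.read()


# "Á" is stored in UTF-8 as the bytes 0xC3 0x81, so this file contains byte 0x81.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.txt")
    with open(path, "wb") as fp:
        fp.write("Ávila".encode("utf8"))

    text = load_text(path)  # explicit UTF-8 decodes correctly

    try:
        load_text(path, encoding="cp1252")  # byte 0x81 is undefined in cp1252
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

print(text, cp1252_failed)
```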

    I can open a pull request for this if you'd like.

    bug 
    opened by bob80333 7
  • Question about loss calculation

    Hi all,

    I've been reading the code, trying to understand the implementation, but I have a question about the loss function that possibly is kinda dumb.

    Suppose a target text: target="The fox went to the forest fast". Also suppose an input sample context=[The, fox, went, to, the, forest].

    The loss is calculated as:

    loss = tf.reduce_mean(
            input_tensor=tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=context[:, 1:], logits=output['logits'][:, :-1]))
    

    As far as I know, the goal is to predict the nth token given all the previous ones (i.e. "fast", given the context). However, if we set labels as [fox, went, to, the, forest], and the logits as the [0, n-1] tokens, how can the model know the target for the last token (n)? Aren't we teaching the model to output the padded context and leaving as random the last token?
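
The shift in the quoted loss expression can be made concrete with a toy example. This sketch uses plain Python lists instead of tensors (shifted_pairs is a made-up helper): the logits at position i, computed after seeing tokens 0..i, are scored against token i+1, so every position except the last has a label inside the sample.

```python
tokens = ["The", "fox", "went", "to", "the", "forest"]


def shifted_pairs(context):
    """Pair each prefix (what the model has seen) with the next token it must predict.

    Mirrors labels=context[:, 1:] and logits=output['logits'][:, :-1].
    """
    return [(context[: i + 1], label) for i, label in enumerate(context[1:])]


for seen, target in shifted_pairs(tokens):
    print(" ".join(seen), "->", target)
```

Read this way, the last position's logits (after "forest") would be the prediction for "fast", but that label lies outside the sample, so the position is dropped by the [:, :-1] slice and contributes nothing to the loss.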

    Sorry for the inconvenience.

    opened by AIRLegend 6
  • Resolved most deprecation warnings

    I hate hiding warnings/errors but these ones were driving me crazy, so I went through and fixed up all the deprecation warnings I could find. There's still two (that I found) out there that I couldn't figure out how to fix, but this clears up most of them.

    opened by charliekmorris 6
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this project's lead researcher, Kasimir Schulz.
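
The kind of check described above might look roughly like the following. This is a sketch under assumptions, not the patch from the pull request: the function names are illustrative, and it only validates member paths (it does not inspect symlink or hardlink targets).

```python
import os
import tarfile


def is_within_directory(directory, target):
    """Return True if target resolves to a path inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar, path="."):
    """Extract every member of an open TarFile, refusing any that would escape path."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise ValueError(f"Blocked path traversal in tar member: {member.name}")
    tar.extractall(path)
```

Usage would be along the lines of opening the archive with tarfile.open(...) and calling safe_extract(tar, "checkpoint") in place of tar.extractall("checkpoint").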

    opened by TrellixVulnTeam 0
  • Simple question

    Does the gpt-2-simple Python package work like an API, or does it run on a server? Does it run locally on our PC?

    I just started working with these and I was confused...so asked this question.

    THANK YOU

    opened by Dsanthosh2006 1
  • change the decode mode from Beam search to something else

    Hi, if I want to change the decode mode from Beam search to something else when I use this model, where should I change it? Looking forward to your reply

    opened by symebaline 1
  • Converting generated data to not tokenized default version

    Hi there,

    Thank you for sharing this repo. My problem is that I am training on sequential data, where each word is a unique code in my txt file. GPT-2 seemingly has no problem understanding and generating this data; however, the data it creates are only tokens, because it tokenizes the dataset before training. Now I have tokens as output, but I need to convert my data back to use it properly. I did not have this problem with textgenrnn because it is character-based; however, I could not run it on Colab due to dependencies. How can I map the generated tokens back to real values?

    Thanks a lot.

    opened by erenarkangel 0
  • How to load pretrained model in a new notebook without retraining?

    Hello. I trained a model using GPT-2, which is great by the way. I am having trouble getting it to run in a new notebook. Do you have any instructions for how to do so, without retraining? Thank you.

    opened by Tylersuard 1
Releases(v0.8.1)
  • v0.8.1(Oct 18, 2021)

    Thanks to https://github.com/YaleDHLab via https://github.com/minimaxir/gpt-2-simple/pull/275, gpt-2-simple now supports TensorFlow 2 by default, and the minimum TensorFlow version is now 2.5.1! The Colab Notebook has also been updated to no longer use TensorFlow 1.X.

    Note: Development on gpt-2-simple has mostly been superseded by aitextgen, which has similar AI text generation capabilities with more efficient training time and resource usage. If you do not require using TensorFlow, I recommend using aitextgen instead. Checkpoints trained using gpt-2-simple can be loaded using aitextgen as well.

  • v0.7.2(Feb 14, 2021)

  • v0.7.1(Dec 28, 2019)

  • v0.7(Dec 1, 2019)

  • v0.6(Aug 28, 2019)

    • 774M is explicitly blocked from being fine-tuned and will trigger an assert if attempted. If a way to finetune it without being super-painful is added, the ability to finetune it will be restored.
    • Allow ability to generate text from the default pretrained models by passing model_name to gpt2.load_gpt2() and gpt2.generate() (this will work with 774M).
    • Add sgd as an optimizer parameter to finetune (default: adam)
    • Support for changed model names, w/ changes more prominent in the README.
  • v0.5.4(Jul 29, 2019)

    Merged a few PRs:

    • Fixed generate cmd run name: #78
    • Resolved most deprecation warnings: #83
    • Optional model parameters: #90

    This does not make the package fully TF 2.0 compatible, but it's a big step!

  • v0.5.3(Jun 19, 2019)

  • v0.5.2(Jun 18, 2019)

  • v0.5.1(Jun 16, 2019)

  • v0.5(May 20, 2019)

    Adapted a few functions from Neil Shepperd's fork:

    • Nucleus Sampling (top_p) when generating text, which produces surprisingly different output (setting top_p=0.9 works well). Supersedes top_k when used. (#51)
    • An encode_dataset() function to preencode and compress a large dataset before loading it for finetuning. (#19, #54)

    Improvements to continuing model training:

    • overwrite argument for finetune: with restore_from="latest", this continues model training without creating a duplicate copy of the model, and is therefore good for transfer learning using multiple datasets (#20)
    • You can continue to finetune a model without having the original GPT-2 model present.

    Improvements with I/O involving Colaboratory

    • Checkpoint folders are now packaged into a .tar file when copying to Google Drive, and when copying from Google Drive, the '.tar' file is automatically unpackaged into the correct checkpoint format. (you can pass copy_folder=True to the copy_checkpoint function to revert to the old behavior). (#37: thanks @woctezuma !)
    • copy_checkpoint_to_gdrive and copy_checkpoint_from_gdrive now take a run_name argument instead of a checkpoint_folder argument.

    Miscellaneous

    • Added CLI arguments for top_k, top_p, overwrite.
    • Cleaned up redundant function parameters (#39)
  • v0.4.2(May 5, 2019)

    • load_gpt2() in a fresh session is much faster and uses much less memory when loaded. (for the 117M model, the system will stay under 2 GB RAM, which is the critical point for cloud services)
    • start_tf_sess() now accepts a threads parameter, which is useful if you know exactly how many threads will be used.
  • v0.4.1(May 5, 2019)

  • v0.4(May 5, 2019)

  • v0.3.1(Apr 23, 2019)

    • Fix off-by-one error where checkpoint saved a step early.
    • Fix issue where restore_from='fresh' uses the counter from a previously-trained checkpoint.
    • If restore_from='latest', steps will now train for the specified number of steps, instead of training until the specified number of steps. (#13, #14)
  • v0.3(Apr 21, 2019)

  • v0.2(Apr 20, 2019)

Owner
Max Woolf
Data Scientist @buzzfeed. Plotter of pretty charts.