📝 Wrapper library for text generation / language models at char and word level with RNN in TensorFlow

Last update: May 22, 2021

Overview

tensorlm

Generate Shakespeare poems with 4 lines of code.

Installation

tensorlm requires Python 3.4+ and TensorFlow 1.1+. The examples below use the TensorFlow 1.x session API (tf.Session).

pip3 install tensorlm

Basic Usage

Use the CharLM or WordLM class:

import tensorflow as tf
from tensorlm import CharLM
    
with tf.Session() as session:
    
    # Create a new model. You can also use WordLM
    model = CharLM(session, "datasets/sherlock/tinytrain.txt", max_vocab_size=96,
                   neurons_per_layer=100, num_layers=3, num_timesteps=15)
    
    # Train it 
    model.train(session, max_epochs=10, max_steps=500)
    
    # Let it generate some text, starting from the prefix "The "
    generated = model.sample(session, "The ", num_steps=100)
    print("The " + generated)

This should output something like:

The  ee e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e

Command Line Usage

Train: python3 -m tensorlm.cli --train=True --level=char --train_text_path=datasets/sherlock/tinytrain.txt --max_vocab_size=96 --neurons_per_layer=100 --num_layers=2 --batch_size=10 --num_timesteps=15 --save_dir=out/model --max_epochs=300 --save_interval_hours=0.5

Sample: python3 -m tensorlm.cli --sample=True --level=char --neurons_per_layer=400 --num_layers=3 --num_timesteps=160 --save_dir=out/model

Evaluate: python3 -m tensorlm.cli --evaluate=True --level=char --evaluate_text_path=datasets/sherlock/tinyvalid.txt --neurons_per_layer=400 --num_layers=3 --batch_size=10 --num_timesteps=160 --save_dir=out/model

See python3 -m tensorlm.cli --help for all options.

Advanced Usage

Custom Input Data

The inputs and targets don't have to be text. GeneratingLSTM only expects token ids, so you can use any data type for the sequences, as long as you can encode the data to integer ids.

import numpy as np
import tensorflow as tf
from tensorlm import GeneratingLSTM

# We use integer ids from 0 to 19, so the vocab size is 20. The range of ids must always start
# at zero.
batch_inputs = np.array([[1, 2, 3, 4], [15, 16, 17, 18]])  # 1 batch of 2 sequences, 4 time steps each
batch_targets = np.array([[2, 3, 4, 5], [16, 17, 18, 19]])  # the inputs shifted by one step

with tf.Session() as session:
    # Create the model in a TensorFlow graph
    model = GeneratingLSTM(vocab_size=20, neurons_per_layer=10, num_layers=2, max_batch_size=2)

    # Initialize all defined TF Variables
    session.run(tf.global_variables_initializer())

    # Train repeatedly on the same small batch
    for _ in range(5000):
        model.train_step(session, batch_inputs, batch_targets)

    # Continue the sequence that starts with id 15 for three more steps
    sampled = model.sample_ids(session, [15], num_steps=3)
    print("Sampled: " + str(sampled))

This should output something like:

Sampled: [16, 18, 19]
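
Any sequence of hashable symbols can be mapped to such ids. The following is a minimal sketch that is not part of tensorlm: the helper names build_id_mapping, encode and decode, and the melody data, are hypothetical and only illustrate assigning contiguous ids that start at zero and mapping ids back to symbols.

```python
def build_id_mapping(sequences):
    """Assign contiguous integer ids, starting at zero, in order of first appearance."""
    symbol_to_id = {}
    for sequence in sequences:
        for symbol in sequence:
            if symbol not in symbol_to_id:
                symbol_to_id[symbol] = len(symbol_to_id)
    return symbol_to_id


def encode(sequences, symbol_to_id):
    """Replace every symbol by its integer id."""
    return [[symbol_to_id[symbol] for symbol in sequence] for sequence in sequences]


def decode(ids, symbol_to_id):
    """Map a list of integer ids back to the original symbols."""
    id_to_symbol = {index: symbol for symbol, index in symbol_to_id.items()}
    return [id_to_symbol[int(index)] for index in ids]


# Hypothetical non-text data: two melodies as (pitch, duration) events
melodies = [
    [("C4", 1), ("E4", 1), ("G4", 2), ("E4", 1), ("C4", 2)],
    [("D4", 1), ("F4", 1), ("A4", 2), ("F4", 1), ("D4", 2)],
]

symbol_to_id = build_id_mapping(melodies)
ids = encode(melodies, symbol_to_id)

# Inputs drop the last event, targets drop the first one (shift by one step)
batch_inputs = [sequence[:-1] for sequence in ids]
batch_targets = [sequence[1:] for sequence in ids]
vocab_size = len(symbol_to_id)
```

The nested lists can be wrapped in np.array(...) to match the batches used in the example above, and decode can turn sampled ids back into events.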

Custom Training, Dropout etc.

Use the GeneratingLSTM class directly. This class is agnostic to the dataset type. It expects integer ids and returns integer ids.

import tensorflow as tf
from tensorlm import Vocabulary, Dataset, GeneratingLSTM

BATCH_SIZE = 20
NUM_TIMESTEPS = 15

with tf.Session() as session:
    # Generate a token -> id vocabulary based on the text
    vocab = Vocabulary.create_from_text("datasets/sherlock/tinytrain.txt", max_vocab_size=96,
                                        level="char")

    # Obtain input and target batches from the text file
    dataset = Dataset("datasets/sherlock/tinytrain.txt", vocab, BATCH_SIZE, NUM_TIMESTEPS)

    # Create the model in a TensorFlow graph
    model = GeneratingLSTM(vocab_size=vocab.get_size(), neurons_per_layer=100, num_layers=2,
                           max_batch_size=BATCH_SIZE, output_keep_prob=0.5)

    # Initialize all defined TF Variables
    session.run(tf.global_variables_initializer())

    # Do the training
    step = 1
    for epoch in range(20):
        for inputs, targets in dataset:
            loss = model.train_step(session, inputs, targets)

            if step % 100 == 0:
                # Evaluate from time to time
                dev_dataset = Dataset("datasets/sherlock/tinyvalid.txt", vocab,
                                      batch_size=BATCH_SIZE, num_timesteps=NUM_TIMESTEPS)
                dev_loss = model.evaluate(session, dev_dataset)
                print("Epoch: %d, Step: %d, Train Loss: %f, Dev Loss: %f" % (
                    epoch, step, loss, dev_loss))

                # Sample from the model from time to time
                print("Sampled: \"The " + model.sample_text(session, vocab, "The ") + "\"")

            step += 1

This should output something like:

Epoch: 3, Step: 100, Train Loss: 3.824941, Dev Loss: 3.778008
Sampled: "The                                                                                                     "
Epoch: 7, Step: 200, Train Loss: 2.832825, Dev Loss: 2.896187
Sampled: "The                                                                                                     "
Epoch: 11, Step: 300, Train Loss: 2.778579, Dev Loss: 2.830176
Sampled: "The         eee                                                                                         "
Epoch: 15, Step: 400, Train Loss: 2.655153, Dev Loss: 2.684828
Sampled: "The        ee    e  e   e  e  e  e  e  e  e   e  e  e   e  e  e   e  e  e   e  e  e   e  e  e   e  e  e "
Epoch: 19, Step: 500, Train Loss: 2.444502, Dev Loss: 2.479753
Sampled: "The    an  an  an  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  on  o"
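
The Dataset class used above yields (inputs, targets) batches. Its internals are not shown here; the sketch below is independent of tensorlm and only illustrates the general idea, assuming that the targets are the inputs shifted by one time step, as in the Custom Input Data example. The function name iterate_batches is hypothetical.

```python
def iterate_batches(token_ids, batch_size, num_timesteps):
    """Yield (inputs, targets) batches; targets are the inputs shifted by one step."""
    window = num_timesteps + 1  # one extra id so that every input has a target
    examples = [
        token_ids[start:start + window]
        for start in range(0, len(token_ids) - window + 1, num_timesteps)
    ]
    # Only full batches are yielded; leftover examples are dropped
    for first in range(0, len(examples) - batch_size + 1, batch_size):
        batch = examples[first:first + batch_size]
        inputs = [example[:-1] for example in batch]
        targets = [example[1:] for example in batch]
        yield inputs, targets


token_ids = list(range(20))
batches = list(iterate_batches(token_ids, batch_size=2, num_timesteps=4))
for inputs, targets in batches:
    print(inputs, targets)
```

Dropping incomplete windows and batches keeps every batch the same shape, which is the simplest choice for a model built with a fixed maximum batch size.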

Releases (v0.4.2)

v0.4.2 (Apr 9, 2018)

v0.4.1 (Apr 9, 2018)

v0.4 (Apr 9, 2018)

Adds the temperature parameter to control the degree of randomness during sampling.
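
How tensorlm applies the temperature internally is not shown here. A common approach, sketched below as an assumption, divides the logits by the temperature before the softmax: values below 1 sharpen the distribution, values above 1 flatten it. The function names softmax_with_temperature and sample_id are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    max_scaled = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - max_scaled) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_id(logits, temperature=1.0, rng=random):
    """Draw one token id from the temperature-scaled distribution."""
    probabilities = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probabilities, k=1)[0]


logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, temperature=0.5)     # more peaked
neutral = softmax_with_temperature(logits, temperature=1.0)  # plain softmax
hot = softmax_with_temperature(logits, temperature=2.0)      # closer to uniform
sampled = sample_id(logits, temperature=0.5, rng=random.Random(0))
```

A low temperature makes the most likely token dominate, which tends to produce repetitive output; a high temperature spreads probability over more tokens and gives more varied, less predictable samples.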

Owner

Kilian Batzner
