Seq2seq - Sequence to Sequence Learning with Keras

Related tags

Deep Learningseq2seq
Overview

Seq2seq

Sequence to Sequence Learning with Keras

Hi! You have just found Seq2Seq. Seq2Seq is a sequence to sequence learning add-on for the python deep learning library Keras. Using Seq2Seq, you can build and train sequence-to-sequence neural network models in Keras. Such models are useful for machine translation, chatbots (see [4]), parsers, or whatever that comes to your mind.

seq2seq

Getting started

Seq2Seq contains modular and reusable layers that you can use to build your own seq2seq models as well as built-in models that work out of the box. Seq2Seq models can be compiled as they are or added as layers to a bigger model. Every Seq2Seq model has 2 primary layers : the encoder and the decoder. Generally, the encoder encodes the input sequence to an internal representation called 'context vector' which is used by the decoder to generate the output sequence. The lengths of input and output sequences can be different, as there is no explicit one on one relation between the input and output sequences. In addition to the encoder and decoder layers, a Seq2Seq model may also contain layers such as the left-stack (Stacked LSTMs on the encoder side), the right-stack (Stacked LSTMs on the decoder side), resizers (for shape compatibility between the encoder and the decoder) and dropout layers to avoid overfitting. The source code is heavily documented, so lets go straight to the examples:

A simple Seq2Seq model:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8)
model.compile(loss='mse', optimizer='rmsprop')

That's it! You have successfully compiled a minimal Seq2Seq model! Next, let's build a 6 layer deep Seq2Seq model (3 layers for encoding, 3 layers for decoding).

Deep Seq2Seq models:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=8, depth=3)
model.compile(loss='mse', optimizer='rmsprop')

Notice that we have specified the depth for both encoder and decoder as 3, and your model has a total depth of 3 + 3 = 6. You can also specify different depths for the encoder and the decoder. Example:

import seq2seq
from seq2seq.models import SimpleSeq2Seq

model = SimpleSeq2Seq(input_dim=5, hidden_dim=10, output_length=8, output_dim=20, depth=(4, 5))
model.compile(loss='mse', optimizer='rmsprop')

Notice that the depth is specified as tuple, (4, 5). Which means your encoder will be 4 layers deep whereas your decoder will be 5 layers deep. And your model will have a total depth of 4 + 5 = 9.

Advanced Seq2Seq models:

Until now, you have been using the SimpleSeq2Seq model, which is a very minimalistic model. In the actual Seq2Seq implementation described in [1], the hidden state of the encoder is transferred to decoder. Also, the output of decoder at each timestep becomes the input to the decoder at the next time step. To make things more complicated, the hidden state is propogated throughout the LSTM stack. But you have no reason to worry, as we have a built-in model that does all that out of the box. Example:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

Note that we had to specify the complete input shape, including the samples dimensions. This is because we need a static hidden state(similar to a stateful RNN) for transferring it across layers. (Update : Full input shape is not required in the latest version, since we switched to Recurrent Shop backend). By the way, Seq2Seq models also support the stateful argument, in case you need it.

You can also experiment with the hidden state propogation turned off. Simply set the arguments broadcast_state and inner_broadcast_state to False.

Peeky Seq2seq model:

Let's not stop there. Let's build a model similar to cho et al 2014, where the decoder gets a 'peek' at the context vector at every timestep.

cho et al 2014

To achieve this, simply add the argument peek=True:

import seq2seq
from seq2seq.models import Seq2Seq

model = Seq2Seq(batch_input_shape=(16, 7, 5), hidden_dim=10, output_length=8, output_dim=20, depth=4, peek=True)
model.compile(loss='mse', optimizer='rmsprop')

Seq2seq model with attention:

Attention Seq2seq

Let's not stop there either. In all the models described above, there is no allignment between the input sequence elements and the output sequence elements. But for machine translation, learning a soft allignment between the input and output sequences imporves performance.[3]. The Seq2seq framework includes a ready made attention model which does the same. Note that in the attention model, there is no hidden state propogation, and a bidirectional LSTM encoder is used by default. Example:

import seq2seq
from seq2seq.models import AttentionSeq2Seq

model = AttentionSeq2Seq(input_dim=5, input_length=7, hidden_dim=10, output_length=8, output_dim=20, depth=4)
model.compile(loss='mse', optimizer='rmsprop')

As you can see, in the attention model you need not specify the samples dimension as there are no static hidden states involved(But you have to if you are building a stateful Seq2seq model). Note: You can set the argument bidirectional=False if you wish not to use a bidirectional encoder.

Final Words

That's all for now. Hope you love this library. For any questions you might have, create an issue and I will get in touch. You can also contribute to this project by reporting bugs, adding new examples, datasets or models.

Installation:

sudo pip install git+https://github.com/farizrahman4u/seq2seq.git

Requirements:

Working Example:

Papers:

Owner
Fariz Rahman
Fariz Rahman
Good Semi-Supervised Learning That Requires a Bad GAN

Good Semi-Supervised Learning that Requires a Bad GAN This is the code we used in our paper Good Semi-supervised Learning that Requires a Bad GAN Ziha

Zhilin Yang 177 Dec 12, 2022
Churn prediction

Churn-prediction Churn-prediction Data preprocessing:: Label encoder is used to normalize the categorical variable Data Transformation:: For each data

1 Sep 28, 2022
The code for the NeurIPS 2021 paper "A Unified View of cGANs with and without Classifiers".

Energy-based Conditional Generative Adversarial Network (ECGAN) This is the code for the NeurIPS 2021 paper "A Unified View of cGANs with and without

sianchen 22 May 28, 2022
STRIVE: Scene Text Replacement In Videos

STRIVE: Scene Text Replacement In Videos Dataset Types: RoboText SynthText RealWorld videos RoboText : Videos of texts collected using navigation robo

15 Jul 11, 2022
sktime companion package for deep learning based on TensorFlow

NOTE: sktime-dl is currently being updated to work correctly with sktime 0.6, and wwill be fully relaunched over the summer. The plan is Refactor and

sktime 573 Jan 05, 2023
Autolfads-tf2 - A TensorFlow 2.0 implementation of Latent Factor Analysis via Dynamical Systems (LFADS) and AutoLFADS

autolfads-tf2 A TensorFlow 2.0 implementation of LFADS and AutoLFADS. Installati

Systems Neural Engineering Lab 11 Oct 29, 2022
Pytorch implementation of "ARM: Any-Time Super-Resolution Method"

ARM-Net Dependencies Python 3.6 Pytorch 1.7 Results Train Data preprocessing cd data_scripts python extract_subimages_test.py python data_augmentation

Bohong Chen 55 Nov 24, 2022
Using BERT+Bi-LSTM+CRF

Chinese Medical Entity Recognition Based on BERT+Bi-LSTM+CRF Step 1 I share the dataset on my google drive, please download the whole 'CCKS_2019_Task1

Xiang WU 55 Dec 21, 2022
Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch

Reminder ST-GCN has transferred to MMSkeleton, and keep on developing as an flexible open source toolbox for skeleton-based human understanding. You a

sijie yan 1.1k Dec 25, 2022
Home repository for the Regularized Greedy Forest (RGF) library. It includes original implementation from the paper and multithreaded one written in C++, along with various language-specific wrappers.

Regularized Greedy Forest Regularized Greedy Forest (RGF) is a tree ensemble machine learning method described in this paper. RGF can deliver better r

RGF-team 364 Dec 28, 2022
Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

PICASO Official PyTorch implemetation for the paper PICASO:Permutation-Invariant Cascaded Attentive Set Operator. Requirements Python 3 torch = 1.0 n

Samira Zare 0 Dec 23, 2021
Most popular metrics used to evaluate object detection algorithms.

Most popular metrics used to evaluate object detection algorithms.

Rafael Padilla 4.4k Dec 25, 2022
Official code for "Decoupling Zero-Shot Semantic Segmentation"

Decoupling Zero-Shot Semantic Segmentation This is the official code for the arxiv. ZegFormer is the first framework that decouple the zero-shot seman

Jian Ding 108 Dec 30, 2022
[ICLR 2021 Spotlight Oral] "Undistillable: Making A Nasty Teacher That CANNOT teach students", Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Undistillable: Making A Nasty Teacher That CANNOT teach students "Undistillable: Making A Nasty Teacher That CANNOT teach students" Haoyu Ma, Tianlong

VITA 71 Dec 28, 2022
Python library for loading and using triangular meshes.

Trimesh is a pure Python (2.7-3.4+) library for loading and using triangular meshes with an emphasis on watertight surfaces. The goal of the library i

Michael Dawson-Haggerty 2.2k Jan 07, 2023
Deep Q-Learning Network in pytorch (not actively maintained)

pytoch-dqn This project is pytorch implementation of Human-level control through deep reinforcement learning and I also plan to implement the followin

Hung-Tu Chen 342 Jan 01, 2023
Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication"

NFFT4ANOVA Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication" This package uses th

Theresa Wagner 1 Aug 10, 2022
Fashion Entity Classification

Fashion-Entity-Classification - Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grays

ADITYA SHAH 1 Jan 04, 2022
Segmentation vgg16 fcn - cityscapes

VGGSegmentation Segmentation vgg16 fcn - cityscapes Priprema skupa skripta prepare_dataset_downsampled.py Iz slika cityscapesa izrezuje haubu automobi

6 Oct 24, 2020
🗺 General purpose U-Network implemented in Keras for image segmentation

TF-Unet General purpose U-Network implemented in Keras for image segmentation Getting started • Training • Evaluation Getting started Looking for Jupy

Or Fleisher 2 Aug 31, 2022