Image captioning

End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

The model is a seq2seq model. The encoder uses a pretrained EfficientNet-b3 to extract image features, and the decoder is an LSTM with Bahdanau attention.
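To make the attention step concrete, here is a minimal pure-Python sketch of additive (Bahdanau-style) attention over a few encoder feature vectors. The weight matrices `W_enc`, `W_dec`, the scoring vector `v`, and the toy sizes are illustrative assumptions, not values or names taken from this repository, where the same computation would presumably be done with learned layers.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(features, hidden, W_enc, W_dec, v):
    """Additive attention: score_i = v . tanh(W_enc f_i + W_dec h).

    Returns (weights, context), where context is the attention-weighted
    sum of the encoder feature vectors.
    """
    projected_hidden = matvec(W_dec, hidden)
    scores = []
    for feature in features:
        projected_feature = matvec(W_enc, feature)
        energy = [math.tanh(a + b)
                  for a, b in zip(projected_feature, projected_hidden)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return weights, context


# Toy example: 3 image regions with 2-dim features and a 2-dim decoder state.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, -0.5]
W_enc = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights
W_dec = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical weights
v = [1.0, 1.0]                    # hypothetical scoring vector
weights, context = additive_attention(features, hidden, W_enc, W_dec, v)
```

At each decoding step the LSTM would receive `context` alongside the embedding of the previous word, so the weights indicate which image regions the decoder attends to for the next token.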

Dataset

The dataset is available on Kaggle and contains 8,000 images, each paired with five different captions.
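As a rough illustration of the image-to-captions pairing, the sketch below groups captions by image. It assumes the caption file is a CSV with an `image,caption` header row; that layout, the function name `group_captions`, and the sample file names are assumptions and are not confirmed by this README.

```python
import csv
import io
from collections import defaultdict


def group_captions(lines):
    """Map each image file name to the list of its captions."""
    captions = defaultdict(list)
    for row in csv.DictReader(lines):
        captions[row["image"]].append(row["caption"].strip())
    return dict(captions)


# Hypothetical sample in the assumed layout; the real dataset has
# five captions per image.
sample = io.StringIO(
    "image,caption\n"
    "dog.jpg,A dog runs across the grass .\n"
    "dog.jpg,\"A brown dog, running outside .\"\n"
    "cat.jpg,A cat sleeps on a sofa .\n"
)
grouped = group_captions(sample)
```

Using the `csv` module rather than splitting on commas keeps captions that themselves contain commas intact.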

Usage

Run in a terminal: python -m img_caption

Config

The user interface consists of a single file:

config.yaml - general configuration with data and model parameters

Default config.yaml:

data:
  path_to_data_folder: "data"
  caption_file_name: "captions.txt"
  images_folder_name: "Images"
  output_folder_name: "output"
  logging_file_name: "logging.txt"
  model_file_name: "model.pt"

batch_size: 32
num_worker: 1
gensim_model_name: "glove-wiki-gigaword-200"

model:
  embedding_dimension: 200
  decoder_hidden_dimension: 300
  learning_rate: 0.0001
  momentum: 0.9
  n_epochs: 50
  clip: 5
  fine_tune_encoder: false
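Some of these values depend on each other. For example, the embedding dimension presumably has to match the vector size of the chosen GloVe model. The sketch below runs a few sanity checks on a parsed config. The dict mirrors the defaults above (it is what `yaml.safe_load` from PyYAML would return for that file), while `check_config` is a hypothetical helper; whether the project performs such checks itself is not stated here.

```python
def check_config(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    model = config.get("model", {})

    # gensim-data GloVe model names end in the vector size, e.g. "...-200".
    name = config.get("gensim_model_name", "")
    suffix = name.rsplit("-", 1)[-1]
    if suffix.isdigit() and int(suffix) != model.get("embedding_dimension"):
        problems.append(
            f"embedding_dimension {model.get('embedding_dimension')} "
            f"does not match {name!r}"
        )

    # Training hyperparameters that only make sense when positive.
    for key in ("learning_rate", "n_epochs", "clip"):
        value = model.get(key)
        if value is None or value <= 0:
            problems.append(f"model.{key} must be a positive number")

    if config.get("batch_size", 0) <= 0:
        problems.append("batch_size must be a positive integer")
    return problems


# Parsed form of the default config shown above.
default_config = {
    "data": {
        "path_to_data_folder": "data",
        "caption_file_name": "captions.txt",
        "images_folder_name": "Images",
        "output_folder_name": "output",
        "logging_file_name": "logging.txt",
        "model_file_name": "model.pt",
    },
    "batch_size": 32,
    "num_worker": 1,
    "gensim_model_name": "glove-wiki-gigaword-200",
    "model": {
        "embedding_dimension": 200,
        "decoder_hidden_dimension": 300,
        "learning_rate": 0.0001,
        "momentum": 0.9,
        "n_epochs": 50,
        "clip": 5,
        "fine_tune_encoder": False,
    },
}
problems = check_config(default_config)
```

With the defaults, `problems` is empty. Switching `gensim_model_name` to a model with a different vector size without also updating `embedding_dimension` would be reported.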

Output

After training the model, the pipeline produces the following output:

model.pt - checkpoint with:
- epoch - last epoch
- model_state_dict - model parameters
- optimizer_state_dict - the state of the optimizer
- train_history - training history of the model
- valid_history - validation history of the model
- best_valid_loss - the best validation loss
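The checkpoint fields listed above can be inspected once the file is loaded. The sketch below summarizes such a checkpoint, assuming it is a plain dict and that `train_history` and `valid_history` are lists with one loss value per epoch; the history format, the helper `summarize_checkpoint`, and the stand-in values are assumptions for illustration only.

```python
def summarize_checkpoint(checkpoint):
    """Pick out the headline numbers from a checkpoint dict."""
    valid = checkpoint["valid_history"]
    # Index of the lowest validation loss (assumes one entry per epoch).
    best_index = min(range(len(valid)), key=valid.__getitem__)
    return {
        "last_epoch": checkpoint["epoch"],
        "best_valid_index": best_index,
        "best_valid_loss": checkpoint["best_valid_loss"],
        "final_train_loss": checkpoint["train_history"][-1],
    }


# Hypothetical stand-in for a loaded checkpoint; the values are made up.
checkpoint = {
    "epoch": 2,
    "model_state_dict": {},
    "optimizer_state_dict": {},
    "train_history": [3.0, 2.0, 1.5],
    "valid_history": [2.0, 1.5, 1.7],
    "best_valid_loss": 1.5,
}
summary = summarize_checkpoint(checkpoint)
```

If the file is a standard PyTorch checkpoint, the dict would typically be obtained with `torch.load(path, map_location="cpu")`, and `model_state_dict` would be passed to the model's `load_state_dict` before inference or further training.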
