Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Last update: Apr 06, 2022

Overview

Deep Learning - Image Captioning with Convolutional and Recurrent Neural Nets

========================================================================

Author: Jonathan Kuo
Python: 3.6.1
TensorFlow: 1.0.1
Keras: 2.0.4

Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Introduction

The Keras deep learning architecture of this project was inspired by "Deep Visual-Semantic Alignments for Generating Image Descriptions" by Andrej Karpathy and Fei-Fei Li.

Given a dataset of images and their sentence descriptions, this project defines a Keras (TensorFlow backend) deep learning model that aligns detected regions of an image with segments of its description. This learned alignment allows the model to generate novel descriptions for test images.
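To train a model that predicts the next word of a description, each caption is usually expanded into (partial caption, next word) training pairs. The minimal sketch below shows one way to do that; the `<start>`/`<end>` tokens, the lowercase whitespace tokenizer, left-padding with id 0, and the helper names `tokenize`, `build_vocab`, and `next_word_pairs` are all illustrative assumptions, not code taken from this project's notebook.

```python
# Minimal sketch (assumed preprocessing, not the project's exact code):
# turn captions into (left-padded prefix ids, next word id) training pairs.

START, END = "<start>", "<end>"  # hypothetical sentence-boundary tokens


def tokenize(caption):
    """Lowercase, drop punctuation, split on whitespace, add boundary tokens."""
    cleaned = "".join(ch for ch in caption.lower() if ch.isalnum() or ch.isspace())
    return [START] + cleaned.split() + [END]


def build_vocab(captions):
    """Assign each new token the next integer id; id 0 is reserved for padding."""
    vocab = {}
    for caption in captions:
        for token in tokenize(caption):
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def next_word_pairs(caption, vocab, max_len):
    """Return (prefix ids left-padded to max_len, id of the following word) pairs."""
    ids = [vocab[token] for token in tokenize(caption)]
    pairs = []
    for i in range(1, len(ids)):
        prefix = ids[:i][-max_len:]  # keep at most the last max_len ids
        padded = [0] * (max_len - len(prefix)) + prefix
        pairs.append((padded, ids[i]))
    return pairs


if __name__ == "__main__":
    captions = ["A dog runs on the grass.", "A cat sleeps."]
    vocab = build_vocab(captions)
    for prefix, target in next_word_pairs(captions[1], vocab, max_len=5):
        print(prefix, "->", target)
```

Each caption of n tokens (including the boundary tokens) yields n - 1 pairs, so a single image with several reference captions contributes many training examples.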

Dataset

Microsoft Common Objects in Context (MSCOCO) is an image recognition, segmentation, and captioning dataset. The training data includes 123,000 images with their captions; the validation and test sets each contain 5,000 images with their captions.
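MSCOCO caption annotations are distributed as JSON files (for example `captions_train2014.json`) containing an `images` list and an `annotations` list that reference each other by image id. The sketch below groups every caption under its image file name; the helper names are hypothetical, and the project's notebook may load or split the data differently.

```python
import json
from collections import defaultdict


def group_captions(data):
    """Map each image file name to the list of its captions (COCO caption layout)."""
    id_to_file = {img["id"]: img["file_name"] for img in data["images"]}
    captions = defaultdict(list)
    for ann in data["annotations"]:
        captions[id_to_file[ann["image_id"]]].append(ann["caption"].strip())
    return dict(captions)


def load_coco_captions(path):
    """Read a COCO-style captions JSON file from disk and group it by image."""
    with open(path, encoding="utf-8") as f:
        return group_captions(json.load(f))
```

Grouping by image keeps all reference captions of one image together, which is convenient both for building training pairs and for comparing generated captions against every reference at evaluation time.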

Architecture

A VGG16 CNN (loaded in Keras) with weights pre-trained on ImageNet is used to detect objects in the image. Its final dense softmax layer (the 1000-class ImageNet classifier) is removed so that the 4096-D activations of the preceding fully connected layer can be passed into the RNN (LSTM). The CNN weights are frozen, while the RNN weights are updated with backpropagation through time (BPTT). The CNN and LSTM outputs are merged before being passed into a second LSTM that predicts the next word in the sequence. RMSprop is used as the optimizer to combat the vanishing gradient problem.
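The architecture described above can be sketched with the Keras functional API. This is an illustrative reconstruction, not the project's code: it assumes `tf.keras` from TensorFlow 2.x rather than the Keras 2.0.4 listed above, and the layer sizes, the Dense projection plus `RepeatVector` on the image branch, concatenation as the merge, and integer (sparse) targets are all assumptions.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import (LSTM, Concatenate, Dense, Embedding, Input,
                                     RepeatVector, TimeDistributed)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop


def build_feature_extractor(weights="imagenet"):
    """VGG16 without its final softmax: outputs the 4096-D 'fc2' activations."""
    base = VGG16(weights=weights, include_top=True)
    extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)
    extractor.trainable = False  # CNN weights stay frozen
    return extractor


def build_caption_model(vocab_size, max_len, embed_dim=256, units=256):
    """Merge image features with a language LSTM, then predict the next word."""
    # Image branch: project the 4096-D feature and repeat it for every time step.
    image_in = Input(shape=(4096,), name="image_features")
    image_seq = RepeatVector(max_len)(Dense(embed_dim, activation="relu")(image_in))

    # Language branch: embed the padded partial caption and run a first LSTM.
    caption_in = Input(shape=(max_len,), name="partial_caption")
    words = Embedding(vocab_size, embed_dim)(caption_in)
    words = LSTM(units, return_sequences=True)(words)
    words = TimeDistributed(Dense(embed_dim))(words)

    # Merge both branches; a second LSTM summarizes them for next-word prediction.
    merged = Concatenate(axis=-1)([image_seq, words])
    hidden = LSTM(units)(merged)
    next_word = Dense(vocab_size, activation="softmax")(hidden)

    model = Model(inputs=[image_in, caption_in], outputs=next_word)
    # Sparse loss: targets are integer word ids rather than one-hot vectors.
    model.compile(optimizer=RMSprop(), loss="sparse_categorical_crossentropy")
    return model
```

Only the caption model is trained; image features would be computed once with the frozen extractor and cached. `vocab_size` should count the padding id 0, and calling `build_feature_extractor()` with the default weights downloads the ImageNet weights on first use.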

Demo

View the demo IPython notebook for model training and prediction on the MSCOCO dataset.
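At prediction time, a next-word model produces a full caption by feeding its own output back in. A simple strategy is greedy decoding, sketched below; the notebook may use a different strategy (such as beam search). Here `predict_next` is a hypothetical callable that returns a probability per word id; with a Keras model it would wrap `model.predict` on the image features and the padded prefix.

```python
def greedy_decode(predict_next, vocab, max_len, start="<start>", end="<end>"):
    """Build a caption word by word, always taking the most probable next word."""
    id_to_word = {i: w for w, i in vocab.items()}
    ids = [vocab[start]]
    while len(ids) < max_len:
        probs = predict_next(ids)
        # Id 0 is padding, so it is never emitted.
        best = max(range(1, len(probs)), key=probs.__getitem__)
        if id_to_word.get(best) == end:
            break
        ids.append(best)
    return " ".join(id_to_word[i] for i in ids[1:])


if __name__ == "__main__":
    vocab = {"<start>": 1, "a": 2, "dog": 3, "runs": 4, "<end>": 5}
    script = {1: 2, 2: 3, 3: 4, 4: 5}  # toy stand-in: last word id -> next word id

    def toy_predict(prefix_ids):
        probs = [0.0] * 6
        probs[script[prefix_ids[-1]]] = 1.0
        return probs

    print(greedy_decode(toy_predict, vocab, max_len=10))
```

Greedy decoding is fast but can commit early to a poor word; beam search keeps several candidate prefixes at each step at a higher computational cost.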
