Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, among other methods.

Overview

An Image Captioning codebase

This is a codebase for image captioning research.

It supports self-critical (SCST) training, bottom-up features, and a number of captioning models; see the sections below and MODEL_ZOO.md for details.

A simple demo Colab notebook is available here

Requirements

  • Python 3
  • PyTorch 1.3+ (along with torchvision)
  • cider (already been added as a submodule)
  • coco-caption (already been added as a submodule) (Remember to follow initialization steps in coco-caption/README.md)
  • yacs
  • lmdbdict

Install

If you have difficulty running the training scripts in tools, you can try installing this repo as a Python package:

python -m pip install -e .

Pretrained models

Check out MODEL_ZOO.md.

If you only want to do evaluation, you can follow this section after downloading the pretrained models (and also the pretrained resnet101 or precomputed bottom-up features; see data/README.md).

Train your own network on COCO/Flickr30k

Prepare data.

We now support both flickr30k and COCO. See details in data/README.md. (Note: the later sections assume the COCO dataset; it should be trivial to switch to flickr30k.)

Start training

$ python tools/train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30

or

$ python tools/train.py --cfg configs/fc.yml --id fc

The train script will dump checkpoints into the folder specified by --checkpoint_path (default = log_$id/). By default, only the best-performing checkpoint on validation and the latest checkpoint are saved, to conserve disk space. You can also set --save_history_ckpt to 1 to save every checkpoint.

To resume training, specify the --start_from option as the path containing infos.pkl and model.pth (usually you can just set --start_from and --checkpoint_path to the same directory).
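
For example, a resume command might look like this (reusing the config-file form shown above, with --start_from added):

$ python tools/train.py --cfg configs/fc.yml --id fc --start_from log_fc --checkpoint_path log_fc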

To inspect the training or validation curves, use TensorBoard. The loss histories are automatically dumped into --checkpoint_path.

The command above uses scheduled sampling; set --scheduled_sampling_start to -1 to turn scheduled sampling off.

If you'd like to evaluate BLEU/METEOR/CIDEr scores during training in addition to the validation cross-entropy loss, use the --language_eval 1 option, but don't forget to pull the coco-caption submodule.
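
If the submodules were not cloned along with the repo, the standard git command below fetches cider and coco-caption (then follow the initialization steps in coco-caption/README.md):

$ git submodule update --init --recursive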

All arguments can also be specified in a yaml file and loaded with --cfg. Command-line arguments override the cfg file when there are conflicts.
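
As a rough sketch (assuming the yaml keys mirror the argument names in opts.py), a config equivalent to the long fc command above might contain:

    caption_model: newfc
    input_json: data/cocotalk.json
    input_fc_dir: data/cocotalk_fc
    input_att_dir: data/cocotalk_att
    input_label_h5: data/cocotalk_label.h5
    batch_size: 10
    learning_rate: 5e-4
    learning_rate_decay_start: 0
    scheduled_sampling_start: 0
    checkpoint_path: log_fc
    save_checkpoint_every: 6000
    val_images_use: 5000
    max_epochs: 30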

For more options, see opts.py.

Train using self critical

First, preprocess the dataset to build the cache used for computing CIDEr scores:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train

Then copy the model pretrained with cross-entropy. (Copying is not mandatory; it just keeps a backup.)

$ bash scripts/copy_model.sh fc fc_rl

Then

$ python tools/train.py --id fc_rl --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-5 --start_from log_fc_rl --checkpoint_path log_fc_rl --save_checkpoint_every 6000 --language_eval 1 --val_images_use 5000 --self_critical_after 30 --cached_tokens coco-train-idxs --max_epochs 50 --train_sample_n 5

or

$ python tools/train.py --cfg configs/fc_rl.yml --id fc_rl

You should see a large boost in CIDEr score.

A few notes on training: starting self-critical training after 30 epochs, the CIDEr score goes up to about 1.05 after 600k iterations (including the 30 epochs of pretraining).

Generate image captions

Evaluate on raw images

Note: this doesn't work for models trained with bottom-up features. Place all your images of interest into a folder, e.g. blah, and run the eval script:

$ python tools/eval.py --model model.pth --infos_path infos.pkl --image_folder blah --num_images 10

This tells the eval script to run on up to 10 images from the given folder. If you have a big GPU you can speed up the evaluation by increasing batch_size. Use --num_images -1 to process all images. The eval script will create a vis.json file inside the vis folder, which can then be visualized with the provided HTML interface:

$ cd vis
$ python -m http.server   # on Python 2: python -m SimpleHTTPServer

Now visit localhost:8000 in your browser and you should see your predicted captions.

Evaluate on Karpathy's test split

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 

The default split to evaluate is test. The default inference method is greedy decoding (--sample_method greedy); to sample from the posterior, set --sample_method sample.

Beam Search. Beam search can improve performance over greedy decoding by roughly 5%, at the cost of slower inference. To turn on beam search, use --beam_size N, where N is greater than 1.
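
For example, the Karpathy test-split command above with beam search enabled (a beam size of 5 is just an illustrative choice):

$ python tools/eval.py --dump_images 0 --num_images 5000 --model model.pth --infos_path infos.pkl --language_eval 1 --beam_size 5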

Evaluate on COCO test set

$ python tools/eval.py --input_json cocotest.json --input_fc_dir data/cocotest_bu_fc --input_att_dir data/cocotest_bu_att --input_label_h5 none --num_images -1 --model model.pth --infos_path infos.pkl --language_eval 0

You can download the preprocessed files cocotest.json, cocotest_bu_att and cocotest_bu_fc from link.

Miscellanea

Using CPU. The code currently defaults to the GPU and there is no option to switch. If someone really needs a CPU model, please open an issue; I can potentially create a CPU checkpoint and modify eval.py to run the model on CPU. However, there is no point in using CPUs to train the model.
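
That said, for inference only, most of the change comes down to loading the checkpoint with torch.load's map_location argument. A minimal sketch (the model-construction lines are placeholders, not this repo's actual API):

    import torch

    # Load a GPU-trained checkpoint onto the CPU.
    state_dict = torch.load('model.pth', map_location='cpu')

    # Hypothetical usage: build the captioning model on CPU as usual,
    # then load the weights and switch to eval mode.
    # model = build_caption_model(opt)   # placeholder, not an actual function in this repo
    # model.load_state_dict(state_dict)
    # model.eval()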

Train on other datasets. Porting should be trivial if you can create a file like dataset_coco.json for your own dataset.
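
dataset_coco.json follows the Karpathy-split format, so an entry for a custom dataset would look roughly like the sketch below (field names assumed from that format; check scripts/prepro_labels.py for the exact fields it reads):

    {
      "dataset": "mydataset",
      "images": [
        {
          "filepath": "images",
          "filename": "0001.jpg",
          "split": "train",
          "cocoid": 1,
          "sentences": [
            {"raw": "A dog runs on the grass.",
             "tokens": ["a", "dog", "runs", "on", "the", "grass"]}
          ]
        }
      ]
    }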

Live demo. Not currently supported; pull requests are welcome.

For more advanced features:

Check out ADVANCED.md.

Reference

If you find this repo useful, please consider citing (no obligation at all):

@article{luo2018discriminability,
  title={Discriminability objective for training descriptive captions},
  author={Luo, Ruotian and Price, Brian and Cohen, Scott and Shakhnarovich, Gregory},
  journal={arXiv preprint arXiv:1803.04376},
  year={2018}
}

Of course, please also cite the original papers of the models you are using (you can find references in the model files).

Acknowledgements

Thanks to the original neuraltalk2 and the awesome PyTorch team.

Comments
  • Question about the bottom-up and top-down model

    Hi, the bottom-up and top-down paper says that 60k iterations (about 9 hours of training) are enough to reach CIDEr 120.1. I changed your model's hyperparameters to match the paper's, and changed the input of the attention LSTM to the mean of the per-bounding-box image features, but I still cannot reach the reported results. In your experience, what could be the cause?

    opened by JimLee4530 26
  • The CIDEr score gets smaller than the pretrained model

    Hi, I am trying to reproduce your code in TensorFlow. I wrote my code almost exactly as you did, but I find the CIDEr score keeps getting smaller than before, and the sampled sentences also keep getting shorter. Can you give me some advice?

    opened by wjb123 23
  • Structure loss

    Hi,

    I have a question regarding the structure loss in su+bu+structure. Is the structure-loss weight similar to the lambda in your discriminability paper, i.e. does it scale the rewards against the cross-entropy term? If so, could each of the rewards also be given a different weight?

    Thank you.

    opened by mememimis 20
  • About multi-GPU training

    Hi, when I tried to train the fc model on 2 GPUs by setting the environment variable CUDA_VISIBLE_DEVICES=0,1, I got some errors at train.py#125, but there were no errors when running your code with only one GPU.

    Does this repository support multi-GPU training?

    opened by xuyan1115 16
  • evaluation error using pre-trained model

    Hi Ruotian,

    I tried to run the following script: python eval.py --model resnet50.pth --infos_path infos.pkl --image_folder ./image/val2014_coco/ --num_images 1

    The model was resnet50 (and resnet101) downloaded from your Google Drive. But I got the error:

        Traceback (most recent call last):
          File "eval.py", line 102, in <module>
            model.load_state_dict(torch.load(opt.model))
          File "/home/wentong/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
            .format(name))
        KeyError: 'unexpected key "conv1.weight" in state_dict'

    I searched online but there was little information about this. I guess you used multiple GPUs.

    Any advice? Thank you for your implementation.

    opened by Wentong-DST 15
  • CUDA out of memory in self-critical training but not in xe

    Hello. I am using a GTX 2080 Ti with 11GB of memory. I trained the model with cross-entropy (XE) and it works fine, but when I run self-critical training I get CUDA out of memory. How can the model fit during XE training but not during RL training? It is the same model with the same number of parameters. Any advice would help.

    opened by homelifes 12
  • Issue when training

    I am trying to train the model as per the instructions given in the repo. I am getting the error below:

    [[email protected] self-critical.pytorch]$ python tools/train.py --id fc --caption_model newfc --input_json data/cocotalk.json --input_fc_dir data/cocotalk_fc --input_att_dir data/cocotalk_att --input_label_h5 data/cocotalk_label.h5 --batch_size 10 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log_fc --save_checkpoint_every 6000 --val_images_use 5000 --max_epochs 30
    Hugginface transformers not installed; please visit https://github.com/huggingface/transformers
    meshed-memory-transformer not installed; please run pip install git+https://github.com/ruotianluo/meshed-memory-transformer.git
    DataLoader loading json file: data/cocotalk.json
    vocab size is 9487
    DataLoader loading h5 file: data/cocotalk_fc data/cocotalk_att data/cocotalk_box data/cocotalk_label.h5
    max sequence length in data is 16
    read 123287 image features
    assigned 113287 images to split train
    assigned 5000 images to split val
    assigned 5000 images to split test
    /home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/data/dataloader.py:291: RuntimeWarning: Mean of empty slice.
      fc_feat = att_feat.mean(0)
    (the warning above is repeated several times)
    2020-07-27 22:49:15.536672: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
    Read data: 0.0002117156982421875
    Traceback (most recent call last):
      File "tools/train.py", line 289, in <module>
        train(opt)
      File "tools/train.py", line 182, in train
        model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
        outputs = self.parallel_apply(replicas, inputs, kwargs)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
        output.reraise()
      File "/usr/local/lib64/python3.6/site-packages/torch/_utils.py", line 395, in reraise
        raise self.exc_type(msg)
    StopIteration: Caught StopIteration in replica 0 on device 0.
    Original Traceback (most recent call last):
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
        output = module(*input, **kwargs)
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/modules/loss_wrapper.py", line 45, in forward
        loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:])
      File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/models/CaptionModel.py", line 33, in forward
        return getattr(self, '_'+mode)(*args, **kwargs)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/models/AttModel.py", line 128, in _forward
        state = self.init_hidden(batch_size*seq_per_img)
      File "/home/default/ephemeral_drive/work/image_captioning/self-critical.pytorch/captioning/models/AttModel.py", line 99, in init_hidden
        weight = next(self.parameters())
    StopIteration

    PyTorch version: '1.5.0+cu101'

    I saw a pytorch - bug https://github.com/huggingface/transformers/issues/3936, which describes a similar issue. Not sure if it related.

    opened by gsrivas4 11
  • Transformer performance

    Hello, training with transformer.yml I can only reach Bleu_1: 0.749, Bleu_2: 0.584, Bleu_3: 0.443, Bleu_4: 0.336, METEOR: 0.271, ROUGE_L: 0.553, CIDEr: 1.092, which is far below the scores you report. How did you set the training parameters?

    opened by hasky123 10
  • Type error while training

    @ruotianluo Sorry again, after fixing the file problem, I got an error: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

    run tools/train.py --cfg configs/fc_rl.yml --id fc_rl
    DataLoader loading json file: data/cocotalk.json
    vocab size is 9487
    DataLoader loading h5 file: data/cocotalk_fc data/cocotalk_att data/cocotalk_box data/cocotalk_label.h5
    max sequence length in data is 16
    read 123287 image features
    assigned 113287 images to split train
    assigned 5000 images to split val
    assigned 5000 images to split test
    Read data: 0.003994464874267578
    Save ckpt on exception ...
    model saved to ./log_fc_rl\model.pth
    Save ckpt done.
    Traceback (most recent call last):
      File "D:\Stephan\Final project\self-critical.pytorch-master\tools\train.py", line 183, in train
        model_out = dp_lw_model(fc_feats, att_feats, labels, masks, att_masks, data['gts'], torch.arange(0, len(data['gts'])), sc_flag, struc_flag)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\parallel\data_parallel.py", line 150, in forward
        return self.module(*inputs[0], **kwargs[0])
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\modules\loss_wrapper.py", line 45, in forward
        loss = self.crit(self.model(fc_feats, att_feats, labels[..., :-1], att_masks), labels[..., 1:], masks[..., 1:])
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\CaptionModel.py", line 33, in forward
        return getattr(self, '_'+mode)(*args, **kwargs)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py", line 160, in _forward
        output, state = self.get_logprobs_state(it, p_fc_feats, p_att_feats, pp_att_feats, p_att_masks, state)
      File "D:\Stephan\Final project\self-critical.pytorch-master\captioning\models\AttModel.py", line 167, in get_logprobs_state
        xt = self.embed(it)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
        self.norm_type, self.scale_grad_by_freq, self.sparse)
      File "C:\Users\ncku_ailab\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1484, in embedding
        return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
    RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)

    opened by stephancheng 10
  • Question regarding self-critical reward

    Hi Ruotian, I'd like to ask about the nature of the reward when training with self-critical. Is it normal for it to start negative or at zero for the first few epochs? I am getting the following for the first epoch with self-critical (after training with XE for 13 epochs):

    (screenshot of the self-critical reward log omitted)

    Is this normal? And what is the maximum reward you achieved after training with self-critical?

    Also, I'm using the py3 branch. I saw there are a lot of differences between the py3 and py2 branches. So is the py3 branch reliable to use?

    Looking forward to your answer!

    opened by fawazsammani 9
  • ZeroDivisionError: division by zero

    Hi Dr. Luo, I am trying to evaluate a model, but I run into the following error.

    error info:

    python eval.py --model data/pretrain/fc/model-best.pth  --infos_path data/pretrain/fc/infos_fc-best.pkl --image_folder blah --num_images -1
    loading annotations into memory...
    0:00:00.498650
    creating index...
    index created!
    Traceback (most recent call last):
      File "eval.py", line 71, in <module>
        lang_stats = eval_utils.language_eval(opt.input_json, predictions, n_predictions, vars(opt), opt.split)
      File "/home/andrewcao95/workspace/pycharm_ws/self-critical.pytorch/eval_utils.py", line 83, in language_eval
        mean_perplexity = sum([_['perplexity'] for _ in preds_filt]) / len(preds_filt)
    ZeroDivisionError: division by zero
    
    

    source code:

    # filter results to only those in MSCOCO validation set
        preds_filt = [p for p in preds if p['image_id'] in valids]
        mean_perplexity = sum([_['perplexity'] for _ in preds_filt]) / len(preds_filt)
        mean_entropy = sum([_['entropy'] for _ in preds_filt]) / len(preds_filt)
        print('using %d/%d predictions' % (len(preds_filt), len(preds)))
        json.dump(preds_filt, open(cache_path, 'w')) # serialize to temporary json file. Sigh, COCO API...
    
        cocoRes = coco.loadRes(cache_path)
        cocoEval = COCOEvalCap(coco, cocoRes)
        cocoEval.params['image_id'] = cocoRes.getImgIds()
        cocoEval.evaluate()
    
        for metric, score in cocoEval.eval.items():
            out[metric] = score
        # Add mean perplexity
        out['perplexity'] = mean_perplexity
        out['entropy'] = mean_entropy
    
        imgToEval = cocoEval.imgToEval
        for k in list(imgToEval.values())[0]['SPICE'].keys():
            if k != 'All':
                out['SPICE_'+k] = np.array([v['SPICE'][k]['f'] for v in imgToEval.values()])
                out['SPICE_'+k] = (out['SPICE_'+k][out['SPICE_'+k]==out['SPICE_'+k]]).mean()
        for p in preds_filt:
            image_id, caption = p['image_id'], p['caption']
            imgToEval[image_id]['caption'] = caption
    
    opened by andrewcao95 8
  • How can I get my dataset image feature to scripts/prepro_feats.py

    Thanks for your work! When preparing the data, I found that I have not pre-extracted the image features for my dataset. How can I extract these image features? Could you point me to some code?

    opened by ShanZard 0
  • how to set the model ensemble?

    Hello, I saw the model ensembling method in the paper, which helps improve model performance. How can I implement model ensembling? Thanks very much.

    opened by fjqfjqfjqfjqfjqfjqfjq 0
  • transformer_nsc

    Hello, when running transformer_nsc.yml I ran into the following messages; how can I solve this?

    Hugginface transformers not installed; please visit https://github.com/huggingface/transformers
    meshed-memory-transformer not installed; please run pip install git+https://github.com/ruotianluo/meshed-memory-transformer.git
    Warning: coco-caption not available

    opened by bai-24 11
  • Running Inference using CPU only.

    I am not able to run inference with the model provided in the demo using CPU only; I get a "CUDA device not available" error. I tried removing all GPU and CUDA references in the code and replacing them with CPU, but it still does not work. It would be really helpful if you could explain how to run the demo code without a GPU.

    Thanks.

    opened by harindercnvrg 0
Releases (latest: 3.2)
  • 3.2(May 29, 2020)

    1. Faster beam search
    2. support h5 feature file
    3. allow beam search + scst (doesn't work as well though)
    4. Add a few models, BertCapModel and m2transformer (usefulness still in question)
    5. Add projects.
  • v3.1(Jan 10, 2020)

    1. Since it's 2020, py3 is officially supported. Open an issue if there is still something wrong.
    2. Finally, there is a model zoo which is relatively complete. Feel free to try the provided models.
  • 3(Dec 31, 2019)

    1. Add structure loss inspired by Classical Structured Prediction Losses for Sequence to Sequence Learning
    2. Add a function for sampling n captions, supporting the methods described in https://www.dropbox.com/s/tdqr9efrjdkeicz/iccv.pdf?dl=0.
    3. More pytorchy design of the dataloader. The dataloader no longer repeats image features according to seq_per_img; the repeating is now done in the model's forward function.
    4. Add multi-sentence sampling evaluation metrics such as mBleu and Self-CIDEr (those described in https://www.dropbox.com/s/tdqr9efrjdkeicz/iccv.pdf?dl=0).
    5. Use detectron-style config to set up experiments.
    6. A better self-critical objective (now named new_self_critical). Use config ymls ending with nsc to test it. A technical report will be out soon. Basically, it performs better than the original SCST on all metrics (by a small margin) and is also slightly faster.
  • 2.2(Jun 25, 2019)

    1. Refactor the code a little bit.
    2. Add BPE (didn't seem to make much difference).
    3. Add nucleus sampling, top-k and Gumbel-softmax sampling.
    4. Make AttEnsemble compatible with the transformer.
    5. Add "remove bad ending" from Improving Reinforcement Learning Based Image Captioning with Natural Language Prior.

  • 2.1(Jun 25, 2019)

  • 2.0.0(Apr 29, 2018)

    1. Add support for bleu4 optimization or combination of bleu4 and cider
    2. Add bottom-up feature support
    3. Add ensemble during evaluation.
    4. Add multi-gpu support.
    5. Add miscellaneous things. (box features; experimental models etc.)
  • 1.0(Apr 28, 2018)

Owner
Ruotian(RT) Luo
PhD student at TTIC