An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Overview

The implementation of paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval.

CLIP4Clip is a video-text retrieval model based on CLIP (ViT-B/32). We investigate three similarity calculation approaches: parameter-free type, sequential type, and tight type, in this work. The model achieve SOTA results on MSR-VTT, MSVC, and LSMDC.

CLIP4Clip

Requirement

# From CLIP 
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm
pip install opencv-python boto3 requests pandas

Data Preparing

For MSRVTT

The official data and video links can be found in link.

For the convenience, you can also download the splits and captions by,

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip

For MSVD

Raw videos can be download from link.

The splits and raw_captions can be found in the wonderful job collaborative-experts. For the convenience, you can also download them by,

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msvd_data.zip

For LSMDC

You must obtain permission from MPII to download and use the data. The download link is here. The 1000 test clips data is link. Read our paper and the dataloader for more information.

How to Run

--features_path is the video root path

--linear_patch can be set with 2d or 3d

--sim_header can be set with meanP, seqLSTM, seqTransf, or tightTransf

read our paper for more details on --linear_patch and --sim_header. Test more hyperparameters for better performance.

Download CLIP (ViT-B/32) weight,

 wget -P ./modules https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt

Then, run

MSRVTT

DATA_PATH=[Your MSRVTT data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=0 \
--epochs=5 --batch_size=128 --n_display=50 \
--train_csv ${DATA_PATH}/MSRVTT_train.9k.csv \
--val_csv ${DATA_PATH}/MSRVTT_JSFUSION_test.csv \
--data_path ${DATA_PATH}/MSRVTT_data.json \
--features_path ${DATA_PATH}/MSRVTT_Videos \
--output_dir ckpts/ckpt_msrvtt_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype msrvtt --expand_msrvtt_sentences  \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0  --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

MSVD

DATA_PATH=[Your MSVD data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=2 \
--epochs=5 --batch_size=128 --n_display=50 \
--data_path ${DATA_PATH} \
--features_path ${DATA_PATH}/MSVD_Videos \
--output_dir ckpts/ckpt_msvd_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype msvd \
--feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0 --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

LSMDC

DATA_PATH=[Your LSMDC data and videos path]
python -m torch.distributed.launch --nproc_per_node=4 \
main_task_retrieval.py --do_train --num_thread_reader=2 \
--epochs=5 --batch_size=128 --n_display=50 \
--data_path ${DATA_PATH} \
--features_path ${DATA_PATH}/LSMDC_Videos \
--output_dir ckpts/ckpt_lsmdc_retrieval_looseType \
--lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16 \
--datatype lsmdc --feature_framerate 1 --coef_lr 1e-3 \
--freeze_layer_num 0  --slice_framepos 2 \
--loose_type --linear_patch 2d --sim_header meanP

Citation

If you find CLIP4Clip useful in your work, you can cite the following paper:

@Article{Luo2021CLIP4Clip,
  author  = {Huaishao Luo and Lei Ji and Ming Zhong and Yang Chen and Wen Lei and Nan Duan and Tianrui Li},
  title   = {CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval},
  journal = {arXiv preprint arXiv:2104.08860},
  year    = {2021},
}

Acknowledgments

Our code is based on CLIP (ViT-B/32) and UniVL.

Comments
  • Poor performance when reproduce CLIP4clip(meanP) on ActivityNet

    Poor performance when reproduce CLIP4clip(meanP) on ActivityNet

    @ArrowLuo Hi, I directly train the CLIP4clip(meanP) on ActivityNet and get [email protected]=37.9 which is much worse than 40.5 reported in Table 4.

    I extracted images from the original videos with FPS=1, and trained the CLIP4clip(meanP) on 8 RTX3090. Due to the GPU memory constrain, I set the gradient_accumulation_steps=2.

    The caption is downloaded from https://cs.stanford.edu/people/ranjaykrishna/densevid/.

    opened by jianghaojun 14
  • How to download

    How to download "cross_pytorch_model.bin" as pretrained weights used in the project? & problem in DDP

    1. 我按照readme提供的参数,无法收敛,观察到日志中:
    2. 05/20/2022 00:18:31 - INFO - Weight doesn't exsits. xxx/modules/cross-base/cross_pytorch_model.bin
    3. 05/20/2022 00:18:42 - INFO - Weights from pretrained model not used in : clip.input_resolution clip.context_length clip.vocab_size 以上,请问如何下载modules/cross-base/cross_pytorch_model.bin文件?谢谢

    同时,我在2机8卡上进行分布式训练,按照正常的理解是每台机器8个进程,共16个(和GPU数一致),但跑这个工程的时候,每台机器先是启动了8个进程,接着又各派生了8个子进程,每台机器上存在16个进程,并且其中8个进程休眠,这个问题一直没有解决,想交流一下其他人是否遇到了这个问题?

    opened by AAUfoa 10
  • Some questions about the results of the MARVTT with `sim_header seqTransf`.

    Some questions about the results of the MARVTT with `sim_header seqTransf`.

    When I use the following configuration to train the model on MSRVTT Training-9K, the best result I got is 07/27/2021 13:11:01 - INFO - sim matrix size: 1000, 1000 07/27/2021 13:11:01 - INFO - Length-T: 1000, Length-V:1000 07/27/2021 13:11:01 - INFO - Text-to-Video: 07/27/2021 13:11:01 - INFO - >>> [email protected]: 43.2 - [email protected]: 71.0 - [email protected]: 79.4 - Median R: 2.0 - Mean R: 15.4 07/27/2021 13:11:01 - INFO - Video-to-Text: 07/27/2021 13:11:01 - INFO - >>> [email protected]: 43.1 - V2T$R@5: 71.2 - [email protected]: 80.7 - V2T$Median R: 2.0 - V2T$Mean R: 11.9. It's worse than the results [email protected]: 44.5 listed in the paper. Did i miss some details? Here is the configuration. CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_addr=127.0.0.2 --master_port 29552 main_ta sk_retrieval.py --num_thread_reader=4 --epochs=5 --batch_size=128 --n_display=20 --train_csv /home/hadoop-vacv/cephfs/data/caoshuqia ng/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_train.9k.csv --val_csv /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msr vtt_data/MSRVTT_JSFUSION_test.csv --data_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/csv/msrvtt_data/MSRVTT_data .json --features_path /home/hadoop-vacv/cephfs/data/caoshuqiang/data/jobs/MSRVTT/MSRVTT_Videos --output_dir /home/hadoop-vacv/cephfs /data/caoshuqiang/code/vicab/newexp/hope/clip_raw --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 12 --datatype msrvtt -- expand_msrvtt_sentences --feature_framerate 1 --coef_lr 1e-3 --freeze_layer_num 0 --slice_framepos 2 --loose_type --linear_patch 2d --sim_header seqTransf --do_train.

    opened by sqiangcao99 10
  • subprocess.CalledProcessError

    subprocess.CalledProcessError

    Hi, I wanna train CLIP4Clip on MSRVTT. But I got this issue. Can you Help me?

    subprocess.CalledProcessError: Command '['/home/nccu/anaconda3/envs/Clip4Video-alvin/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', './MSRVTT/videos/MSRVTT_train.9k.csv', '--val_csv', './MSRVTT/videos/MSRVTT_JSFUSION_test.csv', '--data_path', './MSRVTT/videos/MSRVTT_data.json', '--features_path', './MSRVTT/videos/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

    I will appreciate your help with this situation. Thank you in advance.

    opened by alvinlin1271320 9
  • The reproduction results of `meanP` and `seqTransf` on  `DiDeMo` dataset are much worse than those in the paper

    The reproduction results of `meanP` and `seqTransf` on `DiDeMo` dataset are much worse than those in the paper

    I trained clip4clip on didemo dataset, and the [email protected] of text-to-video is much worse than that shown in paper.

    The metric reported in paper is 43.4 on DiDeMo when similarity calculator is meanP and is 42.8 when the head is seqTransf.

    But according to my reproduction result, based on meanP, max t2v [email protected] is only up to 40.5, and it only up to 40.2 based on seqTransf.

    All settings remain the same as in the paper.

    image
    opened by HanielF 8
  • Query search

    Query search

    As I understood correctly, after training and evaluation, videos are set on the basis of ranks and similarity scores. Is there any script available to make a search query as text and get the video with some freedom of search based on confidence or similarity index with some closest ranked video ?

    opened by Tortoise17 7
  • What is the meaning of --num_thread_reader=0 in MSR-VTT training configuration?

    What is the meaning of --num_thread_reader=0 in MSR-VTT training configuration?

    Can you explain the number of thread reader in the training configuration? I can adjust this value to decrease my training time? (Why --num_thread_reader=0 in MSR-VTT while --num_thread_reader=2 in other dataset.) Thank you so much!

    opened by thinh276 6
  • loss becomes nan

    loss becomes nan

    Hi, I ran the code on MSRVTT dataset with 2 A100s, and its loss becomes nan after some iterations, like this issue. However, I found that the RawVideoExtractorCV2 function succeeded in reading the video when testing only one video input (Directly test the video_to_tensor function in RawVideoExtractorCV2 ), but failed to read the video with multiple num_workers when running the given scripts (the print log is printed in line 63 by myself, but no thing will be printed in line 211 ). Is there something wrong with the multiple threads setting?

    opened by fake-warrior8 5
  • Data split issue

    Data split issue

    Hi @ArrowLuo, I have a question about the data split during the fine-tuning stage. The paper claims that you refer to the data split of the ECCV'20 paper "Multi-modal transformer for video retrieval". In the ECCV'20 paper's code, they sample a subset from the training data to cross-validate and select model. And in your implementation, you directly validate on the 1k testing set. Is it ok to use the testing set to select the best model?

    opened by zhixinma 5
  • torch.distributed.init_process_group(backend=

    torch.distributed.init_process_group(backend="nccl") error and some other errors

    Dear author,

    Thank you for helping me about 4 months before.

    I'd tried to leave you recomment to your helping words, but my issue was already canceled. I really really feel sorry for that.

    Now, I have another issues on my running process with MSVD datasets.

    I successed to run your framework several time, but I got another problem on my environment these days.

    If you don't mind, Can I burrow your hand one more time?

    the following is my error log, and I can manage the GPU sever of two.

    each separation means that the error log raised on each server.

    If you need another request required for, leave me your comments.

    Sincerely,

    Server 16

    (CLIP4Clip) [email protected]:~/CLIP4Clip$ main_task_retrieval.py DATA_PATH=/home/key2317/CLIP4Clip/msvd_data/ VISIBLE_DEVICES=3,4,0,5 python -m torch.distributed.launch --nproc_per_node=4
    main_task_retrieval.py --do_train --num_thread_reader=2
    --epochs=5 --batch_size=128 --n_display=50
    --data_path ${DATA_PATH}
    --features_path ${DATA_PATH}/MSVD_Videos
    --output_dir ckpts/ckpt_msvd_retrieval_looseType
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16
    --datatype msvd
    --feature_framerate 1 --coef_lr 1e-3
    --freeze_layer_num 0 --slice_framepos 2
    --loose_type --linear_patch 2d --sim_header meanP
    --pretrained_clip_name ViT-B/32main_task_retrieval.py: command not found (CLIP4Clip) [email protected]:~/CLIP4Clip$ CUDA_VISIBLE_DEVICES=3,4,0,5 python -m torch.distributed.launch --nproc_per_node=4 \

    main_task_retrieval.py --do_train --num_thread_reader=2
    --epochs=5 --batch_size=128 --n_display=50
    --data_path ${DATA_PATH}
    --features_path ${DATA_PATH}/MSVD_Videos
    --output_dir ckpts/ckpt_msvd_retrieval_looseType
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16
    --datatype msvd
    --feature_framerate 1 --coef_lr 1e-3
    --freeze_layer_num 0 --slice_framepos 2
    --loose_type --linear_patch 2d --sim_header meanP
    --pretrained_clip_name ViT-B/32 Traceback (most recent call last): File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 185, in _run_module_as_main mod_name, mod_spec, code = _get_module_details(mod_name, _Error) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 111, in _get_module_details import(pkg_name) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/init.py", line 189, in _load_global_deps() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/init.py", line 142, in _load_global_deps ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/ctypes/init.py", line 373, in init self._handle = _dlopen(self._name, mode) OSError: /home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/lib/../../../../libcublas.so.11: undefined symbol: free_gemm_select, version libcublasLt.so.11 (CLIP4Clip) [email protected]:~/CLIP4Clip$

    Server 19

    (CLIP4Clip) [email protected]:~/video-multimodal/CLIP4Clip$ python -m torch.distributed.launch --nproc_per_node=4 \

    main_task_retrieval.py --do_train --num_thread_reader=2
    --epochs=5 --batch_size=128 --n_display=50
    --data_path ${DATA_PATH}
    --features_path ${DATA_PATH}/MSVD_Videos
    --output_dir ckpts/ckpt_msvd_retrieval_looseType
    --lr 1e-4 --max_words 32 --max_frames 12 --batch_size_val 16
    --datatype msvd
    --feature_framerate 1 --coef_lr 1e-3
    --freeze_layer_num 0 --slice_framepos 2
    --loose_type --linear_patch 2d --sim_header meanP
    --pretrained_clip_name ViT-B/32

    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

    Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 436, in init_process_group store, rank, world_size = next(rendezvous_iterator) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 179, in _env_rendezvous_handler store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout) RuntimeError: Address already in use Traceback (most recent call last): File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/launch.py", line 260, in main() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/launch.py", line 255, in main raise subprocess.CalledProcessError(returncode=process.returncode, subprocess.CalledProcessError: Command '['/home/key2317/anaconda3/envs/CLIP4Clip/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=2', '--epochs=5', '--batch_size=128', '--n_display=50', '--data_path', '--features_path', '/MSVD_Videos', '--output_dir', 'ckpts/ckpt_msvd_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msvd', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1. (CLIP4Clip) [email protected]:~/video-multimodal/CLIP4Clip$ Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group barrier() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier work = _default_pg.barrier() RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8 Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group barrier() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier work = _default_pg.barrier() RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8 Traceback (most recent call last): File "main_task_retrieval.py", line 29, in torch.distributed.init_process_group(backend="nccl") File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group barrier() File "/home/key2317/anaconda3/envs/CLIP4Clip/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier work = _default_pg.barrier() RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370172916/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8

    opened by celestialxevermore 5
  • Some errors : subprocess.CalledProcessError

    Some errors : subprocess.CalledProcessError

    Dear Author, really thanks you all for opening this open source.

    I'm a novice and do not have much techniques in dealing with and running all kinds of framework, so today, I've been suffered from error :

    subprocess.CalledProcessError: Command '['/home/key2317/anaconda3/envs/CLIP4Clip/bin/python', '-u', 'main_task_retrieval.py', '--local_rank=3', '--do_train', '--num_thread_reader=0', '--epochs=5', '--batch_size=128', '--n_display=50', '--train_csv', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_train.9k.csv', '--val_csv', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_JSFUSION_test.csv', '--data_path', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_data.json', '--features_path', '/home/key2317/CLIP4Clip/msrvtt_data/MSRVTT_Videos', '--output_dir', 'ckpts/ckpt_msrvtt_retrieval_looseType', '--lr', '1e-4', '--max_words', '32', '--max_frames', '12', '--batch_size_val', '16', '--datatype', 'msrvtt', '--expand_msrvtt_sentences', '--feature_framerate', '1', '--coef_lr', '1e-3', '--freeze_layer_num', '0', '--slice_framepos', '2', '--loose_type', '--linear_patch', '2d', '--sim_header', 'meanP', '--pretrained_clip_name', 'ViT-B/32']' returned non-zero exit status 1.

    From entering my starting command, It seemed to run well for about 5 seconds showing inner state like vison_layers: 12, vision_width : 768, blah blah blah. But unfortunately, It was all ended up with aforementioned messages.

    One of my colleague guessed that It would be problem in our unmatching issue in GPU environments, in short, GPU problems, but I'm not sure how can I diagnose my problem.

    I am really interested in your papers, also codes, but, because of this initial step, I cannot go for next. Would you please help me?

    Thanks.

    opened by celestialxevermore 5
  • About mean_pooling on text sequence

    About mean_pooling on text sequence

    Dear author, I hope you enjoy your new year.

    I have a quetion about, the reason why you do not use the function, _mean_pooling_for_similarity_sequence.

    Is there any special reason for that?

    I also looked out your previous model, UniVL, but I cannot find any reason about that.

    I hope you reply soon.

    Thx.

    Sincerly,

    opened by celestialxevermore 1
  • run simple inference

    run simple inference

    1.Is there a simple example to run an inference Eg: python infer.py --model --input 'red car', and return the video(s) or url(s) or something like that. 2. Is it possible to get the embeddings?

    opened by jdso1988 1
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • train on DiDeMo

    train on DiDeMo

    Thanks for sharing your wonderful work!

    When I train on DiDeMo dataset, the following information appears on the terminal more than once. 139ae0410327e9b89ba27ca5d1d6cfe

    But the model training was not interrupted. The model could be trained and tested normally. I ran according to the training parameters on DiDeMo dataset you provided, but changed the batch size. I want to know whether you have encountered the same situation, and whether it will affect the model results? I searched the Internet for solutions, but I could hardly find them.

    opened by qjyyyy 1
  • MSVD Weights

    MSVD Weights

    Hello, thanks for providing the code!

    In regards to the models used for generating the results in the paper, is this uploaded anywhere that can be shared? I would be interested in the inference without running the training from scratch.

    opened by ntseng450 1
Owner
ArrowLuo
Each place has water. The problem is that the depth we dig is not enough. So be more patient, be more try.
ArrowLuo
A deep learning-based translation library built on Huggingface transformers

DL Translate A deep learning-based translation library built on Huggingface transformers and Facebook's mBART-Large 💻 GitHub Repository 📚 Documentat

Xing Han Lu 244 Dec 30, 2022
CATs: Semantic Correspondence with Transformers

CATs: Semantic Correspondence with Transformers For more information, check out the paper on [arXiv]. Training with different backbones and evaluation

74 Dec 10, 2021
Kinky furry assitant based on GPT2

KinkyFurs-V0 Kinky furry assistant based on GPT2 How to run python3 V0.py then, open web browser and go to localhost:8080 Requirements: Flask trans

Sparki 1 Jun 11, 2022
APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

APEACH - Korean Hate Speech Evaluation Datasets APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. Sentences of

Kevin-Yang 70 Dec 06, 2022
One Stop Anomaly Shop: Anomaly detection using two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

One Stop Anomaly Shop (OSAS) Quick start guide Step 1: Get/build the docker image Option 1: Use precompiled image (might not reflect latest changes):

Adobe, Inc. 148 Dec 26, 2022
A Transformer Implementation that is easy to understand and customizable.

Simple Transformer I've written a series of articles on the transformer architecture and language models on Medium. This repository contains an implem

Naoki Shibuya 4 Jan 20, 2022
A workshop with several modules to help learn Feast, an open-source feature store

Workshop: Learning Feast This workshop aims to teach users about Feast, an open-source feature store. We explain concepts & best practices by example,

Feast 52 Jan 05, 2023
DeepPavlov Tutorials

DeepPavlov tutorials DeepPavlov: Sentence Classification with Word Embeddings DeepPavlov: Transfer Learning with BERT. Classification, Tagging, QA, Ze

Neural Networks and Deep Learning lab, MIPT 28 Sep 13, 2022
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Phil Wang 5k Jan 02, 2023
Dust model dichotomous performance analysis

Dust-model-dichotomous-performance-analysis Using a collated dataset of 90,000 dust point source observations from 9 drylands studies from around the

1 Dec 17, 2021
Python library for Serbian Natural language processing (NLP)

SrbAI - Python biblioteka za procesiranje srpskog jezika SrbAI je projekat prikupljanja algoritama i modela za procesiranje srpskog jezika u jedinstve

Serbian AI Society 3 Nov 22, 2022
Text editor on python tkinter to convert english text to other languages with the help of ployglot.

Transliterator Text Editor This is a simple transliteration program which is used to convert english word to phonetically matching word in another lan

Merin Rose Tom 1 Jan 16, 2022
A script that automatically creates a branch name using google translation api and jira api

About google translation api와 jira api을 사용하여 자동으로 브랜치 이름을 만들어주는 스크립트 Setup 환경변수에 다음 3가지를 등록해야 한다. JIRA_USER : JIRA email (ex: hyunwook.kim 2 Dec 20, 2021

Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet.

Sonnet finder Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet. Usage This is a Python scrip

Marcel Bollmann 11 Sep 25, 2022
A demo of chinese asr

chinese_asr_demo 一个端到端的中文语音识别模型训练、测试框架 具备数据预处理、模型训练、解码、计算wer等等功能 训练数据 训练数据采用thchs_30,

4 Dec 09, 2021
A Japanese tokenizer based on recurrent neural networks

Nagisa is a python module for Japanese word segmentation/POS-tagging. It is designed to be a simple and easy-to-use tool. This tool has the following

325 Jan 05, 2023
Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form.

Neural G2P to portuguese language Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written for

fluz 11 Nov 16, 2022
Speech to text streamlit app

Speech to text Streamlit-app! 👄 This speech to text recognition is powered by t

Charly Wargnier 9 Jan 01, 2023
iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform

iSTFTNet : Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform This repo try to implement iSTFTNet : Fast

Rishikesh (ऋषिकेश) 126 Jan 02, 2023
189 Jan 02, 2023