The Submission for SIMMC 2.0 Challenge 2021

Related tags

Deep Learningsimmc2.0
Overview

The Submission for SIMMC 2.0 Challenge 2021

Requirements

Preprocessing

  1. Download Data
  • Download the data provided by the challenge organizer and put it in the data folder.
  • Unzip data files
  1. Image saving
  • Preprocess the image files in advance. The preprocessed result has the image name as the key and visual as the value.
python3 image_preprocessor.py
python3 image_preprocessor_final.py

Step 1 (ITM)

First, the model is post-trained by image-to-text matching. Here, image is each object and text is the visual metadata of the object. Code is provided in the ITM folder.

Step 2 (BTM)

Second, pretraining is performed to use background reprsentation of image in subtasks. Similar to ITM, it is trained to match image and text, and the image is the background of the dialog and the text is the entire context of the dialog. Code is provided in the BTM folder.

Step 3

This is the learning process for each subtask. You can train the model in each folder (sub1, sub2_1, sub2_2, sub2_3, sub2_4, sub4).

Model

All models can be downloaded from the following link

model.pt is a model for evaluating devtest, and the result is saved in the dstc10-simmc-entry folder. model_final.pt is a model for evaluating teststd, and the result is saved in the dstc10-simmc-final-entry folder. However, the training of the model was not completed within the challenge period, so we inferred to model.pt for the teststd data in subtask2.

Evlauation

Using the evaluation script suggested by the challenge organizer

The SIMMC organizers introduce the scripts:

(line-by-line evaluation) $ python -m gpt2_dst.scripts.evaluate \ --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \ --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \ --output_path_report={PATH_TO_REPORT} (Or, dialog level evaluation) $ python -m utils.evaluate_dst \ --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \ --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \ --output_path_report={PATH_TO_REPORT} $ python tools/response_evaluation.py \ --data_json_path={PATH_TO_GOLD_RESPONSES} \ --model_response_path={PATH_TO_MODEL_RESPONSES} \ --single_round_evaluation $ python tools/retrieval_evaluation.py \ --retrieval_json_path={PATH_TO_GROUNDTRUTH_RETRIEVAL} \ --model_score_path={PATH_TO_MODEL_CANDIDATE_SCORES} \ --single_round_evaluation ">

     
      
$ python tools/disambiguator_evaluation.py \
	--pred_file="{PATH_TO_PRED_FILE}" \
	--test_file="{PATH_TO_TEST_FILE}" \


      
       
(line-by-line evaluation)
$ python -m gpt2_dst.scripts.evaluate \
  --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \
  --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \
  --output_path_report={PATH_TO_REPORT}

(Or, dialog level evaluation)
$ python -m utils.evaluate_dst \
    --input_path_target={PATH_TO_GROUNDTRUTH_TARGET} \
    --input_path_predicted={PATH_TO_MODEL_PREDICTIONS} \
    --output_path_report={PATH_TO_REPORT}
    

       
        
$ python tools/response_evaluation.py \
    --data_json_path={PATH_TO_GOLD_RESPONSES} \
    --model_response_path={PATH_TO_MODEL_RESPONSES} \
    --single_round_evaluation


        
         
$ python tools/retrieval_evaluation.py \
    --retrieval_json_path={PATH_TO_GROUNDTRUTH_RETRIEVAL} \
    --model_score_path={PATH_TO_MODEL_CANDIDATE_SCORES} \
    --single_round_evaluation    

        
       
      
     

DevTest Results

Subtask #1: Multimodal Disambiguation

Test Method Accuracy
GPT2 from CO(Challenge Organizer) 73.9
Ours 92.28

Subtask #2: Multimodal Coreference Resolution

Test Method Object F1
GPT2 from CO 0.366
Ours-1 (sub2_1) 0.595
Ours-2 (sub2_2) 0.604
Ours-3 (sub2_3) 0.607
Ours-4 (sub2_4) 0.608

Subtask #3: Multimodal Dialog State Tracking

No Training/Testing

Subtask #4: Multimodal Dialog Response Generation

Generation

Baseline BLEU
GPT2 from CO 0.192
MTN-SIMMC2 from CO 0.217
Ours 0.285

Retrieval

No Training/Testing

Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai

Coursera-deep-learning-specialization - Notes, programming assignments and quizzes from all courses within the Coursera Deep Learning specialization offered by deeplearning.ai: (i) Neural Networks an

Aman Chadha 1.7k Jan 08, 2023
A simple, high level, easy-to-use open source Computer Vision library for Python.

ZoomVision : Slicing Aid Detection A simple, high level, easy-to-use open source Computer Vision library for Python. Installation Installing dependenc

Nurettin Sinanoğlu 2 Mar 04, 2022
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing Paper Introduction Multi-task indoor scene understanding is widely considered a

62 Dec 05, 2022
This provides the R code and data to replicate results in "The USS Trustee’s risky strategy"

USSBriefs2021 This provides the R code and data to replicate results in "The USS Trustee’s risky strategy" by Neil M Davies, Jackie Grant and Chin Yan

1 Oct 30, 2021
High performance distributed framework for training deep learning recommendation models based on PyTorch.

PERSIA (Parallel rEcommendation tRaining System with hybrId Acceleration) is developed by AI 340 Dec 30, 2022

PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

PFENet This is the implementation of our paper PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation that has been accepted to IEE

DV Lab 230 Dec 31, 2022
A script that trains a model to recognize handwritten digits using the MNIST data set.

handwritten-digits-recognition A script that trains a model to recognize handwritten digits using the MNIST data set. Then it loads external files and

Hamza Sayih 1 Oct 30, 2021
TensorFlow Tutorials with YouTube Videos

TensorFlow Tutorials Original repository on GitHub Original author is Magnus Erik Hvass Pedersen Introduction These tutorials are intended for beginne

9.1k Jan 02, 2023
This respository includes implementations on Manifoldron: Direct Space Partition via Manifold Discovery

Manifoldron: Direct Space Partition via Manifold Discovery This respository includes implementations on Manifoldron: Direct Space Partition via Manifo

dayang_wang 4 Apr 28, 2022
This is the repo for the paper "Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement".

Improving the Accuracy-Memory Trade-Off of Random Forests Via Leaf-Refinement This is the repository for the paper "Improving the Accuracy-Memory Trad

3 Dec 29, 2022
A collection of awesome resources image-to-image translation.

awesome image-to-image translation A collection of resources on image-to-image translation. Contributing If you think I have missed out on something (

876 Dec 28, 2022
Video Swin Transformer - PyTorch

Video-Swin-Transformer-Pytorch This repo is a simple usage of the official implementation "Video Swin Transformer". Introduction Video Swin Transforme

Haofan Wang 116 Dec 20, 2022
Misc YOLOL scripts for use in the Starbase space sandbox videogame

starbase-misc Misc YOLOL scripts for use in the Starbase space sandbox videogame. Each directory contains standalone YOLOL scripts. They don't really

4 Oct 17, 2021
Network Compression via Central Filter

Network Compression via Central Filter Environments The code has been tested in the following environments: Python 3.8 PyTorch 1.8.1 cuda 10.2 torchsu

2 May 12, 2022
[CVPR'2020] DeepDeform: Learning Non-rigid RGB-D Reconstruction with Semi-supervised Data

DeepDeform (CVPR'2020) DeepDeform is an RGB-D video dataset containing over 390,000 RGB-D frames in 400 videos, with 5,533 optical and scene flow imag

Aljaz Bozic 165 Jan 09, 2023
MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021)

MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021) Overview We release the code of the MVFNet (Multi-View Fusion Network).

2 Jan 29, 2022
Implementation of the Swin Transformer in PyTorch.

Swin Transformer - PyTorch Implementation of the Swin Transformer architecture. This paper presents a new vision Transformer, called Swin Transformer,

597 Jan 03, 2023
Research using Cirq!

ReCirq Research using Cirq! This project contains modules for running quantum computing applications and experiments through Cirq and Quantum Engine.

quantumlib 230 Dec 29, 2022
Score refinement for confidence-based 3D multi-object tracking

Score refinement for confidence-based 3D multi-object tracking Our video gives a brief explanation of our Method. This is the official code for the pa

Cognitive Systems Research Group 47 Dec 26, 2022
Curved Projection Reformation

Description Assuming that we already know the image of the centerline, we want the lumen to be displayed on a plane, which requires curved projection

夜听残荷 5 Sep 11, 2022