Cleaned up code for DSTC 10: SIMMC 2.0 track: subtask 2: multimodal coreference resolution

Last update: Dec 05, 2022

Overview

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input: arXiv

MMCoref_cleaned

Code for the MMCoref task of the SIMMC 2.0 dataset.
Pretrained vision-language models adapted from Transformers-VQA.
Zero-shot visual feature extraction using CLIP and BUTD.
Zero-shot non-visual prefab feature (flattened into strings) extraction using BERT and SBERT.
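As a rough illustration of what "flattened into strings" can mean, the sketch below turns one object's non-visual metadata into a single string that a text encoder such as BERT or SBERT could embed. The function name flatten_prefab, the field names, and the "key: value; ..." layout are assumptions made for this example and are not taken from the repository's scripts.

```python
from typing import Any, Dict


def flatten_prefab(meta: Dict[str, Any]) -> str:
    """Flatten a metadata dict into one 'key: value; key: value' string.

    Hypothetical helper: the actual string layout used in this repo may differ.
    """
    parts = []
    for key in sorted(meta):  # sorted keys give a deterministic string
        value = meta[key]
        if isinstance(value, (list, tuple)):
            # Join list-valued fields into a comma-separated string.
            value = ", ".join(str(v) for v in value)
        parts.append(f"{key}: {value}")
    return "; ".join(parts)


# Illustrative metadata only; the field names are assumptions.
example = {"brand": "ExampleBrand", "price": 39.99, "availableSizes": ["S", "M"]}
flat = flatten_prefab(example)
print(flat)
```

Sorting the keys keeps the string stable across runs, which matters if the embeddings are cached on disk.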

Dependencies

requirements.txt

Download the data and pretrained/trained model checkpoints

Data: Put the data in ./data. Unpack all images into ./data/all_images and all scene JSON files (including the teststd split) into ./data/simmc2_scene_jsons_dstc10_public/public.
Pretrained models: Put the checkpoints in ./pretrained and ./model/Transformers-VQA-master/models/pretrained. Download links are in the placeholder.txt file in each of these folders.
Trained models: Put the checkpoints in ./trained. Download links are in ./trained/placeholder.txt.
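Before preprocessing or training, it can help to confirm that the folders listed above exist. The following is a minimal sketch that only checks for the directories named in this section. The helper name missing_paths is hypothetical, and the check does not verify the contents of the folders.

```python
from pathlib import Path
from typing import List

# Directories named in this README, relative to the repository root.
EXPECTED_DIRS = [
    "data/all_images",
    "data/simmc2_scene_jsons_dstc10_public/public",
    "pretrained",
    "model/Transformers-VQA-master/models/pretrained",
    "trained",
]


def missing_paths(root: Path) -> List[str]:
    """Return the expected directories that do not exist under root."""
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


missing = missing_paths(Path("."))
if missing:
    print("Missing directories:")
    for d in missing:
        print("  ./" + d)
else:
    print("All expected directories are present.")
```

Run it from the repository root so that the relative paths resolve correctly.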

Preprocess

Convert the json files using ./scripts/converter.py. Note: this step is currently not working because the latest converter.py was lost. Download the processed data instead.
Get BERT/SBERT embeddings of non-visual prefab features using ./scripts/{get_KB_embedding, get_KB_embedding_SBERT, get_KB_embedding_no_duplicate}.py
Get CLIP/BUTD embeddings for images using ./scripts/get-visual-features-{CLIP, RCNN}.ipynb
Or just download everything from ./processed/placeholder.txt

Train

Training scripts are under ./sh/train. See the arguments in each script for the inputs it uses.

Inference and evaluate

Scripts are under ./sh/infer_eval (devtest split) and ./sh/infer_eval_dev (dev split).
Outputs are written to ./output (same format as the original dialogue json).
Logits are written to ./output/logit as {dialogue_idx: {round_idx: [[logit, label], ...]}}
Run ./scripts/output_filter_error.py to select and reformat error cases.
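The logit structure above can be walked with three nested loops. The sketch below computes object-level precision, recall, and F1 from such a dictionary. It assumes that a label of 1 marks an object that is a true referent and that an object is predicted as a referent when its logit exceeds a threshold of 0. Both are assumptions for illustration and may not match the repository's evaluation script. The sample values are made up.

```python
from typing import Dict, List, Tuple

Logits = Dict[str, Dict[str, List[List[float]]]]


def object_f1(logits: Logits, threshold: float = 0.0) -> Tuple[float, float, float]:
    """Compute (precision, recall, F1) over all [logit, label] pairs."""
    tp = fp = fn = 0
    for rounds in logits.values():          # one entry per dialogue_idx
        for pairs in rounds.values():       # one entry per round_idx
            for logit, label in pairs:      # one pair per candidate object
                predicted = logit > threshold  # assumed decision rule
                if predicted and label:
                    tp += 1
                elif predicted and not label:
                    fp += 1
                elif not predicted and label:
                    fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up sample in the documented format.
sample = {"0": {"0": [[2.1, 1], [-0.5, 0], [0.3, 0]], "1": [[-1.0, 1], [1.5, 1]]}}
precision, recall, f1 = object_f1(sample)
print(precision, recall, f1)
```

To use it on a real file, load the json from ./output/logit with json.load and pass the resulting dictionary to object_f1.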

Ensemble

cd script
python ensemble --method optuna

The output is saved to output/logit/blended_devtest.json
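One simple way to blend several models is a weighted average of their per-object logits. The sketch below shows that idea on dictionaries in the logit format used by this repo. The function blend_logits and the fixed weights are hypothetical. The optuna method presumably searches for such weights, but that search is not reproduced here, and the repository's ensemble script may combine models differently.

```python
from typing import Dict, List, Sequence

Logits = Dict[str, Dict[str, List[List[float]]]]


def blend_logits(model_logits: Sequence[Logits], weights: Sequence[float]) -> Logits:
    """Weighted average of logits from models that scored the same objects.

    Assumes every model has identical dialogue/round keys and object order.
    """
    total = sum(weights)
    blended: Logits = {}
    for d_idx, rounds in model_logits[0].items():
        blended[d_idx] = {}
        for r_idx, pairs in rounds.items():
            out = []
            for i, (_, label) in enumerate(pairs):
                # Weighted mean of the i-th object's logit across models.
                score = sum(
                    w * m[d_idx][r_idx][i][0] for w, m in zip(weights, model_logits)
                ) / total
                out.append([score, label])
            blended[d_idx][r_idx] = out
    return blended


# Made-up logits from two hypothetical models.
model_a = {"0": {"0": [[1.0, 1], [-2.0, 0]]}}
model_b = {"0": {"0": [[3.0, 1], [0.0, 0]]}}
blended = blend_logits([model_a, model_b], weights=[1.0, 3.0])
print(blended)
```

The labels are copied from the first model, since all models score the same objects against the same ground truth.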

Owner

Yichen (William) Huang
