SEJE Pytorch implementation

Related tags

Deep LearningSEJE
Overview

SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering.

Contents

  1. Instroduction
  2. Installation
  3. Recipe1M Dataset
  4. Vision models
  5. Out-of-the-box training
  6. Training
  7. Testing
  8. Contact

Introduction

Overview: SEJE is a two-phase deep feature engineering framework for efficient learning of semantics enhanced joint embedding, which clearly separates the deep feature engineering in data preprocessing from training the text-image joint embedding model. We use the Recipe1M dataset for the technical description and empirical validation. In preprocessing, we perform deep feature engineering by combining deep feature engineering with semantic context features derived from raw text-image input data. We leverage LSTM to identify key terms, deep NLP models from the BERT family, TextRank, or TF-IDF to produce ranking scores for key terms before generating the vector representation for each key term by using word2vec. We leverage wideResNet50 and word2vec to extract and encode the image category semantics of food images to help semantic alignment of the learned recipe and image embeddings in the joint latent space. In joint embedding learning, we perform deep feature engineering by optimizing the batch-hard triplet loss function with soft-margin and double negative sampling, taking into account also the category-based alignment loss and discriminator-based alignment loss. Extensive experiments demonstrate that our SEJE approach with deep feature engineering significantly outperforms the state-of-the-art approaches.

SEJE Architecture

SEJE Phase I Architecture and Examples

SEJE Phase II Architecture

SEJE Joint Embedding Optimization with instance-class double hard sampling strategy

SEJE Joint Embedding Optimization with discriminator based alignment loss regularization

SEJE Experimental Evaluation Highlights

Installation

We use the environment with Python 3.7.6 and Pytorch 1.4.0. Run pip install --upgrade cython and then install the dependencies with pip install -r requirements.txt. Our work is an extension of im2recipe.

Recipe1M Dataset

The Recipe1M dataset is available for download here, where you can find some code used to construct the dataset and get the structured recipe text, food images, pre-trained instruction featuers and so on.

Vision models

This current version of the code uses a pre-trained ResNet-50.

Out-of-the-box training

To train the model, you will need to create following files:

  • data/train_lmdb: LMDB (training) containing skip-instructions vectors, ingredient ids and categories.
  • data/train_keys: pickle (training) file containing skip-instructions vectors, ingredient ids and categories.
  • data/val_lmdb: LMDB (validation) containing skip-instructions vectors, ingredient ids and categories.
  • data/val_keys: pickle (validation) file containing skip-instructions vectors, ingredient ids and categories.
  • data/test_lmdb: LMDB (testing) containing skip-instructions vectors, ingredient ids and categories.
  • data/test_keys: pickle (testing) file containing skip-instructions vectors, ingredient ids and categories.
  • data/text/vocab.txt: file containing all the vocabulary found within the recipes.

Recipe1M LMDBs and pickle files can be found in train.tar, val.tar and test.tar. here

It is worth mentioning that the code is expecting images to be located in a four-level folder structure, e.g. image named 0fa8309c13.jpg can be found in ./data/images/0/f/a/8/0fa8309c13.jpg. Each one of the Tar files contains the first folder level, 16 in total.

The pre-trained TFIDF vectors for each recipe, image category feature for each image and the optimized category label for each image-recipe pair can be found in id2tfidf_vec.pkl, id2img_101_cls_vec.pkl and id2class_1005.pkl respectively.

Word2Vec

Training word2vec with recipe data:

  • Download and compile word2vec
  • Train with:
./word2vec -hs 1 -negative 0 -window 10 -cbow 0 -iter 10 -size 300 -binary 1 -min-count 10 -threads 20 -train tokenized_text.txt -output vocab.bin

The pre-trained word2vec model can be found in vocab.bin.

Training

  • Train the model with:
CUDA_VISIBLE_DEVICES=0 python train.py 

We did the experiments with batch size 100, which takes about 11 GB memory.

Testing

  • Test the trained model with
CUDA_VISIBLE_DEVICES=0 python test.py
  • The results will be saved in results, which include the MedR result and recall scores for the recipe-to-image retrieval and image-to-recipe retrieval.
  • Our best model trained with Recipe1M (TSC paper) can be downloaded here.

Contact

We are continuing the development and there is ongoing work in our lab regarding cross-modal retrieval between cooking recipes and food images. For any questions or suggestions you can use the issues section or reach us at [email protected].

Lead Developer: Zhongwei Xie, Georgia Institute of Technology

Advisor: Prof. Dr. Ling Liu, Georgia Institute of Technology

If you use our code, please cite

[1] Zhongwei Xie, Ling Liu, Yanzhao Wu, et al. Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering[J]//ACM Transactions on Information Systems (TOIS).

[2] Zhongwei Xie, Ling Liu, Lin Li, et al. Efficient Deep Feature Calibration for Cross-Modal Joint Embedding Learning[C]//Proceedings of the 2021 International Conference on Multimodal Interaction. 2021: 43-51.

This repository includes the code of the sequence-to-sequence model for discontinuous constituent parsing described in paper Discontinuous Grammar as a Foreign Language.

Discontinuous Grammar as a Foreign Language This repository includes the code of the sequence-to-sequence model for discontinuous constituent parsing

Daniel Fernández-González 2 Apr 07, 2022
Why Are You Weird? Infusing Interpretability in Isolation Forest for Anomaly Detection

Why, hello there! This is the supporting notebook for the research paper — Why Are You Weird? Infusing Interpretability in Isolation Forest for Anomal

2 Dec 14, 2021
A simple rest api that classifies pneumonia infection weather it is Normal, Pneumonia Virus or Pneumonia Bacteria from a chest-x-ray image.

This is a simple rest api that classifies pneumonia infection weather it is Normal, Pneumonia Virus or Pneumonia Bacteria from a chest-x-ray image.

crispengari 3 Jan 08, 2022
GUI for TOAD-GAN, a PCG-ML algorithm for Token-based Super Mario Bros. Levels.

If you are using this code in your own project, please cite our paper: @inproceedings{awiszus2020toadgan, title={TOAD-GAN: Coherent Style Level Gene

Maren A. 13 Dec 14, 2022
Video-Music Transformer

VMT Video-Music Transformer (VMT) is an attention-based multi-modal model, which generates piano music for a given video. Paper https://arxiv.org/abs/

Chin-Tung Lin 5 Jul 13, 2022
Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable.

Diffrax Numerical differential equation solvers in JAX. Autodifferentiable and GPU-capable. Diffrax is a JAX-based library providing numerical differe

Patrick Kidger 717 Jan 09, 2023
Implementation supporting the ICCV 2017 paper "GANs for Biological Image Synthesis"

GANs for Biological Image Synthesis This codes implements the ICCV-2017 paper "GANs for Biological Image Synthesis". The paper and its supplementary m

Anton Osokin 95 Nov 25, 2022
This repository contains the segmentation user interface from the OpenSurfaces project, extracted as a lightweight tool

OpenSurfaces Segmentation UI This repository contains the segmentation user interface from the OpenSurfaces project, extracted as a lightweight tool.

Sean Bell 66 Jul 11, 2022
Semantic Image Synthesis with SPADE

Semantic Image Synthesis with SPADE New implementation available at imaginaire repository We have a reimplementation of the SPADE method that is more

NVIDIA Research Projects 7.3k Jan 07, 2023
Part-aware Measurement for Robust Multi-View Multi-Human 3D Pose Estimation and Tracking

Part-aware Measurement for Robust Multi-View Multi-Human 3D Pose Estimation and Tracking Part-Aware Measurement for Robust Multi-View Multi-Human 3D P

19 Oct 27, 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation

ZEROGEN This repository contains the code for our paper “ZeroGen: Efficient Zero

Jiacheng Ye 31 Dec 30, 2022
A Moonraker plug-in for real-time compensation of frame thermal expansion

Frame Expansion Compensation A Moonraker plug-in for real-time compensation of frame thermal expansion. Installation Credit to protoloft, from whom I

58 Jan 02, 2023
Open source implementation of AceNAS: Learning to Rank Ace Neural Architectures with Weak Supervision of Weight Sharing

AceNAS This repo is the experiment code of AceNAS, and is not considered as an official release. We are working on integrating AceNAS as a built-in st

Yuge Zhang 6 Sep 07, 2022
A torch implementation of "Pixel-Level Domain Transfer"

Pixel Level Domain Transfer A torch implementation of "Pixel-Level Domain Transfer". based on dcgan.torch. Dataset The dataset used is "LookBook", fro

Fei Xia 260 Sep 02, 2022
DeepFashion2 is a comprehensive fashion dataset.

DeepFashion2 Dataset DeepFashion2 is a comprehensive fashion dataset. It contains 491K diverse images of 13 popular clothing categories from both comm

switchnorm 1.8k Jan 07, 2023
FewBit — a library for memory efficient training of large neural networks

FewBit FewBit — a library for memory efficient training of large neural networks. Its efficiency originates from storage optimizations applied to back

24 Oct 22, 2022
Price-Prediction-For-a-Dream-Home - A machine learning based linear regression trained model for house price prediction.

Price-Prediction-For-a-Dream-Home ROADMAP TO THIS LINEAR REGRESSION BASED HOUSE PRICE PREDICTION PREDICTION MODEL Import all the dependencies of the p

DIKSHA DESWAL 1 Dec 29, 2021
Rl-quickstart - Reinforcement Learning Quickstart

Reinforcement Learning Quickstart To get setup with the repository, git clone ht

UCLA DataRes 3 Jun 16, 2022
PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

FullSubNet This Git repository for the official PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech E

郝翔 357 Jan 04, 2023
Generative Exploration and Exploitation - This is an improved version of GENE.

GENE This is an improved version of GENE. In the original version, the states are generated from the decoder of VAE. We have to check whether the gere

33 Mar 23, 2022