SEJE Pytorch implementation

Related tags

Deep LearningSEJE
Overview

SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering.

Contents

  1. Instroduction
  2. Installation
  3. Recipe1M Dataset
  4. Vision models
  5. Out-of-the-box training
  6. Training
  7. Testing
  8. Contact

Introduction

Overview: SEJE is a two-phase deep feature engineering framework for efficient learning of semantics enhanced joint embedding, which clearly separates the deep feature engineering in data preprocessing from training the text-image joint embedding model. We use the Recipe1M dataset for the technical description and empirical validation. In preprocessing, we perform deep feature engineering by combining deep feature engineering with semantic context features derived from raw text-image input data. We leverage LSTM to identify key terms, deep NLP models from the BERT family, TextRank, or TF-IDF to produce ranking scores for key terms before generating the vector representation for each key term by using word2vec. We leverage wideResNet50 and word2vec to extract and encode the image category semantics of food images to help semantic alignment of the learned recipe and image embeddings in the joint latent space. In joint embedding learning, we perform deep feature engineering by optimizing the batch-hard triplet loss function with soft-margin and double negative sampling, taking into account also the category-based alignment loss and discriminator-based alignment loss. Extensive experiments demonstrate that our SEJE approach with deep feature engineering significantly outperforms the state-of-the-art approaches.

SEJE Architecture

SEJE Phase I Architecture and Examples

SEJE Phase II Architecture

SEJE Joint Embedding Optimization with instance-class double hard sampling strategy

SEJE Joint Embedding Optimization with discriminator based alignment loss regularization

SEJE Experimental Evaluation Highlights

Installation

We use the environment with Python 3.7.6 and Pytorch 1.4.0. Run pip install --upgrade cython and then install the dependencies with pip install -r requirements.txt. Our work is an extension of im2recipe.

Recipe1M Dataset

The Recipe1M dataset is available for download here, where you can find some code used to construct the dataset and get the structured recipe text, food images, pre-trained instruction featuers and so on.

Vision models

This current version of the code uses a pre-trained ResNet-50.

Out-of-the-box training

To train the model, you will need to create following files:

  • data/train_lmdb: LMDB (training) containing skip-instructions vectors, ingredient ids and categories.
  • data/train_keys: pickle (training) file containing skip-instructions vectors, ingredient ids and categories.
  • data/val_lmdb: LMDB (validation) containing skip-instructions vectors, ingredient ids and categories.
  • data/val_keys: pickle (validation) file containing skip-instructions vectors, ingredient ids and categories.
  • data/test_lmdb: LMDB (testing) containing skip-instructions vectors, ingredient ids and categories.
  • data/test_keys: pickle (testing) file containing skip-instructions vectors, ingredient ids and categories.
  • data/text/vocab.txt: file containing all the vocabulary found within the recipes.

Recipe1M LMDBs and pickle files can be found in train.tar, val.tar and test.tar. here

It is worth mentioning that the code is expecting images to be located in a four-level folder structure, e.g. image named 0fa8309c13.jpg can be found in ./data/images/0/f/a/8/0fa8309c13.jpg. Each one of the Tar files contains the first folder level, 16 in total.

The pre-trained TFIDF vectors for each recipe, image category feature for each image and the optimized category label for each image-recipe pair can be found in id2tfidf_vec.pkl, id2img_101_cls_vec.pkl and id2class_1005.pkl respectively.

Word2Vec

Training word2vec with recipe data:

  • Download and compile word2vec
  • Train with:
./word2vec -hs 1 -negative 0 -window 10 -cbow 0 -iter 10 -size 300 -binary 1 -min-count 10 -threads 20 -train tokenized_text.txt -output vocab.bin

The pre-trained word2vec model can be found in vocab.bin.

Training

  • Train the model with:
CUDA_VISIBLE_DEVICES=0 python train.py 

We did the experiments with batch size 100, which takes about 11 GB memory.

Testing

  • Test the trained model with
CUDA_VISIBLE_DEVICES=0 python test.py
  • The results will be saved in results, which include the MedR result and recall scores for the recipe-to-image retrieval and image-to-recipe retrieval.
  • Our best model trained with Recipe1M (TSC paper) can be downloaded here.

Contact

We are continuing the development and there is ongoing work in our lab regarding cross-modal retrieval between cooking recipes and food images. For any questions or suggestions you can use the issues section or reach us at [email protected].

Lead Developer: Zhongwei Xie, Georgia Institute of Technology

Advisor: Prof. Dr. Ling Liu, Georgia Institute of Technology

If you use our code, please cite

[1] Zhongwei Xie, Ling Liu, Yanzhao Wu, et al. Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering[J]//ACM Transactions on Information Systems (TOIS).

[2] Zhongwei Xie, Ling Liu, Lin Li, et al. Efficient Deep Feature Calibration for Cross-Modal Joint Embedding Learning[C]//Proceedings of the 2021 International Conference on Multimodal Interaction. 2021: 43-51.

This dlib-based facial login system

Facial-Login-System This dlib-based facial login system is a technology capable of matching a human face from a digital webcam frame capture against a

Mushahid Ali 3 Apr 23, 2022
Implementation of " SESS: Self-Ensembling Semi-Supervised 3D Object Detection" (CVPR2020 Oral)

SESS: Self-Ensembling Semi-Supervised 3D Object Detection Created by Na Zhao from National University of Singapore Introduction This repository contai

125 Dec 23, 2022
PyTorch code for DriveGAN: Towards a Controllable High-Quality Neural Simulation

PyTorch code for DriveGAN: Towards a Controllable High-Quality Neural Simulation

76 Dec 24, 2022
Implementation of SiameseXML (ICML 2021)

SiameseXML Code for SiameseXML: Siamese networks meet extreme classifiers with 100M labels Best Practices for features creation Adding sub-words on to

Extreme Classification 35 Nov 06, 2022
RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

The first comprehensive Robustness investigation benchmark on large-scale dataset ImageNet regarding ARchitecture design and Training techniques towards diverse noises.

132 Dec 23, 2022
This is the official implementation of Elaborative Rehearsal for Zero-shot Action Recognition (ICCV2021)

Elaborative Rehearsal for Zero-shot Action Recognition This is an official implementation of: Shizhe Chen and Dong Huang, Elaborative Rehearsal for Ze

DeLightCMU 26 Sep 24, 2022
Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning

Camera Distortion-aware 3D Human Pose Estimation in Video with Optimization-based Meta-Learning This is the official repository of "Camera Distortion-

Hanbyel Cho 12 Oct 06, 2022
Pytorch implementation of XRD spectral identification from COD database

XRDidentifier Pytorch implementation of XRD spectral identification from COD database. Details will be explained in the paper to be submitted to NeurI

Masaki Adachi 4 Jan 07, 2023
"Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback"

This is code repo for our EMNLP 2017 paper "Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback", which implements the A2C algorithm on top of a neural encoder-

Khanh Nguyen 131 Oct 21, 2022
NEG loss implemented in pytorch

Pytorch Negative Sampling Loss Negative Sampling Loss implemented in PyTorch. Usage neg_loss = NEG_loss(num_classes, embedding_size) optimizer =

Daniil Gavrilov 123 Sep 13, 2022
Research Artifact of USENIX Security 2022 Paper: Automated Side Channel Analysis of Media Software with Manifold Learning

Automated Side Channel Analysis of Media Software with Manifold Learning Official implementation of USENIX Security 2022 paper: Automated Side Channel

Yuanyuan Yuan 175 Jan 07, 2023
Implementations of CNNs, RNNs, GANs, etc

Tensorflow Programs and Tutorials This repository will contain Tensorflow tutorials on a lot of the most popular deep learning concepts. It'll also co

Adit Deshpande 1k Dec 30, 2022
[ICML 2021] “ Self-Damaging Contrastive Learning”, Ziyu Jiang, Tianlong Chen, Bobak Mortazavi, Zhangyang Wang

Self-Damaging Contrastive Learning Introduction The recent breakthrough achieved by contrastive learning accelerates the pace for deploying unsupervis

VITA 51 Dec 29, 2022
A Pytorch implementation of "LegoNet: Efficient Convolutional Neural Networks with Lego Filters" (ICML 2019).

LegoNet This code is the implementation of ICML2019 paper LegoNet: Efficient Convolutional Neural Networks with Lego Filters Run python train.py You c

YangZhaohui 140 Sep 26, 2022
Learning multiple gaits of quadruped robot using hierarchical reinforcement learning

Learning multiple gaits of quadruped robot using hierarchical reinforcement learning We propose a method to learn multiple gaits of quadruped robot us

Yunho Kim 17 Dec 11, 2022
Software for Multimodalty 2D+3D Facial Expression Recognition (FER) UI

EmotionUI Software for Multimodalty 2D+3D Facial Expression Recognition (FER) UI. demo screenshot (with RealSense) required packages Python = 3.6 num

Yang Jiao 2 Dec 23, 2021
Implements Gradient Centralization and allows it to use as a Python package in TensorFlow

Gradient Centralization TensorFlow This Python package implements Gradient Centralization in TensorFlow, a simple and effective optimization technique

Rishit Dagli 101 Nov 01, 2022
Neural Point-Based Graphics

Neural Point-Based Graphics Project   Video   Paper Neural Point-Based Graphics Kara-Ali Aliev1 Artem Sevastopolsky1,2 Maria Kolos1,2 Dmitry Ulyanov3

Ali Aliev 252 Dec 13, 2022
Official implementation of Long-Short Transformer in PyTorch.

Long-Short Transformer (Transformer-LS) This repository hosts the code and models for the paper: Long-Short Transformer: Efficient Transformers for La

NVIDIA Corporation 198 Dec 29, 2022
Modifications of the official PyTorch implementation of StyleGAN3. Let's easily generate images and videos with StyleGAN2/2-ADA/3!

Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation of the NeurIPS 2021 paper Alias-Free Generative Adversarial Net

Diego Porres 185 Dec 24, 2022