A new benchmark for Icon Question Answering (IconQA) and a large-scale icon dataset Icon645.

Last update: Dec 30, 2022

Overview

IconQA

About

IconQA is a new diverse abstract visual question answering dataset that highlights the importance of abstract diagram understanding and comprehensive cognitive reasoning in real-world problems.

There are three different sub-tasks in IconQA:

57,672 image choice MC questions
31,578 text choice MC questions
18,189 fill-in-the-blank questions

Sub-Tasks	Train	Validation	Test	Total
Multi-image-choice	34,603	11,535	11,535	57,672
Multi-text-choice	18,946	6,316	6,316	31,578
Filling-in-the-blank	10,913	3,638	3,638	18,189
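
As a quick sanity check on the table above, the following sketch recomputes each row's sum and its train/validation/test proportions. The counts are copied from the table; the helper names `split_fractions` and `row_differences` are illustrative only and are not part of the dataset tooling.

```python
# Counts copied from the table above: (train, validation, test, listed total).
SPLITS = {
    "Multi-image-choice": (34603, 11535, 11535, 57672),
    "Multi-text-choice": (18946, 6316, 6316, 31578),
    "Filling-in-the-blank": (10913, 3638, 3638, 18189),
}


def split_fractions(train, val, test):
    """Return the (train, val, test) shares of one row, rounded to 3 decimals."""
    total = train + val + test
    return tuple(round(n / total, 3) for n in (train, val, test))


def row_differences(splits):
    """Return, per sub-task, (train + val + test) minus the listed total."""
    return {
        name: train + val + test - listed
        for name, (train, val, test, listed) in splits.items()
    }


if __name__ == "__main__":
    for name, (train, val, test, _) in SPLITS.items():
        print(name, split_fractions(train, val, test))
    print(row_differences(SPLITS))
```

Each sub-task comes out at roughly a 60/20/20 split. The multi-image-choice row sums to 57,673, one more than the listed total of 57,672; the other two rows match their totals.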

In addition to IconQA, we also present Icon645, a large-scale dataset of icons that cover a wide range of objects:

645,687 colored icons
377 different icon classes

For more details, you can find our website here and our paper here.

Download

Our dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please read the license before you use, change, or share our dataset.

You can download IconQA here, or run the following commands:

cd data
wget https://iconqa2021.s3.us-west-1.amazonaws.com/iconqa.zip
unzip iconqa.zip

You can download Icon645 here, or run the following commands:

cd data
wget https://iconqa2021.s3.us-west-1.amazonaws.com/icon645.zip
unzip icon645.zip

File structures for the IconQA dataset:

IconQA
|   LICENSE.md
|   metadata.json
|   pid2skills.json
|   pid_splits.json
|   problems.json
|   skills.json
└───test
|   |
|   └───choose_img
|   |   |
|   |   └───question_id
|   |   |   |   image.png
|   |   |   |   data.json
|   |   |   |   choice_0.png
|   |   |   |   choice_1.png
|   |   |   |   ...
|   |   |
|   |   └───question_id
|   |   |   ...
|   |   
|   └───choose_txt
|   |   |  
|   |   └───question_id
|   |   |   |   image.png
|   |   |   |   data.json
|   |   | 
|   |   └───question_id
|   |   |   ...
|   |
|   └───fill_in_blank
|       |  
|       └───question_id
|       |   |   image.png
|       |   |   data.json
|       | 
|       └───question_id
|       |   ...
|   
└───train
|   |   same as test
|   
└───val
    |   same as test
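
The layout above can be traversed with a short script. This is a minimal sketch, assuming the archive was extracted so that the split folders sit directly under the IconQA root as shown. The function name `iter_questions` is hypothetical, and the contents of data.json are not documented here, so the parsed JSON is returned unchanged rather than read by key.

```python
import json
from pathlib import Path

# Sub-task folder names taken from the directory tree above.
TASKS = ("choose_img", "choose_txt", "fill_in_blank")


def iter_questions(root, split):
    """Yield one record per <root>/<split>/<task>/<question_id>/ folder."""
    for task in TASKS:
        task_dir = Path(root) / split / task
        if not task_dir.is_dir():
            continue  # tolerate partially extracted datasets
        for qdir in sorted(p for p in task_dir.iterdir() if p.is_dir()):
            with open(qdir / "data.json", encoding="utf-8") as f:
                data = json.load(f)  # schema is an unknown here; passed through as-is
            yield {
                "task": task,
                "question_id": qdir.name,
                "image": qdir / "image.png",
                # choice_*.png files appear only under choose_img in the tree above
                "choices": sorted(qdir.glob("choice_*.png")),
                "data": data,
            }


if __name__ == "__main__":
    # Assumes the dataset was unzipped into data/ as in the download commands.
    for record in iter_questions("data/iconqa", "test"):
        print(record["task"], record["question_id"], len(record["choices"]))
        break
```

The root path `data/iconqa` is a guess at the extracted folder name; adjust it to wherever the archive actually unpacks.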

File structures for the Icon645 dataset:

Icon645
|   LICENCE.md
|   metadata.json
└───colored_icons_final
    |
    └───acorn
    |   |   image_id1.png
    |   |   image_id2.png
    |   |   ...
    |   
    └───airplane
    |   |   image_id3.png
    |   |   ...
    |      
    |   ...
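
Since each icon class is a folder of PNG files, per-class counts can be gathered with a few lines. This is a minimal sketch, assuming the layout shown above; `count_icons_per_class` is a hypothetical helper name, not something shipped with the dataset.

```python
from collections import Counter
from pathlib import Path


def count_icons_per_class(root):
    """Count the .png files in each class folder under <root>/colored_icons_final."""
    icons_dir = Path(root) / "colored_icons_final"
    counts = Counter()
    for class_dir in icons_dir.iterdir():
        if class_dir.is_dir():
            counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.png"))
    return counts


if __name__ == "__main__":
    # Assumes the archive was unzipped into data/; adjust the path as needed.
    counts = count_icons_per_class("data/icon645")
    print(len(counts), "classes,", sum(counts.values()), "icons")
```

If the extracted archive matches the figures given earlier, this should report 377 classes and 645,687 icons.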

Citation

If the paper or the dataset inspires you, please cite us:

@inproceedings{lu2021iconqa,
  title = {IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning},
  author = {Lu, Pan and Qiu, Liang and Chen, Jiaqi and Xia, Tony and Zhao, Yizhou and Zhang, Wei and Yu, Zhou and Liang, Xiaodan and Zhu, Song-Chun},
  booktitle = {Submitted to the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks},
  year = {2021}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Owner

Pan Lu
