Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure.

Overview

Event Queue Dialect

Event queue (Equeue) dialect is an MLIR Dialect that models concurrent devices in terms of control and structure.

Motivation

The main motivation of the event queue dialect is to efficiently estimate performance of programs running on heterogenous accelerators. The dialect is designed to bridge the gap between low-level hardware specific dialects and high-level dialects with little hardware specific information, thus facilitating custom lowering among different design choices. In particular, the EventQueue dialect supports modeling memory size constraints, bandwidth constraints, and processing time across a large number of heterogenous processors with distributed event-based control.

By and large, event queue dialect is design to estimate performance of concurrent devices. It supports:

  • Arbitrary hardware hierarchy and each hardware with its own properties.

  • Modeling data movement and buffer allocation that is critical to energy and efficiency estimation.

  • Model concurrency between heterogenous devices.

Check further documentation to see how the goals are achieved.

EQueue Dialect in MLIR Lowering Pipeline

lowering_pipeline

Event queue dialect is designed to do performance analysis.

Because there is a gap between high level dialect that has no structure information, and low level dialect that is too detail to analyze, event queue dialect bridges them.

The input for the event queue dialect is high level control dialect without structure and the output will be dialect describing detailed structure information.

In the lowering pipeline, equeue dialect is at the same level as gpu dialect. The difference is that existing gpu dialect assumes a synchronous gpu model and try to communicate with gpu.barrier among concurrent gpus, while equeue dialect models a more general design, where it allows any kinds of structure, thus allowing maximum flexibility. To describe the complexity of any possible structure in a flexible device like FPGA, equeue dialect develops a general semantics for asynchronous communication between concurrent devices.

How to Use

Dependency

The dependency of this project is MLIR. Because MLIR is project that frequently being updated. When I started the EQueue project, The latest stable version was 12-init. One needs checkout to the right version.

git clone https://github.com/llvm/llvm-project.git
git fetch --all --tags
git checkout tags/llvmorg-12-init -b 
   

   

and then follow MLIR quick start to build executable.

Quick Start

After git clone and cd the repo,

mkdir build
cp *.sh build/
cd build
#change LLVM_EXTERNAL_LIT and MLIR_DIR in run.sh to your local directory
sh config; sh run.sh
./bin/equeue-opt ../test/Equeue/[path-to-input-file.mlir]

Debug Outputs

If one want to turn on debug outputs with -debug or debug-only when there are multiple debugging options

./bin/equeue-opt ../test/Equeue/[path-to-input-file.mlir] -debug
# when there are multiple debugging options
./bin/equeue-opt ../test/Equeue/[path-to-input-file.mlir] -debug-only=command_processor
# to redirect output to file
./bin/equeue-opt ../test/Equeue/[path-to-input-file.mlir] -debug > & report

Visualization

By default equeue-opt will generate a Trace Event Format JSON file to test/Equeue/out.json . You can specify the output file name with -json

./bin/equeue-opt ../test/Equeue/[path-to-input-file.mlir] -json [path-to-json-file.json]

The output JSON file can be viewed in chrome://tracing/

Below is the visualization of running test/EQueue/gpu.mlir

visualization

Examples

You may want to check on Examples on the convolution and the finite impulse response. Detailed explanation can be found in the example directory

Paper and Citation

The paper is accepted to HPCA 2022. We upload a preprint to Arxiv.

Contact

I am Zhijing at Cornell University. This project is originally my Xilinx internship project. I extend after the internship and now it is accepted by HPCA 2022. I will put the reference later. If getting to any trouble, you can contact me at [email protected]

Owner
Cornell Capra
Computer architecture & programming abstractions at Cornell University.
Cornell Capra
Code from PropMix, accepted at BMVC'21

PropMix: Hard Sample Filtering and Proportional MixUp for Learning with Noisy Labels This repository is the official implementation of Hard Sample Fil

6 Dec 21, 2022
This repository includes the code of the sequence-to-sequence model for discontinuous constituent parsing described in paper Discontinuous Grammar as a Foreign Language.

Discontinuous Grammar as a Foreign Language This repository includes the code of the sequence-to-sequence model for discontinuous constituent parsing

Daniel Fernández-González 2 Apr 07, 2022
The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022

DG-TrajGen The official repository for paper ''Domain Generalization for Vision-based Driving Trajectory Generation'' submitted to ICRA 2022. Our Meth

Wang 25 Sep 26, 2022
Repository for the COLING 2020 paper "Explainable Automated Fact-Checking: A Survey."

Explainable Fact Checking: A Survey This repository and the accompanying webpage contain resources for the paper "Explainable Fact Checking: A Survey"

Neema Kotonya 42 Nov 17, 2022
Awesome AI Learning with +100 AI Cheat-Sheets, Free online Books, Top Courses, Best Videos and Lectures, Papers, Tutorials, +99 Researchers, Premium Websites, +121 Datasets, Conferences, Frameworks, Tools

All about AI with Cheat-Sheets(+100 Cheat-sheets), Free Online Books, Courses, Videos and Lectures, Papers, Tutorials, Researchers, Websites, Datasets

Niraj Lunavat 1.2k Jan 01, 2023
TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks

TSP: Temporally-Sensitive Pretraining of Video Encoders for Localization Tasks [Paper] [Project Website] This repository holds the source code, pretra

Humam Alwassel 83 Dec 21, 2022
Official source code of Fast Point Transformer, CVPR 2022

Fast Point Transformer Project Page | Paper This repository contains the official source code and data for our paper: Fast Point Transformer Chunghyun

182 Dec 23, 2022
Jupyter Dock is a set of Jupyter Notebooks for performing molecular docking protocols interactively, as well as visualizing, converting file formats and analyzing the results.

Molecular Docking integrated in Jupyter Notebooks Description | Citation | Installation | Examples | Limitations | License Table of content Descriptio

Angel J. Ruiz Moreno 173 Dec 25, 2022
Code for 2021 NeurIPS --- Towards Multi-Grained Explainability for Graph Neural Networks

ReFine: Multi-Grained Explainability for GNNs We are trying hard to update the code, but it may take a while to complete due to our tight schedule rec

Shirley (Ying-Xin) Wu 47 Dec 16, 2022
PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Masked Autoencoders: A PyTorch Implementation This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners: @

Meta Research 4.8k Jan 04, 2023
Pytorch implementation for DFN: Distributed Feedback Network for Single-Image Deraining.

DFN:Distributed Feedback Network for Single-Image Deraining Abstract Recently, deep convolutional neural networks have achieved great success for sing

6 Nov 05, 2022
Recurrent Conditional Query Learning

Recurrent Conditional Query Learning (RCQL) This repository contains the Pytorch implementation of One Model Packs Thousands of Items with Recurrent C

Dongda 4 Nov 28, 2022
2021 credit card consuming recommendation

2021 credit card consuming recommendation

Wang, Chung-Che 7 Mar 08, 2022
Let's create a tool to convert Thailand budget from PDF to CSV.

thailand-budget-pdf2csv Let's create a tool to convert Thailand Government Budgeting from PDF to CSV! รวมพลัง Dev แปลงงบ จาก PDF สู่ Machine-readable

Kao.Geek 88 Dec 19, 2022
ppo_pytorch_cpp - an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

PPO Pytorch C++ This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment t

Martin Huber 59 Dec 09, 2022
A python software that can help blind people find things like laptops, phones, etc the same way a guide dog guides a blind person in finding his way.

GuidEye A python software that can help blind people find things like laptops, phones, etc the same way a guide dog guides a blind person in finding h

Munal Jain 0 Aug 09, 2022
Instance-wise Feature Importance in Time (FIT)

Instance-wise Feature Importance in Time (FIT) FIT is a framework for explaining time series perdiction models, by assigning feature importance to eve

Sana 46 Dec 25, 2022
A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.

Telemanom (v2.0) v2.0 updates: Vectorized operations via numpy Object-oriented restructure, improved organization Merge branches into single branch fo

Kyle Hundman 844 Dec 28, 2022
Learning 3D Part Assembly from a Single Image

Learning 3D Part Assembly from a Single Image This repository contains a PyTorch implementation of the paper: Learning 3D Part Assembly from A Single

18 Dec 21, 2022
Use unsupervised and supervised learning to predict stocks

AIAlpha: Multilayer neural network architecture for stock return prediction This project is meant to be an advanced implementation of stacked neural n

Vivek Palaniappan 1.5k Dec 26, 2022