CUda Matrix Multiply library.

Related tags

Deep Learningcumm
Overview

cumm

CUda Matrix Multiply library.

Build Status

cumm is developed during learning of CUTLASS, which use too much c++ template and make code unmaintainable. So I develop pccm, use python as meta programming language, to replace c++ template meta programming. Now pccm become a foundational framework of cumm and my other c++ project such as spconv. cumm also contains a python asyncio-based gemm simulator that share same meta program with CUDA code, enable gemm visualization and easy debug experience.

Install

Prebuilt

We offer python 3.6-3.10 and cuda 10.2/11.1/11.3/11.4 prebuilt binaries for linux (manylinux).

We offer python 3.7-3.10 and cuda 10.2/11.1/11.3/11.4 prebuilt binaries for windows 10/11.

We will offer prebuilts for CUDA versions supported by latest pytorch release. For example, pytorch 1.9 support cuda 10.2 and 11.1, so we support them too.

pip install cumm for CPU-only

pip install cumm-cu102 for CUDA 10.2

pip install cumm-cu111 for CUDA 11.1

pip install cumm-cu113 for CUDA 11.3

pip install cumm-cu114 for CUDA 11.4

Build from source for development (JIT, recommend for develop)

WARNING Use code in tags!!! code in main branch may contain bugs.

The c++ code will be built automatically when you change c++ code in project.

Linux

  1. uninstall cumm installed by pip. you must ensure no "cumm" exists in pip list | grep cumm
  2. install build-essential, install CUDA
  3. git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e .
  4. in python, import cumm and wait for build finish.

Windows

  1. uninstall spconv and cumm installed by pip. you must ensure no "cumm" exists in pip list | grep cumm
  2. install visual studio 2019 or newer. make sure C++ development component is installed. install CUDA
  3. set powershell script execution policy
  4. start a new powershell, run tools/msvc_setup.ps1
  5. git clone https://github.com/FindDefinition/cumm, cd ./cumm, pip install -e .
  6. in python, import cumm and wait for build finish.

Build wheel from source

WARNING Use code in tags!!! code in main branch may contain bugs.

WARNING: If CUMM_CUDA_VERSION is set with a CUDA version, following steps will create a wheel named "cumm-cuxxx", not "cumm", this means you must use cumm-cuxxx in dependency of your project which depend on cumm, not cumm. If CUMM_CUDA_VERSION isn't set, cumm will always built with CUDA, so the CUDA must exists in your system. The wheel name will be cumm even if it is built with cuda.

Linux

It's recommend to build Linux packages in official build docker. Build with CUDA support don't need a real GPU.

Build in Official Docker
  1. select a cuda version. available: CUDA 10.2, 11.1, 11.3, 11.4, 11.5
  2. (Example for CUDA 11.4) git clone https://github.com/FindDefinition/cumm, cd ./cumm, docker run --rm -e PLAT=manylinux2014_x86_64 -e CUMM_CUDA_VERSION=114 -v `pwd`:/io scrin/manylinux2014-cuda:cu114-devel-1.0.0 bash -c "source /etc/bashrc && /io/tools/build-wheels.sh"
Build in your environment
  1. install build-essential, install CUDA
  2. set env for installed cuda version. for example, export CUMM_CUDA_VERSION="11.4". If you want to build CPU-only, run export CUMM_CUDA_VERSION="". If CUMM_CUDA_VERSION isn't set, you need to ensure cuda libraries are inside OS search path, and the built wheel name will be cumm, otherwise cumm-cuxxx
  3. run export CUMM_DISABLE_JIT="1"
  4. run python setup.py bdist_wheel+pip install dists/xxx.whl

Windows 10/11

  1. install visual studio 2019 or newer. make sure C++ development package is installed. install CUDA
  2. set powershell script execution policy
  3. start a new powershell, run tools/msvc_setup.ps1
  4. set env for installed cuda version. for example, $Env:CUMM_CUDA_VERSION = "11.4". If you want to build CPU-only, run $Env:CUMM_CUDA_VERSION = "". . If CUMM_CUDA_VERSION isn't set, you need to ensure cuda libraries are inside OS search path, and the built wheel name will be cumm, otherwise cumm-cuxxx
  5. run $Env:CUMM_DISABLE_JIT = "1"
  6. run python setup.py bdist_wheel+pip install dists/xxx.whl

Note

The work is done when the author is an employee at Tusimple.

LICENSE

Apache 2.0

This repository contains the code for the paper "PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization"

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization News: [2020/05/04] Added EGL rendering option for training data g

Shunsuke Saito 1.5k Jan 03, 2023
Code for "ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on", accepted at WACV 2021 Generation of Human Behavior Workshop.

ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on [ Paper ] [ Project Page ] This repository contains the code fo

Andrew Jong 97 Dec 13, 2022
PyTorch Implementation of NCSOFT's FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

FastPitchFormant - PyTorch Implementation PyTorch Implementation of FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis. Qu

Keon Lee 63 Jan 02, 2023
PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialogue System

Libo Qin 25 Sep 06, 2022
Code for "PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation" CVPR 2019 oral

Good news! We release a clean version of PVNet: clean-pvnet, including how to train the PVNet on the custom dataset. Use PVNet with a detector. The tr

ZJU3DV 722 Dec 27, 2022
WORD: Revisiting Organs Segmentation in the Whole Abdominal Region

WORD: Revisiting Organs Segmentation in the Whole Abdominal Region. This repository provides the codebase and dataset for our work WORD: Revisiting Or

Healthcare Intelligence Laboratory 71 Jan 07, 2023
PyTorch Implementation of the paper Learning to Reweight Examples for Robust Deep Learning

Learning to Reweight Examples for Robust Deep Learning Unofficial PyTorch implementation of Learning to Reweight Examples for Robust Deep Learning. Th

Daniel Stanley Tan 325 Dec 28, 2022
Code repo for "RBSRICNN: Raw Burst Super-Resolution through Iterative Convolutional Neural Network" (Machine Learning and the Physical Sciences workshop in NeurIPS 2021).

RBSRICNN: Raw Burst Super-Resolution through Iterative Convolutional Neural Network An official PyTorch implementation of the RBSRICNN network as desc

Rao Muhammad Umer 6 Nov 14, 2022
Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

Implementation of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"(CVPR 2021)

105 Nov 07, 2022
A toolkit for making real world machine learning and data analysis applications in C++

dlib C++ library Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real worl

Davis E. King 11.6k Jan 01, 2023
[ICSE2020] MemLock: Memory Usage Guided Fuzzing

MemLock: Memory Usage Guided Fuzzing This repository provides the tool and the evaluation subjects for the paper "MemLock: Memory Usage Guided Fuzzing

Cheng Wen 54 Jan 07, 2023
Assessing syntactic abilities of BERT

BERT-Syntax Assesing the syntactic abilities of BERT. What Evaluate Google's BERT-Base and BERT-Large models on the syntactic agreement datasets from

Yoav Goldberg 147 Aug 02, 2022
Official Codes for Graph Modularity:Towards Understanding the Cross-Layer Transition of Feature Representations in Deep Neural Networks.

Dynamic-Graphs-Construction Official Codes for Graph Modularity:Towards Understanding the Cross-Layer Transition of Feature Representations in Deep Ne

11 Dec 14, 2022
BanditPAM: Almost Linear-Time k-Medoids Clustering

BanditPAM: Almost Linear-Time k-Medoids Clustering This repo contains a high-performance implementation of BanditPAM from BanditPAM: Almost Linear-Tim

254 Dec 12, 2022
NALSM: Neuron-Astrocyte Liquid State Machine

NALSM: Neuron-Astrocyte Liquid State Machine This package is a Tensorflow implementation of the Neuron-Astrocyte Liquid State Machine (NALSM) that int

Computational Brain Lab 4 Nov 28, 2022
The repository contain code for building compiler using puthon.

Building Compiler This is a python implementation of JamieBuild's "Super Tiny Compiler" Overview JamieBuilds developed a wonderfully educative compile

Shyam Das Shrestha 1 Nov 21, 2021
Square Root Bundle Adjustment for Large-Scale Reconstruction

RootBA: Square Root Bundle Adjustment Project Page | Paper | Poster | Video | Code Table of Contents Citation Dependencies Installing dependencies on

Nikolaus Demmel 205 Dec 20, 2022
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

Core ML Tools Use coremltools to convert machine learning models from third-party libraries to the Core ML format. The Python package contains the sup

Apple 3k Jan 08, 2023
Improving adversarial robustness by a coupling rejection strategy

Adversarial Training with Rectified Rejection The code for the paper Adversarial Training with Rectified Rejection. Environment settings and libraries

Tianyu Pang 29 Jan 06, 2023
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

Benedek Rozemberczki 1.8k Jan 07, 2023