CVNets: A library for training computer vision networks

This repository contains the source code for training computer vision models. Specifically, it contains the source code of the MobileViT paper for the following tasks:

Image classification on the ImageNet dataset
Object detection using SSD
Semantic segmentation using DeepLabv3

Note: Any image classification backbone can be used with the object detection and semantic segmentation models.

Training can be done with two samplers:

Standard distributed sampler
Multi-scale distributed sampler

We recommend using the multi-scale sampler, as it improves generalization and leads to better performance. See MobileViT for details.
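To make the idea of multi-scale sampling concrete, here is a minimal single-process sketch. It is not the CVNets sampler: the function name `multi_scale_batches`, the list of resolutions, and the base batch size are all illustrative assumptions. It assumes that each batch draws one spatial resolution at random and scales the batch size so the number of pixels per batch stays roughly constant.

```python
import random


def multi_scale_batches(num_samples, base_size=(256, 256), base_batch=32,
                        sizes=((160, 160), (192, 192), (256, 256),
                               (288, 288), (320, 320)),
                        seed=0):
    """Yield (height, width, indices) tuples.

    Hypothetical helper for illustration only; not part of the CVNets API.
    """
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)

    base_h, base_w = base_size
    start = 0
    while start < num_samples:
        # Pick one resolution for the whole batch.
        h, w = rng.choice(sizes)
        # Keep pixels per batch roughly constant:
        # smaller images give larger batches, larger images give smaller ones.
        batch = max(1, (base_h * base_w * base_batch) // (h * w))
        yield h, w, indices[start:start + batch]
        start += batch


if __name__ == "__main__":
    for h, w, idx in multi_scale_batches(200):
        print(f"{h}x{w}: {len(idx)} samples")
```

A distributed variant would additionally shard the indices across processes; that part is omitted here to keep the sketch short.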

Installation

CVNets can be installed in the local Python environment using the commands below:

    git clone git@github.com:apple/ml-cvnets.git
    cd ml-cvnets
    pip install -r requirements.txt
    pip install --editable .

We recommend using Python 3.6+ and PyTorch (version >= v1.8.0) in a conda environment. For setting up a Python environment with conda, see here.
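As a quick sanity check of the recommended versions, a small script along these lines may help. This is only a sketch: the helper names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical and not part of CVNets, and the parser assumes version strings that begin with dotted numbers (for example `1.8.0+cu111`).

```python
import re
import sys


def version_tuple(version):
    """Turn 'v1.8.0' or '1.10.0+cu113' into a 3-tuple of ints.

    Anything after '+' is ignored; missing components are padded with zeros.
    """
    numbers = re.findall(r"\d+", version.split("+")[0])
    parts = [int(n) for n in numbers[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


def check_environment():
    """Return (python_ok, torch_ok) for the versions recommended above."""
    python_ok = sys.version_info[:2] >= (3, 6)
    try:
        import torch
        torch_ok = meets_minimum(torch.__version__, "1.8.0")
    except ImportError:
        # PyTorch is not installed in this environment.
        torch_ok = False
    return python_ok, torch_ok


if __name__ == "__main__":
    python_ok, torch_ok = check_environment()
    print(f"Python >= 3.6: {python_ok}")
    print(f"PyTorch >= 1.8.0: {torch_ok}")
```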

Getting Started

General instructions for training and evaluating different models are given here.
Examples for training and evaluating a specific model are provided in the examples folder. Right now, we support the following models.
For converting PyTorch models to CoreML, see README-pytorch-to-coreml.md.

Citation

If you find our work useful, please cite the following paper:

@article{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  journal={arXiv preprint arXiv:2110.02178},
  year={2021}
}


