Scenic: A Jax Library for Computer Vision and Beyond

Overview

Scenic

Scenic is a codebase with a focus on research around attention-based models for computer vision. Scenic has been successfully used to develop classification, segmentation, and detection models for multiple modalities including images, video, audio, and multimodal combinations of them.

More precisely, Scenic is a (i) set of shared light-weight libraries solving tasks commonly encountered tasks when training large-scale (i.e. multi-device, multi-host) vision models; and (ii) a number of projects containing fully fleshed out problem-specific training and evaluation loops using these libraries.

Scenic is developed in JAX and uses Flax.

What we offer

Among others Scenic provides

  • Boilerplate code for launching experiments, summary writing, logging, profiling, etc;
  • Optimized training and evaluation loops, losses, metrics, bi-partite matchers, etc;
  • Input-pipelines for popular vision datasets;
  • Baseline models, including strong non-attentional baselines.

Papers using Scenic

Scenic can be used to reproduce the results from the following papers, which were either developed using Scenic, or have been reimplemented in Scenic:

Philosophy

Scenic aims to facilitate rapid prototyping of large-scale vision models. To keep the code simple to understand and extend we prefer forking and copy-pasting over adding complexity or increasing abstraction. Only when functionality proves to be widely useful across many models and tasks it may be upstreamed to Scenic's shared libraries.

Code structure

Shared libraries provided by Scenic are split into:

  • dataset_lib: Implements IO pipelines for loading and pre-processing data for common Computer Vision tasks and benchmarks. All pipelines are designed to be scalable and support multi-host and multi-device setups, taking care of dividing data among multiple hosts, incomplete batches, caching, pre-fetching, etc.
  • model_lib: Provides (i) several abstract model interfaces (e.g. ClassificationModel or SegmentationModel in model_lib.base_models) with task-specific losses and metrics; (ii) neural network layers in model_lib.layers, focusing on efficient implementation of attention and transfomer layers; and (iii) accelerator-friedly implementations of bipartite matching algorithms in model_lib.matchers.
  • train_lib: Provides tools for constructing training loops and implements several example trainers (classification trainer and segmentation trainer).
  • common_lib: Utilities that do not belong anywhere else.

Projects

Models built on top of Scenic exist as separate projects. Model-specific code such as configs, layers, losses, network architectures, or training and evaluation loops exist as separate projects.

Common baselines such as a ResNet or a Visual Transformer (ViT) are implemented in the projects/baselines project. Forking this directory is a good starting point for new projects.

There is no one-fits-all recipe for how much code should be re-used by projects. Projects can fall anywhere on the wide spectrum of code re-use: from defining new configs for an existing model to redefining models, training loop, logging, etc.

Getting started

  • See projects/baselines/README.md for a walk-through baseline models and instructions on how to run the code.
  • If you would like to to contribute to Scenic, please check out the Philisophy, Code structure and Contributing sections. Should your contribution be a part of the shared libraries, please send us a pull request!

Quick start

Download the code from GitHub

git clone https://github.com/google-research/scenic.git
cd scenic
pip install .

and run training for ViT on ImageNet:

python main.py -- \
  --config=projects/baselines/configs/imagenet/imagenet_vit_config.py \
  --workdir=./

Disclaimer: This is not an official Google product.

Owner
Google Research
Google Research
基于AlphaPose的TensorRT加速

1. Requirements CUDA 11.1 TensorRT 7.2.2 Python 3.8.5 Cython PyTorch 1.8.1 torchvision 0.9.1 numpy 1.17.4 (numpy版本过高会出报错 this issue ) python-package s

52 Dec 06, 2022
WPPNets: Unsupervised CNN Training with Wasserstein Patch Priors for Image Superresolution

WPPNets: Unsupervised CNN Training with Wasserstein Patch Priors for Image Superresolution This code belongs to the paper [1] available at https://arx

Fabian Altekrueger 5 Jun 02, 2022
A project to make Amazon Echo respond to sign language using your webcam

Making Alexa respond to Sign Language using Tensorflow.js Try the live demo Read the Blog Post on Tensorflow's Blog Coming Soon Watch the video This p

Abhishek Singh 444 Jan 03, 2023
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Welcome to AirSim AirSim is a simulator for drones, cars and more, built on Unreal Engine (we now also have an experimental Unity release). It is open

Microsoft 13.8k Jan 05, 2023
Official implementation of Unfolded Deep Kernel Estimation for Blind Image Super-resolution.

Unfolded Deep Kernel Estimation for Blind Image Super-resolution Hongyi Zheng, Hongwei Yong, Lei Zhang, "Unfolded Deep Kernel Estimation for Blind Ima

Z80 15 Dec 26, 2022
LSTM-VAE Implementation and Relevant Evaluations

LSTM-VAE Implementation and Relevant Evaluations Before using any file in this repository, please create two directories under the root directory name

Lan Zhang 5 Oct 08, 2022
Code of paper Interact, Embed, and EnlargE (IEEE): Boosting Modality-specific Representations for Multi-Modal Person Re-identification.

Interact, Embed, and EnlargE (IEEE): Boosting Modality-specific Representations for Multi-Modal Person Re-identification We provide the codes for repr

12 Dec 12, 2022
Train a deep learning net with OpenStreetMap features and satellite imagery.

DeepOSM Classify roads and features in satellite imagery, by training neural networks with OpenStreetMap (OSM) data. DeepOSM can: Download a chunk of

TrailBehind, Inc. 1.3k Nov 24, 2022
Implementation of C-RNN-GAN.

Implementation of C-RNN-GAN. Publication: Title: C-RNN-GAN: Continuous recurrent neural networks with adversarial training Information: http://mogren.

Olof Mogren 427 Dec 25, 2022
The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Ren Yurui 261 Jan 09, 2023
TensorFlow-based neural network library

Sonnet Documentation | Examples Sonnet is a library built on top of TensorFlow 2 designed to provide simple, composable abstractions for machine learn

DeepMind 9.5k Jan 07, 2023
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
The 2nd place solution of 2021 google landmark retrieval on kaggle.

Leaderboard, taxonomy, and curated list of few-shot object detection papers.

229 Dec 13, 2022
ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

This is the project page for the paper: ISTR: End-to-End Instance Segmentation via Transformers, Jie Hu, Liujuan Cao, Yao Lu, ShengChuan Zhang, Yan Wa

Jie Hu 182 Dec 19, 2022
EfficientNetV2 implementation using PyTorch

EfficientNetV2-S implementation using PyTorch Train Steps Configure imagenet path by changing data_dir in train.py python main.py --benchmark for mode

Jahongir Yunusov 86 Dec 29, 2022
L-Verse: Bidirectional Generation Between Image and Text

Far beyond learning long-range interactions of natural language, transformers are becoming the de-facto standard for many vision tasks with their power and scalabilty

Kim, Taehoon 102 Dec 21, 2022
验证码识别 深度学习 tensorflow 神经网络

captcha_tf2 验证码识别 深度学习 tensorflow 神经网络 使用卷积神经网络,对字符,数字类型验证码进行识别,tensorflow使用2.0以上 目前项目还在更新中,诸多bug,欢迎提出issue和PR, 希望和你一起共同完善项目。 实例demo 训练过程 优化器选择: Adam

5 Apr 28, 2022
Demo code for ICCV 2021 paper "Sensor-Guided Optical Flow"

Sensor-Guided Optical Flow Demo code for "Sensor-Guided Optical Flow", ICCV 2021 This code is provided to replicate results with flow hints obtained f

10 Mar 16, 2022
RSNA Intracranial Hemorrhage Detection with python

RSNA Intracranial Hemorrhage Detection This is the source code for the first place solution to the RSNA2019 Intracranial Hemorrhage Detection Challeng

24 Nov 30, 2022
Neural networks applied in recognizing guitar chords using python, AutoML.NET with C# and .NET Core

Chord Recognition Demo application The demo application is written in C# with .NETCore. As of July 9, 2020, the only version available is for windows

Andres Mauricio Rondon Patiño 24 Oct 22, 2022