Bottleneck Transformers for Visual Recognition
Experiments
| Model | Params (M) | Acc (%) |
| --- | --- | --- |
| ResNet50 baseline (ref) | 23.5 | 93.62 |
| BoTNet-50 | 18.8 | 95.11 |
| BoTNet-S1-50 | 18.8 | 95.67 |
| BoTNet-S1-59 | 27.5 | 95.98 |
| BoTNet-S1-77 | 44.9 | WIP |
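The parameter counts above are given in millions. How they were measured is not stated here; a common way to get a comparable figure in PyTorch is to sum the element counts of all trainable tensors. The helper name `count_params_m` and the toy `nn.Linear` model are illustrative assumptions, not part of this repository.

```python
import torch.nn as nn


def count_params_m(model: nn.Module) -> float:
    """Return the number of trainable parameters, in millions."""
    total = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total / 1e6


# Toy stand-in model (assumption): 1000*1000 weights + 1000 biases.
toy = nn.Linear(1000, 1000)
print(f"{count_params_m(toy):.3f}M")
```

Passing one of the models from this repository instead of `toy` should give a number comparable to the Params (M) column, assuming that column counts trainable parameters only.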
Summary
Usage (example)
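import torch
from model import ResNet50  # assumes model.py exposes ResNet50, as used below

# Build the network for 1000 classes at 224x224 input resolution.
model = ResNet50(num_classes=1000, resolution=(224, 224))
x = torch.randn([2, 3, 224, 224])  # dummy batch of two RGB images
print(model(x).size())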
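# Use the self-attention layer on its own.
from model import MHSA

planes = 512     # number of input channels (example value, an assumption)
resolution = 14  # feature-map height and width seen by the attention layer
mhsa = MHSA(planes, width=resolution, height=resolution)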
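The MHSA layer above is constructed from a channel count plus the feature-map width and height. As a rough illustration of what a layer with that kind of signature might compute, the following is a minimal sketch of 2D multi-head self-attention with learned position embeddings. It is not this repository's implementation: the class name `SketchMHSA`, the `heads` argument, the `pos_h`/`pos_w` parameters, and the 1/sqrt(d) scaling are all assumptions made for the example, and the position term is a simplified stand-in (absolute embeddings factorized over height and width).

```python
import torch
import torch.nn as nn


class SketchMHSA(nn.Module):
    """Hypothetical 2D multi-head self-attention layer (illustrative only)."""

    def __init__(self, planes, width=14, height=14, heads=4):
        super().__init__()
        assert planes % heads == 0, "planes must be divisible by heads"
        self.heads = heads
        self.head_dim = planes // heads
        # 1x1 convolutions produce per-pixel queries, keys, and values.
        self.query = nn.Conv2d(planes, planes, kernel_size=1)
        self.key = nn.Conv2d(planes, planes, kernel_size=1)
        self.value = nn.Conv2d(planes, planes, kernel_size=1)
        # Learned position embeddings, factorized into height and width terms.
        self.pos_h = nn.Parameter(
            torch.randn(1, heads, self.head_dim, height, 1) * 0.02)
        self.pos_w = nn.Parameter(
            torch.randn(1, heads, self.head_dim, 1, width) * 0.02)

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w  # number of spatial positions
        # Flatten the spatial grid: (batch, heads, head_dim, N).
        q = self.query(x).view(b, self.heads, self.head_dim, n)
        k = self.key(x).view(b, self.heads, self.head_dim, n)
        v = self.value(x).view(b, self.heads, self.head_dim, n)
        # Content-content logits: (batch, heads, N, N).
        content = torch.matmul(q.transpose(-2, -1), k)
        # Content-position logits; the sum broadcasts to an (H, W) grid.
        pos = (self.pos_h + self.pos_w).view(1, self.heads, self.head_dim, n)
        position = torch.matmul(q.transpose(-2, -1), pos)
        # Softmax over the key positions (last dimension).
        attn = torch.softmax((content + position) / self.head_dim ** 0.5, dim=-1)
        # Weighted sum of values for every query position.
        out = torch.matmul(v, attn.transpose(-2, -1))
        return out.reshape(b, c, h, w)


layer = SketchMHSA(planes=64, width=14, height=14)
print(layer(torch.randn(2, 64, 14, 14)).shape)
```

The output keeps the input's shape, so a layer like this can stand in for a spatial convolution inside a bottleneck block. Note that the sketch only accepts feature maps whose size matches the `width` and `height` given at construction, because the position embeddings are allocated for that grid.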
Reference
Paper: Bottleneck Transformers for Visual Recognition
Authors: Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
Organization: UC Berkeley, Google Research