Keras attention models including botnet, CoaT, CoAtNet, CMT, cotnet, halonet, resnest, resnext, resnetd, volo, mlp-mixer, resmlp, gmlp, levit

Overview

Keras_cv_attention_models


Usage

Basic Usage

  • Currently in progress: CMT and CoAtNet training.
  • Install as pip package:
    pip install -U keras-cv-attention-models
    # Or
    pip install -U git+https://github.com/leondgarse/keras_cv_attention_models
    Refer to each subdirectory for detailed usage.
  • Basic model prediction
    from keras_cv_attention_models import volo
    mm = volo.VOLO_d1(pretrained="imagenet")
    
    """ Run predict """
    import tensorflow as tf
    from tensorflow import keras
    from skimage.data import chelsea
    img = chelsea() # Chelsea the cat
    imm = keras.applications.imagenet_utils.preprocess_input(img, mode='torch')
    pred = mm(tf.expand_dims(tf.image.resize(imm, mm.input_shape[1:3]), 0)).numpy()
    pred = tf.nn.softmax(pred).numpy()  # If classifier activation is not softmax
    print(keras.applications.imagenet_utils.decode_predictions(pred)[0])
    # [('n02124075', 'Egyptian_cat', 0.9692954),
    #  ('n02123045', 'tabby', 0.020203391),
    #  ('n02123159', 'tiger_cat', 0.006867502),
    #  ('n02127052', 'lynx', 0.00017674894),
    #  ('n02123597', 'Siamese_cat', 4.9493494e-05)]
  • Exclude the model's top layers by setting num_classes=0
    from keras_cv_attention_models import resnest
    mm = resnest.ResNest50(num_classes=0)
    print(mm.output_shape)
    # (None, 7, 7, 2048)
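
The softmax step and the top-k ranking used in the basic prediction example can be sketched in plain Python, independent of TensorFlow. This is only an illustration: the logits below are made-up values, not outputs of any model in this package, and `softmax` and `top_k` are hypothetical helper names.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; the result is unchanged.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits for four classes (illustration only).
labels = ["Egyptian_cat", "tabby", "tiger_cat", "lynx"]
logits = [6.0, 2.0, 1.0, -2.0]
probs = softmax(logits)
best = top_k(probs, labels, k=2)
print(best)
```

Applying softmax a second time to values that are already probabilities flattens them, which is why the example applies it only when the classifier activation is not softmax.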

Layers

  • attention_layers consists only of an __init__.py, which imports the core layers defined in the model architectures, such as RelativePositionalEmbedding from botnet and outlook_attention from volo.
import tensorflow as tf
from keras_cv_attention_models import attention_layers
aa = attention_layers.RelativePositionalEmbedding()
print(f"{aa(tf.ones([1, 4, 14, 16, 256])).shape = }")
# aa(tf.ones([1, 4, 14, 16, 256])).shape = TensorShape([1, 4, 14, 16, 14, 16])
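
Judging only from the shapes printed above, the layer maps a query of shape [batch, heads, height, width, key_dim] to one score per pair of query and key positions. The sketch below reproduces that shape arithmetic; `relative_logits_shape` is a hypothetical helper, and it assumes the key grid has the same height and width as the query grid.

```python
def relative_logits_shape(query_shape):
    """Map [batch, heads, height, width, key_dim] to the logits shape."""
    batch, heads, height, width, _key_dim = query_shape
    # Each query position (height, width) gets one score per key position,
    # so the trailing key_dim axis is replaced by a (height, width) pair.
    return (batch, heads, height, width, height, width)

print(relative_logits_shape((1, 4, 14, 16, 256)))
# (1, 4, 14, 16, 14, 16)
```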

Model surgery

  • model_surgery includes functions for changing model parameters after the model is built.
from tensorflow import keras
from keras_cv_attention_models import model_surgery
# Replace all ReLU with PReLU
mm = model_surgery.replace_ReLU(keras.applications.ResNet50(), target_activation='PReLU')
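
The general idea behind this kind of post-build surgery can be sketched on a toy layer list: walk the layer descriptions, swap one layer type for another, and leave everything else untouched. This is not the implementation of `replace_ReLU`; the dict layout only imitates a Keras-style config, `replace_activation` is a hypothetical name, and rebuilding a model from the edited list is omitted.

```python
import copy

def replace_activation(layer_configs, source="ReLU", target="PReLU"):
    """Return a copy of the layer list with `source` layers swapped for `target`."""
    updated = copy.deepcopy(layer_configs)
    for layer in updated:
        if layer["class_name"] == source:
            layer["class_name"] = target
            # Keep only the name; the target layer has its own parameters.
            layer["config"] = {"name": layer["config"]["name"]}
    return updated

# Toy layer list in a Keras-config-like shape (hypothetical, not a real model).
layers = [
    {"class_name": "Conv2D", "config": {"name": "conv1", "filters": 64}},
    {"class_name": "ReLU", "config": {"name": "relu1", "max_value": None}},
]
swapped = replace_activation(layers)
print([layer["class_name"] for layer in swapped])
```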

AotNet

  • Keras AotNet is a ResNet / ResNetV2-like framework whose parameters, such as attn_types and se_ratio, control which types of attention layer are applied.
    # Mix se, outlook, halo, mhsa and cot_attention; 21M parameters
    # 50 is an arbitrary number chosen to be larger than the corresponding `num_block`
    from keras_cv_attention_models import aotnet
    attn_types = [None, "outlook", ["mhsa", "halo"] * 50, "cot"]
    se_ratio = [0.25, 0, 0, 0]
    mm = aotnet.AotNet50V2(attn_types=attn_types, se_ratio=se_ratio, deep_stem=True, strides=1)
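
One way to read the `attn_types` list above is as one entry per stage, where a single value applies to every block of that stage and a nested list is indexed by block. The sketch below expands such a spec under that assumption; `resolve_per_block` is a hypothetical helper, and the actual handling inside aotnet may differ.

```python
def resolve_per_block(spec, num_blocks):
    """Expand a per-stage spec into one entry per block for every stage."""
    resolved = []
    for stage_spec, depth in zip(spec, num_blocks):
        if isinstance(stage_spec, (list, tuple)):
            # A list is indexed by block id, so it must cover the whole stage.
            if len(stage_spec) < depth:
                raise ValueError("per-block list is shorter than the stage depth")
            resolved.append(list(stage_spec[:depth]))
        else:
            # A single value (or None) applies to every block in the stage.
            resolved.append([stage_spec] * depth)
    return resolved

num_blocks = [3, 4, 6, 3]  # ResNet50-style stage depths
attn_types = [None, "outlook", ["mhsa", "halo"] * 50, "cot"]
for stage in resolve_per_block(attn_types, num_blocks):
    print(stage)
```

Under this reading, repeating `["mhsa", "halo"]` 50 times simply guarantees the list is longer than any stage depth, so the third stage alternates the two attention types block by block.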

ResNetD

Model Params Image resolution Top1 Acc Download
ResNet50D 25.58M 224 80.530 resnet50d.h5
ResNet101D 44.57M 224 83.022 resnet101d.h5
ResNet152D 60.21M 224 83.680 resnet152d.h5
ResNet200D 64.69M 224 83.962 resnet200d.h5

ResNeXt

Model Params Image resolution Top1 Acc Download
ResNeXt50 (32x4d) 25M 224 79.768 resnext50_imagenet.h5
- SWSL 25M 224 82.182 resnext50_swsl.h5
ResNeXt50D (32x4d + deep) 25M 224 79.676 resnext50d_imagenet.h5
ResNeXt101 (32x4d) 42M 224 80.334 resnext101_imagenet.h5
- SWSL 42M 224 83.230 resnext101_swsl.h5
ResNeXt101W (32x8d) 89M 224 79.308 resnext101_imagenet.h5
- SWSL 89M 224 84.284 resnext101w_swsl.h5

ResNetQ

Model Params Image resolution Top1 Acc Download
ResNet51Q 35.7M 224 82.36 resnet51q.h5

BotNet

Model Params Image resolution Top1 Acc Download
botnet50 21M 224 77.604 botnet50_imagenet.h5
botnet101 41M 224
botnet152 56M 224

VOLO

Model Params Image resolution Top1 Acc Download
volo_d1 27M 224 84.2 volo_d1_224.h5
volo_d1 ↑384 27M 384 85.2 volo_d1_384.h5
volo_d2 59M 224 85.2 volo_d2_224.h5
volo_d2 ↑384 59M 384 86.0 volo_d2_384.h5
volo_d3 86M 224 85.4 volo_d3_224.h5
volo_d3 ↑448 86M 448 86.3 volo_d3_448.h5
volo_d4 193M 224 85.7 volo_d4_224.h5
volo_d4 ↑448 193M 448 86.8 volo_d4_448.h5
volo_d5 296M 224 86.1 volo_d5_224.h5
volo_d5 ↑448 296M 448 87.0 volo_d5_448.h5
volo_d5 ↑512 296M 512 87.1 volo_d5_512.h5

ResNeSt

Model Params Image resolution Top1 Acc Download
resnest50 28M 224 81.03 resnest50.h5
resnest101 49M 256 82.83 resnest101.h5
resnest200 71M 320 83.84 resnest200.h5
resnest269 111M 416 84.54 resnest269.h5

HaloNet

Model Params Image resolution Top1 Acc
HaloNetH0 6.6M 256 77.9
HaloNetH1 9.1M 256 79.9
HaloNetH2 10.3M 256 80.4
HaloNetH3 12.5M 320 81.9
HaloNetH4 19.5M 384 83.3
- 21k 19.5M 384 85.5
HaloNetH5 31.6M 448 84.0
HaloNetH6 44.3M 512 84.4
HaloNetH7 67.9M 600 84.9

CoTNet

Model Params Image resolution FLOPs Top1 Acc Download
CoTNet-50 22.2M 224 3.3 81.3 cotnet50_224.h5
CoTNeXt-50 30.1M 224 4.3 82.1
SE-CoTNetD-50 23.1M 224 4.1 81.6 se_cotnetd50_224.h5
CoTNet-101 38.3M 224 6.1 82.8 cotnet101_224.h5
CoTNeXt-101 53.4M 224 8.2 83.2
SE-CoTNetD-101 40.9M 224 8.5 83.2 se_cotnetd101_224.h5
SE-CoTNetD-152 55.8M 224 17.0 84.0 se_cotnetd152_224.h5
SE-CoTNetD-152 55.8M 320 26.5 84.6 se_cotnetd152_320.h5

CoAtNet

Model Params Image resolution Top1 Acc
CoAtNet-0 25M 224 81.6
CoAtNet-1 42M 224 83.3
CoAtNet-2 75M 224 84.1
CoAtNet-2, ImageNet-21k pretrain 75M 224 87.1
CoAtNet-3 168M 224 84.5
CoAtNet-3, ImageNet-21k pretrain 168M 224 87.6
CoAtNet-3, ImageNet-21k pretrain 168M 512 87.9
CoAtNet-4, ImageNet-21k pretrain 275M 512 88.1
CoAtNet-4, ImageNet-21K + PT-RA-E150 275M 512 88.56

CMT

Model Params Image resolution Top1 Acc
CMTTiny 9.5M 160 79.2
CMTXS 15.2M 192 81.8
CMTSmall 25.1M 224 83.5
CMTBig 45.7M 256 84.5

CoaT

Model Params Image resolution Top1 Acc Download
CoaTLiteTiny 5.7M 224 77.5 coat_lite_tiny_imagenet.h5
CoaTLiteMini 11M 224 79.1 coat_lite_mini_imagenet.h5
CoaTLiteSmall 20M 224 81.9 coat_lite_small_imagenet.h5
CoaTTiny 5.5M 224 78.3 coat_tiny_imagenet.h5
CoaTMini 10M 224 81.0 coat_mini_imagenet.h5

MLP mixer

Model        Params  Top1 Acc  ImageNet         Imagenet21k         ImageNet SAM
MLPMixerS32  19.1M   68.70
MLPMixerS16  18.5M   73.83
MLPMixerB32  60.3M   75.53                                          b32_imagenet_sam.h5
MLPMixerB16  59.9M   80.00     b16_imagenet.h5  b16_imagenet21k.h5  b16_imagenet_sam.h5
MLPMixerL32  206.9M  80.67
MLPMixerL16  208.2M  84.82     l16_imagenet.h5  l16_imagenet21k.h5
- input 448  208.2M  86.78
MLPMixerH14  432.3M  86.32
- input 448  432.3M  87.94

ResMLP

Model Params Image resolution Top1 Acc ImageNet
ResMLP12 15M 224 77.8 resmlp12_imagenet.h5
ResMLP24 30M 224 80.8 resmlp24_imagenet.h5
ResMLP36 116M 224 81.1 resmlp36_imagenet.h5
ResMLP_B24 129M 224 83.6 resmlp_b24_imagenet.h5
- imagenet22k 129M 224 84.4 resmlp_b24_imagenet22k.h5

GMLP

Model Params Image resolution Top1 Acc ImageNet
GMLPTiny16 6M 224 72.3
GMLPS16 20M 224 79.6 gmlp_s16_imagenet.h5
GMLPB16 73M 224 81.6

LeViT

Model Params Image resolution Top1 Acc ImageNet
LeViT128S 7.8M 224 76.6 levit128s_imagenet.h5
LeViT128 9.2M 224 78.6 levit128_imagenet.h5
LeViT192 11M 224 80.0 levit192_imagenet.h5
LeViT256 19M 224 81.6 levit256_imagenet.h5
LeViT384 39M 224 82.6 levit384_imagenet.h5

Other implemented keras models


Comments
  • TPU support for VOLO

    While trying VOLO on TPU, I'm getting this error. Any idea how to resolve it?

    InvalidArgumentError: 9 root error(s) found.
      (0) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_127]]
      (1) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_103]]
      (2) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929 ... [truncated]
    
    enhancement 
    opened by awsaf49 14
  • Use YoloR with swin transformer as backbone.

    @leondgarse I am trying to run inference using yolor with a swin backbone, but I am getting the following results. What could be the issue?

    from keras_cv_attention_models import efficientnet, yolor
    from keras_cv_attention_models import swin_transformer_v2
    
    bb = swin_transformer_v2.SwinTransformerV2Small_window16(input_shape=(256, 256, 3), num_classes=1000)
    model = yolor.YOLOR(backbone=bb)
    
    from keras_cv_attention_models import test_images
    imm = test_images.dog_cat()
    preds = model(model.preprocess_input(imm))
    bboxs, labels, confidences = model.decode_predictions(preds)[0]
    
    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, labels, confidences)
    

    [Image: resulting output]

    opened by farazBhatti 10
  • MobileViT

    I tried to run the MobileViT_S model with input shape (256, 256, 3) and got the following error:

    UnimplementedError Traceback (most recent call last)
    in ()
          2
          3 history = model.fit(get_training_dataset_with_oversample(repeat_dataset=True, oversample=True), steps_per_epoch=STEPS_PER_EPOCH, epochs=EPOCHS,
    ----> 4 validation_data=get_validation_dataset(), validation_steps=VALIDATION_STEPS)
          5

    1 frames
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _numpy(self)
       1189 return self._numpy_internal()
       1190 except core._NotOkStatusException as e: # pylint: disable=protected-access
    -> 1191 raise core._status_to_exception(e) from None # pylint: disable=protected-access
       1192
       1193 @property

    UnimplementedError: 9 root error(s) found.
      (0) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
      (1) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
         [[while/body/_1/while/strided_slice_35/_445]]
      (2) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
         [[while/body/_1/while/strided_slice_23/_381]]
      (3) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
         [[while/body/_1/while/Pad_8/_407]]
      (4) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]]
         [[while/body/_1/while/Maximum_2/y/_341]]
      (5) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f3 ... [truncated]

    bug good first issue 
    opened by KyloRen1 10
  • [General Questions] Rough estimates for training time for pre-training CoAtNet?

    Hi, 👋 Thanks for such an amazing library and for taking the time to implement so many parts of the CoAtNet paper!

    In your CoAtNet README, you mentioned that you use TPU accelerators. Could you provide a ballpark for how long it took you to train the biggest models, and on which accelerators? I have a task for which I wish to use scaled-up models, but I'd have to pre-train on ImageNet first because I have little data (<5-10M) and want to squeeze maximum accuracy out of fine-tuning.

    I assume there might've been a few bottlenecks also, perhaps data? 🤔 If you could describe your setup, it would be very helpful to my experiments!

    Sorry for bothering you with minor questions, and again thank you for all your work!

    opened by neel04 9
  • Visualize saliency map with the attention models

    It would be great if some functional code could be included for plotting attention maps using the attention models. Such functionality is provided for the vision transformer models at https://github.com/faustomorales/vit-keras. Thanks and looking forward.

    enhancement good first issue 
    opened by sivaramakrishnan-rajaraman 9
  • How to save models ?

    @leondgarse I want to save the models in saved_model format. How can I do that? When I attempt it, I get the following message:

    WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
    

    What can be the solution for this?

    Code:

    import os
    from keras_cv_attention_models import mobilevit
    pretrained = '/content/mobilevit_xxs_imagenet.h5'
    model = mobilevit.MobileViT_XXS(pretrained=pretrained)
    model.save('mobilevit_xxs_imagenet1k')
    
    opened by sayannath 7
  • The order of height and width seems wrong in `tf.meshgrid(range(height), range(width))`

    In Line 44 of beit.py, you use tf.meshgrid(range(height), range(width)), while it should be tf.meshgrid(range(width), range(height)), isn't it?

    When I ran the code from Line 44 to Line 52 with height=3 and width=4, it gives the output

    [[17 16 15 10  9  8  3  2  1 -4 -5 -6]
     [18 17 16 11 10  9  4  3  2 -3 -4 -5]
     [19 18 17 12 11 10  5  4  3 -2 -3 -4]
     [24 23 22 17 16 15 10  9  8  3  2  1]
     [25 24 23 18 17 16 11 10  9  4  3  2]
     [26 25 24 19 18 17 12 11 10  5  4  3]
     [31 30 29 24 23 22 17 16 15 10  9  8]
     [32 31 30 25 24 23 18 17 16 11 10  9]
     [33 32 31 26 25 24 19 18 17 12 11 10]
     [38 37 36 31 30 29 24 23 22 17 16 15]
     [39 38 37 32 31 30 25 24 23 18 17 16]
     [40 39 38 33 32 31 26 25 24 19 18 17]], shape=(12, 12), dtype=int32)
    

    which seems incorrect.

    Of course, this is not a problem if you assume height==width, but I think tf.meshgrid(range(width), range(height)) gives more readability and can potentially prevent bugs if height != width is supported in the future.

    bug enhancement 
    opened by xskxzr 6
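
The shape behaviour discussed in the issue above can be checked without TensorFlow. The sketch below is a pure-Python stand-in for a two-input meshgrid, assuming the default 'xy' indexing of `tf.meshgrid`, where the first argument runs along columns and the second along rows; `meshgrid_xy` and `shape` are hypothetical helpers.

```python
def meshgrid_xy(xs, ys):
    """Pure-Python stand-in for a 2-input meshgrid with 'xy' indexing."""
    xs, ys = list(xs), list(ys)
    # Both outputs have len(ys) rows and len(xs) columns.
    grid_x = [[x for x in xs] for _ in ys]
    grid_y = [[y for _ in xs] for y in ys]
    return grid_x, grid_y

def shape(grid):
    """Return (rows, columns) of a nested list."""
    return (len(grid), len(grid[0]))

height, width = 3, 4
a, b = meshgrid_xy(range(height), range(width))
print(shape(a))  # (4, 3): rows follow the second argument
c, d = meshgrid_xy(range(width), range(height))
print(shape(c))  # (3, 4): the (height, width) layout
```

With 'xy' indexing, passing `range(width)` first is what yields grids laid out as (height, width), which is the ordering the issue argues for.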
  • Training of YoloXS Model on Coco dataset

    Hi, I am currently reproducing the COCO training of the YoloXS model with the command below:

    python leondgarse/coco_train_script.py --det_header yolox.YOLOXS --data_name coco/2014 --batch_size 16

    After training for 30 epochs, I am getting poor results, as shown below:

    # Show result
    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, labels, confidences, num_classes=80)
    

    [Image: resulting output]

    Is anything configured incorrectly? Or do you have any suggestions for what I could change? Thanks!

    opened by ThePaperFish 5
  • Update for EdgeNeXt

    I reproduced EdgeNeXt based on the torch implementation and your project. Is there any mistake in this code? Why doesn't it show the details of all layers? It looks like some layers are missing from the "summary".

    import tensorflow as tf
    from tensorflow import keras
    from keras_cv_attention_models.common_layers import (
        layer_norm, activation_by_name
    )
    from tensorflow.keras import initializers
    from keras_cv_attention_models.attention_layers import (
        conv2d_no_bias,
        drop_block,
    )
    import math
    
    BATCH_NORM_DECAY = 0.9
    BATCH_NORM_EPSILON = 1e-5
    TF_BATCH_NORM_EPSILON = 0.001
    LAYER_NORM_EPSILON = 1e-5
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class PositionalEncodingFourier(keras.layers.Layer):
        def __init__(self, hidden_dim=32, dim=768, temperature=10000):
            super(PositionalEncodingFourier, self).__init__()
            self.token_projection = tf.keras.layers.Conv2D(dim, kernel_size=1)
            self.scale = 2 * math.pi
            self.temperature = temperature
            self.hidden_dim = hidden_dim
            self.dim = dim
            self.eps = 1e-6
    
        def __call__(self, B, H, W, *args, **kwargs):
            mask_tf = tf.zeros([B, H, W])
            not_mask_tf = 1 - mask_tf
            y_embed_tf = tf.cumsum(not_mask_tf, axis=1)
            x_embed_tf = tf.cumsum(not_mask_tf, axis=2)
            y_embed_tf = y_embed_tf / (y_embed_tf[:, -1:, :] + self.eps) * self.scale  # 2 * math.pi
            x_embed_tf = x_embed_tf / (x_embed_tf[:, :, -1:] + self.eps) * self.scale  # 2 * math.pi
            dim_t_tf = tf.range(self.hidden_dim, dtype=tf.float32)
            dim_t_tf = self.temperature ** (2 * (dim_t_tf // 2) / self.hidden_dim)
            pos_x_tf = x_embed_tf[:, :, :, None] / dim_t_tf
            pos_y_tf = y_embed_tf[:, :, :, None] / dim_t_tf
            pos_x_tf = tf.reshape(tf.stack([tf.math.sin(pos_x_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_x_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_y_tf = tf.reshape(tf.stack([tf.math.sin(pos_y_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_y_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_tf = tf.concat([pos_y_tf, pos_x_tf], axis=-1)
            pos_tf = self.token_projection(pos_tf)
    
            return pos_tf
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"token_projection": self.token_projection, "scale": self.scale,
                                "temperature": self.temperature, "hidden_dim": self.hidden_dim,
                                "dim": self.dim, "eps": self.eps})
            return base_config
    
    
    def EdgeNeXt(input_shape=(256, 256, 3), depths=[3, 3, 9, 3], dims=[24, 48, 88, 168],
                 global_block=[0, 0, 0, 3], global_block_type=['None', 'None', 'None', 'SDTA'],
                 drop_path_rate=1, layer_scale_init_value=1e-6, head_init_scale=1., expan_ratio=4,
                 kernel_sizes=[7, 7, 7, 7], heads=[8, 8, 8, 8], use_pos_embd_xca=[False, False, False, False],
                 use_pos_embd_global=False, d2_scales=[2, 3, 4, 5], epsilon=1e-6, model_name='EdgeNeXt'):
        inputs = keras.layers.Input(input_shape, batch_size=2)
    
        nn = conv2d_no_bias(inputs, dims[0], kernel_size=4, strides=4, padding="valid", name="stem_")
        nn = layer_norm(nn, epsilon=epsilon, name='stem_')
    
        drop_connect_rates = tf.linspace(0, stop=drop_path_rate, num=int(
            sum(depths)))  # drop_connect_rates_split(num_blocks, start=0.0, end=drop_connect_rate)
        cur = 0
        for i in range(4):
            for j in range(depths[i]):
                if j > depths[i] - global_block[i] - 1:
                    if global_block_type[i] == 'SDTA':
                        SDTA_encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                     expan_ratio=expan_ratio, scales=d2_scales[i],
                                     use_pos_emb=use_pos_embd_xca[i], num_heads=heads[i], name='stage_'+str(i)+'_SDTA_encoder_'+str(j))(nn)
                    else:
                        raise NotImplementedError
                else:
                    if i != 0 and j == 0:
                        nn = layer_norm(nn, epsilon=epsilon, name='stage_' + str(i) + '_')
                        nn = conv2d_no_bias(nn, dims[i], kernel_size=2, strides=2, padding="valid",
                                            name='stage_' + str(i) + '_')
    
                    Conv_Encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                 layer_scale_init_value=layer_scale_init_value,
                                 expan_ratio=expan_ratio, kernel_size=kernel_sizes[i], name='stage_'+str(i)+'_Conv_Encoder_'+str(j) + '_')(nn)  # drop_connect_rates[cur + j]
    
        model = keras.models.Model(inputs, nn, name=model_name)
        return model
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class Conv_Encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4, kernel_size=7, epsilon=1e-6,
                     name=''):
    
            super(Conv_Encoder, self).__init__()
            self.encoder_name = name
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_path = drop_path
            self.dim = dim
            self.expan_ratio = expan_ratio
            self.kernel_size = kernel_size
            self.epsilon = epsilon
    
        def __call__(self, x, *args, **kwargs):
            inputs = x
            x = keras.layers.Conv2D(self.dim, kernel_size=self.kernel_size, padding="SAME", name=self.encoder_name +'Conv2D')(x)
            x = layer_norm(x, epsilon=self.epsilon, name=self.encoder_name)
            x = keras.layers.Dense(self.expan_ratio * self.dim)(x)
            x = activation_by_name(x, activation="gelu")
            x = keras.layers.Dense(self.dim)(x)
            if self.gamma is not None:
                x = self.gamma * x
    
            x = inputs + drop_block(x, drop_rate=0.)
    
            return x
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"gamma": self.gamma, "drop_path": self.drop_path,
                                "dim": self.dim, "expan_ratio": self.expan_ratio,
                                "kernel_size": self.kernel_size})
            return base_config
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class SDTA_encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4,
                     use_pos_emb=True, num_heads=8, qkv_bias=True, attn_drop=0., drop=0., scales=1, zero_gamma=False,
                     activation='gelu', use_bias=False, name='sdf'):
            super(SDTA_encoder, self).__init__()
            self.expan_ratio = expan_ratio
            self.width = max(int(math.ceil(dim / scales)), int(math.floor(dim // scales)))
            self.width_list = [self.width] * (scales - 1)
            self.width_list.append(dim - self.width * (scales - 1))
            self.dim = dim
            self.scales = scales
            if scales == 1:
                self.nums = 1
            else:
                self.nums = scales - 1
            self.pos_embd = None
            if use_pos_emb:
                self.pos_embd = PositionalEncodingFourier(dim=dim)
            self.xca = XCA(dim, num_heads=num_heads, qkv_bias=qkv_bias, attn_drop=attn_drop, proj_drop=drop)
            self.gamma_xca = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                         name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_rate = drop_path
            self.drop_path = keras.layers.Dropout(drop_path)
            gamma_initializer = tf.zeros_initializer() if zero_gamma else tf.ones_initializer()
            self.norm = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                        name=name and name + "ln")
            self.norm_xca = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                            name=name and name + "norm_xca")
            self.activation = activation
            self.use_bias = use_bias
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"width": self.width, "dim": self.dim,
                                "nums": self.nums, "pos_embd": self.pos_embd,
                                "xca": self.xca, "gamma_xca": self.gamma_xca,
                                "gamma": self.gamma, "norm": self.norm,
                                "activation": self.activation, "use_bias": self.use_bias,
                                })
            return base_config
    
        def __call__(self, inputs, *args, **kwargs):
            x = inputs
            spx = tf.split(inputs, self.width_list, axis=-1)
            for i in range(self.nums):
                if i == 0:
                    sp = spx[i]
                else:
                    sp = sp + spx[i]
                sp = keras.layers.Conv2D(self.width, kernel_size=3, padding='SAME')(sp)  # , groups=self.width
                if i == 0:
                    out = sp
                else:
                    out = tf.concat([out, sp], -1)
            inputs = tf.concat([out, spx[self.nums]], -1)
    
            # XCA
            B, H, W, C = inputs.shape
            inputs = tf.reshape(inputs, (-1, H * W, C))  # tf.transpose(), perm=[0, 2, 1])
    
            if self.pos_embd:
                pos_encoding = tf.reshape(self.pos_embd(B, H, W), (-1, H * W, C))
                inputs += pos_encoding
    
            if self.gamma_xca is not None:
                inputs = self.gamma_xca * inputs
            input_xca = self.gamma_xca * self.xca(self.norm_xca(inputs))
            inputs = inputs + drop_block(input_xca, drop_rate=self.drop_rate, name="SDTA_encoder_")
            inputs = tf.reshape(inputs, (-1, H, W, C))
    
            # Inverted Bottleneck
            inputs = self.norm(inputs)
            inputs = keras.layers.Conv2D(self.expan_ratio * self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            inputs = activation_by_name(inputs, activation=self.activation)
            inputs = keras.layers.Conv2D(self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            if self.gamma is not None:
                inputs = self.gamma * inputs
    
            x = x + self.drop_path(inputs)
            return x
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class XCA(keras.layers.Layer):
        def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0., name="", **kwargs):
            super().__init__(**kwargs)
            # Keep the constructor arguments so get_config() can return plain, serializable values
            self.dim, self.num_heads, self.qkv_bias = dim, num_heads, qkv_bias
            self.attn_drop_rate, self.proj_drop_rate = attn_drop, proj_drop
            # Learnable per-head temperature; the shape has to be passed as a single tuple
            self.temperature = tf.Variable(tf.ones((num_heads, 1, 1)), trainable=True, name=name + 'gamma')

            self.qkv = keras.layers.Dense(dim * 3, use_bias=qkv_bias)
            self.attn_drop = keras.layers.Dropout(attn_drop)
            self.k_ini = initializers.GlorotUniform()
            self.b_ini = initializers.Zeros()
            self.proj = keras.layers.Dense(dim, name="out",
                                           kernel_initializer=self.k_ini, bias_initializer=self.b_ini)
            self.proj_drop = keras.layers.Dropout(proj_drop)

        def call(self, inputs, training=None):
            # Dynamic batch / token sizes, so the layer also works when the batch dimension is None
            input_shape = tf.shape(inputs)
            batch, tokens, channels = input_shape[0], input_shape[1], inputs.shape[-1]
            qkv = self.qkv(inputs)
            qkv = tf.reshape(qkv, (batch, tokens, 3,
                                   self.num_heads,
                                   channels // self.num_heads))  # [batch, hh * ww, 3, num_heads, dims_per_head]
            qkv = tf.transpose(qkv, perm=[2, 0, 3, 4, 1])  # [3, batch, num_heads, dims_per_head, hh * ww]
            # unstack drops only the leading axis; a bare tf.squeeze would also drop a batch of size 1
            query, key, value = tf.unstack(qkv, axis=0)  # each [batch, num_heads, dims_per_head, hh * ww]

            norm_query = tf.nn.l2_normalize(query, axis=-1, epsilon=1e-6)
            norm_key = tf.nn.l2_normalize(key, axis=-1, epsilon=1e-6)
            # Cross-covariance attention over channels, scaled per head by the temperature
            attn = tf.matmul(norm_query, norm_key, transpose_b=True) * self.temperature

            attn = tf.nn.softmax(attn, axis=-1)
            attn = self.attn_drop(attn, training=training)  # [batch, num_heads, dims_per_head, dims_per_head]

            x = tf.matmul(attn, value)  # [batch, num_heads, dims_per_head, hh * ww]
            x = tf.transpose(x, perm=[0, 3, 1, 2])  # [batch, hh * ww, num_heads, dims_per_head]
            x = tf.reshape(x, (batch, tokens, channels))

            x = self.proj(x)
            x = self.proj_drop(x, training=training)

            return x

        def get_config(self):
            # Return constructor arguments only; layer objects and variables are not serializable
            base_config = super().get_config()
            base_config.update({"dim": self.dim, "num_heads": self.num_heads, "qkv_bias": self.qkv_bias,
                                "attn_drop": self.attn_drop_rate, "proj_drop": self.proj_drop_rate})
            return base_config
    
    
    def edgenext_xx_small(pretrained=False, **kwargs):
        # 1.33M & 260.58M @ 256 resolution
        # 71.23% Top-1 accuracy
        # No AA, Color Jitter=0.4, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=51.66 versus 47.67 for MobileViT_XXS
        # For A100: FPS @ BS=1: 212.13 & @ BS=256: 7042.06 versus FPS @ BS=1: 96.68 & @ BS=256: 4624.71 for MobileViT_XXS
        model = EdgeNeXt(depths=[2, 2, 6, 2], dims=[24, 48, 88, 168], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_x_small(pretrained=False, **kwargs):
        # 2.34M & 538.0M @ 256 resolution
        # 75.00% Top-1 accuracy
        # No AA, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=31.61 versus 28.49 for MobileViT_XS
        # For A100: FPS @ BS=1: 179.55 & @ BS=256: 4404.95 versus FPS @ BS=1: 94.55 & @ BS=256: 2361.53 for MobileViT_XS
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[32, 64, 100, 192], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_small(pretrained=False, **kwargs):
        # 5.59M & 1260.59M @ 256 resolution
        # 79.43% Top-1 accuracy
        # AA=True, No Mixup & Cutmix, DropPath=0.1, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=20.47 versus 18.86 for MobileViT_S
        # For A100: FPS @ BS=1: 172.33 & @ BS=256: 3010.25 versus FPS @ BS=1: 93.84 & @ BS=256: 1785.92 for MobileViT_S
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 304], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    if __name__ == '__main__':
        model = edgenext_small()
        model.summary()
        # from download_and_load import keras_reload_from_torch_model
        # keras_reload_from_torch_model(
        #     r'D:\GitHub\EdgeNeXt\edgenext_small.pth',  # raw string keeps the backslashes literal
        #     keras_model=model,
        #     # tail_align_dict=tail_align_dict,
        #     # full_name_align_dict=full_name_align_dict,
        #     # additional_transfer=additional_transfer,
        #     input_shape=(256, 256),
        #     do_convert=True,
        #     save_name="adaface_ir101_webface4m.h5",
        # )
    
    
    
    ```
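
To make the tensor shapes in the `XCA` layer easier to follow, here is a minimal NumPy sketch of cross-covariance attention. It is not the code from this issue or from the EdgeNeXt repository; the function name `xca_reference` and the example sizes are made up for illustration. It assumes the usual XCA formulation, where the attention map is computed between channels (`dims_per_head x dims_per_head`) rather than between tokens.

```python
import numpy as np

def xca_reference(q, k, v, temperature):
    """Cross-covariance attention on [batch, heads, dims_per_head, tokens] arrays."""
    # L2-normalize queries and keys along the token axis
    q = q / np.maximum(np.linalg.norm(q, axis=-1, keepdims=True), 1e-6)
    k = k / np.maximum(np.linalg.norm(k, axis=-1, keepdims=True), 1e-6)
    # Channel-to-channel attention: [batch, heads, dims_per_head, dims_per_head]
    attn = (q @ np.swapaxes(k, -1, -2)) * temperature
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    out = attn @ v  # [batch, heads, dims_per_head, tokens]
    batch, heads, dims_per_head, tokens = out.shape
    # Move tokens forward before merging heads: [batch, tokens, heads * dims_per_head]
    merged = out.transpose(0, 3, 1, 2).reshape(batch, tokens, heads * dims_per_head)
    return attn, merged

# Hypothetical sizes: batch=2, heads=4, dims_per_head=12, tokens=49
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 2, 4, 12, 49))
attn, out = xca_reference(q, k, v, temperature=np.ones((4, 1, 1)))
print(attn.shape, out.shape)
```

The transpose before the final reshape matters: merging heads directly from `[batch, heads, dims_per_head, tokens]` would give the right shape but mix channel and token values.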
    
    
    
    
    
    opened by whalefa1I 5
  • custom layer issue at tflite conversion

    Hi, thanks for the good references.

    I have implemented MobileViT with your package and tried to convert the trained model into tflite format. During the conversion I got the following error:

    Unknown layer: Addons>GroupNormalization. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details
    

    I tried to add the custom layer name as a parameter when loading the model, but I am still facing the issue.

    model = tf.keras.models.load_model('./checkpoints/model_best.h5', custom_objects={'AttentionLayer': AttentionLayer})
    

    Is there any way to solve this?

    Thanks,
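
The error names `Addons>GroupNormalization`, while the `custom_objects` dict above only has the key `AttentionLayer`. The sketch below is a toy model of name-based layer lookup, not Keras's actual implementation; `deserialize_layer` and the stand-in `GroupNormalization` class are hypothetical. It only illustrates why the dict key has to match the name stored in the saved model. With real Keras, the analogous call would presumably be `custom_objects={"Addons>GroupNormalization": tfa.layers.GroupNormalization}`, assuming `tensorflow_addons` is installed and imported as `tfa`.

```python
class GroupNormalization:
    """Stand-in for the real layer class (hypothetical, for illustration only)."""
    def __init__(self, groups=32):
        self.groups = groups

def deserialize_layer(config, custom_objects=None):
    """Toy lookup: build the layer whose registered name matches config['class_name']."""
    custom_objects = custom_objects or {}
    class_name = config["class_name"]
    if class_name not in custom_objects:
        raise ValueError(f"Unknown layer: {class_name}")
    return custom_objects[class_name](**config["config"])

saved = {"class_name": "Addons>GroupNormalization", "config": {"groups": 8}}

# A key that does not match the saved class name does not help
try:
    deserialize_layer(saved, custom_objects={"AttentionLayer": object})
    wrong_key_error = None
except ValueError as err:
    wrong_key_error = str(err)

# The key has to be the exact registered name reported in the error message
layer = deserialize_layer(saved, custom_objects={"Addons>GroupNormalization": GroupNormalization})
print(wrong_key_error, layer.groups)
```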

    bug good first issue 
    opened by mhyeonsoo 4
  • coat.CoaTMini(input_shape=(200, 240, 1) ...) error: Dimensions must be equal, but are 730 and 677 for ...

    Hi,

    I am trying to train a model with:

        model = coat.CoaTMini(input_shape=(200, 240, 1), num_classes=240, pretrained=None)
    

    but the model cannot be built; it errors out with:

    ValueError: Exception encountered when calling layer "tf.__operators__.add" (type TFOpLambda).
    
    Dimensions must be equal, but are 730 and 677 for '{{node tf.__operators__.add/AddV2}} = AddV2[T=DT_FLOAT](Placeholder, Placeholder_1)' with input shapes: [?,730,216], [?,677,216].
    
    Call arguments received:
      • x=tf.Tensor(shape=(None, 730, 216), dtype=float32)
      • y=tf.Tensor(shape=(None, 677, 216), dtype=float32)
      • name=None
    

    Is something wrong with coat?

    Thanks.
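
One small observation about the two sequence lengths in the error, offered as a hint rather than a diagnosis. Assuming each sequence carries one extra class token (an assumption, not confirmed by the traceback), both remaining counts are perfect squares, while the 200 x 240 input is not square:

```python
import math

# Assumption: one extra (class) token per sequence, so patches = tokens - 1
sides = {}
for tokens in (730, 677):
    patches = tokens - 1
    side = math.isqrt(patches)
    sides[tokens] = side
    print(tokens, patches, side, side * side == patches)
```

This would be consistent with some step treating the token grid as square, but the arithmetic alone does not show where that happens.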

    bug enhancement 
    opened by mw66 4
  • Can you provide the code for converting pytorch weights to tf?

    Hi. Can you provide the code for converting PyTorch weights to TF, for example for beit? I would like to try beitv2's pre-trained weights. Thanks!
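
For context, the script quoted earlier on this page references a `keras_reload_from_torch_model` helper in `download_and_load`; how it works is not shown here. Independently of that helper, the core of any such conversion is a change of weight layout. The following is a minimal NumPy sketch of that step only, with hypothetical function names and example shapes, assuming standard PyTorch `Conv2d` / `Linear` and Keras `Conv2D` / `Dense` layouts:

```python
import numpy as np

def torch_conv_to_keras(weight):
    """PyTorch Conv2d weight [out, in, kh, kw] -> Keras Conv2D kernel [kh, kw, in, out]."""
    return np.transpose(weight, (2, 3, 1, 0))

def torch_linear_to_keras(weight):
    """PyTorch Linear weight [out, in] -> Keras Dense kernel [in, out]."""
    return np.transpose(weight, (1, 0))

# Hypothetical example shapes
conv_w = np.zeros((64, 3, 7, 7), dtype="float32")
dense_w = np.zeros((1000, 768), dtype="float32")
keras_conv = torch_conv_to_keras(conv_w)
keras_dense = torch_linear_to_keras(dense_w)
print(keras_conv.shape, keras_dense.shape)
```

Biases and normalization parameters are one-dimensional and need no transpose; matching layer names between the two models is the part a real conversion tool has to handle.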

    opened by 131404060321 1
  • tflite conversion - GPU/XNNPACK fails

    Hi! Thanks for the great repo! I have converted the EfficientFormer model to tflite. However, applying either the XNNPACK or the GPU delegate fails.

    GPU delegate created.
    INFO: Initialized TensorFlow Lite runtime.
    INFO: Created TensorFlow Lite delegate for GPU.
    Failed to apply GPU delegate.
    Benchmarking failed.

    XNNPACK delegate created.
    INFO: Initialized TensorFlow Lite runtime.
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    Failed to apply XNNPACK delegate.
    Benchmarking failed.

    Do you know what could be the issue? I'm using the latest TensorFlow version for the conversion.

    opened by macsmy 3
Releases(yolov7)