Keras attention models, including botnet, CoaT, CoAtNet, CMT, cotnet, halonet, resnest, resnext, resnetd, volo, mlp-mixer, resmlp, gmlp and levit

Overview

Keras_cv_attention_models


Usage

Basic Usage

  • Currently a work in progress: CMT, CoAtNet training.
  • Install as a pip package:
    pip install -U keras-cv-attention-models
    # Or
    pip install -U git+https://github.com/leondgarse/keras_cv_attention_models
    Refer to each sub-directory for detailed usage.
  • Basic model prediction
    from keras_cv_attention_models import volo
    mm = volo.VOLO_d1(pretrained="imagenet")
    
    """ Run predict """
    import tensorflow as tf
    from tensorflow import keras
    from skimage.data import chelsea
    img = chelsea() # Chelsea the cat
    imm = keras.applications.imagenet_utils.preprocess_input(img, mode='torch')
    pred = mm(tf.expand_dims(tf.image.resize(imm, mm.input_shape[1:3]), 0)).numpy()
    pred = tf.nn.softmax(pred).numpy()  # If classifier activation is not softmax
    print(keras.applications.imagenet_utils.decode_predictions(pred)[0])
    # [('n02124075', 'Egyptian_cat', 0.9692954),
    #  ('n02123045', 'tabby', 0.020203391),
    #  ('n02123159', 'tiger_cat', 0.006867502),
    #  ('n02127052', 'lynx', 0.00017674894),
    #  ('n02123597', 'Siamese_cat', 4.9493494e-05)]
  • Exclude model top layers by setting num_classes=0; a sketch of attaching a custom head follows in the next bullet.
    from keras_cv_attention_models import resnest
    mm = resnest.ResNest50(num_classes=0)
    print(mm.output_shape)
    # (None, 7, 7, 2048)
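  • With the top excluded, the model ends at its final feature map, so a task-specific head can be attached with standard Keras layers. A minimal sketch (the 10-class head below is an illustrative assumption, not part of the library):
    from tensorflow import keras
    from keras_cv_attention_models import resnest

    # Headless backbone, output shape (None, 7, 7, 2048)
    backbone = resnest.ResNest50(num_classes=0)
    # Hypothetical 10-class head on top of the pooled feature map
    nn = keras.layers.GlobalAveragePooling2D()(backbone.output)
    nn = keras.layers.Dropout(0.2)(nn)
    nn = keras.layers.Dense(10, activation="softmax")(nn)
    model = keras.models.Model(backbone.inputs, nn)
    print(model.output_shape)
    # (None, 10)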

Layers

  • attention_layers is an __init__.py only; it imports the core layers defined in the model architectures, such as RelativePositionalEmbedding from botnet and outlook_attention from volo.
import tensorflow as tf
from keras_cv_attention_models import attention_layers

aa = attention_layers.RelativePositionalEmbedding()
print(f"{aa(tf.ones([1, 4, 14, 16, 256])).shape = }")
# aa(tf.ones([1, 4, 14, 16, 256])).shape = TensorShape([1, 4, 14, 16, 14, 16])
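
  • Since attention_layers only re-exports these building blocks, plain Python introspection shows what is available:
from keras_cv_attention_models import attention_layers

# List the exported layer classes and functions
print([name for name in dir(attention_layers) if not name.startswith("_")])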

Model surgery

  • model_surgery includes functions used to change model parameters after the model is built.
from tensorflow import keras
from keras_cv_attention_models import model_surgery

# Replace all ReLU with PReLU
mm = model_surgery.replace_ReLU(keras.applications.ResNet50(), target_activation='PReLU')
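
  • A quick way to confirm the swap is to count the PReLU layers in the returned model, using only standard Keras attributes (this assumes replace_ReLU inserts regular keras.layers.PReLU layers):
from tensorflow import keras

num_prelu = sum(isinstance(layer, keras.layers.PReLU) for layer in mm.layers)
print(f"PReLU layers after surgery: {num_prelu}")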

AotNet

  • Keras AotNet is just a ResNet / ResNetV2-like framework that exposes parameters such as attn_types and se_ratio, which are used to apply different types of attention layers.
    # Mix se, outlook, halo, mhsa and cot_attention, ~21M parameters
    # 50 is just a picked number larger than the relative `num_block`
    from keras_cv_attention_models import aotnet
    attn_types = [None, "outlook", ["mhsa", "halo"] * 50, "cot"]
    se_ratio = [0.25, 0, 0, 0]
    mm = aotnet.AotNet50V2(attn_types=attn_types, se_ratio=se_ratio, deep_stem=True, strides=1)
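
  • A rough sanity check on the "~21M parameters" comment above, using the standard Keras count_params() (the exact figure depends on the chosen configuration):
    print(f"Total parameters: {mm.count_params() / 1e6:.1f}M")
    # Expected to be roughly 21M for the mix above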

ResNetD

| Model      | Params | Image resolution | Top1 Acc | Download      |
| ---------- | ------ | ---------------- | -------- | ------------- |
| ResNet50D  | 25.58M | 224              | 80.530   | resnet50d.h5  |
| ResNet101D | 44.57M | 224              | 83.022   | resnet101d.h5 |
| ResNet152D | 60.21M | 224              | 83.680   | resnet152d.h5 |
| ResNet200D | 64.69M | 224              | 83.962   | resnet200d.h5 |

ResNeXt

| Model                     | Params | Image resolution | Top1 Acc | Download               |
| ------------------------- | ------ | ---------------- | -------- | ---------------------- |
| ResNeXt50 (32x4d)         | 25M    | 224              | 79.768   | resnext50_imagenet.h5  |
| - SWSL                    | 25M    | 224              | 82.182   | resnext50_swsl.h5      |
| ResNeXt50D (32x4d + deep) | 25M    | 224              | 79.676   | resnext50d_imagenet.h5 |
| ResNeXt101 (32x4d)        | 42M    | 224              | 80.334   | resnext101_imagenet.h5 |
| - SWSL                    | 42M    | 224              | 83.230   | resnext101_swsl.h5     |
| ResNeXt101W (32x8d)       | 89M    | 224              | 79.308   | resnext101_imagenet.h5 |
| - SWSL                    | 89M    | 224              | 84.284   | resnext101w_swsl.h5    |

ResNetQ

| Model     | Params | Image resolution | Top1 Acc | Download     |
| --------- | ------ | ---------------- | -------- | ------------ |
| ResNet51Q | 35.7M  | 224              | 82.36    | resnet51q.h5 |

BotNet

| Model     | Params | Image resolution | Top1 Acc | Download             |
| --------- | ------ | ---------------- | -------- | -------------------- |
| botnet50  | 21M    | 224              | 77.604   | botnet50_imagenet.h5 |
| botnet101 | 41M    | 224              |          |                      |
| botnet152 | 56M    | 224              |          |                      |

VOLO

| Model        | Params | Image resolution | Top1 Acc | Download       |
| ------------ | ------ | ---------------- | -------- | -------------- |
| volo_d1      | 27M    | 224              | 84.2     | volo_d1_224.h5 |
| volo_d1 ↑384 | 27M    | 384              | 85.2     | volo_d1_384.h5 |
| volo_d2      | 59M    | 224              | 85.2     | volo_d2_224.h5 |
| volo_d2 ↑384 | 59M    | 384              | 86.0     | volo_d2_384.h5 |
| volo_d3      | 86M    | 224              | 85.4     | volo_d3_224.h5 |
| volo_d3 ↑448 | 86M    | 448              | 86.3     | volo_d3_448.h5 |
| volo_d4      | 193M   | 224              | 85.7     | volo_d4_224.h5 |
| volo_d4 ↑448 | 193M   | 448              | 86.8     | volo_d4_448.h5 |
| volo_d5      | 296M   | 224              | 86.1     | volo_d5_224.h5 |
| volo_d5 ↑448 | 296M   | 448              | 87.0     | volo_d5_448.h5 |
| volo_d5 ↑512 | 296M   | 512              | 87.1     | volo_d5_512.h5 |
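
The ↑384 / ↑448 / ↑512 rows are the same architectures evaluated at larger input resolutions. Loading one of them presumably just means requesting a matching input_shape; a sketch, assuming the pretrained weights are resolved from the input shape as in the basic-usage example:

from keras_cv_attention_models import volo

# Assumption: a 384x384 input_shape selects the volo_d1_384 weights
mm = volo.VOLO_d1(input_shape=(384, 384, 3), pretrained="imagenet")
print(mm.input_shape)
# (None, 384, 384, 3)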

ResNeSt

| Model      | Params | Image resolution | Top1 Acc | Download      |
| ---------- | ------ | ---------------- | -------- | ------------- |
| resnest50  | 28M    | 224              | 81.03    | resnest50.h5  |
| resnest101 | 49M    | 256              | 82.83    | resnest101.h5 |
| resnest200 | 71M    | 320              | 83.84    | resnest200.h5 |
| resnest269 | 111M   | 416              | 84.54    | resnest269.h5 |

HaloNet

| Model     | Params | Image resolution | Top1 Acc |
| --------- | ------ | ---------------- | -------- |
| HaloNetH0 | 6.6M   | 256              | 77.9     |
| HaloNetH1 | 9.1M   | 256              | 79.9     |
| HaloNetH2 | 10.3M  | 256              | 80.4     |
| HaloNetH3 | 12.5M  | 320              | 81.9     |
| HaloNetH4 | 19.5M  | 384              | 83.3     |
| - 21k     | 19.5M  | 384              | 85.5     |
| HaloNetH5 | 31.6M  | 448              | 84.0     |
| HaloNetH6 | 44.3M  | 512              | 84.4     |
| HaloNetH7 | 67.9M  | 600              | 84.9     |

CoTNet

| Model          | Params | Image resolution | FLOPs | Top1 Acc | Download             |
| -------------- | ------ | ---------------- | ----- | -------- | -------------------- |
| CoTNet-50      | 22.2M  | 224              | 3.3   | 81.3     | cotnet50_224.h5      |
| CoTNeXt-50     | 30.1M  | 224              | 4.3   | 82.1     |                      |
| SE-CoTNetD-50  | 23.1M  | 224              | 4.1   | 81.6     | se_cotnetd50_224.h5  |
| CoTNet-101     | 38.3M  | 224              | 6.1   | 82.8     | cotnet101_224.h5     |
| CoTNeXt-101    | 53.4M  | 224              | 8.2   | 83.2     |                      |
| SE-CoTNetD-101 | 40.9M  | 224              | 8.5   | 83.2     | se_cotnetd101_224.h5 |
| SE-CoTNetD-152 | 55.8M  | 224              | 17.0  | 84.0     | se_cotnetd152_224.h5 |
| SE-CoTNetD-152 | 55.8M  | 320              | 26.5  | 84.6     | se_cotnetd152_320.h5 |

CoAtNet

| Model                                | Params | Image resolution | Top1 Acc |
| ------------------------------------ | ------ | ---------------- | -------- |
| CoAtNet-0                            | 25M    | 224              | 81.6     |
| CoAtNet-1                            | 42M    | 224              | 83.3     |
| CoAtNet-2                            | 75M    | 224              | 84.1     |
| CoAtNet-2, ImageNet-21k pretrain     | 75M    | 224              | 87.1     |
| CoAtNet-3                            | 168M   | 224              | 84.5     |
| CoAtNet-3, ImageNet-21k pretrain     | 168M   | 224              | 87.6     |
| CoAtNet-3, ImageNet-21k pretrain     | 168M   | 512              | 87.9     |
| CoAtNet-4, ImageNet-21k pretrain     | 275M   | 512              | 88.1     |
| CoAtNet-4, ImageNet-21K + PT-RA-E150 | 275M   | 512              | 88.56    |

CMT

| Model    | Params | Image resolution | Top1 Acc |
| -------- | ------ | ---------------- | -------- |
| CMTTiny  | 9.5M   | 160              | 79.2     |
| CMTXS    | 15.2M  | 192              | 81.8     |
| CMTSmall | 25.1M  | 224              | 83.5     |
| CMTBig   | 45.7M  | 256              | 84.5     |

CoaT

| Model         | Params | Image resolution | Top1 Acc | Download                    |
| ------------- | ------ | ---------------- | -------- | --------------------------- |
| CoaTLiteTiny  | 5.7M   | 224              | 77.5     | coat_lite_tiny_imagenet.h5  |
| CoaTLiteMini  | 11M    | 224              | 79.1     | coat_lite_mini_imagenet.h5  |
| CoaTLiteSmall | 20M    | 224              | 81.9     | coat_lite_small_imagenet.h5 |
| CoaTTiny      | 5.5M   | 224              | 78.3     | coat_tiny_imagenet.h5       |
| CoaTMini      | 10M    | 224              | 81.0     | coat_mini_imagenet.h5       |

MLP mixer

| Model       | Params | Top1 Acc | ImageNet        | Imagenet21k        | ImageNet SAM        |
| ----------- | ------ | -------- | --------------- | ------------------ | ------------------- |
| MLPMixerS32 | 19.1M  | 68.70    |                 |                    |                     |
| MLPMixerS16 | 18.5M  | 73.83    |                 |                    |                     |
| MLPMixerB32 | 60.3M  | 75.53    |                 |                    | b32_imagenet_sam.h5 |
| MLPMixerB16 | 59.9M  | 80.00    | b16_imagenet.h5 | b16_imagenet21k.h5 | b16_imagenet_sam.h5 |
| MLPMixerL32 | 206.9M | 80.67    |                 |                    |                     |
| MLPMixerL16 | 208.2M | 84.82    | l16_imagenet.h5 | l16_imagenet21k.h5 |                     |
| - input 448 | 208.2M | 86.78    |                 |                    |                     |
| MLPMixerH14 | 432.3M | 86.32    |                 |                    |                     |
| - input 448 | 432.3M | 87.94    |                 |                    |                     |

ResMLP

| Model         | Params | Image resolution | Top1 Acc | ImageNet                  |
| ------------- | ------ | ---------------- | -------- | ------------------------- |
| ResMLP12      | 15M    | 224              | 77.8     | resmlp12_imagenet.h5      |
| ResMLP24      | 30M    | 224              | 80.8     | resmlp24_imagenet.h5      |
| ResMLP36      | 116M   | 224              | 81.1     | resmlp36_imagenet.h5      |
| ResMLP_B24    | 129M   | 224              | 83.6     | resmlp_b24_imagenet.h5    |
| - imagenet22k | 129M   | 224              | 84.4     | resmlp_b24_imagenet22k.h5 |

GMLP

| Model      | Params | Image resolution | Top1 Acc | ImageNet             |
| ---------- | ------ | ---------------- | -------- | -------------------- |
| GMLPTiny16 | 6M     | 224              | 72.3     |                      |
| GMLPS16    | 20M    | 224              | 79.6     | gmlp_s16_imagenet.h5 |
| GMLPB16    | 73M    | 224              | 81.6     |                      |

LeViT

| Model     | Params | Image resolution | Top1 Acc | ImageNet              |
| --------- | ------ | ---------------- | -------- | --------------------- |
| LeViT128S | 7.8M   | 224              | 76.6     | levit128s_imagenet.h5 |
| LeViT128  | 9.2M   | 224              | 78.6     | levit128_imagenet.h5  |
| LeViT192  | 11M    | 224              | 80.0     | levit192_imagenet.h5  |
| LeViT256  | 19M    | 224              | 81.6     | levit256_imagenet.h5  |
| LeViT384  | 39M    | 224              | 82.6     | levit384_imagenet.h5  |

Other implemented keras models


Comments
  • TPU support for VOLO

    While trying VOLO with TPU I'm getting this error; any idea how to resolve this?

    InvalidArgumentError: 9 root error(s) found.
      (0) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_127]]
      (1) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_103]]
      (2) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929 ... [truncated]
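
    The error text itself points to a workaround: let the unsupported ExtractImagePatches op fall back to CPU by enabling soft device placement, at a potential performance cost. A minimal sketch (the TPU resolver / strategy setup is the usual TensorFlow boilerplate and the compile arguments are placeholders):

    import tensorflow as tf
    from keras_cv_attention_models import volo

    # Let ops without a TPU kernel (e.g. ExtractImagePatches) be placed on CPU
    tf.config.set_soft_device_placement(True)

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)

    with strategy.scope():
        model = volo.VOLO_d1(pretrained="imagenet")
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")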
    
    enhancement 
    opened by awsaf49 14
  • Use YoloR with swin transformer as backbone.

    @leondgarse I am trying to run inference using yolor with a swin backbone but am getting the following results. What could be the issue?

    from keras_cv_attention_models import yolor
    from keras_cv_attention_models import swin_transformer_v2

    bb = swin_transformer_v2.SwinTransformerV2Small_window16(input_shape=(256, 256, 3), num_classes=1000)
    model = yolor.YOLOR(backbone=bb)
    
    from keras_cv_attention_models import test_images
    imm = test_images.dog_cat()
    preds = model(model.preprocess_input(imm))
    bboxs, lables, confidences = model.decode_predictions(preds)[0]
    
    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, lables, confidences)
    

    Resulting output (attached image).

    opened by farazBhatti 10
  • MobileViT

    Tried to run the MobileViT_S model with input shape (256, 256, 3) and got the following error:

    UnimplementedError                        Traceback (most recent call last)
    in ()
          2
          3 history = model.fit(get_training_dataset_with_oversample(repeat_dataset=True, oversample=True), steps_per_epoch=STEPS_PER_EPOCH, epochs=EPOCHS,
    ----> 4     validation_data=get_validation_dataset(), validation_steps=VALIDATION_STEPS)
          5

    1 frames
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _numpy(self)
       1189     return self._numpy_internal()
       1190   except core._NotOkStatusException as e:  # pylint: disable=protected-access
    -> 1191     raise core._status_to_exception(e) from None  # pylint: disable=protected-access
       1192
       1193   @property

    UnimplementedError: 9 root error(s) found. (0) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] (1) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/strided_slice_35/_445]] (2) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/strided_slice_23/_381]] (3) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/Pad_8/_407]] (4) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/Maximum_2/y/_341]] (5) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f3 ... [truncated]

    bug good first issue 
    opened by KyloRen1 10
  • [General Questions] Rough estimates for training time for pre-training CoAtNet?

    Hi 👋 Thanks for such an amazing library and for taking the time to implement so many parts of the CoAtNet paper!

    In your CoAtNet README, you mentioned you use TPU accelerators. Could you provide a ballpark for the amount of time it took for you to train the biggest models and the corresponding accelerators? I have a task for which I wish to use scaled-up models, but I'd have to pre-train on Imagenet first because of low data amount (<5-10M) and squeeze out maximum accuracy from fine-tuning.

    I assume there might've been a few bottlenecks also, perhaps data? 🤔 If you could describe your setup, it would be very helpful to my experiments!

    Sorry for bothering you with minor questions, and again thank you for all your work!

    opened by neel04 9
  • Visualize saliency map with the attention models

    It would be great if some functional code could be included for plotting attention maps with the attention models. Such functionality has been provided for the vision transformer models at https://github.com/faustomorales/vit-keras. Thanks, and looking forward to it.

    enhancement good first issue 
    opened by sivaramakrishnan-rajaraman 9
  • How to save models ?

    @leondgarse I want to save the models in saved_model format. How do I do that? When I attempt it, it shows me the error

    WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
    

    What could be the solution for this?

    Code:

    import os
    from keras_cv_attention_models import mobilevit
    pretrained = '/content/mobilevit_xxs_imagenet.h5'
    model = mobilevit.MobileViT_XXS(pretrained=pretrained)
    model.save('mobilevit_xxs_imagenet1k')
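
    For reference, a minimal sketch of the SavedModel round trip; the warning above is generally benign and only notes that compiled metrics are rebuilt once the reloaded model is compiled or trained again (the paths and the pretrained argument here are illustrative):

    import tensorflow as tf
    from keras_cv_attention_models import mobilevit

    model = mobilevit.MobileViT_XXS(pretrained="imagenet")
    # No .h5 suffix, so Keras writes the TF SavedModel format
    model.save("mobilevit_xxs_imagenet1k")

    # Reload; with the SavedModel format, custom layers usually resolve without custom_objects
    reloaded = tf.keras.models.load_model("mobilevit_xxs_imagenet1k")
    print(reloaded.name)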
    
    opened by sayannath 7
  • The order of height and width seems wrong in `tf.meshgrid(range(height), range(width))`

    In Line 44 of beit.py, you use tf.meshgrid(range(height), range(width)), while it should be tf.meshgrid(range(width), range(height)), shouldn't it?

    When I ran the code from Line 44 to Line 52 with height=3 and width=4, it gave the following output:

    [[17 16 15 10  9  8  3  2  1 -4 -5 -6]
     [18 17 16 11 10  9  4  3  2 -3 -4 -5]
     [19 18 17 12 11 10  5  4  3 -2 -3 -4]
     [24 23 22 17 16 15 10  9  8  3  2  1]
     [25 24 23 18 17 16 11 10  9  4  3  2]
     [26 25 24 19 18 17 12 11 10  5  4  3]
     [31 30 29 24 23 22 17 16 15 10  9  8]
     [32 31 30 25 24 23 18 17 16 11 10  9]
     [33 32 31 26 25 24 19 18 17 12 11 10]
     [38 37 36 31 30 29 24 23 22 17 16 15]
     [39 38 37 32 31 30 25 24 23 18 17 16]
     [40 39 38 33 32 31 26 25 24 19 18 17]], shape=(12, 12), dtype=int32)
    

    which seems incorrect.

    Of course, this is not a problem if you assume height == width, but I think tf.meshgrid(range(width), range(height)) is more readable and could prevent bugs if height != width is supported in the future.
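
    For context, a small standalone check of the argument order: with the default indexing="xy", tf.meshgrid returns grids shaped (len(second argument), len(first argument)), so passing (width, height) yields the (height, width)-shaped coordinate grids one usually wants:

    import tensorflow as tf

    height, width = 3, 4
    xx, yy = tf.meshgrid(range(width), range(height))  # default indexing="xy"
    print(xx.shape, yy.shape)
    # (3, 4) (3, 4) -> rows index height, columns index width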

    bug enhancement 
    opened by xskxzr 6
  • Training of YoloXS Model on Coco dataset

    Hi, I am currently reproducing the COCO training of the YoloXS model with the line below:

    python leondgarse/coco_train_script.py --det_header yolox.YOLOXS --data_name coco/2014 --batch_size 16

    After training for 30 epochs, I am getting poor results, as shown below:

    # Show result
    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, labels, confidences, num_classes=80)
    

    (attached result image)

    Did I configure anything wrongly? Or do you have any suggestions on what I could change? Thanks!

    opened by ThePaperFish 5
  • Update for EdgeNeXt

    I reproduced EdgeNeXt based on the torch version and your project. Is there any mistake in this code? Why doesn't it show all layer details? It looks like some layers are missing in `summary`.

    import tensorflow as tf
    from tensorflow import keras
    from keras_cv_attention_models.common_layers import (
        layer_norm, activation_by_name
    )
    from tensorflow.keras import initializers
    from keras_cv_attention_models.attention_layers import (
        conv2d_no_bias,
        drop_block,
    )
    import math
    
    BATCH_NORM_DECAY = 0.9
    BATCH_NORM_EPSILON = 1e-5
    TF_BATCH_NORM_EPSILON = 0.001
    LAYER_NORM_EPSILON = 1e-5
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class PositionalEncodingFourier(keras.layers.Layer):
        def __init__(self, hidden_dim=32, dim=768, temperature=10000):
            super(PositionalEncodingFourier, self).__init__()
            self.token_projection = tf.keras.layers.Conv2D(dim, kernel_size=1)
            self.scale = 2 * math.pi
            self.temperature = temperature
            self.hidden_dim = hidden_dim
            self.dim = dim
            self.eps = 1e-6
    
        def __call__(self, B, H, W, *args, **kwargs):
            mask_tf = tf.zeros([B, H, W])
            not_mask_tf = 1 - mask_tf
            y_embed_tf = tf.cumsum(not_mask_tf, axis=1)
            x_embed_tf = tf.cumsum(not_mask_tf, axis=2)
            y_embed_tf = y_embed_tf / (y_embed_tf[:, -1:, :] + self.eps) * self.scale  # 2 * math.pi
            x_embed_tf = x_embed_tf / (x_embed_tf[:, :, -1:] + self.eps) * self.scale  # 2 * math.pi
            dim_t_tf = tf.range(self.hidden_dim, dtype=tf.float32)
            dim_t_tf = self.temperature ** (2 * (dim_t_tf // 2) / self.hidden_dim)
            pos_x_tf = x_embed_tf[:, :, :, None] / dim_t_tf
            pos_y_tf = y_embed_tf[:, :, :, None] / dim_t_tf
            pos_x_tf = tf.reshape(tf.stack([tf.math.sin(pos_x_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_x_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_y_tf = tf.reshape(tf.stack([tf.math.sin(pos_y_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_y_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_tf = tf.concat([pos_y_tf, pos_x_tf], axis=-1)
            pos_tf = self.token_projection(pos_tf)
    
            return pos_tf
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"token_projection": self.token_projection, "scale": self.scale,
                                "temperature": self.temperature, "hidden_dim": self.hidden_dim,
                                "dim": self.dim, "eps": self.eps})
            return base_config
    
    
    def EdgeNeXt(input_shape=(256, 256, 3), depths=[3, 3, 9, 3], dims=[24, 48, 88, 168],
                 global_block=[0, 0, 0, 3], global_block_type=['None', 'None', 'None', 'SDTA'],
                 drop_path_rate=1, layer_scale_init_value=1e-6, head_init_scale=1., expan_ratio=4,
                 kernel_sizes=[7, 7, 7, 7], heads=[8, 8, 8, 8], use_pos_embd_xca=[False, False, False, False],
                 use_pos_embd_global=False, d2_scales=[2, 3, 4, 5], epsilon=1e-6, model_name='EdgeNeXt'):
        inputs = keras.layers.Input(input_shape, batch_size=2)
    
        nn = conv2d_no_bias(inputs, dims[0], kernel_size=4, strides=4, padding="valid", name="stem_")
        nn = layer_norm(nn, epsilon=epsilon, name='stem_')
    
        drop_connect_rates = tf.linspace(0, stop=drop_path_rate, num=int(
            sum(depths)))  # drop_connect_rates_split(num_blocks, start=0.0, end=drop_connect_rate)
        cur = 0
        for i in range(4):
            for j in range(depths[i]):
                if j > depths[i] - global_block[i] - 1:
                    if global_block_type[i] == 'SDTA':
                        SDTA_encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                     expan_ratio=expan_ratio, scales=d2_scales[i],
                                     use_pos_emb=use_pos_embd_xca[i], num_heads=heads[i], name='stage_'+str(i)+'_SDTA_encoder_'+str(j))(nn)
                    else:
                        raise NotImplementedError
                else:
                    if i != 0 and j == 0:
                        nn = layer_norm(nn, epsilon=epsilon, name='stage_' + str(i) + '_')
                        nn = conv2d_no_bias(nn, dims[i], kernel_size=2, strides=2, padding="valid",
                                            name='stage_' + str(i) + '_')
    
                    Conv_Encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                 layer_scale_init_value=layer_scale_init_value,
                                 expan_ratio=expan_ratio, kernel_size=kernel_sizes[i], name='stage_'+str(i)+'_Conv_Encoder_'+str(j) + '_')(nn)  # drop_connect_rates[cur + j]
    
        model = keras.models.Model(inputs, nn, name=model_name)
        return model
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class Conv_Encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4, kernel_size=7, epsilon=1e-6,
                     name=''):
    
            super(Conv_Encoder, self).__init__()
            self.encoder_name = name
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_path = drop_path
            self.dim = dim
            self.expan_ratio = expan_ratio
            self.kernel_size = kernel_size
            self.epsilon = epsilon
    
        def __call__(self, x, *args, **kwargs):
            inputs = x
            x = keras.layers.Conv2D(self.dim, kernel_size=self.kernel_size, padding="SAME", name=self.encoder_name +'Conv2D')(x)
            x = layer_norm(x, epsilon=self.epsilon, name=self.encoder_name)
            x = keras.layers.Dense(self.expan_ratio * self.dim)(x)
            x = activation_by_name(x, activation="gelu")
            x = keras.layers.Dense(self.dim)(x)
            if self.gamma is not None:
                x = self.gamma * x
    
            x = inputs + drop_block(x, drop_rate=0.)
    
            return x
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"gamma": self.gamma, "drop_path": self.drop_path,
                                "dim": self.dim, "expan_ratio": self.expan_ratio,
                                "kernel_size": self.kernel_size})
            return base_config
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class SDTA_encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4,
                     use_pos_emb=True, num_heads=8, qkv_bias=True, attn_drop=0., drop=0., scales=1, zero_gamma=False,
                     activation='gelu', use_bias=False, name='sdf'):
            super(SDTA_encoder, self).__init__()
            self.expan_ratio = expan_ratio
            self.width = max(int(math.ceil(dim / scales)), int(math.floor(dim // scales)))
            self.width_list = [self.width] * (scales - 1)
            self.width_list.append(dim - self.width * (scales - 1))
            self.dim = dim
            self.scales = scales
            if scales == 1:
                self.nums = 1
            else:
                self.nums = scales - 1
            self.pos_embd = None
            if use_pos_emb:
                self.pos_embd = PositionalEncodingFourier(dim=dim)
            self.xca = XCA(dim, num_heads=num_heads, qkv_bias=qkv_bias, attn_drop=attn_drop, proj_drop=drop)
            self.gamma_xca = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                         name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_rate = drop_path
            self.drop_path = keras.layers.Dropout(drop_path)
            gamma_initializer = tf.zeros_initializer() if zero_gamma else tf.ones_initializer()
            self.norm = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                        name=name and name + "ln")
            self.norm_xca = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                            name=name and name + "norm_xca")
            self.activation = activation
            self.use_bias = use_bias
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"width": self.width, "dim": self.dim,
                                "nums": self.nums, "pos_embd": self.pos_embd,
                                "xca": self.xca, "gamma_xca": self.gamma_xca,
                                "gamma": self.gamma, "norm": self.norm,
                                "activation": self.activation, "use_bias": self.use_bias,
                                })
            return base_config
    
        def __call__(self, inputs, *args, **kwargs):
            x = inputs
            spx = tf.split(inputs, self.width_list, axis=-1)
            for i in range(self.nums):
                if i == 0:
                    sp = spx[i]
                else:
                    sp = sp + spx[i]
                sp = keras.layers.Conv2D(self.width, kernel_size=3, padding='SAME')(sp)  # , groups=self.width
                if i == 0:
                    out = sp
                else:
                    out = tf.concat([out, sp], -1)
            inputs = tf.concat([out, spx[self.nums]], -1)
    
            # XCA
            B, H, W, C = inputs.shape
            inputs = tf.reshape(inputs, (-1, H * W, C))  # tf.transpose(), perm=[0, 2, 1])
    
            if self.pos_embd:
                pos_encoding = tf.reshape(self.pos_embd(B, H, W), (-1, H * W, C))
                inputs += pos_encoding
    
            if self.gamma_xca is not None:
                inputs = self.gamma_xca * inputs
            input_xca = self.gamma_xca * self.xca(self.norm_xca(inputs))
            inputs = inputs + drop_block(input_xca, drop_rate=self.drop_rate, name="SDTA_encoder_")
            inputs = tf.reshape(inputs, (-1, H, W, C))
    
            # Inverted Bottleneck
            inputs = self.norm(inputs)
            inputs = keras.layers.Conv2D(self.expan_ratio * self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            inputs = activation_by_name(inputs, activation=self.activation)
            inputs = keras.layers.Conv2D(self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            if self.gamma is not None:
                inputs = self.gamma * inputs
    
            x = x + self.drop_path(inputs)
            return x
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class XCA(keras.layers.Layer):
        def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0., name=""):
            super(XCA, self).__init__()
            self.num_heads = num_heads
            self.temperature = tf.Variable(tf.ones(num_heads, 1, 1), trainable=True, name=name + 'gamma')
    
            self.qkv = keras.layers.Dense(dim * 3, use_bias=qkv_bias)
            self.attn_drop = keras.layers.Dropout(attn_drop)
            self.k_ini = initializers.GlorotUniform()
            self.b_ini = initializers.Zeros()
            self.proj = keras.layers.Dense(dim, name="out",
                                           kernel_initializer=self.k_ini, bias_initializer=self.b_ini)
            self.proj_drop = keras.layers.Dropout(proj_drop)
    
        def __call__(self, inputs, training=None, *args, **kwargs):
            input_shape = inputs.shape
            qkv = self.qkv(inputs)
            qkv = tf.reshape(qkv, (input_shape[0], input_shape[1], 3,
                                   self.num_heads,
                                   input_shape[2] // self.num_heads))  # [batch, hh * ww, 3, num_heads, dims_per_head]
            qkv = tf.transpose(qkv, perm=[2, 0, 3, 4, 1])  # [3, batch, num_heads, dims_per_head, hh * ww]
            query, key, value = tf.split(qkv, 3, axis=0)  # [batch, num_heads, dims_per_head, hh * ww]
    
            norm_query, norm_key = tf.nn.l2_normalize(tf.squeeze(query), axis=-1, epsilon=1e-6), \
                                   tf.nn.l2_normalize(tf.squeeze(key), axis=-1, epsilon=1e-6)
            attn = tf.matmul(norm_query, norm_key, transpose_b=True)
            attn = tf.transpose(tf.transpose(attn, perm=[0, 2, 3, 1]) * self.temperature, perm=[0, 3, 2, 1])
    
            attn = tf.nn.softmax(attn, axis=-1)
            attn = self.attn_drop(attn, training=training)  # [batch, num_heads, hh * ww, hh * ww]
    
            x = tf.matmul(attn, value)  # [batch, num_heads, hh * ww, dims_per_head]
            x = tf.reshape(x, [input_shape[0], input_shape[1], input_shape[2]])
    
            x = self.proj(x)
            x = self.proj_drop(x)
    
            return x
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"num_heads": self.num_heads, "temperature": self.temperature,
                                "qkv": self.qkv, "attn_drop": self.attn_drop,
                                "proj": self.proj, "proj_drop": self.proj_drop})
            return base_config
    
    
    def edgenext_xx_small(pretrained=False, **kwargs):
        # 1.33M & 260.58M @ 256 resolution
        # 71.23% Top-1 accuracy
        # No AA, Color Jitter=0.4, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=51.66 versus 47.67 for MobileViT_XXS
        # For A100: FPS @ BS=1: 212.13 & @ BS=256: 7042.06 versus FPS @ BS=1: 96.68 & @ BS=256: 4624.71 for MobileViT_XXS
        model = EdgeNeXt(depths=[2, 2, 6, 2], dims=[24, 48, 88, 168], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_x_small(pretrained=False, **kwargs):
        # 2.34M & 538.0M @ 256 resolution
        # 75.00% Top-1 accuracy
        # No AA, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=31.61 versus 28.49 for MobileViT_XS
        # For A100: FPS @ BS=1: 179.55 & @ BS=256: 4404.95 versus FPS @ BS=1: 94.55 & @ BS=256: 2361.53 for MobileViT_XS
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[32, 64, 100, 192], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_small(pretrained=False, **kwargs):
        # 5.59M & 1260.59M @ 256 resolution
        # 79.43% Top-1 accuracy
        # AA=True, No Mixup & Cutmix, DropPath=0.1, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=20.47 versus 18.86 for MobileViT_S
        # For A100: FPS @ BS=1: 172.33 & @ BS=256: 3010.25 versus FPS @ BS=1: 93.84 & @ BS=256: 1785.92 for MobileViT_S
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 304], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    if __name__ == '__main__':
        model = edgenext_small()
        model.summary()
        # from download_and_load import keras_reload_from_torch_model
        # keras_reload_from_torch_model(
        #     'D:\GitHub\EdgeNeXt\edgenext_small.pth',
        #     keras_model=model,
        #     # tail_align_dict=tail_align_dict,
        #     # full_name_align_dict=full_name_align_dict,
        #     # additional_transfer=additional_transfer,
        #     input_shape=(256, 256),
        #     do_convert=True,
        #     save_name="adaface_ir101_webface4m.h5",
        # )
    
    
    
    opened by whalefa1I 5
  • custom layer issue at tflite conversion

    Hi, thanks for the good references.

    I have implemented MobileViT with your package and tried to convert the trained model into tflite format. There, I met an error saying:

    Unknown layer: Addons>GroupNormalization. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details
    

    I tried to add the custom layer name as a parameter when loading the model, but am still facing the issue.

    model = tf.keras.models.load_model('./checkpoints/model_best.h5', custom_objects={'AttentionLayer': AttentionLayer})
    

    Is there any way to solve this?

    Thanks,
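
    One likely direction, since the error names Addons>GroupNormalization, is to register that class explicitly when loading before converting (a sketch assuming the model was built with tensorflow_addons' GroupNormalization; the checkpoint path is the one from the question):

    import tensorflow as tf
    import tensorflow_addons as tfa

    model = tf.keras.models.load_model(
        "./checkpoints/model_best.h5",
        custom_objects={"Addons>GroupNormalization": tfa.layers.GroupNormalization},
    )

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()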

    bug good first issue 
    opened by mhyeonsoo 4
  • coat.CoaTMini(input_shape=(200, 240, 1) ...) error: Dimensions must be equal, but are 730 and 677 for ...

    Hi,

    I tried to train a model with:

        model = coat.CoaTMini(input_shape=(200, 240, 1), num_classes=240, pretrained=None)
    

    but the model cannot be built; it errors out with:

    ValueError: Exception encountered when calling layer "tf.__operators__.add" (type TFOpLambda).
    
    Dimensions must be equal, but are 730 and 677 for '{{node tf.__operators__.add/AddV2}} = AddV2[T=DT_FLOAT](Placeholder, Placeholder_1)' with input shapes: [?,730,216], [?,677,216].
    
    Call arguments received:
      • x=tf.Tensor(shape=(None, 730, 216), dtype=float32)
      • y=tf.Tensor(shape=(None, 677, 216), dtype=float32)
      • name=None
    

    I just wonder, is something wrong with coat?

    Thanks.

    bug enhancement 
    opened by mw66 4
  • Can you provide the code for converting pytorch weights to tf?

    Hi. Can you provide the code for converting pytorch weights to tf, for example for beit? I wanted to try the effect of beitv2's pre-training weights. Thanks!

    opened by 131404060321 1
  • tflite conversion - GPU/XNNPACK fails

    Hi! Thanks for the great repo! I have converted the EfficientFormer model to tflite. However, applying either the XNNPACK or the GPU delegate fails.

    GPU delegate created.
    INFO: Initialized TensorFlow Lite runtime.
    INFO: Created TensorFlow Lite delegate for GPU.
    Failed to apply GPU delegate.
    Benchmarking failed.

    XNNPACK delegate created.
    INFO: Initialized TensorFlow Lite runtime.
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    Failed to apply XNNPACK delegate.
    Benchmarking failed.

    Do you know what could be the issue? I'm using the latest tensorflow version for conversion.

    opened by macsmy 3
Releases (yolov7)