Happywhale - Whale and Dolphin Identification Silver🥈 Solution (26/1588)

Overview

Kaggle-Happywhale

Happywhale - Whale and Dolphin Identification Silver 🥈 Solution (26/1588)

竞赛方案思路

  1. 图像数据预处理-标志性特征图片裁剪:首先根据开源的标注数据训练YOLOv5x6目标检测模型,将训练集与测试集数据裁剪出背鳍或者身体部分;
  2. 背鳍图片特征提取模型:将训练集数据划分为训练与验证两部分,训练 EfficientNet_B6 / EfficientNet_V2_L / NFNet_L2 (backone)三个模型,并且都加上了GeM Pooling 和 Arcface 损失函数,有效增强类内紧凑度和类间分离度;
  3. 聚类与排序:利用最终训练完成的backone模型分别提取训练集与测试集的嵌入特征,所有模型都会输出一个512维的Embedding,将这些特征 concatenated 后获得了一个 512×9=4608 维的特征向量,将训练集的嵌入特征融合后训练KNN模型,然后推断测试集嵌入特征距离,排序获取top5类别,作为预测结果,最后使用new_individual替换进行后处理,得到了top2%的成绩。

Model

class HappyWhaleModel(nn.Module):
    def __init__(self, model_name, embedding_size, pretrained=True):
        super(HappyWhaleModel, self).__init__()
        self.model = timm.create_model(model_name, pretrained=pretrained) 

        if 'efficientnet' in model_name:
            in_features = self.model.classifier.in_features
            self.model.classifier = nn.Identity()
            self.model.global_pool = nn.Identity()
        elif 'nfnet' in model_name:
            in_features = self.model.head.fc.in_features
            self.model.head.fc = nn.Identity()
            self.model.head.global_pool = nn.Identity()

        self.pooling = GeM() 
        self.embedding = nn.Sequential(
                            nn.BatchNorm1d(in_features),
                            nn.Linear(in_features, embedding_size)
                            )
        # arcface
        self.fc = ArcMarginProduct(embedding_size,
                                   CONFIG["num_classes"], 
                                   s=CONFIG["s"],
                                   m=CONFIG["m"], 
                                   easy_margin=CONFIG["easy_margin"], 
                                   ls_eps=CONFIG["ls_eps"]) 

    def forward(self, images, labels):
        features = self.model(images)  
        pooled_features = self.pooling(features).flatten(1)
        embedding = self.embedding(pooled_features) # embedding
        output = self.fc(embedding, labels) # arcface
        return output
    
    def extract(self, images):
        features = self.model(images) 
        pooled_features = self.pooling(features).flatten(1)
        embedding = self.embedding(pooled_features) # embedding
        return embedding

ArcFace

# Arcface
class ArcMarginProduct(nn.Module):
    r"""Implement of large margin arc distance: :
        Args:
            in_features: size of each input sample
            out_features: size of each output sample
            s: norm of input feature
            m: margin
            cos(theta + m)
        """
    def __init__(self, in_features, out_features, s=30.0, 
                 m=0.50, easy_margin=False, ls_eps=0.0):
        super(ArcMarginProduct, self).__init__()
        self.in_features = in_features 
        self.out_features = out_features 
        self.s = s
        self.m = m 
        self.ls_eps = ls_eps 
        self.weight = nn.Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

        self.easy_margin = easy_margin
        self.cos_m = math.cos(m) # cos margin
        self.sin_m = math.sin(m) # sin margin
        self.threshold = math.cos(math.pi - m) # cos(pi - m) = -cos(m)
        self.mm = math.sin(math.pi - m) * m # sin(pi - m)*m = sin(m)*m

    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------
        cosine = F.linear(F.normalize(input), F.normalize(self.weight)) 
        sine = torch.sqrt(1.0 - torch.pow(cosine, 2)) 
        phi = cosine * self.cos_m - sine * self.sin_m # cosθ*cosm – sinθ*sinm = cos(θ + m)
        phi = phi.float() # phi to float
        cosine = cosine.float() # cosine to float
        if self.easy_margin:
            phi = torch.where(cosine > 0, phi, cosine)
        else:
            # if cos(θ) > cos(pi - m) means θ + m < math.pi, so phi = cos(θ + m);
            # else means θ + m >= math.pi, we use Talyer extension to approximate the cos(θ + m).
            # if fact, cos(θ + m) = cos(θ) - m * sin(θ) >= cos(θ) - m * sin(math.pi - m)
            phi = torch.where(cosine > self.threshold, phi, cosine - self.mm)
            
        # https://github.com/ronghuaiyang/arcface-pytorch/issues/48
        # --------------------------- convert label to one-hot ---------------------
        # one_hot = torch.zeros(cosine.size(), requires_grad=True, device='cuda')
        one_hot = torch.zeros(cosine.size(), device=CONFIG['device'])
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # label smoothing
        if self.ls_eps > 0:
            one_hot = (1 - self.ls_eps) * one_hot + self.ls_eps / self.out_features
        # -------------torch.where(out_i = {x_i if condition_i else y_i) ------------
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)  
        output *= self.s

        return output

冲榜历程

  1. 使用Yolov5切分 fullbody数据 和 backfins数据;
  2. 使用小模型tf_efficientnet_b0_ns + ArcFace 作为 Baseline,训练fullbody 512size, 使用kNN 搜寻,搭建初步的pipeline,Public LB : 0.729;
  3. 加入new_individual后处理,Public LB : 0.742;
  4. 使用fullbody 768size图像,并调整了数据增强, Public LB : 0.770;
  5. 训练 tf_efficientnet_b6_ns ,以及上述所有功能微调,Public LB:0.832;
  6. 训练 tf_efficientnetv2_l_in21k,以及上述所有功能微调,Public LB:0.843;
  7. 训练 eca_nfnet_l2,以及上述所有功能微调,Public LB:0.854;
  8. 将上述三个模型的5Fold,挑选cv高的,进行融合,Public LB:0.858;

代码、数据集

  • 代码

    • Happywhale_crop_image.ipynb # 裁切fullbody数据和backfin数据
    • Happywhale_train.ipynb # 训练代码 (最低要求GPU显存不小于12G)
    • Happywhale_infernce.ipynb # 推理代码以及kNN计算和后处理
  • 数据集

写在后面

感谢我的队友徐哥和他的3090们 🤣

Owner
Franxx
Franxx
ReferFormer - Official Implementation of ReferFormer

The official implementation of the paper: Language as Queries for Referring Video Object Segmentation Language as Queries for Referring Video Object S

Jonas Wu 232 Dec 29, 2022
Migration of Edge-based Distributed Federated Learning

FedFly: Towards Migration in Edge-based Distributed Federated Learning About the research Due to mobility, a device participating in Federated Learnin

qub-blesson 11 Nov 13, 2022
Multi-Output Gaussian Process Toolkit

Multi-Output Gaussian Process Toolkit Paper - API Documentation - Tutorials & Examples The Multi-Output Gaussian Process Toolkit is a Python toolkit f

GAMES 113 Nov 25, 2022
Face Alignment using python

Face Alignment Face Alignment using python Input Image Aligned Face Aligned Face Aligned Face Input Image Aligned Face Input Image Aligned Face Instal

Sajjad Aemmi 28 Nov 23, 2022
Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion

Improving Generalization Bounds for VC Classes Using the Hypergeometric Tail Inversion Preface This directory provides an implementation of the algori

Jean-Samuel Leboeuf 0 Nov 03, 2021
Pydantic models for pywttr and aiopywttr.

Pydantic models for pywttr and aiopywttr.

Almaz 2 Dec 08, 2022
AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人

paddle-wechaty-Zodiac AI创造营 :Metaverse启动机之重构现世,结合PaddlePaddle 和 Wechaty 创造自己的聊天机器人 12星座若穿越科幻剧,会拥有什么超能力呢?快来迎接你的专属超能力吧! 现在很多年轻人都喜欢看科幻剧,像是复仇者系列,里面有很多英雄、超

105 Dec 22, 2022
The CLRS Algorithmic Reasoning Benchmark

Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms.

DeepMind 251 Jan 05, 2023
Architecture Patterns with Python (TDD, DDD, EDM)

architecture-traning Architecture Patterns with Python (TDD, DDD, EDM) Chapter 5. 높은 기어비와 낮은 기어비의 TDD 5.2 도메인 계층 테스트를 서비스 계층으로 옮겨야 하는가? 도메인 계층 테스트 def

minsung sim 2 Mar 04, 2022
Official implementation for CVPR 2021 paper: Adaptive Class Suppression Loss for Long-Tail Object Detection

Adaptive Class Suppression Loss for Long-Tail Object Detection This repo is the official implementation for CVPR 2021 paper: Adaptive Class Suppressio

CASIA-IVA-Lab 67 Dec 04, 2022
UltraGCN: An Ultra Simplification of Graph Convolutional Networks for Recommendation

UltraGCN This is our Pytorch implementation for our CIKM 2021 paper: Kelong Mao, Jieming Zhu, Xi Xiao, Biao Lu, Zhaowei Wang, Xiuqiang He. UltraGCN: A

XUEPAI 93 Jan 03, 2023
Temporally Efficient Vision Transformer for Video Instance Segmentation, CVPR 2022, Oral

Temporally Efficient Vision Transformer for Video Instance Segmentation Temporally Efficient Vision Transformer for Video Instance Segmentation (CVPR

Hust Visual Learning Team 203 Dec 31, 2022
Age and Gender prediction using Keras

cnn_age_gender Age and Gender prediction using Keras Dataset example : Description : UTKFace dataset is a large-scale face dataset with long age span

XN3UR0N 58 May 03, 2022
Bald-to-Hairy Translation Using CycleGAN

GANiry: Bald-to-Hairy Translation Using CycleGAN Official PyTorch implementation of GANiry. GANiry: Bald-to-Hairy Translation Using CycleGAN, Fidan Sa

Fidan Samet 10 Oct 27, 2022
Code for the paper "M2m: Imbalanced Classification via Major-to-minor Translation" (CVPR 2020)

M2m: Imbalanced Classification via Major-to-minor Translation This repository contains code for the paper "M2m: Imbalanced Classification via Major-to

79 Oct 13, 2022
Parameterising Simulated Annealing for the Travelling Salesman Problem

Parameterising Simulated Annealing for the Travelling Salesman Problem

Gary Sun 55 Jun 15, 2022
A gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor.

OpenHands OpenHands is a gesture recognition system powered by OpenPose, k-nearest neighbours, and local outlier factor. Currently the system can iden

Paul Treanor 12 Jan 10, 2022
Neural network-based build time estimation for additive manufacturing

Neural network-based build time estimation for additive manufacturing Oh, Y., Sharp, M., Sprock, T., & Kwon, S. (2021). Neural network-based build tim

Yosep 1 Nov 15, 2021
Convert scikit-learn models to PyTorch modules

sk2torch sk2torch converts scikit-learn models into PyTorch modules that can be tuned with backpropagation and even compiled as TorchScript. Problems

Alex Nichol 101 Dec 16, 2022
This repository contains the implementation of the paper: "Towards Frequency-Based Explanation for Robust CNN"

RobustFreqCNN About This repository contains the implementation of the paper "Towards Frequency-Based Explanation for Robust CNN" arxiv. It primarly d

Sarosij Bose 2 Jan 23, 2022