STFT_Transformer

Code for the STFT Transformer used in the BirdCLEF 2021 competition.

The STFT Transformer is a new way to apply Transformers to audio data, similar to how Vision Transformers are applied to images. It was developed for the BirdCLEF 2021 competition hosted on Kaggle. The accompanying PDF document gives more context; it has been submitted to the BirdCLEF 2021 workshop.

The code is provided as is and has not been rewritten. Because competition code is written in a hurry, it may not meet the usual open-source standards.

The code assumes this directory structure:

<base_dir>/code

<base_dir>/input

<base_dir>/input/freefield1010

<base_dir>/checkpoints

<base_dir>/data
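The layout above can be checked or created before running anything. The following is a minimal sketch and not part of the repository; the helper names `missing_dirs` and `create_layout` are hypothetical, and only the directory names come from the list above.

```python
from pathlib import Path

# Sub-directories named in the layout above, relative to <base_dir>.
EXPECTED_DIRS = ["code", "input", "input/freefield1010", "checkpoints", "data"]


def missing_dirs(base_dir):
    """Return the expected sub-directories that do not exist under base_dir."""
    base = Path(base_dir)
    return [d for d in EXPECTED_DIRS if not (base / d).is_dir()]


def create_layout(base_dir):
    """Create any missing sub-directories; existing ones are left untouched."""
    base = Path(base_dir)
    for d in EXPECTED_DIRS:
        (base / d).mkdir(parents=True, exist_ok=True)
```

Calling `create_layout("<base_dir>")` once and then `missing_dirs("<base_dir>")` should return an empty list. The scripts themselves still have to be placed in the code directory by hand.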

Run the code from the code directory. Download the competition data into the input directory, and the freefield1010 data into the input/freefield1010 directory. Run data_final.py first: it reads the audio files from input and stores the relevant parts in the data directory as NumPy files.
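To illustrate the kind of step data_final.py performs, here is a rough sketch of cropping a waveform and storing it as a NumPy file. It is not the author's code: the function name `save_clip`, the choice of keeping the first few seconds, and the default clip length are all assumptions made for illustration. What counts as the "relevant part" is decided by data_final.py itself.

```python
from pathlib import Path

import numpy as np


def save_clip(waveform, sample_rate, out_path, clip_seconds=5.0):
    """Keep the first clip_seconds of a 1-D waveform and save it as a .npy file.

    Assumption: "relevant part" is modelled here as a fixed-length leading
    window; the real script may select audio differently.
    """
    n_keep = int(clip_seconds * sample_rate)
    clip = np.asarray(waveform, dtype=np.float32)[:n_keep]
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    np.save(out_path, clip)
    return clip.shape[0]  # number of samples actually written
```

Decoding the audio files into a waveform is left out on purpose, since it depends on the audio library used by the script.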

Then run stft_transformer_final.py to train the model for one fold. During the competition I trained 5 folds by editing the FOLD global variable in the script between runs (I know, this is substandard).
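One way to avoid editing the script between runs would be to read the fold from an environment variable. This is a hypothetical change, not something the script currently does; the helper `get_fold`, the variable name `FOLD` as an environment variable, and the 0 to 4 numbering are assumptions.

```python
import os

N_FOLDS = 5  # the README mentions 5 folds; numbering them 0..4 is an assumption


def get_fold(default=0):
    """Read the fold index from the FOLD environment variable, if it is set."""
    fold = int(os.environ.get("FOLD", default))
    if not 0 <= fold < N_FOLDS:
        raise ValueError(f"FOLD must be between 0 and {N_FOLDS - 1}, got {fold}")
    return fold
```

With `FOLD = get_fold()` replacing the hard-coded global, each fold could be started from the shell, for example with `FOLD=3 python stft_transformer_final.py`.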

Once all 5 models are trained, upload the weights to a Kaggle dataset and use the submission notebook I used. This should yield a score good enough for 15th place in the competition. Reaching this rank with a single model is significant, as all the top teams used an ensemble of models.

Owner

Jean-François Puget
