Data Competition: automated systems that can detect whether people are not wearing masks or are wearing masks incorrectly

Overview

Table of contents

  1. Introduction
  2. Dataset
  3. Model & Metrics
  4. How to Run

DATA COMPETITION

The COVID-19 pandemic, caused by the SARS-CoV-2 virus, is still going strong, having infected hundreds of millions of people and killed millions. Face masks reduce transmission by limiting how far aerosols and droplets spread. As a result, there is growing demand for automated systems that can detect whether people are not wearing masks or are wearing them incorrectly. This competition was designed to address that problem. Unlike a typical competition, the model is fixed: participants receive the model code and the configuration that the organizers use for training. The participant's task is to use data processing and generation techniques to improve the model's performance, then submit the dataset to the organizing team for training and evaluation on the private test set. The winner is the team with the highest score on the private test set.

Dataset

  • You will receive a dataset of 1100 images. It is an object detection dataset of images of employees at the office. We have annotated it with three labels: no mask, mask, and incorrect mask, encoded as 0, 1, and 2 respectively.

  • The dataset is divided into three parts: train, valid, and public test. We have also prepared a private test set to evaluate each participant's model; it will be made public after the contest ends. The public test set gives you a basic idea of what the private test set looks like. Download the dataset here

  • To improve the model's performance, you can relabel the data and use data augmentation to generate more images (up to 3000).
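As an illustration of the augmentation mentioned above, here is a minimal sketch of how box labels could be mirrored when an image is flipped horizontally. It assumes the label files use the usual YOLOv5 text format (`class x_center y_center width height`, all normalized to the 0 to 1 range), which this README does not state explicitly. The function name is hypothetical, and the image itself would still need to be flipped with an image library of your choice.

```python
def flip_yolo_labels_horizontally(label_text: str) -> str:
    """Mirror YOLO-format boxes to match a horizontally flipped image.

    Assumes each line is: class x_center y_center width height (normalized).
    Only x_center changes; the box size and vertical position stay the same.
    """
    flipped_lines = []
    for line in label_text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        class_id, x_center, y_center, width, height = parts
        new_x_center = 1.0 - float(x_center)
        flipped_lines.append(
            f"{class_id} {new_x_center:.6f} {float(y_center):.6f} "
            f"{float(width):.6f} {float(height):.6f}"
        )
    return "\n".join(flipped_lines)


# Hypothetical label line: one "mask" box (class 1) on the left of the image.
example = "1 0.25 0.4 0.1 0.2"
print(flip_yolo_labels_horizontally(example))
```

A flip keeps the class ids unchanged, so it adds images without changing the balance between the three labels.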

The number of each label in each part is shown below:

              No mask   Mask   Incorrect mask
Train             308    882               51
Val                97    190                9
Public_test        47     95               13
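After relabeling or augmenting, it can be useful to recompute counts like those above. The following is a rough sketch, assuming YOLOv5-style label files in which the first token of every line is the class id (0, 1, or 2). The function name and the `CLASS_NAMES` mapping are illustrative and not part of the competition code.

```python
from collections import Counter
from pathlib import Path

# Class ids as described in the Dataset section.
CLASS_NAMES = {0: "no mask", 1: "mask", 2: "incorrect mask"}


def count_boxes_per_class(labels_dir):
    """Count annotated boxes per class id across all .txt files in a folder."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if parts:  # ignore empty lines
                counts[int(parts[0])] += 1
    return counts


def format_counts(counts):
    """Render the counts in a fixed class order for easy comparison."""
    return ", ".join(f"{name}: {counts.get(cid, 0)}" for cid, name in CLASS_NAMES.items())
```

For example, `print(format_counts(count_boxes_per_class("folder-name/labels/train")))` would summarize the training split, where `folder-name` is your own dataset folder.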

Model & Metrics

  • The challenge is defined as an object detection task. We use YOLOv5s, starting from a model pre-trained on an easy mask dataset, which greatly reduces training time.

  • We fix all hyperparameters of the model and do not use any augmentation in the source code. Each participant therefore needs to build the best possible dataset by correcting wrong labels, re-splitting train/val, applying augmentation, adding new data, and so on.

  • During training, early stopping with patience set to 100 iterations tracks the validation set's wAP@0.5. The wAP@0.5 metric is defined as:

wAP@0.5 = weighted_AP@0.5 = 0.2 * AP50_w + 0.3 * AP50_nw + 0.5 * AP50_wi

where
AP50_w: AP50 on valid mask boxes
AP50_nw: AP50 on non-mask boxes
AP50_wi: AP50 on invalid mask boxes

  • wAP@0.5 is also the main metric used to evaluate each participant's submission on the private test set.
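To make the weighting concrete, the sketch below evaluates the weighted formula above for one set of per-class AP50 values. The AP50 numbers are made up for illustration, and the dictionary keys and function name are hypothetical; only the weights 0.2, 0.3, and 0.5 come from the definition above.

```python
# Weights from the metric definition: mask, no mask, incorrect mask.
WEIGHTS = {"mask": 0.2, "no_mask": 0.3, "incorrect_mask": 0.5}


def weighted_ap50(ap50_by_class):
    """Combine per-class AP50 values into the weighted competition score."""
    missing = set(WEIGHTS) - set(ap50_by_class)
    if missing:
        raise KeyError(f"missing AP50 for: {sorted(missing)}")
    return sum(WEIGHTS[name] * ap50_by_class[name] for name in WEIGHTS)


# Made-up AP50 values, purely for illustration.
example_scores = {"mask": 0.9, "no_mask": 0.8, "incorrect_mask": 0.5}
print(f"{weighted_ap50(example_scores):.3f}")  # 0.2*0.9 + 0.3*0.8 + 0.5*0.5 = 0.670
```

Because the incorrect-mask class carries half of the total weight and has the fewest annotated boxes, improvements on that class move the score the most.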

How to Run

QuickStart

Click the image below

Open In Colab

Install requirements

  • All requirements are included in requirements.txt

  • Run the commands below to clone the repository and install all requirements

git clone https://github.com/fsoft-ailab/Data-Competition
cd Data-Competition
pip3 install -r requirements.txt

Training

  • Put your dataset into the Data-Competition folder. The dataset folder must follow the structure below:
folder-name
├── images
│   ├── train
│   │   ├── train_img1.jpg
│   │   ├── train_img2.jpg
│   │   └── ...
│   │   
│   └── val
│       ├── val_img1.jpg
│       ├── val_img2.jpg
│       └── ...
│   
└── labels
    ├── train
    │   ├── train_img1.txt
    │   ├── train_img2.txt
    │   └── ...
    │   
    └── val
        ├── val_img1.txt
        ├── val_img2.txt
        └── ...
  • Set the relative paths to the train and val image folders in the config/data_cfg.yaml file.

  • train_cfg.yaml holds the model settings used during training. Do not change these hyperparameters, because doing so will produce incorrect results. Training results are saved under results/train/.

  • Run the script below to train the model, passing a name of your choice after --name to identify your experiment:

python3 train.py --batch-size 64 --device 0 --name

Note: If you get an out-of-memory error, reduce the batch size to a smaller power of 2, such as 32 or 16.
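Before starting a training run, it may help to check that the dataset folder matches the layout described above. The following is a minimal sketch that reports images without a label file and label files without an image for one split. The function name and the set of accepted image extensions are assumptions. An image without a label file is not always an error, because YOLOv5 treats it as a background image.

```python
from pathlib import Path

# Assumed image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unpaired_files(dataset_root, split):
    """Return (images_without_label, labels_without_image) for one split.

    Expects <dataset_root>/images/<split> and <dataset_root>/labels/<split>,
    with matching file stems (for example train_img1.jpg and train_img1.txt).
    """
    root = Path(dataset_root)
    image_stems = {
        path.stem
        for path in (root / "images" / split).iterdir()
        if path.suffix.lower() in IMAGE_SUFFIXES
    }
    label_stems = {path.stem for path in (root / "labels" / split).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling `find_unpaired_files("folder-name", "train")` and then again with `"val"` covers both splits. A missing `images` or `labels` folder raises `FileNotFoundError`, which also signals a layout problem.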

Evaluation

  • Run the script below to evaluate on a particular dataset, passing the path to your trained weights after --weights and a name for the run after --name.
  • The value of --task must be one of train, val, or test, which evaluate on the training set, validation set, or public test set respectively.
  • Note: Specify the relative path to the image folder you are evaluating in the config/data_cfg.yaml file.

python3 val.py --weights  --task test --name  --batch-size 64 --device 0

  • Results are saved under results/evaluate/.
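If you evaluate on several splits in a row, a small helper can assemble the command for each task and reject invalid task names early. This is only a sketch: the flags are the ones shown in the evaluation command above, while the function name, the weights path `path/to/weights.pt`, and the run name `my_experiment` are hypothetical placeholders.

```python
# Allowed values for --task, as listed in the Evaluation section.
VALID_TASKS = ("train", "val", "test")


def build_eval_command(weights, task, name, batch_size=64, device="0"):
    """Build the argument list for val.py, validating the task name first."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    return [
        "python3", "val.py",
        "--weights", str(weights),
        "--task", task,
        "--name", name,
        "--batch-size", str(batch_size),
        "--device", str(device),
    ]


# Hypothetical weights path and run name, for illustration only.
for task in VALID_TASKS:
    print(" ".join(build_eval_command("path/to/weights.pt", task, "my_experiment")))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the Data-Competition folder, provided config/data_cfg.yaml points at the image folder for the split being evaluated.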

Detection

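  • Use the script below to run inference on a particular folder, passing the weights after --weights, the input folder after --source, and the output directory after --dir.

  • Results are saved in the directory given by --dir.

python3 detect.py --weights  --source  --dir  --device 0

  • You can find more default arguments in detect.py.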

Owner
Thanh Dat Vu