Extreme Dynamic Classifier Chains - XGBoost for Multi-label Classification

Related tags

Deep LearningXDCC
Overview

Extreme Dynamic Classifier Chains

Classifier chains is a key technique in multi-label classification, sinceit allows to consider label dependencies effectively. However, the classifiers arealigned according to a static order of the labels. In the concept of dynamic classifier chains (DCC) the label ordering is chosen for each prediction dynamically depending on the respective instance at hand. We combine this concept with the boosting of extreme gradient boosted trees (XGBoot), an effective and scalable state-of-the-art technique, and incorporate DCC in a fast multi-label extension of XGBoost which we make publicly available. As only positive labels have to be predicted and these are usually only few, the training costs can be further substantially reduced. Moreover, as experiments on ten datasets show, the length of the chain allows for a more control over the usage of previous predictions and hence over the measure one want to optimize,

Installation

The first step requires to build the modified multilabel version of XGBoost and install the resulting python package to build the dynamic chain model. This requires MinGW, i.e. the mingw32-make command, and Python 3. To start the build run the following commands:

cd XGBoost_ML
mingw32-make -j4

After a successful execution the python package can be installed.

cd python-package
python setup.py install

You should now be able to import the package into your Python project:

import xgboost as xgb

Training the Dynamic Chain Model

We recommend running the models by calling train_dcc.py from within a console. Place all datasets as .arff files into the datasets directory. Append -train to the train set and -test to the test set.

Parameters:

The following parameters are available:

Parameter Short Description Required
--filename <string> -f Name of your dataset .arff file located in the datasets sub-directory yes
--num_labels <int> -l Number of Labels in the dataset yes
--models <string> -m Specifies all models that will be build. Available options:
  • dcc: The proposed dynamic chain model
  • sxgb: A single multilabel XGBoost model
  • cc-dcc: A classifier chain with the label order of a previously built dynamic chain
  • cc-freq: A classifier chain with a label order sorted by label frequency (frequent to rare) in the train set
  • cc-rare: A classifier chain with a label order sorted by label frequency (rare to frequent) in the train set
  • cc-rand: A classifier chain with a random label order
  • br: A binary relevance model
example: -m "dc,br"
yes
--validation <int> -v Size of validation set. The first XX% of the train set will be used for validating the model. If the parameter is not set, the test set will be used for evaluation. Example: --validation 20 The frist 20% will be used for evaluation, the last 80% for training. (default: 0) no
--max_depth <int> -d Max depth of each XGBoost multilabel tree (default: 10) no
--num_rounds <int> -r Number of boosting rounds of each XGBoost model (default: 10) no
--chain_length <int> -c Length of the chain. Represents number of labeling-rounds. Each round builds a new XGBoost model that will predict a single label per instance (default: num_labels) no
--split <int> -s Index of split method used for building the trees. Available options:
  • maxGain: 1
  • maxWeight: 2
  • sumGain: 3
  • sumWeight: 4
  • maxAbsGain: 5
  • sumAbsGain: 6
(default: 1)
no
--parameters <string> -p XGBoost parameters used for each model in the chain. Example: -p "{'silent':1, 'eta':0.1}" (default: {}) no
--features_to_transform <string> -t A list of all features in the dataset that have to be encoded. XGBoost can only process numerical features. Use this parameter to encode categorical features. Example: -t "featureA,featureB" no
--output_extra -o Write extended log and json files (default: True) no

Example

We train two models, the dynamic chain and a binary relevance model, on a dataset called emotions with 6 labels. So we specify the models with -m "dc, br" and the dataset with -f "emotions". Additionally we place the files for training and testing into the datasets directory:

project
│   README.md
│   train_dcc.py   
│
└───datasets
│   │   emotions-train.arff
│   │   emotions-test.arff
│   
└───XGBoost_ML
    │   ...

The dcc model should build a full chain with 6 models, so we use -l 6. All XGBoost models, also the one for binary relevance, should train for 100 rounds with a maximum tree depth of 10 and a step size of 0.1. Therefore we add -p "{'eta':0.1}" -r 100 -d 10

The full command to train and evaluate both models is:

 train_dcc.py -p "{'eta':0.1}" -f "emotions" -l 6 -r 100 -d 10 -c 6 -m 'dcc, br'
ThunderGBM: Fast GBDTs and Random Forests on GPUs

Documentations | Installation | Parameters | Python (scikit-learn) interface What's new? ThunderGBM won 2019 Best Paper Award from IEEE Transactions o

Xtra Computing Group 647 Jan 04, 2023
A novel framework to automatically learn high-quality scanning of non-planar, complex anisotropic appearance.

appearance-scanner About This repository is an implementation of the neural network proposed in Free-form Scanning of Non-planar Appearance with Neura

Xiaohe Ma 14 Oct 18, 2022
State-of-the-art data augmentation search algorithms in PyTorch

MuarAugment Description MuarAugment is a package providing the easiest way to a state-of-the-art data augmentation pipeline. How to use You can instal

43 Dec 12, 2022
Atomistic Line Graph Neural Network

Table of Contents Introduction Installation Examples Pre-trained models Quick start using colab JARVIS-ALIGNN webapp Peformances on a few datasets Use

National Institute of Standards and Technology 91 Dec 30, 2022
[ICCV 2021] Group-aware Contrastive Regression for Action Quality Assessment

CoRe Created by Xumin Yu*, Yongming Rao*, Wenliang Zhao, Jiwen Lu, Jie Zhou This is the PyTorch implementation for ICCV paper Group-aware Contrastive

Xumin Yu 31 Dec 24, 2022
Turning SymPy expressions into PyTorch modules.

sympytorch A micro-library as a convenience for turning SymPy expressions into PyTorch Modules. All SymPy floats become trainable parameters. All SymP

Patrick Kidger 89 Dec 13, 2022
Official PyTorch implementation of BlobGAN: Spatially Disentangled Scene Representations

BlobGAN: Spatially Disentangled Scene Representations Official PyTorch Implementation Paper | Project Page | Video | Interactive Demo BlobGAN.mp4 This

148 Dec 29, 2022
Python based Advanced AI Assistant

Knick is a virtual artificial intelligence project, fully developed in python. The objective of this project is to develop a virtual assistant that can handle our minor, intermediate as well as heavy

19 Nov 15, 2022
A fast, dataset-agnostic, deep visual search engine for digital art history

imgs.ai imgs.ai is a fast, dataset-agnostic, deep visual search engine for digital art history based on neural network embeddings. It utilizes modern

Fabian Offert 5 Dec 14, 2022
PyTorch implementation for COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction (CVPR 2021)

Completer: Incomplete Multi-view Clustering via Contrastive Prediction This repo contains the code and data of the following paper accepted by CVPR 20

XLearning Group 72 Dec 07, 2022
Gluon CV Toolkit

Gluon CV Toolkit | Installation | Documentation | Tutorials | GluonCV provides implementations of the state-of-the-art (SOTA) deep learning models in

Distributed (Deep) Machine Learning Community 5.4k Jan 06, 2023
Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies

Pgn2Latex (WIP) A simple script to make pdf from pgn files and studies. It's sti

12 Jul 23, 2022
This repository contains part of the code used to make the images visible in the article "How does an AI Imagine the Universe?" published on Towards Data Science.

Generative Adversarial Network - Generating Universe This repository contains part of the code used to make the images visible in the article "How doe

Davide Coccomini 9 Dec 18, 2022
This is the repository for the paper "Have I done enough planning or should I plan more?"

Metacognitive Learning Tool box https://re.is.mpg.de What Is This? This repository contains two modules used to analyse metacognitive learning in huma

0 Dec 01, 2021
YOLO-v5 기반 단안 카메라의 영상을 활용해 차간 거리를 일정하게 유지하며 주행하는 Adaptive Cruise Control 기능 구현

자율 주행차의 영상 기반 차간거리 유지 개발 Table of Contents 프로젝트 소개 주요 기능 시스템 구조 디렉토리 구조 결과 실행 방법 참조 팀원 프로젝트 소개 YOLO-v5 기반으로 단안 카메라의 영상을 활용해 차간 거리를 일정하게 유지하며 주행하는 Adap

14 Jun 29, 2022
PyTorch implementation for "Mining Latent Structures with Contrastive Modality Fusion for Multimedia Recommendation"

MIRCO PyTorch implementation for paper: Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation Dependencies Python 3.

Big Data and Multi-modal Computing Group, CRIPAC 9 Dec 08, 2022
Yoloxkeypointsegment - An anchor-free version of YOLO, with a simpler design but better performance

Introduction 关键点版本:已完成 全景分割版本:已完成 实例分割版本:已完成 YOLOX is an anchor-free version of

23 Oct 20, 2022
Weakly Supervised Dense Event Captioning in Videos, i.e. generating multiple sentence descriptions for a video in a weakly-supervised manner.

WSDEC This is the official repo for our NeurIPS paper Weakly Supervised Dense Event Captioning in Videos. Description Repo directories ./: global conf

Melon(Xuguang Duan) 96 Nov 01, 2022
A project that uses optical flow and machine learning to detect aimhacking in video clips.

waldo-anticheat A project that aims to use optical flow and machine learning to visually detect cheating or hacking in video clips from fps games. Che

waldo.vision 542 Dec 03, 2022
Implementation of the SUMO (Slim U-Net trained on MODA) model

SUMO - Slim U-Net trained on MODA Implementation of the SUMO (Slim U-Net trained on MODA) model as described in: TODO: add reference to paper once ava

6 Nov 19, 2022