PyTorch implementation of Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network

Last update: Dec 13, 2022

Overview

hierarchical-multi-label-text-classification-pytorch

Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach

This repository is a PyTorch implementation made with reference to this research project.

The main objective of the project is to solve the hierarchical multi-label text classification (HMTC) problem. Unlike flat multi-label text classification, HMTC assigns each instance (object) to multiple categories that are organized in a hierarchy. It is a fundamental but challenging task in numerous applications.

Introduction

Many real-world applications organize data in a hierarchical structure, where classes are specialized into subclasses or grouped into superclasses. For example, an electronic document (e.g. a web page, digital library entry, patent, or e-mail) is associated with multiple categories, and these categories are stored hierarchically in a tree or Directed Acyclic Graph (DAG).

The hierarchy provides an elegant way to show the characteristics of the data and offers a multi-dimensional perspective for tackling the classification problem.

The figure shows an example of predefined labels in hierarchical multi-label classification of patent documents.

Documents are shown as colored rectangles, labels as rounded rectangles.
Circles in the rounded rectangles indicate that the corresponding document has been assigned the label.
Arrows indicate a hierarchical structure between labels.
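
The label structure described above can be sketched as a small parent map. This is only an illustration, not code from this repository: the label names and the helpers `with_ancestors`, `depth`, and `labels_by_level` are hypothetical, and the hierarchy is assumed to be stored as a child-to-parents mapping so that a DAG (a label with several parents) is also covered.

```python
# Hypothetical label hierarchy stored as child -> list of parents.
# A DAG allows a label to have more than one parent.
PARENTS = {
    "Physics": [],
    "Chemistry": [],
    "Optics": ["Physics"],
    "Photochemistry": ["Physics", "Chemistry"],
    "Lasers": ["Optics"],
}


def with_ancestors(labels, parents=PARENTS):
    """Return the given labels plus every ancestor reachable in the hierarchy."""
    seen = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label in seen:
            continue
        seen.add(label)
        stack.extend(parents.get(label, []))
    return seen


def depth(label, parents=PARENTS):
    """Level of a label: 1 for roots, otherwise 1 + the deepest parent."""
    ps = parents.get(label, [])
    return 1 if not ps else 1 + max(depth(p, parents) for p in ps)


def labels_by_level(labels, parents=PARENTS):
    """Group a document's expanded label set by hierarchy level."""
    levels = {}
    for label in with_ancestors(labels, parents):
        levels.setdefault(depth(label, parents), []).append(label)
    return {level: sorted(names) for level, names in sorted(levels.items())}


if __name__ == "__main__":
    print(labels_by_level(["Lasers", "Photochemistry"]))
```

Grouping the expanded label set by level is one possible way to obtain per-level targets for a hierarchical classifier. Whether this repository prepares its targets this way is not stated here.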

Data

See the data folder for the data format; it includes sample data files.

Text Segment

You can use the jieba package for word segmentation if you are working with Chinese text data.
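
A minimal sketch of segmenting a Chinese sentence with jieba is shown below. The `segment` helper is hypothetical and not part of this repository. The per-character fallback is an assumption added only so the sketch still runs when jieba is not installed.

```python
def segment(text):
    """Split Chinese text into word tokens with jieba, if it is installed."""
    try:
        import jieba  # third-party package: pip install jieba
    except ImportError:
        # Fallback assumption for this sketch only: one token per character.
        return list(text)
    # jieba.lcut returns the segmentation result as a list of strings.
    return jieba.lcut(text)


if __name__ == "__main__":
    tokens = segment("我爱自然语言处理")
    print(" ".join(tokens))
```

The space-joined tokens can then be written out in whatever format your data files use.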

Data Format

This repository can be used with other text classification datasets in two ways:

Convert your dataset into the same format as the sample.
Modify the data preprocessing code in data_helpers.py and data_loader.py.

Which approach fits best depends on your data and task.

Pre-trained Word Vectors

~~You can pre-train your word vectors (based on your corpus) in many ways:~~

~~Use the gensim package to pre-train on your data.~~
~~Use the GloVe tools to pre-train on your data.~~
~~You can even use a fastText network to pre-train on your data.~~
This implementation uses an embedding layer, whereas the original paper uses word2vec.
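
An embedding layer looks up vectors by integer index, so tokens have to be mapped to ids first. The following is a rough sketch of that step, not the repository's actual preprocessing: `build_vocab`, `encode`, and the `<pad>`/`<unk>` conventions are assumptions made for illustration.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens, mapped to ids 0 and 1


def build_vocab(tokenized_docs, min_freq=1):
    """Map each token seen at least min_freq times to an integer id."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = {PAD: 0, UNK: 1}
    # Most frequent tokens first; ties are broken alphabetically for determinism.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, truncating or padding to exactly max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


if __name__ == "__main__":
    docs = [["a", "method", "for", "laser", "cutting"], ["laser", "optics"]]
    vocab = build_vocab(docs)
    print(encode(["laser", "welding", "optics"], vocab, max_len=5))
```

The resulting id lists could then be converted to a tensor and passed to an embedding layer such as `torch.nn.Embedding`, with `num_embeddings=len(vocab)` and `padding_idx=0`.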

Network Structure

Built with

Python 3.8
PyTorch
NumPy
scikit-learn

Owner

Mingu Kang
