A PyTorch implementation of Deep SAD, a deep Semi-supervised Anomaly Detection method.

Overview

Deep SAD: A Method for Deep Semi-Supervised Anomaly Detection

This repository provides a PyTorch implementation of the Deep SAD method presented in our ICLR 2020 paper ”Deep Semi-Supervised Anomaly Detection”.

Citation and Contact

You find a PDF of the Deep Semi-Supervised Anomaly Detection ICLR 2020 paper on arXiv https://arxiv.org/abs/1906.02694.

If you find our work useful, please also cite the paper:

@InProceedings{ruff2020deep,
  title     = {Deep Semi-Supervised Anomaly Detection},
  author    = {Ruff, Lukas and Vandermeulen, Robert A. and G{\"o}rnitz, Nico and Binder, Alexander and M{\"u}ller, Emmanuel and M{\"u}ller, Klaus-Robert and Kloft, Marius},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://openreview.net/forum?id=HkgH0TEYwH}
}

If you would like get in touch, just drop us an email to [email protected].

Abstract

Deep approaches to anomaly detection have recently shown promising results over shallow methods on large and complex datasets. Typically anomaly detection is treated as an unsupervised learning problem. In practice however, one may have---in addition to a large set of unlabeled samples---access to a small pool of labeled samples, e.g. a subset verified by some domain expert as being normal or anomalous. Semi-supervised approaches to anomaly detection aim to utilize such labeled samples, but most proposed methods are limited to merely including labeled normal samples. Only a few methods take advantage of labeled anomalies, with existing deep approaches being domain-specific. In this work we present Deep SAD, an end-to-end deep methodology for general semi-supervised anomaly detection. We further introduce an information-theoretic framework for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy of the anomalous distribution, which can serve as a theoretical interpretation for our method. In extensive experiments on MNIST, Fashion-MNIST, and CIFAR-10, along with other anomaly detection benchmark datasets, we demonstrate that our method is on par or outperforms shallow, hybrid, and deep competitors, yielding appreciable performance improvements even when provided with only little labeled data.

The need for semi-supervised anomaly detection

fig1

Installation

This code is written in Python 3.7 and requires the packages listed in requirements.txt.

Clone the repository to your machine and directory of choice:

git clone https://github.com/lukasruff/Deep-SAD-PyTorch.git

To run the code, we recommend setting up a virtual environment, e.g. using virtualenv or conda:

virtualenv

# pip install virtualenv
cd <path-to-Deep-SAD-PyTorch-directory>
virtualenv myenv
source myenv/bin/activate
pip install -r requirements.txt

conda

cd <path-to-Deep-SAD-PyTorch-directory>
conda create --name myenv
source activate myenv
while read requirement; do conda install -n myenv --yes $requirement; done < requirements.txt

Running experiments

We have implemented the MNIST, Fashion-MNIST, and CIFAR-10 datasets as well as the classic anomaly detection benchmark datasets arrhythmia, cardio, satellite, satimage-2, shuttle, and thyroid from the Outlier Detection DataSets (ODDS) repository (http://odds.cs.stonybrook.edu/) as reported in the paper.

The implemented network architectures are as reported in the appendix of the paper.

Deep SAD

You can run Deep SAD experiments using the main.py script.

Here's an example on MNIST with 0 considered to be the normal class and having 1% labeled (known) training samples from anomaly class 1 with a pollution ratio of 10% of the unlabeled training data (with unknown anomalies from all anomaly classes 1-9):

cd <path-to-Deep-SAD-PyTorch-directory>

# activate virtual environment
source myenv/bin/activate  # or 'source activate myenv' for conda

# create folders for experimental output
mkdir log/DeepSAD
mkdir log/DeepSAD/mnist_test

# change to source directory
cd src

# run experiment
python main.py mnist mnist_LeNet ../log/DeepSAD/mnist_test ../data --ratio_known_outlier 0.01 --ratio_pollution 0.1 --lr 0.0001 --n_epochs 150 --lr_milestone 50 --batch_size 128 --weight_decay 0.5e-6 --pretrain True --ae_lr 0.0001 --ae_n_epochs 150 --ae_batch_size 128 --ae_weight_decay 0.5e-3 --normal_class 0 --known_outlier_class 1 --n_known_outlier_classes 1;

Have a look into main.py for all possible arguments and options.

Baselines

We also provide an implementation of the following baselines via the respective baseline_<method_name>.py scripts: OC-SVM (ocsvm), Isolation Forest (isoforest), Kernel Density Estimation (kde), kernel Semi-Supervised Anomaly Detection (ssad), and Semi-Supervised Deep Generative Model (SemiDGM).

Here's how to run SSAD for example on the same experimental setup as above:

cd <path-to-Deep-SAD-PyTorch-directory>

# activate virtual environment
source myenv/bin/activate  # or 'source activate myenv' for conda

# create folder for experimental output
mkdir log/ssad
mkdir log/ssad/mnist_test

# change to source directory
cd src

# run experiment
python baseline_ssad.py mnist ../log/ssad/mnist_test ../data --ratio_known_outlier 0.01 --ratio_pollution 0.1 --kernel rbf --kappa 1.0 --normal_class 0 --known_outlier_class 1 --n_known_outlier_classes 1;

The autoencoder is provided through Deep SAD pre-training using --pretrain True with main.py. To then run a hybrid approach using one of the classic methods on top of autoencoder features, simply point to the saved autoencoder model using --load_ae ../log/DeepSAD/mnist_test/model.tar and set --hybrid True.

To run hybrid SSAD for example on the same experimental setup as above:

cd <path-to-Deep-SAD-PyTorch-directory>

# activate virtual environment
source myenv/bin/activate  # or 'source activate myenv' for conda

# create folder for experimental output
mkdir log/hybrid_ssad
mkdir log/hybrid_ssad/mnist_test

# change to source directory
cd src

# run experiment
python baseline_ssad.py mnist ../log/hybrid_ssad/mnist_test ../data --ratio_known_outlier 0.01 --ratio_pollution 0.1 --kernel rbf --kappa 1.0 --hybrid True --load_ae ../log/DeepSAD/mnist_test/model.tar --normal_class 0 --known_outlier_class 1 --n_known_outlier_classes 1;

License

MIT

Owner
Lukas Ruff
PhD student in the ML group at TU Berlin.
Lukas Ruff
💡 Catatan Materi Bahasa Pemrogramman Python

Repository catatan kuliah Andika Tulus Pangestu selama belajar Dasar Pemrograman dengan Python.

0 Oct 10, 2021
A markdown wiki and dashboarding system for Datasette

datasette-notebook A markdown wiki and dashboarding system for Datasette This is an experimental alpha and everything about it is likely to change. In

Simon Willison 19 Apr 20, 2022
Fast, efficient Blowfish cipher implementation in pure Python (3.4+).

blowfish This module implements the Blowfish cipher using only Python (3.4+). Blowfish is a block cipher that can be used for symmetric-key encryption

Jashandeep Sohi 41 Dec 31, 2022
Feature Store for Machine Learning

Overview Feast is an open source feature store for machine learning. Feast is the fastest path to productionizing analytic data for model training and

Feast 3.8k Dec 30, 2022
Python bindings to OpenSlide

OpenSlide Python OpenSlide Python is a Python interface to the OpenSlide library. OpenSlide is a C library that provides a simple interface for readin

OpenSlide 297 Dec 21, 2022
Types that make coding in Python quick and safe.

Type[T] Types that make coding in Python quick and safe. Type[T] works best with Python 3.6 or later. Prior to 3.6, object types must use comment type

Contains 17 Aug 01, 2022
Hasköy is an open-source variable sans-serif typeface family

Hasköy Hasköy is an open-source variable sans-serif typeface family. Designed with powerful opentype features and each weight includes latin-extended

67 Jan 04, 2023
sphinx builder that outputs markdown files.

sphinx-markdown-builder sphinx builder that outputs markdown files Please ★ this repo if you found it useful ★ ★ ★ If you want frontmatter support ple

Clay Risser 144 Jan 06, 2023
Reproducible Data Science at Scale!

Pachyderm: The Data Foundation for Machine Learning Pachyderm provides the data layer that allows machine learning teams to productionize and scale th

Pachyderm 5.7k Dec 29, 2022
Lightweight, configurable Sphinx theme. Now the Sphinx default!

What is Alabaster? Alabaster is a visually (c)lean, responsive, configurable theme for the Sphinx documentation system. It is Python 2+3 compatible. I

Jeff Forcier 670 Dec 19, 2022
A website for courses of Major Computer Science, NKU

A website for courses of Major Computer Science, NKU

Sakura 0 Oct 06, 2022
Autolookup GUI Plugin for Plover

Word Tray for Plover Word Tray is a GUI plugin that automatically looks up efficient outlines for words that start with the current input, much like a

Kathy 3 Jun 08, 2022
JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates.

JTEX JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates. This package uses Jinja2 as the template engine with

Curvenote 15 Dec 21, 2022
Near Zero-Overhead Python Code Coverage

Slipcover: Near Zero-Overhead Python Code Coverage by Juan Altmayer Pizzorno and Emery Berger at UMass Amherst's PLASMA lab. About Slipcover Slipcover

PLASMA @ UMass 325 Dec 28, 2022
A curated list of python programming language blogs

Python Blogs A curated list of python programming language blogs Contribute Companies/Organization # A B C D E F G H I J K L M N O P Q R S T U V W X Y

Rizky D. Onto 48 Nov 15, 2022
Quick tutorial on orchest.io that shows how to build multiple deep learning models on your data with a single line of code using python

Deep AutoViML Pipeline for orchest.io Quickstart Build Deep Learning models with a single line of code: deep_autoviml Deep AutoViML helps you build te

Ram Seshadri 6 Oct 02, 2022
A simple malware that tries to explain the logic of computer viruses with Python.

Simple-Virus-With-Python A simple malware that tries to explain the logic of computer viruses with Python. What Is The Virus ? Computer viruses are ma

Xrypt0 6 Nov 18, 2022
Build AGNOS, the operating system for your comma three

agnos-builder This is the tool to build AGNOS, our Ubuntu based OS. AGNOS runs on the comma three devkit. NOTE: the edk2_tici and agnos-firmare submod

comma.ai 21 Dec 24, 2022
Tampilan - Change Termux Appearance With Python

Tampilan Gambar usage pkg update && pkg upgrade pkg install git && pkg install f

Creator Lord-Botz 1 Jan 31, 2022
A collection of lecture notes, drawings, flash cards, mind maps, scripts

Neuroanatomy A collection of lecture notes, drawings, flash cards, mind maps, scripts and other helpful resources for the course "Functional Organizat

Georg Reich 3 Sep 21, 2022