Data stream analytics: Implement online learning methods to address concept drift in data streams using the River library. Code for the paper entitled "PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams" accepted in IEEE GlobeCom 2021.

Overview

PWPAE-Concept-Drift-Detection-and-Adaptation

This is the code for the paper entitled "PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams" published in 2021 IEEE Global Communications Conference (GLOBECOM).
Authors: Li Yang, Dimitrios Michael Manias, and Abdallah Shami
Organization: The Optimized Computing and Communications (OC2) Lab, ECE Department, Western University

This repository also introduces concept drift definitions and online machine learning methods for data stream analytics using the River library.

Another tutorial code for concept drift, online machine learning, and data stream analytics can be found in: OASW-Concept-Drift-Detection-and-Adaptation

Concept Drift

In non-stationary and dynamical environments, such as IoT environments, the distribution of input data often changes over time, known as concept drift. The occurrence of concept drift will result in the performance degradation of the current trained data analytics model. Traditional offline machine learning (ML) models cannot deal with concept drift, making it necessary to develop online adaptive analytics models that can adapt to the predictable and unpredictable changes in data streams.

To address concept drift, effective methods should be able to detect concept drift and adapt to the changes accordingly. Therefore, concept drift detection and adaptation are the two major steps for online learning on data streams.

Drift Detection

  • Adaptive Windowing (ADWIN) is a distribution-based method that uses an adaptive sliding window to detect concept drift based on data distribution changes. ADWIN identifies concept drift by calculating and analyzing the average of certain statistics over the two sub-windows of the adaptive window. The occurrence of concept drift is indicated by a large difference between the averages of the two sub-windows. Once a drift point is detected, all the old data samples before that drift time point are discarded.

    • Albert Bifet and Ricard Gavalda. "Learning from time-changing data with adaptive windowing." In Proceedings of the 2007 SIAM international conference on data mining, pp. 443-448. Society for Industrial and Applied Mathematics, 2007.
    from river.drift import ADWIN
    adwin = ADWIN()
  • Drift Detection Method (DDM) is a popular model performance-based method that defines two thresholds, a warning level and a drift level, to monitor model's error rate and standard deviation changes for drift detection.

    • João Gama, Pedro Medas, Gladys Castillo, Pedro Pereira Rodrigues: Learning with Drift Detection. SBIA 2004: 286-295
    from river.drift import DDM
    ddm = DDM()

Drift Adaptation

  • Hoeffding tree (HT) is a type of decision tree (DT) that uses the Hoeffding bound to incrementally adapt to data streams. Compared to a DT that chooses the best split, the HT uses the Hoeffding bound to calculate the number of necessary samples to select the split node. Thus, the HT can update its node to adapt to newly incoming samples.

    • G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In KDD’01, pages 97–106, San Francisco, CA, 2001. ACM Press.
    from river import tree
    model = tree.HoeffdingTreeClassifier(
         grace_period=100,
         split_confidence=1e-5,
         ...
    )
  • Extremely Fast Decision Tree (EFDT), also named Hoeffding Anytime Tree (HATT), is an improved version of the HT that splits nodes as soon as it reaches the confidence level instead of detecting the best split in the HT.

    • C. Manapragada, G. Webb, and M. Salehi. Extremely Fast Decision Tree. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). ACM, New York, NY, USA, 1953-1962, 2018.
    from river import tree
    model = tree.ExtremelyFastDecisionTreeClassifier(
         grace_period=100,
         split_confidence=1e-5,
         min_samples_reevaluate=100,
         ...
     )
  • Adaptive random forest (ARF) algorithm uses HTs as base learners and ADWIN as the drift detector for each tree to address concept drift. Through the drift detection process, the poor-performing base trees are replaced by new trees to fit the new concept.

    • Heitor Murilo Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabricio Enembreck, Bernhard Pfharinger, Geoff Holmes, Talel Abdessalem. Adaptive random forests for evolving data stream classification. In Machine Learning, DOI: 10.1007/s10994-017-5642-8, Springer, 2017.
    from river import ensemble
    model = ensemble.AdaptiveRandomForestClassifier(
         n_models=3,
         drift_detector=ADWIN(),
         ...
     )
  • Streaming Random Patches (SRP) uses the similar technology of ARF, but it uses the global subspace randomization strategy, instead of the local subspace randomization technique used by ARF. The global subspace randomization is a more flexible method that improves the diversity of base learners.

    • Heitor Murilo Gomes, Jesse Read, Albert Bifet. Streaming Random Patches for Evolving Data Stream Classification. IEEE International Conference on Data Mining (ICDM), 2019.
    from river import ensemble
    base_model = tree.HoeffdingTreeClassifier(
       grace_period=50, split_confidence=0.01,
       ...
     )
    model = ensemble.SRPClassifier(
       model=base_model, n_models=3, drift_detector=ADWIN(),
       ...
    )
  • Leverage bagging (LB) is another popular online ensemble that uses bootstrap samples to construct base learners. It uses Poisson distribution to increase the data diversity and leverage the bagging performance.

    • Bifet A., Holmes G., Pfahringer B. (2010) Leveraging Bagging for Evolving Data Streams. In: Balcázar J.L., Bonchi F., Gionis A., Sebag M. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2010. Lecture Notes in Computer Science, vol 6321. Springer, Berlin, Heidelberg.
    from river import ensemble
    from river import linear_model
    from river import preprocessing
    model = ensemble.LeveragingBaggingClassifier(
       model=(
           preprocessing.StandardScaler() |
           linear_model.LogisticRegression()
       ),
       n_models=3,
       ...
    )

Abstract of The Paper

As the number of Internet of Things (IoT) devices and systems have surged, IoT data analytics techniques have been developed to detect malicious cyber-attacks and secure IoT systems; however, concept drift issues often occur in IoT data analytics, as IoT data is often dynamic data streams that change over time, causing model degradation and attack detection failure. This is because traditional data analytics models are static models that cannot adapt to data distribution changes. In this paper, we propose a Performance Weighted Probability Averaging Ensemble (PWPAE) framework for drift adaptive IoT anomaly detection through IoT data stream analytics. Experiments on two public datasets show the effectiveness of our proposed PWPAE method compared against state-of-the-art methods.

Implementation

Online Learning/Concept Drift Adaptation Algorithms

  • Adaptive Random Forest (ARF)
  • Streaming Random Patches (SRP)
  • Extremely Fast Decision Tree (EFDT)
  • Hoeffding Tree (HT)
  • Leveraging Bagging (LB)
  • Performance Weighted Probability Averaging Ensemble (PWPAE)
    • Proposed Method

Drift Detection Algorithms

  • Adaptive Windowing (ADWIN)
  • Drift Detection Method (DDM)

Dataset

  1. IoTID20 dataset, a novel IoT botnet dataset

  2. CICIDS2017 dataset, a popular network traffic dataset for intrusion detection problems

For the purpose of displaying the experimental results in Jupyter Notebook, the sampled subsets of the two datasets are used in the sample code. The subsets are in the "data" folder.

Code

Requirements & Libraries

Contact-Info

Please feel free to contact us for any questions or cooperation opportunities. We will be happy to help.

Citation

If you find this repository useful in your research, please cite this article as:

L. Yang, D. M. Manias, and A. Shami, “PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams,” in 2021 IEEE Glob. Commun. Conf. (GLOBECOM), Madrid, Spain, Dec. 2021.

@INPROCEEDINGS{9685338,
  author={Yang, Li and Manias, Dimitrios Michael and Shami, Abdallah},
  booktitle={2021 IEEE Global Communications Conference (GLOBECOM)}, 
  title={PWPAE: An Ensemble Framework for Concept Drift Adaptation in IoT Data Streams}, 
  year={2021},
  pages={1-6},
  doi={10.1109/GLOBECOM46510.2021.9685338}
  }
Owner
Western OC2 Lab
The Optimized Computing and Communications (OC2) Laboratory within the Department of Electrical and Computer Engineering at Western University, London, Canada.
Western OC2 Lab
Code accompanying our NeurIPS 2021 traffic4cast challenge

Traffic forecasting on traffic movie snippets This repo contains all code to reproduce our approach to the IARAI Traffic4cast 2021 challenge. In the c

Nina Wiedemann 2 Aug 09, 2022
This repository contains the re-implementation of our paper deSpeckNet: Generalizing Deep Learning Based SAR Image Despeckling

deSpeckNet-TF-GEE This repository contains the re-implementation of our paper deSpeckNet: Generalizing Deep Learning Based SAR Image Despeckling publi

Adugna Mullissa 16 Sep 07, 2022
Online-compatible Unsupervised Non-resonant Anomaly Detection Repository

Online-compatible Unsupervised Non-resonant Anomaly Detection Repository Repository containing all scripts used in the studies of Online-compatible Un

0 Nov 09, 2021
Neural Style and MSG-Net

PyTorch-Style-Transfer This repo provides PyTorch Implementation of MSG-Net (ours) and Neural Style (Gatys et al. CVPR 2016), which has been included

Hang Zhang 904 Dec 21, 2022
This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

Wizard of Search Engine: Access to Information Through Conversations with Search Engines by Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zh

19 Oct 27, 2022
[ACM MM 2021] Joint Implicit Image Function for Guided Depth Super-Resolution

Joint Implicit Image Function for Guided Depth Super-Resolution This repository contains the code for: Joint Implicit Image Function for Guided Depth

hawkey 78 Dec 27, 2022
Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices

Intro Real-time object detection and classification. Paper: version 1, version 2. Read more about YOLO (in darknet) and download weight files here. In

Trieu 6.1k Dec 30, 2022
Implementation of Shape Generation and Completion Through Point-Voxel Diffusion

Shape Generation and Completion Through Point-Voxel Diffusion Project | Paper Implementation of Shape Generation and Completion Through Point-Voxel Di

Linqi Zhou 103 Dec 29, 2022
A repository that finds a person who looks like you by using face recognition technology.

Find Your Twin Hello everyone, I've always wondered how casting agencies do the casting for a scene where a certain actor is young or old for a movie

Cengizhan Yurdakul 3 Jan 29, 2022
The King is Naked: on the Notion of Robustness for Natural Language Processing

the-king-is-naked: on the notion of robustness for natural language processing AAAI2022 DISCLAIMER:This repo will be updated soon with instructions on

Iperboreo_ 1 Nov 24, 2022
Official Pytorch implementation for "End2End Occluded Face Recognition by Masking Corrupted Features, TPAMI 2021"

End2End Occluded Face Recognition by Masking Corrupted Features This is the Pytorch implementation of our TPAMI 2021 paper End2End Occluded Face Recog

Haibo Qiu 25 Oct 31, 2022
Code, environments, and scripts for the paper: "How Private Is Your RL Policy? An Inverse RL Based Analysis Framework"

Privacy-Aware Inverse RL (PRIL) Analysis Framework Code, environments, and scripts for the paper: "How Private Is Your RL Policy? An Inverse RL Based

1 Dec 06, 2021
Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.

Selene is a Python library and command line interface for training deep neural networks from biological sequence data such as genomes.

Troyanskaya Laboratory 323 Jan 01, 2023
Re-implementation of the vector capsule with dynamic routing

VectorCapsule Re-implementation of the vector capsule with dynamic routing We implement the vector capsule and dynamic routing via graph neural networ

ZhenchaoTang 10 Feb 10, 2022
Efficient 3D Backbone Network for Temporal Modeling

VoV3D is an efficient and effective 3D backbone network for temporal modeling implemented on top of PySlowFast. Diverse Temporal Aggregation and

102 Dec 06, 2022
[NeurIPS 2021] SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning

SSUL - Official Pytorch Implementation (NeurIPS 2021) SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning Sun

Clova AI Research 44 Dec 27, 2022
Bu repo SAHI uygulamasını mantığını öğreniyoruz.

SAHI-Learn: SAHI'den Beraber Kodlamak İster Misiniz Herkese merhabalar ben Kadir Nar. SAHI kütüphanesine gönüllü geliştiriciyim. Bu repo SAHI kütüphan

Kadir Nar 11 Aug 22, 2022
The final project of "Applying AI to 3D Medical Imaging Data" from "AI for Healthcare" nanodegree - Udacity.

Quantifying Hippocampus Volume for Alzheimer's Progression Background Alzheimer's disease (AD) is a progressive neurodegenerative disorder that result

Omar Laham 1 Jan 14, 2022
Model-based Reinforcement Learning Improves Autonomous Racing Performance

Racing Dreamer: Model-based versus Model-free Deep Reinforcement Learning for Autonomous Racing Cars In this work, we propose to learn a racing contro

Cyber Physical Systems - TU Wien 38 Dec 06, 2022
Object DGCNN and DETR3D, Our implementations are built on top of MMdetection3D.

This repo contains the implementations of Object DGCNN (https://arxiv.org/abs/2110.06923) and DETR3D (https://arxiv.org/abs/2110.06922). Our implementations are built on top of MMdetection3D.

Wang, Yue 539 Jan 07, 2023