A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

Overview

OutliersSlidingWindows

A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

Dataset generation

The original datasets, namely Higgs and Cover, are provided (compressed) in the data folder. One can download and preprocess the datasets as follows:

wget https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz
cat HIGGS.csv.gz | gunzip | cut -d ',' -f 23,24,25,26,27,28,29 > higgs.dat

wget https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz
gunzip covtype.data.gz

The script datasets.sh decompresses the zipped original datasets and generates the artificial datasets used in the paper. In particular, the program InjectOutliers takes a dataset and injects artificial outliers. It takes as an argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • p, the probability with which to inject an outlier after every point
  • r, the scaling factor for the norm of the outlier points
  • d, the dimension of the points

The program GenerateArtificial generates automatically a dataset with points in a unit ball with outliers on the suface of a ball of radius r. It takes as an argument:

  • out, the path to the output file
  • p, the probability with which to inject an outlier
  • r, the radius of the outer ball
  • d, the dimension of the points

Running the experiments

The script exec.sh runs a representative subset of the experiments presented in the paper.

The program Main runs the experiments on the comparison of our k-center algorithm with the sequential ones. It takes as and argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • k, the number of centers
  • z, the number of outliers
  • N, the window size
  • beta, eps, lambda, parameters of our method
  • minDist, maxDist, parameters of our method
  • samp, the number of candidate centers for sampled-charikar
  • doChar, if set to 1 executes charikar, else it is skipped

It outputs, in the folder out/k-cen/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the four methods, the update times, the query times, the memory usage and the clustering radius.

The program MainLambda runs the experiments on the sensitivity on lambda. It takes as and argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • k, the number of centers
  • z, the number of outliers
  • N, the window size
  • beta, eps, lambda, parameters of our method (lambda unused)
  • minDist, maxDist, parameters of our method
  • doSlow, if set to 1 executes the slowest test, else it is skipped

It outputs, in the folder out/lam/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the four methods, the update times, the query times, the memory usage due to histograms, the total memory usage and the clustering radius.

The program MainEffDiam runs the experiments on the effective diameter algorithms. It takes as and argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • alpha, fraction fo distances to discard
  • eta, lower bound on ratio between effective diameter and diameter
  • N, the window size
  • beta, eps, lambda, parameters of our method
  • minDist, maxDist, parameters of our method
  • doSeq, if set to 1 executes the sequential method, else it is skipped

It outputs, in the folder out/diam/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the two methods, the update times, the query times, the memory usage and the effective diameter estimate.
Owner
PaoloPellizzoni
PaoloPellizzoni
Uni-Fold: Training your own deep protein-folding models

Uni-Fold: Training your own deep protein-folding models. This package provides an implementation of a trainable, Transformer-based deep protein foldin

DP Technology 187 Jan 04, 2023
Learning Skeletal Articulations with Neural Blend Shapes

This repository provides an end-to-end library for automatic character rigging and blend shapes generation as well as a visualization tool. It is based on our work Learning Skeletal Articulations wit

Peizhuo 504 Dec 30, 2022
SuperSDR: multiplatform KiwiSDR + CAT transceiver integrator

SuperSDR SuperSDR integrates a realtime spectrum waterfall and audio receive from any KiwiSDR around the world, together with a local (or remote) cont

Marco Cogoni 30 Nov 29, 2022
API for RL algorithm design & testing of BCA (Building Control Agent) HVAC on EnergyPlus building energy simulator by wrapping their EMS Python API

RL - EmsPy (work In Progress...) The EmsPy Python package was made to facilitate Reinforcement Learning (RL) algorithm research for developing and tes

20 Jan 05, 2023
A clean and scalable template to kickstart your deep learning project 🚀 ⚡ 🔥

Lightning-Hydra-Template A clean and scalable template to kickstart your deep learning project 🚀 ⚡ 🔥 Click on Use this template to initialize new re

Hyunsoo Cho 1 Dec 20, 2021
Doosan robotic arm, simulation, control, visualization in Gazebo and ROS2 for Reinforcement Learning.

Robotic Arm Simulation in ROS2 and Gazebo General Overview This repository includes: First, how to simulate a 6DoF Robotic Arm from scratch using GAZE

David Valencia 12 Jan 02, 2023
PyTorch-lightning implementation of the ESFW module proposed in our paper Edge-Selective Feature Weaving for Point Cloud Matching

Edge-Selective Feature Weaving for Point Cloud Matching This repository contains a PyTorch-lightning implementation of the ESFW module proposed in our

5 Feb 14, 2022
Release of the ConditionalQA dataset

ConditionalQA Datasets accompanying the paper ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers. Disclaimer This dataset

14 Oct 17, 2022
Code for: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification Prerequisite PyTorch = 1.2.0 Python3 torch

16 Dec 14, 2022
POPPY (Physical Optics Propagation in Python) is a Python package that simulates physical optical propagation including diffraction

POPPY: Physical Optics Propagation in Python POPPY (Physical Optics Propagation in Python) is a Python package that simulates physical optical propaga

Space Telescope Science Institute 132 Dec 15, 2022
BlockUnexpectedPackets - Preventing BungeeCord CPU overload due to Layer 7 DDoS attacks by scanning BungeeCord's logs

BlockUnexpectedPackets This script automatically blocks DDoS attacks that are sp

SparklyPower 3 Mar 31, 2022
GANSketchingJittor - Implementation of Sketch Your Own GAN in Jittor

GANSketching in Jittor Implementation of (Sketch Your Own GAN) in Jittor(计图). Or

Bernard Tan 10 Jul 02, 2022
A cross-lingual COVID-19 fake news dataset

CrossFake An English-Chinese COVID-19 fake&real news dataset from the ICDMW 2021 paper below: Cross-lingual COVID-19 Fake News Detection. Jiangshu Du,

Yingtong Dou 11 Dec 01, 2022
DeepDiffusion: Unsupervised Learning of Retrieval-adapted Representations via Diffusion-based Ranking on Latent Feature Manifold

DeepDiffusion Introduction This repository provides the code of the DeepDiffusion algorithm for unsupervised learning of retrieval-adapted representat

4 Nov 15, 2022
PyTorch implementation of Barlow Twins.

Barlow Twins: Self-Supervised Learning via Redundancy Reduction PyTorch implementation of Barlow Twins. @article{zbontar2021barlow, title={Barlow Tw

Facebook Research 839 Dec 29, 2022
Code for "R-GCN: The R Could Stand for Random"

RR-GCN: Random Relational Graph Convolutional Networks PyTorch Geometric code for the paper "R-GCN: The R Could Stand for Random" RR-GCN is an extensi

PreDiCT.IDLab 31 Sep 07, 2022
The end-to-end platform for building voice products at scale

Picovoice Made in Vancouver, Canada by Picovoice Picovoice is the end-to-end platform for building voice products on your terms. Unlike Alexa and Goog

Picovoice 318 Jan 07, 2023
unet for image segmentation

Implementation of deep learning framework -- Unet, using Keras The architecture was inspired by U-Net: Convolutional Networks for Biomedical Image Seg

zhixuhao 4.1k Dec 31, 2022
Code for this paper The Lottery Ticket Hypothesis for Pre-trained BERT Networks.

The Lottery Ticket Hypothesis for Pre-trained BERT Networks Code for this paper The Lottery Ticket Hypothesis for Pre-trained BERT Networks. [NeurIPS

VITA 122 Dec 14, 2022
MoCoPnet - Deformable 3D Convolution for Video Super-Resolution

MoCoPnet: Exploring Local Motion and Contrast Priors for Infrared Small Target Super-Resolution Pytorch implementation of local motion and contrast pr

Xinyi Ying 28 Dec 15, 2022