A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

Overview

OutliersSlidingWindows

A Java implementation of the experiments for the paper "k-Center Clustering with Outliers in Sliding Windows"

Dataset generation

The original datasets, namely Higgs and Cover, are provided (compressed) in the data folder. One can download and preprocess the datasets as follows:

wget https://archive.ics.uci.edu/ml/machine-learning-databases/00280/HIGGS.csv.gz
cat HIGGS.csv.gz | gunzip | cut -d ',' -f 23,24,25,26,27,28,29 > higgs.dat

wget https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/covtype.data.gz
gunzip covtype.data.gz

The script datasets.sh decompresses the zipped original datasets and generates the artificial datasets used in the paper. In particular, the program InjectOutliers takes a dataset and injects artificial outliers. It takes as an argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • p, the probability with which to inject an outlier after every point
  • r, the scaling factor for the norm of the outlier points
  • d, the dimension of the points

The program GenerateArtificial generates automatically a dataset with points in a unit ball with outliers on the suface of a ball of radius r. It takes as an argument:

  • out, the path to the output file
  • p, the probability with which to inject an outlier
  • r, the radius of the outer ball
  • d, the dimension of the points

Running the experiments

The script exec.sh runs a representative subset of the experiments presented in the paper.

The program Main runs the experiments on the comparison of our k-center algorithm with the sequential ones. It takes as and argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • k, the number of centers
  • z, the number of outliers
  • N, the window size
  • beta, eps, lambda, parameters of our method
  • minDist, maxDist, parameters of our method
  • samp, the number of candidate centers for sampled-charikar
  • doChar, if set to 1 executes charikar, else it is skipped

It outputs, in the folder out/k-cen/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the four methods, the update times, the query times, the memory usage and the clustering radius.

The program MainLambda runs the experiments on the sensitivity on lambda. It takes as and argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • k, the number of centers
  • z, the number of outliers
  • N, the window size
  • beta, eps, lambda, parameters of our method (lambda unused)
  • minDist, maxDist, parameters of our method
  • doSlow, if set to 1 executes the slowest test, else it is skipped

It outputs, in the folder out/lam/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the four methods, the update times, the query times, the memory usage due to histograms, the total memory usage and the clustering radius.

The program MainEffDiam runs the experiments on the effective diameter algorithms. It takes as and argument:

  • in, the path to the input dataset
  • out, the path to the output file
  • d, the dimension of the points
  • alpha, fraction fo distances to discard
  • eta, lower bound on ratio between effective diameter and diameter
  • N, the window size
  • beta, eps, lambda, parameters of our method
  • minDist, maxDist, parameters of our method
  • doSeq, if set to 1 executes the sequential method, else it is skipped

It outputs, in the folder out/diam/, a file with:

  • the first line reporting the parameters of the experiments
  • a line for each of the sampled windows reporting, for each of the two methods, the update times, the query times, the memory usage and the effective diameter estimate.
Owner
PaoloPellizzoni
PaoloPellizzoni
Understanding Convolutional Neural Networks from Theoretical Perspective via Volterra Convolution

nnvolterra Run Code Compile first: make compile Run all codes: make all Test xconv: make npxconv_test MNIST dataset needs to be downloaded, converted

1 May 24, 2022
Deep Unsupervised 3D SfM Face Reconstruction Based on Massive Landmark Bundle Adjustment.

(ACMMM 2021 Oral) SfM Face Reconstruction Based on Massive Landmark Bundle Adjustment This repository shows two tasks: Face landmark detection and Fac

BoomStar 51 Dec 13, 2022
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

scikit-opt Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm,A

郭飞 3.7k Jan 03, 2023
Implementation of a protein autoregressive language model, but with autoregressive infilling objective (editing subsequences capability)

Protein GLM (wip) Implementation of a protein autoregressive language model, but with autoregressive infilling objective (editing subsequences capabil

Phil Wang 17 May 06, 2022
Human-Pose-and-Motion History

Human Pose and Motion Scientist Approach Eadweard Muybridge, The Galloping Horse Portfolio, 1887 Etienne-Jules Marey, Descent of Inclined Plane, Chron

Daito Manabe 47 Dec 16, 2022
Pretraining Representations For Data-Efficient Reinforcement Learning

Pretraining Representations For Data-Efficient Reinforcement Learning Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Ch

Mila 40 Dec 11, 2022
HybVIO visual-inertial odometry and SLAM system

HybVIO A visual-inertial odometry system with an optional SLAM module. This is a research-oriented codebase, which has been published for the purposes

Spectacular AI 320 Jan 03, 2023
A knowledge base construction engine for richly formatted data

Fonduer is a Python package and framework for building knowledge base construction (KBC) applications from richly formatted data. Note that Fonduer is

HazyResearch 386 Dec 05, 2022
[IEEE Transactions on Computational Imaging] Self-Gated Memory Recurrent Network for Efficient Scalable HDR Deghosting

Few-shot Deep HDR Deghosting This repository contains code and pretrained models for our paper: Self-Gated Memory Recurrent Network for Efficient Scal

Susmit Agrawal 4 Dec 29, 2021
Pytorch implementation of NeurIPS 2021 paper: Geometry Processing with Neural Fields.

Geometry Processing with Neural Fields Pytorch implementation for the NeurIPS 2021 paper: Geometry Processing with Neural Fields Guandao Yang, Serge B

Guandao Yang 162 Dec 16, 2022
MassiveSumm: a very large-scale, very multilingual, news summarisation dataset

MassiveSumm: a very large-scale, very multilingual, news summarisation dataset This repository contains links to data and code to fetch and reproduce

Daniel Varab 19 Dec 16, 2022
A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis

A Multi-attribute Controllable Generative Model for Histopathology Image Synthesis This is the pytorch implementation for our MICCAI 2021 paper. A Mul

Jiarong Ye 7 Apr 04, 2022
A deep learning object detector framework written in Python for supporting Land Search and Rescue Missions.

AIR: Aerial Inspection RetinaNet for supporting Land Search and Rescue Missions AIR is a deep learning based object detection solution to automate the

Accenture 13 Dec 22, 2022
🔪 Elimination based Lightweight Neural Net with Pretrained Weights

ELimNet ELimNet: Eliminating Layers in a Neural Network Pretrained with Large Dataset for Downstream Task Removed top layers from pretrained Efficient

snoop2head 4 Jul 12, 2022
Real-time VIBE: Frame by Frame Inference of VIBE (Video Inference for Human Body Pose and Shape Estimation)

Real-time VIBE Inference VIBE frame-by-frame. Overview This is a frame-by-frame inference fork of VIBE at [https://github.com/mkocabas/VIBE]. Usage: i

23 Jul 02, 2022
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"

The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"

Sun Yi 201 Nov 21, 2022
Implement some metaheuristics and cost functions

Metaheuristics This repot implement some metaheuristics and cost functions. Metaheuristics JAYA Implement Jaya optimizer without constraints. Cost fun

Adri1G 1 Mar 23, 2022
Minecraft Hack Detection With Python

Minecraft Hack Detection An attempt to try and use crowd sourced replays to find

Kuleen Sasse 3 Mar 26, 2022
Finite difference solution of 2D Poisson equation. Can handle Dirichlet, Neumann and mixed boundary conditions.

Poisson-solver-2D Finite difference solution of 2D Poisson equation Current version can handle Dirichlet, Neumann, and mixed (combination of Dirichlet

Mohammad Asif Zaman 34 Dec 23, 2022
Code for the paper Learning the Predictability of the Future

Learning the Predictability of the Future Code from the paper Learning the Predictability of the Future. Website of the project in hyperfuture.cs.colu

Computer Vision Lab at Columbia University 139 Nov 18, 2022