Code for reproducible experiments presented in KSD Aggregated Goodness-of-fit Test.

Overview

Code for KSDAgg: a KSD aggregated goodness-of-fit test

This GitHub repository contains the code for the reproducible experiments presented in our paper KSD Aggregated Goodness-of-fit Test:

  • Gamma distribution experiment,
  • Gaussian-Bernoulli Restricted Boltzmann Machine experiment,
  • MNIST Normalizing Flow experiment.

We provide the code to run the experiments to generate Figures 1-4 and Table 1 from our paper, those can be found in figures.

Our aggregated test KSDAgg is implemented in ksdagg.py. We provide code for two quantile estimation methods: the wild bootstrap and the parametric bootstrap. Our implementation uses the IMQ (inverse multiquadric) kernel with a collection of bandwidths consisting of the median bandwidth scaled by powers of 2, and with one of the four types of weights proposed in MMD Aggregated Two-Sample Test. We also provide custom KSDAgg functions in ksdagg.py which allow for the use of any kernel collections and weights.

Requirements

  • python 3.9

Installation

In a chosen directory, clone the repository and change to its directory by executing

git clone [email protected]:antoninschrab/ksdagg-paper.git
cd ksdagg-paper

We then recommend creating and activating a virtual environment by either

  • using venv:
    python3 -m venv ksdagg-env
    source ksdagg-env/bin/activate
    # can be deactivated by running:
    # deactivate
    
  • or using conda:
    conda create --name ksdagg-env python=3.9
    conda activate ksdagg-env
    # can be deactivated by running:
    # conda deactivate
    

The required packages can then be installed in the virtual environment by running

python -m pip install -r requirements.txt

Generating or downloading the data

The data for the Gaussian-Bernoulli Restricted Boltzmann Machine experiment and for the MNIST Normalizing Flow experiment can

  • be obtained by executing
    python generate_data_rbm.py
    python generate_data_nf.py
    
  • or, as running the above scripts can be computationally expensive, we also provide the option to download their outputs directly
    python download_data.py
    

Those scripts generate samples and compute their associated scores under the model for the different settings considered in our experiments, the data is saved in the new directory data.

Reproducing the experiments of the paper

First, for our three experiments, we compute KSD values to be used for the parametric bootstrap and save them in the directory parametric. This can be done by running

python generate_parametric.py

For convenience, we directly provide the directory parametric obtained by running this script.

To run the three experiments, the following commands can be executed

python experiment_gamma.py 
python experiment_rbm.py 
python experiment_nf.py 

Those commands run all the tests necessary for our experiments, the results are saved in dedicated .csv and .pkl files in the directory results (which is already provided for ease of use). Note that our expeiments are comprised of 'embarrassingly parallel for loops', for which significant speed up can be obtained by using parallel computing libraries such as joblib or dask.

The actual figures of the paper can be obtained from the saved dataframes in results by using the command

python figures.py  

The figures are saved in the directory figures and correspond to the ones used in our paper.

References

Our KSDAgg code is based our MMDAgg implementation which can be found at https://github.com/antoninschrab/mmdagg-paper.

For the Gaussian-Bernoulli Restricted Boltzmann Machine experiment, we obtain the samples and scores in generate_data_rbm.py by relying on Wittawat Jitkrittum's implementation which can be found at https://github.com/wittawatj/kernel-gof under the MIT License. The relevant files we use are in the directory kgof.

For the MNIST Normalizing Flow experiment, we use in generate_data_nf.py a multiscale Normalizing Flow trained on the MNIST dataset as implemented by Phillip Lippe in Tutorial 11: Normalizing Flows for image modeling as part of the UvA Deep Learning Tutorials under the MIT License.

Author

Antonin Schrab

Centre for Artificial Intelligence, Department of Computer Science, University College London

Gatsby Computational Neuroscience Unit, University College London

Inria, Lille - Nord Europe research centre and Inria London Programme

Bibtex

@unpublished{schrab2022ksd,
    title={{KSD} Aggregated Goodness-of-fit Test},
    author={Antonin Schrab and Benjamin Guedj and Arthur Gretton},
    year={2022},
    note = "Submitted.",
    abstract = {We investigate properties of goodness-of-fit tests based on the Kernel Stein Discrepancy (KSD). We introduce a strategy to construct a test, called KSDAgg, which aggregates multiple tests with different kernels. KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and rather maximises the test power over a collection of kernels. We provide theoretical guarantees on the power of KSDAgg: we show it achieves the smallest uniform separation rate of the collection, up to a logarithmic term. KSDAgg can be computed exactly in practice as it relies either on a parametric bootstrap or on a wild bootstrap to estimate the quantiles and the level corrections. In particular, for the crucial choice of bandwidth of a fixed kernel, it avoids resorting to arbitrary heuristics (such as median or standard deviation) or to data splitting. We find on both synthetic and real-world data that KSDAgg outperforms other state-of-the-art adaptive KSD-based goodness-of-fit testing procedures.},
    url = {https://arxiv.org/abs/2202.00824},
    url_PDF = {https://arxiv.org/pdf/2202.00824.pdf},
    url_Code = {https://github.com/antoninschrab/ksdagg-paper},
    eprint={2202.00824},
    archivePrefix={arXiv},
    primaryClass={stat.ML}
}

License

MIT License (see LICENSE.md)

Owner
Antonin Schrab
Antonin Schrab
The Pytorch implementation for "Video-Text Pre-training with Learned Regions"

Region_Learner The Pytorch implementation for "Video-Text Pre-training with Learned Regions" (arxiv) We are still cleaning up the code further and pre

Rui Yan 0 Mar 20, 2022
scalingscattering

Scaling The Scattering Transform : Deep Hybrid Networks This repository contains the experiments found in the paper: https://arxiv.org/abs/1703.08961

Edouard Oyallon 78 Dec 21, 2022
Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image

NonCuboidRoom Paper Learning to Reconstruct 3D Non-Cuboid Room Layout from a Single RGB Image Cheng Yang*, Jia Zheng*, Xili Dai, Rui Tang, Yi Ma, Xiao

67 Dec 15, 2022
Face Detection & Age Gender & Expression & Recognition

Face Detection & Age Gender & Expression & Recognition

Sajjad Ayobi 188 Dec 28, 2022
MatchGAN: A Self-supervised Semi-supervised Conditional Generative Adversarial Network

MatchGAN: A Self-supervised Semi-supervised Conditional Generative Adversarial Network This repository is the official implementation of MatchGAN: A S

Justin Sun 12 Dec 27, 2022
Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference

Self-Supervised Document Similarity Ranking (SDR) via Contextualized Language Models and Hierarchical Inference This repo is the implementation for SD

Microsoft 36 Nov 28, 2022
Some pvbatch (paraview) scripts for postprocessing OpenFOAM data

pvbatchForFoam Some pvbatch (paraview) scripts for postprocessing OpenFOAM data For every script there is a help message available: pvbatch pv_state_s

Morev Ilya 2 Oct 26, 2022
SNE-RoadSeg in PyTorch, ECCV 2020

SNE-RoadSeg Introduction This is the official PyTorch implementation of SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentati

242 Dec 20, 2022
Contenido del curso Bases de datos del DCC PUC versión 2021-2

IIC2413 - Bases de Datos Tabla de contenidos Equipo Profesores Ayudantes Contenidos Calendario Evaluaciones Resumen de notas Foro Política de integrid

54 Nov 23, 2022
Pytorch implementation of Hinton's Dynamic Routing Between Capsules

pytorch-capsule A Pytorch implementation of Hinton's "Dynamic Routing Between Capsules". https://arxiv.org/pdf/1710.09829.pdf Thanks to @naturomics fo

Tim Omernick 625 Oct 27, 2022
Create Data & AI apps in 20 lines of code with Shimoku

Install with: pip install shimoku-api-python Start with: from os import getenv import shimoku_api_python.client as Shimoku

Shimoku 5 Nov 07, 2022
2021搜狐校园文本匹配算法大赛 分比我们低的都是帅哥队

sohu_text_matching 2021搜狐校园文本匹配算法大赛Top2:分比我们低的都是帅哥队 本repo包含了本次大赛决赛环节提交的代码文件及答辩PPT,提交的模型文件可在百度网盘获取(链接:https://pan.baidu.com/s/1T9FtwiGFZhuC8qqwXKZSNA ,

hflserdaniel 43 Oct 01, 2022
Code repository for the paper "Tracking People with 3D Representations"

Tracking People with 3D Representations Code repository for the paper "Tracking People with 3D Representations" (paper link) (project site). Jathushan

Jathushan Rajasegaran 77 Dec 03, 2022
Generating Videos with Scene Dynamics

Generating Videos with Scene Dynamics This repository contains an implementation of Generating Videos with Scene Dynamics by Carl Vondrick, Hamed Pirs

Carl Vondrick 706 Jan 04, 2023
Food Drinks and groceries Images Multi Lingual (FooDI-ML) dataset.

Food Drinks and groceries Images Multi Lingual (FooDI-ML) dataset.

41 Jan 04, 2023
Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).

Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).

Varun Nair 37 Dec 30, 2022
Iris prediction model is used to classify iris species created julia's DecisionTree, DataFrames, JLD2, PlotlyJS and Statistics packages.

Iris Species Predictor Iris prediction is used to classify iris species using their sepal length, sepal width, petal length and petal width created us

Siva Prakash 2 Jan 06, 2022
(CVPR 2022 Oral) Official implementation for "Surface Representation for Point Clouds"

RepSurf - Surface Representation for Point Clouds [CVPR 2022 Oral] By Haoxi Ran* , Jun Liu, Chengjie Wang ( * : corresponding contact) The pytorch off

Haoxi Ran 264 Dec 23, 2022
Video-based open-world segmentation

UVO_Challenge Team Alpes_runner Solutions This is an official repo for our UVO Challenge solutions for Image/Video-based open-world segmentation. Our

Yuming Du 84 Dec 22, 2022
Multiview 3D object detection on MultiviewC dataset through moft3d.

Multiview Orthographic Feature Transformation for 3D Object Detection Multiview 3D object detection on MultiviewC dataset through moft3d. Introduction

Jiahao Ma 20 Dec 21, 2022