Repository for the AugmentedPCA Python package.

Overview

AugmentedPCA logo

Overview

This Python package provides implementations of Augmented Principal Component Analysis (AugmentedPCA) - a family of linear factor models that find a set of factors aligned with an augmenting objective in addition to the canonical PCA objective of finding factors that represent the data variance. AugmentedPCA can be split into two general families of models: adversarial AugmentedPCA and supervised AugmentedPCA.

Supervised AugmentedPCA

In supervised AugmentedPCA (SAPCA), the augmenting objective is to make the factors aligned with the data labels, or some outcome, in addition to having the factors explain the variance of the original observed or primary data. SAPCA is useful when predictivity of latent components with respects to a set of data labels or outcomes is desired. SAPCA is equivalent to a supervised autoencoder (SAE) with a single hidden layer. Therefore, SAPCA can be applied to situations where the properties of latent representations enforced via deep SAEs are desired, yet where limited data or training inconsistencies are a concern. Below is a diagram depicting the relationship between primary data, supervision data, and the resulting SAPCA factors.

SAPCA diagram

Adversarial AugmentedPCA

In adversarial AugmentedPCA (AAPCA), the augmenting objective is to make the factors orthogonal to a set of concomitant data, in addition to having the factors explain the variance of the original observed or primary data. AAPCA can be used in situations where one wishes to enforce invariance of latent components to a set of concomitant data, and is equivalent to an adversarial autoencoder with a single hidden layer. Below is a diagram depicting the relationship between primary data, concomitant data, and the resulting AAPCA factors.

AAPCA diagram

Documentation

Documentation for AugmentedPCA is available on this documentation site.

Provided documentation includes:

  • Motivation - Motivation behind AugmentedPCA models and the different approximate inference strategies.

  • Model formulation - Overview of different models and approximate inference strategies as well as more in-depth mathematical descriptions.

  • Tutorials - Step-by-step guide on how to use the different offered AugmentedPCA models.

  • Examples - Use case examples for the different models.

Dependencies

The AugmentedPCA package is written in Python, and requires Python >= 3.6 to run. AugmentedPCA relies on the following libraries and version numbers:

Installation

To install the latest stable release, use pip. Use the following command to install:

$ pip install augmented-pca

Issue Tracking and Reports

Please use the GitHub issue tracker associated with the AugmentedPCA repository for issue tracking, filing bug reports, and asking general questions about the package or project.

Quick Introduction

A quick guide to using AugmentedPCA is given in this section. For a more in-depth guide, see our documentation.

Importing AugmentedPCA Models

APCA models can be imported from the models.py module. Below we show an example of importing the AAPCA model.

# Import all AugmentedPCA models
from apca.models import AAPCA

Alternatively, all offered AugmentedPCA models can be imported at once.

# Import all AugmentedPCA models
from apca.models import *

Instantiating AugmentedPCA Models

APCA models are instantiated by assigning either an SAPCA or AAPCA object to a variable. During instantiation, one has the option to define parameters n_components, mu, which represent the number of components and the augmenting objective strength, respectively. Additionally, approximate inference strategy can be defined through the inference parameter.

# Define model parameters
n_components = 2        # factors will have dimensionality of 2
mu = 1.0                # augmenting objective strength equal to 1 
inference = 'encoded'   # encoded approximate inference strategy

# Instantiate adversarial AugmentedPCA model
aapca = AAPCA(n_components=n_components, mu=mu, inference=inference)

Fitting AugmentedPCA Models

APCA models closely follow the style and implemention of scikit-learn's PCA implementation, with many of the same methods and functionality. Similar to scikit-learn models, AugmentedPCA models are fit using the fit() method. fit() takes two parameters: X which represents the matrix of primary data and Y which represents the matrix of augmenting data.

# Import numpy
import numpy as np

# Generate synthetic data
# Note: primary and augmenting data must have same number of samples/same first dimension size
n_samp = 100
X = np.random.randn(n_samp, 20)   # primary data, 100 samples with dimensionality of 20
Y = np.random.randn(n_samp, 3)    # concomitant data, 100 samples with dimensionality of 3

# Fit adversarial AugmentedPCA instance
aapca.fit(X=X, Y=Y)

Alternatively, AugmentedPCA models can be fit using the fit_transform() method, which takes the same parameters as the fit() method but also returns a matrix of components or factors.

# Fit adversarial AugmentedPCA instance and generate components
S = aapca.fit_transform(X=X, Y=Y)

Approximate Inference Strategies

In this section, we give a brief overview of the different approximate inference strategies offered for AugmentedPCA. Inference strategy should be chosen based on the data on which the AugmentedPCA model will be used as well as the specific use case. Both SAPCA and AAPCA models use the jointly-encoded approximate inference strategy by default.

Local

In the local approximate inference strategy, the factors (local variables associated with each observation) are included in both the likelihood relating and the augmenting objective. Below is a diagram from our paper depicting the local inference strategy.

local inference diagram

Because the local variables are included in the augmenting objective, given new data we must have both primary and augmenting data to obtain factors. Thus, the local inference strategy should only be used for inference on new data when both primary and augmenting data are available. Below we show an example of how to fit a SAPCA model with local approximate inference strategy to training data and obtain factors for test data.

# Import numpy
import numpy as np

# Import supervised AugmentedPCA
from apca.models import SAPCA

# Generate synthetic data and labels
n_samp = 100
X = np.random.randn(n_samp, 20)
Y = np.random.randint(low=0, high=1, size=(n_samp, 1), dtype=int)

# Generate test/train splits
train_pct = 0.7
idx = np.arange(start=0, stop=101, step=1, dtype=int)
np.random.shuffle(idx)
n_train = int(train_pct * len(idx))
train_idx = idx[:n_train]
test_idx = idx[n_train:]

# Split data into test/train sets
X_train = X[train_idx, :]
X_test = X[test_idx, :]
Y_train = Y[train_idx, :]
Y_test = Y[test_idx, :]

# Instantiate supervised AugmentedPCA model with local approximate inference strategy
sapca = SAPCA(n_components=3, mu=5.0, inference='local')

# Fit supervised AugmentedPCA model
sapca.fit(X=X_train, Y_train)

# Generate components for test set
# Note: both primary and augmenting data are needed to obtain factors
S_test = sapca.transform(X=X_test, Y=Y_test)

Note that when factors are generated for the test set that the transform() method requires both the primary data X_test and labels Y_test be passed as parameters. For a more in-depth description of the local approximate inference strategy, see our paper or the corresponding documentation section.

Encoded

In the encoded approximate inference strategy, a linear encoder is used to transform the data into factors or components. This inference strategy is termed "encoded" because the augmenting objective is enforced via an encoding function. Below is a diagram depicting the encoded inference strategy.

encoded inference diagram

In contrast to the local inference strategy, when factors are generated for the test set under the encoded inference strategy the transform() method only requires the primary data X_test. Below we show an example of how to fit a SAPCA model with encoded approximate inference strategy to training data and obtain factors for test data.

# Instantiate supervised AugmentedPCA model model with encoded approximate inference strategy
sapca = SAPCA(n_components=3, mu=5.0, inference='encoded')

# Fit supervised AugmentedPCA model
# Note: both primary and augmenting data are required to fit the model
sapca.fit(X=X_train, Y_train)

# Generate components for test set
# Note: only primary data are needed to obtain factors
S_test = sapca.transform(X=X_test)

For a more in-depth description of the encoded approximate inference strategy, see our paper or the corresponding documentation section.

Jointly-Encoded

The jointly-encoded approximate inference strategy is similar to the encoded in that the augmenting objective is enforced through a linear encoding matrix. However, in the jointly-encoded inference strategy both the primary and augmenting data are required for computing factors, similar to the local inference strategy. Below is a diagram depicting the jointly-encoded inference strategy.

jointly-encoded inference diagram

Similar to the local inference strategy, when factors are generated for the test set under the jointly-encoded inference strategy the transform() method requires both the primary data X_test and augmenting data Y_test. Below we show an example of how to fit a SAPCA model with jointly-encoded approximate inference strategy to training data and obtain factors for test data.

# Instantiate supervised AugmentedPCA model model with encoded approximate inference strategy
sapca = SAPCA(n_components=3, mu=5.0, inference='encoded')

# Fit supervised AugmentedPCA model
# Note: both primary and augmenting data are required to fit the model
sapca.fit(X=X_train, Y_train)

# Generate components for test set
# Note: both primary and augmenting data are needed to obtain factors
S_test = sapca.transform(X=X_test)

For a more in-depth description of the jointly-encoded approximate inference strategy, see our paper or the corresponding documentation section.

Citation

Please cite our paper if you find this package helpful in your research:

@inproceedings{carson2021augmentedpca,
title={{AugmentedPCA}: {A} {P}ython {P}ackage of {S}upervised and {A}dversarial {L}inear {F}actor {M}odels},
author={{Carson IV}, William E. and Talbot, Austin and Carlson, David},
year={2021},
month={December},
booktitle={{P}roceedings of {L}earning {M}eaningful {R}epresentations of {L}ife {W}orkshop at {NeurIPS} 2021}}

Funding

This project was supported by the National Institute of Biomedical Imaging and Bioengineering and the National Institute of Mental Health through the National Institutes of Health BRAIN Initiative under Award Number R01EB026937.

Owner
Billy Carson
Biomedical Engineering PhD candidate at Duke University using machine learning to investigate neurodevelopmental conditions and learn about the human brain.
Billy Carson
When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings

When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings This is the repository for t

RegLab 39 Jan 07, 2023
SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement

SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement This repository implements the approach described in SporeAgent: Reinforced

Dominik Bauer 5 Jan 02, 2023
Unofficial keras(tensorflow) implementation of MAE model from Masked Autoencoders Are Scalable Vision Learners

MAE-keras Unofficial keras(tensorflow) implementation of MAE model described in 'Masked Autoencoders Are Scalable Vision Learners'. This work has been

Yewon 11 Jun 12, 2022
This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

Wizard of Search Engine: Access to Information Through Conversations with Search Engines by Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zh

19 Oct 27, 2022
Really awesome semantic segmentation

really-awesome-semantic-segmentation A list of all papers on Semantic Segmentation and the datasets they use. This site is maintained by Holger Caesar

Holger Caesar 400 Nov 28, 2022
Densely Connected Search Space for More Flexible Neural Architecture Search (CVPR2020)

DenseNAS The code of the CVPR2020 paper Densely Connected Search Space for More Flexible Neural Architecture Search. Neural architecture search (NAS)

Jamin Fong 291 Nov 18, 2022
chen2020iros: Learning an Overlap-based Observation Model for 3D LiDAR Localization.

Overlap-based 3D LiDAR Monte Carlo Localization This repo contains the code for our IROS2020 paper: Learning an Overlap-based Observation Model for 3D

Photogrammetry & Robotics Bonn 219 Dec 15, 2022
git《Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction》(ECCV 2020) GitHub:

Learning Pairwise Inter-Plane Relations for Piecewise Planar Reconstruction Code for the ECCV 2020 paper by Yiming Qian and Yasutaka Furukawa Getting

37 Dec 04, 2022
HyperCube: Implicit Field Representations of Voxelized 3D Models

HyperCube: Implicit Field Representations of Voxelized 3D Models Authors: Magdalena Proszewska, Marcin Mazur, Tomasz Trzcinski, Przemysław Spurek [Pap

Magdalena Proszewska 3 Mar 09, 2022
Y. Zhang, Q. Yao, W. Dai, L. Chen. AutoSF: Searching Scoring Functions for Knowledge Graph Embedding. IEEE International Conference on Data Engineering (ICDE). 2020

AutoSF The code for our paper "AutoSF: Searching Scoring Functions for Knowledge Graph Embedding" and this paper has been accepted by ICDE2020. News:

AutoML Research 64 Dec 17, 2022
This is a code repository for the paper "Graph Auto-Encoders for Financial Clustering".

Repository for the paper "Graph Auto-Encoders for Financial Clustering" Requirements Python 3.6 torch torch_geometric Instructions This is a simple c

Edward Turner 1 Dec 02, 2021
Open-AI's DALL-E for large scale training in mesh-tensorflow.

DALL-E in Mesh-Tensorflow [WIP] Open-AI's DALL-E in Mesh-Tensorflow. If this is similarly efficient to GPT-Neo, this repo should be able to train mode

EleutherAI 432 Dec 16, 2022
UMich 500-Level Mobile Robotics Course

MOBILE ROBOTICS: METHODS & ALGORITHMS - WINTER 2022 University of Michigan - NA 568/EECS 568/ROB 530 For slides, lecture notes, and example codes, see

393 Dec 29, 2022
This library provides an abstraction to perform Model Versioning using Weight & Biases.

Description This library provides an abstraction to perform Model Versioning using Weight & Biases. Features Version a new trained model Promote a mod

Hector Lopez Almazan 2 Jan 28, 2022
Fuse radar and camera for detection

SAF-FCOS: Spatial Attention Fusion for Obstacle Detection using MmWave Radar and Vision Sensor This project hosts the code for implementing the SAF-FC

ChangShuo 18 Jan 01, 2023
Generating Band-Limited Adversarial Surfaces Using Neural Networks

Generating Band-Limited Adversarial Surfaces Using Neural Networks This is the official repository of the technical report that was published on arXiv

3 Jul 26, 2022
Code for "Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans" CVPR 2021 best paper candidate

News 05/17/2021 To make the comparison on ZJU-MoCap easier, we save quantitative and qualitative results of other methods at here, including Neural Vo

ZJU3DV 748 Jan 07, 2023
Neural Cellular Automata + CLIP

🧠 Text-2-Cellular Automata Using Neural Cellular Automata + OpenAI CLIP (Work in progress) Examples Text Prompt: Cthulu is watching cthulu_is_watchin

Mainak Deb 21 Dec 19, 2022
When BERT Plays the Lottery, All Tickets Are Winning

When BERT Plays the Lottery, All Tickets Are Winning Large Transformer-based models were shown to be reducible to a smaller number of self-attention h

Sai 16 Nov 10, 2022
Content shared at DS-OX Meetup

Streamlit-Projects Streamlit projects available in this repo: An introduction to Streamlit presented at DS-OX (Feb 26, 2020) meetup Streamlit 101 - Ja

Arvindra 69 Dec 23, 2022