PCAfold is an open-source Python library for generating, analyzing and improving low-dimensional manifolds obtained via Principal Component Analysis (PCA).

Last update: Oct 13, 2022


Overview

PCAfold is an open-source Python library for generating, analyzing and improving low-dimensional manifolds obtained via Principal Component Analysis (PCA). It incorporates a variety of data preprocessing tools (including data clustering and sampling), uses PCA as a dimensionality reduction technique and utilizes a novel approach to assess the quality of the obtained low-dimensional manifolds.
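To make the dimensionality reduction step concrete, here is a minimal NumPy sketch of PCA itself. It does not use the PCAfold API: the function name pca_project, the choice of unit-variance scaling and the synthetic data set are all assumptions made for illustration.

```python
import numpy as np

def pca_project(X, n_components):
    """Project centered and scaled data onto its leading principal components."""
    # Center each variable and scale it to unit variance (assumed preprocessing)
    X_cs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Covariance matrix of the preprocessed data (variables are columns)
    cov = np.cov(X_cs, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order[:n_components]]
    # Low-dimensional coordinates (PC scores) and all eigenvalues, largest first
    return X_cs @ basis, eigvals[order]

# Synthetic data: three variables driven by one hidden parameter plus small noise
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([t, 2.0 * t, -t]) + 0.01 * rng.normal(size=(200, 3))

Z, eigvals = pca_project(X, n_components=1)
explained = eigvals[0] / eigvals.sum()
print("Manifold shape:", Z.shape)
print("Variance captured by the first component:", round(float(explained), 4))
```

Because the three variables depend on a single parameter, one principal component should capture nearly all of the variance. This sketch covers only the projection; the preprocessing tools and the manifold quality assessment described above are not reproduced here.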

Citing PCAfold

PCAfold is published in the SoftwareX journal. If you use PCAfold in a scientific publication, you can cite the software as:

Zdybał, K., Armstrong, E., Parente, A. and Sutherland, J.C., 2020. PCAfold: Python software to generate, analyze and improve PCA-derived low-dimensional manifolds. SoftwareX, 12, p.100630.

or using BibTeX:

@article{pcafold2020,
title = "PCAfold: Python software to generate, analyze and improve PCA-derived low-dimensional manifolds",
journal = "SoftwareX",
volume = "12",
pages = "100630",
year = "2020",
issn = "2352-7110",
doi = "https://doi.org/10.1016/j.softx.2020.100630",
url = "http://www.sciencedirect.com/science/article/pii/S2352711020303435",
author = "Kamila Zdybał and Elizabeth Armstrong and Alessandro Parente and James C. Sutherland"
}

PCAfold documentation

The PCAfold documentation contains a thorough user guide with equations, references and example code snippets, along with numerous illustrative tutorials and demos. The corresponding Jupyter notebooks can be found in the docs/tutorials directory.

Software architecture

A general overview for using PCAfold modules is presented in the diagram below:

Each module can also be used as a standalone tool for a specific task and can easily be combined with techniques outside of this software, such as the K-Means algorithm or Artificial Neural Networks.
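As a rough illustration of pairing a low-dimensional representation with an external technique, the sketch below clusters two-dimensional points with a plain NumPy K-Means (Lloyd's algorithm). The points merely stand in for PC scores; the kmeans function and the synthetic data are hypothetical and are not part of PCAfold.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm: returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct observations
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if empty)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Stand-in for PC scores: two well-separated groups in a 2D space
rng = np.random.default_rng(1)
scores = np.vstack([
    rng.normal(loc=[-5.0, 0.0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5.0, 0.0], scale=0.5, size=(50, 2)),
])

labels, centroids = kmeans(scores, k=2)
print("Cluster sizes:", np.bincount(labels))
```

In practice the scores array would come from a PCA transformation of real data, and a library implementation of K-Means could replace this hand-written one.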

Installation

Dependencies

PCAfold requires Python 3.7 and the following packages:

Cython
matplotlib
numpy
scipy
termcolor
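Before building, it can help to confirm that the packages listed above are visible to the interpreter. The following is a small sketch using only the standard library; the helper name missing_packages is made up for this example, and the import names are assumed to match the package names in the list.

```python
import sys
from importlib.util import find_spec

# Import names assumed to match the dependency list above
REQUIRED = ["Cython", "matplotlib", "numpy", "scipy", "termcolor"]

def missing_packages(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if find_spec(name) is None]

missing = missing_packages(REQUIRED)
print("Python version:", sys.version.split()[0])
print("Missing packages:", missing or "none")
```

Any name reported as missing needs to be installed into the same environment that will run setup.py.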

Build from source

Clone the PCAfold repository and move into the newly created PCAfold directory:

git clone http://gitlab.multiscale.utah.edu/common/PCAfold.git
cd PCAfold

Run the setup.py script as below to complete the installation:

python3.7 setup.py build_ext --inplace
python3.7 setup.py install

You are ready to import PCAfold!

Testing

To run the regression tests, execute the following from the base repo directory:

python3.7 -m unittest discover

To switch verbose on, use the -v flag.
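The same discovery can be driven from a Python session with the standard unittest module. This is a sketch only: the wrapper name run_discovered_tests is hypothetical, and it relies on unittest's default test*.py file pattern, as the command above does.

```python
import unittest

def run_discovered_tests(start_dir=".", verbose=False):
    """Discover test*.py files under start_dir and run them."""
    suite = unittest.defaultTestLoader.discover(start_dir)
    # verbosity=2 lists each test by name, like the -v flag on the command line
    runner = unittest.TextTestRunner(verbosity=2 if verbose else 1)
    return runner.run(suite)
```

Calling run_discovered_tests(".", verbose=True) from the base repo directory returns a result object whose wasSuccessful() method reports whether every test passed.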

All tests should pass. If any test fails and you cannot work out why, please open an issue on GitLab.

Authors and contacts

Kamila Zdybał, Université Libre de Bruxelles
Elizabeth Armstrong, The University of Utah
Alessandro Parente, Université Libre de Bruxelles
James C. Sutherland, The University of Utah

Owner

Burn Research
