Transform-Invariant Non-Negative Matrix Factorization

Overview

A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learning transform-invariant representations.

The package supports multiple optimization backends and can be easily extended to handle application-specific types of transforms.

General Introduction

A general introduction to Non-Negative Matrix Factorization and the purpose of this package can be found on the corresponding GitHub Pages.

Installation

To use this package, you need Python 3.7 or higher. The package is available via PyPI.

Installation is easiest using pip:

pip install tnmf
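Before installing, it can help to confirm that the active interpreter meets the version requirement stated above. The following is a minimal sketch using only the standard library; the helper name meets_minimum is made up for this illustration and is not part of the tnmf package.

```python
import sys

# Minimum interpreter version stated in this README.
MINIMUM_PYTHON = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`.

    Hypothetical helper for illustration; not part of tnmf.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is sufficient for installing tnmf.")
    else:
        print("Python 3.7 or higher is required.")
```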

Demos and Examples

The package comes with a streamlit demo and a number of examples that demonstrate the capabilities of the TNMF model. They provide a good starting point for your own experiments.

Online Demo

The demo is accessible online via Streamlit sharing, with no installation required.

Local Execution

Once the package is installed, the demo and the examples can be conveniently executed locally using the tnmf command:

  • To execute the demo, run tnmf demo.
  • A specific example can be executed by calling tnmf example with the name of the example.

To show the list of available examples, type tnmf example --help.

License

Copyright (c) 2021 Merck KGaA, Darmstadt, Germany

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The full text of the license can be found in the file LICENSE in the repository root directory.

Contributing

Contributions to the package are always welcome and can be submitted via a pull request. Please note that you must agree to the Contributor License Agreement to contribute.

Working with the Code

To check out the code and set up a working environment with all required Python packages, execute the following commands:

git clone https://github.com/emdgroup/tnmf.git ./tnmf
cd tnmf
python3 -m virtualenv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Now, you should be able to execute the unit tests by calling pytest to verify that the code is running as expected.

Pull Requests

Before creating a pull request, you should always try to ensure that the automated code quality and unit tests do not fail. This section explains how to run them locally to understand and fix potential issues.

Code Style and Quality

Code style and quality are checked using flake8 and pylint. To execute them, change into the repository root directory, run the following commands and inspect their output:

flake8
pylint tnmf

For a pull request to be acceptable, no errors may be reported here.

Unit Tests

Automated unit tests reside inside the folder tnmf/tests. They can be executed via pytest by changing into the repository root directory and running

pytest

Debugging potential failures from the command line might be cumbersome. Most Python IDEs, however, also support pytest natively in their debugger. Again, for a pull request to be acceptable, no failures may be reported here.
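The three checks named in this README (flake8, pylint tnmf, and pytest) can be chained so that a single call reports which of them failed. The sketch below is one possible way to do that with the standard library; the names CHECKS, run_checks, and main are assumptions made for this example and do not exist in the repository.

```python
import subprocess

# Commands named in this README; they are meant to be run from the
# repository root with the virtual environment activated.
CHECKS = [
    ["flake8"],
    ["pylint", "tnmf"],
    ["pytest"],
]


def run_checks(commands):
    """Run each command and return those that did not exit with code 0.

    Hypothetical helper for illustration; not part of tnmf.
    """
    failed = []
    for command in commands:
        try:
            result = subprocess.run(command, check=False)
            ok = result.returncode == 0
        except FileNotFoundError:
            # The tool is not installed in the active environment.
            ok = False
        if not ok:
            failed.append(command)
    return failed


def main():
    """Run all checks and return a process exit code (0 means all passed)."""
    failures = run_checks(CHECKS)
    for command in failures:
        print("FAILED:", " ".join(command))
    return 1 if failures else 0
```

To use it, save the code in a file of your choice in the repository root and call main(), for example via sys.exit(main()). Every check runs even if an earlier one fails, so all problems show up in one pass.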

Code Coverage

Code coverage in the unit tests is measured using coverage. A coverage report can be created locally from the repository root directory via

coverage run
coverage combine
coverage report

This outputs a concise table listing the Python files that are not fully covered by unit tests, along with the line numbers of code that has not been executed. A more detailed, interactive report can be created using

coverage html

Then, you can open the file htmlcov/index.html in a web browser of your choice to navigate through code annotated with coverage data. The required overall coverage is configured in setup.cfg, under the key fail_under in section [coverage:report].
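To look up the configured threshold without opening the file by hand, the key can be read with Python's configparser. This is a minimal sketch: the function name read_fail_under is made up for this illustration, and the value 90 in the sample text is a placeholder, not the project's actual setting.

```python
import configparser


def read_fail_under(config_text):
    """Return fail_under from the [coverage:report] section, or None.

    Hypothetical helper for illustration; not part of tnmf.
    """
    # Interpolation is disabled so that '%' characters elsewhere in the
    # file are never treated as substitution markers.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_option("coverage:report", "fail_under"):
        return None
    return parser.getfloat("coverage:report", "fail_under")


# Illustrative input only; the threshold 90 is an assumed placeholder.
sample = """
[coverage:report]
fail_under = 90
"""
print(read_fail_under(sample))
```

For the real file, pass the contents of setup.cfg, for example read_fail_under(open("setup.cfg").read()) when run from the repository root.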

Building the Documentation

To build the documentation locally, change into the doc subdirectory and run make html. The documentation then resides at doc/_build/html/index.html.
