Transform-Invariant Non-Negative Matrix Factorization

Overview

Flake8 Linter Pylint Linter Pytest and Coverage Build Documentation Publish to PyPI Open in Streamlit

Logo

Transform-Invariant Non-Negative Matrix Factorization

A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learning transform-invariant representations.

The packages supports multiple optimization backends and can be easily extended to handle application-specific types of transforms.

General Introduction

A general introduction to Non-Negative Matrix Factorization and the purpose of this package can be found on the corresponding GitHub Pages.

Installation

For using this package, you will need Python version 3.7 (or higher). The package is available via PyPI.

Installation is easiest using pip:

pip install tnmf

Demos and Examples

The package comes with a streamlit demo and a number of examples that demonstrate the capabilities of the TNMF model. They provide a good starting point for your own experiments.

Online Demo

Without requiring any installation, the demo is accessible via streamlit sharing.

Local Execution

Once the package is installed, the demo and the examples can be conveniently executed locally using the tnmf command:

  • To execute the demo, run tnmf demo.
  • A specific example can be executed by calling tnmf example .

To show the list of available examples, type tnmf example --help.

License

Copyright (c) 2021 Merck KGaA, Darmstadt, Germany

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The full text of the license can be found in the file LICENSE in the repository root directory.

Contributing

Contributions to the package are always welcome and can be submitted via a pull request. Please note, that you have to agree to the Contributor License Agreement to contribute.

Working with the Code

To checkout the code and set up a working environment with all required Python packages, execute the following commands:

git checkout https://github.com/emdgroup/tnmf.git ./tnmf
cd tmnf
python3 -m virtualenv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Now, you should be able to execute the unit tests by calling pytest to verify that the code is running as expected.

Pull Requests

Before creating a pull request, you should always try to ensure that the automated code quality and unit tests do not fail. This section explains how to run them locally to understand and fix potential issues.

Code Style and Quality

Code style and quality are checked using flake8 and pylint. To execute them, change into the repository root directory, run the following commands and inspect their output:

flake8
pylint tnmf

In order for a pull request to be accaptable, no errors may be reported here.

Unit Tests

Automated unit tests reside inside the folder tnmf/tests. They can be executed via pytest by changing into the repository root directory and running

pytest

Debugging potential failures from the command line might be cumbersome. Most Python IDEs, however, also support pytest natively in their debugger. Again, for a pull request to be acceptable, no failures may be reported here.

Code Coverage

Code coverage in the unit tests is measured using coverage. A coverage report can be created locally from the repository root directory via

coverage run
coverage combine
coverage report

This will output a concise table with an overview of python files that are not fully covered with unit tests along with the line numbers of code that has not been executed. A more detailed, interactive report can be created using

coverage html

Then, you can open the file htmlcov/index.html in a web browser of your choice to navigate through code annotated with coverage data. Required overall coverage to is configured in setup.cfg, under the key fail_under in section [coverage:report].

Building the Documentation

To build the documentation locally, change into the doc subdirectory and run make html. Then, the documentation resides at doc\_build\html\index.html.

A Python and R autograding solution

Otter-Grader Otter Grader is a light-weight, modular open-source autograder developed by the Data Science Education Program at UC Berkeley. It is desi

Infrastructure Team 93 Jan 03, 2023
A set of procedures that can realize covid19 virus detection based on blood.

A set of procedures that can realize covid19 virus detection based on blood.

Nuyoah-xlh 3 Mar 07, 2022
Data science/Analysis Health Care Portfolio

Health-Care-DS-Projects Data Science/Analysis Health Care Portfolio Consists Of 3 Projects: Mexico Covid-19 project, analyze the patient medical histo

Mohamed Abd El-Mohsen 1 Feb 13, 2022
PyTorch implementation for NCL (Neighborhood-enrighed Contrastive Learning)

NCL (Neighborhood-enrighed Contrastive Learning) This is the official PyTorch implementation for the paper: Zihan Lin*, Changxin Tian*, Yupeng Hou* Wa

RUCAIBox 73 Jan 03, 2023
Data and code accompanying the paper Politics and Virality in the Time of Twitter

Politics and Virality in the Time of Twitter Data and code accompanying the paper Politics and Virality in the Time of Twitter. In specific: the code

Cardiff NLP 3 Jul 02, 2022
Show you how to integrate Zeppelin with Airflow

Introduction This repository is to show you how to integrate Zeppelin with Airflow. The philosophy behind the ingtegration is to make the transition f

Jeff Zhang 11 Dec 30, 2022
4CAT: Capture and Analysis Toolkit

4CAT: Capture and Analysis Toolkit 4CAT is a research tool that can be used to analyse and process data from online social platforms. Its goal is to m

Digital Methods Initiative 147 Dec 20, 2022
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather

Tuplex 791 Jan 04, 2023
Handle, manipulate, and convert data with units in Python

unyt A package for handling numpy arrays with units. Often writing code that deals with data that has units can be confusing. A function might return

The yt project 304 Jan 02, 2023
Python ELT Studio, an application for building ELT (and ETL) data flows.

The Python Extract, Load, Transform Studio is an application for performing ELT (and ETL) tasks. Under the hood the application consists of a two parts.

Schlerp 55 Nov 18, 2022
Clean and reusable data-sciency notebooks.

KPACUBO KPACUBO is a set Jupyter notebooks focused on the best practices in both software development and data science, namely, code reuse, explicit d

Matvey Morozov 1 Jan 28, 2022
Semi-Automated Data Processing

Perform semi automated exploratory data analysis, feature engineering and feature selection on provided dataset by visualizing every possibilities on each step and assisting the user to make a meanin

Arun Singh Babal 1 Jan 17, 2022
AWS Glue ETL Code Samples

AWS Glue ETL Code Samples This repository has samples that demonstrate various aspects of the new AWS Glue service, as well as various AWS Glue utilit

AWS Samples 1.2k Jan 03, 2023
A lightweight interface for reading in output from the Weather Research and Forecasting (WRF) model into xarray Dataset

xwrf A lightweight interface for reading in output from the Weather Research and Forecasting (WRF) model into xarray Dataset. The primary objective of

National Center for Atmospheric Research 43 Nov 29, 2022
Two phase pipeline + StreamlitTwo phase pipeline + Streamlit

Two phase pipeline + Streamlit This is an example project that demonstrates how to create a pipeline that consists of two phases of execution. In betw

Rick Lamers 1 Nov 17, 2021
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is a state-of-the-art platform for statistical modeling and high-

Stan 229 Dec 29, 2022
🧪 Panel-Chemistry - exploratory data analysis and build powerful data and viz tools within the domain of Chemistry using Python and HoloViz Panel.

🧪📈 🐍. The purpose of the panel-chemistry project is to make it really easy for you to do DATA ANALYSIS and build powerful DATA AND VIZ APPLICATIONS within the domain of Chemistry using using Python a

Marc Skov Madsen 97 Dec 08, 2022
COVID-19 deaths statistics around the world

COVID-19-Deaths-Dataset COVID-19 deaths statistics around the world This is a daily updated dataset of COVID-19 deaths around the world. The dataset c

Nisa Efendioğlu 4 Jul 10, 2022
Data Analysis for First Year Laboratory at Imperial College, London.

Data Analysis for First Year Laboratory at Imperial College, London. For personal reference only, and to reference in lab reports and lab books.

Martin He 0 Aug 29, 2022
This project is the implementation template for HW 0 and HW 1 for both the programming and non-programming tracks

This project is the implementation template for HW 0 and HW 1 for both the programming and non-programming tracks

Donald F. Ferguson 4 Mar 06, 2022