Transform-Invariant Non-Negative Matrix Factorization

Overview

Flake8 Linter Pylint Linter Pytest and Coverage Build Documentation Publish to PyPI Open in Streamlit

Logo

Transform-Invariant Non-Negative Matrix Factorization

A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learning transform-invariant representations.

The packages supports multiple optimization backends and can be easily extended to handle application-specific types of transforms.

General Introduction

A general introduction to Non-Negative Matrix Factorization and the purpose of this package can be found on the corresponding GitHub Pages.

Installation

For using this package, you will need Python version 3.7 (or higher). The package is available via PyPI.

Installation is easiest using pip:

pip install tnmf

Demos and Examples

The package comes with a streamlit demo and a number of examples that demonstrate the capabilities of the TNMF model. They provide a good starting point for your own experiments.

Online Demo

Without requiring any installation, the demo is accessible via streamlit sharing.

Local Execution

Once the package is installed, the demo and the examples can be conveniently executed locally using the tnmf command:

  • To execute the demo, run tnmf demo.
  • A specific example can be executed by calling tnmf example .

To show the list of available examples, type tnmf example --help.

License

Copyright (c) 2021 Merck KGaA, Darmstadt, Germany

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The full text of the license can be found in the file LICENSE in the repository root directory.

Contributing

Contributions to the package are always welcome and can be submitted via a pull request. Please note, that you have to agree to the Contributor License Agreement to contribute.

Working with the Code

To checkout the code and set up a working environment with all required Python packages, execute the following commands:

git checkout https://github.com/emdgroup/tnmf.git ./tnmf
cd tmnf
python3 -m virtualenv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Now, you should be able to execute the unit tests by calling pytest to verify that the code is running as expected.

Pull Requests

Before creating a pull request, you should always try to ensure that the automated code quality and unit tests do not fail. This section explains how to run them locally to understand and fix potential issues.

Code Style and Quality

Code style and quality are checked using flake8 and pylint. To execute them, change into the repository root directory, run the following commands and inspect their output:

flake8
pylint tnmf

In order for a pull request to be accaptable, no errors may be reported here.

Unit Tests

Automated unit tests reside inside the folder tnmf/tests. They can be executed via pytest by changing into the repository root directory and running

pytest

Debugging potential failures from the command line might be cumbersome. Most Python IDEs, however, also support pytest natively in their debugger. Again, for a pull request to be acceptable, no failures may be reported here.

Code Coverage

Code coverage in the unit tests is measured using coverage. A coverage report can be created locally from the repository root directory via

coverage run
coverage combine
coverage report

This will output a concise table with an overview of python files that are not fully covered with unit tests along with the line numbers of code that has not been executed. A more detailed, interactive report can be created using

coverage html

Then, you can open the file htmlcov/index.html in a web browser of your choice to navigate through code annotated with coverage data. Required overall coverage to is configured in setup.cfg, under the key fail_under in section [coverage:report].

Building the Documentation

To build the documentation locally, change into the doc subdirectory and run make html. Then, the documentation resides at doc\_build\html\index.html.

My solution to the book A Collection of Data Science Take-Home Challenges

DS-Take-Home Solution to the book "A Collection of Data Science Take-Home Challenges". Note: Please don't contact me for the dataset. This repository

Jifu Zhao 1.5k Jan 03, 2023
MoRecon - A tool for reconstructing missing frames in motion capture data.

MoRecon - A tool for reconstructing missing frames in motion capture data.

Yuki Nishidate 38 Dec 03, 2022
Convert tables stored as images to an usable .csv file

Convert an image of numbers to a .csv file This Python program aims to convert images of array numbers to corresponding .csv files. It uses OpenCV for

711 Dec 26, 2022
Monitor the stability of a pandas or spark dataframe ⚙︎

Population Shift Monitoring popmon is a package that allows one to check the stability of a dataset. popmon works with both pandas and spark datasets.

ING Bank 403 Dec 07, 2022
Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

Created covid data pipeline using PySpark and MySQL that collected data stream from API and do some processing and store it into MYSQL database.

2 Nov 20, 2021
pipeline for migrating lichess data into postgresql

How Long Does It Take Ordinary People To "Get Good" At Chess? TL;DR: According to 5.5 years of data from 2.3 million players and 450 million games, mo

Joseph Wong 182 Nov 11, 2022
Uses MIT/MEDSL, New York Times, and US Census datasources to analyze per-county COVID-19 deaths.

Covid County Executive summary Setup Install miniconda, then in the command line, run conda create -n covid-county conda activate covid-county conda i

Ahmed Fasih 1 Dec 22, 2021
PyPDC is a Python package for calculating asymptotic Partial Directed Coherence estimations for brain connectivity analysis.

Python asymptotic Partial Directed Coherence and Directed Coherence estimation package for brain connectivity analysis. Free software: MIT license Doc

Heitor Baldo 3 Nov 26, 2022
An easy-to-use feature store

A feature store is a data storage system for data science and machine-learning. It can store raw data and also transformed features, which can be fed straight into an ML model or training script.

ByteHub AI 48 Dec 09, 2022
Renato 214 Jan 02, 2023
Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

HoloViz 2.9k Jan 06, 2023
A project consists in a set of assignements corresponding to a BI process: data integration, construction of an OLAP cube, qurying of a OPLAP cube and reporting.

TennisBusinessIntelligenceProject - A project consists in a set of assignements corresponding to a BI process: data integration, construction of an OLAP cube, qurying of a OPLAP cube and reporting.

carlo paladino 1 Jan 02, 2022
Methylation/modified base calling separated from basecalling.

Remora Methylation/modified base calling separated from basecalling. Remora primarily provides an API to call modified bases for basecaller programs s

Oxford Nanopore Technologies 72 Jan 05, 2023
Pipeline to convert a haploid assembly into diploid

HapDup (haplotype duplicator) is a pipeline to convert a haploid long read assembly into a dual diploid assembly. The reconstructed haplotypes

Mikhail Kolmogorov 50 Jan 05, 2023
This is an example of how to automate Ridit Analysis for a dataset with large amount of questions and many item attributes

This is an example of how to automate Ridit Analysis for a dataset with large amount of questions and many item attributes

Ishan Hegde 1 Nov 17, 2021
Calculate multilateral price indices in Python (with Pandas and PySpark).

IndexNumCalc Calculate multilateral price indices using the GEKS-T (CCDI), Time Product Dummy (TPD), Time Dummy Hedonic (TDH), Geary-Khamis (GK) metho

Dr. Usman Kayani 3 Apr 27, 2022
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]

MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020] by Kaisiyuan Wang, Qianyi Wu, Linsen Song, Zhuoqian Yang, Wa

112 Dec 28, 2022
Useful tool for inserting DataFrames into the Excel sheet.

PyCellFrame Insert Pandas DataFrames into the Excel sheet with a bunch of conditions Install pip install pycellframe Usage Examples Let's suppose that

Luka Sosiashvili 1 Feb 16, 2022
PostQF is a user-friendly Postfix queue data filter which operates on data produced by postqueue -j.

PostQF Copyright © 2022 Ralph Seichter PostQF is a user-friendly Postfix queue data filter which operates on data produced by postqueue -j. See the ma

Ralph Seichter 11 Nov 24, 2022
Detecting Underwater Objects (DUO)

Underwater object detection for robot picking has attracted a lot of interest. However, it is still an unsolved problem due to several challenges. We take steps towards making it more realistic by ad

27 Dec 12, 2022