Scikit-learn compatible estimation of general graphical models

Overview

skggm : Gaussian graphical models using the scikit-learn API

In the last decade, learning networks that encode conditional independence relationships has become an important problem in machine learning and statistics. For many important probability distributions, such as the multivariate Gaussian, this amounts to estimating an inverse covariance matrix. Inverse covariance estimation is now widely used to infer gene regulatory networks in cellular biology and neural interactions in neuroscience.

However, many statistical advances and best practices in fitting such models to data are not yet widely adopted and not available in common python packages for machine learning. Furthermore, inverse covariance estimation is an active area of research where researchers continue to improve algorithms and estimators. With skggm we seek to provide these new developments to a wider audience, and also enable researchers to effectively benchmark their methods in regimes relevant to their applications of interest.

While skggm is currently geared toward Gaussian graphical models, we hope to eventually evolve it to support general graphical models. Read more here.

Inverse Covariance Estimation

Given n independently drawn, p-dimensional Gaussian random samples X with sample covariance S, the \ell_1-penalized maximum likelihood estimate of the inverse covariance matrix \Theta can be computed via the graphical lasso, i.e., the program

\hat{\Theta} = \arg\min_{\Theta \succ 0} \; -\log\det\Theta + \mathrm{tr}(S\Theta) + \|\Theta\|_{1, \Lambda}

where \Lambda is a symmetric matrix with non-negative entries and

\|\Theta\|_{1, \Lambda} = \sum_{i,j} \Lambda_{ij} |\Theta_{ij}|

Typically, the diagonal is not penalized (by setting \Lambda_{ii} = 0) to ensure that \hat{\Theta} remains positive definite. The objective reduces to the standard graphical lasso formulation of Friedman et al. when all off-diagonal entries of the penalty matrix take a constant scalar value \lambda. The standard graphical lasso has been implemented in scikit-learn.

In this package we provide a scikit-learn-compatible implementation of the program above and a collection of modern best practices for working with the graphical lasso. A rough breakdown of how this package differs from scikit-learn's built-in GraphLasso is depicted in this chart:

sklearn/skggm feature comparison

Quick start

To get started, install the package (via pip, see below) and fit one of the estimators to your data.
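For example, a minimal sketch with the default estimator (the random data and penalty value are purely illustrative):

import numpy as np
from inverse_covariance import QuicGraphicalLasso

X = np.random.randn(100, 20)  # n_samples x n_features data matrix

# Fit the graphical lasso with a fixed scalar penalty
model = QuicGraphicalLasso(lam=0.5)
model.fit(X)

print(model.precision_)   # estimated inverse covariance
print(model.covariance_)  # estimated covariance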


This is an ongoing effort. We'd love your feedback on which algorithms and techniques we should include and how you're using the package. We also welcome contributions.

@jasonlaska and @mnarayan


Included in inverse_covariance

An overview of the skggm graphical lasso facilities is depicted by the following diagram:

skggm graphical lasso facilities

Information on basic usage can be found at https://skggm.github.io/skggm/tour. The package includes the following classes and submodules.

  • QuicGraphicalLasso [doc]

    QuicGraphicalLasso is an implementation of QUIC wrapped as a scikit-learn compatible estimator [Hsieh et al.]. The estimator can be run in default mode for a fixed penalty or in path mode to explore a sequence of penalties efficiently. The penalty lam can be a scalar or a matrix.

    The primary outputs of interest are: covariance_, precision_, and lam_.

    The interface largely mirrors the built-in GraphLasso although some param names have been changed (e.g., alpha to lam). Some notable advantages of this implementation over GraphicalLasso are support for a matrix penalization term and speed.
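    For example, a sketch of a single fit with a matrix penalty that leaves the diagonal unpenalized (the data and penalty values are illustrative):

    import numpy as np
    from inverse_covariance import QuicGraphicalLasso

    X = np.random.randn(100, 20)
    p = X.shape[1]

    # Symmetric, non-negative penalty matrix: constant off-diagonals, zero diagonal
    Lam = 0.5 * (np.ones((p, p)) - np.eye(p))

    model = QuicGraphicalLasso(lam=Lam).fit(X)
    print(model.precision_)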

  • QuicGraphicalLassoCV [doc]

    QuicGraphicalLassoCV is an optimized cross-validation model selection implementation similar to scikit-learn's GraphLassoCV. As with QuicGraphicalLasso, this implementation also supports matrix penalization.
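    A sketch of cross-validated model selection (the cv and n_refinements values are illustrative; we assume the selected penalty is exposed as lam_):

    import numpy as np
    from inverse_covariance import QuicGraphicalLassoCV

    X = np.random.randn(100, 20)
    model = QuicGraphicalLassoCV(cv=5, n_refinements=4).fit(X)
    print(model.lam_)  # penalty selected by cross-validation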

  • QuicGraphicalLassoEBIC [doc]

    QuicGraphicalLassoEBIC is provided as a convenience class to use the Extended Bayesian Information Criterion (EBIC) for model selection [Foygel et al.].
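    A sketch of EBIC-based selection (we assume the trade-off parameter is named gamma, with gamma=0 reducing to standard BIC):

    import numpy as np
    from inverse_covariance import QuicGraphicalLassoEBIC

    X = np.random.randn(100, 20)
    model = QuicGraphicalLassoEBIC(gamma=0.1).fit(X)  # gamma is an assumed parameter name
    print(model.precision_)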

  • ModelAverage [doc]

    ModelAverage is an ensemble meta-estimator that computes several fits with a user-specified estimator and averages the support of the resulting precision estimates. The result is a proportion_ matrix indicating the sample probability of a non-zero at each index. This is a similar facility to scikit-learn's RandomizedLasso but for the graphical lasso.

    In each trial, this class will:

    1. Draw bootstrap samples by randomly subsampling X.

    2. Draw a random matrix penalty.

    The random penalty can be chosen in a variety of ways, specified by the penalization parameter. This technique is also known as stability selection or random lasso.
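    A sketch of typical usage (the penalization value 'random' and the n_trials parameter name are assumptions for illustration; see the class docs for supported options):

    import numpy as np
    from inverse_covariance import ModelAverage, QuicGraphicalLasso

    X = np.random.randn(100, 20)
    model = ModelAverage(
        estimator=QuicGraphicalLasso(lam=0.5),
        penalization='random',  # assumed option name
        n_trials=100,           # assumed parameter name
    ).fit(X)
    print(model.proportion_)  # sample probability of a non-zero at each index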

  • AdaptiveGraphicalLasso [doc]

    AdaptiveGraphicalLasso performs a two step estimation procedure:

    1. Obtain an initial sparse estimate.

    2. Derive a new penalization matrix from the original estimate. We currently provide three methods for this: binary, 1/|coeffs|, and 1/|coeffs|^2. The binary method only requires the initial estimate's support (and can therefore be used with ModelAverage above).

    This technique works well to refine the non-zero precision values given a reasonable initial support estimate.
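    A sketch of the two-step refit described above (the method value 'inverse' for the 1/|coeffs| weighting and the estimator_ attribute are assumptions):

    import numpy as np
    from inverse_covariance import AdaptiveGraphicalLasso, QuicGraphicalLassoCV

    X = np.random.randn(100, 20)
    model = AdaptiveGraphicalLasso(
        estimator=QuicGraphicalLassoCV(),
        method='inverse',  # assumed name for the 1/|coeffs| weighting
    ).fit(X)
    print(model.estimator_.precision_)  # refit precision (assumed attribute name)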

  • inverse_covariance.plot_util.trace_plot

    Utility to plot lam_ paths.
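    For example, a sketch of plotting a penalty path (the mode/path parameter names and the path_ attribute are assumptions based on the path-mode description above):

    import numpy as np
    from inverse_covariance import QuicGraphicalLasso
    from inverse_covariance.plot_util import trace_plot

    X = np.random.randn(100, 20)
    path = np.logspace(np.log10(0.01), np.log10(1.0), num=25)
    model = QuicGraphicalLasso(lam=1.0, mode='path', path=path).fit(X)
    trace_plot(model.precision_, model.path_)  # assumed signature: (precision path, lam path)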

  • inverse_covariance.profiling

    The .profiling submodule contains a MonteCarloProfile() class for evaluating methods over different graphs and metrics. We currently include the following graph types:

      - LatticeGraph
      - ClusterGraph
      - ErdosRenyiGraph (via sklearn)
    

    An example of how to use these tools can be found in examples/profiling_example.py.
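    A rough sketch of how a profile might be assembled (all parameter names here are assumptions; examples/profiling_example.py is the canonical reference):

    from inverse_covariance import QuicGraphicalLassoCV
    from inverse_covariance.profiling import MonteCarloProfile, ClusterGraph

    profile = MonteCarloProfile(
        n_features=50,                        # assumed parameter name
        n_trials=10,                          # assumed parameter name
        graph=ClusterGraph(),                 # generates covariance/precision/adjacency
        ml_estimator=QuicGraphicalLassoCV(),  # assumed parameter name
    )
    profile.fit()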

Parallelization Support

skggm supports parallel computation through joblib and Apache Spark. Independent trials, cross validation, and other embarrassingly parallel operations can be farmed out to multiple processes, cores, or worker machines. In particular,

  • QuicGraphicalLassoCV
  • ModelAverage
  • profiling.MonteCarloProfile

can make use of this through either the n_jobs or sc (sparkContext) parameters.

Since these are naive implementations, it is not possible to enable parallel work on all three of these objects simultaneously when they are composed together. For example, in this snippet:

model = ModelAverage(
    estimator=QuicGraphicalLassoCV(
        cv=2,
        n_refinements=6,
    ),
    penalization=penalization,
    lam=lam,
    sc=spark.sparkContext,
)
model.fit(X)

only one of ModelAverage or QuicGraphicalLassoCV can make use of the spark context. The problem size and number of trials will determine the resolution that gives the fastest performance.

Installation

Both Python 2.7 and Python 3.6.x are supported. We use the black autoformatter to format our code. If contributing, please run this formatter before submitting, or the formatting checks will fail.

Clone this repo and run

python setup.py install

or via PyPI

pip install skggm

or from a cloned repo

cd inverse_covariance/pyquic
make
make python3  (for python3)

The package requires that numpy, scipy, and cython are installed independently into your environment first.

If you would like to fork the pyquic bindings directly, use the Makefile provided in inverse_covariance/pyquic.

This package requires the lapack libraries to be installed on your system. A configuration example with these dependencies for Ubuntu and Anaconda 2 can be found here.

Tests

To run the tests, execute the following lines.

python -m pytest inverse_covariance (python3 -m pytest inverse_covariance)
black --check inverse_covariance
black --check examples

Examples

Usage

In examples/estimator_suite.py we reproduce the plot_sparse_cov example from the scikit-learn documentation for each method provided (however, the variations chosen are not exhaustive).

An example run for n_examples=100 and n_features=20 yielded the following results.

(n_examples, n_features) = (100, 20)

For slightly higher dimensions of n_examples=600 and n_features=120 we obtained:

(n_examples, n_features) = (600, 120)

Plotting the regularization path

We've provided a utility function inverse_covariance.plot_util.trace_plot that can be used to display the coefficients as a function of lam_. This can be used with any estimator that returns a path. The example in examples/trace_plot_example.py yields:

Trace plot

Citation

If you use skggm or reference our blog post in a presentation or publication, we would appreciate citations of our package.

Jason Laska, Manjari Narayan, 2017. skggm 0.2.7: A scikit-learn compatible package for Gaussian and related Graphical Models. doi:10.5281/zenodo.830033

Here is the corresponding BibTeX entry:

@misc{laska_narayan_2017_830033,
  author       = {Jason Laska and
                  Manjari Narayan},
  title        = {{skggm 0.2.7: A scikit-learn compatible package for
                   Gaussian and related Graphical Models}},
  month        = jul,
  year         = 2017,
  doi          = {10.5281/zenodo.830033},
  url          = {https://doi.org/10.5281/zenodo.830033}
}

References

BIC / EBIC Model Selection

QuicGraphicalLasso / QuicGraphicalLassoCV

Adaptive refitting (two-step methods)

Randomized model averaging

Convergence test

Repeated KFold cross-validation

Comments
  • Installation using alternate lapack lib?


    Hi guys,

    I've started including skggm as an optional dependency for PyNets (see graphestimation module), but I've been struggling to get skggm to compile on HPC where I don't have root privileges and lapack is only offered as part of the MKL suite. Have you all had any success installing skggm with MKL implementations of lapack?

    If I install out-of-the-box, the installer can't find lapack:

    /opt/apps/gcc/7.1.0/lib/gcc/x86_64-pc-linux-gnu/7.1.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find -llapack
    collect2: error: ld returned 1 exit status

    A modified install from the MKL libraries would probably be a command that looks something like this:

    python3 setup.py config -I$MKLROOT/include -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_gnu_thread -lpthread -lmkl_lapack95_lp64 install
    

    Thoughts?

    derek;

    opened by dPys 42
  • Error when Install on macos


    When I tried to install on macOS, an error appeared with 'gcc':

    inverse_covariance/pyquic/pyquic.cpp:828:14: fatal error: 'complex' file not found
            #include <complex>
                     ^~~~~~~~~
        2 warnings and 1 error generated.
        error: command 'gcc' failed with exit status 1
    

    Can you tell me how to solve this?

    opened by iccentt 12
  • quic is not deterministic


    quic can give different results for the same input, and setting the numpy random state does not fix the problem. Is there a potential source for this non-determinism? This makes it difficult for an estimator that uses skggm to be accepted as a scikit-learn estimator (i.e., to pass the check_estimator test), because some tests check that the estimator is deterministic.

    Example:

    from inverse_covariance import quic
    import numpy as np
    
    np.random.seed(42)
    A = np.random.randn(20, 20)
    A = A.dot(A.T)
    
    np.random.seed(42)
    A_1, _, _, _, _, _ = quic(A, lam=0.01)
    
    np.random.seed(42)
    A_2, _, _, _, _, _ = quic(A, lam=0.01)
    
    np.testing.assert_allclose(A_1, A_2)
    

    Returns:

    AssertionError: 
    Not equal to tolerance rtol=1e-07, atol=0
    
    (mismatch 99.0%)
     x: array([ 0.985128,  0.181885,  0.414247, -0.473431, -0.483563,  0.027897,
            0.292402,  0.116016, -0.177139, -0.395764,  0.217043, -0.367557,
           -0.233921,  0.225825, -0.072036,  0.387505, -0.959885,  0.084411,...
     y: array([ 0.985154,  0.181885,  0.414257, -0.473441, -0.483574,  0.027899,
            0.292409,  0.116016, -0.177139, -0.395786,  0.217051, -0.367555,
           -0.233921,  0.225839, -0.072036,  0.387511, -0.959911,  0.084411,...
    
    bug 
    opened by wdevazelhes 8
  • Two Stage or Adaptive Estimator Class


    There are several methods to estimate the inverse covariance matrix (MLE, pseudolikelihood, Dtrace, CLIME, ...). However, the MLE produces a positive semi-definite, symmetric estimate with the standard minimum-variance benefits (potentially asymptotically efficient) and is thus appropriate as a final estimator.

    Therefore we want an adaptive estimator class that

    • Takes an initial estimate as an input
    • Takes a few different strategies to create adaptive weights and refits the MLE with a weighted penalty matrix.
      • [x] These could take the form of 1/|coefficient| or 1/|coefficient|^2 (classic adaptive glasso)
      • [ ] Or these could be incorporated into a non-convex SCAD or MCP penalty.

    See adaptivity/two-stage example in the GELATO (ftp://ess.r-project.org/Manuscripts/buhlmann/gelato.pdf) estimator (Section 3.3). Here the weights are very naive and basically just binary.

    enhancement 
    opened by mnarayan 8
  • [MRG] put black dependency as optional


    Fixes #119

    I couldn't install skggm with python2.7 because black does not seem to exist in python 2.7 (see https://github.com/ambv/black#installation). I thought this could be a fix (putting black as an optional dependency), but I may be totally wrong here

    opened by wdevazelhes 7
  • Feature/graph generator


    @mnarayan work in progress on the main gem, the graph generator class (and functions).

    I have some questions about these things and how they diverge from some of the sample code, would like to chat.

    Opening this PR for work in progress visibility...

    opened by jasonlaska 6
  • Model Selection via BIC and EBIC


    Given sparse regularized inverse covariance estimates over a grid of regularization parameters, a popular criterion for choosing the optimal penalty and corresponding estimate is the Bayesian (or Schwarz) Information Criterion (BIC), or the Extended BIC (EBIC) in high-dimensional regimes.

    The BIC criterion is defined as

    BIC(lam) = -2 * Loglikelihood(Sigma, Theta(lam)) + (log n) * (# of non-zeros in Theta(lam))

    The EBIC criterion is defined as

    EBIC(lam) = -2 * Loglikelihood(Sigma, Theta(lam)) + (log n) * (# of non-zeros in Theta(lam)) + 4 * (# of non-zeros in Theta(lam)) * (log p) * gam

    Here

    • n is n_samples and p is n_features
    • Sigma is the sample covariance/correlation in self.covariance_ and Theta(lam) comes from self.precision_
    • gam is an additional parameter for EBIC that takes values in [0,1]. I recommend setting gam = .1. Setting gam=0 gives back traditional BIC.
    • lam is an element in the grid of lambda penalty parameters.

    The goal is to implement model selection using the above criteria as an alternative to cross-validation.
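    A hypothetical numpy sketch of the criterion (the helper name, the Gaussian log-likelihood form, and the edge-counting convention are ours, not the package's):

    import numpy as np

    def ebic(S, Theta, n, gam=0.1):
        p = S.shape[0]
        # Gaussian log-likelihood of precision Theta given sample covariance S
        log_likelihood = (n / 2.0) * (np.linalg.slogdet(Theta)[1] - np.trace(S @ Theta))
        # count non-zero off-diagonal entries in the upper triangle (one per edge)
        n_nonzero = np.count_nonzero(np.triu(Theta, k=1))
        # gam = 0 recovers the traditional BIC
        return -2.0 * log_likelihood + n_nonzero * np.log(n) + 4.0 * n_nonzero * np.log(p) * gam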

    References: BIC in sparse inverse covariance estimation; EBIC in sparse inverse covariance estimation

    opened by mnarayan 6
  • Compatibility with sklearn model_selection module


    The older cross-validation module has been deprecated:

    DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection 
    module into which all the refactored classes and functions are moved. Also note that the interface of
    the new CV iterators are different from that of this module. This module will be removed in 0.20.
    

    It might make sense to make all our model selection variants (EBIC, CV, and more to come) compatible with the new interface.

    enhancement sklearn cleanup 
    opened by mnarayan 5
  • TypeError: expected bytes, str found


    Hi, I was trying to run convergence_comparison.py in the examples and ran across this issue:

    Traceback (most recent call last):
      File "convergence_comparison.py", line 28, in <module>
        vals = quic(Shat, .004, mode='default', tol=1e-6, max_iter=1000, Theta0=np.eye(Shat.shape[0]), Sigma0=np.eye(Shat.shape[0]), path=None, msg=1)
      File "/home/tianpei/anaconda3/lib/python3.5/site-packages/skggm-0.2.6-py3.5-linux-x86_64.egg/inverse_covariance/quic_graph_lasso.py", line 121, in quic
        Theta, Sigma, opt, cputime, iters, dGap)
      File "inverse_covariance/pyquic/pyquic.pyx", line 12, in pyquic.pyquic.quic (inverse_covariance/pyquic/pyquic.cpp:1644)
    TypeError: expected bytes, str found

    What is wrong with it? I was using Python 3.5; it seems to be a Python 3 problem.

    opened by TianpeiLuke 5
  • seg fault in statistical power?


    When fitting the StatisticalPower simulation object using model_selection_estimator = QuicGraphLassoEBIC(), the kernel dies in the IPython notebook due to some unknown error. This will need further investigation.

    bug 
    opened by mnarayan 5
  • Create InverseCovariance Subclass of EmpiricalCovariance


    The QUIC algorithm estimates the inverse covariance using the sample covariance estimate S as an input:

    \hat{\Theta} = \arg\max_{\Theta} \; \log\det(\Theta) - \mathrm{tr}(\Theta S) - \lambda \|\Theta\|_1

    As a result it makes sense for the sample covariance for QUIC to be computed using the methods inherited from the EmpiricalCovariance class.

    Additionally, the log-likelihood and error metrics for the inverse covariance differ from those for the covariance matrix, so the corresponding methods in EmpiricalCovariance will need to be overridden where relevant. We need the ones below:

    • Negative log-likelihood for That: Trace(That * S) - logdet(That)
    • KL-loss for (T, That): Trace(That^{-1} T) - logdet(That^{-1} T) - p
    • Quadratic loss: Trace((That * Sigma - I)^2)
    • Frobenius loss (remains the same, but computed on the precision rather than the covariance)
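    A hypothetical numpy sketch of two of these metrics (the function names are ours):

    import numpy as np

    def kl_loss(T, T_hat):
        # KL-loss for (T, That): Trace(That^{-1} T) - logdet(That^{-1} T) - p
        p = T.shape[0]
        M = np.linalg.solve(T_hat, T)  # That^{-1} T
        return np.trace(M) - np.linalg.slogdet(M)[1] - p

    def quadratic_loss(T_hat, Sigma):
        # Quadratic loss: Trace((That * Sigma - I)^2)
        D = T_hat @ Sigma - np.eye(Sigma.shape[0])
        return np.trace(D @ D)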
    opened by mnarayan 5
  • Installation requirements


    Hi,

    I'm very interested in using your Graphical Lasso implementation due to the possibility of adding a matrix penalty (not included in sklearn). However, I'm having some issues installing the git repo. The installation finishes fine in a new environment (py3.6.13) but depending on test method there are a number of packages with the wrong version. An example is sklearn which seems to be required to be <0.20 due to the graph_lasso renaming (more examples below).

    Would it be possible to share a .yml file of a working environment or some more detailed package requirements?

    > python -m pytest inverse_covariance
    Traceback (most recent call last):
      File "/XXX/lib/python3.6/runpy.py", line 183, in _run_module_as_main
        mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
      File "/XXX/lib/python3.6/runpy.py", line 142, in _get_module_details
        return _get_module_details(pkg_main_name, error)
      File "/XXX/lib/python3.6/runpy.py", line 109, in _get_module_details
        __import__(pkg_name)
      File "/XXX/lib/python3.6/site-packages/pytest-6.2.2-py3.6.egg/pytest/__init__.py", line 3, in <module>
        from . import collect
      File "/XXX/lib/python3.6/site-packages/pytest-6.2.2-py3.6.egg/pytest/collect.py", line 8, in <module>
        from _pytest.deprecated import PYTEST_COLLECT_MODULE
      File "/XXX/lib/python3.6/site-packages/pytest-6.2.2-py3.6.egg/_pytest/deprecated.py", line 13, in <module>
        from _pytest.warning_types import PytestDeprecationWarning
      File "/XXX/lib/python3.6/site-packages/pytest-6.2.2-py3.6.egg/_pytest/warning_types.py", line 6, in <module>
        import attr
    ModuleNotFoundError: No module named 'attr'
    
    > python plot_functional_brain_networks.py 
    Traceback (most recent call last):
      File "plot_functional_brain_networks.py", line 20, in <module>
        from nilearn import datasets, plotting, input_data
      File "/XXX/lib/python3.6/site-packages/nilearn-0.7.0-py3.6.egg/nilearn/__init__.py", line 67, in <module>
        _check_module_dependencies()
      File "/XXX/lib/python3.6/site-packages/nilearn-0.7.0-py3.6.egg/nilearn/version.py", line 127, in _check_module_dependencies
        install_info=module_metadata.get('install_info'))
      File "/XXX/lib/python3.6/site-packages/nilearn-0.7.0-py3.6.egg/nilearn/version.py", line 73, in _import_module_with_version_check
        module = __import__(module_name)
    ModuleNotFoundError: No module named 'nibabel'. Module "nibabel" could not be found. Please install it properly to use nilearn.
    

    My environment:

    # Name                    Version                   Build  Channel
    _libgcc_mutex             0.1                 conda_forge    conda-forge
    _openmp_mutex             4.5                       1_gnu    conda-forge
    blas                      1.1                    openblas    conda-forge
    ca-certificates           2020.12.5            ha878542_0    conda-forge
    certifi                   2020.12.5        py36h5fab9bb_1    conda-forge
    cython                    0.29.22          py36hc4f0c31_0    conda-forge
    joblib                    1.0.1              pyhd8ed1ab_0    conda-forge
    ld_impl_linux-64          2.35.1               hea4e1c9_2    conda-forge
    libblas                   3.9.0           1_h86c2bf4_netlib    conda-forge
    libcblas                  3.9.0           3_h92ddd45_netlib    conda-forge
    libffi                    3.3                  h58526e2_2    conda-forge
    libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
    libgfortran               3.0.0                         1    conda-forge
    libgfortran-ng            9.3.0               hff62375_18    conda-forge
    libgfortran5              9.3.0               hff62375_18    conda-forge
    libgomp                   9.3.0               h2828fa1_18    conda-forge
    liblapack                 3.9.0           3_h92ddd45_netlib    conda-forge
    libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
    ncurses                   6.2                  h58526e2_4    conda-forge
    nilearn                   0.7.0                    pypi_0    pypi
    nose                      1.3.7                    pypi_0    pypi
    numpy                     1.19.5           py36h2aa4a07_1    conda-forge
    openblas                  0.2.20                        8    conda-forge
    openssl                   1.1.1j               h7f98852_0    conda-forge
    pip                       21.0.1             pyhd8ed1ab_0    conda-forge
    pytest                    6.2.2                    pypi_0    pypi
    python                    3.6.13          hffdb5ce_0_cpython    conda-forge
    python_abi                3.6                     1_cp36m    conda-forge
    readline                  8.0                  he28a2e2_2    conda-forge
    scikit-learn              0.19.2          py36_blas_openblasha84fab4_201  [blas_openblas]  conda-forge
    scipy                     1.5.3            py36h9e8f40b_0    conda-forge
    seaborn                   0.11.1                   pypi_0    pypi
    setuptools                49.6.0           py36h5fab9bb_3    conda-forge
    skggm                     0.2.8                    pypi_0    pypi
    sqlite                    3.34.0               h74cdb3f_0    conda-forge
    tk                        8.6.10               h21135ba_1    conda-forge
    wheel                     0.36.2             pyhd3deb0d_0    conda-forge
    xz                        5.2.5                h516909a_1    conda-forge
    zlib                      1.2.11            h516909a_1010    conda-forge
    
    opened by camiel-m 2
  • Update travis for python 3.7 and 3.8


    • [ ] Fix the build for xenial and later distributions. Likely need to eliminate installation of libatlas3gf-base. Also see updates to travis configuration for sklearn 0.22
    • [ ] Add support for python 3.7
    • [ ] Add support for python 3.8
    sklearn cleanup 
    opened by mnarayan 1
  • ModelAverage common test fails assertion on  check_no_attributes_set_in_init


    I have decided to try to fix #124, which I independently encountered while working with the metric-learn package. I fixed the joblib import by changing it to from sklearn.utils._joblib import delayed, Parallel, and I also changed the sklearn GraphLassoCV import to GraphicalLassoCV.

    So far so good.

    But tests failed in three cases. One of them is the mentioned issue with ModelAverage class. The problematic line in the init method is this one https://github.com/skggm/skggm/blob/a0ed406586c4364ea3297a658f415e13b5cbdaf8/inverse_covariance/model_average.py#L322

    The error message is this one: AssertionError: Estimator ModelAverage should not set any attribute apart from parameters during init. Found attributes ['prng']. This is where I have stopped.

    bug 
    opened by BBQing 1
  • 'Don`t know how to compile inverse_covariance/pyquic/QUIC.C'


    Hello. Recently I installed a C++ build tool through the Visual Studio installer in order to install skggm in my Windows 10 environment. When running pip install skggm with Anaconda, I got the above error message. Do you have any tips for solving this situation?

    opened by Ahra-Do 8
  • Update readme comparison chart


    Might want to update our documentation of differences between us and sklearn. Support for randomized lasso has been eliminated from scikit-learn. They think it is too unreliable and that rescaling the design matrix is equivalent to putting adaptive penalties in the regularizer. But these are not equivalent operations. I suspect the difference is related to sparsity vs. co-sparsity. So this gives our implementation an advantage.

    @mnarayan can you provide me with the changes you desire and I'll update

    opened by jasonlaska 1
Releases (0.2.8)
  • 0.2.8(Sep 12, 2018)

    • update sklearn requirements to be greater than 0.19, conform to stricter interface requirements,
    • remove custom RepeatedKFold cross-validation in favor of sklearn supported (see https://github.com/skggm/skggm/pull/115/files#diff-998d139e7566f5a1ea43053260dab898L628 and https://github.com/skggm/skggm/pull/115/files#diff-998d139e7566f5a1ea43053260dab898L665) for usage changes if you were importing this directly
    • applies black autoformatting moving forward (https://github.com/ambv/black)
    • rename QuicGraphLasso prefix to QuicGraphicalLasso for future compatibility with sklearn changes. Old interface still available but will warn about deprecation.
    Source code(tar.gz)
    Source code(zip)
  • 0.2.7(Jul 16, 2017)

    New in this version:

    • python3 support
    • Adds alternatives to np.corrcoef and np.cov for initializing the sample covariance, namely the Spearman rank correlation and Kendall's tau concordance correlation
    • Config for Travis continuous integration testing on repo
    Source code(tar.gz)
    Source code(zip)
  • 0.2.6(Dec 9, 2016)

    Fixes include:

    • AdaptiveGraphLasso doesn't break when passing in an estimator with a sparkContext
    • Better results and debugging with estimator_suite_spark.py
    • Sets default ModelAverage estimator to QuicGraphLasso instead of cross-validation version (much faster).
    Source code(tar.gz)
    Source code(zip)
  • 0.2.5(Dec 9, 2016)

    This release upgrades

    • MonteCarloProfile in inverse_covariance.profiling
    • ModelAverage
    • QuicGraphLassoCV

    to support naive parallelization via a sparkContext if instantiated with the parameter sc.

    Source code(tar.gz)
    Source code(zip)
  • 0.2.0(Nov 28, 2016)

    Improvements to inverse_covariance

    • New RepeatedKFold cross-validation class which generates multiple re-shuffled k-fold datasets. This technique is now used by default in QuicGraphLassoCV. Read about the new options here: https://github.com/skggm/skggm/blob/0.2.0/inverse_covariance/quic_graph_lasso.py#L402-L410

    Major update to the inverse_covariance.profiling submodule

    Includes new initial tools for profiling methods. Specifically:

    1. MonteCarloProfile: A workshop to measure the performance of an estimator on multivariate normal samples, given a graph generator (that generates covariance, precision, and adjacency matrices), and a set of metrics to compute in each trial.
    2. Graph: Base class and utilities to build common sparse graphs
    3. Specific graph generator classes: LatticeGraph, ClusterGraph, and ErdosRenyiGraph,
    4. Set of common metrics for profiling in inverse_covariance.profiling.metrics

    An example usage can be found in examples/profiling_example.py or in inverse_covariance/profiling/tests.

    Source code(tar.gz)
    Source code(zip)
  • v0.1.0(Oct 3, 2016)

    This release includes initial sklearn-compatible interface for the QUIC algorithm as well as several model selection routines. Primary classes include QuicGraphLasso, QuicGraphLassoCV, QuicGraphLassoEBIC, ModelAverage, and AdaptiveGraphLasso. We also provide some initial examples and early versions of profiling tools.

    Source code(tar.gz)
    Source code(zip)