[HELP REQUESTED] Generalized Additive Models in Python

Overview

pyGAM

Generalized Additive Models in Python.

Documentation

Installation

pip install pygam

scikit-sparse

To speed up optimization on large models with constraints, it helps to have scikit-sparse installed because it contains a slightly faster, sparse version of Cholesky factorization. The import from scikit-sparse references nose, so you'll need that too.

The easiest way is to use Conda:
conda install -c conda-forge scikit-sparse nose

scikit-sparse docs
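
To confirm that the optional dependency is visible to the interpreter that actually runs pyGAM, a small check along these lines may help. It is only a sketch: it assumes scikit-sparse is importable under the top-level module name `sksparse`, and `has_module` is a helper name made up for this example.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Assumption: scikit-sparse installs itself as the top-level module `sksparse`.
if has_module("sksparse"):
    print("scikit-sparse is visible to this interpreter")
else:
    print("scikit-sparse not found; check that it is installed in this environment")
```

Running this with the same interpreter (for example inside the same virtualenv or Conda environment) is the point of the check, since a package installed in one environment is not visible from another.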

Contributing - HELP REQUESTED

Contributions are most welcome!

You can help pyGAM in many ways including:

  • Working on a known bug.
  • Trying it out and reporting bugs or what was difficult.
  • Helping improve the documentation.
  • Writing new distributions and link functions.
  • If you need some ideas, please take a look at the issues.

To start:

  • Fork the project and cut a new branch.
  • Install the testing dependencies:
conda install pytest numpy pandas scipy pytest-cov cython
pip install --upgrade pip
pip install -r requirements.txt

It helps to add a sym-link of the forked project to your python path. To do this, you should install flit:

  • pip install flit
  • Then, from the main project folder (i.e. .../pyGAM), run: flit install -s

Make some changes and write a test...

  • Test your contribution (e.g. from .../pyGAM): py.test -s
  • When you are happy with your changes, make a pull request into the master branch of the main project.

About

Generalized Additive Models (GAMs) are smooth semi-parametric models of the form:

g(E[y|X]) = beta_0 + f_1(X_1) + f_2(X_2) + ... + f_p(X_p)

where X.T = [X_1, X_2, ..., X_p] are independent variables, y is the dependent variable, and g() is the link function that relates our predictor variables to the expected value of the dependent variable.

The feature functions f_i() are built using penalized B splines, which allow us to automatically model non-linear relationships without having to manually try out many different transformations on each variable.
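
To make "penalized B splines" concrete, here is a minimal pure-Python sketch of the two ingredients of a P-spline in the sense of Eilers & Marx: a B-spline basis (evaluated with the Cox-de Boor recursion) and a difference penalty on neighbouring coefficients. The function names, the knot layout, and the second-order penalty are choices made for this illustration and are not taken from pyGAM's code.

```python
def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function of `degree` at x (Cox-de Boor recursion)."""
    # Degree 0: indicator function of each knot span [knots[i], knots[i + 1]).
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - d - 1):
            left_width = knots[i + d] - knots[i]
            right_width = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_width * basis[i] if left_width > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_width * basis[i + 1]
                     if right_width > 0 else 0.0)
            raised.append(left + right)
        basis = raised
    return basis


def difference_penalty(coefs, lam=1.0):
    """Second-order difference penalty: lam * sum((c[i] - 2*c[i+1] + c[i+2])**2)."""
    return lam * sum((coefs[i] - 2 * coefs[i + 1] + coefs[i + 2]) ** 2
                     for i in range(len(coefs) - 2))


# Uniform knots extended three spans beyond the domain [0, 4] give 7 cubic basis functions.
knots = list(range(-3, 8))
values = bspline_basis(1.3, knots, degree=3)
print(len(values), sum(values))  # inside the domain the basis functions sum to one

straight = difference_penalty([0, 1, 2, 3, 4, 5, 6])  # coefficients on a line
wiggly = difference_penalty([0, 1, 0, 1, 0, 1, 0])    # oscillating coefficients
print(straight, wiggly)
```

The penalty is zero for coefficients that lie on a straight line and grows with wiggliness, which is how a larger smoothing parameter pulls the fitted function toward something smoother.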

GAMs extend generalized linear models by allowing non-linear functions of features while maintaining additivity. Since the model is additive, it is easy to examine the effect of each X_i on Y individually while holding all other predictors constant.
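
The additivity claim can be checked on a toy model. The feature functions, intercept, and log link below are hypothetical stand-ins chosen for the example; a fitted GAM would supply its own.

```python
import math

# Hypothetical feature functions f_1, f_2 and intercept, chosen only for illustration.
feature_functions = [
    lambda x: math.sin(x),
    lambda x: 0.5 * x ** 2,
]
intercept = 0.1


def linear_predictor(x_row):
    """beta_0 + f_1(X_1) + ... + f_p(X_p): each feature contributes independently."""
    return intercept + sum(f(x) for f, x in zip(feature_functions, x_row))


def predict_mean(x_row, inverse_link=math.exp):
    """Map the additive predictor back to E[y|X] via the inverse link (log link here)."""
    return inverse_link(linear_predictor(x_row))


# Changing X_1 while holding X_2 fixed shifts the predictor by f_1 alone,
# whatever value X_2 is held at.
shifts = []
for x2 in (0.0, 3.0):
    shift = linear_predictor([1.0, x2]) - linear_predictor([0.0, x2])
    shifts.append(shift)
    print(x2, shift)
```

Both printed shifts equal f_1(1) - f_1(0), which is why each term of an additive model can be plotted and interpreted on its own.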

The result is a very flexible model, where it is easy to incorporate prior knowledge and control overfitting.

Citing pyGAM

Please consider citing pyGAM if it has helped you in your research or work:

Daniel Servén, & Charlie Brummitt. (2018, March 27). pyGAM: Generalized Additive Models in Python. Zenodo. DOI: 10.5281/zenodo.1208723

BibTeX:

@misc{daniel_serven_2018_1208723,
  author       = {Daniel Servén and
                  Charlie Brummitt},
  title        = {pyGAM: Generalized Additive Models in Python},
  month        = mar,
  year         = 2018,
  doi          = {10.5281/zenodo.1208723},
  url          = {https://doi.org/10.5281/zenodo.1208723}
}

References

  1. Simon N. Wood, 2006
    Generalized Additive Models: an introduction with R

  2. Hastie, Tibshirani, Friedman
    The Elements of Statistical Learning
    http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf

  3. James, Witten, Hastie and Tibshirani
    An Introduction to Statistical Learning
    http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf

  4. Paul Eilers & Brian Marx, 1996
    Flexible Smoothing with B-splines and Penalties
    http://www.stat.washington.edu/courses/stat527/s13/readings/EilersMarx_StatSci_1996.pdf

  5. Kim Larsen, 2015
    GAM: The Predictive Modeling Silver Bullet
    http://multithreaded.stitchfix.com/assets/files/gam.pdf

  6. Deva Ramanan, 2008
    UCI Machine Learning: Notes on IRLS
    http://www.ics.uci.edu/~dramanan/teaching/ics273a_winter08/homework/irls_notes.pdf

  7. Paul Eilers & Brian Marx, 2015
    International Biometric Society: A Crash Course on P-splines
    http://www.ibschannel2015.nl/project/userfiles/Crash_course_handout.pdf

  8. Keiding, Niels, 1991
    Age-specific incidence and prevalence: a statistical perspective

Comments
  • [WIP] Simulate from posterior of the coefficients and smoothing parameters

    Summary

    This is a work in progress to implement #113. This PR implements the following:

    1. a new method for GAM called simulate_from_coef_posterior_conditioned_on_smoothing_parameters that simulates from the posterior distribution over the parameters conditioned on the smoothing parameters using np.random.multivariate_normal(self.coef_, self.statistics_['cov'], size=n_simulations).
    2. a new method for every distribution called sample that draws random samples from the given array of expected values.

    Some questions to consider

    1. Right now, Distribution.sample does not take a size argument. In every sample method, the size argument is always None, so that numpy simply broadcasts across the array mu and draws one random sample for each mu. This is all we need if we just use distribution.sample(mu) from the method GAM.simulate_from_coef_posterior_conditioned_on_smoothing_parameters: we draw a certain number of random samples of the coefficients, and for each of those samples of the coefficients we draw one sample from the distribution of the response. Perhaps one may want to add more control over how many samples from the response distribution one would like for each sample from the posterior of the coefficients.
    2. For the hepatitis_A_bulgaria dataset with a LinearGAM, simulate_from_coef_posterior_conditioned_on_smoothing_parameters gives the warning RuntimeWarning: covariance is not positive-semidefinite self.coef_, self.statistics_['cov'], size=n_simulations). (This happens no matter whether constraints is 'monotonic_inc', 'concave', or None.) The method still returns random samples. The default behavior of numpy.random.multivariate_normal is to raise a warning (check_valid='warn'). Is this behavior good enough? I am not familiar with how severe this problem is.
    3. Should we rename simulate_from_coef_posterior_conditioned_on_smoothing_parameters? It's long but descriptive. It'd be nice to have it called just simulate_from_posterior, and if we implement the bootstrap samples to simulate from the distribution over the smoothing parameters, too, then that could become just an optional argument in this catch-all method simulate_from_posterior.
    4. I made sample an abstract method for the Distribution class. Feel free to remove that abstract method and the two imports (from abc import ABCMeta, abstractmethod) if you don't want to bother to force subclasses of Distribution to implement these methods (because we don't expect new distributions to be written?).
    5. simulate_from_coef_posterior_conditioned_on_smoothing_parameters checks that the model is already fitted and that n_simulations is not <= 0. I also copied and pasted the code for X = check_X(X, ...). Is this checking good enough?
    6. Should we write tests? We could verify shapes of return values and maybe also verify the random samples are close enough to what we expect.
    7. Should we implement bootstrap samples to get some uncertainty over the smoothing parameters? That may be somewhat involved to make a good API, given how computationally expensive it could be, so perhaps that is left for another PR.
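
The two-stage procedure discussed in these questions can be sketched with NumPy alone. Everything named here is an assumption made for illustration: `simulate_responses` and `model_matrix` are hypothetical, a Poisson response with a log link is picked as an example, and NumPy's `Generator` API is used in place of the `np.random.multivariate_normal` call from the PR.

```python
import numpy as np


def simulate_responses(model_matrix, coef, cov, n_simulations, seed=0):
    """Two-stage simulation, conditional on fixed smoothing parameters.

    Hypothetical helper: assumes a Poisson response with a log link.
    """
    rng = np.random.default_rng(seed)
    # Stage 1: draw coefficient vectors, shape (n_simulations, n_coefs).
    coef_draws = rng.multivariate_normal(coef, cov, size=n_simulations)
    # The inverse log link turns each linear predictor into an expected value.
    mu = np.exp(coef_draws @ model_matrix.T)  # (n_simulations, n_samples)
    # Stage 2: with size left unset, NumPy draws one response per entry of mu.
    return rng.poisson(lam=mu)


model_matrix = np.column_stack([np.ones(5), np.linspace(0.0, 1.0, 5)])
coef = np.array([0.5, 1.0])
cov = 0.01 * np.eye(2)

# A quick diagnostic for the "not positive-semidefinite" warning:
# the smallest eigenvalue of a symmetric covariance should not be negative.
print(np.linalg.eigvalsh(cov).min())

sims = simulate_responses(model_matrix, coef, cov, n_simulations=100)
print(sims.shape)               # axis 0 indexes simulations
print(sims.mean(axis=0).shape)  # averaging across simulations
```

If the smallest eigenvalue is only slightly negative, the warning may come from floating-point round-off rather than a genuinely invalid covariance, but that would need checking against the model in question.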

    Example

    Here's an example of simulate_from_coef_posterior_conditioned_on_smoothing_parameters on the first example in the README:

    [image: the README example plot with simulated responses overlaid]

    The extra code added was:

    response_simulations = gam.simulate_from_coef_posterior_conditioned_on_smoothing_parameters(
        XX, 100)
    for response in response_simulations:
        plt.scatter(XX, response, alpha=.01, color='k')
    

    The light-opacity black disks are random samples from the posterior.

    I made the first axis of the return value the simulations, so you can loop over simulations with response in response_simulations, or compute averages across simulations with np.mean(response_simulations, axis=0), and so on. This convention (that axis 0 is the simulation index) matches that used by PyMC3's sample function.

    opened by cbrummitt 32
  • PoissonGAM fails with dimension mismatch warning depending on n_splines

    Using a grid search and several options for n_splines, some fits fail due to a dimension mismatch.

    gam = PoissonGAM(dtype='numerical').gridsearch(X, y, n_splines=np.arange(4,10))
    
    ....
    
      return (mu**y) * np.exp(-mu) / sp.misc.factorial(y)
     50% (3 of 6) |#############              | Elapsed Time: 0:00:00 ETA:  0:00:00/usr/local/lib/python3.6/site-packages/pygam/pygam.py:1888: UserWarning: shapes (120,240) and (239,120) not aligned: 240 (dim 1) != 239 (dim 0)
    on model:
    PoissonGAM(callbacks=[Deviance(), Diffs(), Accuracy()], 
       constraints=None, dtype='numerical', fit_intercept=True, 
       fit_linear=False, fit_splines=True, lam=0.6, max_iter=100, 
       n_splines=7, penalties='auto', spline_order=3, tol=0.0001)
    skipping...
    
      warnings.warn(msg)
     66% (4 of 6) |##################         | Elapsed Time: 0:00:00 ETA:  0:00:00/usr/local/lib/python3.6/site-packages/pygam/pygam.py:1888: UserWarning: shapes (137,260) and (259,123) not aligned: 260 (dim 1) != 259 (dim 0)
    on model:
    PoissonGAM(callbacks=[Deviance(), Diffs(), Accuracy()], 
       constraints=None, dtype='numerical', fit_intercept=True, 
       fit_linear=False, fit_splines=True, lam=0.6, max_iter=100, 
       n_splines=8, penalties='auto', spline_order=3, tol=0.0001)
    skipping...
    
    ...
    
    

    Training a LinearGAM model using the same dataset and grid search options does not give rise to the same error.

    gam = LinearGAM(dtype='numerical').gridsearch(X, y, n_splines=np.arange(4,10))
    100% (6 of 6) |###########################| Elapsed Time: 0:00:00 Time: 0:00:00
    

    Does this occur because there are more coefficients than data? If so, a more informative warning would be helpful.

    bug 
    opened by maxpagels 18
  • any offsets?

    Hi, First of all, this is a great package! Is it possible to declare an offset or exposure variable? Meaning: a regressor with coefficient fixed to 1.
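
For context, this is what an offset means for a count model with a log link: log(exposure) enters the linear predictor with its coefficient fixed to 1, which is the same as multiplying the expected count by the exposure. The sketch below is plain Python arithmetic with a hypothetical `poisson_mean` helper, not pyGAM code; the release notes further down mention that PoissonGAM accepts exposure.

```python
import math


def poisson_mean(eta, exposure=1.0):
    """Expected count under a log link with log(exposure) as an offset.

    log(mu) = eta + 1.0 * log(exposure)  <=>  mu = exposure * exp(eta)
    """
    offset = math.log(exposure)  # enters the predictor with coefficient fixed to 1
    return math.exp(eta + offset)


rate = poisson_mean(eta=0.3)                 # expected count per unit of exposure
doubled = poisson_mean(eta=0.3, exposure=2)  # same rate observed for twice as long
print(rate, doubled)
```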

    opened by ric70x7 14
  • scikit-sparse installed but not detected?

    I'm trying to use LogisticGAM with a really sparse pandas dataframe. But I'm getting this warning:

    /home/echo66/.local/share/virtualenvs/pygam-505CBPMV/lib/python3.6/site-packages/pygam/utils.py:78: UserWarning: Could not import Scikit-Sparse or Suite-Sparse.
    This will slow down optimization for models with monotonicity/convexity penalties and many splines.
    See installation instructions for installing Scikit-Sparse and Suite-Sparse via Conda.
      warnings.warn(msg)
    
    opened by echo66 11
  • add constraints

    some penalties should really be constraints. for example, monotonic smoothing and harmonic smoothing should be hard constraints.

    perhaps they might also be added as penalties, but a basic application would be as constraints.

    opened by dswah 10
  • How to get rid of the progress bar in pyGAM program?

    Very good and effective library!
    I wanted to embed pyGAM in my own Python program, but it shows a progress bar on every loop iteration, which makes the program slow to run. If I could turn it off, the program would run much faster. (A screenshot of the issue is below.) How can I get rid of the progress bar in pyGAM without changing other functionality? Could you help me? Thanks in advance!

    [image: screenshot of the progress bar output]

    enhancement good first issue 
    opened by sunshine1204 9
  • Add method for simulating from the posterior (or just add an example to the documentation)

    Estimating the mean and confidence intervals (using prediction_intervals) is great. In some cases, it can be useful to simulate from the posterior distribution of the model's coefficients. An example is given on pages 242–243 of [1].

    I think the following code snippet does the trick for a LinearGAM:

    def simulate_from_posterior(linear_gam, X, n_simulations):
        """Simulate from the posterior of a LinearGAM a certain number of times.
    
        Inputs
        ------
        linear_gam : pyGAM.LinearGAM
    
        X : array of shape (n_samples, n_features)
    
        n_simulations : int
            The number of simulations from the posterior to compute
    
        Returns
        -------
        simulations : array of shape (n_samples, n_simulations)
        """
        beta_replicates = np.random.multivariate_normal(
            linear_gam.coef_, linear_gam.statistics_['cov'], size=n_simulations)
        return linear_gam._modelmat(X).dot(beta_replicates.T)
    

    I'm not sure if this should be added as an example in the documentation or added to the code (or both).

    To implement this in general, I think we'd want to add a method for each Distribution that draws a certain number of samples (called sample or random_variate?), so we'd have a NormalDist.sample, BinomialDist.sample, and so on. Then the GAM.simulate could just call self.dist.sample(self.coef_, self.statistics['cov'], size=n_simulations)? I'm not sure yet how to best handle the link functions for these simulations...

    As pointed out on pages 256–257 of [1], this procedure simulates the coefficients conditioned on the smoothing parameters, lambda (lam). To actually simulate from the coefficients, one may use bootstrap samples to get simulations of the coefficients and of the smoothing parameters; an example is given on page 257 of [1].

    [1] S. Wood. Generalized Additive Models: An Introduction with R (First Edition). Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis, 2006.
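
The bootstrap idea mentioned above can be sketched generically. This is a plain nonparametric bootstrap with a hypothetical `fit` callable standing in for "refit the model, smoothing parameters included"; it is not the procedure from [1] reproduced in detail.

```python
import random


def bootstrap_estimates(data, fit, n_bootstrap=200, seed=0):
    """Refit on resampled data to see how much an estimate varies.

    `fit` is any callable mapping a list of observations to an estimate; in the
    setting above it would refit the model, smoothing parameters included.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bootstrap):
        # Resample observations with replacement, keeping the sample size.
        resample = [rng.choice(data) for _ in data]
        estimates.append(fit(resample))
    return estimates


# Toy stand-in for a model fit: the sample mean.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]
estimates = bootstrap_estimates(data, fit=lambda xs: sum(xs) / len(xs))
print(min(estimates), max(estimates))
```

Because every replicate requires a full refit, the cost scales with `n_bootstrap` times the cost of one fit, which is the expense the discussion above is worried about.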

    opened by cbrummitt 9
  • Added Sphinx-based documentation, updated requirements.txt

    Hi, I discovered your project last week and have already used it to great effect, and thought you might like some docs.

    Commit message is below; I've hosted a copy of the HTML docs here, if you think this is worthwhile it's probably best to use readthedocs (hence my choice of theme). A few more details on the changes:

    • I recreated the text and code from the README as a jupyter notebook; only the image pygam_basis.png needed including as all the others were generated in the code
      • It's easy to add a link to the notebook's source on github from the notebook itself, so people can download and run it themselves
      • This uses nbsphinx, which is awesome
    • The API docs are largely generated with autodoc, which uses .. autoX:: directives; in the cases of links and callbacks I documented each class individually; for distributions and penalties you can see these are done at the module level with automodule::.
    • Numpy-style docstrings are a great choice--they mean you can use napoleon to generate documentation from them; the scikit-learn docs are a great example of what can be created automatically from the docstrings.

    There's a lot more to be done, in particular incorporating points from Pablo's post in more involved examples for classification and regression, deciding a better structure for the documentation overall, and of course working on the docstrings in the code. I didn't want to spend too much time on it before checking that you thought it worthwhile!

    Commit message follows


    Sphinx-based documentation has been added to the project; this comes from two main sources:

    • Main modules added using autodoc and napoleon
      • These are split into two sections, GAM classes and other helper classes and functions
      • Each GAM has its own page, while the helper classes and functions are grouped by module
      • Minor changes to docstrings in pygam.py to fix formatting issues in the output docs; there are a lot more of these to tackle but in general everything is pretty readable
    • The text and code from the README as a jupyter notebook, imported using nbsphinx

    Updated requirements.txt with a Documentation section, detailing the different modules necessary to build the documentation.

    opened by badge 8
  • LogisticGAM not converging

    I keep getting the error below when trying to train a LogisticGAM. This error does not happen when I use sklearn LogisticRegression or RandomForest.

    /opt/anaconda/envs/heartfailureNN/lib/python3.5/site-packages/pygam/links.py:149: RuntimeWarning: divide by zero encountered in true_divide
      return dist.levels/(mu*(dist.levels - mu))
    /opt/anaconda/envs/heartfailureNN/lib/python3.5/site-packages/pygam/pygam.py:888: RuntimeWarning: invalid value encountered in multiply
      return sp.sparse.diags((self.link.gradient(mu, self.distribution)**2 * self.distribution.V(mu=mu))**-0.5)
    /opt/anaconda/envs/heartfailureNN/lib/python3.5/site-packages/pygam/pygam.py:907: RuntimeWarning: invalid value encountered in greater_equal
      mask = (np.abs(weights) >= np.sqrt(EPS)) * (weights != np.nan)

    opened by jeweinberg 8
  • Score method

    Hi @dswah, I have added a GAM.score method to the base GAM class, following on from issue #102. I have also added a simple test that checks the score method runs without crashing and that the score, which is R^2, is <= 1.

    Currently it only calculates R^2, which is the default score in scikit-learn for regression models. I can also add the accuracy score specifically to the LogisticGAM class.
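
For reference, the quantity being scored can be written in a few lines. This is the usual definition of the coefficient of determination, sketched in plain Python with a hypothetical `r_squared` helper; it is not the code from this PR.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, y)                         # perfect predictions
baseline = r_squared(y, [2.5] * 4)                # always predicting the mean
reversed_fit = r_squared(y, [4.0, 3.0, 2.0, 1.0])  # worse than the mean
print(perfect, baseline, reversed_fit)
```

The score is bounded above by 1 but has no lower bound, so a test that only asserts R^2 <= 1 will pass even for a badly fitting model.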

    I am quite new to github and this whole open source concept, so I would appreciate any feedback! Let me know if the code makes sense or if I am completely off. Thanks!

    opened by JodesL 7
  • constraint doesn't work for s() when there is tensor term te() in the model

    using the "chicago example"

    from pygam import PoissonGAM, s, te
    from pygam.datasets import chicago
    X, y = chicago(return_X_y=True)
    gam = PoissonGAM(s(0, n_splines=200) + te(3, 1) + s(2)).fit(X, y)
    gam_test = PoissonGAM(s(0, constraints='monotonic_inc') + te(3, 1) + s(2)).fit(X, y)


    TypeError                                 Traceback (most recent call last)
    in ()
          6 gam3 = PoissonGAM(s(0, n_splines=200) + te(3, 1) + s(2)).fit(X, y)
          7
    ----> 8 gam_test = PoissonGAM(s(0, constraints='monotonic_inc') + te(3, 1) + s(2)).fit(X, y)

    ...

    anaconda/lib/python2.7/site-packages/pygam/terms.py in build_constraints(self, coef, constraint_lam, constraint_l2)
        363             if constraint is None:
        364                 constraint = 'none'
    --> 365             if constraint in CONSTRAINTS:
        366                 constraint = CONSTRAINTS[constraint]
        367

    TypeError: unhashable type: 'list'

    bug 
    opened by jzang18 7
  • When Creating X Grid for 3D Plotting or Derivative Calculation, How Can We Include Exposure?

    I'm having trouble understanding how to incorporate exposure when predicting on an X grid. The grid is needed when plotting or taking derivatives. Any advice or guidance here would be appreciated!

    opened by eddietaylor 0
  • One-hot encoding factor term

    Hi,

    I have an array as: [[ '1234' 0.123 'GitHub']]

    and I want to pass the third feature as a factor term f(2, coding = 'one-hot'). However, the encoding fails returning this error:

    ValueError: X data must be type int or float, but found type: <class 'numpy.object_'>
    Try transforming data with a LabelEncoder first.

    The error is raised as a consequence of utils.check_array.

    Any suggestion to overcome this issue?
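
The error message points at label encoding, meaning each string category is replaced by an integer code before the data reaches the model. A dependency-free sketch of that mapping follows; `label_encode` and the example column are hypothetical, and scikit-learn's LabelEncoder (which the message names) is the more usual tool.

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted for reproducibility)."""
    categories = sorted(set(values))
    code_of = {category: code for code, category in enumerate(categories)}
    return [code_of[v] for v in values], categories


# Hypothetical third column of the array; only 'GitHub' appears in the report above.
sites = ["GitHub", "GitLab", "GitHub", "Bitbucket"]
codes, categories = label_encode(sites)
print(codes)       # integer codes that can replace the string column
print(categories)  # position in this list is the code
```

With the string column replaced by integer codes (and the other columns cast to numbers), the array as a whole can be numeric, which is what the check in the error message appears to require.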

    opened by ilagith 0
  • Pass callbacks argument to LinearGAM super init - fixes #291

    callbacks argument to be passed to LinearGAM's super().__init__() call.

    This basically addresses issue #291 regarding cloning LinearGAM estimator with sklearn functions, like cross_validate.
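
Why forwarding the argument matters for cloning can be shown with toy classes. These are stand-ins written for this illustration, not pyGAM's classes, and `rebuild` only imitates the read-parameters-then-reconstruct pattern that scikit-learn's `clone` relies on.

```python
class Base:
    def __init__(self, callbacks=("deviance",)):
        self.callbacks = callbacks


class ForgetsCallbacks(Base):
    def __init__(self, callbacks=("deviance",)):
        super().__init__()  # argument silently dropped


class ForwardsCallbacks(Base):
    def __init__(self, callbacks=("deviance",)):
        super().__init__(callbacks=callbacks)  # argument reaches the base class


def rebuild(estimator):
    """Simplified stand-in for cloning: re-create the object from its stored parameter."""
    return type(estimator)(callbacks=estimator.callbacks)


requested = ("accuracy",)
dropped = ForgetsCallbacks(callbacks=requested).callbacks
kept = rebuild(ForwardsCallbacks(callbacks=requested)).callbacks
print(dropped)  # the default, not what was asked for
print(kept)     # round-trips intact
```

When the constructor argument does not end up on the instance unchanged, a reconstructed copy no longer matches what the caller requested, which is the kind of mismatch cloning utilities tend to reject.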

    opened by miguelfmc 0
  • Can't use flit to install pygam - pyproject.toml does not exist

    I have been trying to install pygam from source using flit, as indicated in the docs, but I am unable to do so.

    I have been running the following (from the project's root directory):

    pip install flit
    flit install -s

    And after running flit install I get the following error:

    Config file pyproject.toml does not exist

    So it seems like the config toml file for installation is missing.

    Environment details:
    Python 3.6.13
    Flit 3.7.1

    opened by miguelfmc 1
  • Difference between prediction intervals and partial dependence (question)

    Hi, thanks for the great package. Unfortunately, I couldn't fully understand the difference between prediction intervals and partial dependence when I have a model of the form y ~ s(0), meaning only one feature. The two functions produce different intervals with the same value requested (.95). Thanks in advance, Yifat
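
One possible source of the difference, offered as a general statistical point rather than a statement about pyGAM's internals: an interval for the fitted function (a confidence interval) only reflects uncertainty in the estimate, whereas a prediction interval also has to cover the noise in new observations, so it is wider at the same coverage. The numbers below are hypothetical and a normal approximation is assumed.

```python
from statistics import NormalDist


def interval(center, variance, width=0.95):
    """Symmetric normal interval of the given coverage around `center`."""
    z = NormalDist().inv_cdf(0.5 + width / 2.0)
    half = z * variance ** 0.5
    return center - half, center + half


fitted_mean = 10.0
var_of_mean = 0.04  # hypothetical uncertainty in the fitted function
noise_var = 1.0     # hypothetical variance of observations around the mean

confidence = interval(fitted_mean, var_of_mean)
prediction = interval(fitted_mean, var_of_mean + noise_var)
print(confidence)
print(prediction)
```

A partial dependence curve may also leave out the intercept, which would shift it vertically relative to full predictions; that is worth checking separately.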

    opened by Yifath7 0
Releases(v0.8.0)
  • v0.8.0(Oct 31, 2018)

    New Features

    • cyclic p-splines: you can now train models with periodic features by using the 'cp' basis like so:
    GAM(s(0, basis='cp'))
    
    • factor smooths now allow dummy coding, via:
    GAM(f(0, coding='dummy'))
    

    Models using this coding scheme are more statistically interpretable, and computationally less expensive, than those using one-hot encodings.

    Bug Fixes

    • models can mix constrained terms and un-constrained tensor-terms
    • tensor terms can be constrained
  • v0.7.2(Oct 29, 2018)

    Bug Fixes

    • Fix not None element existence judgement bug in terms.py, thanks @BeefOnionDumplings!
    • Added a warning issued in summary indicating that there is likely a bug in the p-values
  • v0.7.1(Sep 22, 2018)

    Bug Fixes

    • fixed bug where np.int64 did not count as integers. the following no longer fails:
    LinearGAM().gridsearch(X, y, n_splines=np.arange(5, 10)).summary()
    
  • v0.7.0(Sep 20, 2018)

  • v0.6.3(Sep 17, 2018)

    New Features

    • gridsearch(...) allows searching across a predefined grid of points, without doing the cartesian product, when grid is a np.ndarray of shape (n_points, len(flatten(gam.lam))). This is useful for RandomizedSearchCV-style behavior.
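
The difference between the two search modes can be illustrated with plain tuples. The candidate values are made up, and the sketch deliberately avoids calling pyGAM; per the note above, the predefined points would in practice be passed as a np.ndarray with one row per point.

```python
from itertools import product

# Two smoothing parameters, three candidate values each.
candidates = [0.1, 1.0, 10.0]

# Cartesian product: every combination is tried, 3 * 3 = 9 fits.
full_grid = list(product(candidates, repeat=2))

# Predefined points: only the listed rows are tried, one row per fit.
# Each row has one value per smoothing parameter.
chosen_points = [
    (0.1, 10.0),
    (1.0, 1.0),
    (10.0, 0.1),
]

print(len(full_grid), len(chosen_points))
```

The cartesian product grows exponentially with the number of smoothing parameters, while a list of points grows only with the number of rows supplied, which is what makes random-search-style sampling affordable.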

    Bug Fixes

    • estimate_r_squared(X, y) no longer raises AttributeError
    • dtype=auto no longer allowed for terms
    • intercept.lam = None
  • v0.6.1(Sep 13, 2018)

    New Features

    • easier global arguments for terms
    GAM(s(0) + s(1), n_splines=10).fit(X, y)
    

    will broadcast n_splines=10 to all terms

    Bug Fixes

    • fixed inconsistencies in GAM instantiation, where
    GAM(lam=0.6).gridsearch(X, y)
    

    worked for multi-dimensional X

    but not

    GAM(lam=0.6).gridsearch(X, y)
    
  • v0.6.0(Sep 9, 2018)

    New Features

    • tensor product terms and feature interactions. On top of that, construction is more precise and less verbose:
    GAM(te(0, s(1, n_splines=5))).fit(X, y)
    
    • the partial_dependence() method can return meshgrids to help you make 3D plots of interaction terms
    • ExpectileGAM: for creating a non-parametric description of a distribution. Instead of just modeling the mean of a response, we can model any quantile using
    ExpectileGAM().fit_quantile(X, y, quantile=0.25) 
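
As background on the expectile itself: it generalises the mean the way a quantile generalises the median, by weighting squared errors asymmetrically. The sketch below computes a sample expectile by iterating an asymmetrically weighted mean. It is a textbook-style illustration with a hypothetical `expectile` helper and says nothing about how ExpectileGAM or fit_quantile are implemented.

```python
def expectile(values, tau, n_iter=100):
    """Sample expectile via iterated asymmetrically weighted means.

    Minimises sum(w_i * (y_i - m)**2) with w_i = tau above m and 1 - tau otherwise.
    """
    m = sum(values) / len(values)
    for _ in range(n_iter):
        weights = [tau if y > m else 1.0 - tau for y in values]
        updated = sum(w * y for w, y in zip(weights, values)) / sum(weights)
        if abs(updated - m) < 1e-12:
            break
        m = updated
    return m


data = [1.0, 2.0, 3.0, 4.0, 10.0]
middle = expectile(data, 0.5)  # tau = 0.5 recovers the ordinary mean
upper = expectile(data, 0.9)   # larger tau pulls the estimate toward the upper tail
print(middle, upper)
```

An expectile at level tau is generally not the tau-quantile of the data, so matching a target quantile requires searching for the expectile level whose fitted curve leaves the desired fraction of observations below it.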
    

    Breaking Changes

    • GAM construction is different but much simpler. check out the docstrings for help.
    • generate_X_grid and partial_dependence methods require you to specify term= instead of feature=
  • v0.5.5(Jul 7, 2018)

    New Features

    • all GAM classes have a verbose argument. this makes them compatible with sklearn GridSearchCV and RandomizedSearchCV
    • add toy_classification dataset
    • move generate_X_grid to GAM method

    Bug Fixes

    • users should get a more pythonic experience with partial_dependence by never needing to index with i+1
    • _initial_estimate() method no longer fails on value nudge for purely integer observations
    • regenerate images
    • bugs in readme
    • fixes bug where poorly conditioned matrix would fail when using skcholmod
    • make2d should not be verbose in initial_estimate()
  • v0.5.4(Jun 29, 2018)

    Bug Fixes

    • PoissonGAM no longer produces -inf log-likelihoods when using non-integer exposure.
    • PoissonGAM checks exposure, weight, and y array shapes before fitting.
  • v0.5.3(Jun 28, 2018)

    Bug Fixes

    • datasets are loadable like:
    from pygam.datasets import cake
    X, y = cake(return_X_y=True)
    
    • better model initializations for complex models by using the solution to linear unpenalized problem. This makes the second order PIRLS optimizer less likely to diverge by overshooting the maximum likelihood estimate.
    • ReadMe call for collaboration, examples reference dataset loaders, fix typos
  • v0.5.2(Apr 22, 2018)

    Bug Fixes

    • bug fix in p-value for models with unknown variance. f-statistic was sensitive to estimated variance when it should be invariant.
    • typos
  • v0.5.1(Apr 6, 2018)

    New Features:

    • p-values!
    • you can now see p-values in the model summary. each feature function will have a p-value, and a code describing its level of significance.

    [image: model summary showing p-values]

    Bug Fixes

    • improving documentation
  • v0.4.2(Apr 4, 2018)

    Bug Fixes

    • use scipy stats log-pdfs for computing log-likelihoods
    • disable progress bars in gridsearch setting progress=False
    • add verbosity attribute to GAMs to control warnings
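
A likely motivation for working with log-densities, stated here as an assumption rather than taken from the release note, is numerical stability: the textbook Poisson formula multiplies and divides very large numbers, while the log form stays finite. The sketch uses `math.lgamma` from the standard library instead of SciPy to stay dependency-free, and both function names are hypothetical.

```python
import math


def poisson_log_pmf(y, mu):
    """log P(Y = y) for a Poisson with mean mu, computed without large intermediates."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def poisson_pmf_direct(y, mu):
    """The textbook formula; fine for small counts, overflows for large ones."""
    return (mu ** y) * math.exp(-mu) / math.factorial(y)


small_log = poisson_log_pmf(3, 2.0)
small_direct = math.log(poisson_pmf_direct(3, 2.0))
print(small_log, small_direct)  # the two agree for small counts

large_log = poisson_log_pmf(1000, 900.0)
print(large_log)  # still finite for counts where mu ** y exceeds the float range
```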
  • v0.4.1(Mar 27, 2018)

    Bug Fixes:

    • allow for changing SVD shapes during PIRLS iterations due to changing mask shapes
    • change coefficient initialization to constant model
    • change GammaGAM and InvGaussGAM to use non-canonical log-links by default.
  • v0.4.0(Jan 22, 2018)

  • v0.3.0(Sep 15, 2017)

    New Features:

    • GAMs accept weights in fitting, gridsearch, likelihood, statistics...
    • PoissonGAM accepts exposure

    Changes

    • better handling of PIRLS weights
    • check for isfinite(...).all() in check_X, check_y

    Bug Fixes

    • constant covariates won't break SVD