scikit-survival is a Python module for survival analysis built on top of scikit-learn.

Overview

License readthedocs.org Digital Object Identifier (DOI)

Linux Build Status macOS Build Status Windows Build Status on AppVeyor codecov Codacy Badge

scikit-survival

scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizing the power of scikit-learn, e.g., for pre-processing or doing cross-validation.

About Survival Analysis

The objective in survival analysis (also referred to as time-to-event or reliability analysis) is to establish a connection between covariates and the time of an event. What makes survival analysis differ from traditional machine learning is the fact that parts of the training data can only be partially observed – they are censored.

For instance, in a clinical study, patients are often monitored for a particular time period, and events occurring in this particular period are recorded. If a patient experiences an event, the exact time of the event can be recorded – the patient’s record is uncensored. In contrast, right censored records refer to patients that remained event-free during the study period and it is unknown whether an event has or has not occurred after the study ended. Consequently, survival analysis demands for models that take this unique characteristic of such a dataset into account.

Requirements

  • Python 3.7 or later
  • ecos
  • joblib
  • numexpr
  • numpy 1.16 or later
  • osqp
  • pandas 0.25 or later
  • scikit-learn 0.24
  • scipy 1.0 or later
  • C/C++ compiler

Installation

The easiest way to install scikit-survival is to use Anaconda by running:

conda install -c sebp scikit-survival

Alternatively, you can install scikit-survival from source following this guide.

Examples

The user guide provides in-depth information on the key concepts of scikit-survival, an overview of available survival models, and hands-on examples in the form of Jupyter notebooks.

Help and Support

Documentation

Bug reports

  • If you encountered a problem, please submit a bug report.

Questions

  • If you have a question on how to use scikit-survival, please use GitHub Discussions.
  • For general theoretical or methodological questions on survival analysis, please use Cross Validated.

Contributing

New contributors are always welcome. Please have a look at the contributing guidelines on how to get started and to make sure your code complies with our guidelines.

References

Please cite the following paper if you are using scikit-survival.

S. Pölsterl, "scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn," Journal of Machine Learning Research, vol. 21, no. 212, pp. 1–6, 2020.
@article{sksurv,
  author  = {Sebastian P{\"o}lsterl},
  title   = {scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {212},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v21/20-729.html}
}
Comments
  • CoxPH SurvivalAnalysis and Singular Matrix Error

    CoxPH SurvivalAnalysis and Singular Matrix Error

    I'm going through the tutorial using the veterans lung cancer study and I am using the same code for my own dataset for Cox regression. My problem is to calculating the days to graft failure after a transplant and the dataset has about 900 features after encoding and other preprocessing steps and it has 130K rows. I prepared data for Cox regression (data_x is a dataframe and data_y is a numpy array of status and suvival_in_days) and took a sample of it to run. However when I run the CoxRegression, I am getting the error of: LinAlgError:Matrix is Singular I manipulated my data in different ways, but I could not understand what is the problem and how to solve it.

    awaiting response 
    opened by sarahysh12 22
  • Explain how to interpret output of .predict() in API doc

    Explain how to interpret output of .predict() in API doc

    (I also posted this as a question on Stack Overflow: https://stackoverflow.com/q/47274356/1870832 )

    I'm confused how to interpret the output of .predict from a fitted CoxnetSurvivalAnalysis model in scikit-survival. I've read through the notebook Intro to Survival Analysis in scikit-survival and the API reference, but can't find an explanation. Below is a minimal example of what leads to my confusion:

    import pandas as pd
    from sksurv.datasets import load_veterans_lung_cancer
    from sksurv.linear_model import CoxnetSurvivalAnalysis
    
    # load data
    data_X, data_y = load_veterans_lung_cancer()
    
    # one-hot-encode categorical columns in X
    categorical_cols = ['Celltype', 'Prior_therapy', 'Treatment']
    
    X = data_X.copy()
    for c in categorical_cols:
        dummy_matrix = pd.get_dummies(X[c], prefix=c, drop_first=False)
        X = pd.concat([X, dummy_matrix], axis=1).drop(c, axis=1)
    
    # display final X to fit Cox Elastic Net model on
    del data_X
    print(X.head(3))
    
    

    so here's the X going into the model:

       Age_in_years  Celltype  Karnofsky_score  Months_from_Diagnosis  \
    0          69.0  squamous             60.0                    7.0   
    1          64.0  squamous             70.0                    5.0   
    2          38.0  squamous             60.0                    3.0   
    
      Prior_therapy Treatment  
    0            no  standard  
    1           yes  standard  
    2            no  standard  
    
    

    ...moving on to fitting model and generating predictions:

    # Fit Model
    coxnet_model = CoxnetSurvivalAnalysis()
    coxnet.fit(X, data_y)    
    
    # What are these predictions?    
    preds = coxnet.predict(X)
    
    

    preds has same number of records as X, but their values are wayyy different than the values in data_y, even when predicted on the same data they were fit on.

    print(preds.mean()) 
    print(data_y['Survival_in_days'].mean())
    

    output:

    -0.044114643249153422
    121.62773722627738
    
    

    So what exactly are preds? Clearly .predict means something pretty different here than in scikit-learn, but I can't figure out what. The API Reference says it returns "The predicted decision function," but what does that mean? And how do I get to the predicted estimate in months yhat for a given X? I'm new to survival analysis so I'm obviously missing something.

    opened by MaxPowerWasTaken 21
  • During install: error: command '/usr/bin/clang' failed with exit code 1

    During install: error: command '/usr/bin/clang' failed with exit code 1

    Python version: Python 3.10.3

    OS: OSX 12.4 (Proc: M1 chip)

    When trying to pip install (tried versions 0.17 and 0.18):

          222 warnings and 4 errors generated.
          error: command '/usr/bin/clang' failed with exit code 1
          [end of output]
    

    The errors seem to be:

          In file included from sksurv/linear_model/_coxnet.cpp:801:
          In file included from sksurv/linear_model/src/coxnet_wrapper.h:21:
          sksurv/linear_model/src/coxnet/coxnet.h:139:23: error: expected unqualified-id
                      if (!std::isfinite(exp_xw[k])) {
                                ^
    

          In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
          In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
          sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:753:12: error: reference to unresolved using declaration
              return isnan EIGEN_NOT_A_MACRO (x);
                     ^
    

          In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
          In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
          sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:738:12: error: reference to unresolved using declaration
              return isinf EIGEN_NOT_A_MACRO (x);
                     ^
    

          In file included from sksurv/linear_model/src/coxnet/coxnet.h:18:
          In file included from sksurv/linear_model/src/eigen/Eigen/Core:374:
          sksurv/linear_model/src/eigen/Eigen/src/Core/MathFunctions.h:723:12: error: reference to unresolved using declaration
              return isfinite EIGEN_NOT_A_MACRO (x);
                     ^
    

    Happy to provide more details if needed

    opened by tpilewicz 13
  • 0.12.0: from sksurv.ensemble import RandomSurvivalForest fails

    0.12.0: from sksurv.ensemble import RandomSurvivalForest fails

    Upon upgrading to 0.12.0

    >>> from sksurv.ensemble import RandomSurvivalForest
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/ensemble/__init__.py", line 2, in <module>
        from .forest import RandomSurvivalForest  # noqa: F401
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/ensemble/forest.py", line 14, in <module>
        from ..tree import SurvivalTree
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/tree/__init__.py", line 1, in <module>
        from .tree import SurvivalTree  # noqa: F401
      File "/Users/gchu/miniconda3/envs/dev/lib/python3.6/site-packages/sksurv/tree/tree.py", line 14, in <module>
        from ._criterion import LogrankCriterion
      File "_splitter.pxd", line 34, in init sksurv.tree._criterion
    ValueError: sklearn.tree._splitter.Splitter size changed, may indicate binary incompatibility. Expected 368 from C header, got 360 from PyObject
    >>>
    
    opened by gregchu 13
  • Fix a variety of build problems.

    Fix a variety of build problems.

    Checklist

    • [x] py.test passes
    • [x] documentation renders correctly

    What does this implement/fix? Explain your changes

    In LLVM, this project was not compiling properly. With these changes, the project seems to compile fine.

    opened by llpamies 10
  • viz of ensemble models

    viz of ensemble models

    Hi!

    would you have any advice on how to visualize decision path / decision trees from the ensemble survival model methods (either RF or Gradient Boosting)?

    opened by ad05bzag 10
  • Different results of CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis

    Different results of CoxPHSurvivalAnalysis and CoxnetSurvivalAnalysis

    The documentation of CoxPHSurvivalAnalysis says:

    Cox proportional hazards model.

    And the documentation of CoxnetSurvivalAnalysis says:

    Cox's proportional hazard's model with elastic net penalty.

    So I assume the two classes implement the same model, and should return the same results when set with the same model parameters and given the same data. However, I see different results. Why? Also, what are the differences between them?

    Codes:

    from sksurv.linear_model import CoxPHSurvivalAnalysis, CoxnetSurvivalAnalysis
    from sksurv.datasets import load_veterans_lung_cancer
    from sksurv.preprocessing import OneHotEncoder
    
    X_, y = load_veterans_lung_cancer()
    X = OneHotEncoder().fit_transform(X_)
    
    # try to match the model parameters wherever possible
    f = CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000)
    g = CoxnetSurvivalAnalysis(alphas=[0.5], alpha_min_ratio=1, n_alphas=1, 
                               l1_ratio=1e-16, tol=1e-09, normalize=False)
    
    print(f)
    print(g)
    
    f.fit(X, y)
    g.fit(X, y)
    
    print(f.coef_)
    print(g.coef_[:,0])
    

    Output:

    CoxPHSurvivalAnalysis(alpha=0.5, n_iter=100000, tol=1e-09, verbose=0)
    CoxnetSurvivalAnalysis(alpha_min_ratio=0.0001, alphas=[0.5], copy_X=True,
                l1_ratio=1e-16, max_iter=100000, n_alphas=1, normalize=False,
                penalty_factor=None, tol=1e-09, verbose=False)
    [-8.34518623e-03 -7.21105070e-01 -2.80434400e-01 -1.11234345e+00
     -3.26083027e-02 -1.93213436e-04  6.22726190e-02  2.90289950e-01]
    [-0.00346722 -0.05117406  0.06044394 -0.16433136 -0.03300373  0.0003172
     -0.00881617  0.06956854]
    

    What I've gathered:

    • CoxPHSurvivalAnalysis is sksurv's own implementation of Cox Proportional Hazard model, and supports ridge (L2) regularization.
    • CoxnetSurvivalAnalysis is a wrapper of some C++ extension codes used by R's glmnet package, and supports elastic net (L1 and L2) regularization.
    • In the test files, CoxPHSurvivalAnalysis is tested with the Rossi dataset, while CoxnetSurvivalAnalysis is tested with the Breast Cancer dataset.
    • The two classes have different constructor signatures and methods (eg, only CoxPHSurvivalAnalysis has predict_survival_function).

    Will it be some nice features to have a consolidated constructor signatures and methods for the two classes? And have them tested on the same dataset, for validation or comparison?

    Thanks.

    opened by leihuang 10
  • Add `apply` and `decision_path` to `SurvivalTree`

    Add `apply` and `decision_path` to `SurvivalTree`

    Checklist

    • [x] closes #290
    • [x] py.test passes
    • [x] tests are included
    • [x] code is well formatted
    • [x] documentation renders correctly

    What does this implement/fix? Explain your changes

    Add apply and decision_path to SurvivalForest to also enable the same methods for RandomSurvivalForest and ExtraSurvivalTrees.

    opened by Vincent-Maladiere 8
  • RandomSurvivalForest - predict_survival_function

    RandomSurvivalForest - predict_survival_function

    Describe the bug

    1. I am trying to predict the survival function for my data using RandomSurvivalForest, although the class method works well, it doesn't retrieve the times for each of the steps in the survival function. Each list containing the survival function has a lenght equal or lower to the number of unique times in our "y", hence we can't deduct to what point in time each steps belongs to.

    2. Additionally, if you follow the example given in the documentation of RandomSurvivalForest, you will get the following error:

    from sksurv.datasets import load_whas500
    X, y = load_whas500()
    times = sorted(np.unique(y["lenfol"])) 
    n_times = len(times) 
    # n_times =  395
    
    estimator = RandomSurvivalForest().fit(X, y)
    surv_funcs = estimator.predict_survival_function(X.iloc[:5])
    
    surv_funcs[0]
    # array([0.9975    , 0.9975    , 0.9975    , 0.9975    , 0.9975    ,
    #       0.9975    , 0.9975    , 0.995     , 0.98883333, 0.98883333,...
    
    len(surv_funcs[0])
    # 162
    
    

    Additionally, if you follow the example given in the documentation of RandomSurvivalForest, you will get an error since the result of predict_survival_function is an 1D unlike the same function used in CoxnetSurvivalAnalysis or CoxPHSurvivalAnalysis. This is the error you get:

    from sksurv.datasets import load_whas500
    X, y = load_whas500()
    estimator = RandomSurvivalForest().fit(X, y)
    surv_funcs = estimator.predict_survival_function(X.iloc[:5])
    for fn in surv_funcs:
           plt.step(fn.x, fn(fn.x), where="post")
    
    plt.ylim(0, 1)
    plt.show()
    
    AttributeError: 'numpy.ndarray' object has no attribute 'x'
    
    opened by felipe0216 8
  • Error when using PIP to install scikit-survival 0.13 that uses PEP 517

    Error when using PIP to install scikit-survival 0.13 that uses PEP 517

    Describe the bug

    A clear and concise description of what the bug is.

    Code Sample to Reproduce the Bug

    # Insert your code here that produces the bug.
    # This example should be as succinct as possible and self-contained,
    # i.e., not rely on external data.
    # We are going to copy-paste your code and we expect to get the same result as you.
    # It should run in a fresh python session, and so include all relevant imports.
    

    Expected Results A clear and concise description of what you expected to happen.

    Actual Results Please paste or specifically describe the actual output or traceback.

    Versions Please execute the following snippet and paste the output below.

    import sklearn; sklearn.show_versions()
    import sksurv; print("sksurv:", sksurv.__version__)
    import cvxopt; print("cvxopt:", cvxopt.__version__)
    import cvxpy; print("cvxpy:", cvxpy.__version__)
    import numexpr; print("numexpr:", numexpr.__version__)
    import osqp; print("osqp:", osqp.OSQP().version())
    
    opened by SurajitTest 8
  • Loss Function

    Loss Function "ipcwls" in GradientBoostingSurvivalAnalysis leads to error

    Hi

    I was trying to train a time-to-failure model using machine sensor data. I chose the loss function 'ipcwls' which as per the docs weights the observations by their censoring weights. Although I'm not aware of the thoery behind it, it seemed like a reasonable choice. But, the code fails while applying the fit() function with the error message "input contains nan infinity or a value too large for dtype float64"

    FYI, All of my X variables are scaled and they take continuous values within +-50 range. Quite a few has small values close to zero (5-6 decimal places). Is the loss function choice leading to a division by zero situation? Need some clarity on this and when this loss function should not be used.

    Thanks, Soham

    opened by Soham2112 8
  • n_iter_no_change in GradientBoostingSurvivalAnalysis

    n_iter_no_change in GradientBoostingSurvivalAnalysis

    Describe the bug

    The documentation for the parameter "n_estimators_" of GradientBoostingSurvivalAnalysis says "The number of estimators as selected by early stopping (if n_iter_no_change is specified)." However, GradientBoostingSurvivalAnalysis does not accept n_iter_no_change as an argument.

    Code Sample to Reproduce the Bug

    from sksurv.ensemble import GradientBoostingSurvivalAnalysis
    GradientBoostingSurvivalAnalysis(n_iter_no_change = 10)
    

    Actual Results

    TypeError: GradientBoostingSurvivalAnalysis.__init__() got an unexpected keyword argument 'n_iter_no_change'
    Please paste or specifically describe the actual output or traceback.
    

    Versions System: python: 3.10.8 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0] machine: Linux-5.15.0-56-generic-x86_64-with-glibc2.35

    Python dependencies: sklearn: 1.2.0 pip: 22.3.1 setuptools: 65.5.0 numpy: 1.23.4 scipy: None Cython: 0.29.32 pandas: 1.5.1 matplotlib: 3.6.2 joblib: 1.2.0 threadpoolctl: 3.1.0

    opened by TristanFauvel 0
  • Added conditional property to expose time scale predictions

    Added conditional property to expose time scale predictions

    Checklist

    • [X] closes #324
    • [X] py.test passes
    • [ ] tests are included
    • [X] code is well formatted
    • [X] documentation renders correctly

    Added a decorator for properties, which are only available, if a check returns true. The decorator provided by scikit-learn only works for functions sadly.

    @sebp I am not sure what to test exactly, maybe a test which tests whether pipelines correctly patch the property and functions through? I also think this should not show up in the documentation, as it is internal?

    opened by Finesim97 5
  • SciKit-Learn Pipeline not patched with

    SciKit-Learn Pipeline not patched with "_predict_risk_score"

    Describe the bug

    In my own evaluation code I used the check for '_predict_risk_score' to see, whether models return their predictions on the time scale or risk scale, but this doesn't work, when the estimator is wrapped in a pipeline.

    # Insert your code here that produces the bug.
    from sklearn.pipeline import Pipeline
    from sksurv.linear_model.aft import IPCRidge
    from sksurv.datasets import load_veterans_lung_cancer
    from sksurv.preprocessing import OneHotEncoder
    from sksurv.base import SurvivalAnalysisMixin
    
    
    data_x, data_y = load_veterans_lung_cancer()
    
    
    data_x_prep = OneHotEncoder().fit_transform(data_x)
    model_direct = IPCRidge().fit(data_x_prep, data_y)
    
    
    pipe = Pipeline([('encode', OneHotEncoder()),
                     ('model', IPCRidge())])
    pipe.fit(data_x, data_y)
    
    
    # Are equal
    print(model_direct.predict(data_x_prep.head()))
    print(pipe.predict(data_x.head()))
    
    
    # Steal super method
    # This does not match, because ...
    print(SurvivalAnalysisMixin.score(model_direct, data_x_prep, data_y))
    print(SurvivalAnalysisMixin.score(pipe, data_x, data_y))
    
    
    # ... the property is not patched through
    # if this returns true, the scores are treated as being on the time scale
    print(not getattr(model_direct, "_predict_risk_score", True))
    print(not getattr(pipe, "_predict_risk_score", True))
    
    
    # The second one should also be true!
    

    Expected Results A Pipeline object should also have the corresponding property set, as this might break evaluation codes.

    Actual Results The property is not available. It should be possible to just add it to the __init__.py, but I am not sure, how well it works together with the @property decorator. Currently I am finishing my master thesis, but I should be able to work out a PR on the 5th of December while testing the behaviour.

    Versions (Not running the newest version cough)

    System:
        python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03)  [GCC 9.4.0]
    executable: /home/jovyan/master-thesis/env/bin/python
       machine: Linux-5.10.0-15-amd64-x86_64-with-glibc2.35
    
    Python dependencies:
          sklearn: 1.1.2
              pip: 22.2.2
       setuptools: 65.4.0
            numpy: 1.23.3
            scipy: 1.9.1
           Cython: None
           pandas: 1.5.0
       matplotlib: 3.6.0
           joblib: 1.2.0
    threadpoolctl: 3.1.0
    
    Built with OpenMP: True
    
    threadpoolctl info:
           user_api: openmp
       internal_api: openmp
             prefix: libgomp
           filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
            version: None
        num_threads: 48
    
           user_api: blas
       internal_api: openblas
             prefix: libopenblas
           filepath: /home/jovyan/master-thesis/env/lib/libopenblasp-r0.3.21.so
            version: 0.3.21
    threading_layer: pthreads
       architecture: Zen
        num_threads: 48
    
           user_api: blas
       internal_api: openblas
             prefix: libopenblas
           filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-9f9f5dbc.3.18.so
            version: 0.3.18
    threading_layer: pthreads
       architecture: Zen
        num_threads: 48
    sksurv: 0.18.0
    
    enhancement 
    opened by Finesim97 2
  • Bug in nonparametric.py when calling IPCRidge

    Bug in nonparametric.py when calling IPCRidge

    Describe the bug

    Running IPCRidge hangs with the following message

    assert (Ghat > 0).all()

    and nothing after. I found that changing the option 'reverse = False' as shown down below in kaplan_meier_estimator in the function ipc_weights in the file nonparametric.py corrects the mistake. Error message:

    AssertionError                            Traceback (most recent call last)
    Input In [74], in <cell line: 5>()
          2 set_config(display="text")  # displays text representation of estimators
          4 estimator = IPCRidge(alpha = 0.5,fit_intercept=True)
    ----> 5 estimator.fit(data_x,data_y)
    
    File /opt/homebrew/Caskroom/miniforge/base/envs/teaching_env/lib/python3.10/site-packages/sksurv/linear_model/aft.py:90, in IPCRidge.fit(self, X, y)
         72 """Build an accelerated failure time model.
         73 
         74 Parameters
       (...)
         86 self
         87 """
         88 event, time = check_array_survival(X, y)
    ---> 90 weights = ipc_weights(event, time)
         91 super().fit(X, numpy.log(time), sample_weight=weights)
         93 return self
    
    File /opt/homebrew/Caskroom/miniforge/base/envs/teaching_env/lib/python3.10/site-packages/sksurv/nonparametric.py:323, in ipc_weights(event, time)
        320 idx = numpy.searchsorted(unique_time, time[event])
        321 Ghat = p[idx]
    --> 323 assert (Ghat > 0).all()
        325 weights = numpy.zeros(time.shape[0])
        326 weights[event] = 1.0 / Ghat
    
    AssertionError: 
    

    Code Sample to Reproduce the Bug

    used code:

    estimator = IPCRidge(alpha = 0.5,fit_intercept=True)
    estimator.fit(data_x,data_y)
    
    

    Here is what I changed in the nonparametric.py in the line unique_time, p = kaplan_meier_estimator(event, time, reverse=False) -- changed True to False

    def ipc_weights(event, time):
        """Compute inverse probability of censoring weights
    
        Parameters
        ----------
        event : array, shape = (n_samples,)
            Boolean event indicator.
    
        time : array, shape = (n_samples,)
            Time when a subject experienced an event or was censored.
    
        Returns
        -------
        weights : array, shape = (n_samples,)
            inverse probability of censoring weights
    
        See also
        --------
        CensoringDistributionEstimator
            An estimator interface for estimating inverse probability
            of censoring weights for unseen time points.
        """
        if event.all():
            return np.ones(time.shape[0])
    
        unique_time, p = kaplan_meier_estimator(event, time, reverse=False)
    
        idx = np.searchsorted(unique_time, time[event])
        Ghat = p[idx]
    
        assert (Ghat > 0).all()
    
        weights = np.zeros(time.shape[0])
        weights[event] = 1.0 / Ghat
    
        return weights
    

    Machine and packages versions used:

    Last updated: 2022-11-08T08:59:04.111247-05:00
    
    Python implementation: CPython
    Python version       : 3.10.5
    IPython version      : 8.4.0
    
    Compiler    : Clang 13.0.1 
    OS          : Darwin
    Release     : 21.6.0
    Machine     : arm64
    Processor   : arm
    CPU cores   : 10
    Architecture: 64bit
    
    matplotlib: 3.5.2
    numpy     : 1.22.4
    pandas    : 1.4.4
    json      : 2.0.9
    
    bug 
    opened by fbarfi 4
  • Suggestions for StepFunction

    Suggestions for StepFunction

    I have 2 minor suggestions for StepFunction that I would like to see:

    1. Different argument name for 'x' in init and call. In addition, current API reference is missing.
    2. Sort the arrays inside the function.

    Thanks.

    awaiting response 
    opened by drproduck 1
  • KM_variance_estimator

    KM_variance_estimator

    Checklist

    • [x] py.test passes
    • [x] tests are included
    • [x] code is well formatted
    • [ ] documentation renders correctly

    What does this implement/fix? Explain your changes

    Hi @sebp, I added the Greenwood's estimation of KM variance to nonparametric.py (this is a prerequesite for implementing some goodness-of-fit tests). NB: I ran tox -e py310-docs but for some reason the new function does not not appear in the API doc. Best,

    opened by TristanFauvel 3
Releases(v0.19.0.post1)
Owner
Sebastian Pölsterl
Sebastian Pölsterl
Python data processing, analysis, visualization, and data operations

Python This is a Python data processing, analysis, visualization and data operations of the source code warehouse, book ISBN: 9787115527592 Descriptio

FangWei 1 Jan 16, 2022
Flood modeling by 2D shallow water equation

hydraulicmodel Flood modeling by 2D shallow water equation. Refer to Hunter et al (2005), Bates et al. (2010). Diffusive wave approximation Local iner

6 Nov 30, 2022
Calculate multilateral price indices in Python (with Pandas and PySpark).

IndexNumCalc Calculate multilateral price indices using the GEKS-T (CCDI), Time Product Dummy (TPD), Time Dummy Hedonic (TDH), Geary-Khamis (GK) metho

Dr. Usman Kayani 3 Apr 27, 2022
vartests is a Python library to perform some statistic tests to evaluate Value at Risk (VaR) Models

gg I wasn't satisfied with any of the other available Gemini clients, so I wrote my own. Requires Python 3.9 (maybe older, I haven't checked) and opti

RAFAEL RODRIGUES 5 Jan 03, 2023
wikirepo is a Python package that provides a framework to easily source and leverage standardized Wikidata information

Python based Wikidata framework for easy dataframe extraction wikirepo is a Python package that provides a framework to easily source and leverage sta

Andrew Tavis McAllister 35 Jan 04, 2023
Data analysis and visualisation projects from a range of individual projects and applications

Python-Data-Analysis-and-Visualisation-Projects Data analysis and visualisation projects from a range of individual projects and applications. Python

Tom Ritman-Meer 1 Jan 25, 2022
DataPrep — The easiest way to prepare data in Python

DataPrep — The easiest way to prepare data in Python

SFU Database Group 1.5k Dec 27, 2022
My first Python project is a simple Mad Libs program.

Python CLI Mad Libs Game My first Python project is a simple Mad Libs program. Mad Libs is a phrasal template word game created by Leonard Stern and R

Carson Johnson 1 Dec 10, 2021
Single-Cell Analysis in Python. Scales to >1M cells.

Scanpy – Single-Cell Analysis in Python Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It inc

Theis Lab 1.4k Jan 05, 2023
Churn prediction with PySpark

It is expected to develop a machine learning model that can predict customers who will leave the company.

3 Aug 13, 2021
Data Intelligence Applications - Online Product Advertising and Pricing with Context Generation

Data Intelligence Applications - Online Product Advertising and Pricing with Context Generation Overview Consider the scenario in which advertisement

Manuel Bressan 2 Nov 18, 2021
Transform-Invariant Non-Negative Matrix Factorization

Transform-Invariant Non-Negative Matrix Factorization A comprehensive Python package for Non-Negative Matrix Factorization (NMF) with a focus on learn

EMD Group 6 Jul 01, 2022
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

Benedek Rozemberczki 1.8k Jan 09, 2023
Gathering data of likes on Tinder within the past 7 days

tinder_likes_data Gathering data of Likes Sent on Tinder within the past 7 days. Versions November 25th, 2021 - Functionality to get the name and age

Alex Carter 12 Jan 05, 2023
A project consists in a set of assignements corresponding to a BI process: data integration, construction of an OLAP cube, qurying of a OPLAP cube and reporting.

TennisBusinessIntelligenceProject - A project consists in a set of assignements corresponding to a BI process: data integration, construction of an OLAP cube, qurying of a OPLAP cube and reporting.

carlo paladino 1 Jan 02, 2022
pandas: powerful Python data analysis toolkit

pandas is a Python package that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive.

pandas 36.4k Jan 03, 2023
Random dataframe and database table generator

Random database/dataframe generator Authored and maintained by Dr. Tirthajyoti Sarkar, Fremont, USA Introduction Often, beginners in SQL or data scien

Tirthajyoti Sarkar 249 Jan 08, 2023
NFCDS Workshop Beginners Guide Bioinformatics Data Analysis

Genomics Workshop FIXME: overview of workshop Code of Conduct All participants s

Elizabeth Brooks 2 Jun 13, 2022
Retentioneering 581 Jan 07, 2023