Survival analysis in Python

Overview

What is survival analysis and why should I learn it? Survival analysis was originally developed and applied heavily by the actuarial and medical communities. Its purpose was to answer why events occur now versus later under uncertainty (where events might refer to deaths, disease remission, etc.). This is great for researchers interested in measuring lifetimes: they can answer questions like, what factors might influence deaths?

But outside of medicine and actuarial science, there are many other interesting and exciting applications of survival analysis. For example:

  • SaaS providers are interested in measuring subscriber lifetimes, or time to some first action
  • inventory stock-outs are a censoring event for the true "demand" of a good
  • sociologists are interested in measuring the lifetimes of political parties, relationships, or marriages
  • A/B tests can determine how long it takes different groups to perform an action

lifelines is a pure Python implementation of the best parts of survival analysis.
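
A minimal example, using the Waltons dataset bundled with lifelines:

    from lifelines import KaplanMeierFitter
    from lifelines.datasets import load_waltons

    df = load_waltons()                       # columns: T (duration), E (event observed)
    kmf = KaplanMeierFitter()
    kmf.fit(df["T"], event_observed=df["E"])
    kmf.plot_survival_function()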

Documentation and intro to survival analysis

If you are new to survival analysis, wondering why it is useful, or are interested in lifelines examples, API, and syntax, please read the Documentation and Tutorials page.

Roadmap

You can find the roadmap for lifelines here.

Development

See our Contributing guidelines.

Comments
  • Some failing CoxPH tests

    Noticed some strange behavior with CoxPH. I always have normalized data, which did not seem to work well with the implementation. The failures here seem unrelated to that, though.

    The added tests currently fail with the following trace:

    Traceback (most recent call last):
      File "/home/jonas/workspacepython/lifelines/lifelines/tests/test_suite.py", line 937, in test_crossval_normalized
        event_col='E', k=3)
      File "/home/jonas/workspacepython/lifelines/lifelines/utils.py", line 311, in k_fold_cross_validation
        fitter.fit(training_data, duration_col=duration_col, event_col=event_col)
      File "/home/jonas/workspacepython/lifelines/lifelines/estimation.py", line 998, in fit
        include_likelihood=include_likelihood)
      File "/home/jonas/workspacepython/lifelines/lifelines/estimation.py", line 938, in _newton_rhaphson
        delta = solve(-hessian, step_size * gradient.T)
      File "/home/jonas/anaconda3/lib/python3.4/site-packages/numpy/linalg/linalg.py", line 381, in solve
        r = gufunc(a, b, signature=signature, extobj=extobj)
      File "/home/jonas/anaconda3/lib/python3.4/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
        raise LinAlgError("Singular matrix")
    numpy.linalg.linalg.LinAlgError: Singular matrix

    However, note that I have commented out one of the datasets because it seems to cause the cross-validation to enter an infinite loop of some kind; the tests never finish (I only waited ~2 minutes).

    Doing similar things in R works with no problem.

    Doing the following is a fast way to check the results:

    python -m unittest lifelines.tests.test_suite.CoxRegressionTests
    
    opened by spacecowboy 37
  • Add concordance index function

    This commit includes a function for calculating Harrell's concordance index, which in R can be computed with the 'Hmisc' package. The function is implemented in Fortran with a small Python wrapper, because calculating the C-index is an O(n^2) process and quickly becomes unacceptably slow in pure Python. Comparing a pure Python implementation with the Fortran version on arrays of length 1000, Python required 434 ms while Fortran took 4.73 ms, almost a factor of 100 difference.

    As a consequence of the addition of the Fortran module, the setup script now utilizes numpy's setup function which will handle the compilation of the native code.

    In addition, a small unit test has been added. To be able to run the unit tests, it is likely necessary to compile the native code first with:

    python setup.py build_ext --inplace
    

    I'm not sure how you want to organize the source code, so I opted for naming the file "_statistics.f90" which compiles to a module "_statistics". The function inside is then imported and wrapped in "statistics.py". My thinking is that any "module.py" might have some native code related to it in a "_module.f90" or "_module.c" file.

    As a reference, here is a pure python version of the function:

    import numpy as np

    def concordance_index(event_times, predicted_event_times, event_observed=None):
        """
        Calculates the concordance index (C-index) between two series
        of event times. The first is the real survival times from
        the experimental data, and the other is the predicted survival
        times from a model of some kind.
    
        The concordance index is a value between 0 and 1 where,
        0.5 is the expected result from random predictions,
        1.0 is perfect concordance and,
        0.0 is perfect anti-concordance (multiply predictions with -1 to get 1.0)
    
        Parameters:
          event_times: a (nx1) array of observed survival times.
          predicted_event_times: a (nx1) array of predicted survival times.
          event_observed: a (nx1) array of censorship flags, 1 if observed,
                          0 if not. Default assumes all observed.
    
        Returns:
          c-index: a value between 0 and 1.
        """
        event_times = np.array(event_times, dtype=float)
        predicted_event_times = np.array(predicted_event_times, dtype=float)
    
        if event_observed is None:
            event_observed = np.ones(event_times.shape[0], dtype=float)
    
        if event_times.shape != predicted_event_times.shape:
            raise ValueError("Event times arrays must have the same shape!")
    
        def valid_comparison(time_a, time_b, event_a, event_b):
            """True if times can be compared."""
            if event_a and event_b:
                return True
            elif event_a and time_a < time_b:
                return True
            elif event_b and time_b < time_a:
                return True
            else:
                return False
    
        def concordance_value(time_a, time_b, pred_a, pred_b):
            if pred_a == pred_b:
                # Same as random
                return 0.5
            elif time_a < time_b and pred_a < pred_b:
                return 1.0
            elif time_b < time_a and pred_b < pred_a:
                return 1.0
            else:
                return 0.0
    
        paircount = 0.0
        csum = 0.0
    
        for a, (time_a, pred_a, event_a) in enumerate(zip(event_times,
                                                          predicted_event_times,
                                                          event_observed)):
            # Don't want to double count
            for b in range(a + 1, len(event_times)):
                time_b = event_times[b]
                pred_b = predicted_event_times[b]
                event_b = event_observed[b]
    
                if valid_comparison(time_a, time_b, event_a, event_b):
                    paircount += 1.0
                    csum += concordance_value(time_a, time_b, pred_a, pred_b)
    
        if paircount == 0:
            raise ZeroDivisionError("No admissible pairs to compare in the dataset.")
        return csum / paircount
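
    For a quick sanity check of the function above:

    T = np.array([5., 8., 11., 14., 20.])       # observed durations
    T_pred = np.array([6., 7., 10., 16., 18.])  # predicted durations
    E = np.array([1, 1, 0, 1, 1])               # 1 = event observed, 0 = censored

    concordance_index(T, T_pred, E)             # should be 1.0: every admissible pair is concordant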
    
    opened by spacecowboy 26
  • Better alignment and sizing of at_risk_counts

    By setting ha = "center" you get nicer alignment with the x ticks.

    ha = "right"

    image

    ha = "center"

    image

    I also hacked together a way to adjust the font size by adding it as a parameter to the function (accepting an integer x) and then adding:

    ax2.set_xlabel("At risk", fontsize = x)

    Probably there is a nicer way to incorporate that into the arguments, though.

    edit: Or it's very possible that there was already a way to adjust the font size and I just couldn't figure it out!
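
    A rough workaround that avoids touching lifelines internals (a sketch, assuming kmf is a fitted KaplanMeierFitter on a recent lifelines version): shrink every text element for this figure only via a matplotlib rc context.

    import matplotlib.pyplot as plt

    # all text created inside the context, including the at-risk counts, uses size 8
    with plt.rc_context({"font.size": 8}):
        kmf.plot_survival_function(at_risk_counts=True)
        plt.tight_layout()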

    installation plotting 
    opened by NickCEBM 24
  • Multiple comparisons testing

    Multiple comparisons corrections with something like Bonferroni would be useful. This would also require generating p-values for the logrank statistic from the chi-squared distribution.
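
    A sketch of what this could look like with current tools, combining lifelines' pairwise log-rank tests with statsmodels' correction machinery (assumption: df has duration column T, event column E, and a group column):

    from lifelines.statistics import pairwise_logrank_test
    from statsmodels.stats.multitest import multipletests

    results = pairwise_logrank_test(df["T"], df["group"], df["E"])
    # Bonferroni-adjust the pairwise p-values
    reject, p_adjusted, _, _ = multipletests(results.p_value, alpha=0.05,
                                             method="bonferroni")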

    enhancement 
    opened by waltonjones 20
  • CoxPHFitter Error

    Not sure if this is the right place for this, but I am having an issue getting the CoxPH methods to work. I am new to survival analysis, so I assume something is wrong with my data setup. I am getting a "delta contains nan value(s). Convergence halted." error during coxph.fit(). Wondering if anyone can shed some light on why this is happening?

    Thanks,

    ValueError                                Traceback (most recent call last)
          3 cphf1 = CoxPHFitter()
    ----> 4 cphf1.fit(X, 'T', 'E')
          5 cphf1.print_summary()

    /lifelines/fitters/coxph_fitter.pyc in fit(self, df, duration_col, event_col, show_progress, initial_beta, include_likelihood, strata)
        313 hazards_ = self._newton_rhaphson(df, T, E, initial_beta=initial_beta,
        314                                  show_progress=show_progress,
    --> 315                                  include_likelihood=include_likelihood)
        316
        317 self.hazards_ = pd.DataFrame(hazards_.T, columns=df.columns,

    lifelines/fitters/coxph_fitter.pyc in _newton_rhaphson(self, X, T, E, initial_beta, step_size, precision, show_progress, include_likelihood)
        223 delta = solve(-h, step_size * g.T)
        224 if np.any(np.isnan(delta)):
    --> 225     raise ValueError("delta contains nan value(s). Convergence halted.")
        226
        227 # Save these as pending result

    ValueError: delta contains nan value(s). Convergence halted.
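
    Not a fix for the underlying data problem, but a common mitigation (a sketch, assuming X holds the covariates plus the T and E columns): a small ridge penalty often stabilizes the Newton-Raphson step when covariates are collinear or nearly constant.

    from lifelines import CoxPHFitter

    cph = CoxPHFitter(penalizer=0.1)  # small L2 penalty regularizes the Hessian
    cph.fit(X, duration_col='T', event_col='E')
    cph.print_summary()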

    convergence issue 
    opened by slipss 16
  • nlogn concordance index algorithm (first pass)

    My dataset is about 200k rows, so even the Fortran concordance option takes more than a minute because it's an n^2 algorithm. I wrote a faster (n log n) version. On 100k rows of fake data, this takes the time down from 52s (previous fast n^2 Fortran version) to 4s (current pure-Python n log n version).

    Right now it introduces a dependency on another library (blist) because I didn't want to write the order statistic tree myself. Unfortunately blist's order statistic trees might be O(log^2 n) instead of O(log n) for RANK operations, so right now this might be O(n log^2 n) in practice. Also I suspect blist is slower than a data structure that just tried to be an order statistic tree would be. Anyway, because of the dependency issue, I wanted to run this by the maintainers before I go any further with it. What do you recommend?

    PS: It's also not quite correct yet; it disagrees with the Fortran implementation on the full Cox model concordance test, and they both disagree with what I get in R. I still have to track this down.
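
    For intuition only (this is not the PR's blist-based code): with no censoring and no tied values, counting concordant pairs reduces to counting order-preserving pairs, which a rank-query structure handles in O(n log n). A simplified sketch using sortedcontainers:

    from sortedcontainers import SortedList

    def concordance_no_censoring(event_times, predictions):
        """O(n log n) C-index sketch: all events observed, no ties."""
        order = sorted(range(len(event_times)), key=lambda i: event_times[i])
        seen = SortedList()  # predictions of subjects with smaller event times
        concordant, pairs = 0, 0
        for i in order:
            p = predictions[i]
            concordant += seen.bisect_left(p)  # earlier deaths predicted smaller
            pairs += len(seen)
            seen.add(p)
        return concordant / pairs if pairs else 0.5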

    opened by benkuhn 16
  • Speeding up Aalen Additive Regression

    Hi, I've been working on a project for a few months now, and one problem I have is that it can take about 4 days to run on 340k rows with about 6 features.

    I know lifelines isn't necessarily designed for this, and I've discovered that the ridge regression solve step is the biggest bottleneck: 60% of the compute time happens there.

    Are there alternative algorithms I could use, like mini-batch, rather than the ridge regression?

    performance 
    opened by springcoil 15
  • Create aalen_johansen_fitter.py

    Adding an Aalen-Johansen fitter, as I mentioned in #413. Still needs some cleaning up. Items still needed: standard error estimator, tests, a check of how well jitter() works, documentation and formatting that match the rest of lifelines, and a written-up example.

    How it works is as follows: it estimates an overall survival curve, calculates the discrete-time hazards for the event of interest (event_ind), and then calculates the cumulative density function. The survival function can be used to generate the discrete-time hazard: take the minus-log transform of S(t) and S(t-), where t- is the event time right before t, then subtract the two quantities. To estimate F(t, j), multiply S(t-) by the discrete-time hazard and an indicator for event type j.
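
    A minimal sketch of that recipe (a hypothetical helper, not this PR's implementation), accumulating S(t-) times the cause-specific discrete hazard:

    import numpy as np
    import pandas as pd

    def aalen_johansen_cif(durations, event_types, cause):
        """Cumulative incidence F(t, j) for one cause; 0 codes censoring."""
        df = pd.DataFrame({"T": durations, "type": event_types})
        times = np.unique(df.loc[df["type"] > 0, "T"])
        surv, cif, out = 1.0, 0.0, []
        for t in times:
            at_risk = (df["T"] >= t).sum()
            d_all = ((df["T"] == t) & (df["type"] > 0)).sum()
            d_j = ((df["T"] == t) & (df["type"] == cause)).sum()
            cif += surv * d_j / at_risk    # S(t-) * discrete hazard of cause j
            surv *= 1.0 - d_all / at_risk  # update all-cause survival
            out.append((t, cif))
        return pd.DataFrame(out, columns=["time", "CIF"])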

    Potential addition: warn users not to calculate survival times from this (it only generates the cumulative density function / risk), since the interpretation of those survival times is not straightforward.

    Some discussion and examples: https://www.duo.uio.no/bitstream/handle/10852/10287/stat-res-03-97.pdf?sequence=1 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5557056/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4325676/

    opened by pzivich 14
  • Provide Initial Guess to Regression Fitters

    It would be great to be able to provide an initial guess point (warm-start) to the regression fitters, such as WeibullAFTFitter. I'm referring to this line:

    https://github.com/CamDavidsonPilon/lifelines/blob/d9d3f9f9acb832d03166e39556770c3374c868a4/lifelines/fitters/init.py#L1018

    I've been comparing this particular fitter to R's survreg, and for some datasets their solutions don't agree at all. I'd like to provide the same initial values to both and hopefully get the same solution.
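
    For what it's worth, a hypothetical sketch of the requested API; beta0 stands in for whatever starting vector is handed to survreg, and the initial_point keyword name is an assumption:

    from lifelines import WeibullAFTFitter

    aft = WeibullAFTFitter()
    # beta0: the same starting values supplied to R's survreg (hypothetical)
    aft.fit(df, duration_col="T", event_col="E", initial_point=beta0)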

    opened by bacalfa 12
  • ImportError: LogNormalFitter, LogLogisticFitter, PiecewiseExponentialFitter

    I am having issues importing the following fitters:

    from lifelines import LogNormalFitter
    from lifelines import LogLogisticFitter
    from lifelines import PiecewiseExponentialFitter
    

    The error message is:

    ImportError: cannot import name 'LogNormalFitter' from 'lifelines' (C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\lifelines\__init__.py)
    

    Any ideas what the problem might be?

    Thank you, c.
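
    A likely culprit (an assumption worth checking first): these fitters only exist in newer lifelines releases, so an outdated install would raise exactly this error.

    pip install --upgrade lifelines
    python -c "import lifelines; print(lifelines.__version__)"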

    installation 
    opened by cerenaaa 12
  • Model serialization?

    First off - AWESOME library. I've been using it for a few of my projects and it's been a lifesaver.

    However, I'm looking to save models, and I can't seem to find any documentation on how. My specific use case is just wanting to save the Kaplan-Meier estimator and use it to make predictions later. I can of course save off the data frame of the survival function, but I'd like to pickle (or otherwise serialize) the model and reload it in a different module, so I can do the analysis/fitting in one place and use the model later. Is there another way besides exporting the survival function? (I'd like to use the predict method...)

    Thanks!
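
    A minimal sketch of what usually works, since lifelines fitters are ordinary Python objects (assumption: kmf is a fitted KaplanMeierFitter; formula-based regression models have had pickling caveats):

    import pickle

    with open("kmf.pkl", "wb") as f:
        pickle.dump(kmf, f)

    with open("kmf.pkl", "rb") as f:
        kmf_restored = pickle.load(f)

    kmf_restored.predict(365)  # survival probability at t = 365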

    opened by dwilson1988 12
  • Bugs of Incorrect Calculation of Baseline Hazard & Baseline Cumulative Hazard

    Hello, I am a senior data scientist at Prudential Financial. While working on a project involving Cox proportional hazards models, I found that the baseline hazard and baseline cumulative hazard are calculated incorrectly by the CoxPHFitter module in the lifelines package. Before jumping into the bug details, I want to share version information: the lifelines package I was using was 0.27.0, but when I checked the source code on GitHub again this morning (version 0.27.4, I believe), I still saw the same bugs.

    The bugs originate from the _fit_model function in the SemiParametricPHFitter class, starting at line 1252 of the coxph_fitter.py file. Notice that the standardized data are supplied to _fit_model for further estimation. This is not a problem for the Cox coefficient estimates (the params_ in the function), because they are restored to their original scales by dividing by the corresponding standard deviations of the original data, as in line 1399. However, no similar correction is applied to the predicted_partial_hazards_ calculated in line 1392. I notice that line 1393 uses a matrix multiplication of the standardized data with the uncorrected Cox coefficients to avoid the scale issue, but the location issue is never addressed. As a result, it is as if the raw data were shifted, with the direction and extent of the shift depending on the original mean values, and the effect of that shift is incorrectly transferred to the baseline hazard of the Cox model. This makes all subsequent calculations of the baseline hazard and baseline cumulative hazard incorrect.

    The fix is straightforward: basically, use the unstandardized data for the baseline hazard calculations. I have appended some code below as a lazy fix; adding it right at line 1270 should generate the correct baseline hazard and baseline cumulative hazard. However, this is definitely not ideal, since the incorrect calculations are not removed but merely overridden. If desired, I would be very happy to work with the lifelines development team on a permanent and neater fix.

    predicted_partial_hazards_ = (
        pd.DataFrame(np.exp(dot(X.values, self.params_)), columns=["P"]).assign(T=T.values, E=E.values, W=weights.values).set_index(X.index)
    )
    self.baseline_hazard_ = self._compute_baseline_hazards(predicted_partial_hazards_)
    self.baseline_cumulative_hazard_ = self._compute_baseline_cumulative_hazard(self.baseline_hazard_)
    
    opened by bofeng2018 1
  • Is Generalized Gamma still having convergence problems?

    I was fitting the generalized gamma distribution to my survival data, but an hour later the code was still running and had not converged, so I gave up and stopped the fit. I then wrote the negative log-likelihood of the generalized gamma myself and used scipy's minimize with the Nelder-Mead method to find the maximum likelihood parameters; it converged in two minutes. I would like to know whether the generalized gamma fitter still has convergence problems, and why.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import gamma, gammainc

    def neg_log_Gama_Generalizada(params):
        # reject non-positive parameter values
        if min(params) < 0:
            return np.inf
        gama, k, alpha = params
        # observed failure times (censura == 1) and censored times (censura == 0)
        falha = np.array(tempo_censura[tempo_censura['censura'] == 1]['tempo'])
        tc = np.array(tempo_censura[tempo_censura['censura'] == 0]['tempo'])
        # generalized gamma density for the observed events
        pdf = gama * falha**(gama*k - 1) * np.exp(-(falha/alpha)**gama) / (gamma(k) * alpha**(gama*k))
        # survival function for the censored observations (gammainc is the
        # regularized lower incomplete gamma function)
        sf = 1 - gammainc(k, (tc/alpha)**gama)
        # events contribute log f(t); censored observations contribute log S(t)
        log_vero = np.sum(np.log(pdf)) + np.sum(np.log(sf))
        return -log_vero

    res_gg = minimize(neg_log_Gama_Generalizada, [1, 1, 1], method='Nelder-Mead')
    res_gg.x
    

    array([15.55077716, 0.03200795, 17.09975645])
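
    For comparison, lifelines' built-in fitter can be tried on the same data (a sketch, reusing the tempo_censura frame from above):

    from lifelines import GeneralizedGammaFitter

    ggf = GeneralizedGammaFitter()
    ggf.fit(tempo_censura['tempo'], event_observed=tempo_censura['censura'])
    ggf.print_summary()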

    opened by MichelMiler 0
  • Feature Request: Cause-Specific Hazards Models

    From this open issue, it seems there isn't much support for competing risks models in lifelines. I find myself working on a competing risks problem for which I'd like to use a cause-specific hazards model. As mentioned in this source:

    Cause-specific hazard models can be fit in any statistical software package that permits estimation of the conventional Cox proportional hazards model. One simply treats those subjects who experience a competing event as being censored at the time of the occurrence of the competing event.

    So achieving this model with two instances of CoxPHFitter isn't horrendous, but it's a bit of a pain not to be able to call, e.g., a single survival function. It seems like the implementation could be straightforward based on that quote. Is there any interest in a contribution adding such a model to lifelines?
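
    In the meantime, the two-model recipe from the quote is only a few lines (a sketch, assuming df has duration column T and an event column coded 0 = censored, 1 and 2 = competing causes):

    from lifelines import CoxPHFitter

    models = {}
    for cause in (1, 2):
        # treat the competing event as censoring for this cause
        d = df.assign(E=(df["event"] == cause).astype(int)).drop(columns="event")
        models[cause] = CoxPHFitter().fit(d, duration_col="T", event_col="E")

    models[1].print_summary()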

    opened by anthonymichaelclark 1
  • KaplanMeierFitter: Index Error when adding at_risk_counts

    Python 3.8 (conda env), lifelines 0.27.1

    Using the intro from the docs website: https://lifelines.readthedocs.io/en/latest/Survival%20analysis%20with%20lifelines.html

    kmf = KaplanMeierFitter().fit(T, E, label="all_regimes")
    kmf.plot_survival_function(at_risk_counts=True)
    plt.tight_layout()
    
    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    Input In [88], in <cell line: 2>()
          1 kmf = KaplanMeierFitter().fit(T, E, label="all_regimes")
    ----> 2 kmf.plot_survival_function(at_risk_counts=True)
          3 plt.tight_layout()
    
    File ~/.conda/envs/survival/lib/python3.8/site-packages/lifelines/fitters/kaplan_meier_fitter.py:453, in KaplanMeierFitter.plot_survival_function(self, **kwargs)
        451 """Alias of ``plot``"""
        452 if not CensoringType.is_interval_censoring(self):
    --> 453     return _plot_estimate(self, estimate="survival_function_", **kwargs)
        454 else:
        455     # hack for now.
        456     def safe_pop(dict, key):
    
    File ~/.conda/envs/survival/lib/python3.8/site-packages/lifelines/plotting.py:961, in _plot_estimate(cls, estimate, loc, iloc, show_censors, censor_styles, ci_legend, ci_force_lines, ci_only_lines, ci_no_lines, ci_alpha, ci_show, at_risk_counts, logx, ax, **kwargs)
        950         plot_estimate_config.ax.fill_between(
        951             x,
        952             lower,
       (...)
        957             step=step,
        958         )
        960 if at_risk_counts:
    --> 961     add_at_risk_counts(cls, ax=plot_estimate_config.ax)
        962     plt.tight_layout()
        964 return plot_estimate_config.ax
    
    File ~/.conda/envs/survival/lib/python3.8/site-packages/lifelines/plotting.py:512, in add_at_risk_counts(labels, rows_to_show, ypos, xticks, ax, at_risk_count_from_start_of_period, *fitters, **kwargs)
        505     event_table_slice = f.event_table.assign(at_risk=lambda x: x.at_risk - x.removed)
        507 event_table_slice = (
        508     event_table_slice.loc[:tick, ["at_risk", "censored", "observed"]]
        509     .agg({"at_risk": lambda x: x.tail(1).values, "censored": "sum", "observed": "sum"})  # see #1385
        510     .rename({"at_risk": "At risk", "censored": "Censored", "observed": "Events"})
        511 )
    --> 512 tmp = [int(c) for c in event_table_slice.loc[rows_to_show]]
        513 print(tmp)
        514 counts.extend([int(c) for c in event_table_slice.loc[rows_to_show]])
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:879, in _LocationIndexer.__getitem__(self, key)
        876 axis = self.axis or 0
        878 maybe_callable = com.apply_if_callable(key, self.obj)
    --> 879 return self._getitem_axis(maybe_callable, axis=axis)
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1099, in _LocIndexer._getitem_axis(self, key, axis)
       1096     if hasattr(key, "ndim") and key.ndim > 1:
       1097         raise ValueError("Cannot index with multidimensional key")
    -> 1099     return self._getitem_iterable(key, axis=axis)
       1101 # nested tuple slicing
       1102 if is_nested_tuple(key, labels):
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1037, in _LocIndexer._getitem_iterable(self, key, axis)
       1034 self._validate_key(key, axis)
       1036 # A collection of keys
    -> 1037 keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
       1038 return self.obj._reindex_with_indexers(
       1039     {axis: [keyarr, indexer]}, copy=True, allow_dups=True
       1040 )
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1254, in _LocIndexer._get_listlike_indexer(self, key, axis, raise_missing)
       1251 else:
       1252     keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
    -> 1254 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
       1255 return keyarr, indexer
    
    File ~/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1298, in _LocIndexer._validate_read_indexer(self, key, indexer, axis, raise_missing)
       1296 if missing == len(indexer):
       1297     axis_name = self.obj._get_axis_name(axis)
    -> 1298     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
       1300 # We (temporarily) allow for some missing keys with .loc, except in
       1301 # some cases (e.g. setting) in which "raise_missing" will be False
       1302 if raise_missing:
    
    KeyError: "None of [Index(['At risk', 'Censored', 'Events'], dtype='object')] are in the [index]"
    
    opened by tobiasweede 5
  • survival_difference_at_fixed_point_in_time_test documentation

    Hey everyone, newbie here!

    The survival_difference_at_fixed_point_in_time_test documentation does not explain that it performs a test using the chi-squared distribution, and the example could be improved by adding an interpretation of the result.

    docs 
    opened by nasserboan 1
Releases(v0.27.4)
  • v0.27.4(Nov 17, 2022)

  • v0.27.3(Sep 25, 2022)

    0.27.3

    New features
    • Fixed and silenced a lot of warnings
    Bug fixes
    • Migrate to newer Pandas Styler for to_latex
    API Changes
    • There were way too many functions on the summary objects, so I've hidden to_* on them.
    Source code(tar.gz)
    Source code(zip)
  • v0.27.2(Sep 8, 2022)

  • v0.27.1(Jun 26, 2022)

    0.27.1 - 2022-03-15

    New features
    • all fit_ methods now accept a fit_options dict that allows one to pass kwargs to the underlying fitting algorithm.
    API Changes
    • step_size is removed from Cox models' fit. See fit_options above.
    Bug fixes
    • fixed Cox models when a "trivial" matrix was passed in (one with no covariates)
    Source code(tar.gz)
    Source code(zip)
  • v0.27.0(Mar 15, 2022)

    0.27.0 - 2022-03-15

    Dropping Python 3.6 support.

    Bug fixes
    • Fix late entry in add_at_risk_counts.
    New features
    • add_at_risk_counts has a new flag to determine whether to use start- or end-of-period at-risk counts.
    • new column in the fitters' summary that displays the value the parameter is being compared against.
    API Changes
    • plot_lifetimes's duration arg now has the interpretation "relative time the subject died (since birth)", instead of the old "time observed for". These interpretations differ when there is late entry.
    Source code(tar.gz)
    Source code(zip)
  • v0.26.4(Nov 30, 2021)

  • v0.26.3(Sep 16, 2021)

  • v0.26.2(Sep 15, 2021)

  • v0.26.1(Sep 15, 2021)

    0.26.1 - 2021-09-15

    API Changes
    • t_0 in logrank_test will no longer remove data; instead it censors all subjects that experience the event afterwards.
    • update status column in lifelines.datasets.load_lung to be more standard coding: 0 is censored, 1 is event.
    Bug fixes
    • Fix using formulas with AalenAdditiveFitter.predict_cumulative_hazard
    • Fix using formulas with CoxPHFitter.score
    Source code(tar.gz)
    Source code(zip)
  • 0.26.0(May 27, 2021)

    0.26.0 - 2021-05-26

    New features
    • .BIC_ is now present on fitted models.
    • CoxPHFitter with spline baseline can accept pre-computed knot locations.
    • Left censoring fitting in KaplanMeierFitter is now "expected". That is, predict always predicts the survival function (as does every other model), confidence_interval_ is always the CI for the survival function (as in every other model), and so on. In summary: the API for estimates doesn't change depending on how your dataset is censored.
    Bug fixes
    • Fixed an annoying bug where at-risk table labels were not aligning properly when data spanned large ranges. See merging PR for details.
    • Fixed a bug in find_best_parametric_model where the wrong BIC value was being computed.
    • Fixed regression bug when using an array as a penalizer in Cox models.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.11-2(Apr 13, 2021)

    0.25.11 - 2021-04-06

    A previous release (on Github) was missing correct metadata and was deleted.

    Bug fixes
    • Fix integer-valued categorical variables in regression model predictions.
    • numpy > 1.20 is allowed.
    • Bug fix in the elastic-net penalty for Cox models that wasn't weighting the terms correctly.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.10(Mar 3, 2021)

  • v0.25.9(Feb 5, 2021)

  • v0.25.8(Jan 22, 2021)

    0.25.8 - 2021-01-22

    Important: we dropped Patsy as our formula framework and adopted Formulaic. While the latter is less mature than Patsy, we feel the core capabilities are satisfactory and it provides new opportunities.

    New features
    • Parametric models with formulas are able to be serialized now.
    • a _scipy_callback function is available to use in fitting algorithms.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.7(Dec 9, 2020)

    0.25.7 - 2020-12-09

    API Changes
    • Adding cumulative_hazard_at_times to NelsonAalenFitter
    Bug fixes
    • Fixed error in CoxPHFitter when entry time == event time.
    • Fixed formulas in AFT interval censoring regression.
    • Fixed concordance_index_ when no events observed
    • Fixed label being overwritten in ParametricUnivariate models
    Source code(tar.gz)
    Source code(zip)
  • v0.25.6(Oct 26, 2020)

    0.25.6 - 2020-10-26

    New features
    • Parametric Cox models can now handle left and interval censored datasets.
    Bug fixes
    • "improved" the output of add_at_risk_counts by removing a call to plt.tight_layout() - this works better when you are calling add_at_risk_counts on multiple axes, but it is recommended you call plt.tight_layout() at the very end of your script.
    • Fix a bug in KaplanMeierFitter's interval censoring where max(lower bound) < min(upper bound).

    Source code(tar.gz)
    Source code(zip)
  • v0.25.5(Sep 25, 2020)

    0.25.5 - 2020-09-23

    API Changes
    • check_assumptions now returns a list of lists of axes that can be manipulated
    Bug fixes
    • fixed error when using plot_partial_effects with categorical data in AFT models
    • improved warning when Hessian matrix contains NaNs.
    • fixed performance regression in interval censoring fitting in parametric models
    • weights weren't being applied properly in the NPMLE
    Source code(tar.gz)
    Source code(zip)
  • v0.25.4(Aug 26, 2020)

    0.25.4 - 2020-08-26

    New features
    • New baseline estimator for Cox models: piecewise
    • Performance improvements for parametric models' log_likelihood_ratio_test() and print_summary()
    • Better step-size defaults for Cox model -> more robust convergence.
    Bug fixes
    • fix check_assumptions when using formulas.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.3(Aug 24, 2020)

    0.25.3 - 2020-08-24

    New features
    • survival_difference_at_fixed_point_in_time_test now accepts fitters instead of raw data, meaning that you can use this function on left, right or interval censored data.
    API Changes
    • See note on survival_difference_at_fixed_point_in_time_test above.
    Bug fixes
    • fix StatisticalResult printing in notebooks
    • fix Python error when calling plot_covariate_groups
    • fix dtype mismatches in plot_partial_effects_on_outcome.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.2(Aug 9, 2020)

    0.25.2 - 2020-08-08

    New features
    • Spline CoxPHFitter can now use strata.
    API Changes
    • a small parameterization change of the spline CoxPHFitter. The linear term in the spline part was moved to a new Intercept term in the beta_.
    • n_baseline_knots in the spline CoxPHFitter now refers to all knots, and not just interior knots (this was confusing to me, the author.). So add 2 to n_baseline_knots to recover the identical model as previously.
    Bug fixes
    • fix spline CoxPHFitter when predict_hazard was called.
    • fix some exception imports I missed.
    • fix log-likelihood p-value in splines CoxPHFitter
    Source code(tar.gz)
    Source code(zip)
  • v0.25.1(Aug 1, 2020)

    0.25.1 - 2020-08-01

    Bug fixes
    • ok actually ship the out-of-sample calibration code
    • fix labels=False in add_at_risk_counts
    • allow specific rows to be shown in add_at_risk_counts
    • put patsy as a proper dependency.
    • suppress some Pandas 1.1 warnings.
    Source code(tar.gz)
    Source code(zip)
  • v0.25.0(Jul 27, 2020)

    0.25.0 - 2020-07-27

    New features
    • Formulas! lifelines now supports R-like formulas in regression models. See docs here.
    • plot_covariate_group now can plot other y-values like hazards and cumulative hazards (default: survival function).
    • CoxPHFitter now accepts late entries via entry_col.
    • calibration.survival_probability_calibration now works with out-of-sample data.
    • print_summary now accepts a column argument to filter down the displayed values. This helps with clutter in notebooks, latex, or on the terminal.
    • add_at_risk_counts now follows the cool new KMunicate suggestions
    API Changes
    • With the introduction of formulas, all models can use formulas under the hood.
      • For both custom regression models and non-AFT regression models, this means that you no longer need to add a constant column to your DataFrame (instead add a 1 as a formula string in the regressors dict). You may also need to remove the T and E columns from regressors. I've updated the models in the examples folder with examples of this new model building.
    • Unfortunately, if using formulas, your model will not be able to be pickled. This is a problem with an upstream library, and I hope to have it resolved in the near future.
    • plot_covariate_groups has been deprecated in favour of plot_partial_effects_on_outcome.
    • The baseline in plot_covariate_groups has changed from the mean observation (including dummy-encoded categorical variables) to the median for ordinal (including continuous) variables and the mode for categorical variables.
    • Previously, lifelines used the label "_intercept" when it added a constant column in regressions. To align with Patsy, we now use "Intercept".
    • In AFT models, ancillary_df kwarg has been renamed to ancillary. This reflects the more general use of the kwarg (not always a DataFrame, but could be a boolean or string now, too).
    • Some column names in datasets shipped with lifelines have changed.
    • The never used "lifelines.metrics" is deleted.
    • With the introduction of formulas, plot_covariate_groups (now called plot_partial_effects_on_outcome) behaves differently for transformed variables. Users no longer need to add "derivatives" features, and encoding is done implicitly. See docs here.
    • all exceptions and warnings have moved to lifelines.exceptions
    Bug fixes
    • The p-value of the log-likelihood ratio test for the CoxPHFitter with splines was returning the wrong result because the degrees of freedom was incorrect.
    • better print_summary logic in IDEs and Jupyter exports; previously it would not be displayed.
    • p-values have been corrected in the SplineFitter. Previously, the "null hypothesis" was not coefficient=0 but coefficient=0.01; it is now the former.
    • fixed a NaN bug in survival_table_from_events with intervals when no events occur in an interval.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.16(Jul 9, 2020)

    0.24.16 - 2020-07-09

    New features
    • improved algorithm choice for large Dataframes for Cox models. Should see a significant performance boost.
    Bug fixes
    • fixed utils.median_survival_time not accepting Pandas Series.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.15(Jul 7, 2020)

    0.24.15 - 2020-07-07

    Bug fixes
    • fixed an edge case in KaplanMeierFitter where a really late entry would occur after the rest of the population had died.
    • fixed plot in BreslowFlemingHarringtonFitter
    • fixed bug where using conditional_after and times in CoxPHFitter("spline") prediction methods would be ignored.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.14(Jul 2, 2020)

    0.24.14 - 2020-07-02

    Bug fixes
    • fixed a bug where using conditional_after and times in prediction methods would result in a shape error
    • fixed a bug where score was not able to be used in splined CoxPHFitter
    • fixed a bug where some columns would not be displayed in print_summary
    Source code(tar.gz)
    Source code(zip)
  • v0.24.13(Jun 22, 2020)

    0.24.13 - 2020-06-22

    Bug fixes
    • fixed a bug where CoxPHFitter would ignore supplied alpha levels for confidence intervals
    • fixed a bug where CoxPHFitter would fail when working with sklearn_adapter
    Source code(tar.gz)
    Source code(zip)
  • v0.24.12(Jun 20, 2020)

  • v0.24.11(Jun 18, 2020)

    0.24.11 - 2020-06-17

    New features
    • new spline regression model CRCSplineFitter based on the paper "A flexible parametric accelerated failure time model" by Michael J. Crowther, Patrick Royston, Mark Clements.
    • new survival probability calibration tool lifelines.calibration.survival_probability_calibration to help validate regression models. Based on “Graphical calibration curves and the integrated calibration index (ICI) for survival models” by P. Austin, F. Harrell, and D. van Klaveren.
    API Changes
    • (and bug fix) scalar parameters in regression models were not being penalized by penalizer - we now penalize everything except intercept terms in linear relationships.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.10(Jun 17, 2020)

    0.24.10

    New features
    • Improvements when using the spline model in CoxPHFitter - it should offer much better prediction and baseline-hazard estimation, including extrapolation and interpolation.
    API Changes
    • Related to above: the fitted spline parameters are now available in the .summary and .print_summary methods.
    Bug fixes
    • fixed a bug in initialization of some interval-censoring models -> better convergence.
    Source code(tar.gz)
    Source code(zip)
  • v0.24.9(Jun 5, 2020)

    0.24.9 - 2020-06-05

    New features
    • Faster NPMLE for interval censored data
    • New weightings available in the logrank_test: wilcoxon, tarone-ware, peto, fleming-harrington. Thanks @sean-reed
    • new interval censored dataset: lifelines.datasets.load_mice
    Bug fixes
    • Cleared up some mislabeling in plot_loglogs. Thanks @sean-reed!
    • tuples are now able to be used as input in univariate models.
    Source code(tar.gz)
    Source code(zip)
Owner
Cameron Davidson-Pilon
CEO of Pioreactor. Former Director of Data Science @Shopify. Author of Bayesian Methods for Hackers and DataOrigami.