Churn prediction with PySpark

Last update: Aug 13, 2021

Related tags

Data Analysis, Churn_Prediction

Overview

Churn Prediction

Objective

The goal is to develop a machine learning model that predicts which customers will leave the company.

About Dataset

The dataset consists of 10,000 observations and 12 variables.
The independent variables describe the customers.
The dependent variable indicates whether the customer has left the company (churned).

Variables

Surname – Customer surname
CreditScore – Customer's credit score
Geography – Country where the customer is located
Gender – Customer's gender
Age – Customer's age
Tenure – Number of years the customer has been with the bank
NumOfProducts – Number of bank products the customer uses
HasCrCard – Credit card status (0=No, 1=Yes)
IsActiveMember – Active membership status (0=No, 1=Yes)
EstimatedSalary – Customer's estimated salary
Exited – Whether the customer has left (0=No, 1=Yes)
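
To illustrate how the variables above might feed a PySpark model, here is a minimal sketch. It is not the repository's actual code. Several things are assumptions made for illustration: the choice of logistic regression, treating Geography and Gender as categorical, dropping Surname as an identifier, the 80/20 split, and the helper names (encoded_name, feature_columns, build_pipeline, train_and_evaluate) together with the csv_path argument. The source does not say which algorithm or preprocessing was used.

```python
# Column roles, taken from the variable list in this README.
# Surname is left out on the assumption that it is an identifier, not a predictor.
LABEL_COL = "Exited"
CATEGORICAL_COLS = ["Geography", "Gender"]
NUMERIC_COLS = [
    "CreditScore", "Age", "Tenure", "NumOfProducts",
    "HasCrCard", "IsActiveMember", "EstimatedSalary",
]


def encoded_name(col):
    """Name of the one-hot encoded column derived from a categorical column."""
    return f"{col}_vec"


def feature_columns():
    """Columns assembled into the feature vector: numeric plus encoded categorical."""
    return NUMERIC_COLS + [encoded_name(c) for c in CATEGORICAL_COLS]


def build_pipeline():
    """Build an (unfitted) Spark ML pipeline. Requires an active SparkSession."""
    # Imported here so the column definitions above can be inspected without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler

    # Map each string category to a numeric index, then one-hot encode it.
    indexers = [
        StringIndexer(inputCol=c, outputCol=f"{c}_idx", handleInvalid="keep")
        for c in CATEGORICAL_COLS
    ]
    encoder = OneHotEncoder(
        inputCols=[f"{c}_idx" for c in CATEGORICAL_COLS],
        outputCols=[encoded_name(c) for c in CATEGORICAL_COLS],
    )
    assembler = VectorAssembler(inputCols=feature_columns(), outputCol="features")
    # Logistic regression is an illustrative choice, not the author's stated model.
    classifier = LogisticRegression(featuresCol="features", labelCol=LABEL_COL)
    return Pipeline(stages=indexers + [encoder, assembler, classifier])


def train_and_evaluate(csv_path):
    """Fit the pipeline on a CSV file (hypothetical path) and return the test AUC."""
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("churn-sketch").getOrCreate()
    df = spark.read.csv(csv_path, header=True, inferSchema=True)
    train, test = df.randomSplit([0.8, 0.2], seed=42)

    model = build_pipeline().fit(train)
    predictions = model.transform(test)
    evaluator = BinaryClassificationEvaluator(
        labelCol=LABEL_COL, metricName="areaUnderROC"
    )
    return evaluator.evaluate(predictions)
```

Calling train_and_evaluate with the path to a CSV file whose header matches the column names above would return the area under the ROC curve on the held-out 20%. Because churn datasets are often imbalanced, AUC is used here rather than plain accuracy; that too is a design choice of this sketch.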
