A data preprocessing and feature engineering script for a machine learning pipeline is prepared.

Last update: Dec 18, 2021

FEATURE ENGINEERING

Business Problem: Prepare a data preprocessing and feature engineering script for a machine learning pipeline. After passing through this script, the dataset should be ready for modelling.

Story of the Dataset:
The dataset describes the people who were aboard the Titanic when it sank. It consists of 768 observations and 12 variables. The target variable is "Survived":

0: the person did not survive.

1: the person survived.

ATTRIBUTES:

PassengerId: ID of the passenger

Survived: Survival status (0: not survived, 1: survived)

Pclass: Ticket class (1: 1st class (upper), 2: 2nd class (middle), 3: 3rd class (lower))

Name: Name of the passenger

Sex: Gender of the passenger (male, female)

Age: Age in years

SibSp: Number of siblings/spouses aboard the Titanic
Sibling = Brother, sister, stepbrother, stepsister
Spouse = Husband, wife (mistresses and fiancés were ignored)

Parch: Number of parents/children aboard the Titanic
Parent = Mother, father
Child = Daughter, son, stepdaughter, stepson
Some children travelled only with a nanny, therefore Parch = 0 for them.

Ticket: Ticket number

Fare: Passenger fare

Cabin: Cabin number

Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
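
As an illustration of how the attributes above might be turned into model-ready features, here is a minimal sketch in plain Python. It is not the script from this repository. The function names (`extract_title`, `engineer_features`), the derived feature names (`FamilySize`, `IsAlone`, `HasCabin`, `AgeWasMissing`, `IsFemale`, `Embarked_*`), the choice of median imputation for Age, and the three sample rows are all assumptions made for the example. The sample rows are invented and are not taken from the dataset.

```python
import re
from statistics import median

# Invented sample rows that reuse the column names from the attribute list.
# None marks a missing value. These are NOT real records from the dataset.
rows = [
    {"Pclass": 3, "Name": "Doe, Mr. John", "Sex": "male", "Age": 30.0,
     "SibSp": 1, "Parch": 0, "Fare": 7.25, "Cabin": None, "Embarked": "S"},
    {"Pclass": 1, "Name": "Roe, Mrs. Jane", "Sex": "female", "Age": None,
     "SibSp": 1, "Parch": 2, "Fare": 71.0, "Cabin": "C85", "Embarked": "C"},
    {"Pclass": 2, "Name": "Poe, Miss. Ann", "Sex": "female", "Age": 20.0,
     "SibSp": 0, "Parch": 0, "Fare": 13.0, "Cabin": None, "Embarked": "Q"},
]

# Assumes names follow the "Surname, Title. Given names" pattern.
TITLE_RE = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the honorific between the comma and the first period."""
    match = TITLE_RE.search(name)
    return match.group(1).strip() if match else "Unknown"


def engineer_features(rows):
    """Derive a hypothetical feature set from raw passenger rows."""
    # Assumption: missing ages are filled with the median of the known ages.
    known_ages = [r["Age"] for r in rows if r["Age"] is not None]
    age_fill = median(known_ages) if known_ages else None

    features = []
    for r in rows:
        # Passenger + siblings/spouses + parents/children.
        family_size = r["SibSp"] + r["Parch"] + 1
        features.append({
            "Pclass": r["Pclass"],
            "IsFemale": int(r["Sex"] == "female"),
            "Age": r["Age"] if r["Age"] is not None else age_fill,
            "AgeWasMissing": int(r["Age"] is None),
            "FamilySize": family_size,
            "IsAlone": int(family_size == 1),
            "HasCabin": int(bool(r["Cabin"])),
            "Title": extract_title(r["Name"]),
            "Fare": r["Fare"],
            # One-hot encoding of the three embarkation ports.
            **{f"Embarked_{p}": int(r["Embarked"] == p) for p in ("C", "Q", "S")},
        })
    return features


features = engineer_features(rows)
for row in features:
    print(row)
```

Because Parch is 0 for children who travelled only with a nanny, a flag such as `IsAlone` can mark them as alone even though they were accompanied. That is one reason to treat derived family features with some caution.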

REFERENCE: Data Science and ML Boot Camp, 2021, Veri Bilimi Okulu (https://www.veribilimiokulu.com/)

Owner: Pinar Oner
