Code Repository for The Kaggle Book, Published by Packt Publishing

Overview

The Kaggle Book

Data analysis and machine learning for competitive data science

"Luca and Konradˈs book helps make Kaggle even more accessible. They are both top-ranked users and well-respected members of the Kaggle community. Those who complete this book should expect to be able to engage confidently on Kaggle – and engaging confidently on Kaggle has many rewards." — Anthony Goldbloom, Kaggle Founder & CEO

Key Features

  • Learn how Kaggle works and how to make the most of competitions from two expert Kaggle Grandmasters
  • Sharpen your modeling skills with ensembling, feature engineering, adversarial validation, AutoML, transfer learning, and techniques for parameter tuning (see the small blending sketch below)
  • Challenge yourself with problems involving tabular data, vision, and natural language, as well as simulation and optimization
  • Discover tips, tricks, and best practices for getting great results on Kaggle and becoming a better data scientist
  • Read interviews with 31 Kaggle Masters and Grandmasters sharing their experiences and tips

Get a step ahead of your competitors with a concise collection of smart data handling and modeling techniques
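
As a taste of the ensembling topics covered in the book, the sketch below blends two scikit-learn classifiers by simply averaging their predicted probabilities. It is a minimal, hypothetical illustration on synthetic data, not code taken from the book's notebooks.

```python
# Minimal blending sketch on synthetic data (hypothetical illustration,
# not taken from the book's notebooks).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)]
probas = []
for model in models:
    model.fit(X_train, y_train)
    probas.append(model.predict_proba(X_valid)[:, 1])

# Unweighted average of the two models' probabilities: the simplest blend.
blend = sum(probas) / len(probas)
print("Blended validation AUC:", roc_auc_score(y_valid, blend))
```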

Getting started

You can run these notebooks on cloud platforms like Kaggle or Google Colab, or on your local machine. Note that most chapters require a GPU (and occasionally a TPU) to run in a reasonable amount of time, so we recommend one of the cloud platforms, as they come pre-installed with CUDA.
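
If you do run locally, a quick way to confirm that an accelerator is visible to your environment is the minimal check below. It assumes PyTorch and/or TensorFlow are already installed, which this repository does not guarantee.

```python
# Minimal check that a GPU is visible to the local environment.
# Assumes PyTorch and/or TensorFlow are installed (not provided by this repo).
try:
    import torch
    print("PyTorch sees CUDA:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed")

try:
    import tensorflow as tf
    print("TensorFlow sees GPUs:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed")
```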

Running on a cloud platform

To run these notebooks on a cloud platform, just click on one of the badges (Colab or Kaggle) in the table below. The code will be loaded from GitHub directly onto the chosen platform (you may have to add the necessary data before running it). Alternatively, we also provide links to the fully working original notebooks on Kaggle that you can copy and immediately run.

No. | Chapter | Notebook | Colab | Kaggle
05 | Competition Tasks and Metrics | meta_kaggle | Open In Colab | Kaggle
06 | Designing Good Validation | adversarial-validation-example | Open In Colab | Kaggle
07 | Modeling for Tabular Competitions | interesting-eda-tsne-umap | Open In Colab | Kaggle
   |  | meta-features-and-target-encoding | Open In Colab | Kaggle
   |  | really-not-missing-at-random | Open In Colab | Kaggle
   |  | tutorial-feature-selection-with-boruta-shap | Open In Colab | Kaggle
08 | Hyperparameter Optimization | basic-optimization-practices | Open In Colab | Kaggle
   |  | hacking-bayesian-optimization-for-dnns | Open In Colab | Kaggle
   |  | hacking-bayesian-optimization | Open In Colab | Kaggle
   |  | kerastuner-for-imdb | Open In Colab | Kaggle
   |  | optuna-bayesian-optimization | Open In Colab | Kaggle
   |  | scikit-optimize-for-lightgbm | Open In Colab | Kaggle
   |  | tutorial-bayesian-optimization-with-lightgbm | Open In Colab | Kaggle
09 | Ensembling with Blending and Stacking Solutions | ensembling | Open In Colab | Kaggle
10 | Modeling for Computer Vision | augmentations-examples | Open In Colab | Kaggle
   |  | images-classification | Open In Colab | Kaggle
   |  | prepare-annotations | Open In Colab | Kaggle
   |  | segmentation-inference | Open In Colab | Kaggle
   |  | segmentation | Open In Colab | Kaggle
   |  | object-detection-yolov5 | Open In Colab | Kaggle
11 | Modeling for NLP | nlp-augmentations4 | Open In Colab | Kaggle
   |  | nlp-augmentation1 | Open In Colab | Kaggle
   |  | qanswering | Open In Colab | Kaggle
   |  | sentiment-extraction | Open In Colab | Kaggle
12 | Simulation and Optimization Competitions | connectx | Open In Colab | Kaggle
   |  | mab-santa | Open In Colab | Kaggle
   |  | rps-notebook1 | Open In Colab | Kaggle

Book Description

Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with the rest of the community, and gain valuable experience to help grow your career.

The first book of its kind, The Kaggle Book assembles the techniques and skills you’ll need for success in competitions, data science projects, and beyond. Two masters of Kaggle walk you through modeling strategies you won’t easily find elsewhere, and the tacit knowledge they’ve accumulated along the way. As well as Kaggle-specific tips, you’ll learn more general techniques for approaching tasks based on image data, tabular data, textual data, and reinforcement learning. You’ll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.

What you will learn

  • Get acquainted with Kaggle and other competition platforms
  • Make the most of Kaggle Notebooks, Datasets, and Discussion forums
  • Understand different modeling tasks including binary and multi-class classification, object detection, NLP (Natural Language Processing), and time series
  • Design good validation schemes, learning about k-fold, probabilistic, and adversarial validation (see the short sketch after this list)
  • Get to grips with evaluation metrics including MSE and its variants, precision and recall, IoU, mean average precision at k, as well as never-before-seen metrics
  • Handle simulation and optimization competitions on Kaggle
  • Create a portfolio of projects and ideas to get further in your career
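
Adversarial validation, mentioned above, tests how different the training and test sets are by training a classifier to tell them apart: an AUC near 0.5 means the two sets look alike, while a high AUC signals a distribution shift worth investigating. The following is a minimal sketch on synthetic data, assuming scikit-learn is available; it is a hypothetical illustration rather than the book's adversarial-validation-example notebook.

```python
# Minimal adversarial-validation sketch on synthetic data (hypothetical
# illustration, not the book's adversarial-validation-example notebook).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
train = rng.normal(loc=0.0, size=(1000, 10))  # stand-in for training features
test = rng.normal(loc=0.3, size=(1000, 10))   # slightly shifted "test" features

X = np.vstack([train, test])
y = np.concatenate([np.zeros(len(train)), np.ones(len(test))])  # 0 = train, 1 = test

# If the classifier cannot separate the sets (AUC ~ 0.5), train and test are
# similar; a high AUC warns that the validation scheme may be misleading.
auc = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                      cv=5, scoring="roc_auc").mean()
print("Adversarial validation AUC:", round(auc, 3))
```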

Who This Book Is For

This book is suitable for Kaggle users and data analysts/scientists with at least a basic proficiency in data science topics and Python who want to do better in Kaggle competitions and secure jobs with tech giants. At the time this book was completed, there were 96,190 Kaggle novices (users who have just registered on the website) and 67,666 Kaggle contributors (users who have just filled in their profile) enlisted in Kaggle competitions. This book was written with all of them in mind, as well as anyone else wanting to break the ice, start taking part in competitions on Kaggle, and learn from them.

Table of Contents

Part 1

  1. Introducing Kaggle and Other Data Science Competitions
  2. Organizing Data with Datasets
  3. Working and Learning with Kaggle Notebooks
  4. Leveraging Discussion Forums

Part 2

  5. Competition Tasks and Metrics
  6. Designing Good Validation
  7. Modeling for Tabular Competitions
  8. Hyperparameter Optimization
  9. Ensembling with Blending and Stacking Solutions
  10. Modeling for Computer Vision
  11. Modeling for NLP
  12. Simulation and Optimization Competitions

Part 3

  13. Creating Your Portfolio of Projects and Ideas
  14. Finding New Professional Opportunities