Code Repository for The Kaggle Book, Published by Packt Publishing

Overview

The Kaggle Book

Data analysis and machine learning for competitive data science

Code Repository for The Kaggle Book, Published by Packt Publishing

"Luca and Konradˈs book helps make Kaggle even more accessible. They are both top-ranked users and well-respected members of the Kaggle community. Those who complete this book should expect to be able to engage confidently on Kaggle – and engaging confidently on Kaggle has many rewards." — Anthony Goldbloom, Kaggle Founder & CEO

Key Features

  • Learn how Kaggle works and how to make the most of competitions from two expert Kaggle Grandmasters
  • Sharpen your modeling skills with ensembling, feature engineering, adversarial validation, AutoML, transfer learning, and techniques for parameter tuning
  • Challenge yourself with problems regarding tabular data, vision, natural language as well as simulation and optimization
  • Discover tips, tricks, and best practices for getting great results on Kaggle and becoming a better data scientist
  • Read interviews with 31 Kaggle Masters and Grandmasters telling about their experience and tips

Get a step ahead of your competitors with a concise collection of smart data handling and modeling techniques

Getting started

You can run these notebooks on cloud platforms like Kaggle Colab or your local machine. Note that most chapters require a GPU even TPU sometimes to run in a reasonable amount of time, so we recommend one of the cloud platforms as they come pre-installed with CUDA.

Running on a cloud platform

To run these notebooks on a cloud platform, just click on one of the badges (Colab or Kaggle) in the table below. The code will be reproduced from Github directly onto the choosen platform (you may have to add the necessary data before running it). Alternatively, we also provide links to the fully working original notebook on Kaggle that you can copy and immediately run.

no Chapter Notebook Colab Kaggle
05 Competition Tasks and Metrics meta_kaggle Open In Colab Kaggle
06 Designing Good Validation adversarial-validation-example Open In Colab Kaggle
07 Modeling for Tabular Competitions interesting-eda-tsne-umap Open In Colab Kaggle
meta-features-and-target-encoding Open In Colab Kaggle
really-not-missing-at-random Open In Colab Kaggle
tutorial-feature-selection-with-boruta-shap Open In Colab Kaggle
08 Hyperparameter Optimization basic-optimization-practices Open In Colab Kaggle
hacking-bayesian-optimization-for-dnns Open In Colab Kaggle
hacking-bayesian-optimization Open In Colab Kaggle
kerastuner-for-imdb Open In Colab Kaggle
optuna-bayesian-optimization Open In Colab Kaggle
scikit-optimize-for-lightgbm Open In Colab Kaggle
tutorial-bayesian-optimization-with-lightgbm Open In Colab Kaggle
09 Ensembling with Blending and Stacking Solutions ensembling Open In Colab Kaggle
10 Modeling for Computer Vision augmentations-examples Open In Colab Kaggle
images-classification Open In Colab Kaggle
prepare-annotations Open In Colab Kaggle
segmentation-inference Open In Colab Kaggle
segmentation Open In Colab Kaggle
object-detection-yolov5 Open In Colab Kaggle
11 Modeling for NLP nlp-augmentations4 Open In Colab Kaggle
nlp-augmentation1 Open In Colab Kaggle
qanswering Open In Colab Kaggle
sentiment-extraction Open In Colab Kaggle
12 Simulation and Optimization Competitions connectx Open In Colab Kaggle
mab-santa Open In Colab Kaggle
rps-notebook1 Open In Colab Kaggle

Book Description

Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with the rest of the community, and gain valuable experience to help grow your career.

The first book of its kind, Data Analysis and Machine Learning with Kaggle assembles the techniques and skills you’ll need for success in competitions, data science projects, and beyond. Two masters of Kaggle walk you through modeling strategies you won’t easily find elsewhere, and the tacit knowledge they’ve accumulated along the way. As well as Kaggle-specific tips, you’ll learn more general techniques for approaching tasks based on image data, tabular data, textual data, and reinforcement learning. You’ll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.

What you will learn

  • Get acquainted with Kaggle and other competition platforms
  • Make the most of Kaggle Notebooks, Datasets, and Discussion forums
  • Understand different modeling tasks including binary and multi-class classification, object detection, NLP (Natural Language Processing), and time series
  • Design good validation schemes, learning about k-fold, probabilistic, and adversarial validation
  • Get to grips with evaluation metrics including MSE and its variants, precision and recall, IoU, mean average precision at k, as well as never-before-seen metrics
  • Handle simulation and optimization competitions on Kaggle
  • Create a portfolio of projects and ideas to get further in your career

Who This Book Is For

This book is suitable for Kaggle users and data analysts/scientists with at least a basic proficiency in data science topics and Python who are trying to do better in Kaggle competitions and secure jobs with tech giants. At the time of completion of this book, there are 96,190 Kaggle novices (users who have just registered on the website) and 67,666 Kaggle contributors (users who have just filled in their profile) enlisted in Kaggle competitions. This book has been written with all of them in mind and with anyone else wanting to break the ice and start taking part in competitions on Kaggle and learning from them.

Table of Contents

Part 1

  1. Introducing Kaggle and Other Data Science Competitions
  2. Organizing Data with Datasets
  3. Working and Learning with Kaggle Notebooks
  4. Leveraging Discussion Forums

Part 2

  1. Competition Tasks and Metrics
  2. Designing Good Validation
  3. Modeling for Tabular Competitions
  4. Hyperparameter Optimization
  5. Ensembling with Blending and Stacking Solutions
  6. Modeling for Computer Vision
  7. Modeling for NLP
  8. Simulation and Optimization Competitions

Part 3

  1. Creating Your Portfolio of Projects and Ideas
  2. Finding New Professional Opportunities
Owner
Packt
Providing books, eBooks, video tutorials, and articles for IT developers, administrators, and users.
Packt
Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models

Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models Abstract Many applications of generative models rely on the marginali

Stanford Intelligent Systems Laboratory 9 Jun 06, 2022
A PyTorch implementation of "Semi-Supervised Graph Classification: A Hierarchical Graph Perspective" (WWW 2019)

SEAL ⠀⠀⠀ A PyTorch implementation of Semi-Supervised Graph Classification: A Hierarchical Graph Perspective (WWW 2019) Abstract Node classification an

Benedek Rozemberczki 202 Dec 27, 2022
The official implementation of NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021]. https://arxiv.org/pdf/2101.12378.pdf

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation [ICLR-2021] Release Notes The offical PyTorch implementation of NeMo, p

Angtian Wang 76 Nov 23, 2022
Blind visual quality assessment on 360° Video based on progressive learning

Blind visual quality assessment on omnidirectional or 360 video (ProVQA) Blind VQA for 360° Video via Progressively Learning from Pixels, Frames and V

5 Jan 06, 2023
ImageNet Adversarial Image Evaluation

ImageNet Adversarial Image Evaluation This repository contains the code and some materials used in the experimental work presented in the following pa

Utku Ozbulak 11 Dec 26, 2022
TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning

TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning Authors: Yixuan Su, Fangyu Liu, Zaiqiao Meng, Lei Shu, Ehsan Shareghi, and Nig

Yixuan Su 79 Nov 04, 2022
Circuit Training: An open-source framework for generating chip floor plans with distributed deep reinforcement learning

Circuit Training: An open-source framework for generating chip floor plans with distributed deep reinforcement learning. Circuit Training is an open-s

Google Research 479 Dec 25, 2022
Redash reset for python

redash-reset This will use a default REDASH_SECRET_KEY key of c292a0a3aa32397cdb050e233733900f this allows you to reset the password of the user ID bu

Robert Wiggins 5 Nov 14, 2022
Pretty Tensor - Fluent Neural Networks in TensorFlow

Pretty Tensor provides a high level builder API for TensorFlow. It provides thin wrappers on Tensors so that you can easily build multi-layer neural networks.

Google 1.2k Dec 29, 2022
Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning

Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning This repository is official Tensorflow implementation of paper: Ensemb

Seunghyun Lee 12 Oct 18, 2022
Official Code for AdvRush: Searching for Adversarially Robust Neural Architectures (ICCV '21)

AdvRush Official Code for AdvRush: Searching for Adversarially Robust Neural Architectures (ICCV '21) Environmental Set-up Python == 3.6.12, PyTorch =

11 Dec 10, 2022
AI grand challenge 2020 Repo (Speech Recognition Track)

KorBERT를 활용한 한국어 텍스트 기반 위협 상황인지(2020 인공지능 그랜드 챌린지) 본 프로젝트는 ETRI에서 제공된 한국어 korBERT 모델을 활용하여 폭력 기반 한국어 텍스트를 분류하는 다양한 분류 모델들을 제공합니다. 본 개발자들이 참여한 2020 인공지

Young-Seok Choi 23 Jan 25, 2022
The final project of "Applying AI to 2D Medical Imaging Data" of "AI for Healthcare" nanodegree - Udacity.

Pneumonia Detection from X-Rays Project Overview In this project, you will apply the skills that you have acquired in this 2D medical imaging course t

Omar Laham 1 Jan 14, 2022
Code for "Learning to Regrasp by Learning to Place"

Learning2Regrasp Learning to Regrasp by Learning to Place, CoRL 2021. Introduction We propose a point-cloud-based system for robots to predict a seque

Shuo Cheng (成硕) 18 Aug 27, 2022
Recurrent Neural Network Tutorial, Part 2 - Implementing a RNN in Python and Theano

Please read the blog post that goes with this code! Jupyter Notebook Setup System Requirements: Python, pip (Optional) virtualenv To start the Jupyter

Denny Britz 863 Dec 15, 2022
DGL-TreeSearch and the Gurobi-MWIS interface

Independent Set Benchmarking Suite This repository contains the code for our maximum independent set benchmarking suite as well as our implementations

Maximilian Böther 19 Nov 22, 2022
ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.

ManimML ManimML is a project focused on providing animations and visualizations of common machine learning concepts with the Manim Community Library.

259 Jan 04, 2023
💊 A 3D Generative Model for Structure-Based Drug Design (NeurIPS 2021)

A 3D Generative Model for Structure-Based Drug Design Coming soon... Citation @inproceedings{luo2021sbdd, title={A 3D Generative Model for Structu

Shitong Luo 118 Jan 05, 2023
Example-custom-ml-block-keras - Custom Keras ML block example for Edge Impulse

Custom Keras ML block example for Edge Impulse This repository is an example on

Edge Impulse 8 Nov 02, 2022