Code Repository for The Kaggle Book, Published by Packt Publishing

Last update: Jan 07, 2023

The Kaggle Book

Data analysis and machine learning for competitive data science

"Luca and Konrad's book helps make Kaggle even more accessible. They are both top-ranked users and well-respected members of the Kaggle community. Those who complete this book should expect to be able to engage confidently on Kaggle – and engaging confidently on Kaggle has many rewards." — Anthony Goldbloom, Kaggle Founder & CEO

Key Features

Learn how Kaggle works and how to make the most of competitions from two expert Kaggle Grandmasters
Sharpen your modeling skills with ensembling, feature engineering, adversarial validation, AutoML, transfer learning, and techniques for parameter tuning
Challenge yourself with problems involving tabular data, vision, and natural language, as well as simulation and optimization
Discover tips, tricks, and best practices for getting great results on Kaggle and becoming a better data scientist
Read interviews with 31 Kaggle Masters and Grandmasters sharing their experience and tips

Get a step ahead of your competitors with a concise collection of smart data handling and modeling techniques

Getting started

You can run these notebooks on cloud platforms such as Kaggle or Colab, or on your local machine. Note that most chapters require a GPU, and sometimes a TPU, to run in a reasonable amount of time, so we recommend one of the cloud platforms, as they come with CUDA pre-installed.
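
If you do run locally, a quick check like the one below can tell you whether an NVIDIA GPU is visible before you start a long notebook. This is only a minimal sketch and not part of the book's code: the helper name `nvidia_gpu_visible` is hypothetical, it assumes that the `nvidia-smi` tool being on your `PATH` is a good enough signal, and it does not detect TPUs or non-NVIDIA accelerators.

```python
import shutil
import subprocess


def nvidia_gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU.

    Hypothetical helper for illustration; it only covers NVIDIA GPUs.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tooling found on this machine.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"],  # -L lists the GPUs, one per line
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Each listed device line starts with "GPU".
    return result.returncode == 0 and any(
        line.startswith("GPU") for line in result.stdout.splitlines()
    )


if __name__ == "__main__":
    print("NVIDIA GPU visible:", nvidia_gpu_visible())
```

If this prints `False`, the cloud platforms described in the next section are likely the easier route.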

Running on a cloud platform

To run these notebooks on a cloud platform, just click on one of the badges (Colab or Kaggle) in the table below. The code will be copied from GitHub directly onto the chosen platform (you may have to add the necessary data before running it). Alternatively, we also provide links to the fully working original notebooks on Kaggle, which you can copy and run immediately.

no	Chapter	Notebook
05	Competition Tasks and Metrics	meta_kaggle
06	Designing Good Validation	adversarial-validation-example
07	Modeling for Tabular Competitions	interesting-eda-tsne-umap
		meta-features-and-target-encoding
		really-not-missing-at-random
		tutorial-feature-selection-with-boruta-shap
08	Hyperparameter Optimization	basic-optimization-practices
		hacking-bayesian-optimization-for-dnns
		hacking-bayesian-optimization
		kerastuner-for-imdb
		optuna-bayesian-optimization
		scikit-optimize-for-lightgbm
		tutorial-bayesian-optimization-with-lightgbm
09	Ensembling with Blending and Stacking Solutions	ensembling
10	Modeling for Computer Vision	augmentations-examples
		images-classification
		prepare-annotations
		segmentation-inference
		segmentation
		object-detection-yolov5
11	Modeling for NLP	nlp-augmentations4
		nlp-augmentation1
		qanswering
		sentiment-extraction
12	Simulation and Optimization Competitions	connectx
		mab-santa
		rps-notebook1

Book Description

Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with the rest of the community, and gain valuable experience to help grow your career.

The first book of its kind, Data Analysis and Machine Learning with Kaggle assembles the techniques and skills you’ll need for success in competitions, data science projects, and beyond. Two masters of Kaggle walk you through modeling strategies you won’t easily find elsewhere, and the tacit knowledge they’ve accumulated along the way. As well as Kaggle-specific tips, you’ll learn more general techniques for approaching tasks based on image data, tabular data, textual data, and reinforcement learning. You’ll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you.

What you will learn

Get acquainted with Kaggle and other competition platforms
Make the most of Kaggle Notebooks, Datasets, and Discussion forums
Understand different modeling tasks including binary and multi-class classification, object detection, NLP (Natural Language Processing), and time series
Design good validation schemes, learning about k-fold, probabilistic, and adversarial validation
Get to grips with evaluation metrics including MSE and its variants, precision and recall, IoU, mean average precision at k, as well as never-before-seen metrics
Handle simulation and optimization competitions on Kaggle
Create a portfolio of projects and ideas to get further in your career
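
As a small illustration of two of the metrics named in the list above, here is a minimal pure-Python sketch of MSE (with its RMSE variant) and IoU for axis-aligned bounding boxes. It is not taken from the book's notebooks: the function names are hypothetical, and the box format `(x_min, y_min, x_max, y_max)` is an assumption made for this example.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, a common variant of MSE."""
    return math.sqrt(mse(y_true, y_pred))


def box_iou(a, b):
    """Intersection over union of two boxes given as
    (x_min, y_min, x_max, y_max); this format is an assumption."""
    # Overlap along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(mse([1, 2, 3], [1, 2, 5]))
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

In a competition you would normally rely on the metric implementation specified by the organizers; a hand-written version like this is mainly useful for checking your understanding of the definition.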

Who This Book Is For

This book is suitable for Kaggle users and data analysts/scientists with at least a basic proficiency in data science topics and Python who are trying to do better in Kaggle competitions and secure jobs with tech giants. At the time of completion of this book, there are 96,190 Kaggle novices (users who have just registered on the website) and 67,666 Kaggle contributors (users who have just filled in their profile) enlisted in Kaggle competitions. This book has been written with all of them in mind and with anyone else wanting to break the ice and start taking part in competitions on Kaggle and learning from them.

Table of Contents

Part 1

Introducing Kaggle and Other Data Science Competitions
Organizing Data with Datasets
Working and Learning with Kaggle Notebooks
Leveraging Discussion Forums

Part 2

Competition Tasks and Metrics
Designing Good Validation
Modeling for Tabular Competitions
Hyperparameter Optimization
Ensembling with Blending and Stacking Solutions
Modeling for Computer Vision
Modeling for NLP
Simulation and Optimization Competitions

Part 3

Creating Your Portfolio of Projects and Ideas
Finding New Professional Opportunities
