A Collection of Cheatsheets, Books, Questions, and Portfolio For DS/ML Interview Prep

Overview

Here are the sections:

Data Science Cheatsheets

This section contains cheatsheets on basic data science concepts that commonly come up in interviews:

Data Science EBooks

This section contains books that I have read about data science and machine learning:

Data Science Question Bank

This section contains sample questions that were asked in actual data science interviews:

Data Science Case Studies

This section contains case study questions that concern designing machine learning systems to solve practical problems.

Data Science Portfolio

This section contains a portfolio of data science projects that I completed for academic, self-learning, and hobby purposes.

For a more visually pleasant way to browse the portfolio, check out jameskle.com/data-portfolio.

  • Recommendation Systems

    • Transfer Rec: My ongoing research work at the intersection of deep learning and recommendation systems.

    • Movie Recommendation: Designed 4 different models that recommend items on the MovieLens dataset.

    Tools: PyTorch, TensorBoard, Keras, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-Learn, Surprise, Wordcloud

  • Machine Learning

    • Trip Optimizer: Used XGBoost and evolutionary algorithms to optimize the travel time for taxi vehicles in New York City.

    • Instacart Market Basket Analysis: Tackled the Instacart Market Basket Analysis challenge to predict which products will be in a user's next order.

    Tools: Pandas, NumPy, Matplotlib, XGBoost, Geopy, Scikit-Learn

  • Computer Vision

    • Fashion Recommendation: Built a ResNet-based model that classifies and recommends fashion images in the DeepFashion database based on semantic similarity.

    • Fashion Classification: Developed 4 different Convolutional Neural Networks that classify images in the Fashion MNIST dataset.

    • Dog Breed Classification: Designed a Convolutional Neural Network that identifies dog breeds.

    • Road Segmentation: Implemented a Fully Convolutional Network for the semantic segmentation task on the KITTI Road dataset.

    Tools: TensorFlow, Keras, Pandas, NumPy, Matplotlib, Scikit-Learn, TensorBoard

  • Natural Language Processing

  • Data Analysis and Visualization

    • World Cup 2018 Team Analysis: Analysis and visualization of the FIFA 18 dataset to predict the best possible international squad lineups for 10 teams at the 2018 World Cup in Russia.

    • Spotify Artists Analysis: Analysis and visualization of musical styles from 50 different artists with a wide range of genres on Spotify.

    Tools: Pandas, NumPy, Matplotlib, Rspotify, httr, dplyr, tidyr, radarchart, ggplot2

Data Journalism Portfolio

This section contains a portfolio of data journalism articles that I wrote for freelance clients and for self-learning purposes.

For a more visually pleasant way to browse the portfolio, check out jameskle.com/data-journalism.

Downloadable Cheatsheets

These PDF cheatsheets come from BecomingHuman.AI.

1 - Neural Network Basics
2 - Neural Network Graphs
3 - Machine Learning with Emojis
4 - Scikit-Learn With Python
5 - Python Basics
6 - NumPy Basics
7 - Pandas Basics
8 - Data Wrangling With Pandas (Part 1 and Part 2)
9 - SciPy Linear Algebra
10 - Matplotlib Basics
11 - Keras
12 - Big-O

Owner
James Le
Data Journalist -> Data Scientist -> Machine Learning Researcher -> Data Advocate