A comprehensive repository containing 30+ notebooks on learning machine learning!

Overview

A Complete Machine Learning Package


Techniques, tools, best practices, and everything you need to learn machine learning!


This is a comprehensive repository containing 30+ notebooks on Python programming, data manipulation, data analysis, data visualization, data cleaning, classical machine learning, Computer Vision, and Natural Language Processing (NLP).

All notebooks were created with the reader in mind. Every notebook starts with a high-level overview of the algorithm or concepts being covered. Wherever possible, visuals are used to make things clear.

Viewing and Running the Notebooks

The easiest way to view all the notebooks is to use Nbviewer.

  • Render nbviewer

If you want to experiment with the code, you can use the following platforms:

  • Open In Colab

  • Launch in Deepnote

Deepnote will direct you to Intro to Machine Learning. Head to the project sidebar for more notebooks.

Tools Overview

The following tools are covered in the notebooks. They are popular tools that machine learning engineers and data scientists rely on in their day-to-day work.

  • Python is a high-level programming language that has become very popular in the data community. With the rapid growth of its libraries and frameworks, it is a good language for doing ML.

  • NumPy is a scientific computing tool used for array or matrix operations.

  • Pandas is a great and simple tool for analyzing and manipulating data from a variety of different sources.

  • Matplotlib is a comprehensive data visualization tool used to create static, animated, and interactive visualizations in Python.

  • Seaborn is another data visualization tool, built on top of Matplotlib, that is pretty simple to use.

  • Scikit-Learn: Instead of building machine learning models from scratch, Scikit-Learn makes it easy to use classical models in a few lines of code. This tool has been adopted by almost the whole ML community and industry, from startups to big tech companies.

  • TensorFlow and Keras for neural networks: TensorFlow is a popular deep learning framework used for building models suitable for different fields such as Computer Vision and Natural Language Processing. Keras is its high-level API for building neural networks easily. TensorFlow has gained a lot of popularity in the ML community due to its complete ecosystem of tools, including TensorBoard, TF Datasets, TensorFlow Lite, TensorFlow Extended, and TensorFlow.js.
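To give a feel for the "few lines of code" claim about Scikit-Learn, here is a minimal sketch. It is not taken from the notebooks: the Iris dataset, the logistic regression model, and the split settings are all assumptions chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset (an illustrative choice, not from the notebooks).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train a classical model and measure accuracy on the held-out split.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in another classical model is usually a one-line change, because Scikit-Learn estimators share the same `fit`, `predict`, and `score` methods.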

Outline

Part 1 - Intro to Python and Working with Data

0 - Intro to Python for Machine Learning

1 - Data Computation With NumPy

  • Creating a NumPy Array
  • Selecting Data: Indexing and Slicing An Array
  • Performing Mathematical and other Basic Operations
  • Performing Basic Statistics
  • Manipulating Data
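As a rough preview of the NumPy topics listed above, the sketch below touches each one in turn. The array values and variable names are made up for illustration and do not come from the notebook.

```python
import numpy as np

# Creating a NumPy array: 3 rows and 4 columns holding the values 0..11.
a = np.arange(12).reshape(3, 4)

# Selecting data: indexing picks a row, slicing picks a sub-block.
first_row = a[0]        # the first row
sub_block = a[1:, :2]   # the last two rows, the first two columns

# Mathematical operations are applied element by element.
doubled = a * 2

# Basic statistics, over the whole array or along one axis.
total = a.sum()
col_means = a.mean(axis=0)  # one mean per column

# Manipulating data: change the shape without changing the values.
flat = a.reshape(-1)
transposed = a.T

print(first_row, sub_block, col_means, sep="\n")
```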

2 - Data Manipulation with Pandas

  • Basics of Pandas
    • Series and DataFrames
    • Data Indexing and Selection
    • Dealing with Missing data
    • Basic operations and Functions
    • Aggregation Methods
    • Groupby
    • Merging, Joining and Concatenating
  • Beyond Dataframes: Working with CSV and Excel
  • Real World Exploratory Data Analysis (EDA)
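The Pandas basics listed above can be sketched in a few lines. The small sales table below is invented for illustration (the column names and values are assumptions, not data from the notebooks).

```python
import pandas as pd

# A tiny DataFrame with one missing value (hypothetical data).
df = pd.DataFrame({
    "city": ["Kigali", "Kigali", "Nairobi", "Nairobi"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [12.0, None, 6.0, 9.0],
})

# Dealing with missing data: fill the gap with the column mean.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Data indexing and selection: the sales figures for 2021 only.
sales_2021 = df.loc[df["year"] == 2021, "sales"]

# Groupby and aggregation: total sales per city.
totals = df.groupby("city")["sales"].sum()

# Merging: attach a country to each row from a second table.
regions = pd.DataFrame({
    "city": ["Kigali", "Nairobi"],
    "country": ["Rwanda", "Kenya"],
})
merged = df.merge(regions, on="city", how="left")

print(totals)
print(merged)
```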

3 - Data Visualization with Matplotlib and Seaborn

4 - Real World Data - Exploratory Analysis and Data Preparation

Part 2 - Machine Learning

5 - Intro to Machine Learning

  • Intro to Machine Learning
  • Machine Learning Workflow
  • Evaluation Metrics
  • Handling Underfitting and Overfitting
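To make the evaluation metrics topic concrete, here is a small sketch that computes precision, recall, and accuracy for a binary classifier from scratch. The function names and the label lists are hypothetical, and the notebook itself may well use library functions instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy); empty denominators give 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    return precision, recall, accuracy


# Made-up labels: 1 is the positive class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
print(precision, recall, accuracy)
```

Comparing such metrics on the training data and on held-out data is one common way to spot overfitting (training scores far above held-out scores) or underfitting (both scores low).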

6 - Classical Machine Learning with Scikit-Learn

Part 3 - Deep Learning

7 - Intro to Artificial Neural Networks and TensorFlow

8 - Deep Computer Vision with TensorFlow

9 - Natural Language Processing with TensorFlow

Used Datasets

Many of the datasets used for this repository are from the following sources:

Further Resources

The machine learning community is very vibrant. There are many fabulous learning resources, some paid and some freely available. Here is a list of courses that have received high community ratings. They are not listed in the order in which they should be taken.

Courses

  • Machine Learning by Coursera: This course is taught by Andrew Ng. It is one of the most popular machine learning courses and has been taken by over 4M people. The course focuses on the fundamentals of machine learning techniques and algorithms. It is free on Coursera.

  • Deep Learning Specialization: Also taught by Andrew Ng, the Deep Learning Specialization is a foundations-based course as well. It teaches the foundations of major deep learning architectures such as convolutional neural networks and recurrent neural networks. The full course can be audited on Coursera or watched freely on YouTube.

  • MIT Intro to Deep Learning: This course provides the foundations of deep learning in a reasonably short period of time. Each lecture is one hour or less, but the materials are still best in class. Check the course page here, and lecture videos here.

  • CS231N: Convolutional Neural Networks for Visual Recognition by Stanford: CS231N is one of the best deep learning and computer vision courses. The 2017 version was taught by Fei-Fei Li, Justin Johnson and Serena Yeung. The 2016 version was taught by Fei-Fei, Johnson and Andrej Karpathy. See 2017 lecture videos here, and other materials here.

  • Practical Deep Learning for Coders by fast.ai: This is also an intensive deep learning course covering pretty much the whole spectrum of deep learning architectures and techniques. The lecture videos and other resources such as notebooks are on the course page.

  • Full Stack Deep Learning: While the majority of machine learning courses focus on modelling, this course focuses on shipping machine learning systems. It teaches how to design machine learning projects, manage data (storage, access, processing, versioning, and labeling), and train, debug, and deploy machine learning models. See the 2021 version here and 2019 here. You can also skim through the project showcases to see the kind of outcomes learners achieve through their projects.

  • NYU Deep Learning Spring 2021: Taught at NYU by Yann LeCun and Alfredo Canziani, this course is one of the most creative courses out there. The materials are presented in an amazing way. Check the lecture videos here, and the course repo here.

  • CS224N: Natural Language Processing with Deep Learning by Stanford: If you are interested in Natural Language Processing, this is a great course to take. It is taught by Christopher Manning, one of the world-class NLP stars. See the lecture videos here.

Books

Below are some of the most awesome machine learning books.

  • The Hundred-Page Machine Learning Book: Authored by Andriy Burkov, this is one of the shortest yet most concise and well-written books that you will ever find on the internet. You can read the book for free here.

  • Machine Learning Engineering: Also authored by Andriy Burkov, this is another great machine learning book that uncovers each step of the machine learning workflow, from data collection and preparation to model serving and maintenance. The book is also free here.

  • Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Authored by Aurélien Géron, this is one of the best machine learning books. It is clearly written and full of ideas and best practices. You can get the book here, or see its repository here.

  • Deep Learning: Authored by three deep learning legends, Ian Goodfellow, Yoshua Bengio, and Aaron Courville, this is one of the great deep learning books that is freely available. You can get it here.

  • Deep Learning with Python: Authored by François Chollet, the creator of Keras, this is a very comprehensive deep learning book. You can get the book here, and the book repo here.

  • Dive into Deep Learning: This is also a great deep learning book that is freely available. The book uses both PyTorch and TensorFlow. You can read the entire book here.

  • Neural Networks and Deep Learning: This is another great online deep learning book, by Michael Nielsen. You can read the entire book here.

If you are interested in more machine learning and deep learning resources, check this, this


This repository was created by Jean de Dieu Nyandwi. You can find him on:

If you find any of this helpful, shoot him a tweet or a mention :)
