Statistics and Mathematics for Machine Learning, Deep Learning, Deep NLP

Overview

Stat4ML

This is the first course in our three-course series:

  1. Statistics Foundation for ML https://github.com/Bellman281/Stat4ML/

  2. Introduction to Statistical Learning https://github.com/Bellman281/Intro_Statistical_Learning

  3. Advanced Statistical Learning for DL (to be announced)

Registration Form for cohort 2 of STAT4ML:

https://forms.gle/ZqLJLmv1K5nGVx3m7

Notes about the course:

Instructor: Omid Safarzadeh

LinkedIn: https://www.linkedin.com/in/omidsafarzadeh/

IG: @deepdatascientists

Course textbook: Statistical Inference, 2nd Edition, by George Casella and Roger L. Berger:

https://www.amazon.com/Statistical-Inference-George-Casella-dp-0534243126/dp/0534243126/ref=mt_other?_encoding=UTF8&me=&qid=

Prerequisites

Recall from Calculus:

    Derivative
          Chain rule
    Integral
          Techniques of Integration
              Substitution
              Integration by parts

Matrix Algebra Review:

    Matrix operations
    Matrix Multiplication
       Properties of determinants
       Inverse Matrix
       Matrix Transpose
       Properties of transpose
    Partitioned Matrices
    Eigenvalues and Eigenvectors
    Matrix decomposition
       LU decomposition
       Cholesky decomposition
       QR decomposition
       SVD
    Matrix Differentiation

Course 1:

Slide 1: Probability Theory Foundation

 Sample Space
 Probability Theory Foundation
    Axiomatic Foundations
    The Calculus of Probabilities
 Independence
 Conditional Probability
    Bayes' Theorem
 Random Variables
 Probability Function
    Distribution Functions
    Density function
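
As a small illustration of the conditional probability and Bayes' theorem topics listed above, the sketch below computes a posterior probability from a prior and two conditional probabilities. The function name and the numbers (1% prior, 95% true-positive rate, 5% false-positive rate) are hypothetical and are not taken from the course slides.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E)
    return likelihood * prior / evidence


# Hypothetical inputs, chosen only for illustration.
posterior = bayes_posterior(Fraction(1, 100), Fraction(95, 100), Fraction(5, 100))
print(posterior, float(posterior))
```

Exact fractions are used so the arithmetic can be checked by hand against the formula.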

Slide 2: Moments

   Moments
       Expected Value
       Variance
       Covariance and Correlation
   Moment Generating Functions
       Normal mgf
   Matrix Notation for Moments
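
The moment topics above (expected value, variance, covariance and correlation) can be sketched with sample estimates. This is a minimal standard-library version; the helper names and the toy data are assumptions made for illustration, and the sample covariance uses the n - 1 denominator.

```python
import math


def mean(xs):
    """Sample mean, the empirical counterpart of the expected value."""
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def variance(xs):
    """Sample variance, i.e. the covariance of a sample with itself."""
    return covariance(xs, xs)


def correlation(xs, ys):
    """Sample correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


# Toy data: y is an exact linear function of x, so the correlation is 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print(mean(x), variance(x), covariance(x, y), correlation(x, y))
```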

Slide 3: Distribution Functions

   Distributions
     Discrete Distribution
       Discrete Uniform Distribution
       Binomial Distribution
       Poisson Distribution
     Continuous Distribution
       Uniform Distribution
       Exponential Distribution
       Normal Distribution
       Lognormal Distribution
       Laplace Distribution
       Beta Distribution
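
Two of the discrete distributions listed above are closely related: for large n and small p, Binomial(n, p) probabilities are close to Poisson(np) probabilities. The sketch below compares the two probability mass functions; the function names and the parameter values (n = 1000, p = 0.003) are illustrative assumptions.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Illustrative parameters: many trials, small success probability.
n, p = 1000, 0.003
for k in range(6):
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, n * p))
```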

Slide 4: Conditional and Multivariate Distributions

Joint and Marginal Distribution
Conditional Distributions and Independence
Bivariate Transformations
Hierarchical Models and Mixture Distribution
Bivariate Normal Distribution
Multivariate Distribution

Slide 5: Convergence Concepts

Random Samples
   Sums of Random Variables from a Random Sample
Inequalities
Convergence Concepts:
   Almost Sure Convergence
   Convergence in Probability
   Convergence in Distribution
The Delta Method
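
Convergence in probability can be explored by simulation. The sketch below estimates how often the mean of n Uniform(0, 1) draws falls more than eps away from 0.5; this frequency should shrink as n grows, as the weak law of large numbers predicts. The function names, the tolerance eps, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random


def sample_mean_uniform(n, rng):
    """Mean of n independent Uniform(0, 1) draws."""
    return sum(rng.random() for _ in range(n)) / n


def deviation_frequency(n, eps=0.05, trials=500, seed=0):
    """Estimate P(|mean_n - 0.5| > eps) by repeated simulation."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    hits = sum(abs(sample_mean_uniform(n, rng) - 0.5) > eps for _ in range(trials))
    return hits / trials


for n in (10, 100, 1000):
    print(n, deviation_frequency(n))
```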

Slide 6: Maximum Likelihood Estimation

Maximum Likelihood Estimation
  Motivation and the Main Ideas
  Properties of the Maximum Likelihood Estimator
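
As one concrete instance of maximum likelihood estimation, the sketch below fits the rate of an exponential distribution. For i.i.d. observations the log-likelihood is n log(rate) - rate * sum(x), which is maximized at rate = n / sum(x). The data values and function names are hypothetical, and the grid search is only a sanity check on the closed form.

```python
import math


def exp_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. Exponential(rate) observations."""
    return len(data) * math.log(rate) - rate * sum(data)


def exp_mle(data):
    """Closed-form maximizer: n / sum(x), the reciprocal of the sample mean."""
    return len(data) / sum(data)


data = [0.5, 1.2, 0.3, 2.0, 1.0]  # hypothetical observations
rate_hat = exp_mle(data)

# Sanity check: no rate on a coarse grid beats the closed-form estimate.
grid = [0.01 * k for k in range(1, 501)]
best_on_grid = max(grid, key=lambda r: exp_log_likelihood(r, data))
print(rate_hat, best_on_grid)
```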

Slide 7: Bayesian and Posterior Distribution Estimation

   Computing the posterior
   Maximum likelihood estimation (MLE)
   Maximum a posteriori (MAP) estimation
   Posterior mean
   MAP properties
   Bayesian linear regression
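
The difference between the MLE, the MAP estimate, and the posterior mean is easiest to see in a conjugate model. The sketch below assumes Bernoulli observations with a Beta(a, b) prior on the success probability, so the posterior is Beta(a + successes, b + failures). The prior parameters, the counts, and the function name are hypothetical choices for illustration.

```python
def beta_bernoulli_summary(successes, failures, a=2.0, b=2.0):
    """Compare MLE, MAP, and posterior mean under a Beta(a, b) prior."""
    n = successes + failures
    # Conjugacy: the posterior is Beta(post_a, post_b).
    post_a, post_b = a + successes, b + failures
    return {
        "mle": successes / n,
        # Mode of the Beta posterior; this formula needs post_a, post_b > 1.
        "map": (post_a - 1) / (post_a + post_b - 2),
        "posterior_mean": post_a / (post_a + post_b),
    }


summary = beta_bernoulli_summary(successes=7, failures=3)
print(summary)
```

With these inputs the prior pulls both Bayesian estimates toward 0.5 relative to the MLE, and the pull weakens as the number of observations grows.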