This is my implementation on the K-nearest neighbors algorithm from scratch using Python

Overview

K Nearest Neighbors (KNN) algorithm

In this Machine Learning world, there are various algorithms designed for classification problems such as Logistic Regression, Decision Tree, Random Forest, Gradient Boosting over Decision Trees, etc. and KNN is not exceptional. When approaching to a classification problem, off the top of my head I will come up with the algorithm (KNN) first because it may be the most basic and simple algorithm to implement. In this project, I'm going to build a KNN model to do tasks on classifying flower types in the dataset Iris.

About data

The dataset I'm going to use is a flower dataset called Iris. This is a bit of information about this dataset that you should care about:

  • The data set contains 3 classes (Iris Setosa, Iris Versicolor, Iris Virginica) of 50 instances each, where each class refers to a type of iris plant.
  • There are 4 attributes in total, which are sepal length in cm, sepal width in cm, petal length in cm, petal width in cm.
  • Classes: Iris setosa, Iris versicolor, Iris virginica

Iris species

Environment

MacOS Monterey 12.1, Anaconda Virtual Environment, Python 3.9.7 64-bit

Requirements

  1. Numpy
  2. Pandas
  3. Scikit-learn

KNN procedures

  1. Load the data

  2. Initialise the value of K

  3. For getting the predicted class, iterate from 1 to the total number of training data points

    3.1 Calculate the distance between a test sample and each row of training data. Here we will use L2 norm (Euclidean distance) as our distance metric since it’s the most popular method

    3.2 Sort the calculated distances in ascending order based on distance values

    3.3 Get top K nearest neighbors from the sorted array

    3.4 Get a class with the maximum number of votes

    3.5 Return the predicted class

How to run this program

In your project folder, open a terminal and run python knn.py

Result

An example result

Reference:

For more information about the KNN algorithm, please follow this link:

Owner
sonny1902
Every cloud has a silver lining
sonny1902
ETNA is an easy-to-use time series forecasting framework.

ETNA is an easy-to-use time series forecasting framework. It includes built in toolkits for time series preprocessing, feature generation, a variety of predictive models with unified interface - from

Tinkoff.AI 674 Jan 07, 2023
A simple and lightweight genetic algorithm for optimization of any machine learning model

geneticml This package contains a simple and lightweight genetic algorithm for optimization of any machine learning model. Installation Use pip to ins

Allan Barcelos 8 Aug 10, 2022
Predict the output which should give a fair idea about the chances of admission for a student for a particular university

Predict the output which should give a fair idea about the chances of admission for a student for a particular university.

ArvindSandhu 1 Jan 11, 2022
Machine Learning University: Accelerated Natural Language Processing Class

Machine Learning University: Accelerated Natural Language Processing Class This repository contains slides, notebooks and datasets for the Machine Lea

AWS Samples 2k Jan 01, 2023
Machine learning that just works, for effortless production applications

Machine learning that just works, for effortless production applications

Elisha Yadgaran 16 Sep 02, 2022
A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching.

A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching. The solver will solve equations of the type: A can be

Sanjeet N. Dasharath 3 Feb 15, 2022
The easy way to combine mlflow, hydra and optuna into one machine learning pipeline.

mlflow_hydra_optuna_the_easy_way The easy way to combine mlflow, hydra and optuna into one machine learning pipeline. Objective TODO Usage 1. build do

shibuiwilliam 9 Sep 09, 2022
End to End toy example of MLOps

churn_model MLOps Toy Example End to End You might find below links useful Connect VSCode to Git MLFlow Port Heroku App Project Organization β”œβ”€β”€ LICEN

Ashish Tele 6 Feb 06, 2022
Python package for causal inference using Bayesian structural time-series models.

Python Causal Impact Causal inference using Bayesian structural time-series models. This package aims at defining a python equivalent of the R CausalI

Thomas Cassou 219 Dec 11, 2022
A repository to work on Machine Learning course. Select an algorithm to classify writer's gender, of Hebrew texts.

MachineLearning A repository to work on Machine Learning course. Select an algorithm to classify writer's gender, of Hebrew texts. Tested algorithms:

Haim Adrian 1 Feb 01, 2022
Simple and flexible ML workflow engine.

This is a simple and flexible ML workflow engine. It helps to orchestrate events across a set of microservices and create executable flow to handle requests. Engine is designed to be configurable wit

Katana ML 295 Jan 06, 2023
Reproducibility and Replicability of Web Measurement Studies

Reproducibility and Replicability of Web Measurement Studies This repository holds additional material to the paper "Reproducibility and Replicability

6 Dec 31, 2022
Code Repository for Machine Learning with PyTorch and Scikit-Learn

Code Repository for Machine Learning with PyTorch and Scikit-Learn

Sebastian Raschka 1.4k Jan 03, 2023
Retrieve annotated intron sequences and classify them as minor (U12-type) or major (U2-type)

(intron I nterrogator and C lassifier) intronIC is a program that can be used to classify intron sequences as minor (U12-type) or major (U2-type), usi

Graham Larue 4 Jul 26, 2022
A high performance and generic framework for distributed DNN training

BytePS BytePS is a high performance and general distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on eith

Bytedance Inc. 3.3k Dec 28, 2022
A game theoretic approach to explain the output of any machine learning model.

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allo

Scott Lundberg 18.2k Jan 02, 2023
A simple python program which predicts the success of a movie based on it's type, actor, actress and director

Movie-Success-Prediction A simple python program which predicts the success of a movie based on it's type, actor, actress and director. The program us

Mahalinga Prasad R N 1 Dec 17, 2021
Software Engineer Salary Prediction

Based on 2021 stack overflow data, this machine learning web application helps one predict the salary based on years of experience, level of education and the country they work in.

Jhanvi Mimani 1 Jan 08, 2022
This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch

This is an implementation of the proximal policy optimization algorithm for the C++ API of Pytorch. It uses a simple TestEnvironment to test the algorithm

Martin Huber 59 Dec 09, 2022
A library to generate synthetic time series data by easy-to-use factors and generator

timeseries-generator This repository consists of a python packages that generates synthetic time series dataset in a generic way (under /timeseries_ge

Nike Inc. 87 Dec 20, 2022