Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

Overview

Retrainable-Faulty-Wafer-Detector

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced) and then train the model on this data so that it can continuously update itself with the environment and become more generalized with time. In this regard, a training and prediction dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values then they are considered good files on which we can train or predict our model, however, if it didn't match then they are moved to the bad folder. Moreover, we also identify the columns with null values. If all the data in a column is missing then the file is also moved to the bad folder. On the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and considered it as good data.

Data Insertion in Database:

First, open a connection to the database if it exists otherwise create a new one. A table with the name train_good_raw_dt or pred_good_raw_dt is created in the database, based on the training or prediction process, for inserting the good data files obtained from the data validation step. If the table is already present then new files are inserted in that table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file, to be used for the model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The columns with zero standard deviation were also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, Random Forest, K-Neighbours, Logistic Regression and XGBoost. For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for all the models and select the one with the best score. Similarly, the best model is selected for each cluster. For every cluster, the models are saved so that they can be used in future predictions. In the end, the confusion matrix of the model associated with every cluster is also saved to give the a glance over the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Retraining:

After the prediction, the prediction data is merged with the previous training dataset and then the models were retrained on this data using the hyperparameter values obtained from the GridSearch. The cycle repeats with every prediction it does and learns from the newly acquired data, making it more robust.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
A stupid obfuscation thing

StupidObfuscation A stupid obfuscation thing How it works The obfuscator takes a string, splits into pieces of one, then, using the table from letter.

Echo 2 May 03, 2022
The first Python 1v1.lol triggerbot working with colors !

1v1.lol TriggerBot Afin d'utiliser mon triggerbot, vous devez activer le plein écran sur 1v1.lol sur votre naviguateur (quelque-soit ce dernier). Vous

Venax 5 Jul 25, 2022
Return-Parity-MDP - Towards Return Parity in Markov Decision Processes

Towards Return Parity in Markov Decision Processes Code for the AISTATS 2022 pap

Jianfeng Chi 3 Nov 27, 2022
A set of simple functions to upload and fetch pastes on paste.uploadgram.me

pastegram-py A set of simple functions to upload and fetch pastes on paste.uploadgram.me. API Documentation Methods upload_paste(contents: bytes, file

Uploadgram 3 Sep 13, 2022
🙌Kart of 210+ projects based on machine learning, deep learning, computer vision, natural language processing and all. Show your support by ✨ this repository.

ML-ProjectKart 📌 Repository This kart showcases the finest collection of all projects based on machine learning, deep learning, computer vision, natu

Prathima Kadari 203 Dec 28, 2022
PwnDatas-DB-Project(PDDP)

PwnDatas-DB-Project PwnDatas-DB-Project(PDDP) 安裝依賴: pip3 install pymediawiki 使用: cd /opt git https://github.com/JustYoomoon/PwnDatas-DB-Project.git c

21 Jul 16, 2021
Cloud Native sample microservices showcasing Full Stack Observability using AppDynamics and ThousandEyes

Cloud Native Sample Bookinfo App Observability Bookinfo is a sample application composed of four Microservices written in different languages.

Cisco DevNet 13 Jul 21, 2022
Remote execution of a simple function on the server

FunFetch Remote execution of a simple function on the server All types of Python support objects.

Decave 4 Jun 30, 2022
To lazy to read your homework ? Get it done with LOL

LOL To lazy to read your homework ? Get it done with LOL Needs python 3.x L:::::::::L OO:::::::::OO L:::::::::L L:::::::

KorryKatti 4 Dec 08, 2022
Meilleur outil de hacking Zapp en 2021 pour Termux

WhatsApp-Tool Meilleur outil de hacking Zapp en 2021 pour Termux Cet outil est le seul prennant en compte les dernières mises à jour de WhatsApp. FONC

2 Aug 17, 2022
A shim for the typeshed changes in mypy 0.900

types-all A shim for the typeshed changes in mypy 0.900 installation pip install types-all why --install-types is annoying, this installs all the thin

Anthony Sottile 28 Oct 20, 2022
Vaksina - Vaksina COVID QR Validation Checker With Python

Vaksina COVID QR Validation Checker Vaksina is a general purpose library intende

Michael Casadevall 33 Aug 20, 2022
Simple cash register system made with guizero

Eje-Casher なにこれ guizeroで作った簡易レジシステムです。実際にコミケで使う予定です。 これを誰かがそのまま使うかどうかというよりは、guiz

Akira Ouchi 4 Nov 07, 2022
Programa que organiza pastas automaticamente

📂 Folder Organizer 📂 Programa que organiza pastas automaticamente Requisitos • Como usar • Melhorias futuras • Capturas de Tela Requisitos Antes de

João Victor Vilela dos Santos 1 Nov 02, 2021
Homed - Light-weight, easily configurable, dockerized homepage

homed GitHub Repo Docker Hub homed is a light-weight customizable portal primari

Matt Walters 12 Dec 15, 2022
🇮🇳 A Indian Flag Animation Project Made With Python

🇮🇳 A Indian Flag Animation Project Made With Python

MuFaz-TG 2 Oct 21, 2022
pvaPy provides Python bindings for EPICS pvAccess

PvaPy - PvAccess for Python The PvaPy package is a Python API for EPICS7. It supports both PVA and CA providers, all standard EPICS7 types (structures

EPICS Base 25 Dec 05, 2022
Example platform plugin that fixes fentry calls in Binja

Example Binja Platform Plugin This is an example Binja platform plugin which fixes up linux kernel module calls to __fentry__. __fentry__ is the linux

_yrp 2 Oct 07, 2021
Open Source Repository for CFD Solvers

Background and Validation This wiki is built in Notion. Here are all the tips you need to contribute. General Background Flow over a cylinder The proj

1 Dec 30, 2021
ARK sõidueksami Matrixi bot

ARK Sõidueksami bot Küsib ARK-i lehelt uusimad eksami ajad ja saadab sõnumi Matrixi kanali Dev setup Linux python3 -m venv venv source venv/bin/activa

Arti Zirk 3 Jun 15, 2021