To effectively detect the faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced). In this regard, a training dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values, then it is considered a good file on which we can train or predict our model, if not then the files are considered as bad and moved to the bad folder. Moreover, we also identify the columns with null values. If the whole column data is missing then we also consider the file as bad, on the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and consider it as good data.

Data Insertion in Database:

First, we create a database with the given name passed. If the database is already created, open the connection to the database. A table with the name- "train_good_raw_dt" or "pred_good_raw_dt" is created in the database, based on training or prediction, for inserting the good data files obtained from the data validation step. If the table is already present, then the new table is not created, and new files are inserted in the already present table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file to be used for model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The column with zero standard deviation was also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, "Random Forest" “K Neighbours”, “Logistic Regression” and "XGBoost". For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for both models and select the model with the best score. Similarly, the best model is selected for each cluster. All the models for every cluster are saved for use in prediction. In the end, the confusion matrix of the model associated with every cluster is also saved to give a glance at the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Experimental proxy for dumping the unencrypted packet data from Brawl Stars (WIP)

Brawl Stars Proxy Experimental proxy for version 39.99 of Brawl Stars. It allows you to capture the packets being sent between the Brawl Stars client

4 Oct 29, 2021
Absolute solvation free energy calculations with OpenFF and OpenMM

ABsolute SOLVantion Free Energy Calculations The absolv framework aims to offer a simple API for computing the change in free energy when transferring

7 Dec 07, 2022
Svg-turtle - Use the Python turtle to write SVG files

SaVaGe Turtle Use the Python turtle to write SVG files If you're using the Pytho

Don Kirkby 7 Dec 21, 2022
Additional useful operations for Python

Pyteal Extensions Additional useful operations for Python Available Operations MulDiv64: calculate m1*m2/d with no overflow on multiplication (TEAL 3+

Ulam Labs 11 Dec 14, 2022
A Lynx that manages a group that puts the federation first.

Lynx Super Federation Management Group Lynx was created to manage your groups on telegram and focuses on the Lynx Federation. I made this to root out

Unknown 2 Nov 01, 2022
A Kodi add-on for watching content hosted on PeerTube.

A Kodi add-on for watching content hosted on PeerTube. This add-on is under development so only basic features work, and you're welcome to improve it.

1 Dec 18, 2021
A notebook explaining the principle of adversarial attacks and their defences

TL;DR: A notebook explaining the principle of adversarial attacks and their defences Abstract: Deep neural networks models have been wildly successful

1 Jan 22, 2022
A python program to detect rickrolls with just the youtube link.

rickroll_detector A python program to detect rickrolls with just the youtube link. Usage: clone this repo or download zip run the main.py file with py

Tricky 4 Nov 06, 2022
Path of Exile Vendor Recipe Tracker (Chaos/Regal orb)

Path of Exile Vendor Trade Tracker Are you tired of manually keeping track of collected and missing items for farming Chaos or Regal Orbs in PoE? Me t

1 Nov 09, 2021
A micro-service that can be extended to help in monitoring systems

A micro-service that can be extended to help in monitoring systems. Be extensible to be incorporated in any of the systems to facilitate timely interventions.

Peter Kagwe 1 Feb 06, 2022
CuraMultiplyByGrid - Cura Плагин для размножения детали сеткой на весь стол автоматически без поворота

CuraMultiplyByGrid Cura Плагин для размножения детали сеткой на весь стол автоматически без поворота. Размножение в куре настолько ужасно реализовано,

3 Dec 02, 2022
Objetivo: de forma colaborativa pasar de nodos de Dynamo a Python.

ITTI_Ed01_De-nodos-a-python ITTI. EXPERT TRAINING EN AUTOMATIZACIÓN DE PROCESOS BIM: OFFICIAL DE AUTODESK. Edición 1 Enlace al Master Enunciado: Traba

1 Jun 06, 2022
Цифрова збрoя проти xуйлoвської пропаганди.

Паляниця Цифрова зброя проти xуйлoвської пропаганди. Щоб негайно почати шкварити рашистські сайти – мерщій у швидкий старт! ⚡️ А коли ворожі сервери в

8 Mar 22, 2022
Voldemort's Python import helper

importmagician Voldemort's Python import helper pip install importmagician Import from uninstalled Python directories Say you have a directory (relat

Zhengyang Feng 4 Mar 09, 2022
This library is an abstraction for Splunk-related development, maintenance, or migration operations

This library is an abstraction for Splunk-related development, maintenance, or migration operations. It provides a single CLI or SDK to conveniently perform various operations such as managing a loca

NEXTPART 6 Dec 21, 2022
An extension module to make reaction based menus with disnake

disnake-ext-menus An experimental extension menu that makes working with reaction menus a bit easier. Installing python -m pip install -U disnake-ext-

1 Nov 25, 2021
A not exist cat image generator python package

A not exist cat image generator python package

Fayas Noushad 2 Dec 03, 2021
Manually Install Python 2.7 pip without any problem !

Python2.7_install_pip Manually Install Python 2.7 pip without any problem ! Download installPip.py to your system and Run the code using this Command

Ali Jafari 1 Dec 09, 2021
A python script to turn tabs into spaces the right way.

detab A python script to turn tabs into spaces the right way. detab turns all tabs into spaces, not just leading tabs. Not all tabs have the same leng

1 Jan 26, 2022
AKSWINPOSTINIT -- AKS Windows node post provisioning initialization

AKSWINPOSTINIT -- AKS Windows node post provisioning initialization Features This is a tool that provides one-time powershell script initilization for

Ping He 3 Nov 25, 2021