To effectively detect the faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced). In this regard, a training dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values, then it is considered a good file on which we can train or predict our model, if not then the files are considered as bad and moved to the bad folder. Moreover, we also identify the columns with null values. If the whole column data is missing then we also consider the file as bad, on the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and consider it as good data.

Data Insertion in Database:

First, we create a database with the given name passed. If the database is already created, open the connection to the database. A table with the name- "train_good_raw_dt" or "pred_good_raw_dt" is created in the database, based on training or prediction, for inserting the good data files obtained from the data validation step. If the table is already present, then the new table is not created, and new files are inserted in the already present table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file to be used for model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The column with zero standard deviation was also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, "Random Forest" “K Neighbours”, “Logistic Regression” and "XGBoost". For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for both models and select the model with the best score. Similarly, the best model is selected for each cluster. All the models for every cluster are saved for use in prediction. In the end, the confusion matrix of the model associated with every cluster is also saved to give a glance at the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Robotic hamster to give you financial advice

hampp Robotic hamster to give you financial advice. I am not liable for any advice that the hamster gives. Follow at your own peril. Description Hampp

1 Nov 17, 2021
Context-free grammar to Sublime-syntax file

Generate a sublime-syntax file from a non-left-recursive, follow-determined, context-free grammar

Haggai Nuchi 8 Nov 17, 2022
Educational Repo. Used whilst learning Flask.

flask_python Educational Repo. Used whilst learning Flask. The below instructions will be required whilst establishing as new project. Install Flask (

Jordan 2 Oct 15, 2021
A pypi package details search python module

A pypi package details search python module

Fayas Noushad 5 Nov 30, 2021
⏰ Shutdown Timer is an application that you can shutdown, restart, logoff, and hibernate your computer with a timer.

Shutdown Timer is a an application that you can shutdown, restart, logoff, and hibernate your computer with a timer. After choosing an action from the

Mehmet Güdük 5 Jun 27, 2022
CircuitPython Driver for Adafruit 24LC32 I2C EEPROM Breakout 32Kbit / 4 KB

Introduction CircuitPython driver for Adafruit 24LC32 I2C EEPROM Breakout Dependencies This driver depends on: Adafruit CircuitPython Bus Device Regis

Adafruit Industries 4 Oct 03, 2022
Pyhexdmp - Python hex dump module

Pyhexdmp - Python hex dump module

25 Oct 23, 2022
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids

codon_optimize_cds_with_many_taxids_singlefasta Linux GUI app to codon optimize many single-fasta files with coding sequences, using many taxonomy ids

Olga Tsiouri 1 Jan 23, 2022
DownTime-Score is a Small project aimed to Monitor the performance and the availabillity of a variety of the Vital and Critical Moroccan Web Portals

DownTime-Score DownTime-Score is a Small project aimed to Monitor the performance and the availabillity of a variety of the Vital and Critical Morocca

adnane-tebbaa 5 Apr 30, 2022
XHacks 2021 Startup Track Winner: Be Heard. Educate, Enact, Empower. No voice left behind. (backend)

Be Heard: X Hacks 2021 Submission Educate, Enact, Empower. No voice left behind. Inspiration To say 2020 was an eventful year would be an understateme

3 Jul 14, 2022
Python module to work with Magneto Database directly without using broken Magento 2 core

Python module to work with Magneto Database directly without using broken Magento 2 core

Egor Shitikov 13 Nov 10, 2022
This is a vscode extension with a Virtual Assistant that you can play with when you are bored or you need help..

VS Code Virtual Assistant This is a vscode extension with a Virtual Assistant that you can play with when you are bored or you need help. Its currentl

Soham Ghugare 6 Aug 22, 2021
Pytorch implementation of "Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates"

Peer Loss functions This repository is the (Multi-Class & Deep Learning) Pytorch implementation of "Peer Loss Functions: Learning from Noisy Labels wi

Kushal Shingote 1 Feb 08, 2022
Grade 8 Version of Space Invaders

Space-Invaders Grade 8 Version of Space Invaders Compatability This program is Python 3 Compatable, and not Python 2 Compatable because i haven't test

Space64 0 Feb 16, 2022
Just RESTing

petnica-api-workshop Just RESTing Setup Using pipenv You can setup this project with pipenv if you want isolated libraries. After you've installed pip

Aleksa Tešić 1 Oct 23, 2021
thonny plugin for gitonic

thonny-gitonic thonny plugin for gitonic open gitonic in thonny by pressing Control+Shift+g, or via tools menu press ESC key to minimize gitonic windo

karl 1 Apr 12, 2022
A browser login credentials thief for windows and Linux

Thief 🦹🏻 A browser login credentials thief for windows and Linux Python script to decrypt login credentials from browsers in windows or linux Decryp

Ash 1 Dec 13, 2021
Nimbus - Open Source Cloud Computing Software - 100% Apache2 licensed

⚠️ The Nimbus infrastructure project is no longer under development. ⚠️ For more information, please read the news announcement. If you are interested

Nimbus 194 Jun 30, 2022
An alternative app for core Armoury Crate functions.

NoROG DISCLAIMER: Use at your own risk. This is alpha-quality software. It has not been extensively tested, though I personally run it daily on my lap

12 Nov 29, 2022
Open source tools to allow working with ESP devices in the browser

ESP Web Tools Allow flashing ESPHome or other ESP-based firmwares via the browser. Will automatically detect the board type and select a supported fir

ESPHome 195 Dec 31, 2022