To effectively detect faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon it. The project aims to identify the state of a given wafer by classifying it into one of two classes: +1 (good, can be used as a substrate) or -1 (bad, the substrate needs to be replaced). To this end, a training dataset is provided for building a machine learning classification model that can predict wafer quality.

Data Description:

The columns of the provided data can be divided into three parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor columns hold the values obtained from measurements carried out on the wafer. The label column contains two unique values, +1 and -1, which identify whether the wafer is good or needs to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files, such as the file names, the length of the date value in the file name, the length of the time value in the file name, the number of columns, the column names and their datatypes.
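As an illustration, the kind of information the schema file carries could look roughly like the sketch below; the key names and values are assumptions for illustration, not the project's actual schema:

```python
# Hypothetical sketch of the training schema; key names and values are illustrative.
training_schema = {
    "SampleFileName": "wafer_08012020_120000.csv",  # expected file-name pattern
    "LengthOfDateStampInFile": 8,                   # digits in the date part of the name
    "LengthOfTimeStampInFile": 6,                   # digits in the time part of the name
    "NumberofColumns": 592,                         # wafer name + sensor columns + label
    "ColName": {
        "Wafer": "varchar",
        "Sensor-1": "float",
        # ... one entry per sensor column ...
        "Good/Bad": "Integer",                      # label column: +1 good, -1 bad
    },
}
```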

Directory creation:

All the necessary folders are created to keep the files cleanly separated, so that the end user can access them easily. A minimal sketch of this step is shown below.
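The folder names below are assumptions for illustration; the project may use different ones:

```python
import os

# Create the working directories if they do not already exist.
# Folder names here are illustrative assumptions.
for folder in [
    "Training_Raw_Files_Validated/Good_Raw",
    "Training_Raw_Files_Validated/Bad_Raw",
    "Prediction_Raw_Files_Validated/Good_Raw",
    "Prediction_Raw_Files_Validated/Bad_Raw",
    "models",
    "Prediction_Output_File",
]:
    os.makedirs(folder, exist_ok=True)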

Data Validation:

In this step, we validate the dataset against the provided schema file, checking the file names, the number of columns, the column names and their datatypes. If a file matches the schema, it is considered a good file on which we can train or run predictions; otherwise it is considered bad and moved to the bad-data folder. We also identify columns with null values. If an entire column is missing, the file is treated as bad; on the contrary, if only a fraction of a column is missing, the missing entries are initially filled with NaN and the file is kept as good data.
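A simplified sketch of this validation logic, assuming the schema dictionary sketched earlier and pandas for reading the raw files (function and folder names are illustrative, not the project's actual ones):

```python
import shutil
import pandas as pd

def validate_file(file_path, schema, good_dir="Good_Raw", bad_dir="Bad_Raw"):
    """Move a raw file to the good or bad folder based on the schema checks."""
    df = pd.read_csv(file_path)

    # Reject files whose column count does not match the schema.
    if len(df.columns) != schema["NumberofColumns"]:
        shutil.move(file_path, bad_dir)
        return False

    # Reject files where an entire column is missing (all values null).
    if (df.isnull().sum() == len(df)).any():
        shutil.move(file_path, bad_dir)
        return False

    # Partially missing values are left as NaN and imputed later.
    shutil.move(file_path, good_dir)
    return True
```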

Data Insertion in Database:

First, we create a database with the given name; if the database already exists, we open a connection to it. A table named "train_good_raw_dt" or "pred_good_raw_dt" is created in the database, depending on whether we are training or predicting, for inserting the good data files obtained from the data validation step. If the table is already present, no new table is created and the new files are inserted into the existing table, as we want training to be done on new as well as old training files. Finally, the data stored in the database is exported as a CSV file to be used for model training.
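A rough sketch of the insertion and export step using SQLite; the database choice, the export file name and the helper name are assumptions, while the table name follows the description above:

```python
import sqlite3
import pandas as pd
from pathlib import Path

def insert_good_files_and_export(db_name, good_dir, table="train_good_raw_dt",
                                 export_csv="InputFile.csv"):
    # Connect to the database, creating it if it does not exist.
    conn = sqlite3.connect(db_name)

    # Append every validated good file so old and new training data accumulate.
    for csv_file in Path(good_dir).glob("*.csv"):
        df = pd.read_csv(csv_file)
        df.to_sql(table, conn, if_exists="append", index=False)

    # Export the combined table as a single CSV for model training.
    pd.read_sql_query(f"SELECT * FROM {table}", conn).to_csv(export_csv, index=False)
    conn.close()
```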

Data Pre-processing and Model Training:

In the training section, the data is first checked for NaN values in the columns. If present, the NaN values are imputed using the KNN imputer. Columns with zero standard deviation are identified and removed, as they provide no information during model training. A prediction schema is then created based on the remaining dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for dynamic selection of the number of clusters we use the "KneeLocator" function. The idea behind clustering is to train a different algorithm on the data in each cluster. The KMeans model is trained on the pre-processed data and saved for later use in prediction. After the clusters are created, we find the best model for each cluster. We use four algorithms: "Random Forest", "K Neighbours", "Logistic Regression" and "XGBoost". For each cluster, all the algorithms are trained with the best parameters derived from GridSearch. We calculate the AUC score for each model and select the one with the best score; in this way, the best model is chosen for every cluster. All the models for every cluster are saved for use in prediction. Finally, the confusion matrix of the model associated with every cluster is also saved to give a quick view of each model's performance.
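The sketch below loosely mirrors this pipeline using scikit-learn, XGBoost and kneed; the grid values, the cluster search range and the helper names are assumptions for illustration, not the project's actual code:

```python
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier
from kneed import KneeLocator

def preprocess(X: pd.DataFrame) -> pd.DataFrame:
    # Impute missing values with the KNN imputer.
    X = pd.DataFrame(KNNImputer(n_neighbors=3).fit_transform(X), columns=X.columns)
    # Drop columns with zero standard deviation; they carry no information.
    return X.loc[:, X.std() > 0]

def find_clusters(X: pd.DataFrame):
    # Elbow method: fit KMeans for a range of k and let KneeLocator pick the knee.
    inertia = [KMeans(n_clusters=k, random_state=42).fit(X).inertia_ for k in range(1, 11)]
    best_k = KneeLocator(range(1, 11), inertia, curve="convex", direction="decreasing").knee
    kmeans = KMeans(n_clusters=best_k, random_state=42).fit(X)
    return kmeans, kmeans.predict(X)

def best_model_for_cluster(X, y):
    # Map the -1/+1 labels to 0/1, as expected by the classifiers below.
    y = (y == 1).astype(int)
    # Candidate models with small illustrative parameter grids.
    candidates = [
        (RandomForestClassifier(), {"n_estimators": [100, 200]}),
        (KNeighborsClassifier(), {"n_neighbors": [3, 5]}),
        (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}),
        (XGBClassifier(eval_metric="logloss"), {"max_depth": [3, 5]}),
    ]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
    scored = []
    for model, grid in candidates:
        search = GridSearchCV(model, grid, cv=3).fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, search.best_estimator_.predict_proba(X_te)[:, 1])
        scored.append((auc, search.best_estimator_))
    # Keep the model with the highest AUC for this cluster.
    return max(scored, key=lambda t: t[0])[1]
```

In practice, `find_clusters` would be run once on the full pre-processed data and `best_model_for_cluster` once per cluster, with each winning model saved to disk.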

Prediction:

For prediction, the essential directories are created first. The data validation, data insertion and data processing steps are similar to those in the training section. The KMeans model created during training is loaded, and the clusters for the pre-processed prediction data are predicted. Based on the cluster number, the respective model is loaded and used to predict the data for that cluster. Once predictions are made for all the clusters, the predictions, along with the wafer names, are saved in a CSV file at a given location.
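A condensed sketch of this prediction flow, assuming the models were saved with joblib and reusing the hypothetical preprocess helper from the training sketch; file names and paths are illustrative assumptions:

```python
import pandas as pd
import joblib

def predict_wafers(pred_df: pd.DataFrame, output_csv="Predictions.csv"):
    wafer_names = pred_df["Wafer"]
    # Same preprocessing as training; in practice the prediction schema
    # saved during training would be applied here.
    features = preprocess(pred_df.drop(columns=["Wafer"]))

    kmeans = joblib.load("models/kmeans.pkl")  # KMeans model saved during training
    clusters = kmeans.predict(features)

    results = []
    for cluster_id in sorted(set(clusters)):
        mask = clusters == cluster_id
        # Load the best model saved for this cluster and predict its rows.
        model = joblib.load(f"models/cluster_{cluster_id}.pkl")
        preds = model.predict(features[mask])
        results.append(pd.DataFrame({"Wafer": wafer_names[mask].values,
                                     "Prediction": preds}))

    # Save the wafer names together with their predicted labels.
    pd.concat(results).to_csv(output_csv, index=False)
```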

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiast | Machine Learning | Deep Learning | Advanced Computer Vision