To effectively detect faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon it. The project aims to identify the state of a provided wafer by classifying it into one of two classes: +1 (good, can be used as a substrate) or -1 (bad, the wafer needs to be replaced). For this purpose, a training dataset is provided to build a machine learning classification model that can predict wafer quality.

Data Description:

The columns of the provided data fall into three parts: wafer name, sensor values, and label. The wafer name contains the batch number of the wafer, whereas the sensor values are obtained from measurements carried out on the wafer. The label column contains two unique values, +1 and -1, which indicate whether the wafer is good or needs to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files, such as the file names, the length of the date value in the file name, the length of the time value in the file name, the number of columns, the column names, and their datatypes.
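For illustration only, the schema can be thought of as a small dictionary of these fields. The sketch below is a reduced, hypothetical example written as a Python dict; the key names, column names, and values are assumptions, and the real schema contains many more sensor columns.

# reduced, hypothetical schema for the training files (all values are placeholders)
schema = {
    "SampleFileName": "wafer_08012020_120000.csv",
    "LengthOfDateStampInFile": 8,   # digits of the date stamp in the file name
    "LengthOfTimeStampInFile": 6,   # digits of the time stamp in the file name
    "NumberofColumns": 4,
    "ColName": {
        "Wafer": "varchar",
        "Sensor-1": "float",
        "Sensor-2": "float",
        "Good/Bad": "Integer",
    },
}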

Directory creation:

All the necessary folders are created to cleanly separate the generated files so that the end user can access them easily.
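As a minimal sketch of this step (the directory names here are assumptions, not necessarily the project's actual layout):

import os

# create the working directories if they do not already exist
for folder in ["Training_Raw_Files/Good_Raw", "Training_Raw_Files/Bad_Raw",
               "Prediction_Raw_Files/Good_Raw", "Prediction_Raw_Files/Bad_Raw",
               "models", "Prediction_Output"]:
    os.makedirs(folder, exist_ok=True)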

Data Validation:

In this step, each raw file is validated against the provided schema: the file name, the number of columns, the column names, and their datatypes are checked. If a file matches the schema values, it is considered a good file that we can train or predict on; otherwise it is considered bad and moved to the bad-data folder. We also check the columns for null values. If an entire column is empty, the file is likewise treated as bad; if only a fraction of the values in a column is missing, those values are initially filled with NaN and the file is kept as good data.
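A simplified sketch of this check, reusing the reduced schema shown earlier (the helper name and folder paths are illustrative, and the datatype check is omitted for brevity):

import shutil
import pandas as pd

def validate_file(csv_path, schema, good_dir, bad_dir):
    df = pd.read_csv(csv_path)
    expected_cols = list(schema["ColName"].keys())
    # file is bad if the column count or the column names do not match the schema
    if len(df.columns) != schema["NumberofColumns"] or list(df.columns) != expected_cols:
        shutil.move(csv_path, bad_dir)
        return
    # file is bad if any column is entirely empty
    if df.isna().all().any():
        shutil.move(csv_path, bad_dir)
        return
    # partially missing values stay as NaN and are imputed later
    shutil.move(csv_path, good_dir)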

Data Insertion in Database:

First, we create a database with the given name; if the database already exists, we simply open a connection to it. A table named "train_good_raw_dt" or "pred_good_raw_dt" (depending on whether we are training or predicting) is created in the database for inserting the good data files obtained from the data-validation step. If the table is already present, it is not recreated; the new files are inserted into the existing table, because we want training to be done on the new as well as the old training files. Finally, the data stored in the database is exported as a CSV file to be used for model training.
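The sketch below illustrates the idea with SQLite; the actual project may use a different database, and the function names are assumptions.

import sqlite3
import pandas as pd

def insert_good_data(db_path, table_name, csv_path):
    # connecting creates the database file if it does not exist yet
    conn = sqlite3.connect(db_path)
    df = pd.read_csv(csv_path)
    # append so that old and new good files accumulate in the same table
    df.to_sql(table_name, conn, if_exists="append", index=False)
    conn.close()

def export_table_to_csv(db_path, table_name, out_csv):
    conn = sqlite3.connect(db_path)
    pd.read_sql(f"SELECT * FROM {table_name}", conn).to_csv(out_csv, index=False)
    conn.close()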

Data Pre-processing and Model Training:

In the training section, the data is first checked for NaN values in its columns; if present, they are imputed using the KNN imputer. Columns with zero standard deviation are also identified and removed, as they carry no information for model training. A prediction schema is then created from the remaining columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected from the elbow plot, and the "KneeLocator" function is used to pick that number dynamically. The idea behind clustering is to train a separate model on each cluster of the data. The KMeans model is fitted on the pre-processed data and saved for later use in prediction.

After the clusters are created, we find the best model for each cluster. We use four algorithms: "Random Forest", "K Neighbours", "Logistic Regression" and "XGBoost". For each cluster, every algorithm is trained with the best parameters derived from GridSearch, the AUC score of each model is calculated, and the model with the best score is selected. In this way, the best model is chosen for every cluster, and all of these models are saved for use in prediction. Finally, the confusion matrix of the model associated with each cluster is also saved to give a quick view of model performance.
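A condensed sketch of this flow, assuming scikit-learn, xgboost, kneed and joblib; the file paths, function names, the mapping of the +1/-1 labels to 1/0, and the omission of the GridSearchCV tuning step are simplifications of the description above, not the project's exact code.

import os
import numpy as np
import pandas as pd
import joblib
from kneed import KneeLocator
from sklearn.impute import KNNImputer
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

def preprocess(df):
    X = df.drop(columns=["Wafer", "Good/Bad"])
    y = (df["Good/Bad"] == 1).astype(int)                       # map +1/-1 labels to 1/0
    X = pd.DataFrame(KNNImputer().fit_transform(X), columns=X.columns)  # impute NaNs
    X = X.loc[:, X.std() > 0]                                   # drop zero-variance columns
    return X, y

def optimal_clusters(X, max_k=10):
    # elbow plot data: within-cluster sum of squares for k = 1..max_k
    wcss = [KMeans(n_clusters=k, random_state=42).fit(X).inertia_ for k in range(1, max_k + 1)]
    knee = KneeLocator(range(1, max_k + 1), wcss, curve="convex", direction="decreasing").knee
    return knee or 3                                            # fall back if no clear elbow

def train(df, model_dir="models"):
    os.makedirs(model_dir, exist_ok=True)
    X, y = preprocess(df)
    km = KMeans(n_clusters=optimal_clusters(X), random_state=42).fit(X)
    joblib.dump(km, f"{model_dir}/kmeans.pkl")                  # saved for use at prediction time
    for c in np.unique(km.labels_):
        Xc, yc = X[km.labels_ == c], y[km.labels_ == c]
        Xtr, Xte, ytr, yte = train_test_split(Xc, yc, test_size=0.3, random_state=42)
        # in the full pipeline each candidate would first be tuned with GridSearchCV
        candidates = [RandomForestClassifier(), KNeighborsClassifier(),
                      LogisticRegression(max_iter=1000), XGBClassifier()]
        aucs = [roc_auc_score(yte, m.fit(Xtr, ytr).predict_proba(Xte)[:, 1]) for m in candidates]
        joblib.dump(candidates[int(np.argmax(aucs))], f"{model_dir}/cluster_{c}.pkl")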

Prediction:

In data prediction, the essential directories are created first. The data-validation, data-insertion and data-processing steps are similar to the training section. The KMeans model created during training is loaded, and the clusters for the pre-processed prediction data are predicted. Based on the cluster number, the respective model is loaded and used to predict the data of that cluster. Once the predictions are made for all the clusters, they are saved together with the wafer names in a CSV file at a given location.
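A minimal sketch, continuing the assumptions of the training sketch above (the prediction data is assumed to have already gone through the same validation and pre-processing steps, so its columns line up with the saved models):

import numpy as np
import pandas as pd
import joblib

def predict(pred_csv, out_csv="Prediction_Output/Predictions.csv", model_dir="models"):
    df = pd.read_csv(pred_csv)
    wafers, X = df["Wafer"], df.drop(columns=["Wafer"])
    km = joblib.load(f"{model_dir}/kmeans.pkl")                # clustering model saved during training
    clusters = km.predict(X)
    preds = np.empty(len(X), dtype=int)
    for c in np.unique(clusters):
        model = joblib.load(f"{model_dir}/cluster_{c}.pkl")    # best model for this cluster
        mask = clusters == c
        preds[mask] = np.where(model.predict(X[mask]) == 1, 1, -1)  # map back to the +1/-1 labels
    pd.DataFrame({"Wafer": wafers, "Prediction": preds}).to_csv(out_csv, index=False)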

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiast | Machine Learning | Deep Learning | Advanced Computer Vision.