Identifies the faulty wafer before it can be used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells.

Overview

Retrainable-Faulty-Wafer-Detector

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced) and then train the model on this data so that it can continuously update itself with the environment and become more generalized with time. In this regard, a training and prediction dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values then they are considered good files on which we can train or predict our model, however, if it didn't match then they are moved to the bad folder. Moreover, we also identify the columns with null values. If all the data in a column is missing then the file is also moved to the bad folder. On the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and considered it as good data.

Data Insertion in Database:

First, open a connection to the database if it exists otherwise create a new one. A table with the name train_good_raw_dt or pred_good_raw_dt is created in the database, based on the training or prediction process, for inserting the good data files obtained from the data validation step. If the table is already present then new files are inserted in that table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file, to be used for the model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The columns with zero standard deviation were also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, Random Forest, K-Neighbours, Logistic Regression and XGBoost. For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for all the models and select the one with the best score. Similarly, the best model is selected for each cluster. For every cluster, the models are saved so that they can be used in future predictions. In the end, the confusion matrix of the model associated with every cluster is also saved to give the a glance over the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Retraining:

After the prediction, the prediction data is merged with the previous training dataset and then the models were retrained on this data using the hyperparameter values obtained from the GridSearch. The cycle repeats with every prediction it does and learns from the newly acquired data, making it more robust.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Autogenerador tonto de paquetes para ROSCPP

Autogenerador tonto de paquetes para ROSCPP Autogenerador de paquetes que usan C++ en ROS. Por ahora tiene las siguientes capacidades: Permite crear p

1 Nov 26, 2021
A deployer and package manager for OceanBase open-source software.

OceanBase Deploy OceanBase Deploy (简称 OBD)是 OceanBase 开源软件的安装部署工具。OBD 同时也是包管理器,可以用来管理 OceanBase 所有的开源软件。本文介绍如何安装 OBD、使用 OBD 和 OBD 的命令。 安装 OBD 您可以使用以下方

OceanBase 59 Dec 27, 2022
A script to add issues to a project in Github based on label or status.

Add Github Issues to Project (Beta) A python script to move Github issues to a next-gen (beta) Github Project Getting Started These instructions will

Kate Donaldson 3 Jan 16, 2022
Pytorch implementation of "Peer Loss Functions: Learning from Noisy Labels without Knowing Noise Rates"

Peer Loss functions This repository is the (Multi-Class & Deep Learning) Pytorch implementation of "Peer Loss Functions: Learning from Noisy Labels wi

Kushal Shingote 1 Feb 08, 2022
Battle-Ship - Python-console battle ship

Battle-Ship this SHOULD work in lenux(if i spelled it wrong spam issues till I fix it) the thing that maby wont work is where it clears the screen the

pl608 2 Jan 06, 2022
Choice Coin 633 Dec 23, 2022
Python Library to get fast extensive Dummy Data for testing

Dumda Python Library to get fast extensive Dummy Data for testing https://pypi.org/project/dumda/ Installation pip install dumda Usage: Cities from d

Oliver B. 0 Dec 27, 2021
An example project which contains the Unity components necessary to complete Navigation2's SLAM tutorial with a Turtlebot3, using a custom Unity environment in place of Gazebo.

Navigation 2 SLAM Example This example provides a Unity Project and a colcon workspace that, when used together, allows a user to substitute Unity as

Unity Technologies 183 Jan 04, 2023
Python bindings for `ign-msgs` and `ign-transport`

Python Ignition This project aims to provide Python bindings for ignition-msgs and ignition-transport. It is a work in progress... C++ and Python libr

Rhys Mainwaring 3 Nov 08, 2022
An execution framework for systematic strategies

WAGMI is an execution framework for systematic strategies. It is very much a work in progress, please don't expect it to work! Architecture The Django

Rich Atkinson 10 Mar 28, 2022
A pypi package details search python module

A pypi package details search python module

Fayas Noushad 5 Nov 30, 2021
Quantity Takeoff with Python. Collecting groups of elements by filters

The free tool QuantityTakeoff allows you to group elements from Revit and IFC models (in BIMJSON-CSV format) with just a few filters and find the required volume values for the grouped elements.

OpenDataBIM 9 Jan 06, 2023
Url-check-migration-python - A python script using Apica API's to migrate URL checks between environments

url-check-migration-python A python script using Apica API's to migrate URL chec

Angelo Aquino 1 Feb 16, 2022
Medical appointments No-Show classifier

Medical Appointments No-shows Why do 20% of patients miss their scheduled appointments? A person makes a doctor appointment, receives all the instruct

4 Apr 20, 2022
CalHacks 8 Repo: Megha Jain, Gaurav Bhatnagar, Howard Meng, Vibha Tantry

CalHacks8 CalHacks 8 Repo: Megha Jain, Gaurav Bhatnagar, Howard Meng, Vibha Tantry Setup FE Install React Native via Expo, run App.js. Backend Create

0 Aug 20, 2022
OnTime is a small python that you set a time and on that time, app will send you notification and also play an alarm.

OnTime Always be OnTime! What is OnTime? OnTime is a small python that you set a time and on that time, app will send you notification and also play a

AmirHossein Mohammadi 11 Jan 16, 2022
Demo of a WAM Prolog implementation in Python

Prol: WAM demo This is a simplified Warren Abstract Machine (WAM) implementation for Prolog, that showcases the main instructions, compiling, register

Bruno Kim Medeiros Cesar 62 Dec 26, 2022
Password manager using MySQL and Python 3.10.2

Password Manager Password manager using MySQL and Python 3.10.2 Installation Install my-project with github git clone https://github.com/AyaanSiddiq

1 Feb 18, 2022
Code for the manim-generated scenes used in 3blue1brown videos

This project contains the code used to generate the explanatory math videos found on 3Blue1Brown. This almost entirely consists of scenes generated us

Grant Sanderson 4.1k Jan 02, 2023