To effectively detect the faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced). In this regard, a training dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values, then it is considered a good file on which we can train or predict our model, if not then the files are considered as bad and moved to the bad folder. Moreover, we also identify the columns with null values. If the whole column data is missing then we also consider the file as bad, on the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and consider it as good data.

Data Insertion in Database:

First, we create a database with the given name passed. If the database is already created, open the connection to the database. A table with the name- "train_good_raw_dt" or "pred_good_raw_dt" is created in the database, based on training or prediction, for inserting the good data files obtained from the data validation step. If the table is already present, then the new table is not created, and new files are inserted in the already present table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file to be used for model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The column with zero standard deviation was also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, "Random Forest" “K Neighbours”, “Logistic Regression” and "XGBoost". For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for both models and select the model with the best score. Similarly, the best model is selected for each cluster. All the models for every cluster are saved for use in prediction. In the end, the confusion matrix of the model associated with every cluster is also saved to give a glance at the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Monitoring of lake dynamics

slamcore_utils Description This repo contains the slamcore-setup-dataset script. It can be used for installing a sample dataset for offline testing an

10 Jun 23, 2022
Python script to combine the statistical results of a TOPAS simulation that was split up into multiple batches.

topas-merge-simulations Python script to combine the statistical results of a TOPAS simulation that was split up into multiple batches At the top of t

Sebastian Schäfer 1 Aug 16, 2022
Data on COVID-19 (coronavirus) cases, deaths, hospitalizations, tests • All countries • Updated daily by Our World in Data

COVID-19 Dataset by Our World in Data Find our data on COVID-19 and its documentation in public/data. Documentation Data: complete COVID-19 dataset Da

Our World in Data 5.5k Jan 03, 2023
A numbers extract from string python package

Made with Python3 (C) @FayasNoushad Copyright permission under MIT License License - https://github.com/FayasNoushad/Numbers-Extract/blob/main/LICENS

Fayas Noushad 4 Nov 28, 2021
An Notifier Program that Notifies you to relax your eyes Every 15 Minutes👀

Every 15 Minutes ⌛ Every 15 Minutes is an application that is used to Notify you to Relax your eyes Every 15 Minutes, This is fully made with Python a

FSP Gang s' YT 2 Oct 18, 2021
Demodulate and error correct FIS-B and ADS-B signals on 978 MHz.

FIS-B 978 ('fisb-978') is a set of programs that demodulates and error corrects FIS-B (Flight Information System - Broadcast) and ADS-B (Automatic Dep

2 Nov 15, 2022
Tucan Discord Token Generator - Remastered

TucanGEN-SRC Tucan Discord Token Generator - Remastered Tucan source made better by me. -- idk if it works anymore Includes: hCaptcha Bypass Automatic

Vast 8 Nov 04, 2022
Jack Morgan's Advent of Code Solutions

Advent-of-Code Jack Morgan's Advent of Code Solutions Usage Run . initiate.sh year day To initiate a day. This sets up a template python file, and pul

Jack Morgan 1 Dec 10, 2021
An application for automation of the mining function in the game Alienworlds.IO

alienautomation A Python script made to automate the tidious job of mining on AlienWorlds This script: Automatically opens the browser Automatically l

anonieXdev 42 Dec 03, 2022
One line Brainfuck interpreter in Python

One line Brainfuck interpreter in Python

16 Dec 21, 2022
Яндекс тренировки по алгоритмам. Июнь 2021

Young&&Yandex Тренировки по алгоритмам Если вы хотите попасть на летнюю стажировку в Яндекс, но пока не уверены в своих силах, приходите на наши трени

Podlevskiy Viktor 6 Sep 03, 2021
Este script añade la config de s4vitar a bspwm automaticamente!

Se ha testeado este script en ParrotOS, Kali y Ubuntu. Funciona para todos los sistemas operativos basados en Debian. Instalación git clone https://gi

yorkox 201 Dec 30, 2022
freeCodeCamp Scientific Computing with Python Project for Certification.

Time_Calculator_freeCodeCamp freeCodeCamp Scientific Computing with Python Project for Certification. Write a function named add_time that takes in tw

Rajdeep Mondal 1 Dec 23, 2021
VacationCycleLogicBackEnd - Vacation Cycle Logic BackEnd With Python

Vacation Cycle Logic BackEnd Getting Started Existing virtualenv If your project

Mohamed Gamal 0 Jan 03, 2022
The ROS publisher/subscriber example packaged as a snap

publisher-subscriber The ROS publisher/subscriber example packaged as a snap, based on ROS Noetic and Ubuntu Core 20. Strictly confined. This example

3 Dec 03, 2021
This package tries to emulate the behaviour of syntax proposed in PEP 671 via a decorator

Late-Bound Arguments This package tries to emulate the behaviour of syntax proposed in PEP 671 via a decorator. Usage Mention the names of the argumen

Shakya Majumdar 0 Feb 06, 2022
Module-based cryptographic tool

Cryptosploit A decryption/decoding/cracking tool using various modules. To use it, you need to have basic knowledge of cryptography. Table of Contents

/SNESE_AR\ 33 Nov 27, 2022
A module that can manage you're gtps

Growtopia Private Server Controler Module For Controle Your GTPS | Build in Python3 Creator Information

iFanpS 6 Jan 14, 2022
Simple GUI menu for micropython using a rotary encoder and basic display.

Micropython encoder based menu This is a simple menu system written in micropython. It uses a switch, a rotary encoder and an OLED display.

80 Jan 07, 2023
An extended, game oriented, turtle

Burtle A Better TURTLE. Makes making games easier. write less do more!! Documentation & guide: https://alannxq.github.io/burtle/ Installation pip inst

5 May 19, 2022