To effectively detect faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon it. The project aims to identify the state of a provided wafer by classifying it into one of two classes: +1 (good, can be used as a substrate) or -1 (bad, the substrate needs to be replaced). To this end, a training dataset is provided for building a machine learning classification model that can predict wafer quality.

Data Description:

The columns of the provided data can be divided into three parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values are obtained from measurements carried out on the wafer. The label column contains two unique values, +1 and -1, which identify whether the wafer is good or needs to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files, such as the file names, the length of the date value in the file name, the length of the time value in the file name, the number of columns, the names of the columns, and their datatypes.
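As a rough illustration, the schema file might look like the following when loaded in Python. This is a hypothetical sketch: the key names, column names and values shown here are assumptions, not the project's actual schema.

```python
import json

# Hypothetical illustration of a training schema; key and column names
# below are assumptions, not the project's actual schema contents.
schema = {
    "SampleFileName": "wafer_08012020_120000.csv",
    "LengthOfDateStampInFile": 8,
    "LengthOfTimeStampInFile": 6,
    "NumberofColumns": 592,
    "ColName": {"Wafer": "varchar", "Sensor-1": "Float", "Output": "Integer"},
}

# In practice the schema is read from disk and used to drive validation.
with open("schema_training.json") as f:  # file name is an assumption
    schema = json.load(f)
```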

Directory creation:

All the necessary folders are created to keep the files cleanly separated so that the end user can access them easily.
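A minimal sketch of this step is shown below; the folder names are illustrative assumptions and may differ from the project's actual layout.

```python
import os

# Illustrative folder layout; the actual directory names used by the
# project may differ.
for folder in (
    "Training_Raw_Files/Good_Raw",
    "Training_Raw_Files/Bad_Raw",
    "Training_FileFromDB",
    "Models",
    "Prediction_Output",
):
    os.makedirs(folder, exist_ok=True)
```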

Data Validation:

In this step, we match our dataset against the provided schema file: the file names, the number of columns, the column names and their datatypes. If a file matches the schema values, it is considered a good file on which we can train or run predictions; if not, the file is considered bad and moved to the bad folder. We also check the columns for null values. If an entire column's data is missing, the file is likewise considered bad; on the contrary, if only a fraction of the data in a column is missing, we initially fill it with NaN and treat the file as good data.
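The sketch below captures this routing logic under some assumptions: the file-name pattern and the schema key are illustrative, not the project's actual values.

```python
import os
import re
import shutil
import pandas as pd

def validate_file(path, schema, good_dir, bad_dir):
    """Route a raw CSV to the good or bad folder based on the schema.
    A minimal sketch: the file-name regex and schema key are assumptions."""
    name_ok = re.match(r"wafer_\d{8}_\d{6}\.csv$", os.path.basename(path)) is not None
    df = pd.read_csv(path)
    cols_ok = len(df.columns) == schema["NumberofColumns"]
    # Any column that is entirely empty makes the whole file bad.
    no_empty_cols = not df.isna().all().any()
    if name_ok and cols_ok and no_empty_cols:
        shutil.copy(path, good_dir)
    else:
        shutil.move(path, bad_dir)
```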

Data Insertion in Database:

First, we create a database with the given name; if the database already exists, we simply open a connection to it. A table named "train_good_raw_dt" or "pred_good_raw_dt" is created in the database, depending on whether we are training or predicting, for inserting the good data files obtained from the data validation step. If the table is already present, a new table is not created and the new files are inserted into the existing table, as we want training to be done on new as well as old training files. Finally, the data stored in the database is exported as a CSV file to be used for model training.
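A minimal sketch of this step using SQLite is given below; the project may use a different database, and the file, table and path names here are assumptions.

```python
import sqlite3
import pandas as pd

# Open (or create) the database and append a validated good file.
conn = sqlite3.connect("training.db")
good_file = "Training_Raw_Files/Good_Raw/wafer_08012020_120000.csv"
pd.read_csv(good_file).to_sql("train_good_raw_dt", conn,
                              if_exists="append", index=False)

# Export everything accumulated so far (old and new files) to one CSV
# that the training step will consume.
pd.read_sql("SELECT * FROM train_good_raw_dt", conn).to_csv(
    "Training_FileFromDB/InputFile.csv", index=False)
conn.close()
```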

Data Pre-processing and Model Training:

In the training section, the data is first checked for NaN values in the columns. If present, the NaN values are imputed using the KNN imputer. Columns with zero standard deviation are also identified and removed, as they do not provide any information during model training. A prediction schema is then created based on the remaining dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for dynamic selection of the number of clusters we use the "KneeLocator" function. The idea behind clustering is to train a separate model on the data in each cluster. The KMeans model is trained on the pre-processed data and saved for later use in prediction. After the clusters are created, we find the best model for each cluster. We use four algorithms: "Random Forest", "K Neighbours", "Logistic Regression" and "XGBoost". For each cluster, every algorithm is tuned with the best parameters derived from GridSearch. We calculate the AUC score of each tuned model and select the one with the best score; in this way, the best model is selected for each cluster. All the per-cluster models are saved for use in prediction. In the end, the confusion matrix of the model associated with every cluster is also saved to give a glance at the models' performance.
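The following condensed sketch shows one way this flow could be wired together. The file paths, column names ("Wafer", "Output"), parameter grids and label mapping are illustrative assumptions, not the project's actual values.

```python
import pandas as pd
import joblib
from kneed import KneeLocator
from sklearn.impute import KNNImputer
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

# Load the exported training CSV; paths and column names are assumptions.
data = pd.read_csv("Training_FileFromDB/InputFile.csv")
X = data.drop(columns=["Wafer", "Output"])
y = (data["Output"] == 1).astype(int)  # map +1/-1 labels to 1/0

# Impute missing values with the KNN imputer and drop zero-variance columns.
X = pd.DataFrame(KNNImputer(n_neighbors=3).fit_transform(X), columns=X.columns)
X = X.loc[:, X.std() > 0]

# Select the number of clusters from the elbow curve via KneeLocator.
inertias = [KMeans(n_clusters=k, random_state=42).fit(X).inertia_
            for k in range(1, 11)]
n_clusters = KneeLocator(range(1, 11), inertias,
                         curve="convex", direction="decreasing").knee or 3
kmeans = KMeans(n_clusters=n_clusters, random_state=42).fit(X)
joblib.dump(kmeans, "Models/kmeans.pkl")

# Candidate models and (illustrative) parameter grids.
candidates = {
    "random_forest": (RandomForestClassifier(), {"n_estimators": [100, 200]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5]}),
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}),
    "xgboost": (XGBClassifier(eval_metric="logloss"), {"max_depth": [3, 5]}),
}

# For each cluster, tune every candidate with GridSearch and keep the one
# with the highest AUC on a held-out split.
for cluster_id in range(n_clusters):
    mask = kmeans.labels_ == cluster_id
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[mask], y[mask], test_size=0.3, random_state=42)
    best_auc, best_model = -1.0, None
    for name, (model, grid) in candidates.items():
        search = GridSearchCV(model, grid, cv=3).fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, search.predict_proba(X_te)[:, 1])
        if auc > best_auc:
            best_auc, best_model = auc, search.best_estimator_
    joblib.dump(best_model, f"Models/cluster_{cluster_id}.pkl")
```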

Prediction:

In prediction, first, the essential directories are created. The data validation, data insertion and data pre-processing steps are the same as in the training section. The KMeans model created during training is loaded, and the clusters for the pre-processed prediction data are predicted. Based on the cluster number, the respective model is loaded and used to predict the data for that cluster. Once the predictions are made for all the clusters, they are saved along with the wafer names in a CSV file at a given location.
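A minimal sketch of the per-cluster prediction step is shown below. It assumes the prediction data has already been validated and pre-processed exactly like the training data, and that the models were saved under the names used in the training sketch above; paths and column names are assumptions.

```python
import pandas as pd
import joblib

# Load the pre-processed prediction data (paths/columns are assumptions).
pred_data = pd.read_csv("Prediction_FileFromDB/InputFile.csv")
wafer_names = pred_data["Wafer"]
X_pred = pred_data.drop(columns=["Wafer"])

# Assign each record to a cluster with the saved KMeans model.
kmeans = joblib.load("Models/kmeans.pkl")
clusters = kmeans.predict(X_pred)

# Predict each cluster's records with that cluster's saved model.
predictions = pd.Series(0, index=pred_data.index)
for cluster_id in set(clusters):
    mask = clusters == cluster_id
    model = joblib.load(f"Models/cluster_{cluster_id}.pkl")
    predictions[mask] = model.predict(X_pred[mask])

# Map the 0/1 model output back to the original -1/+1 labels and save.
predictions = predictions.replace(0, -1)
pd.DataFrame({"Wafer": wafer_names, "Prediction": predictions}).to_csv(
    "Prediction_Output/Predictions.csv", index=False)
```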

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiast | Machine Learning | Deep Learning | Advanced Computer Vision.