To effectively detect the faulty wafers

Overview

wafer_fault_detection

Aim of the project:

In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystalline silicon (c-Si), used for the fabrication of integrated circuits and, in photovoltaics, to manufacture solar cells. The wafer serves as the substrate for microelectronic devices built in and upon the wafer. The project aims to successfully identify the state of the provided wafer by classifying it between one of the two-class +1 (good, can be used as a substrate) or -1 (bad, the substrate need to be replaced). In this regard, a training dataset is provided to build a machine learning classification model, which can predict the wafer quality.

Data Description:

The columns of provided data can be classified into 3 parts: wafer name, sensor values and label. The wafer name contains the batch number of the wafer, whereas the sensor values obtained from the measurement carried out on the wafer. The label column contains two unique values +1 and -1 that identifies if the wafer is good or need to be replaced. Additionally, we also require a schema file, which contains all the relevant information about the training files such as file names, length of date value in the file name, length of time value in the file name, number of columns, name of the columns, and their datatype.

Directory creation:

All the necessary folders were created to effectively separate the files so that the end-user can get easy access to them.

Data Validation:

In this step, we matched our dataset with the provided schema file to match the file names, the number of columns it should contain, their names as well as their datatype. If the files matched with the schema values, then it is considered a good file on which we can train or predict our model, if not then the files are considered as bad and moved to the bad folder. Moreover, we also identify the columns with null values. If the whole column data is missing then we also consider the file as bad, on the contrary, if only a fraction of data in a column is missing then we initially fill it with NaN and consider it as good data.

Data Insertion in Database:

First, we create a database with the given name passed. If the database is already created, open the connection to the database. A table with the name- "train_good_raw_dt" or "pred_good_raw_dt" is created in the database, based on training or prediction, for inserting the good data files obtained from the data validation step. If the table is already present, then the new table is not created, and new files are inserted in the already present table as we want training to be done on new as well as old training files. In the end, the data in a stored database is exported as a CSV file to be used for model training.

Data Pre-processing and Model Training:

In the training section, first, the data is checked for the NaN values in the columns. If present, impute the NaN values using the KNN imputer. The column with zero standard deviation was also identified and removed as they don't give any information during model training. A prediction schema was created based on the remained dataset columns. Afterwards, the KMeans algorithm is used to create clusters in the pre-processed data. The optimum number of clusters is selected by plotting the elbow plot, and for the dynamic selection of the number of clusters, we are using the "KneeLocator" function. The idea behind clustering is to implement different algorithms to train data in different clusters. The Kmeans model is trained over pre-processed data and the model is saved for further use in prediction. After clusters are created, we find the best model for each cluster. We are using four algorithms, "Random Forest" “K Neighbours”, “Logistic Regression” and "XGBoost". For each cluster, both the algorithms are passed with the best parameters derived from GridSearch. We calculate the AUC scores for both models and select the model with the best score. Similarly, the best model is selected for each cluster. All the models for every cluster are saved for use in prediction. In the end, the confusion matrix of the model associated with every cluster is also saved to give a glance at the performance of the models.

Prediction:

In data prediction, first, the essential directories are created. The data validation, data insertion and data processing steps are similar to the training section. The KMeans model created during training is loaded, and clusters for the pre-processed prediction data is predicted. Based on the cluster number, the respective model is loaded and is used to predict the data for that cluster. Once the prediction is made for all the clusters, the predictions along with the Wafer names are saved in a CSV file at a given location.

Deployment:

We will be deploying the model to Heroku Cloud.

Owner
Arun Singh Babal
Engineer | Data Science Enthusiasts | Machine Learning | Deep Learning | Advanced Computer Vision.
Arun Singh Babal
Socorro is the Mozilla crash ingestion pipeline. It accepts and processes Breakpad-style crash reports. It provides analysis tools.

Socorro Socorro is a Mozilla-centric ingestion pipeline and analysis tools for crash reports using the Breakpad libraries. Support This is a Mozilla-s

Mozilla Services 552 Dec 19, 2022
Broken Link Finder is a Burp Extension to detect broken links for a passive scanning domains and links.

Broken Link Finder Broken Link Finder is a Burp Extension to detect broken links for a passive scanning domains and links. Inspired by InitRoot's link

Red Section 10 Sep 11, 2021
Procscan is a quick and dirty python script used to look for potentially dangerous api call patterns in a Procmon PML file.

PROCSCAN Procscan is a quick and dirty python script used to look for potentially dangerous api call patterns in a Procmon PML file. Installation git

Daniel Santos 9 Sep 02, 2022
Program Input Nilai Mahasiswa Menggunakan Fungsi

PROGRAM INPUT NILAI MAHASISWA MENGGUNAKAN FUNGSI Nama : Maulana Reza Badrudin Nim : 312110510 Matkul : Bahas Pemograman DESKRIPSI Deklarasi dicti

Maulana Reza Badrudin 1 Jan 05, 2022
An kind of operating system portal to a variety of apps with pure python

pyos An kind of operating system portal to a variety of apps. Installation Run this on your terminal: git clone https://github.com/arjunj132/pyos.git

1 Jan 22, 2022
A collection of existing KGQA datasets in the form of the huggingface datasets library, aiming to provide an easy-to-use access to them.

KGQA Datasets Brief Introduction This repository is a collection of existing KGQA datasets in the form of the huggingface datasets library, aiming to

Semantic Systems research group 21 Jan 06, 2023
Structured Exceptions for Python

XC: Structured exceptions for Python XC encourages a structured, disciplined approach to use of exceptions: it reduces the overhead of declaring excep

Bob Gautier 2 May 28, 2021
A simple wrapper for joy library

Joy CodeGround A simple wrapper for joy library to render joy sketches in browser using vs code, (or in other words, for those who are allergic to Jup

rijfas 9 Sep 08, 2022
Hydralit package is a wrapping and template project to combine multiple independant Streamlit applications into a multi-page application.

Hydralit The Hydralit package is a wrapping and template project to combine multiple independant (or somewhat dependant) Streamlit applications into a

Jackson Storm 108 Jan 08, 2023
This repository contains each day of Advent of Code 2021 that I've done.

Advent of Code - 2021 I will use this repository as my Advent of Code1 (AoC) repo for the 2021 challenge. I'm changing how I am tackling the problems

Brett Chapin 2 Jan 12, 2022
A tool to help calculate how to split conveyors in Satisfactory into specific ratios.

Satisfactory Splitter Calculator A tool to help calculate how to split conveyors in Satisfactory into specific ratios. Dependencies Python 3.9 PyYAML

RobotiCat 5 Dec 22, 2022
Tie together `drf-spectacular` and `djangorestframework-dataclasses` for easy-to-use apis and openapi schemas.

Speccify Tie together drf-spectacular and djangorestframework-dataclasses for easy-to-use apis and openapi schemas. Usage @dataclass class MyQ

Lyst 4 Sep 26, 2022
A flexible free and unlimited python tool to translate between different languages in a simple way using multiple translators.

deep-translator Translation for humans A flexible FREE and UNLIMITED tool to translate between different languages in a simple way using multiple tran

Nidhal Baccouri 806 Jan 04, 2023
Intelligent Employer Profiling Platform.

Intelligent Employer Profiling Platform Setup Instructions Generating Model Data Ensure that Python 3.9+ and pip is installed. Install project depende

Harvey Donnelly 2 Jan 09, 2022
RISE allows you to instantly turn your Jupyter Notebooks into a slideshow

RISE RISE allows you to instantly turn your Jupyter Notebooks into a slideshow. No out-of-band conversion is needed, switch from jupyter notebook to a

Damian Avila 3.4k Jan 04, 2023
Discord's own Dumbass made for shits n' Gigs!

FWB3 Discord's own Dumbass made for shits n' Gigs! Please note: This bot is made to be stupid and funny, If you want to get into bot development you'r

1 Dec 06, 2021
Process RunGap output file of a workout and load data into Apple Numbers Spreadsheet and my website with API calls

BSD 3-Clause License Copyright (c) 2020, Mike Bromberek All rights reserved. ProcessWorkout Exercise data is exported in JSON format to iCloud using

Mike Bromberek 1 Jan 03, 2022
Nateve transpiler developed with python.

Adam Adam is a Nateve Programming Language transpiler developed using Python. Nateve Nateve is a new general domain programming language open source i

Nateve 7 Jan 15, 2022
The newest contender in Server Gateway Interface.

nsgi The newest contender in Server Gateway Interface. Why use this webserver? This webserver is made with the newest version of asyncio, and sockets,

OpenRobot 1 Feb 12, 2022
Find your desired product in Digikala using this app.

Digikala Search Find your desired product in Digikala using this app. با این برنامه محصول مورد نظر خود را در دیجیکالا پیدا کنید. About me Full name: M

Matin Ardestani 17 Sep 15, 2022