Wafer Fault Detection using MLOps Integration

Overview

This is an end-to-end machine learning project with MLOps integration for predicting the quality of wafer sensors.

Demo

  • Link

Table of Contents

  • Problem Statement
  • How to run the application
  • Proposed Solution and Architecture
  • WorkFlow of project
  • Technologies used

Problem Statement

Improper maintenance of a machine or system worsens its mean time between failures (MTBF). Manual diagnostic procedures tend to extend downtime when the system breaks down. This project applies machine learning to Internet of Things (IoT) sensor data for predictive maintenance, determining whether a sensor needs to be replaced.

How to implement the project

  • Create a conda environment
conda create -n waferops python=3.6.9
  • Activate the environment
conda activate waferops
  • Install the requirements.txt file
pip install -r requirements.txt

Before running the project in a local environment (a personal PC or laptop), start the MLflow server by running this command in a new terminal.

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root artifacts --host 0.0.0.0 -p 5000

Once the MLflow server is running, open another terminal and run the following command. Because the application uses FastAPI, it is started with uvicorn.

uvicorn main:app --reload

WorkFlow of the Project

To solve the problem statement, we propose a customized machine learning approach.

At the start of a machine learning project, we sign a data sharing agreement with the client, in which we sign off on parameters such as:

  • Format of data, such as CSV or JSON
  • Number of columns
  • Length of date stamp in the file
  • Length of time stamp in the file
  • Data type of each sensor, such as float, int or string

The client will send multiple sets of files in batches to a given location. In our case, the data consists of wafer names and 590 columns of sensor values for each wafer. The last column holds the Good/Bad value for each wafer, as per the data sharing agreement:

  • +1 indicates bad wafer
  • -1 indicates good wafer

These details can be found in the schema training JSON file. More details are available in the project's LLD documentation.
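
As a rough illustration of how the agreed parameters could be captured and checked, here is a minimal sketch. The key names (`NumberOfColumns`, `LengthOfDateStampInFile`, `LengthOfTimeStampInFile`, `ColumnTypes`), the toy column list and the helper functions are all assumptions made for this example; they are not taken from the project's actual schema file.

```python
import json

# Hypothetical schema: key names, lengths and columns are illustrative only.
SCHEMA_JSON = """
{
  "NumberOfColumns": 5,
  "LengthOfDateStampInFile": 8,
  "LengthOfTimeStampInFile": 6,
  "ColumnTypes": {
    "Wafer": "string",
    "Sensor-1": "float",
    "Sensor-2": "float",
    "Sensor-3": "float",
    "Good/Bad": "int"
  }
}
"""


def load_schema(text):
    """Parse the schema and check that it is internally consistent."""
    schema = json.loads(text)
    if schema["NumberOfColumns"] != len(schema["ColumnTypes"]):
        raise ValueError("NumberOfColumns does not match ColumnTypes")
    return schema


def header_matches_schema(header, schema):
    """Return True when a file's header row lists exactly the agreed columns."""
    return list(header) == list(schema["ColumnTypes"])


def label_is_valid(value):
    """Labels follow the agreement: +1 is a bad wafer, -1 is a good wafer."""
    return value in (1, -1)
```

A file whose header fails `header_matches_schema` would be treated as not conforming to the agreement.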

Technical Aspects of the Project

As discussed, the client will send multiple sets of files in batches to a given location. After signing the data sharing agreement, we create the master data management, which is simply the schema training JSON file and the schema prediction JSON file (the latter is used for prediction data). We have divided the project into multiple modules; at a high level, some of them are:

Training Validation

In this module, we trigger the training validation pipeline, which is responsible for validating the training data. This pipeline internally triggers several functions, including:

  • Training raw data validation - This function validates the raw data against the schema training JSON file, using a manually created regex pattern to validate the filenames. It also validates the length of the date stamp and the length of the time stamp. Files that do not match the criteria of the master data management are moved to the bad folder and are not used for training or prediction.

  • Data Transformation - The previous step creates good and bad directories for storing the data according to the master data management. Data transformation is performed only on the good data folder, where missing values are replaced with NaN values.

  • DataBase Operation - With the data validated and transformed, it is ready for training. The database used is SQLite. Data from the good folder is inserted into a database. After the insertion is complete, the good data folder is deleted and the bad folder is moved to an archive folder. Once the good data is in the database, it is extracted and converted to CSV format.
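
The filename check in the raw data validation step can be sketched as follows. The naming convention (`wafer_<date>_<time>.csv`), the stamp lengths of 8 and 6, and the function names are assumptions for illustration; the project's real regex pattern and schema values may differ.

```python
import re

# Assumed convention: wafer_<date stamp>_<time stamp>.csv
# The stamp lengths below are hypothetical stand-ins for the schema values.
DATE_STAMP_LENGTH = 8
TIME_STAMP_LENGTH = 6
FILENAME_PATTERN = re.compile(r"wafer_(\d+)_(\d+)\.csv")


def is_valid_filename(name):
    """Check the overall pattern first, then the two stamp lengths."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return False
    date_stamp, time_stamp = match.groups()
    return (
        len(date_stamp) == DATE_STAMP_LENGTH
        and len(time_stamp) == TIME_STAMP_LENGTH
    )


def split_good_bad(filenames):
    """Sort filenames into the good and bad groups described above."""
    good, bad = [], []
    for name in filenames:
        (good if is_valid_filename(name) else bad).append(name)
    return good, bad
```

In the pipeline, the two returned lists would correspond to the files placed in the good folder and the bad folder.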

Training Model

In the previous pipeline, after the database operation, the good data was exported from the database to CSV format. The training model pipeline starts by fetching the data from the exported CSV file.
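
The insert-then-export round trip through SQLite might look like the sketch below, which uses only the Python standard library. The table name `good_raw_data`, the reduced set of columns and both function names are assumptions for this example, not the project's actual schema.

```python
import csv
import io
import sqlite3


def load_rows(conn, rows):
    """Insert validated rows into a table (hypothetical name and columns)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS good_raw_data "
        "(wafer TEXT, sensor_1 REAL, sensor_2 REAL, label INTEGER)"
    )
    conn.executemany("INSERT INTO good_raw_data VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def export_to_csv(conn, handle):
    """Write the whole table, header first, to an open text file handle."""
    cursor = conn.execute("SELECT * FROM good_raw_data ORDER BY rowid")
    header = [column[0] for column in cursor.description]
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(cursor)  # NULL values are written as empty fields
    return header


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_rows(connection, [("Wafer-1", 0.5, None, -1)])
    buffer = io.StringIO()
    export_to_csv(connection, buffer)
    print(buffer.getvalue())
```

Passing a file opened with `open(path, "w", newline="")` instead of the in-memory buffer would produce the exported CSV file on disk.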

Next comes preprocessing, which includes removing columns, separating the label from the features, imputing missing values if present, and dropping columns with zero standard deviation.
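
These preprocessing steps can be sketched with pandas as shown below. The column names `Wafer` and `Good/Bad` and the use of mean imputation are assumptions for illustration; the project may use different names and a different imputation method.

```python
import numpy as np
import pandas as pd


def preprocess(df, label_column="Good/Bad", id_column="Wafer"):
    """Return (features, labels) ready for clustering and training."""
    # Remove the identifier column and separate the label from the features.
    features = df.drop(columns=[id_column, label_column])
    labels = df[label_column]

    # Impute missing values. Assumption: column mean is used here only to
    # keep the sketch small; it is not necessarily the project's imputer.
    features = features.fillna(features.mean())

    # Drop columns whose standard deviation is zero, since a constant
    # column carries no information for the models.
    std = features.std()
    constant_columns = std[std == 0].index
    features = features.drop(columns=constant_columns)
    return features, labels


if __name__ == "__main__":
    frame = pd.DataFrame(
        {
            "Wafer": ["w1", "w2", "w3"],
            "s1": [1.0, np.nan, 3.0],
            "s2": [5.0, 5.0, 5.0],
            "Good/Bad": [-1, 1, -1],
        }
    )
    X, y = preprocess(frame)
    print(X)
```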

As mentioned, we solve the problem using a customized machine learning approach. We create clusters of data that represent the variation in the data. Clustering is based on the K-Means algorithm.

For every cluster, two machine learning models are trained, RandomForest and XGBoost, with GridSearchCV as the hyperparameter tuning technique. The metrics monitored are accuracy and ROC AUC score.
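
The cluster-then-train idea can be sketched with scikit-learn as follows. This is a simplified illustration under several assumptions: the number of clusters is passed in rather than chosen automatically, the parameter grid is a toy one, and only RandomForest is trained so that the sketch does not depend on the xgboost package (an XGBoost classifier could be tuned in the same loop). The function name `train_per_cluster` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split


def train_per_cluster(X, y, n_clusters=3, random_state=0):
    """Cluster the data with K-Means, then tune one model per cluster."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    cluster_ids = kmeans.fit_predict(X)

    results = {}
    for cluster in range(n_clusters):
        mask = cluster_ids == cluster
        X_train, X_test, y_train, y_test = train_test_split(
            X[mask], y[mask], test_size=0.3,
            stratify=y[mask], random_state=random_state,
        )
        # Toy grid (assumption); a real search would cover more values.
        search = GridSearchCV(
            RandomForestClassifier(random_state=random_state),
            param_grid={"n_estimators": [10, 50], "max_depth": [2, 3]},
            cv=3,
        )
        search.fit(X_train, y_train)
        best = search.best_estimator_
        # Column 1 of predict_proba is the probability of the +1 (bad) class.
        scores = best.predict_proba(X_test)[:, 1]
        results[cluster] = {
            "model": best,
            "params": search.best_params_,
            "accuracy": accuracy_score(y_test, best.predict(X_test)),
            "roc_auc": roc_auc_score(y_test, scores),
        }
    return kmeans, results
```

The sketch expects NumPy arrays and assumes every cluster contains enough samples of both classes for a stratified split; real data may need a guard for clusters that hold a single class.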

After training all the models, we save them to the trained models folder.

With the models saved to the trained models folder, the MLOps part comes into the picture: for every cluster, we log the parameters, metrics and models to the MLflow server. On successful completion of training and logging, the next pipeline is triggered, which is the load production model pipeline.

Since the trained models have different metrics and parameters, we can productionize them based on their metrics. For this project we trained 6 models, and we productionize 3 of them along with the KMeans model for the prediction service.
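
The selection logic can be illustrated in plain Python. The sketch assumes one RandomForest and one XGBoost run per cluster across three clusters, which is consistent with 6 trained and 3 productionized models but is not stated explicitly above. The record layout (`name`, `cluster`, `roc_auc`) and the function name are hypothetical, and the sketch does not call MLflow itself.

```python
def select_production_models(runs, metric="roc_auc"):
    """Mark the best run of each cluster as Production, the rest as Staging.

    `runs` is a list of dicts with hypothetical keys "name", "cluster"
    and the chosen metric. On a tie, the earlier run is kept.
    """
    best = {}
    for run in runs:
        cluster = run["cluster"]
        if cluster not in best or run[metric] > best[cluster][metric]:
            best[cluster] = run

    stages = {}
    for run in runs:
        is_best = best[run["cluster"]] is run
        stages[run["name"]] = "Production" if is_best else "Staging"
    return stages
```

The returned mapping mirrors the Staging and Production stages that the model registry would show for each logged model.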

Here is a glimpse of the MLflow server showing the stages of the models (Staging or Production, based on metrics):

mlflow server image

Prediction pipeline

The prediction pipeline runs prediction validation followed by prediction from the models. It applies the same validation steps as training, such as validating file names, and then preprocesses the prediction data. For prediction, we load the trained KMeans model and predict the cluster of each record; for every cluster, the corresponding model is loaded and used for prediction. The predictions are saved to the predictions.csv file, which completes the prediction.
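
The routing of records to per-cluster models can be sketched as below. It assumes the KMeans model and the per-cluster models have already been loaded into memory, and that `cluster_models` is a dict keyed by cluster number; both the function name and that dict layout are assumptions for this example.

```python
import numpy as np


def predict_by_cluster(kmeans, cluster_models, X):
    """Assign each row to a cluster, then predict with that cluster's model.

    `kmeans` needs a predict(X) method returning cluster numbers, and
    `cluster_models` maps each cluster number to a fitted classifier.
    """
    cluster_ids = kmeans.predict(X)
    predictions = np.empty(len(X), dtype=int)
    for cluster in np.unique(cluster_ids):
        mask = cluster_ids == cluster
        model = cluster_models[int(cluster)]
        # Write each cluster's predictions back in the original row order.
        predictions[mask] = model.predict(X[mask])
    return predictions
```

The resulting array keeps the input row order, so it can be written next to the wafer names when saving predictions.csv.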

Technologies Used

  • Python
  • scikit-learn
  • FastAPI
  • Machine Learning
  • NumPy
  • Pandas
  • MLflow
  • SQLite

Algorithms Used

  • Random Forest
  • XGBoost

Metrics

  • Accuracy
  • ROC AUC score

Cloud Deployment

  • AWS

Owner

Sethu Sai Medamallela
Aspiring Machine Learning Engineer