Wafer Fault Detection using MlOps Integration

Overview

Wafer Fault Detection using MlOps Integration

This is an end to end machine learning project with MlOps integration for predicting the quality of wafer sensors.

Demo

  • Link

Table of Contents

  • Problem Statement
  • How to run the application
  • Technologies used
  • Proposed Solution and Architecture
  • WorkFlow of project
  • Technologies used

Problem Statement

Improper maintenance on a machine or system impacts to worsen mean time between failure (MTBF). Manual diagnostic procedures tend to extended downtime at the system breakdown. Machine learning techniques based on the internet of things (IoT) sensor data were used to make predictive maintenance to determine whether the sensor needs to be replaced or not.

How to implement the project

  • Create a conda environment
conda create -n waferops python=3.6.9
  • Activate the environment
conda activate wafer-ops
  • Install the requirements.txt file
pip install -r requirements.txt

Before running the project atleast in local environment (personal pc or laptop) run this command in new terminal, basically run the mlflow server.

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root artifacts --host 0.0.0.0 -p 5000

After running the mlflow server in new terminal, open another terminal and run the following command, since we are using fastapi. The command to run the application will change a bit

uvicorn main:app --reload

WorkFlow of the Project

To solve the problem statement we have proposed a customized machine learning approach.

WorkFlow of Project

In the first place, whenever we start a machine learning project, we need to sign a data sharing agreement with the client, where sign off some of the parameters like,

  • Format of data - like csv format or json format,etc
  • Number of Columns
  • Length of date stamp in the file
  • Length of time stamp in the file
  • DataType of each sensor - like float,int,string

The client will send multiple set of files in batches at a given location. In our case, the data which will be given to us, will consist of wafer names and 590 columns of different sensor values for each wafer. The last column will have Good/Bad value for each wafer as per the data sharing agreement

  • +1 indicates bad wafer
  • -1 indicates good wafer

These data can be found in the schema training json file.More details are present in LLD documentation of project.

Technical Aspects of the Project

As discussed, the client will send multiple set of files in batches at a given location. After signing the data sharing agreement, we create the master data management which is nothing but the schema training json file and schema prediction json (this is be used for prediction data). We have divided the project into multiple modules, for high level understanding some of them are

Training Validation

In this module,we will trigger the training validation pipeline,which will be responsible for training validation. In the training validation pipeline,we are internally triggering some of the pipelines, some of the internal function are

  • Training raw data validation - This function is responsible for validating the raw data based on schema training json file, and we have manually created a regex pattern for validating the filename of the data. We are even validating length of date time stamp, length of time stamp of the data. If some of the data does not match the criteria of the master data management, if move that files to bad folder and will not be used for training or prediction purposes.

  • Data Transformation - Previously, we have created both good and bad directory for storing the data based on the master data management. Now for the data transformation we are only performing the data transformation on good data folder. In the data transformation, we replace the missing values with the nan values.

  • DataBase Operation - Now that we have validated the data and transformed the data which is suitable for the further training purposes. In database operation we are using SQL-Lite. From the good folder we are inserting the data into a database. After the insertion of the data is done we are deleting the good data folder and move the bad folder to archived folder. Next inserting the good database, we are extracting the data from the database and converting into csv format.

Training Model

In the previous pipeline,after the database operation, we have exported the good data from database to csv format. In the training model pipeline, we are first fetching the data from the exported csv file.

Next comes the preprocessing of the data, where we are performing some of the preprocessing functions such as remove columns, separate label feature, imputing the missing the values if present. Dropping the columns with zero standard deviation.

As mentioned we are trying to solve the problem by using customized machine learning approach.We need to create clusters of data which represents the variation of data. Clustering of the data is based on K-Means clustering algorithm.

For every cluster which has been created two machine learning models are being trained which are RandomForest and XGBoost models with GridSearchCV as the hyperparameter tuning technique. The metrics which are monitoring are accuracy and roc auc score as the metric.

After training all the models, we are saving them to trained models folders.

Now that the models are saved into the trained models folder, here the mlops part comes into picture, where in for every cluster we are logging the parameters, metrics and models to mlflow server. On successful completion of training of all the models and logging them to mlflow, next pipeline will be triggered which is load production model pipeline.

Since all the trained models, will have different metrics and parameters, which can productionize them based on metrics. For this project we have trained 6 models and we will productionize 3 models along with KMeans model for the prediction service.

Here is glimpse of the mlflow server showing stages of the models (Staging or Production based on metrics)

mlflow server image

Prediction pipeline

The prediction pipeline will be triggered following prediction validation and prediction from the model. In this prediction pipeline, the same validation steps like validating file name and so on. The prediction pipeline, and the preprocessing of prediction data. For the prediction, we will load the trained kmeans model and then predict the number of clusters, and for every cluster, model will be loaded and the prediction will be done. The predictions will saved to predictions.csv file and then prediction is completed.

Technologies Used

  • Python
  • Sklearn
  • FastAPI
  • Machine Learning
  • Numpy
  • Pandas
  • MlFlow
  • SQL-Lite

Algorithms Used

  • Random Forest
  • XGBoost

Metrics

  • Accuracy
  • ROC AUC score

Cloud Deployment

  • AWS
Owner
Sethu Sai Medamallela
Aspiring Machine Learning Engineer
Sethu Sai Medamallela
Pytorch implementation of VAEs for heterogeneous likelihoods.

Heterogeneous VAEs Beware: This repository is under construction 🛠️ Pytorch implementation of different VAE models to model heterogeneous data. Here,

Adrián Javaloy 35 Nov 29, 2022
Label-Free Model Evaluation with Semi-Structured Dataset Representations

Label-Free Model Evaluation with Semi-Structured Dataset Representations Prerequisites This code uses the following libraries Python 3.7 NumPy PyTorch

8 Oct 06, 2022
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation [arxiv] This is the official repository for CDTrans: Cross-domain Transformer for

238 Dec 22, 2022
CRISCE: Automatically Generating Critical Driving Scenarios From Car Accident Sketches

CRISCE: Automatically Generating Critical Driving Scenarios From Car Accident Sketches This document describes how to install and use CRISCE (CRItical

Chair of Software Engineering II, Uni Passau 2 Feb 09, 2022
Online Multi-Granularity Distillation for GAN Compression (ICCV2021)

Online Multi-Granularity Distillation for GAN Compression (ICCV2021) This repository contains the pytorch codes and trained models described in the IC

Bytedance Inc. 299 Dec 16, 2022
Code for paper "Multi-level Disentanglement Graph Neural Network"

Multi-level Disentanglement Graph Neural Network (MD-GNN) This is a PyTorch implementation of the MD-GNN, and the code includes the following modules:

Lirong Wu 6 Dec 29, 2022
Punctuation Restoration using Transformer Models for High-and Low-Resource Languages

Punctuation Restoration using Transformer Models This repository contins official implementation of the paper Punctuation Restoration using Transforme

Tanvirul Alam 142 Jan 01, 2023
Implementing a simplified copy of Shazam application from scratch using MinHashing and LSH.

Building Shazam from scratch In this repository we tried to implement a simplified copy of the Shazam application able to tell you the name of a song

Arturo Ghinassi 0 Nov 17, 2022
Pytorch code for "State-only Imitation with Transition Dynamics Mismatch" (ICLR 2020)

This repo contains code for our paper State-only Imitation with Transition Dynamics Mismatch published at ICLR 2020. The code heavily uses the RL mach

20 Sep 08, 2022
Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296

Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions This repo contains the dataset and code for the paper Benchmarking Ro

Jiachen Sun 168 Dec 29, 2022
Unofficial Tensorflow-Keras implementation of Fastformer based on paper [Fastformer: Additive Attention Can Be All You Need](https://arxiv.org/abs/2108.09084).

Fastformer-Keras Unofficial Tensorflow-Keras implementation of Fastformer based on paper Fastformer: Additive Attention Can Be All You Need. Tensorflo

Yam Peleg 10 Jan 30, 2022
Image restoration with neural networks but without learning.

Warning! The optimization may not converge on some GPUs. We've personally experienced issues on Tesla V100 and P40 GPUs. When running the code, make s

Dmitry Ulyanov 7.4k Jan 01, 2023
Original code for "Zero-Shot Domain Adaptation with a Physics Prior"

Zero-Shot Domain Adaptation with a Physics Prior [arXiv] [sup. material] - ICCV 2021 Oral paper, by Attila Lengyel, Sourav Garg, Michael Milford and J

Attila Lengyel 40 Dec 21, 2022
An official implementation of MobileStyleGAN in PyTorch

MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis Official PyTorch Implementation The accompanying videos c

Sergei Belousov 602 Jan 07, 2023
SpeechNAS Better Trade off between Latency and Accuracy for Large Scale Speaker Verification

SpeechNAS Better Trade off between Latency and Accuracy for Large Scale Speaker Verification

Wentao Zhu 24 May 20, 2022
Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch

alias-free-gan-pytorch Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) This implementation

Kim Seonghyeon 502 Jan 03, 2023
BMW TechOffice MUNICH 148 Dec 21, 2022
Reproduces the results of the paper "Finite Basis Physics-Informed Neural Networks (FBPINNs): a scalable domain decomposition approach for solving differential equations".

Finite basis physics-informed neural networks (FBPINNs) This repository reproduces the results of the paper Finite Basis Physics-Informed Neural Netwo

Ben Moseley 65 Dec 28, 2022
Benchmarking Pipeline for Prediction of Protein-Protein Interactions

B4PPI Benchmarking Pipeline for the Prediction of Protein-Protein Interactions How this benchmarking pipeline has been built, and how to use it, is de

Loïc Lannelongue 4 Jun 27, 2022
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)

English | 简体中文 Welcome to the PaddlePaddle GitHub. PaddlePaddle, as the only independent R&D deep learning platform in China, has been officially open

19.4k Jan 04, 2023