A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Last update: Jan 03, 2023

Overview

SVHNClassifier-PyTorch

If you're interested in C++ inference, move HERE

Results

| Steps | GPU | Batch Size | Learning Rate | Patience | Decay Step | Decay Rate | Training Speed (FPS) | Accuracy |
|---|---|---|---|---|---|---|---|---|
| 54000 | GTX 1080 Ti | 512 | 0.16 | 100 | 625 | 0.9 | ~1700 | 95.65% |
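
To make the Learning Rate, Decay Step and Decay Rate columns concrete, here is a minimal sketch of how they could combine, assuming a staircase schedule in which the rate is multiplied by the decay rate once every decay step. The staircase form and the function name `stepped_learning_rate` are assumptions for illustration; the training script may schedule the rate differently.

```python
def stepped_learning_rate(step, initial_lr=0.16, decay_step=625, decay_rate=0.9):
    """Learning rate after `step` updates under an assumed staircase schedule."""
    # Integer division keeps the rate constant inside each `decay_step` window.
    return initial_lr * decay_rate ** (step // decay_step)


# Print the assumed rate at the start, after one and two decays, and at the final step.
for step in (0, 625, 1250, 54000):
    print(step, stepped_learning_rate(step))
```

Under this assumption the rate has been decayed 86 times by step 54000, so the last steps run with a far smaller rate than the initial 0.16.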

Sample

$ python infer.py -c=./logs/model-54000.pth ./images/test-75.png
length: 2
digits: 7 5 10 10 10

$ python infer.py -c=./logs/model-54000.pth ./images/test-190.png
length: 3
digits: 1 9 0 10 10
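
The two outputs above suggest how to read a prediction: the first `length` entries of `digits` are the recognized digits, and the remaining positions hold the value 10. The sketch below turns such an output into a number string. Treating 10 as a "no digit" padding class is an assumption inferred from the samples, and `decode_prediction` is a hypothetical helper, not part of `infer.py`.

```python
NO_DIGIT = 10  # assumption: class 10 marks an unused digit position


def decode_prediction(length, digits):
    """Join the first `length` digit classes into the recognized number string."""
    used = digits[:length]
    if any(d == NO_DIGIT for d in used):
        # The length head and the digit heads disagree; refuse to guess.
        raise ValueError("padding class found inside the predicted length")
    return "".join(str(d) for d in used)


print(decode_prediction(2, [7, 5, 10, 10, 10]))  # 75
print(decode_prediction(3, [1, 9, 0, 10, 10]))   # 190
```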

Loss

Requirements

Python 3.6
torch 1.0
torchvision 0.2.1
visdom
```
$ pip install visdom
```

h5py

On Ubuntu:
$ sudo apt-get install libhdf5-dev
$ sudo pip install h5py

protobuf
```
$ pip install protobuf
```
lmdb
```
$ pip install lmdb
```
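
Before running any of the scripts, it can help to confirm that the packages listed above are importable. The following is a small sketch using only the standard library. The import names are assumptions (in particular, `protobuf` is assumed to install as `google.protobuf`), and the check does not verify version numbers.

```python
import importlib.util

# Assumed import names for the packages listed in this section.
REQUIRED_MODULES = ["torch", "torchvision", "visdom", "h5py", "google.protobuf", "lmdb"]


def is_importable(module_name):
    """Return True if Python can locate `module_name` in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. `google`) is missing.
        return False


def missing_modules(module_names=REQUIRED_MODULES):
    """Return the subset of `module_names` that cannot be located."""
    return [name for name in module_names if not is_importable(name)]


print("missing:", missing_modules())
```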

Setup

Clone the source code

$ git clone https://github.com/potterhsu/SVHNClassifier-PyTorch
$ cd SVHNClassifier-PyTorch

Download SVHN Dataset format 1

Extract it into the data folder. Your folder structure should now look like this:

SVHNClassifier
    - data
        - extra
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - test
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
        - train
            - 1.png 
            - 2.png
            - ...
            - digitStruct.mat
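
The layout above can be checked before conversion with a short script. This is a sketch under the assumption that each of the three folders only needs a `digitStruct.mat` file and at least one `.png` image; `check_svhn_layout` is a hypothetical helper and is not shipped with the repository.

```python
from pathlib import Path

SPLITS = ("extra", "test", "train")


def check_svhn_layout(data_dir):
    """Return a list of problems found under `data_dir`; an empty list means the layout matches."""
    problems = []
    root = Path(data_dir)
    for split in SPLITS:
        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing folder: {split_dir}")
            continue
        annotation_file = split_dir / "digitStruct.mat"
        if not annotation_file.is_file():
            problems.append(f"missing annotation file: {annotation_file}")
        if not any(split_dir.glob("*.png")):
            problems.append(f"no .png images in: {split_dir}")
    return problems
```

For example, `check_svhn_layout("./data")` returns one message per missing folder or file, so an empty result suggests the data is ready for the conversion step.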

Usage

(Optional) Take a glance at original images with bounding boxes
```
Open `draw_bbox.ipynb` in Jupyter
```

Convert to LMDB format

$ python convert_to_lmdb.py --data_dir ./data

(Optional) Test for reading LMDBs

Open `read_lmdb_sample.ipynb` in Jupyter

Train

$ python train.py --data_dir ./data --logdir ./logs

Retrain from a checkpoint if needed

$ python train.py --data_dir ./data --logdir ./logs_retrain --restore_checkpoint ./logs/model-100.pth
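
The checkpoint names used in this README follow the pattern `model-<number>.pth` (for example `model-100.pth` and `model-54000.pth`). Assuming the number is the training step, the sketch below picks the most recent checkpoint in a log directory, which can then be passed to `--restore_checkpoint`. `latest_checkpoint` is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path

# Assumed naming scheme: model-<step>.pth
CHECKPOINT_PATTERN = re.compile(r"^model-(\d+)\.pth$")


def latest_checkpoint(logdir):
    """Return the Path of the highest-step `model-<step>.pth` in `logdir`, or None if there is none."""
    best_step, best_path = -1, None
    for path in Path(logdir).iterdir():
        match = CHECKPOINT_PATTERN.match(path.name)
        # Compare steps numerically so model-1000.pth ranks above model-200.pth.
        if match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```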

Evaluate

$ python eval.py --data_dir ./data ./logs/model-100.pth

Visualize

$ python -m visdom.server
$ python visualize.py --logdir ./logs

Infer

$ python infer.py --checkpoint=./logs/model-100.pth ./images/test1.png

Clean

$ rm -rf ./logs
or
$ rm -rf ./logs_retrain


Owner

Potter Hsu
