Sign Language Recognition Based on OpenPose and Image Classification

Overview

Sign language recognition


0. Models used

(1). OpenPose, by CMU-Perceptual-Computing-Lab

https://github.com/CMU-Perceptual-Computing-Lab/openpose

(2). Image classification (classification), by Bubbliiiing

https://github.com/bubbliiiing/classification-pytorch

Accompanying Bilibili video: https://www.bilibili.com/video/BV143411B7wg

(3). Sign language tutorial videos, by 二碳碳

https://www.bilibili.com/video/BV1XE41137LV

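(Many thanks to these authors for their open-source projects and tutorials; all of them have been starred on GitHub and liked, tipped and favorited on Bilibili.)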




1. General approach

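Method 1: Feed the video into OpenPose to detect how the keypoints move over time, draw those trajectories onto a single image, and pass that image to an image classification network to decide which sign it shows.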

Video  ----->  |  openpose  |-----> keypoint trajectory image -------> |  image classification model  | ----------> word class
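The trajectory-drawing step of Method 1 can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the keypoints for the whole clip are already available as an array of shape (T, J, 2) holding (x, y) pixel coordinates per frame and joint, with NaN marking a missed detection, and it rasterizes each joint's path as a polyline on a blank grayscale canvas using only NumPy. The function name `draw_trajectories` and the 224x224 canvas size are hypothetical.

```python
import numpy as np

def draw_trajectories(keypoints, height=224, width=224):
    """Rasterize each joint's path across frames onto one grayscale image.

    keypoints: array of shape (T, J, 2) with (x, y) pixel coordinates
    (assumed input format; NaN marks a joint that was not detected).
    Returns a uint8 image of shape (height, width); trajectory pixels are 255.
    """
    kp = np.asarray(keypoints, dtype=float)
    canvas = np.zeros((height, width), dtype=np.uint8)
    num_frames, num_joints, _ = kp.shape
    for j in range(num_joints):
        for t in range(num_frames - 1):
            p0, p1 = kp[t, j], kp[t + 1, j]
            if np.isnan(p0).any() or np.isnan(p1).any():
                continue  # skip segments that touch a missed detection
            # One sample per pixel of the longer axis keeps the line gap-free.
            n = int(np.ceil(np.abs(p1 - p0).max())) + 1
            alphas = np.linspace(0.0, 1.0, n)
            pts = p0[None, :] + alphas[:, None] * (p1 - p0)[None, :]
            xs = np.round(pts[:, 0]).astype(int)
            ys = np.round(pts[:, 1]).astype(int)
            inside = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
            canvas[ys[inside], xs[inside]] = 255  # image is indexed [row, col]
    return canvas

# Usage: one joint moving right, then down, over three frames.
kp = np.array([[[10, 10]], [[100, 10]], [[100, 100]]], dtype=float)
img = draw_trajectories(kp)
```

Drawing onto one fixed-size image turns a variable-length clip into a single picture, which is what lets an ordinary image classifier handle the sign.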

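Method 2: Feed the video into OpenPose to detect the keypoint positions in every frame, then stack the frames into a 3D tensor in which two dimensions are the image width and height and the third is time, and train and predict on this tensor with a 3D convolutional network.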

Video  ----->  |  openpose  |-----> per-frame keypoint position images ---------> |  stacking  | --------> 3D tensor -------> |  3D CNN  | ----------> word class
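The stacking step of Method 2 can be sketched with NumPy. Assumptions: each frame's keypoints arrive as a (J, 2) array of (x, y) pixel coordinates, each frame is rendered as a binary map with a 1 at every detected keypoint (the project may instead draw heatmaps or skeletons), and the 112x112 size and function names are hypothetical. The final (N, C, T, H, W) layout is the input layout PyTorch's `torch.nn.Conv3d` expects, with time used as the depth axis.

```python
import numpy as np

def render_keypoint_map(points, height, width):
    """Render one frame's keypoints, shape (J, 2) of (x, y), as a binary H x W map."""
    frame = np.zeros((height, width), dtype=np.float32)
    for x, y in np.asarray(points, dtype=float):
        if np.isnan(x) or np.isnan(y):
            continue  # joint not detected in this frame
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:
            frame[row, col] = 1.0
    return frame

def stack_frames(keypoints_per_frame, height=112, width=112):
    """Stack T per-frame maps into a (T, H, W) volume, then add batch and
    channel axes to get the (N, C, T, H, W) layout used by 3D convolutions."""
    frames = [render_keypoint_map(p, height, width) for p in keypoints_per_frame]
    volume = np.stack(frames, axis=0)      # (T, H, W): time is the depth axis
    return volume[np.newaxis, np.newaxis]  # (1, 1, T, H, W)

# Usage: a 3-frame clip with 2 joints; the second joint is lost in the last frame.
clip = [
    np.array([[10, 20], [30, 40]], dtype=float),
    np.array([[11, 21], [31, 41]], dtype=float),
    np.array([[12, 22], [np.nan, np.nan]]),
]
batch = stack_frames(clip)
```

Unlike Method 1, this keeps the frame order as an explicit axis, so the 3D convolution can learn motion directly instead of from a flattened drawing.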



2. Environment setup

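python: 3.7 (other versions prevent OpenPose from running; an Anaconda Python environment is recommended)
cuda: 10
cudnn: 7 or 8 should both work
(Setting up CUDA and cuDNN can be tedious. If you would rather skip it, you can download the CPU version from the OpenPose GitHub page; the version bundled here most likely does not support CPU.)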


Setup steps:

(0). Install Python, CUDA and cuDNN yourself.


(1). Download the files

After downloading the code, also download the models and data from Baidu Netdisk (the project will not run without them):

Link: https://pan.baidu.com/s/1Q2aVVhMhSfWL4qKS9QslkQ
Extraction code: abcd

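Merge the folder downloaded from GitHub with the one downloaded from the netdisk, then continue with the next step. (You can of course also just ask me for a USB drive with the complete files.)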


(2). Install the libraries listed in requirements.txt

Open a command prompt, activate the environment, cd into the project folder and run:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

(3). Install torch and torchvision

First download the whl files for torch (1.2.0) and torchvision (0.4.0) from:

Link: https://pan.baidu.com/s/1QIuJfEE5qQFpXY8ZlHeLNQ
Extraction code: abcd

(Again, you can also just get them from me on a USB drive.)

Once the torch and torchvision whl files are downloaded, open a command prompt, activate the environment, cd into the download folder and run:

pip install [file name of the torch or torchvision whl file]

(Install torch first and torchvision second; otherwise you may get errors.)
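To confirm the setup before running OpenPose, a small check script can help. This is a hypothetical helper, not part of the project: it compares the running Python and the installed torch/torchvision against the versions named above (Python 3.7, torch 1.2.0, torchvision 0.4.0) and reports whether torch can see a CUDA device. A prefix match is used because CUDA builds of torch may append a suffix such as "+cu92" to the version string.

```python
import importlib
import sys

# Versions named in this README; adjust if you installed different wheels.
EXPECTED = {"python": "3.7", "torch": "1.2.0", "torchvision": "0.4.0"}

def check_environment():
    """Return installed versions, CUDA visibility, and per-item match flags."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    modules = {}
    # Import torch before torchvision, mirroring the install order above.
    for name in ("torch", "torchvision"):
        try:
            modules[name] = importlib.import_module(name)
            report[name] = modules[name].__version__
        except Exception:  # not installed, or a mismatched torch/torchvision pair
            report[name] = None
    torch = modules.get("torch")
    report["cuda_available"] = bool(torch is not None and torch.cuda.is_available())
    report["matches"] = {
        name: report[name] is not None and report[name].startswith(want)
        for name, want in EXPECTED.items()
    }
    return report

if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```

If "cuda_available" is False on a GPU machine, the CUDA/cuDNN installation or the torch wheel (CPU vs. CUDA build) is the first thing to check.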



3. Test-running OpenPose

The project folder contains three scripts:

test.py
test_video.py
test_video_track_point.py

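They use OpenPose to, respectively: detect keypoints in an image, detect keypoints in a video, and detect keypoints in a video while drawing their trajectories.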
See the comments in each file for usage details.
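Drawing trajectories first requires turning per-frame detections into per-joint tracks. OpenPose reports each keypoint as an (x, y, confidence) triplet, and undetected keypoints come back with a confidence of 0. The sketch below assumes one person per frame given as a (J, 3) array; the function name `collect_tracks` and the 0.1 threshold are hypothetical choices, not taken from test_video_track_point.py.

```python
import numpy as np

def collect_tracks(frames, conf_threshold=0.1):
    """Turn per-frame keypoints into per-joint tracks.

    frames: iterable of (J, 3) arrays, one per video frame, holding
    (x, y, confidence) for a single person (assumed format).
    Returns an array of shape (T, J, 2); joints whose confidence is below
    conf_threshold are stored as NaN so a drawing step can skip them.
    """
    tracks = []
    for kp in frames:
        kp = np.asarray(kp, dtype=float)
        xy = kp[:, :2].copy()
        xy[kp[:, 2] < conf_threshold] = np.nan  # drop unreliable detections
        tracks.append(xy)
    return np.stack(tracks, axis=0)

# Usage: two frames, two joints; the second joint is never confidently detected.
frames = [
    np.array([[10, 20, 0.9], [0, 0, 0.0]]),
    np.array([[12, 22, 0.8], [5, 5, 0.05]]),
]
tracks = collect_tracks(frames)
```

Marking low-confidence points as NaN instead of keeping (0, 0) avoids drawing spurious lines to the image corner whenever a hand briefly leaves the frame.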


In test_video_track_point.py, uncomment the last few lines to send the drawn trajectory image to classification for prediction.
(However, the classifier is not finished yet.)
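Before a trajectory image reaches the classifier, it has to match the input the network was trained on. The sketch below is an assumption-laden illustration: it supposes the classifier takes a 3-channel 224x224 image scaled to [0, 1] in (N, C, H, W) layout, and it uses a nearest-neighbour resize and replicates the grayscale channel. The function name `to_classifier_input` is hypothetical; check the code in the classification folder for the actual input size and normalization it uses.

```python
import numpy as np

def to_classifier_input(gray, size=224):
    """Convert an H x W uint8 trajectory image into a (1, 3, size, size) float batch.

    Assumptions: nearest-neighbour resize, grayscale replicated to 3 channels,
    pixel values scaled to [0, 1]. Match these to the classifier you trained.
    """
    gray = np.asarray(gray)
    h, w = gray.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    resized = gray[rows[:, None], cols[None, :]]
    rgb = np.repeat(resized[None, :, :], 3, axis=0)  # (3, size, size)
    return (rgb.astype(np.float32) / 255.0)[None]    # (1, 3, size, size)

# Usage: a 100x50 image with a single bright pixel.
img = np.zeros((100, 50), dtype=np.uint8)
img[50, 25] = 255
x = to_classifier_input(img)
```

Nearest-neighbour resizing keeps thin trajectory lines at full intensity; smoother interpolation would blur them, which may or may not matter depending on how the training images were prepared.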




4. Training and using classification

See README.md in the classification folder; its author explains everything there in detail.
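To train on trajectory images, one common way to organize the data is one sub-folder per sign word (the layout torchvision's ImageFolder expects). The sketch below builds a (class index, image path) listing from such a layout. The folder structure and the function name `list_dataset` are assumptions for illustration; follow classification/README.md for the format that code actually reads.

```python
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")

def list_dataset(root):
    """Scan root/<class_name>/<image> and return (classes, samples).

    classes: sorted list of class (sign word) folder names.
    samples: list of (class_index, image_path) pairs, in sorted order.
    """
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    samples = []
    for index, name in enumerate(classes):
        folder = os.path.join(root, name)
        for fname in sorted(os.listdir(folder)):
            if fname.lower().endswith(IMAGE_EXTS):  # skip non-image files
                samples.append((index, os.path.join(folder, fname)))
    return classes, samples
```

Sorting the folder names keeps the class indices stable between training and prediction, so the same index always maps to the same word.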
