MTCNN face detection implementation for TensorFlow, as a PIP package.

Last update: Dec 30, 2022

Overview

MTCNN

Implementation of the MTCNN face detector for Keras in Python 3.4+. It is written from scratch, using David Sandberg's MTCNN implementation in Facenet (FaceNet's MTCNN) as a reference. It is based on the paper by Zhang, K. et al. (2016) [ZHANG2016].

INSTALLATION

Currently, only Python 3.4 onwards is supported. The package can be installed through pip:

$ pip install mtcnn

This implementation requires OpenCV>=4.1 and Keras>=2.0.0 (any TensorFlow version supported by Keras is supported by this MTCNN package). If this is the first time you use TensorFlow, you will probably need to install it on your system:

$ pip install tensorflow

or with conda:

$ conda install tensorflow

Note that the tensorflow-gpu version can be used instead if a GPU device is available on the system, which will speed up detection.
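To confirm what is installed before running the detector, a small check such as the following may help. It is only a sketch: it relies on the standard-library importlib.metadata module (Python 3.8+), and the distribution names in REQUIRED are assumptions (in particular, OpenCV is published under several names, "opencv-python" being just one of them).

```python
from importlib import metadata

# Distribution names are assumptions; adjust them to your environment
# (for example, OpenCV may be installed as a headless build instead).
REQUIRED = ("mtcnn", "tensorflow", "keras", "opencv-python")


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def report(distributions=REQUIRED):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

A value of None only means that no distribution with that exact name was found; the library may still be available under a different distribution name.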

USAGE

The following example illustrates the ease of use of this package:

>>> from mtcnn import MTCNN
>>> import cv2
>>>
>>> img = cv2.cvtColor(cv2.imread("ivan.jpg"), cv2.COLOR_BGR2RGB)
>>> detector = MTCNN()
>>> detector.detect_faces(img)
[
    {
        'box': [277, 90, 48, 63],
        'keypoints':
        {
            'nose': (303, 131),
            'mouth_right': (313, 141),
            'right_eye': (314, 114),
            'left_eye': (291, 117),
            'mouth_left': (296, 143)
        },
        'confidence': 0.99851983785629272
    }
]

The detector returns a list of JSON objects (Python dictionaries), one per detected face. Each object contains three main keys: 'box', 'confidence' and 'keypoints':

The bounding box is formatted as [x, y, width, height] under the key 'box'.
The confidence, under the key 'confidence', is the probability that the bounding box matches a face.
The keypoints are formatted into a JSON object with the keys 'left_eye', 'right_eye', 'nose', 'mouth_left', 'mouth_right'. Each keypoint is identified by a pixel position (x, y).
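As an illustration of working with this result format, the sketch below converts a 'box' into corner coordinates and filters detections by confidence. The helper names box_to_corners and filter_by_confidence and the 0.9 threshold are hypothetical choices made for this example; they are not part of the package. The sample detection is copied from the usage output above, so the code runs without a detector.

```python
# Sample detection copied from the usage output above.
detections = [
    {
        "box": [277, 90, 48, 63],
        "keypoints": {
            "nose": (303, 131),
            "mouth_right": (313, 141),
            "right_eye": (314, 114),
            "left_eye": (291, 117),
            "mouth_left": (296, 143),
        },
        "confidence": 0.99851983785629272,
    }
]


def box_to_corners(box):
    """Convert [x, y, width, height] into (x1, y1, x2, y2)."""
    x, y, width, height = box
    return x, y, x + width, y + height


def filter_by_confidence(results, threshold=0.9):
    """Keep detections whose confidence is at least `threshold` (hypothetical default)."""
    return [face for face in results if face["confidence"] >= threshold]


if __name__ == "__main__":
    for face in filter_by_confidence(detections):
        print(box_to_corners(face["box"]))  # prints (277, 90, 325, 153)
```

The corner form is convenient for drawing or cropping, since many image APIs expect a top-left and a bottom-right point rather than a width and height.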

Another good example of usage can be found in the file "example.py", located in the root of this repository. You can also run the Jupyter Notebook "example.ipynb" for a further example.

BENCHMARK

The following tables show the benchmark of this MTCNN implementation running on an Intel i7-3612QM CPU @ 2.10GHz, with a CPU-based TensorFlow 1.4.1.

Pictures containing a single frontal face:

Image size	Total pixels	Process time	FPS
460x259	119,140	0.118 seconds	8.5
561x561	314,721	0.227 seconds	4.5
667x1000	667,000	0.456 seconds	2.2
1920x1200	2,304,000	1.093 seconds	0.9
4799x3599	17,271,601	8.798 seconds	0.1

Pictures containing 10 frontal faces:

Image size	Total pixels	Process time	FPS
474x224	106,176	0.185 seconds	5.4
736x348	256,128	0.290 seconds	3.4
2100x994	2,087,400	1.286 seconds	0.7
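The measurement method behind these tables is not described, so the following is only a sketch of how comparable numbers could be collected on other hardware. The benchmark helper is hypothetical, and treating FPS as the reciprocal of the mean process time is an assumption. A stand-in workload is used so that the snippet runs without the detector; in practice it would be replaced by detector.detect_faces and a loaded image.

```python
import time


def benchmark(func, *args, repeats=5):
    """Return (mean seconds per call, calls per second) for func(*args)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    elapsed = time.perf_counter() - start
    mean_seconds = elapsed / repeats
    # Guard against a zero reading on very fast calls.
    fps = 1.0 / mean_seconds if mean_seconds > 0 else float("inf")
    return mean_seconds, fps


if __name__ == "__main__":
    # Stand-in workload; replace with detector.detect_faces and an image.
    def fake_detect(image):
        return sum(image)

    seconds, fps = benchmark(fake_detect, list(range(100_000)), repeats=3)
    print(f"{seconds:.4f} seconds per call, {fps:.1f} FPS")
```

Averaging over several repeats reduces noise, and it may be worth discarding the first call, which often includes one-off initialization cost.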

MODEL

By default, the MTCNN package bundles a face detection weights model.

The model is adapted from Facenet's MTCNN implementation, merged into a single file located inside the folder 'data' relative to the module's path. It can be overridden by injecting a different model into the MTCNN() constructor during instantiation.

The model must be NumPy-based and contain the three main keys "pnet", "rnet" and "onet", each holding the weights of every layer of the corresponding network.

For more details about the network definition, see the paper by Zhang et al. (2016) [ZHANG2016].
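A quick sanity check on a custom weights object could look like the sketch below. It works on an already-loaded mapping, because neither the loading step nor the per-layer layout inside each network is specified here. The helper names and the placeholder values are hypothetical.

```python
REQUIRED_NETWORKS = ("pnet", "rnet", "onet")


def missing_networks(weights):
    """Return the required network keys that are absent from `weights`."""
    return [name for name in REQUIRED_NETWORKS if name not in weights]


def validate_weights(weights):
    """Raise ValueError if any of the three networks is missing."""
    missing = missing_networks(weights)
    if missing:
        raise ValueError("weights are missing networks: " + ", ".join(missing))
    return True


if __name__ == "__main__":
    # Placeholder values only; real entries would hold per-layer weight arrays.
    candidate = {"pnet": {}, "rnet": {}}
    print(missing_networks(candidate))  # prints ['onet']
```

Running such a check before passing a model to the constructor turns a malformed file into an early, readable error.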

LICENSE

MIT License.

REFERENCE

[ZHANG2016]

Zhang, K., Zhang, Z., Li, Z., and Qiao, Y. (2016). Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters, 23(10):1499–1503.

Owner

Iván de Paz Centeno
