
[Paper notes] - Face recognition: FaceNet - CVPR 2015

2022-07-18 17:41:00 【chaikeya】

Title: FaceNet: A Unified Embedding for Face Recognition and Clustering

FaceNet uses two network architectures: ZFNet and GoogLeNet (Inception V1). The model is trained directly with a triplet loss: it takes a face image as input and outputs a 128-dimensional feature vector in Euclidean space. Each triplet consists of two face images of one person and a third image of a different person. Training aims to make the Euclidean distance between images of the same person much smaller than the distance between images of different people.

DOI：10.1109/CVPR.2015.7298682

Date: uploaded to arXiv on 2015-03-12
Conference: CVPR 2015
Institution: Google

Paper link: https://arxiv.org/abs/1503.03832
Code link: code

Keywords: face verification, recognition, clustering; triplets

Problem:

Earlier algorithms train a classification model on a set of face images with known identities, then take the output of an intermediate layer as the face representation. This approach has two drawbacks: it is indirect and it is inefficient. It is indirect because one has to hope that the features of the chosen layer generalize well to unseen faces. It is inefficient because the learned representation is usually very high-dimensional (more than 1000 dimensions). Some methods reduce the dimensionality of the extracted features with PCA, but PCA is only a linear transformation, which a single network layer can easily learn.

Solution:

A CNN directly learns to map an input face image to a feature vector in Euclidean space, so that the smaller the Euclidean distance between the feature vectors of two images, the more likely the two images show the same person. With such a feature extraction model, face verification reduces to comparing the distance between two images against a threshold; face recognition becomes a k-NN classification problem over the feature vectors; and face clustering can be done by running a clustering algorithm such as k-means on the feature vectors.
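The verification and recognition steps above can be sketched in a few lines of NumPy. This is only an illustration, not the authors' code: the function names are hypothetical, the embeddings are assumed to come from an already trained model, and the default threshold of 1.1 is borrowed from the Figure 1 discussion in these notes. The sketch compares squared Euclidean distances, which as far as I recall is the convention the paper uses for unit-length embeddings.

```python
import numpy as np

def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.dot(diff, diff))

def same_person(emb_a, emb_b, threshold=1.1):
    """Verification: accept the pair if its distance is below the threshold.

    The default threshold is illustrative only; in practice it is tuned
    on held-out data.
    """
    return squared_distance(emb_a, emb_b) < threshold

def identify(query, gallery_embs, gallery_labels):
    """Recognition as 1-nearest-neighbour search over labelled embeddings."""
    gallery = np.asarray(gallery_embs, dtype=float)
    dists = np.sum((gallery - np.asarray(query, dtype=float)) ** 2, axis=1)
    return gallery_labels[int(np.argmin(dists))]
```

Clustering would follow the same pattern: any standard clustering routine can be run directly on the embedding vectors, since distances between them already reflect identity.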

Contributions:

A new loss function: triplet loss
The feature dimension can be reduced to as few as 128 dimensions
Strong experimental results: the best performance on public datasets (99.63% on LFW)
Little preprocessing is required

Demonstration: the numbers in Figure 1 are the Euclidean distances between image features. The intra-class distances are clearly smaller than the inter-class distances, and a threshold of about 1.1 separates the two.

Model structure:

Figure 2 shows the FaceNet model structure. The deep architecture is a deep convolutional network with the softmax layer removed. It is followed by a feature normalization step, which maps the features onto a hypersphere, and then by the newly proposed loss function, the triplet loss.

An embedding can be understood as a mapping: features are mapped from the original feature space to a new feature space, and the new features are called an embedding of the original ones.
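The normalization step that maps features onto a hypersphere amounts to dividing each feature vector by its L2 norm. A minimal sketch follows; the function name and the small `eps` guard against division by zero are assumptions added for illustration.

```python
import numpy as np

def l2_normalize(features, eps=1e-10):
    """Scale each feature vector (last axis) to unit L2 norm."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=-1, keepdims=True)
    # eps avoids dividing by zero for an all-zero vector (an added safeguard).
    return features / np.maximum(norms, eps)
```

After this step every embedding has length 1, so the squared distance between any two embeddings lies between 0 and 4.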

Triplet Loss:

As the name suggests, a triplet is a tuple (anchor, positive, negative). For a face image of a specific individual (the anchor), other face images of the same individual are positives, and face images of other individuals are negatives. Learning then aims, for as many triplets as possible, to make the distance between anchor and positive smaller than the distance between anchor and negative. As Figure 3 shows, before training the Euclidean distance between A and P may be larger than the distance between A and N. During training the distance between A and P shrinks and the distance between A and N grows, until the distance between A and P is smaller than the distance between A and N. In other words, training makes the inter-class distance larger than the intra-class distance.
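The objective described above can be written as a hinge on the difference of the two distances. The sketch below is a NumPy illustration, not the authors' implementation. It assumes the usual form max(d(A, P) - d(A, N) + margin, 0) with squared Euclidean distances; the margin is not mentioned in these notes, and the default of 0.2 is the value I recall from the paper, so treat it as a tunable assumption. The sketch averages over the batch, whereas a sum works equally well.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge loss over a batch of (anchor, positive, negative) embeddings.

    Each argument has shape (batch, dim). The margin value is an assumption.
    """
    anchor, positive, negative = (
        np.asarray(x, dtype=float) for x in (anchor, positive, negative)
    )
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)  # d(A, P)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)  # d(A, N)
    # A triplet contributes only while d(A, P) + margin exceeds d(A, N).
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))
```

A triplet that already satisfies the margin contributes zero loss, which is why the choice of triplets matters so much for training speed.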

Triplet selection:

Network structure:

The first network is ZFNet with added 1×1 convolutions. It has 140 million parameters and needs about 1.6 billion floating-point operations per image for a forward pass. This model is deployed in the data center for inference; its structure is shown in the figure below:

The second network is based on GoogLeNet and is intended for mobile devices. It has about 20 times fewer parameters and needs about five times fewer computations than the first network. Besides the full Inception-based models (NN2 to NN4), the authors define small variants named NNS1 and NNS2.

Datasets and evaluation metrics:

Experimental results:

The training set is a private dataset of 100 to 200 million images covering about 8 million identities. Faces are first detected, and the detected face crops are then scaled to sizes between 96×96 and 224×224 to train the feature extraction model. Experiments show that an embedding dimension of 128 works best, as shown in the figure below.

Finally, the accuracy on the Labeled Faces in the Wild (LFW) test set is 99.63% ± 0.09, and on the YouTube Faces DB (YTF) dataset it is 95.12% ± 0.39.

Summary:

The main difficulty in FaceNet lies in triplet selection: if the triplets are chosen poorly, training fails, so the authors use a hard-example mining strategy (see Section 3.2 of the paper for details). The dataset FaceNet uses is very large, and the overall performance is good.
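To make the mining idea concrete, here is a rough sketch of choosing a negative for one anchor-positive pair inside a mini-batch. It is an assumption-laden illustration rather than the procedure from Section 3.2: the function name is hypothetical, the rule shown is the "semi-hard" one (the closest negative that is still farther from the anchor than the positive), and the fallback to the closest negative overall is my own choice for the case where no semi-hard negative exists.

```python
import numpy as np

def pick_semi_hard_negative(embs, labels, anchor_idx, positive_idx):
    """Return the index of a negative for the given anchor-positive pair.

    Prefers the closest negative that lies farther from the anchor than the
    positive does. Falls back to the closest negative overall (an assumption
    of this sketch). Returns None if the batch holds no other identity.
    """
    embs = np.asarray(embs, dtype=float)
    labels = np.asarray(labels)
    dists = np.sum((embs - embs[anchor_idx]) ** 2, axis=1)
    pos_dist = dists[positive_idx]
    neg_idx = np.flatnonzero(labels != labels[anchor_idx])
    if neg_idx.size == 0:
        return None
    semi_hard = neg_idx[dists[neg_idx] > pos_dist]
    candidates = semi_hard if semi_hard.size > 0 else neg_idx
    return int(candidates[np.argmin(dists[candidates])])
```

Picking the very hardest negatives everywhere can destabilize training early on, which is the usual motivation for a softer rule like this one.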

