Face Recognition

This article walks through a face recognition algorithm: how to implement the triplet loss function, how to use a pre-trained model to map a face image to a 128-dimensional encoding vector, and how to compute distances between encodings to perform face verification and face recognition.

Naive Face Verification

The most straightforward approach to face verification is to compare the two images pixel by pixel: if the distance between the raw pixel values is below a chosen threshold, the images are judged to show the same person.

Figure 1
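As a concrete illustration, a naive verifier might simply threshold the L2 distance between the raw pixel values of the two images. The sketch below is only illustrative (the 96×96 resize and the threshold value are arbitrary choices, not part of the original notebook):

import cv2
import numpy as np

def naive_verify(path_a, path_b, threshold=100.0):
    # Load both images and resize them to a common size so they can be compared pixel by pixel.
    img_a = cv2.resize(cv2.imread(path_a), (96, 96)).astype(np.float32)
    img_b = cv2.resize(cv2.imread(path_b), (96, 96)).astype(np.float32)
    # L2 distance between raw pixel values; a small distance is taken to mean the same person.
    dist = np.linalg.norm(img_a - img_b)
    return dist, dist < threshold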

This algorithm performs very poorly, because pixel values change dramatically with even small variations in lighting, face orientation, or head position.

A more robust approach is to first learn an encoding function and then compare the encodings of the two images to decide whether they show the same person.

from keras.models import Sequential
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.models import Model
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.merge import Concatenate
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer
from keras import backend as K
K.set_image_data_format('channels_first')
import cv2
import os
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
from fr_utils import *
from inception_blocks_v2 import *

%matplotlib inline
%load_ext autoreload
%autoreload 2

np.set_printoptions(threshold=np.nan)
Using TensorFlow backend.

Encoding a Face Image into a 128-Dimensional Vector

Computing the Encoding with a ConvNet

The FaceNet model takes a great deal of data and time to train, so this article only loads a pre-trained model. The network architecture follows the Inception model of Szegedy et al.

Key points to note:

  • The network uses 96×96 RGB images as input. Specifically, a batch of \(m\) face images has shape \((m, n_C, n_H, n_W) = (m, 3, 96, 96)\)
  • The output is a matrix of encodings of shape \((m, 128)\) (a quick shape check is sketched after the model is loaded below)
FRmodel = faceRecoModel(input_shape=(3, 96, 96))
print("Total Params:", FRmodel.count_params())
Total Params: 3743280
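As a quick sanity check of these shapes (a minimal sketch, not part of the original notebook), you could run a dummy channels-first batch through FRmodel and confirm that the encodings come out as \((m, 128)\):

import numpy as np

# A dummy batch of 2 "images" in channels-first order (m, 3, 96, 96),
# matching K.set_image_data_format('channels_first') above.
dummy_batch = np.random.rand(2, 3, 96, 96).astype(np.float32)
encodings = FRmodel.predict(dummy_batch)
print(encodings.shape)   # expected: (2, 128)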
Figure 2: By comparing the distance between two encoding vectors, you can determine whether two pictures show the same person

An encoding is good if:

  • two pictures of the same person have very similar encodings
  • two pictures of different people have very different encodings (a minimal distance check is sketched below)
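Once images are mapped to encodings, verification reduces to a distance comparison between two 128-dimensional vectors instead of raw pixels. A minimal sketch, where enc_a and enc_b stand for two encodings produced by the model and 0.7 is the threshold used later in this article:

import numpy as np

def is_same_person(enc_a, enc_b, threshold=0.7):
    # L2 distance between two 128-dimensional encodings; a small distance means the same person.
    return np.linalg.norm(enc_a - enc_b) < threshold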

The triplet loss is designed to enforce exactly these two properties: it pushes the encodings of two images of the same person (Anchor and Positive) closer together, while pulling the encodings of two images of different people (Anchor and Negative) further apart.


Figure 3: From left to right, the images are called Anchor (A), Positive (P), Negative (N)

The Triplet Loss

For an image \(x\), its encoding is written \(f(x)\), where \(f\) is the function computed by the neural network.

Training uses triplets of images \((A, P, N)\):

  • A is the "Anchor" image: a picture of a person.
  • P is the "Positive" image: a picture of the same person as the Anchor.
  • N is the "Negative" image: a picture of a different person than the Anchor.

These triplets are drawn from the training set; \((A^{(i)}, P^{(i)}, N^{(i)})\) denotes the \(i\)-th training example.

We want to ensure that each image \(A^{(i)}\) is closer to its positive image \(P^{(i)}\) than to its negative image \(N^{(i)}\) by at least a margin \(\alpha\):

\[ \|f(A^{(i)}) - f(P^{(i)})\|_2^2 + \alpha < \|f(A^{(i)}) - f(N^{(i)})\|_2^2 \]

The triplet loss minimized over the \(m\) training triplets (the "formula (3)" referenced in the code below) is therefore

\[ \mathcal{J} = \sum_{i=1}^{m} \max\left( \|f(A^{(i)}) - f(P^{(i)})\|_2^2 - \|f(A^{(i)}) - f(N^{(i)})\|_2^2 + \alpha,\; 0 \right) \]

# GRADED FUNCTION: triplet_loss

def triplet_loss(y_true, y_pred, alpha = 0.2):
    """
    Implementation of the triplet loss as defined by formula (3)

    Arguments:
    y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
    y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 128)
            positive -- the encodings for the positive images, of shape (None, 128)
            negative -- the encodings for the negative images, of shape (None, 128)

    Returns:
    loss -- real number, value of the loss
    """

    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    ### START CODE HERE ### (≈ 4 lines)
    # Step 1: Compute the (encoding) distance between the anchor and the positive, you will need to sum over axis=-1
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative, you will need to sum over axis=-1
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=-1)
    # Step 3: subtract the two previous distances and add alpha.
    basic_loss = pos_dist - neg_dist + alpha
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.reduce_sum(tf.maximum(basic_loss, 0))
    ### END CODE HERE ###

    return loss
with tf.Session() as test:
    tf.set_random_seed(1)
    y_true = (None, None, None)
    y_pred = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed = 1),
              tf.random_normal([3, 128], mean=1, stddev=1, seed = 1),
              tf.random_normal([3, 128], mean=3, stddev=4, seed = 1))
    loss = triplet_loss(y_true, y_pred)

    print("loss = " + str(loss.eval()))
loss = 528.143

Loading the Pre-trained Model

The FaceNet model is trained by minimizing the triplet loss above. Because that requires a large amount of data and computation, this article loads an already-trained model.


Figure 4: Example: distances between the encoding vectors of three individuals
FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
load_weights_from_FaceNet(FRmodel)

Applying the Model

Face Verification

Encode each person's face image as a 128-dimensional vector and store the vectors in a Python dictionary, which serves as the database. When someone arrives at the door, the system uses the ID they present to check whether their face matches the encoding stored for that person in the database.
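The helper img_to_encoding used below comes from fr_utils. Roughly, such a helper loads the image, converts it to the channels-first \((3, 96, 96)\) layout the network expects, scales the pixel values, and runs a forward pass. The sketch below is an assumption about that preprocessing, not the exact fr_utils implementation, and it assumes the file is already a 96×96 face crop:

import cv2
import numpy as np

def img_to_encoding_sketch(image_path, model):
    # Read the image with OpenCV (BGR order) and flip the channel order to RGB.
    img = cv2.imread(image_path)[..., ::-1]
    # Scale pixels to [0, 1] and move channels first: (96, 96, 3) -> (3, 96, 96).
    img = np.transpose(img.astype(np.float32) / 255.0, (2, 0, 1))
    # Add a batch dimension and run a forward pass to obtain the (1, 128) encoding.
    return model.predict_on_batch(np.expand_dims(img, axis=0))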

database = {}
database["danielle"] = img_to_encoding("images/danielle.png", FRmodel)
database["younes"] = img_to_encoding("images/younes.jpg", FRmodel)
database["tian"] = img_to_encoding("images/tian.jpg", FRmodel)
database["andrew"] = img_to_encoding("images/andrew.jpg", FRmodel)
database["kian"] = img_to_encoding("images/kian.jpg", FRmodel)
database["dan"] = img_to_encoding("images/dan.jpg", FRmodel)
database["sebastiano"] = img_to_encoding("images/sebastiano.jpg", FRmodel)
database["bertrand"] = img_to_encoding("images/bertrand.jpg", FRmodel)
database["kevin"] = img_to_encoding("images/kevin.jpg", FRmodel)
database["felix"] = img_to_encoding("images/felix.jpg", FRmodel)
database["benoit"] = img_to_encoding("images/benoit.jpg", FRmodel)
database["arnaud"] = img_to_encoding("images/arnaud.jpg", FRmodel)
# GRADED FUNCTION: verify

def verify(image_path, identity, database, model):
    """
    Function that verifies if the person on the "image_path" image is "identity".

    Arguments:
    image_path -- path to an image
    identity -- string, name of the person whose identity you'd like to verify. Has to be a resident of the Happy house.
    database -- python dictionary mapping allowed people's names (strings) to their encodings (vectors).
    model -- your Inception model instance in Keras

    Returns:
    dist -- distance between the image_path encoding and the encoding of "identity" in the database.
    door_open -- True, if the door should open. False otherwise.
    """

    ### START CODE HERE ###

    # Step 1: Compute the encoding for the image. Use img_to_encoding() see example above. (≈ 1 line)
    encoding = img_to_encoding(image_path, model)

    # Step 2: Compute distance with identity's image (≈ 1 line)
    dist = np.linalg.norm(database[identity] - encoding)

    # Step 3: Open the door if dist < 0.7, else don't open (≈ 3 lines)
    if dist < 0.7:
        print("It's " + str(identity) + ", welcome home!")
        door_open = True
    else:
        print("It's not " + str(identity) + ", please go away")
        door_open = False

    ### END CODE HERE ###

    return dist, door_open

Younes tries to enter the Happy House. The front-door camera takes a picture of him, and the system verifies whether it really is Younes.

verify("images/camera_0.jpg", "younes", database, FRmodel)
It's younes, welcome home!

(0.65939283, True)

Benoit has broken the rules of the Happy House: he stole Kian's ID card and tries to sneak in by pretending to be Kian. The front-door camera takes a picture of him, and the system verifies whether he is Kian.

verify("images/camera_2.jpg", "kian", database, FRmodel)
It's not kian, please go away

(0.86224014, False)

Face Recognition

Unlike the face verification above, this is a 1:K problem. The procedure is:

  • Compute the encoding of the new image
  • Find the entry in the database whose encoding is closest to that of the new image
# GRADED FUNCTION: who_is_it

def who_is_it(image_path, database, model):
    """
    Implements face recognition for the happy house by finding who is the person on the image_path image.

    Arguments:
    image_path -- path to an image
    database -- database containing image encodings along with the name of the person on the image
    model -- your Inception model instance in Keras

    Returns:
    min_dist -- the minimum distance between image_path encoding and the encodings from the database
    identity -- string, the name prediction for the person on image_path
    """

    ### START CODE HERE ###

    ## Step 1: Compute the target "encoding" for the image. Use img_to_encoding() see example above. ## (≈ 1 line)
    encoding = img_to_encoding(image_path, model)

    ## Step 2: Find the closest encoding ##

    # Initialize "min_dist" to a large value, say 100 (≈1 line)
    min_dist = 100

    # Loop over the database dictionary's names and encodings.
    for (name, db_enc) in database.items():

        # Compute L2 distance between the target "encoding" and the current "db_enc" from the database. (≈ 1 line)
        dist = np.linalg.norm(db_enc - encoding)

        # If this distance is less than the min_dist, then set min_dist to dist, and identity to name. (≈ 3 lines)
        if dist < min_dist:
            min_dist = dist
            identity = name

    ### END CODE HERE ###

    if min_dist > 0.7:
        print("Not in the database.")
    else:
        print ("it's " + str(identity) + ", the distance is " + str(min_dist))

    return min_dist, identity
who_is_it("images/camera_0.jpg", database, FRmodel)
it's younes, the distance is 0.659393

(0.65939283, 'younes')

Conclusions and Discussion

Conclusions:

  • Face verification solves a 1:1 matching problem, while face recognition solves a 1:K matching problem
  • The triplet loss is an effective objective for training a neural network to learn an encoding of face images
  • The same encoding can be used for both face verification and face recognition: by computing the distance between the encodings of two images, you can determine whether they show the same person

Ways to further improve the algorithm:

  • Store encodings of more images of each person (under different lighting, from different angles, etc.) in the database. Given a new image, compare it against all the stored encodings for that person to improve accuracy (see the sketch after this list)
  • Crop the images to contain only the face. This preprocessing step removes irrelevant pixels around the face and makes the algorithm more robust
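For the first improvement, the database could map each name to a list of encodings rather than a single one, and verification could take the minimum distance over that list. A minimal sketch, assuming a hypothetical multi_database with that layout (not part of the original code):

import numpy as np

def verify_multi(image_path, identity, multi_database, model, threshold=0.7):
    # multi_database maps each name to a list of encodings taken under
    # different lighting conditions, angles, etc. (assumed layout).
    encoding = img_to_encoding(image_path, model)
    min_dist = min(np.linalg.norm(db_enc - encoding) for db_enc in multi_database[identity])
    return min_dist, min_dist < threshold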

References