Introduction
The importance of face anti-spoofing
Limitations of traditional methods and early deep learning methods
Auxiliary supervision, central difference convolution and semantic information
Summary
Related work
Texture-based methods
Temporal-based methods
Authors | Methods | Time | Advantages and limitations
---|---|---|---
| Facial motion feature based method | 2007 | Uses multi-frame input to capture facial motion features, but can be easily fooled by some paper-cut attacks
Määttä et al. [1] | Traditional texture-based method | 2011 | Relies on hand-crafted texture features, whose extraction ability is limited
Liu et al. [36] | Binary CNN-based method | 2015 | Exploits the powerful feature-extraction ability of CNNs, but may fail to capture the essential cues distinguishing spoof from real faces and is prone to overfitting
| Pseudo-depth label based method | 2017 | First to use pseudo-depth labels to guide the network, which enhances face anti-spoofing performance
| Multi-frame based method | 2021 | Requires multiple frames as input, which makes deployment in production environments challenging
This study | Depth map supervised CDC network | 2023 | Uses depth labels to guide the network and CDC to extract richer spatial features, leveraging domain knowledge and enhancing the model's ability to capture semantic information
Depth map auxiliary supervision
Proposed method
Network structure
Central difference convolution
Notation | Description
---|---
\(p_0\) | Current position on the input and output feature maps
\(p_n\) | Position enumerated within the local receptive field \(R\)
\(\theta \) | Trade-off weight between CDC and traditional convolution
\(\beta \) | Threshold for switching between the L1 and L2 losses
\(\ell _{absolute} (x,y)\) | Depth map loss
\(K_i^{contrast}\) | The i-th contrast convolution kernel
\({\ell _{contrast}}(x,y)\) | Contrast depth loss
\({\mathcal {L}}_{depth}(x,y)\) | Overall depth estimation loss
\(p\) | Predicted probability output by the classifier
\(CE(p_t)\) | Conventional binary cross-entropy loss
\(\alpha _t\) | Weighting factor balancing the cross-entropy loss
\(FL(p_t)\) | Focal loss used as the classification loss
\({\mathcal {L}}(x,y)\) | Total loss
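To make the CDC operator defined by the notation above concrete, here is a minimal single-channel NumPy sketch (stride 1, no padding). It combines the vanilla convolution response with a central-difference term weighted by \(\theta\); the function name and loop-based implementation are illustrative, not the paper's code.

```python
import numpy as np

def central_difference_conv2d(x, w, theta=0.7):
    """Single-channel 2D central difference convolution (stride 1, no padding).

    y(p0) = theta * sum_pn w(pn) * (x(p0 + pn) - x(p0))
          + (1 - theta) * sum_pn w(pn) * x(p0 + pn)
    """
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    w_sum = w.sum()  # the difference term factors as vanilla - x(p0) * sum(w)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw]
            vanilla = (patch * w).sum()           # ordinary convolution response
            center = x[i + kh // 2, j + kw // 2]  # intensity at p0
            out[i, j] = theta * (vanilla - center * w_sum) + (1 - theta) * vanilla
    return out
```

With \(\theta = 0\) this reduces to plain convolution; on a constant input the difference term vanishes, which is what lets CDC emphasize fine-grained local gradients.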
UCDCN
Loss function
Depth map loss
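Given that \(\beta\) is described as the threshold switching between L1 and L2 losses, the depth map loss is plausibly a smooth-L1 (Huber-style) regression loss over predicted and ground-truth depth maps. A minimal sketch under that assumption (the function name is illustrative):

```python
import numpy as np

def depth_map_loss(pred, target, beta=1.0):
    """Smooth-L1 style depth loss: quadratic (L2) for |error| < beta,
    linear (L1) beyond it, so large outliers are penalized less harshly."""
    diff = np.abs(pred - target)
    per_pixel = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(per_pixel.mean())
```

The two branches meet smoothly at \(|{\text{error}}| = \beta\), which keeps gradients bounded for badly mispredicted pixels.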
Contrast depth loss
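The contrast depth loss compares local depth gradients rather than absolute values, using the kernels \(K_i^{contrast}\). A common construction, sketched here as an assumption about the paper's kernels, uses eight 3×3 kernels, each with +1 at one neighbour and −1 at the centre, and sums the squared differences of the filtered prediction and ground truth:

```python
import numpy as np

def contrast_kernels():
    """Eight 3x3 kernels K_i: +1 at one of the 8 neighbours, -1 at the centre."""
    kernels = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            k = np.zeros((3, 3))
            k[1, 1] = -1.0
            k[1 + di, 1 + dj] = 1.0
            kernels.append(k)
    return kernels

def conv_valid(x, k):
    """Valid-mode 3x3 convolution (correlation) for single-channel maps."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + 3, j:j + 3] * k).sum()
    return out

def contrast_depth_loss(pred, gt):
    """Sum over kernels of the MSE between contrast responses of pred and gt."""
    return float(sum(((conv_valid(pred, k) - conv_valid(gt, k)) ** 2).mean()
                     for k in contrast_kernels()))
```

Because each kernel sums to zero, constant depth maps produce zero response: the loss is sensitive only to differences in local depth structure.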
Classifier loss
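The notation table defines the classifier loss as a focal loss \(FL(p_t)\) with balancing weight \(\alpha_t\), i.e. the standard form \(FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)\). A minimal binary sketch (the \(\gamma\) default is an assumption, not taken from the paper):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t),
    where p is the predicted probability of the positive (e.g. live) class."""
    p = np.clip(p, 1e-7, 1 - 1e-7)          # numerical safety for log
    p_t = np.where(y == 1, p, 1 - p)         # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return float((-alpha_t * (1 - p_t) ** gamma * np.log(p_t)).mean())
```

With \(\gamma = 0\) and \(\alpha_t = 0.5\) this is half the conventional cross-entropy \(CE(p_t)\); increasing \(\gamma\) down-weights already well-classified samples so training focuses on hard spoof/live examples.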
Dataset | Year | Subjects | Sessions | Live/attack videos | Pose range | Expression variation | Extra lighting | Spoof attacks
---|---|---|---|---|---|---|---|---|
Replay-Attack | 2012 | 50 | 1 | 200/1000 | Frontal | No | Yes | Print, 2 Replay |
Oulu-NPU | 2017 | 55 | 3 | 1980/3960 | Frontal | No | Yes | 2 Print, 2 Replay |
SiW | 2018 | 165 | 4 | 1320/3300 | [−90, 90] | Yes | Yes | 2 Print, 4 Replay |
Implementation details
Datasets
Pre-processing stage
Data augmentation
Training strategies
Dataset | Replay-Attack | Oulu-NPU | SiW |
---|---|---|---|
Training images | 56,975 | 247,666 | 1,437,879 |
Test images | 76,000 | 240,823 | 1,221,438 |
Prot | Method | APCER (%) | BPCER (%) | ACER (%)
---|---|---|---|---
1 | CPqD | 2.9 | 10.8 | 6.9
| FAS-BAS [7] | 1.6 | 1.6 | 1.6
| GRADIANT [47] | 1.3 | 12.5 | 6.9
| CDCN [13] | 0.4 | 1.7 | 1.0
| UCDCN-\(\text {L}_2\) (Ours) | 1.4 | 0.4 | 0.9
| UCDCN-\(\text {L}_3\) (Ours) | 2.84 | 2.39 | 2.61
2 | MixedFASNet | 9.7 | 2.5 | 6.1
| GRADIANT [47] | 3.1 | 1.9 | 2.5
| FAS-BAS [7] | 2.7 | 2.7 | 2.7
| CDCN [13] | 1.5 | 1.4 | 1.5
| UCDCN-\(\text {L}_2\) (Ours) | 0.4 | 1.5 | 0.9
| UCDCN-\(\text {L}_3\) (Ours) | 2.84 | 2.39 | 2.61
3 | MixedFASNet | 5.3 ± 6.7 | 7.8 ± 5.5 | 6.5 ± 4.6
| GRADIANT [47] | 2.6 ± 3.9 | 5.0 ± 5.3 | 3.8 ± 2.4
| FAS-BAS [7] | 2.7 ± 1.3 | 3.1 ± 1.7 | 2.9 ± 1.5
| CDCN [13] | 2.4 ± 1.3 | 2.2 ± 2.0 | 2.3 ± 1.4
| UCDCN-\(\text {L}_2\) (Ours) | 1.6 ± 0.6 | 2.7 ± 1.2 | 2.2 ± 0.9
| UCDCN-\(\text {L}_3\) (Ours) | 1.7 ± 0.7 | 4.6 ± 3.2 | 3.2 ± 1.3
4 | Massy HNU | 35.8 ± 35.3 | 8.3 ± 4.1 | 22.1 ± 17.6
| GRADIANT [47] | 5.0 ± 4.5 | 15.0 ± 7.1 | 10.0 ± 5.0
| FAS-BAS [7] | 9.3 ± 5.6 | 10.4 ± 6.0 | 9.5 ± 6.0
| CDCN [13] | 4.6 ± 4.6 | 9.2 ± 8.0 | 6.9 ± 2.9
| UCDCN-\(\text {L}_2\) (Ours) | 4.4 ± 2.9 | 6.2 ± 4.4 | 5.3 ± 3.4
| UCDCN-\(\text {L}_3\) (Ours) | 5.1 ± 4.3 | 9.6 ± 7.1 | 7.4 ± 3.2
Dataset | Acc (%) | APCER (%) | BPCER (%) | ACER (%) |
---|---|---|---|---|
Replay-Attack | 99.18 | 0.813 | 0.05 | 0.41
Oulu-NPU | 96.35 | 2.6 | 1.01 | 1.82
SiW | 99.61 | 0.31 | 0.08 | 0.19
Dataset | Classifier | Acc (%) | APCER (%) | BPCER (%) | ACER (%) |
---|---|---|---|---|---|
SiW | Linear | 99.61 | 0.31 | 0.08 | 0.19 |
SiW | ResNet18 | 99.62 | 0.27 | 0.12 | 0.19 |
SiW | MobileNet | 99.47 | 0.48 | 0.05 | 0.27 |
SiW | ShuffleNet | 99.55 | 0.39 | 0.06 | 0.23 |
SiW | VGG11 | 99.16 | 0.45 | 0.40 | 0.42 |
Replay | Linear | 99.18 | 0.81 | 0.01 | 0.41 |
Replay | ResNet18 | 99.05 | 0.94 | 0.01 | 0.47 |
Replay | MobileNet | 98.20 | 0.48 | 1.32 | 0.90 |
Replay | ShuffleNet | 98.99 | 1.01 | 0.00 | 0.51 |
Replay | VGG11 | 97.82 | 0.68 | 1.50 | 1.09 |
OULU | Linear | 96.35 | 2.63 | 1.01 | 1.82 |
OULU | ResNet18 | 95.12 | 4.41 | 0.47 | 2.44 |
OULU | MobileNet | 95.39 | 2.30 | 2.32 | 2.31 |
OULU | ShuffleNet | 94.94 | 4.00 | 1.07 | 2.53 |
OULU | VGG11 | 92.94 | 6.30 | 0.75 | 3.53 |
Prot | Subset | Session | Phones | Users | Attacks created using | Real | Attack | All
---|---|---|---|---|---|---|---|---
1 | Train | 1, 2 | 6 | 1–20 | Printer 1, 2; Display 1, 2 | 240 | 960 | 1200
| Dev | 1, 2 | 6 | 21–35 | Printer 1, 2; Display 1, 2 | 180 | 720 | 900
| Test | 3 | 6 | 36–55 | Printer 1, 2; Display 1, 2 | 120 | 480 | 600
2 | Train | 1, 2, 3 | 6 | 1–20 | Printer 1; Display 1 | 360 | 720 | 1080
| Dev | 1, 2, 3 | 6 | 21–35 | Printer 1; Display 1 | 270 | 540 | 810
| Test | 1, 2, 3 | 6 | 36–55 | Printer 2; Display 2 | 360 | 720 | 1080
3 | Train | 1, 2, 3 | 5 | 1–20 | Printer 1, 2; Display 1, 2 | 300 | 1200 | 1500
| Dev | 1, 2, 3 | 5 | 21–35 | Printer 1, 2; Display 1, 2 | 225 | 900 | 1125
| Test | 1, 2, 3 | 1 | 36–55 | Printer 1, 2; Display 1, 2 | 60 | 240 | 300
4 | Train | 1, 2 | 5 | 1–20 | Printer 1; Display 1 | 200 | 400 | 600
| Dev | 1, 2 | 5 | 21–35 | Printer 1; Display 1 | 150 | 300 | 450
| Test | 3 | 1 | 36–55 | Printer 2; Display 2 | 20 | 40 | 60