Face Recognition and Shapley values
In my post last month, I explained Facebook’s Deepfake Detection Challenge and generated new face frames with OpenCV and a GAN (a generator and a discriminator).
Things have moved quickly!
- The challenge moved to this Kaggle competition; it has more videos, and anyone can sign up.
- I learned that deepfakes typically don’t use GANs; instead they use autoencoders.
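I haven’t trained one myself yet, but the typical face-swap setup is a shared encoder with one decoder per identity. Here is a minimal Keras sketch of that idea; the layer sizes are illustrative, not taken from the challenge:

```python
# Minimal sketch of a face-swap autoencoder: one shared encoder, one decoder
# per identity. Layer sizes are illustrative, not taken from the challenge.
from tensorflow.keras import layers, Model

def build_encoder():
    inp = layers.Input(shape=(64, 64, 3))
    x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    latent = layers.Dense(256, activation="relu")(x)
    return Model(inp, latent, name="shared_encoder")

def build_decoder(name):
    latent = layers.Input(shape=(256,))
    x = layers.Dense(16 * 16 * 128, activation="relu")(latent)
    x = layers.Reshape((16, 16, 128))(x)
    x = layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid")(x)
    return Model(latent, out, name=name)

encoder = build_encoder()
decoder_a = build_decoder("decoder_actor_a")
decoder_b = build_decoder("decoder_actor_b")

# Train encoder + decoder_a on actor A's faces and encoder + decoder_b on the
# swapped-in face; at generation time, run actor A's frames through
# encoder + decoder_b to paste the other identity onto actor A's pose.
inp = layers.Input(shape=(64, 64, 3))
autoencoder_a = Model(inp, decoder_a(encoder(inp)))
autoencoder_a.compile(optimizer="adam", loss="mae")
```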
Planning Next Steps
Of my planned steps toward deepfake detection, the next is a classifier for one actor’s real and faked scenes, plus an attention map showing which pixels the classifier uses. Focusing on one actor means I’ll really be doing facial recognition, an easier problem than deepfake detection, but… baby steps.
One Actor, One Scene
The scene is filmed in front of a door and has six face alternates. The challenge comes with two deepfake techniques (named Method A and Method B) and one of the alternates appears in both methods.
I decided to use OpenCV to crop to the area around the actor’s head. I selected 240 different frames from the original video, and 80 later frames from each of three fake faces, so the training data would be balanced and wouldn’t contain identical frames to compare.
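For reference, the crop step looks roughly like this; the Haar cascade, padding, and output size here are my own choices rather than a prescription:

```python
# Rough sketch: detect the actor's face with a Haar cascade and crop a padded
# square around it. The cascade, padding, and output size are my own choices.
import os
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(frame, pad=0.4, size=224):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # skip frames where no face is detected
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    margin = int(pad * max(w, h))
    x0, y0 = max(0, x - margin), max(0, y - margin)
    x1 = min(frame.shape[1], x + w + margin)
    y1 = min(frame.shape[0], y + h + margin)
    return cv2.resize(frame[y0:y1, x0:x1], (size, size))

# Walk through a video and save the cropped face frames
os.makedirs("faces", exist_ok=True)
cap = cv2.VideoCapture("original_video.mp4")  # hypothetical filename
saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    face = crop_face(frame)
    if face is not None:
        cv2.imwrite(f"faces/face{saved}.jpg", face)
        saved += 1
cap.release()
```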
I made a binary classifier which TensorFlow reported as highly accurate, but the image explanation tool I was using expected multiclass ImageNet output. After attempting to fake ImageNet-style output, I considered switching to ELI5, but it had difficulty working with layers from TensorFlow and Keras.
Start with a pipeline that works
I want to feed images into a binary classifier and explain its predictions with some Explainable AI framework. I stumbled on “shap” for Shapley values, and started with their ImageNet example, which works as promised:
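I won’t copy the notebook verbatim, but it boils down to something like the sketch below. I’ve simplified it to explain VGG16 from its input (shap’s own example explains an intermediate layer), and the random arrays stand in for real images:

```python
# Simplified version of the shap ImageNet demo: explain VGG16's top predictions
# for a couple of images. Random arrays stand in for real 224x224 images, and
# this explains the model from its input rather than an intermediate layer.
import numpy as np
import shap
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

model = VGG16(weights="imagenet")

X = np.random.rand(5, 224, 224, 3) * 255       # stand-in images
background = preprocess_input(X[:3].copy())    # reference samples for SHAP
to_explain = X[3:]                             # images we want explained

explainer = shap.GradientExplainer(model, background)
shap_values, indexes = explainer.shap_values(
    preprocess_input(to_explain.copy()), ranked_outputs=2)

# Pixel attributions for each image's top-2 predicted ImageNet classes
shap.image_plot(shap_values, to_explain)
```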
My next step was to swap in my video frames as the source data. The ImageNet model expects a 224x224 input image, so I resized one frame each from the original and fake sources:

```python
Image.open("./mlin/training/originals/face38.jpg").resize((224, 224), Image.ANTIALIAS)
```

In ImageNet land, these images matched ‘abaya’ or other clothing items.
Training a custom image classifier
I searched around for a Keras image classifier, and found this guide.
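I won’t reproduce the guide, but the model is roughly a small convolutional network with a sigmoid output, along these lines (layer sizes are approximate, not copied from the guide):

```python
# Rough sketch of a small Keras CNN for the real-vs-fake classifier;
# layer sizes are approximate, not copied from the guide.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Rescaling(1.0 / 255),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # 1 = fake, 0 = original
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train_ds / val_ds would come from image_dataset_from_directory over the
# cropped face frames:
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```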
Dropping that model into our existing code, I was impressed at first: training accuracy was 75–85%! Then I saw that the validation frames were not nearly as accurate.
A selection of two validation frames (original face at left, deepfake at right) hints at two problems:
- The position of the door and light around the actor’s head are included as important pixel data.
- The classifier expects the face in a constant position, likely from the early frames of the video.
My interpretation is that during the data generation step, we included too many frames whose pixels were too similar to each other.
Variety of pixels
I spaced out my training data by randomly selecting 25% of face-detected frames until I had enough data to work with. I also allowed fake frames to come from the same video timeframe. Here we see two fakes and an original:
The blue echoes in the first two look really similar to me, only slightly shifted up or down. I saw similar results on originals, then retrained the model to produce the image which you see here.
Interestingly, in the original video, the weight scale is about 1/3 of the range that we saw in the fake videos.
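For completeness, wiring the custom binary classifier into shap looks roughly like this; the tiny model and image arrays below are random stand-ins so the sketch runs on its own:

```python
# Sketch of the explanation step for the binary classifier. The model and
# random arrays below are stand-ins so this runs on its own; in the real
# pipeline they are the trained classifier and the cropped face frames.
import numpy as np
import shap
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1, activation="sigmoid"),
])
train_images = np.random.rand(50, 224, 224, 3).astype("float32")
val_images = np.random.rand(4, 224, 224, 3).astype("float32")

# Training frames act as the background distribution for the explainer
explainer = shap.GradientExplainer(model, train_images)

# One attribution map per model output; ours has a single sigmoid output
shap_values = explainer.shap_values(val_images)
shap.image_plot(shap_values, val_images)
```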
Going forward
The current code is here (the Colab notebook isn’t sharing well): https://gist.github.com/mapmeld/0730b9e3e9a7baf68b937c30ff04e9a9
I found a pipeline to train a neural network and explain the results with the SHAP image map. This will allow me to confirm that a final deepfake detector is examining users’ faces, and not extraneous information in backgrounds. As is, I’m not confident that my training frames had enough variety. Sometimes I look at layer 7 or 8 with my SHAP image map and it looks different. Sometimes I rerun the code and can’t generate a colorful image map. In the future, I’ll try to understand more about what’s occurring inside there, and I’ll produce more frames for future training and validation.