Source code and notebooks to reproduce experiments and benchmarks on Bias Faces in the Wild (BFW).

Overview

Face Recognition: Too Bias, or Not Too Bias?

Robinson, Joseph P., Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. "Face recognition: too bias, or not too bias? " In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 0-1. 2020.
@inproceedings{robinson2020face,
               title={Face recognition: too bias, or not too bias?},
               author={Robinson, Joseph P and Livitz, Gennady and Henon, Yann and Qin, Can and Fu, Yun and Timoner, Samson},
               booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
               pages={0--1},
               year={2020}
             }
    

Robinson, Joseph P., Can Qin, Yann Henon, Samson Timoner, and Yun Fu. "Balancing Biases and Preserving Privacy on Balanced Faces in the Wild." In CoRR arXiv:2103.09118, (2021).
@article{robinson2021balancing,
        title={Balancing Biases and Preserving Privacy on Balanced Faces in the Wild},
        author={Robinson, Joseph P and Qin, Can and Henon, Yann and Timoner, Samson and Fu, Yun},
        journal={arXiv preprint arXiv:2103.09118},
        year={2021}
       }
    

Teaser

Balanced Faces in the Wild (BFW): Data, Code, Evaluations

version: 0.4.5 (following Semantic Versioning Scheme-- learn more here, https://semver.org)

Intended to address problems of bias in facial recognition, we built BFW as a labeled data resource made available for evaluating recognition systems on a corpus of facial imagery made-up of EQUAL face count for all subjects: EQUAL across demographics, and, thus, face data balanced in faces per subject, individuals per ethnicity, and ethnicities per gender or vise versa.

Data can be accessed via Google form or Microsft form. Do not hesitate to report an issue for any and all inquiries.

Project Overview

This project investigates bias in automatic facial recognition (FR). Specifically, subjects are grouped into predefined subgroups based on gender, ethnicity, and soon-to-be age. For this, we propose a novel image collection called Balanced Faces in the Wild (BFW), which is balanced across eight subgroups (i.e., 800 face images of 100 subjects, each with 25 face samples). Thus, along with the name (i.e., identification) labels and task protocols (e.g., list of pairs for face verification, pre-packaged data-table with additional metadata and labels, etc.), BFW clearly groups into ethnicities (i.e., Asian (A), Black (B), Indian (I), and White (W)) and genders (i.e., Females (F) and Males (M)). Thus, the motivation and intent are that BFW will provide a proxy to characterize FR systems with demographic-specific analysis now possible. For instance, various confusion metrics, along with the predefined criteria (i.e., score threshold), are fundamental when characterizing performance ratings of FR systems. The following visualization summarizes the confusion metrics in a way that relates to the different measurements.

metrics

As discussed, the motivation for designing, building, and releasing BFW for research purposes has been discussed. We expect the data, all-in-all, will continue to evolve. Nonetheless, as is, there are vast options on ways to advance technology and our understanding thereof. Let us now focus on the contents of the repo (i.e., code-base) for which was created to support the data of BFW (i.e., data proxy), making all experiments in paper easily reproducible and, thus, the work more friendly for getting started.

Experimental-based contributions and findings

Several observations were made that widened our understanding of bias in FR. Views were demonstrated experimentally, with all code used in experiments added as a part of this repo.

Score sensitivity

For instance, it is shown that the scoring sensitivity within different subgroups verifies. That is, faces of the same identity tend to shift in expected values (e.g., given a correct pair of Black faces, on average, have similarity scores smaller than a true pair of White, and the middle range of scores for Males compared to Females). This is demonstrated using fundamental signal detection models (SDM), along with detection error trade-off (DET) curves.

Global threshold

Once an FR system is deployed, a criterion (i.e., threshold) is set (or tunable) such that similarity scores that do not pass are assumed false matches and are filtered out of the candidate pool for potential true pairs. In other words, thresholds act as decision boundaries that map scores (or distances) to nominal values such as genuine or imposter. Considering the variable sensitivity found prior, intuition tells us that a variable threshold is optimal. Thus, returning to the fundamental concepts of signal detection theory, we show that using a single, global threshold yields skewed performance ratings across different subgroups. For this, we demonstrate that subgroup-specific thresholds are optimal in terms of overall performance and balance across subgroups.

All-in-all

All of this and more (i.e., evaluation and analysis of FR systems on BFW data, along with data structures and implementation schemes optimized for the problems at hand, are included in modules making up the project and demonstrated in notebook tutorials). We will continue to add tools for a fair analysis of FR systems. Thus, not only the experiments but also the data we expect to grow. All contributions are not only welcome but are entirely encouraged.

Here are quick links to key aspects of this resource.

Register and download via this form.

Final note. Thee repo is a work-in-progress. Certainly, it is ready to be cloned and used; however, expect regular improvements, both in the implementation and documentation (i.e., getting started instructions will be enhanced). For now, it is recommended to begin with README files listed just above, along with the tutorial notebooks found in code-> notebooks with brief descriptions in README and more detail inline of each notebook. Again, PRs are more than welcome :)

Paper abstract

We reveal critical insights into bias problems in state-of-the-art facial recognition (FR) systems using a novel Balanced Faces In the Wild (BFW) dataset: data balanced for gender and ethnic groups. We show variations in the optimal scoring threshold for face pairs across different subgroups. Thus, the conventional approach of learning a global threshold for all pairs results in performance gaps between subgroups. By learning subgroup-specific thresholds, we reduce performance gaps and show a notable boost in overall performance. Furthermore, we do a human evaluation to measure human bias, which supports the hypothesis that an analogous bias exists in human perception. For the BFW database, source code, and more, visit https://github.com/visionjo/facerec-bias-bfw.

To Do

  • Begin Template
  • Create demo notebooks
  • Add manuscript
  • Documentation (sphinx)
  • Update README (this)
  • Pre-commit, formatter (Black) and .gitignore
  • Complete test harness
  • Modulate (refactor) code
  • Complete datatable (i.e., extend pandas.DataFrame)
  • Add scripts and CLI

License

All source code is made available under a BSD 3-clause license. You can freely use and modify the code without warranty, so long as you provide attribution to the authors. See LICENSE.md (LICENSE) for the full license text.

The manuscript text is not open source. The authors reserve the rights to the article content, which is currently submitted for publication in the 2020 IEEE Conference on AMFG.

Acknowledgement

We would like to thank the PINGA organization on Github for the project template used to structure this project.

Comments
  • About the MTCNN face detections and preprocessing

    About the MTCNN face detections and preprocessing

    Hi,

    It would be great if you could clarify a few questions regarding this dataset please.

    1. Is it possible for you to provide the MTCNN output face detections (bounding boxes and facial landmarks) for the face samples in BFW?

    2. Am I right in assuming MTCNN takes as input the images in "face-samples" folder of the dataset? If yes, what settings do we use with MTCNN in order for us to detect a face correctly on all the facial images provided in face-samples? If not, can you help us reproduce your face detection results by providing us with the original images on which the MTCNN was run to obtain the results in face-samples?

    3. Are the images in facial-samples actually crops which are aligned?

    Thanks in advance for your help.

    opened by manisoftwartist 2
  • Create a wrapper function to unify pipeline that produces the 3 figures (detailed below) from embedding data

    Create a wrapper function to unify pipeline that produces the 3 figures (detailed below) from embedding data

    3 Figures based on the paper "Face Recognition: Too Bias, or Not Too Bias" are

    1. DET curves: FPR versus FNR by moving threshold
    2. Score distributions for genuine and imposter using violin plots
    3. Confusion matrix for Rank 1 and any Rank.
    opened by suchanv 2
  • Devel

    Devel

    Develop branch-- prepare for next version release.

    Aim for the following for version 0.1.1:

    • [ ] Notebook is updated to use interface recently modulated (#21)
    • [ ] Update Documentation to explain steps to run (#21)
      • [ ] add to README in root
      • [x] move results from README in root to the README in results/
      • [x] move data section from README in root to the README in data/
        • [x] save curve data along with PDF (i.e., in results/)
      • [ ] Add simple (brief) docstring where missing)
      • [ ] A sample (toy) set is run end-to-end (demonstrate in README)
        • [ ] if small enough, add to repo (i.e., < 40 MB or so)
    • [ ] Finish script to generate Tar @ Far table
    • [ ] Improve annotation in notebooks; more description, i.e., tutorial-like.
    • [ ] create pdf versions of notebooks and add to project in notebooks/pdfs (or create nbviewer and point to it)
    • [ ] add assertions (and tests) where appropriate-- at least critical cases, such a specific type is expected.
    • [ ] Consider moving some of the analysis functions to visualizations.
      • [ ] modulate the handling of plt.axes objects
      • [ ] add optional input arguments for the title and other figure cosmetics or settings
    • [x] Add benchmarks for sphereface features. Make these the results showcased throughout.
    documentation enhancement Benchmark Project-level 
    opened by visionjo 1
  • Questions on verification_RFW and training procedure

    Questions on verification_RFW and training procedure

    Hi, Thanks for your great work and sharing of the code on these two papers ! It takes me days to read the paper and go through the repository and I have a few questions:

    (2) Do you have the code for training the features (asian_females, asian_males, black_females, black_males, indian_females, indian_males,...). Since I have a hard time finding something like train.py (e.g. the loss function and training process). (I suppose the released code is mainly on image pre-processing and result analysis) (Since BFW dataset is not as large as other face dataset and it may possible for me to train it from scratch on one GPU)

    (3) I am little confused about how the BFW is used in two papers, as I understand:

    in paper Face Recognition: Too Bias, or Not Too Bias? , the train and test model are as follows: train: CASIA_webface trained using Sphereface loss test: LFW where does BFW dataset not used in training in this set of experiments?

    in paper Balancing Biases and Preserving Privacy on Balanced Faces in the Wild the train, test model are as follows: tain: (1) MS1M trained using Arcface loss --> to get 512-dim embedding (f_in in Fig.6) (2) BFW dataset is used to train the encoder and two classifiers in Fig 6 test: 4-folds used for training and 1-fold used for testing (using the best threshold chosen)

    is that right?

    (4) There are some difference from "bfw-v0.1.5-datatable.csv" and the TABLE-2 in paper 2: for example: there are 921379 records in TABLE-2 while ther are 923898 records from the csv file? and there is no "{dir_meta}thresholds.pkl" file.

    Thanks for your time and any help would be appreciated !

    opened by lizhenstat 7
  • Regarding face identification

    Regarding face identification

    Hey,

    Thanks for the awesome work!

    I wanted to know how I can modify the repo to use for face identification task instead of verification.

    Any help would be highly appreciated.

    opened by shivmgg 1
  • Sphinx documentation

    Sphinx documentation

    Setup the project for sphinx.

    Include clear instruction on how to maintain (i.e., once in place, we'll include as part of the build process (see in docs/)

    Setup for tutorials on the different concepts and experiments done as part of this line of work (i.e., facial bias and BFW database)

    documentation enhancement 
    opened by visionjo 0
  • Create plan for Dash interface

    Create plan for Dash interface

    Project plan (lead: Dylan; support: Rohan):

    • [ ] what features to include
    • [ ] Specifications
    • [ ] Interface layout (use lucidchart or equivalent)
    • [ ] Division of tasks and proposed timeline
    Plan and design Project-level 
    opened by visionjo 0
Releases(v0.0.3)
Owner
Joseph P. Robinson
Ph.D., Northeastern, 2020. Focus: applied machine learning, mostly vision. At Vicarious Surgical's ASDAI group, an AI Engineer working on our surgical robot.
Joseph P. Robinson
Specificity-preserving RGB-D Saliency Detection

Specificity-preserving RGB-D Saliency Detection Authors: Tao Zhou, Huazhu Fu, Geng Chen, Yi Zhou, Deng-Ping Fan, and Ling Shao. 1. Preface This reposi

Tao Zhou 35 Jan 08, 2023
A light-weight image labelling tool for Python designed for creating segmentation data sets.

An image labelling tool for creating segmentation data sets, for Django and Flask.

117 Nov 21, 2022
Aesara is a Python library that allows one to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.

Aesara is a Python library that allows one to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.

Aesara 898 Jan 07, 2023
PyTorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision.

PyTorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision @misc{CV2018, author = {Donny You ( Donny You 40 Sep 14, 2022

Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

FLAME Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation, accepted at the 17th IEEE Internation Co

Neelabh Sinha 19 Dec 17, 2022
Python program that works as a contact list

Lista de Contatos Programa em Python que funciona como uma lista de contatos. Features Adicionar novo contato Remover contato Atualizar contato Pesqui

Victor B. Lino 3 Dec 16, 2021
PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP. Democratize AI for everyone.

PatrickStar: Parallel Training of Large Language Models via a Chunk-based Memory Management Meeting PatrickStar Pre-Trained Models (PTM) are becoming

Tencent 633 Dec 28, 2022
This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis

This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis, accepted at ACMMM 2021.

Ziqi Yuan 10 Sep 30, 2022
GazeScroller - Using Facial Movements to perform Hands-free Gesture on the system

GazeScroller Using Facial Movements to perform Hands-free Gesture on the system

2 Jan 05, 2022
Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train format

ttopt Description Gradient-free global optimization algorithm for multidimensional functions based on the low rank tensor train (TT) format and maximu

5 May 23, 2022
Auditing Black-Box Prediction Models for Data Minimization Compliance

Data-Minimization-Auditor An auditing tool for model-instability based data minimization that is introduced in "Auditing Black-Box Prediction Models f

Bashir Rastegarpanah 2 Mar 24, 2022
Pytorch implementation of NEGEV method. Paper: "Negative Evidence Matters in Interpretable Histology Image Classification".

Pytorch 1.10.0 code for: Negative Evidence Matters in Interpretable Histology Image Classification (https://arxiv. org/abs/xxxx.xxxxx) Citation: @arti

Soufiane Belharbi 4 Dec 01, 2022
L-Verse: Bidirectional Generation Between Image and Text

Far beyond learning long-range interactions of natural language, transformers are becoming the de-facto standard for many vision tasks with their power and scalabilty

Kim, Taehoon 102 Dec 21, 2022
This is an open source library implementing hyperbox-based machine learning algorithms

hyperbox-brain is a Python open source toolbox implementing hyperbox-based machine learning algorithms built on top of scikit-learn and is distributed

Complex Adaptive Systems (CAS) Lab - University of Technology Sydney 21 Dec 14, 2022
A annotation of yolov5-5.0

代码版本:0714 commit #4000 $ git clone https://github.com/ultralytics/yolov5 $ cd yolov5 $ git checkout 720aaa65c8873c0d87df09e3c1c14f3581d4ea61 这个代码只是注释版

Laughing 229 Dec 17, 2022
Extension to fastai for volumetric medical data

FAIMED 3D use fastai to quickly train fully three-dimensional models on radiological data Classification from faimed3d.all import * Load data in vari

Keno 26 Aug 22, 2022
Tensorflow implementation of DeepLabv2

TF-deeplab This is a Tensorflow implementation of DeepLab, compatible with Tensorflow 1.2.1. Currently it supports both training and testing the ResNe

Chenxi Liu 21 Sep 27, 2022
Detectron2 for Document Layout Analysis

Detectron2 trained on PubLayNet dataset This repo contains the training configurations, code and trained models trained on PubLayNet dataset using Det

Himanshu 163 Nov 21, 2022
Convolutional neural network that analyzes self-generated images in a variety of languages to find etymological similarities

This project is a convolutional neural network (CNN) that analyzes self-generated images in a variety of languages to find etymological similarities. Specifically, the goal is to prove that computer

1 Feb 03, 2022
Github Traffic Insights as Prometheus metrics.

github-traffic Github Traffic collects your repository's traffic data and exposes it as Prometheus metrics. Grafana dashboard that displays the metric

Grafana Labs 34 Oct 27, 2022