Source code and notebooks to reproduce experiments and benchmarks on Balanced Faces in the Wild (BFW).

Overview

Face Recognition: Too Bias, or Not Too Bias?

Robinson, Joseph P., Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. "Face recognition: too bias, or not too bias?" In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 0-1. 2020.
@inproceedings{robinson2020face,
               title={Face recognition: too bias, or not too bias?},
               author={Robinson, Joseph P and Livitz, Gennady and Henon, Yann and Qin, Can and Fu, Yun and Timoner, Samson},
               booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
               pages={0--1},
               year={2020}
             }
    

Robinson, Joseph P., Can Qin, Yann Henon, Samson Timoner, and Yun Fu. "Balancing Biases and Preserving Privacy on Balanced Faces in the Wild." In CoRR arXiv:2103.09118, (2021).
@article{robinson2021balancing,
        title={Balancing Biases and Preserving Privacy on Balanced Faces in the Wild},
        author={Robinson, Joseph P and Qin, Can and Henon, Yann and Timoner, Samson and Fu, Yun},
        journal={arXiv preprint arXiv:2103.09118},
        year={2021}
       }
    

[Figure: teaser]

Balanced Faces in the Wild (BFW): Data, Code, Evaluations

version: 0.4.5 (following the Semantic Versioning scheme; learn more at https://semver.org)

To address problems of bias in facial recognition, we built BFW as a labeled data resource for evaluating recognition systems on a corpus of facial imagery with an EQUAL face count for all subjects and EQUAL representation across demographics. The face data are thus balanced in faces per subject, individuals per ethnicity, and ethnicities per gender (or vice versa).

Data can be accessed via Google form or Microsoft form. Do not hesitate to report an issue for any and all inquiries.

Project Overview

This project investigates bias in automatic facial recognition (FR). Specifically, subjects are grouped into predefined subgroups based on gender, ethnicity, and, soon, age. For this, we propose a novel image collection called Balanced Faces in the Wild (BFW), which is balanced across eight subgroups (i.e., 100 subjects per subgroup, each with 25 face samples, for 800 subjects in total). Along with the name (i.e., identification) labels and task protocols (e.g., list of pairs for face verification, pre-packaged data-table with additional metadata and labels, etc.), BFW is clearly grouped into ethnicities (i.e., Asian (A), Black (B), Indian (I), and White (W)) and genders (i.e., Females (F) and Males (M)). The motivation and intent are for BFW to serve as a proxy for characterizing FR systems, with demographic-specific analysis now possible. For instance, various confusion metrics, along with a predefined criterion (i.e., score threshold), are fundamental when characterizing the performance of FR systems. The following visualization summarizes the confusion metrics and how the different measurements relate to one another.

[Figure: confusion metrics]
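As a minimal sketch of how such confusion metrics follow from a score threshold, the snippet below counts outcomes for a handful of face pairs. The scores, labels, and the names `Confusion` and `confusion_at` are hypothetical and are not part of the BFW code-base; the sketch also assumes that a higher score means a more similar pair.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # genuine pairs accepted
    fp: int  # imposter pairs accepted
    tn: int  # imposter pairs rejected
    fn: int  # genuine pairs rejected

    @property
    def tpr(self) -> float:
        """True-positive rate: share of genuine pairs accepted."""
        total = self.tp + self.fn
        return self.tp / total if total else 0.0

    @property
    def fpr(self) -> float:
        """False-positive rate: share of imposter pairs accepted."""
        total = self.fp + self.tn
        return self.fp / total if total else 0.0

    @property
    def fnr(self) -> float:
        """False-negative rate: share of genuine pairs rejected."""
        total = self.tp + self.fn
        return self.fn / total if total else 0.0


def confusion_at(scores, labels, threshold):
    """Count outcomes when pairs scoring >= threshold are called genuine."""
    tp = fp = tn = fn = 0
    for score, is_genuine in zip(scores, labels):
        accepted = score >= threshold
        if accepted and is_genuine:
            tp += 1
        elif accepted:
            fp += 1
        elif is_genuine:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


# Toy similarity scores (made up for illustration) and ground-truth labels.
scores = [0.91, 0.78, 0.42, 0.35, 0.66, 0.12]
labels = [True, True, True, False, False, False]

c = confusion_at(scores, labels, threshold=0.5)
print(c, c.tpr, c.fpr, c.fnr)
```

Moving `threshold` up or down trades false positives against false negatives, which is the trade-off the rest of this section analyzes per subgroup.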

The motivation for designing, building, and releasing BFW for research purposes has been discussed above. We expect the data to continue to evolve. Nonetheless, as is, there are many ways to use it to advance the technology and our understanding thereof. Let us now focus on the contents of the repo (i.e., code-base), which was created to support the BFW data (i.e., data proxy), making all experiments in the paper easily reproducible and, thus, the work easier to get started with.

Experimental-based contributions and findings

Several observations were made that widened our understanding of bias in FR. Each finding is demonstrated experimentally, with all code used in the experiments included as part of this repo.

Score sensitivity

For instance, it is shown that score sensitivity varies across subgroups. That is, the expected similarity score for faces of the same identity shifts from subgroup to subgroup (e.g., a true pair of Black faces has, on average, a smaller similarity score than a true pair of White faces, and the mid-range of scores differs between Males and Females). This is demonstrated using fundamental signal detection models (SDM), along with detection error trade-off (DET) curves.
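To make the DET analysis concrete, here is a minimal sketch that sweeps a threshold and records the false-positive and false-negative rates per subgroup. The function `det_points`, the subgroup names, and all scores are hypothetical toy values, not BFW results or the repo's interface.

```python
def det_points(genuine, imposter, thresholds):
    """Return (threshold, FPR, FNR) triples.

    A pair is accepted as a match when its score is >= threshold.
    """
    points = []
    for t in thresholds:
        fnr = sum(s < t for s in genuine) / len(genuine)     # genuine pairs rejected
        fpr = sum(s >= t for s in imposter) / len(imposter)  # imposter pairs accepted
        points.append((t, fpr, fnr))
    return points


# Toy similarity scores for two placeholder subgroups (made up for illustration).
scores_by_subgroup = {
    "subgroup_a": {
        "genuine": [0.55, 0.62, 0.70, 0.81],
        "imposter": [0.10, 0.22, 0.35, 0.48],
    },
    "subgroup_b": {
        "genuine": [0.68, 0.75, 0.83, 0.92],
        "imposter": [0.15, 0.30, 0.44, 0.58],
    },
}

thresholds = [0.3, 0.5, 0.7]
curves = {
    name: det_points(d["genuine"], d["imposter"], thresholds)
    for name, d in scores_by_subgroup.items()
}

for name, points in curves.items():
    for t, fpr, fnr in points:
        print(f"{name}: threshold={t:.2f} FPR={fpr:.2f} FNR={fnr:.2f}")
```

Plotting FPR against FNR for each subgroup gives one DET curve per subgroup; curves that do not overlap indicate that the same threshold produces different error rates in different subgroups.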

Global threshold

Once an FR system is deployed, a criterion (i.e., threshold) is set (or tunable) such that similarity scores that do not pass are assumed to be false matches and are filtered out of the candidate pool of potential true pairs. In other words, thresholds act as decision boundaries that map scores (or distances) to nominal values such as genuine or imposter. Considering the variable sensitivity found above, intuition tells us that a variable threshold is optimal. Thus, returning to the fundamental concepts of signal detection theory, we show that using a single, global threshold yields skewed performance ratings across different subgroups. We further demonstrate that subgroup-specific thresholds are optimal in terms of overall performance and balance across subgroups.
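The contrast between a global and a subgroup-specific threshold can be sketched as follows. This is an illustration under stated assumptions only: the selection criterion (maximum verification accuracy over a small candidate grid), the helper names, and the toy scores are all hypothetical and are not taken from the paper or the repo.

```python
def accuracy(genuine, imposter, threshold):
    """Share of pairs classified correctly when score >= threshold means 'genuine'."""
    correct = sum(s >= threshold for s in genuine) + sum(s < threshold for s in imposter)
    return correct / (len(genuine) + len(imposter))


def best_threshold(genuine, imposter, candidates):
    """Pick the candidate threshold with the highest accuracy (assumed criterion)."""
    return max(candidates, key=lambda t: accuracy(genuine, imposter, t))


# Toy similarity scores for two placeholder subgroups (made up for illustration).
data = {
    "subgroup_a": {
        "genuine": [0.45, 0.61, 0.62, 0.70],
        "imposter": [0.10, 0.22, 0.30, 0.38],
    },
    "subgroup_b": {
        "genuine": [0.68, 0.75, 0.83, 0.92],
        "imposter": [0.35, 0.50, 0.58, 0.64],
    },
}
candidates = [0.40, 0.50, 0.60, 0.66]

# One global threshold, chosen on the pooled scores of all subgroups.
pooled_genuine = [s for d in data.values() for s in d["genuine"]]
pooled_imposter = [s for d in data.values() for s in d["imposter"]]
global_t = best_threshold(pooled_genuine, pooled_imposter, candidates)

# One threshold per subgroup, chosen on that subgroup's scores only.
subgroup_t = {
    name: best_threshold(d["genuine"], d["imposter"], candidates)
    for name, d in data.items()
}

acc_global = {
    name: accuracy(d["genuine"], d["imposter"], global_t) for name, d in data.items()
}
acc_specific = {
    name: accuracy(d["genuine"], d["imposter"], subgroup_t[name])
    for name, d in data.items()
}

print("global threshold:", global_t, acc_global)
print("subgroup thresholds:", subgroup_t, acc_specific)
```

On this toy data, the global threshold rejects one genuine pair in subgroup_a and accepts one imposter pair in subgroup_b, while the subgroup-specific thresholds separate both groups cleanly. Real score distributions overlap far more, so the gain would be smaller and should be measured on held-out pairs.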

All-in-all

All of this and more (i.e., evaluation and analysis of FR systems on BFW data, along with data structures and implementation schemes optimized for the problems at hand) is included in the modules making up the project and demonstrated in notebook tutorials. We will continue to add tools for a fair analysis of FR systems. Thus, we expect not only the experiments but also the data to grow. All contributions are not only welcome but strongly encouraged.

Here are quick links to key aspects of this resource.

Register and download via this form.

Final note. The repo is a work-in-progress. It is ready to be cloned and used; however, expect regular improvements in both the implementation and documentation (i.e., getting-started instructions will be enhanced). For now, it is recommended to begin with the README files listed just above, along with the tutorial notebooks found in code -> notebooks, which have brief descriptions in the README and more detail inline in each notebook. Again, PRs are more than welcome :)

Paper abstract

We reveal critical insights into bias problems in state-of-the-art facial recognition (FR) systems using a novel Balanced Faces In the Wild (BFW) dataset: data balanced for gender and ethnic groups. We show variations in the optimal scoring threshold for face pairs across different subgroups. Thus, the conventional approach of learning a global threshold for all pairs results in performance gaps between subgroups. By learning subgroup-specific thresholds, we reduce performance gaps and show a notable boost in overall performance. Furthermore, we do a human evaluation to measure human bias, which supports the hypothesis that an analogous bias exists in human perception. For the BFW database, source code, and more, visit https://github.com/visionjo/facerec-bias-bfw.

To Do

  • Begin Template
  • Create demo notebooks
  • Add manuscript
  • Documentation (sphinx)
  • Update README (this)
  • Pre-commit, formatter (Black) and .gitignore
  • Complete test harness
  • Modularize (refactor) code
  • Complete datatable (i.e., extend pandas.DataFrame)
  • Add scripts and CLI

License

All source code is made available under a BSD 3-clause license. You can freely use and modify the code without warranty, so long as you provide attribution to the authors. See LICENSE.md (LICENSE) for the full license text.

The manuscript text is not open source. The authors reserve the rights to the article content, which is currently submitted for publication in the 2020 IEEE Conference on AMFG.

Acknowledgement

We would like to thank the PINGA organization on GitHub for the project template used to structure this project.

Comments
  • About the MTCNN face detections and preprocessing

    Hi,

    It would be great if you could clarify a few questions regarding this dataset please.

    1. Is it possible for you to provide the MTCNN output face detections (bounding boxes and facial landmarks) for the face samples in BFW?

    2. Am I right in assuming MTCNN takes as input the images in "face-samples" folder of the dataset? If yes, what settings do we use with MTCNN in order for us to detect a face correctly on all the facial images provided in face-samples? If not, can you help us reproduce your face detection results by providing us with the original images on which the MTCNN was run to obtain the results in face-samples?

    3. Are the images in facial-samples actually crops which are aligned?

    Thanks in advance for your help.

    opened by manisoftwartist 2
  • Create a wrapper function to unify pipeline that produces the 3 figures (detailed below) from embedding data

    3 Figures based on the paper "Face Recognition: Too Bias, or Not Too Bias" are

    1. DET curves: FPR versus FNR by moving threshold
    2. Score distributions for genuine and imposter using violin plots
    3. Confusion matrix for Rank 1 and any Rank.
    opened by suchanv 2
  • Devel

    Develop branch-- prepare for next version release.

    Aim for the following for version 0.1.1:

    • [ ] Notebook is updated to use the recently modularized interface (#21)
    • [ ] Update Documentation to explain steps to run (#21)
      • [ ] add to README in root
      • [x] move results from README in root to the README in results/
      • [x] move data section from README in root to the README in data/
        • [x] save curve data along with PDF (i.e., in results/)
      • [ ] Add simple (brief) docstrings where missing
      • [ ] A sample (toy) set is run end-to-end (demonstrate in README)
        • [ ] if small enough, add to repo (i.e., < 40 MB or so)
    • [ ] Finish script to generate Tar @ Far table
    • [ ] Improve annotation in notebooks; more description, i.e., tutorial-like.
    • [ ] create pdf versions of notebooks and add to project in notebooks/pdfs (or create nbviewer and point to it)
    • [ ] add assertions (and tests) where appropriate; at least for critical cases, such as where a specific type is expected.
    • [ ] Consider moving some of the analysis functions to visualizations.
      • [ ] modularize the handling of plt.axes objects
      • [ ] add optional input arguments for the title and other figure cosmetics or settings
    • [x] Add benchmarks for sphereface features. Make these the results showcased throughout.
    documentation enhancement Benchmark Project-level 
    opened by visionjo 1
  • Questions on verification_RFW and training procedure

    Hi, thanks for your great work and for sharing the code for these two papers! It took me days to read the papers and go through the repository, and I have a few questions:

    (2) Do you have the code for training the features (asian_females, asian_males, black_females, black_males, indian_females, indian_males,...)? I have a hard time finding something like train.py (e.g., the loss function and training process). (I suppose the released code is mainly for image pre-processing and result analysis.) (Since the BFW dataset is not as large as other face datasets, it may be possible for me to train on it from scratch on one GPU.)

    (3) I am a little confused about how BFW is used in the two papers. As I understand it:

    In the paper Face Recognition: Too Bias, or Not Too Bias?, the training and test setup is as follows: train: CASIA_webface, trained using Sphereface loss; test: LFW. So the BFW dataset is not used for training in this set of experiments?

    In the paper Balancing Biases and Preserving Privacy on Balanced Faces in the Wild, the training and test setup is as follows: train: (1) MS1M trained using Arcface loss --> to get 512-dim embedding (f_in in Fig.6); (2) BFW dataset is used to train the encoder and two classifiers in Fig 6; test: 4 folds used for training and 1 fold used for testing (using the best threshold chosen).

    Is that right?

    (4) There are some differences between "bfw-v0.1.5-datatable.csv" and TABLE-2 in paper 2. For example, there are 921379 records in TABLE-2 while there are 923898 records in the csv file. Also, there is no "{dir_meta}thresholds.pkl" file.

    Thanks for your time and any help would be appreciated !

    opened by lizhenstat 7
  • Regarding face identification

    Hey,

    Thanks for the awesome work!

    I wanted to know how I can modify the repo to use it for the face identification task instead of verification.

    Any help would be highly appreciated.

    opened by shivmgg 1
  • Sphinx documentation

    Setup the project for sphinx.

    Include clear instructions on how to maintain the docs (i.e., once in place, we'll include them as part of the build process; see docs/).

    Setup for tutorials on the different concepts and experiments done as part of this line of work (i.e., facial bias and BFW database)

    documentation enhancement 
    opened by visionjo 0
  • Create plan for Dash interface

    Project plan (lead: Dylan; support: Rohan):

    • [ ] what features to include
    • [ ] Specifications
    • [ ] Interface layout (use lucidchart or equivalent)
    • [ ] Division of tasks and proposed timeline
    Plan and design Project-level 
    opened by visionjo 0
Releases(v0.0.3)
Owner
Joseph P. Robinson
Ph.D., Northeastern, 2020. Focus: applied machine learning, mostly vision. At Vicarious Surgical's ASDAI group, an AI Engineer working on our surgical robot.