Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Last update: Nov 01, 2022

Introduction

In this work, we demonstrate that existing self-supervised speech models such as HuBERT, wav2vec 2.0, CPC, and TERA are vulnerable to membership inference attacks (MIA) and thus could reveal sensitive information about their training data.
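To make the attack setting concrete: a membership inference attack assigns each sample a score and guesses "seen during training" when the score is high. The sketch below is illustrative only. The function names, the threshold, and the toy scores are assumptions made for this example and are not taken from the repository or the paper.

```python
from typing import List, Sequence


def infer_membership(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag a sample as 'seen' when its score exceeds the threshold."""
    return [score > threshold for score in scores]


def attack_auc(seen_scores: Sequence[float], unseen_scores: Sequence[float]) -> float:
    """Probability that a random seen sample outscores a random unseen one.

    Ties count as half, which matches the usual rank-based AUC definition.
    """
    wins = 0.0
    for s in seen_scores:
        for u in unseen_scores:
            if s > u:
                wins += 1.0
            elif s == u:
                wins += 0.5
    return wins / (len(seen_scores) * len(unseen_scores))


# Toy scores, invented for illustration only (not results from the paper).
seen = [0.91, 0.85, 0.78, 0.60]
unseen = [0.70, 0.55, 0.52, 0.40]

predictions = infer_membership(seen + unseen, threshold=0.65)
auc = attack_auc(seen, unseen)
print(predictions)
print(auc)
```

An AUC close to 0.5 would mean the scores carry little membership signal, while values near 1.0 would mean seen and unseen samples are easy to tell apart.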

Requirements

Python >= 3.6
Install sox on your OS
Install s3prl on your OS

git clone https://github.com/s3prl/s3prl
cd s3prl
pip install -e ./

Install the specific fairseq

pip install [email protected]+https://github.com//pytorch/[email protected]#egg=fairseq

Preprocessing

First, extract the self-supervised features of the utterances in each corpus you need.

Currently, only LibriSpeech is available.

BASE_PATH=/path/of/the/corpus
OUTPUT_PATH=/path/to/save/feature
MODEL=wav2vec2
SPLIT=train-clean-100 # you should extract train-clean-100, dev-clean, dev-other, test-clean, test-other

python preprocess_feature_LibriSpeech.py \
    --base_path $BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --split $SPLIT
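Since every listed split has to be extracted, it may be convenient to generate one command per split. The following is a minimal sketch that only uses the script name and flags shown above. The helper name `build_preprocess_commands` and the constant `LIBRISPEECH_SPLITS` are hypothetical, and the paths are the same placeholders as in the command.

```python
import shlex
from typing import List

# Splits named in the comment on the SPLIT variable.
LIBRISPEECH_SPLITS = ["train-clean-100", "dev-clean", "dev-other", "test-clean", "test-other"]


def build_preprocess_commands(base_path: str, output_path: str, model: str,
                              splits: List[str]) -> List[List[str]]:
    """Return one argument list per split for preprocess_feature_LibriSpeech.py."""
    return [
        ["python", "preprocess_feature_LibriSpeech.py",
         "--base_path", base_path,
         "--output_path", output_path,
         "--model", model,
         "--split", split]
        for split in splits
    ]


commands = build_preprocess_commands(
    "/path/of/the/corpus", "/path/to/save/feature", "wav2vec2", LIBRISPEECH_SPLITS)

for cmd in commands:
    # Print only; pass cmd to subprocess.run(cmd, check=True) to actually execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building argument lists instead of one shell string avoids quoting problems when a path contains spaces.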

Speaker-level MIA

After extracting the features, you can attack the models using either the basic attack or the improved attack.

Note that you must run the basic attack first to generate the .csv file with similarity scores before performing the improved attack.
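This README does not describe how the similarity scores are computed. As a rough illustration of what a speaker-level similarity score could look like, the sketch below mean-pools each utterance's frame features and averages the pairwise cosine similarity across one speaker's utterances. This is an assumption made for illustration, not necessarily what predefined-speaker-level-MIA.py does, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Average frame-level features into one utterance-level vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def speaker_score(utterances: List[List[Vector]]) -> float:
    """Mean pairwise cosine similarity between a speaker's utterance vectors.

    Requires at least two utterances, otherwise there are no pairs to compare.
    """
    pooled = [mean_pool(u) for u in utterances]
    pairs = list(combinations(pooled, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy 2-dimensional features, invented for illustration only.
consistent_speaker = [[[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0]]]
varied_speaker = [[[1.0, 0.0]], [[0.0, 1.0]]]

consistent = speaker_score(consistent_speaker)
varied = speaker_score(varied_speaker)
print(consistent, varied)
```

Under this assumed scoring, a higher value would be read as weak evidence that the speaker was part of the training data.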

Basic Attack

SEEN_BASE_PATH=/path/you/save/feature/of/seen/corpus
UNSEEN_BASE_PATH=/path/you/save/feature/of/unseen/corpus
OUTPUT_PATH=/path/to/output/results
MODEL=wav2vec2

python predefined-speaker-level-MIA.py \
    --seen_base_path $SEEN_BASE_PATH \
    --unseen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL

Improved Attack

python train-speaker-level-similarity-model.py \
    --seen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --speaker_list "${OUTPUT_PATH}/${MODEL}-customized-speaker-level-attack-similarity.csv"

python customized-speaker-level-MIA.py \
    --seen_base_path $SEEN_BASE_PATH \
    --unseen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --similarity_model_path "${OUTPUT_PATH}/customized-speaker-similarity-model-${MODEL}.pt"

Utterance-level MIA

The process for utterance-level MIA is similar to that of speaker-level:

Basic Attack

SEEN_BASE_PATH=/path/you/save/feature/of/seen/corpus
UNSEEN_BASE_PATH=/path/you/save/feature/of/unseen/corpus
OUTPUT_PATH=/path/to/output/results
MODEL=wav2vec2

python predefined-utterance-level-MIA.py \
    --seen_base_path $SEEN_BASE_PATH \
    --unseen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL

Improved Attack

python train-utterance-level-similarity-model.py \
    --seen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --speaker_list "${OUTPUT_PATH}/${MODEL}-customized-utterance-level-attack-similarity.csv"

python customized-utterance-level-MIA.py \
    --seen_base_path $SEEN_BASE_PATH \
    --unseen_base_path $UNSEEN_BASE_PATH \
    --output_path $OUTPUT_PATH \
    --model $MODEL \
    --similarity_model_path "${OUTPUT_PATH}/customized-utterance-similarity-model-${MODEL}.pt"
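The improved-attack commands for both levels use the same file-name patterns for the similarity CSV and the trained similarity model. The sketch below builds those paths from the patterns in the commands and checks whether the CSV is already on disk before training. The helper names are hypothetical, and treating the `--speaker_list` CSV as the file produced by the basic attack is an assumption based on the note about run order.

```python
import os
from typing import Dict


def improved_attack_paths(output_path: str, model: str, level: str) -> Dict[str, str]:
    """Build the CSV and checkpoint paths used by the improved-attack commands.

    `level` is "speaker" or "utterance"; the patterns mirror the commands above.
    """
    if level not in ("speaker", "utterance"):
        raise ValueError("level must be 'speaker' or 'utterance'")
    return {
        "speaker_list": os.path.join(
            output_path, f"{model}-customized-{level}-level-attack-similarity.csv"),
        "similarity_model_path": os.path.join(
            output_path, f"customized-{level}-similarity-model-{model}.pt"),
    }


def ready_for_improved_attack(output_path: str, model: str, level: str) -> bool:
    """True when the similarity CSV expected by the training script exists."""
    paths = improved_attack_paths(output_path, model, level)
    return os.path.isfile(paths["speaker_list"])


paths = improved_attack_paths("/path/to/output/results", "wav2vec2", "utterance")
print(paths["speaker_list"])
print(paths["similarity_model_path"])
print(ready_for_improved_attack("/path/to/output/results", "wav2vec2", "utterance"))
```

Running a check like this before the training script gives a clearer failure than a missing-file error partway through.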

Citation

If you find our work useful, please cite:

Owner

Wei-Cheng Tseng
