Self-Supervised Speech Pre-training and Representation Learning Toolkit.

Last update: Jan 08, 2023

Overview

What's New

Sep 2021: We are hosting a challenge at the AAAI workshop: The 2nd Self-supervised Learning for Audio and Speech Processing! See the SUPERB official site for the challenge details and the SUPERB documentation in this toolkit!
Aug 2021: We now have a tutorial that introduces our toolkit; you can watch it on YouTube!
July 2021: We are now working on packaging s3prl and reorganizing the file structure in v0.3. Please consider using the stable v0.2.0 for now. We will test and release v0.3 before August.
June 2021: Support SUPERB: Speech processing Universal PERformance Benchmark, submitted to Interspeech 2021. Use the tag superb-interspeech2021 or v0.2.0.
June 2021: Support extracting multiple hidden states from the SSL pretrained models
Jan 2021: Readme updated with detailed instructions on how to use our latest version!
Dec 2020: We are migrating to a newer version for more general, flexible, and scalable code. See the introduction below for more information! The legacy version can be accessed via the tag v0.1.0.

Introduction and Usages

This is an open source toolkit called s3prl, which stands for Self-Supervised Speech Pre-training and Representation Learning. Self-supervised speech pre-trained models are called upstream in this toolkit, and are utilized in various downstream tasks.

The toolkit has three major usages:

Pretrain

Pretrain upstream models, including Mockingjay, Audio ALBERT and TERA.
Document: pretrain/README.md

Upstream

Easily load most of the existing upstream models with pretrained weights in a unified I/O interface.
Pretrained models are registered through torch.hub, which means you can use these models in your own project by one-line plug-and-play without depending on this toolkit's coding style.
Document: upstream/README.md

Downstream

Utilize upstream models in lots of downstream tasks
Benchmark upstream models with SUPERB Benchmark
Document: downstream/README.md

Below is an intuitive illustration on how this toolkit may help you:

Feel free to use or modify our toolkit in your research. Here is a list of papers using our toolkit. Any question, bug report or improvement suggestion is welcome through opening up a new issue.

If you find this toolkit helpful to your research, please do consider citing our papers, thanks!

Installation

Python >= 3.6
Install sox on your OS
Install s3prl

pip install -e ./

Install the specific fairseq

pip install [email protected]+https://github.com//pytorch/[email protected]#egg=fairseq

Some upstream models require special dependencies. If you encounter an error with a specific upstream model, look into the README.md under that upstream's folder, e.g., upstream/pase/README.md
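The first two prerequisites above (Python >= 3.6 and a sox installation) can be checked before installing. The following is a minimal sketch using only the standard library; the function name check_prerequisites is hypothetical and not part of the toolkit.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6)):
    """Report whether the prerequisites listed above appear to be met."""
    return {
        # The installation notes ask for Python >= 3.6.
        "python_ok": sys.version_info >= min_python,
        # shutil.which returns None when no `sox` executable is on PATH.
        "sox_path": shutil.which("sox"),
    }


if __name__ == "__main__":
    status = check_prerequisites()
    print("Python version OK:", status["python_ok"])
    print("sox found at:", status["sox_path"] or "not found on PATH")
```

This only checks that a sox executable is discoverable on PATH; it does not verify its version.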

Development pattern for contributors

Create a personal fork of the main S3PRL repository in GitHub.
Make your changes in a named branch different from master, e.g. you create a branch new-awesome-feature.
Contact us if you have any questions during development.
Generate a pull request through the Web interface of GitHub.
Please verify that your code is free of basic mistakes. We appreciate any contribution!

Reference Repositories

Pytorch, Pytorch.
Audio, Pytorch.
Kaldi, Kaldi-ASR.
Transformers, Hugging Face.
PyTorch-Kaldi, Mirco Ravanelli.
fairseq, Facebook AI Research.
CPC, Facebook AI Research.
APC, Yu-An Chung.
VQ-APC, Yu-An Chung.
NPC, Alexander-H-Liu.
End-to-end-ASR-Pytorch, Alexander-H-Liu
Mockingjay, Andy T. Liu.
ESPnet, Shinji Watanabe
speech-representations, aws lab
PASE, Santiago Pascual and Mirco Ravanelli
LibriMix, Joris Cosentino and Manuel Pariente

License

The majority of S3PRL Toolkit is licensed under CC-BY-NC, however portions of the project are available under separate license terms: S3PRL is licensed under the MIT license.

Used by

List of papers that used our toolkit (Feel free to add your own paper by making a pull request)

Self-Supervised Pretraining

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders (Liu et al., 2020)

@article{mockingjay,
   title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
   ISBN={9781509066315},
   url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
   DOI={10.1109/icassp40776.2020.9054458},
   journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
   publisher={IEEE},
   author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
   year={2020},
   month={May}
}

TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech (Liu et al., 2020)

@misc{tera,
    title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
    author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
    year={2020},
    eprint={2007.06028},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation (Chi et al., 2020)

@inproceedings{audio_albert,
    title={Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation},
    author={Po-Han Chi and Pei-Hung Chung and Tsung-Han Wu and Chun-Cheng Hsieh and Shang-Wen Li and Hung-yi Lee},
    year={2020},
    booktitle={SLT 2020},
}

Explainability

Understanding Self-Attention of Self-Supervised Audio Transformers (Yang et al., 2020)

@inproceedings{understanding_sat,
    author={Shu-wen Yang and Andy T. Liu and Hung-yi Lee},
    title={{Understanding Self-Attention of Self-Supervised Audio Transformers}},
    year=2020,
    booktitle={Proc. Interspeech 2020},
    pages={3785--3789},
    doi={10.21437/Interspeech.2020-2231},
    url={http://dx.doi.org/10.21437/Interspeech.2020-2231}
}

Adversarial Attack

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning (Wu et al., 2020), code for computing LNSR: utility/observe_lnsr.py

@inproceedings{mockingjay_defense,
    author={Haibin Wu and Andy T. Liu and Hung-yi Lee},
    title={{Defense for Black-Box Attacks on Anti-Spoofing Models by Self-Supervised Learning}},
    year=2020,
    booktitle={Proc. Interspeech 2020},
    pages={3780--3784},
    doi={10.21437/Interspeech.2020-2026},
    url={http://dx.doi.org/10.21437/Interspeech.2020-2026}
}

Adversarial Defense for Automatic Speaker Verification by Cascaded Self-Supervised Learning Models (Wu et al., 2021)

@misc{asv_ssl,
    title={Adversarial defense for automatic speaker verification by cascaded self-supervised learning models}, 
    author={Haibin Wu and Xu Li and Andy T. Liu and Zhiyong Wu and Helen Meng and Hung-yi Lee},
    year={2021},
    eprint={2102.07047},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

Voice Conversion

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations (Lin et al., 2021)

@misc{s2vc,
      title={S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations}, 
      author={Jheng-hao Lin and Yist Y. Lin and Chung-Ming Chien and Hung-yi Lee},
      year={2021},
      eprint={2104.02901},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Benchmark and Evaluation

SUPERB: Speech processing Universal PERformance Benchmark (Yang et al., 2021)

@misc{superb,
      title={SUPERB: Speech processing Universal PERformance Benchmark}, 
      author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
      year={2021},
      eprint={2105.01051},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Utilizing Self-supervised Representations for MOS Prediction (Tseng et al., 2021)

@misc{ssr_mos,
    title={Utilizing Self-supervised Representations for MOS Prediction}, 
    author={Wei-Cheng Tseng and Chien-yu Huang and Wei-Tsung Kao and Yist Y. Lin and Hung-yi Lee},
    year={2021},
    eprint={2104.03017},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

Citation

If you find this toolkit useful, please consider citing the following papers.

If you use our pre-training scripts, or the downstream tasks considered in TERA and Mockingjay, please consider citing the following:

@misc{tera,
  title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
  author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
  year={2020},
  eprint={2007.06028},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

@article{mockingjay,
   title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
   ISBN={9781509066315},
   url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
   DOI={10.1109/icassp40776.2020.9054458},
   journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
   publisher={IEEE},
   author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
   year={2020},
   month={May}
}

If you use our organized upstream interface and features, or the SUPERB downstream benchmark, please consider citing the following:

@inproceedings{yang21c_interspeech,
  author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
  title={{SUPERB: Speech Processing Universal PERformance Benchmark}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1194--1198},
  doi={10.21437/Interspeech.2021-1775}
}

Comments

module 'hub' has no attribute 'mockingjay_local'

Hello. I am trying to run the Mockingjay downstream task on an HPC using this command:

python run_downstream.py -m train -u mockingjay_local -k '<path to .ckpt>' -d phone_linear -n mockingjayDown

I am getting the following error:

  File "run_downstream.py", line 225, in <module>
    main()
  File "run_downstream.py", line 220, in main
    runner = Runner(args, config)
  File "<path>/s3prl/downstream/runner.py", line 103, in __init__
    self.upstream = self._get_upstream()
  File "<path>/s3prl/downstream/runner.py", line 143, in _get_upstream
    Upstream = getattr(hub, self.args.upstream)
AttributeError: module 'hub' has no attribute 'mockingjay_local'

Please let me know how to resolve the issue or if I need to provide more details. Thanks!
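The traceback shows that the runner resolves the upstream by name with getattr(hub, self.args.upstream), so any name that is not registered on hub raises AttributeError. The sketch below illustrates that lookup with a clearer error message. It uses a stand-in module (fake_hub) rather than the toolkit's real hub, and the function resolve_upstream and the registered entry names are hypothetical.

```python
import difflib
import types


def resolve_upstream(hub, name):
    """Look up an upstream entry by name, with a readable error if it is missing."""
    available = sorted(n for n in dir(hub) if not n.startswith("_"))
    if name not in available:
        close = difflib.get_close_matches(name, available, n=3)
        raise ValueError(
            f"Unknown upstream {name!r}. Close matches: {close}. "
            f"Available entries: {available}"
        )
    return getattr(hub, name)


# Stand-in for the toolkit's `hub` module; the entry names are placeholders.
fake_hub = types.ModuleType("fake_hub")
fake_hub.mockingjay = lambda ckpt=None: ("mockingjay", ckpt)
fake_hub.tera_local = lambda ckpt=None: ("tera_local", ckpt)

if __name__ == "__main__":
    print(resolve_upstream(fake_hub, "mockingjay")("<path to .ckpt>"))
    try:
        resolve_upstream(fake_hub, "mockingjay_local")
    except ValueError as err:
        print(err)
```

Listing the names that are actually registered is usually the quickest way to see whether the requested entry exists under a different name in the installed version.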

opened by MiPlayer123 20

Speaker Diarization Scoring

Add NIST scoring for the standard diarization error rate (DER).

The results on three models (upstream + downstream):

baseline(fbank) + rnn: 7.03
apc + rnn: 7.20
wav2vec2 + rnn: 4.36
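
For readers unfamiliar with the metric, the diarization error rate is conventionally defined as the sum of missed speech, false alarm, and speaker confusion time divided by the total reference speech time. The sketch below computes that ratio; it is not the NIST scoring script, and the function name and durations are made up for illustration.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Return DER in percent from error durations given in seconds."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    # Made-up durations in seconds, for illustration only.
    der = diarization_error_rate(
        missed=12.0, false_alarm=8.0, confusion=5.0, total_speech=500.0
    )
    print(f"DER = {der:.2f}%")
```

Lower values are better, since every term in the numerator is an error duration.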
opened by ftshijt 20

There are tasks that ESPNET does with S3PRL that fail

  File "/media/shiyanshi/E/espnet/espnet2/asr/frontend/s3prl.py", line 26, in __init__
    import s3prl.nn
ModuleNotFoundError: No module named 's3prl.nn'
Error: S3PRL is not properly installed. Please install S3PRL: cd ${MAIN_ROOT}/tools && make s3prl.done

However, S3PRL is installed and can be imported successfully in the terminal. How do I fix it?
enhancement

opened by abcdbosh 18

Upstream request: wavLM

I see that WavLM now tops all of the SUPERB tasks (10 tasks), so I would like to request adding this audio embedding as an upstream.

Paper: https://arxiv.org/pdf/2110.13900.pdf
Code/Model: https://github.com/microsoft/unilm/tree/master/wavlm

Currently, only base and base+ models are available; the large version will be added soon.

opened by bagustris 16

The model rewrite in config is not reflected

Hi, thank you for a great repository!

I'm running the ER downstream task. I wanted to change the neural network from CNNSelfAttention to FCN, so I ran the command below, but the network does not seem to have changed. The override is reflected in the config*.yaml under /result/downstream/ExpName, but the training results are the same as with the default (CNNSelfAttention).

The command I ran:

python3 run_downstream.py -n ExpName -m train -u fbank -d emotion -c downstream/emotion/config.yaml -o "config.downstream_expert.modelrc.DeepModel.model_type='FCN'"

How can I change the model to FCN?
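
To make the question concrete, here is a rough sketch of how a dotted override string like the one passed with -o could be applied to a nested config dictionary. This is an illustration only, not the toolkit's actual override code; the function apply_override and the example config layout are assumptions.

```python
import ast
import copy


def apply_override(config, override):
    """Return a copy of `config` with one `dotted.key=value` override applied."""
    path, _, raw_value = override.partition("=")
    keys = path.split(".")
    if keys[0] == "config":  # drop the leading namespace used on the command line
        keys = keys[1:]
    try:
        value = ast.literal_eval(raw_value)  # handles 'FCN', 1e-4, True, [1, 2]
    except (ValueError, SyntaxError):
        value = raw_value  # fall back to the raw string
    result = copy.deepcopy(config)
    node = result
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create missing levels on the way down
    node[keys[-1]] = value
    return result


if __name__ == "__main__":
    # Hypothetical config layout mirroring the key path in the command above.
    config = {
        "downstream_expert": {
            "modelrc": {"DeepModel": {"model_type": "CNNSelfAttention"}}
        }
    }
    updated = apply_override(
        config, "config.downstream_expert.modelrc.DeepModel.model_type='FCN'"
    )
    print(updated["downstream_expert"]["modelrc"]["DeepModel"])
```

An override of this kind changes behavior only if the code that builds the model reads the overridden key. Whether that is the case here is not stated in the issue.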

opened by miyazakieiji 16

Why is such a large memory cost on gpu

Hello! I was trying to run a "Hubert + PR" experiment on a single GPU. I noticed that the task uses 40+ GB of GPU memory when training starts. After training for some time, it reported "cuda out of memory" and I had to stop the task. I ran into a similar situation with the "Wavlm + ASR" experiment, which uses about 30 GB of memory. Other downstream tasks such as KS and IC do not need this much memory. I ran all the experiments with the default config.yaml. Why do these tasks use so much memory? Is this normal?

opened by TCL606 15

Error while loading finetuned wav2vec 2.0 large

Hi, following the ppt, I tried to load wav2vec2 with the code below and got the following error:

upstream = torch.hub.load("s3prl/s3prl",'wav2vec2_url',ckpt = 'https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_new.pt')

Using cache found in /home/sreyan/.cache/torch/hub/s3prl_s3prl_master
Using cache found in /home/sreyan/.cache/torch/hub/s3prl_cache/1c76d6e88090f01736036b28dc995fef583f47f42662d55286332557f957609f for https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_new.pt
Traceback (most recent call last):
  File "", line 1, in
  File "/home/sreyan/.conda/envs/semeval/lib/python3.7/site-packages/torch/hub.py", line 370, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/home/sreyan/.conda/envs/semeval/lib/python3.7/site-packages/torch/hub.py", line 399, in _load_local
    model = entry(*args, **kwargs)
  File "/home/sreyan/.cache/torch/hub/s3prl_s3prl_master/upstream/wav2vec2/hubconf.py", line 23, in wav2vec2_url
    return wav2vec2_local(_urls_to_filepaths(ckpt, refresh=refresh), *args, **kwargs)
  File "/home/sreyan/.cache/torch/hub/s3prl_s3prl_master/upstream/wav2vec2/hubconf.py", line 14, in wav2vec2_local
    return _UpstreamExpert(ckpt, *args, **kwargs)
  File "/home/sreyan/.cache/torch/hub/s3prl_s3prl_master/upstream/wav2vec2/expert.py", line 24, in __init__
    model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt])
  File "/home/sreyan/fairseq/fairseq/checkpoint_utils.py", line 339, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/home/sreyan/fairseq/fairseq/checkpoint_utils.py", line 273, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "/home/sreyan/fairseq/fairseq/checkpoint_utils.py", line 550, in _upgrade_state_dict
    state["cfg"] = convert_namespace_to_omegaconf(state["args"])
  File "/home/sreyan/fairseq/fairseq/dataclass/utils.py", line 351, in convert_namespace_to_omegaconf
    with initialize(config_path=config_path):
AttributeError: __enter__

I would like to fine-tune the fine-tuned wav2vec 2.0 on a speech sentiment task. Any help would be highly appreciated.

opened by Sreyan88 14

about distilhubert

When I run

python run_pretrain.py -u distiller -g pretrain/distiller/config_model.yaml -n distilhubert

I get this error:

  File "/home/wangsiyuan/kaldi-wavlm/s3prl-test/s3prl/pretrain/distiller/pretrain_expert.py", line 278, in forward
    teacher_hiddens = torch.stack(teacher_hiddens, dim=1) # B x N x T x D
RuntimeError: stack expects each tensor to be equal size, but got [18, 302, 768] at entry 0 and [18, 301, 768] at entry 1

My tests show that the teacher model has 12 blocks, and the output of the 12th block is one frame away from the other blocks.

After padding, another error occurs: the loss computation indicates that the student model output is one frame away from the teacher model output.

Another error: when I use multiple GPUs, I get "IndexError: Caught IndexError in replica 0 on device 0." I have tried torch 1.9.0 and 1.10.1+cu111 and cannot fix it.

opened by c976237222 13

Integrate Hugging Face Hub & add Docker image

This PR implements two main features:

Integration with the 🤗 Hub for downstream fine-tuning.

The --hub flag allows users to pick any (suitable) upstream model from the PyTorch or 🤗 Hubs, while the --push_to_hf_hub flag pushes all the artifacts from fine-tuning to the 🤗 Hub for inference / evaluation.

A fine-tuning run with these flags looks like:

python run_downstream.py -n exp_dir -m train -u ${upstream_model} -d ${downstream_task} --hub huggingface --push_to_hf_hub True

Upstream models on the 🤗 Hub require an expert.py interface to be defined and you can find an example here.

Downstream models are automatically wrapped in a model.py file that defines the interface for inference and you can find an example here. By default we use the *best*.ckpt checkpoint for inference / evaluation and fall back to the final checkpoint if a "best" one is not produced during training.
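
The fallback rule described above (prefer a *best*.ckpt file, otherwise use the final checkpoint) can be sketched as follows. This is not the PR's code: the function pick_checkpoint and the file names are hypothetical, and "final" is approximated here by the most recently modified .ckpt file.

```python
import os
import tempfile
from pathlib import Path


def pick_checkpoint(exp_dir):
    """Prefer a checkpoint matching *best*.ckpt; otherwise take the newest .ckpt."""
    exp_dir = Path(exp_dir)
    best = sorted(exp_dir.glob("*best*.ckpt"))
    if best:
        return best[0]
    # Assumption: the most recently modified file stands in for the "final" checkpoint.
    others = sorted(exp_dir.glob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    if not others:
        raise FileNotFoundError(f"no .ckpt files found in {exp_dir}")
    return others[-1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, with explicit modification times.
        for step in (100, 200):
            path = Path(tmp) / f"states-{step}.ckpt"
            path.touch()
            os.utime(path, (step, step))
        print(pick_checkpoint(tmp).name)  # no "best" file exists yet
        (Path(tmp) / "dev-best.ckpt").touch()
        print(pick_checkpoint(tmp).name)  # a "best" file now takes priority
```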

By storing all the artifacts, we can visualize the Tensorboard logs and reproduce training runs if needed from the args_*.yaml and config_*.yaml files.

Update: the tensorboard logs are only visible for public repos and by default we create a private repo (in case participants don't want to share their fine-tuned models with everyone). The participant can view the logs by simply making their repo public if they wish

A Docker image for downstream fine-tuning

This builds on the above Hub integration and should be runnable on any infra that has the NVIDIA Container Toolkit installed. See the downstream README for more details on how to build the image / run it. Once this PR is merged, an interesting exercise will be to see if you can run the Docker container on your own infra 😃

Miscellaneous

We have also included some changes to:

The downstream README

The ASR and SD modules now include a template folder for the 🤗 Hub interfaces

cc @leo19941227
opened by lewtun 13

train downstream ASR using own upstream

Hi, I want to use the pretrained model for the downstream ASR task. However, there is no config file in the s3prl/downstream/asr/feat/ directory. Is the ASR task properly configured? Thanks.

opened by zyzpower 13

(WIP) a better version of enhancement and separation downstream

Hi @leo19941227, I am making this pull request for a better version of the enhancement and separation downstream tasks. In this pull request, I:

Added two new configs which have a much smaller model size and better performance.

Made some small changes to the code, including (1) modifying the loss function, supporting L1 loss, and computing the loss in the log domain (for a smaller input scale and more stable training), and (2) removing the original postprocess function. Originally, I found some issues when using librosa.istft, and I used the postprocess function to remove the impulse at the end of the signal. I have now found a better way to deal with this issue.
opened by HuangZiliAndy 12

Is there no vq_apc local in s3prl?

Hi, I pre-trained the vq_apc model for comparison, but when I tried to extract the feature representation of vq_apc, it failed.

upstream=getattr(hub, 'vq_apc_local')('result/pretrain/vq_apc/states-epoch-50.ckpt')

Can you add vq_apc_local?

opened by kaen2891 0

SID task loss function.

The ASV and SID tasks are very similar, yet they use different loss functions: ASV uses AMsoftmax, and SID uses softmax.

Why was this choice made? Also, is it acceptable to change the loss function?
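
For reference, the two losses differ only in how the target-class logit is formed: additive-margin softmax subtracts a margin from the target cosine similarity before scaling, while plain softmax does not. The sketch below shows this in pure Python for a single example. It is not the toolkit's implementation, and the scale and margin values are illustrative defaults rather than the settings used in either task.

```python
import math


def softmax_loss(cosines, target, scale=1.0):
    """Cross-entropy over scaled cosine logits (no margin)."""
    logits = [scale * c for c in cosines]
    log_norm = math.log(sum(math.exp(z) for z in logits))
    return log_norm - logits[target]


def am_softmax_loss(cosines, target, scale=30.0, margin=0.35):
    """Additive-margin softmax: subtract `margin` from the target cosine only."""
    adjusted = [c - margin if i == target else c for i, c in enumerate(cosines)]
    return softmax_loss(adjusted, target, scale)


if __name__ == "__main__":
    # Made-up cosine similarities between one embedding and three class weights.
    cosines = [0.8, 0.1, -0.3]
    print("no margin:  ", am_softmax_loss(cosines, target=0, margin=0.0))
    print("with margin:", am_softmax_loss(cosines, target=0, margin=0.35))
```

With the margin set to zero the two functions coincide, so the margin is the only difference. It penalizes target similarities that are not clearly above the others. This does not by itself explain why each task uses the loss it does.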

opened by raotnameh 1

Bump setuptools from 59.5.0 to 65.5.1 in /requirements

Bumps setuptools from 59.5.0 to 65.5.1.

Changelog

Sourced from setuptools's changelog.

v65.5.1

Misc
^^^^

#3638: Drop a test dependency on the mock package, always use :external+python:py:mod:unittest.mock -- by :user:hroncok
#3659: Fixed REDoS vector in package_index.

v65.5.0

Changes
^^^^^^^

#3624: Fixed editable install for multi-module/no-package src-layout projects.
#3626: Minor refactorings to support distutils using stdlib logging module.

Documentation changes
^^^^^^^^^^^^^^^^^^^^^

#3419: Updated the example version numbers to be compliant with PEP-440 on the "Specifying Your Project’s Version" page of the user guide.

Misc
^^^^

#3569: Improved information about conflicting entries in the current working directory and editable install (in documentation and as an informational warning).
#3576: Updated version of validate_pyproject.

v65.4.1

Misc
^^^^

#3613: Fixed encoding errors in expand.StaticModule when system default encoding doesn't match expectations for source files.
#3617: Merge with pypa/[email protected] including fix for pypa/distutils#181.

v65.4.0

Changes
^^^^^^^

#3609: Merge with pypa/[email protected] including support for DIST_EXTRA_CONFIG in pypa/distutils#177.

v65.3.0

... (truncated)

Commits

a462cb5 Bump version: 65.5.0 → 65.5.1

de35d8b Merge pull request #3656 from bmorris3/typos

58e23de Update changelog. Ref #3659.

43a9c9b Limit the amount of whitespace to search/backtrack. Fixes #3659.

5791343 Add test capturing failed expectation. Ref #3659.

1f97905 ⚫ Fade to black.

6254567 Remove workaround for emacs.

729b180 ⚫ Fade to black.

c068081 Typo corrections

f777a40 Suppress deprecation warning in --rsyncdir. Workaround for #3655.

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

unspecified upstream models

Hello, there are several unspecified upstream models in the s3prl hub, such as:

passt_base
ssast_frame_base
wav2vec2_base_s2st_en_librilight
wav2vec2_conformer_large_s2st_en_librilight
...

Can you provide an explanation for these models? Is there a place that lists the details of all the upstream models?

opened by marziye-A 0

ContentVec support

Paper: ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Code: https://github.com/auspicious3000/contentvec

Models (the default is contentvec_km100):

contentvec_km100
contentvec_km500

The model architecture is identical to HuBERT Base, so only s3prl/upstream/hubert/hubconf.py is modified.
opened by vectominist 0

Releases (v0.3.4)

v0.3.4 (May 27, 2022)

Emergency fix for an installation bug.

s3prl-0.3.4.tar.gz (284.90 KB)

v0.3.3 (May 23, 2022)

Add new models: discretebert, lighthubert, data2vec.
Upgrade to use the latest fairseq.

s3prl-0.3.3.tar.gz (285.94 KB)

Owner

s3prl

The Self-Supervised Speech Pre-training and Representation Learning Toolkit Development Team

GitHub Repository https://youtu.be/PkMFnS6cjAc
