xFormers is a modular and field-agnostic library to flexibly generate transformer architectures from interoperable and optimized building blocks.

Getting started

The full documentation contains instructions for getting started, deep dives and tutorials about the various APIs. If in doubt, please check out the HOWTO. This README only lays out some general considerations.

Installation

As is often the case with Python, we recommend installing xFormers in a dedicated virtual environment, for instance through python-virtualenv or conda. There are two ways you can install it:

Directly from the pip package

You can also fetch the latest release from PyPI. Note that this package does not contain the sparse attention kernels; to get them, you will need to build from source.

conda create --name xformer_env python=3.8
conda activate xformer_env
pip install xformers

Build from source (dev mode)

These commands will fetch the latest version of the code, create a dedicated conda environment, activate it, then install xFormers from source. If you want to build the sparse attention CUDA kernels, please make sure that the points in the next section are covered before running these instructions.

git clone git@github.com:fairinternal/xformers.git
conda create --name xformer_env python=3.8
conda activate xformer_env
cd xformers
pip install -r requirements.txt
pip install -e .

Sparse attention kernels

Installing the CUDA-based sparse attention kernels may require extra care, as it relies on the CUDA toolchain. As a reminder, these kernels are built when you run pip install -e . and the CUDA build chain (the NVCC compiler) is available. To rebuild them, you can for instance run python3 setup.py clean && python3 setup.py develop, or equivalently wipe the build folder and run pip install -e . again.

Some advice on building these CUDA-specific components, addressing common pitfalls. Please make sure that:

  • NVCC and the current CUDA runtime match. Depending on your setup, you may be able to change the CUDA runtime with module unload cuda; module load cuda/xx.x (possibly also nvcc)
  • the version of GCC that you're using matches the current NVCC capabilities
  • the TORCH_CUDA_ARCH_LIST env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"
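As a quick sanity check for the first and third points, the sketch below compares the CUDA release reported by `nvcc --version` with the CUDA version PyTorch was built with (`torch.version.cuda`), and prints the local GPU's compute capability in the `TORCH_CUDA_ARCH_LIST` format. The helper names are hypothetical (this is not part of xFormers), and the parsing assumes the `release X.Y` line that nvcc prints. It only inspects the nvcc found on PATH, which may differ from the one your build actually uses.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'X.Y' release from `nvcc --version` output, if present."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def versions_match(nvcc_release: Optional[str], torch_cuda: Optional[str]) -> bool:
    """Compare major.minor versions, e.g. '11.6' vs '11.3'."""
    if nvcc_release is None or torch_cuda is None:
        return False
    return nvcc_release.split(".")[:2] == torch_cuda.split(".")[:2]


def arch_string(capability: tuple) -> str:
    """Format a (major, minor) compute capability as used in TORCH_CUDA_ARCH_LIST."""
    return f"{capability[0]}.{capability[1]}"


def report() -> None:
    try:
        import torch  # assumed to be installed in the build environment
    except ImportError:
        print("PyTorch is not installed")
        return
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.check_output([nvcc, "--version"], universal_newlines=True)
        release = parse_nvcc_release(out)
        print(f"nvcc release: {release}, torch built with CUDA: {torch.version.cuda}")
        print("match:", versions_match(release, torch.version.cuda))
    if torch.cuda.is_available():
        print("local GPU arch:", arch_string(torch.cuda.get_device_capability()))


if __name__ == "__main__":
    report()
```

The local architecture printed at the end is the minimum entry you would want in TORCH_CUDA_ARCH_LIST; the longer list above trades build time for portability.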

Triton

Some parts of xFormers use Triton, and are only exposed if Triton is installed and a compatible GPU is present (an NVIDIA GPU with tensor cores). If Triton was not installed as part of the testing procedure, you can install it directly by running pip install triton. You can optionally check that the installation succeeded by running one of the Triton-related benchmarks, for instance python3 xformers/benchmarks/benchmark_triton_softmax.py
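The two conditions above can be checked ahead of time. This is a minimal sketch, assuming only that `importlib` can locate the `triton` package and that tensor cores correspond to compute capability 7.0 (Volta) or newer; the helper names are hypothetical, and xFormers performs its own checks internally.

```python
import importlib.util


def triton_installed() -> bool:
    """True if the `triton` package can be found in this environment."""
    return importlib.util.find_spec("triton") is not None


def has_tensor_cores(capability: tuple) -> bool:
    """NVIDIA tensor cores first appeared with compute capability 7.0 (Volta)."""
    major, _minor = capability
    return major >= 7


def triton_parts_usable() -> bool:
    if not triton_installed():
        return False
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available() and has_tensor_cores(
        torch.cuda.get_device_capability()
    )


if __name__ == "__main__":
    print("Triton installed:", triton_installed())
    print("Triton-based parts usable:", triton_parts_usable())
```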

Triton will cache the compiled kernels to /tmp/triton by default. If this becomes an issue, this path can be specified through the TRITON_CACHE_DIR environment variable.
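For example, the following sketch redirects the cache from Python before Triton is first imported. The directory name is an arbitrary example; exporting TRITON_CACHE_DIR in the shell before launching the process works just as well.

```python
import os
import tempfile

# Hypothetical location: any writable directory works.
cache_dir = os.path.join(tempfile.gettempdir(), "my_triton_cache")
os.makedirs(cache_dir, exist_ok=True)

# Set this before Triton compiles any kernel, ideally before importing it.
os.environ["TRITON_CACHE_DIR"] = cache_dir

# import triton  # later imports and compilations should use the new location
```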

Testing the installation

This will run a benchmark of the attention mechanisms exposed by xFormers, and generate a runtime and memory plot. If it completes without errors, the installation is successful. This step is optional and requires some extra dependencies: pip install -r requirements-benchmark.txt.

Once this is done, you can run this particular benchmark as follows:

python3 xformers/benchmarks/benchmark_encoder.py --activations relu  --plot -emb 256 -bs 32 -heads 16

Using xFormers

Transformers key concepts

Let's start from a classical overview of the Transformer architecture (illustration from Lin et al., "A Survey of Transformers").

You'll find the key repository boundaries in this illustration: a Transformer is generally made of a collection of attention mechanisms, embeddings to encode some positional information, feed-forward blocks and a residual path (typically referred to as pre- or post-layer norm). These boundaries do not work for all models, but we found in practice that, given some accommodations, they could capture most of the state of the art.
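To make the pre-/post-layer-norm distinction concrete, here is a minimal NumPy sketch of the two residual arrangements around an arbitrary sub-block (attention or feed-forward). It omits the learnable scale and bias of a real layer norm and uses a toy ReLU projection as the sub-block; it illustrates the concept only and is not xFormers code.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize over the last (embedding) dimension, without learnable affine terms."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def pre_norm_residual(x, sublayer):
    # Pre-LN: normalize the sub-block input, keep the residual path untouched
    return x + sublayer(layer_norm(x))


def post_norm_residual(x, sublayer):
    # Post-LN: add the residual first, then normalize the sum
    return layer_norm(x + sublayer(x))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 8, 16))  # (batch, sequence, embedding)
    w = rng.standard_normal((16, 16)) * 0.1
    feedforward = lambda h: np.maximum(h @ w, 0.0)  # toy stand-in for a sub-block
    print(pre_norm_residual(x, feedforward).shape)   # (2, 8, 16)
    print(post_norm_residual(x, feedforward).shape)  # (2, 8, 16)
```

In the pre-norm variant the identity path is never normalized, which is commonly credited with making deep stacks easier to train; the post-norm variant matches the original Transformer layout.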

Models are thus not implemented in monolithic files, which are typically complicated to handle and modify. Most of the concepts present in the above illustration correspond to an abstraction level, and when variants are present for a given sub-block it should always be possible to select any of them. You can focus on a given encapsulation level and modify it as needed.
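One way to make every variant of a sub-block selectable by name is a small registry, sketched below. The names `register`, `build`, `FEEDFORWARD_REGISTRY` and the two toy classes are hypothetical and only illustrate the pattern; they are not the xFormers API (see the `factory` folder and the documentation for the real entry points). The config keys mirror the ones used in xFormers configs.

```python
from typing import Callable, Dict, Type

FEEDFORWARD_REGISTRY: Dict[str, Type] = {}


def register(name: str) -> Callable[[Type], Type]:
    """Class decorator storing a sub-block implementation under a string key."""
    def wrapper(cls: Type) -> Type:
        if name in FEEDFORWARD_REGISTRY:
            raise ValueError(f"'{name}' is already registered")
        FEEDFORWARD_REGISTRY[name] = cls
        return cls
    return wrapper


def build(config: dict):
    """Instantiate the variant named in config['name'] with the remaining keys."""
    params = dict(config)
    cls = FEEDFORWARD_REGISTRY[params.pop("name")]
    return cls(**params)


@register("mlp")
class ToyMLP:
    def __init__(self, dim_model: int, hidden_layer_multiplier: int = 4):
        self.hidden = dim_model * hidden_layer_multiplier


@register("identity")
class ToyIdentity:
    def __init__(self, dim_model: int):
        self.hidden = dim_model


if __name__ == "__main__":
    block = build({"name": "mlp", "dim_model": 256})
    print(type(block).__name__, block.hidden)  # ToyMLP 1024
```

Swapping a variant then only means changing the `name` field of the config, which keeps each encapsulation level independent of the others.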

Repo map

├── components                  # Parts zoo, any of which can be used directly
│   ├── attention
│   │    └ ...                  # all the supported attentions
│   ├── feedforward             #
│   │    └ ...                  # all the supported feedforwards
│   ├── positional_embedding    #
│   │    └ ...                  # all the supported positional embeddings
│   ├── activations.py          #
│   └── multi_head_dispatch.py  # (optional) multihead wrap
├── factory
│   ├── block_factory.py        # (optional) helper to programmatically generate layers
│   └── model_factory.py        # (optional) helper to programmatically generate models
└── models
     └ ...                      # Full models, ready to be used

Attention mechanisms

Feed forward mechanisms

Positional embedding

Key Features

  1. Many attention mechanisms, interchangeable
  2. Optimized building blocks, beyond PyTorch primitives
    1. sparse attention
    2. block-sparse attention
    3. fused softmax
    4. fused linear layer
    5. fused layer norm
  3. Benchmarking and testing tools
    1. micro benchmarks
    2. transformer block benchmark
    3. LRA, with SLURM support
  4. Programmatic and sweep-friendly layer and model construction
  5. Hackable
    1. Not using monolithic CUDA kernels, composable building blocks
    2. Using Triton for some optimized parts, explicit, pythonic and user-accessible

FAQ

We've tried to collect a relatively exhaustive list of explanations in the HOWTO.

License

xFormers has a BSD-style license, as found in the LICENSE file.

Citing xFormers

If you use xFormers in your publication, please cite it by using the following BibTeX entry.

@Misc{xFormers2021,
  author =       {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang},
  title =        {xFormers: A modular and hackable Transformer modelling library},
  howpublished = {\url{https://github.com/facebookresearch/xformers}},
  year =         {2021}
}
Comments
  • [feat] Dropout(Activation(x+bias)), now with partial BW fusion

    What does this PR do?

    This was a long time in the making. This PR fuses the BW part of the activation/bias/dropout kernel. Not quite perfect, but in some places the speedup goes really bananas (like 3x or 4x over the naive calls). Fusing this implied flipping the whole problem upside down: basically the seeds have to be per column, and the kernels (FW and BW) also work that way. This allows us to fuse the bias gradient computation, since it's a sum over that direction.
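    For reference, here is a naive, unfused NumPy sketch of what this kernel computes, y = Dropout(Activation(x + bias)), together with its backward pass. It shows why the bias gradient is a reduction over rows, i.e. one value per column, which is the direction the fused kernels work in. ReLU is assumed as the activation, and the dropout mask is drawn with NumPy rather than with the per-column seeds used by the actual Triton kernels.

```python
import numpy as np


def forward(x, bias, p, rng):
    """Naive Dropout(ReLU(x + bias)); returns the output and what backward needs."""
    z = x + bias                        # bias of shape (N,) broadcast over the rows
    keep = rng.random(x.shape) >= p     # keep each element with probability 1 - p
    scale = keep / (1.0 - p)            # inverted dropout: rescale the kept values
    y = np.maximum(z, 0.0) * scale
    return y, (z, scale)


def backward(grad_y, saved):
    z, scale = saved
    grad_z = grad_y * scale * (z > 0)   # through dropout, then through ReLU
    grad_x = grad_z
    grad_bias = grad_z.sum(axis=0)      # sum over rows: one value per column
    return grad_x, grad_bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, bias = rng.standard_normal((64, 32)), rng.standard_normal(32)
    y, saved = forward(x, bias, 0.1, rng)
    grad_x, grad_bias = backward(np.ones_like(y), saved)
    print(grad_bias.shape)  # (32,)
```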

    TODO:

    • [x] add more unit tests to check that the dropout drops are respected on average
    • [x] possibly make sure that the rand mask does not repeat (may or may not be a big deal). Ok this is doable by making the kernels cooperate on the same col, like Phil does on LayerNorm
    • [x] improve on the scheduling for small buffers
    • [x] Fix the atomic add funkiness (works for now but this does not look completely right, num_warps dependent)

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [x] Did you write any new necessary tests?
      • [ ] N/A
    • [x] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 26
  • Support Windows and ideally build wheels for it

    🚀 Feature

    Supporting Windows in xformers.

    Motivation

    xformers provides excellent tools to increase inference speed, for example close to 2x in Stable Diffusion. Sadly, it lacks Windows support. This has barred us from using it in https://github.com/AUTOMATIC1111/stable-diffusion-webui, as most users and developers (including myself) use Windows.

    Pitch

    Currently, xformers fails to compile on Windows due to a multitude of errors, some of which are trivial but most of which are not. Fixing these errors to enable Windows usage, and ideally distributing Windows wheels, would allow projects to make xformers a hard requirement and use it.

    Alternatives

    Additional context

    cc. @fmassa

    opened by C43H66N12O12S2 23
  • triton 2.0 changes

    What does this PR do?

    Fixes triton to work with version 2.0.0.

    TODOs:

    • [x] Move the syntax to triton2
    • [x] Fix fused dropout
    • [ ] Fix the blocksparse op API having changed
    • [x] Fix fused linear layer
    • [x] Update the benchmarks

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [ ] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by kashif 23
  • Pip installation fails, `CUTLASS` not found

    🐛 Bug

    pip installation fails in a Docker container: the CUTLASS submodule is not found, since git submodule update --init --recursive is not executed.

    To Reproduce

    Dockerfile

    FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
    RUN pip install xformers
    

    then

    docker build .
    

    Error Trace

    #1 [internal] load build definition from Dockerfile
    #1 sha256:bc3772a9760c6470030d3506e7afa0b9caa2a77f63376fe30fc296a334d5c980
    #1 transferring dockerfile: 116B done
    #1 DONE 0.0s
    
    #2 [internal] load .dockerignore
    #2 sha256:5b674e66e988c8852edbf605c0d0921ac6eed40841cd55d9112e0d92242091a1
    #2 transferring context: 2B done
    #2 DONE 0.0s
    
    #3 [internal] load metadata for docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
    #3 sha256:409f78a4f3551ef4b6d7a4b064ff72bb54f0677d599351b4d0dcdff08b926834
    #3 DONE 0.8s
    
    #4 [1/2] FROM docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime@sha256:0bc0971dc8ae319af610d493aced87df46255c9508a8b9e9bc365f11a56e7b75
    #4 sha256:2e3e89abd93f2e7b42b070196f0e6be4ce38a2d360c98232440e1d90189bdb02
    #4 CACHED
    
    #5 [2/2] RUN pip install xformers
    #5 sha256:ef3133015f56a22d509f2aa1ef730afdcaa2591838105ba332650ff73ceb9ff9
    #5 1.012 Collecting xformers
    #5 1.313   Downloading xformers-0.0.13.tar.gz (292 kB)
    #5 1.429      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.5/292.5 kB 2.6 MB/s eta 0:00:00
    #5 1.534   Preparing metadata (setup.py): started
    #5 2.952   Preparing metadata (setup.py): finished with status 'error'
    #5 2.961   error: subprocess-exited-with-error
    #5 2.961
    #5 2.961   × python setup.py egg_info did not run successfully.
    #5 2.961   │ exit code: 1
    #5 2.961   ╰─> [8 lines of output]
    #5 2.961       Traceback (most recent call last):
    #5 2.961         File "<string>", line 36, in <module>
    #5 2.961         File "<pip-setuptools-caller>", line 34, in <module>
    #5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 239, in <module>
    #5 2.961           ext_modules=get_extensions(),
    #5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 158, in get_extensions
    #5 2.961           "CUTLASS submodule not found. Did you forget "
    #5 2.961       RuntimeError: CUTLASS submodule not found. Did you forget to run `git submodule update --init --recursive` ?
    #5 2.961       [end of output]
    #5 2.961
    #5 2.961   note: This error originates from a subprocess, and is likely not a problem with pip.
    #5 2.965 error: metadata-generation-failed
    #5 2.965
    #5 2.965 × Encountered error while generating package metadata.
    #5 2.965 ╰─> See above for output.
    #5 2.965
    #5 2.965 note: This is an issue with the package mentioned above, not pip.
    #5 2.965 hint: See above for details.
    #5 ERROR: executor failed running [/bin/sh -c pip install xformers]: exit code: 1
    ------
     > [2/2] RUN pip install xformers:
    ------
    executor failed running [/bin/sh -c pip install xformers]: exit code: 1
    

    Expected behavior

    Installation should work.

    Environment

    In the container, running Docker on Windows:

    PyTorch version: 1.12.1
    Is debug build: False
    CUDA used to build PyTorch: 11.3
    ROCM used to build PyTorch: N/A 
    
    OS: Ubuntu 18.04.6 LTS (x86_64) 
    GCC version: Could not collect  
    Clang version: Could not collect
    CMake version: Could not collect
    Libc version: glibc-2.17
    
    Python version: 3.7.13 (default, Mar 29 2022, 02:18:16)  [GCC 7.5.0] (64-bit runtime)
    Python platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-buster-sid
    Is CUDA available: True
    CUDA runtime version: Could not collect
    GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1060
    Nvidia driver version: 517.48
    cuDNN version: Could not collect
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True
    
    Versions of relevant libraries:
    [pip3] numpy==1.21.5
    [pip3] torch==1.12.1
    [pip3] torchtext==0.13.1
    [pip3] torchvision==0.13.1
    [conda] blas                      1.0                         mkl
    [conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
    [conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
    [conda] mkl-service               2.4.0            py37h7f8727e_0
    [conda] mkl_fft                   1.3.1            py37hd3c417c_0
    [conda] mkl_random                1.2.2            py37h51133e4_0
    [conda] numpy                     1.21.5           py37he7a7128_2
    [conda] numpy-base                1.21.5           py37hf524024_2
    [conda] pytorch                   1.12.1          py3.7_cuda11.3_cudnn8.3.2_0    pytorch
    [conda] pytorch-mutex             1.0                        cuda    pytorch
    [conda] torchtext                 0.13.1                     py37    pytorch
    [conda] torchvision               0.13.1               py37_cu113    pytorch
    

    Additional context

    I don't think this problem has anything to do with the OS/Python/PyTorch/CUDA/NVCC versions; setup.py seems to be tailored for a local/manual install, and fails in this context.

    opened by AbdBarho 20
  • [feat] add split_dim arg to reversible, remove retain_grad, add benchmark_reversible

    This PR removes the repeated chunk and cat operations in xformers' RevNet code, which makes the RevNet implementation a little faster.
    I'd strongly recommend calling a library like MemCNN or RevLib directly, as they make it easier to switch the coupling function and generally give the user more freedom.
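    For context, a reversible (RevNet-style) layer splits its input into two halves along some dimension and couples them so that the inputs can be recomputed from the outputs instead of being stored. The NumPy sketch below shows the coupling, its inverse, and a stack that splits once along a configurable `split_dim` and concatenates once at the end, rather than chunking and concatenating around every layer. `f` and `g` stand in for the attention and feed-forward sub-blocks; all names are illustrative and this is not the xFormers implementation.

```python
import numpy as np


def rev_layer(x1, x2, f, g):
    """Additive coupling: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2


def rev_layer_inverse(y1, y2, f, g):
    """Recover the inputs from the outputs, so activations need not be stored."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2


def rev_stack(x, layers, split_dim=-1):
    # Split once, run every layer on the two halves, concatenate once at the end
    x1, x2 = np.split(x, 2, axis=split_dim)
    for f, g in layers:
        x1, x2 = rev_layer(x1, x2, f, g)
    return np.concatenate([x1, x2], axis=split_dim)
```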

    Unfortunately, I can't sign the CLA at the moment, as it keeps saying

    Sorry, something went wrong. We're working on getting this fixed as soon as we can.

    CLA Signed 
    opened by ClashLuke 20
  • Added SmeLU

    What does this PR do?

    Fixes #262.
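    SmeLU (Smooth ReLU) replaces the kink of ReLU with a quadratic segment of half-width beta: it is 0 for x <= -beta, x for x >= beta, and (x + beta)^2 / (4 beta) in between, so both the value and the slope are continuous. A NumPy sketch of the forward function and its derivative follows; the default beta = 2.0 here is an arbitrary choice for the example, not necessarily the value used in this PR.

```python
import numpy as np


def smelu(x, beta: float = 2.0):
    """SmeLU: 0 for x <= -beta, x for x >= beta, (x + beta)^2 / (4 * beta) in between."""
    x = np.asarray(x, dtype=float)
    quadratic = (x + beta) ** 2 / (4.0 * beta)
    return np.where(x <= -beta, 0.0, np.where(x >= beta, x, quadratic))


def smelu_grad(x, beta: float = 2.0):
    """Derivative: 0, then (x + beta) / (2 * beta) in the quadratic zone, then 1."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= -beta, 0.0, np.where(x >= beta, 1.0, (x + beta) / (2.0 * beta)))


if __name__ == "__main__":
    xs = np.linspace(-4.0, 4.0, 9)
    print(smelu(xs))
    print(smelu_grad(xs))
```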

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [x] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by kashif 17
  • [chore] release v0.0.13

    What does this PR do?

    Bump the dev version number to be able to release v0.0.13, see #402.

    Before submitting

    • [ ] Did you have fun?
      • Make sure you had fun coding 🙃
    • [ ] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 14
  • [feat] Compositional attention

    What does this PR do?

    Implements Compositional Attention (based on the reference implementation), as mentioned in https://github.com/facebookresearch/xformers/issues/41

    Paper

    TODOs

    • [x] Sane defaults
    • [x] Speedup wherever possible. Looks like it takes a lot of memory also at the moment, probably some dummy mistakes
    • [x] Maybe self-attention optimization (single proj) -> doable if moving the projections within the attention to the inproj class, worth it?
    • [x] Add a lot of explanations/documentations
    • [0] Some IR results ? -> that would be for another task probably ?

    cc @sarthmit if interested

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [x] Did you make sure to update the docs?
      • [ ] N/A
    • [x] Did you write any new necessary tests?
      • [ ] N/A
    • [x] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 14
  • Does xformers still not support CUDA 12.0?

    ❓ Questions and Help

    I got the following error while installing...

    Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/
    Obtaining file:///F:/Stable_Diffusion/stable-diffusion-webui-master/repositories/xformers
    Preparing metadata (setup.py) ... error
    error: subprocess-exited-with-error

    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [9 lines of output]
        No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0'
        Traceback (most recent call last):
          File "<string>", line 2, in <module>
          File "<pip-setuptools-caller>", line 34, in <module>
          File "F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\setup.py", line 293, in <module>
            symlink_package(
          File "F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\setup.py", line 83, in symlink_package
            os.symlink(src=path_from, dst=path_to)
        OSError: [WinError 1314] A required privilege is not held by the client: 'F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\third_party\flash-attention\flash_attn' -> 'F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\xformers_flash_attn'
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed

    × Encountered error while generating package metadata.
    ╰─> See above for output.

    Is it because of CUDA 12? Should I downgrade the CUDA version?

    Or what is the problem? Can anyone help?

    opened by debdip 13
  • Cannot install xformers on linux server

    ❓ Questions and Help

    When I try either pip install or building from source, I get this error:

     × python setup.py egg_info did not run successfully.
      │ exit code: 1
      ╰─> [18 lines of output]
          Traceback (most recent call last):
            File "<string>", line 2, in <module>
            File "<pip-setuptools-caller>", line 34, in <module>
            File "/home/username/xformers/setup.py", line 239, in <module>
              ext_modules=get_extensions(),
            File "/home/username/xformers/setup.py", line 187, in get_extensions
              cuda_version = get_cuda_version(CUDA_HOME)
            File "/home/username/xformers/setup.py", line 51, in get_cuda_version
              raw_output = subprocess.check_output([nvcc_bin, "-V"], universal_newlines=True)
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 424, in check_output
              return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 505, in run
              with Popen(*popenargs, **kwargs) as process:
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 951, in __init__
              self._execute_child(args, executable, preexec_fn, close_fds,
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 1821, in _execute_child
              raise child_exception_type(errno_num, err_msg, err_filename)
          FileNotFoundError: [Errno 2] No such file or directory: '/home/username/anaconda3/envs/test_env/bin/nvcc'
          [end of output]
    

    Here's the output of nvcc --version:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Tue_Mar__8_18:18:20_PST_2022
    Cuda compilation tools, release 11.6, V11.6.124
    Build cuda_11.6.r11.6/compiler.31057947_0
    

    As additional information, I was able to install PyTorch the usual way and verify that CUDA is available.
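    A small diagnostic for this kind of failure, as a hedged sketch: the traceback shows setup.py calling nvcc under CUDA_HOME (here resolved to the conda environment), while the working nvcc reported above lives elsewhere. The hypothetical helper below (not part of xFormers, and Linux-only since it looks for `bin/nvcc`) compares the two locations. Pointing the CUDA_HOME environment variable at the toolkit that contains the working nvcc before building is a common workaround, though it is not guaranteed to fix every setup.

```python
import os
import shutil
from typing import Optional


def expected_nvcc(cuda_home: str) -> str:
    """Path where a build script relying on CUDA_HOME will look for nvcc."""
    return os.path.join(cuda_home, "bin", "nvcc")


def diagnose(cuda_home: Optional[str]) -> str:
    on_path = shutil.which("nvcc")
    if cuda_home is None:
        return f"CUDA_HOME is not set; nvcc on PATH: {on_path}"
    candidate = expected_nvcc(cuda_home)
    if os.path.isfile(candidate):
        return f"OK: {candidate} exists"
    hint = ""
    if on_path is not None:
        # e.g. /usr/local/cuda-11.6/bin/nvcc -> suggest CUDA_HOME=/usr/local/cuda-11.6
        hint = f"; try CUDA_HOME={os.path.dirname(os.path.dirname(on_path))}"
    return f"missing: {candidate}; nvcc on PATH: {on_path}{hint}"


if __name__ == "__main__":
    try:
        from torch.utils.cpp_extension import CUDA_HOME  # what PyTorch resolved
    except ImportError:
        CUDA_HOME = os.environ.get("CUDA_HOME")
    print(diagnose(CUDA_HOME))
```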

    opened by fedshyvana 13
  • Encoder-decoder arch doesn't work when sequence lengths are different

    🐛 Bug

    I get an error when the sequence lengths to the encoder and decoder are different, e.g. in the code snippet below:

    Command

    import torch
    from xformers.factory.model_factory import xFormer, xFormerConfig

    EMB = 384
    SEQ_ENC = 128
    SEQ_DEC = 64
    BATCH = 16
    VOCAB = 64

    my_config = [
        # A list of the encoder or decoder blocks which constitute the Transformer.
        # Note that a sequence of different encoder blocks can be used, same for decoders
        {
            "reversible": False,  # Optionally make these layers reversible, to save memory
            "block_type": "encoder",
            "num_layers": 3,  # Optional, this means that this config will repeat N times
            "dim_model": EMB,
            "layer_norm_style": "pre",  # Optional, pre/post
            "position_encoding_config": {
                "name": "vocab",  # whatever position encoding makes sense
                "seq_len": SEQ_ENC,
                "vocab_size": VOCAB,
            },
            "multi_head_config": {
                "num_heads": 4,
                "residual_dropout": 0,
                "attention": {
                    "name": "linformer",  # whatever attention mechanism
                    "dropout": 0,
                    "causal": False,
                    "seq_len": SEQ_ENC,
                },
            },
            "feedforward_config": {
                "name": "MLP",
                "dropout": 0,
                "activation": "relu",
                "hidden_layer_multiplier": 4,
            },
        },
        {
            "reversible": False,  # Optionally make these layers reversible, to save memory
            "block_type": "decoder",
            "num_layers": 3,  # Optional, this means that this config will repeat N times
            "dim_model": EMB,
            "layer_norm_style": "pre",  # Optional, pre/post
            "position_encoding_config": {
                "name": "vocab",  # whatever position encoding makes sense
                "seq_len": SEQ_DEC,
                "vocab_size": VOCAB,
            },
            "multi_head_config_masked": {
                "num_heads": 4,
                "residual_dropout": 0,
                "attention": {
                    "name": "nystrom",  # whatever attention mechanism
                    "dropout": 0,
                    "causal": True,
                    "seq_len": SEQ_DEC,
                },
            },
            "multi_head_config_cross": {
                "num_heads": 4,
                "residual_dropout": 0,
                "attention": {
                    "name": "favor",  # whatever attention mechanism
                    "dropout": 0,
                    "causal": True,
                    "seq_len": SEQ_DEC,
                },
            },
            "feedforward_config": {
                "name": "MLP",
                "dropout": 0,
                "activation": "relu",
                "hidden_layer_multiplier": 4,
            },
        },
    ]
    
    # This part of xFormers is entirely type checked and needs a config object,
    # could be changed in the future
    config = xFormerConfig(my_config)
    model = xFormer.from_config(config)
    
    #  Test out with dummy inputs
    src = (torch.rand((BATCH, SEQ_ENC)) * VOCAB).abs().to(torch.int)
    tgt = (torch.rand((BATCH, SEQ_DEC)) * VOCAB).abs().to(torch.int)
    y = model(src=src, tgt=tgt)
    
    print(y.shape)
    

    Expected behavior

    torch.Size([16, 64, 384])
    

    However, I get:

    RuntimeError: einsum(): operands do not broadcast with remapped shapes [original->remapped]: [64, 128, 96, 96]->[64, 128, 96, 96] [64, 64, 96]->[64, 64, 1, 96]
    
    ongoing 
    opened by kashif 13
  • How to set fixed random seeds

    ❓ Questions and Help

    I get different results when I run the same code twice, even though the set_seed function runs before everything else.

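    import os
    import random

    import numpy as np
    import torch


    def set_seed(seed, cudnn_benchmark=False, cudnn_deterministic=True):
        random.seed(seed)  # Python random module.
        np.random.seed(seed)  # Numpy module.
        # PYTHONHASHSEED set at runtime only affects child processes, not this interpreter.
        os.environ['PYTHONHASHSEED'] = str(seed)
        torch.random.manual_seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
        # Disable cuDNN autotuning and prefer deterministic cuDNN algorithms.
        torch.backends.cudnn.benchmark = cudnn_benchmark
        torch.backends.cudnn.deterministic = cudnn_deterministic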
    
    opened by scp92 1
  • Allowing decoder-only definition

    🚀 Feature

    Allow only a decoder config to be defined.

    Motivation

    I want to define only a decoder and pass in a memory vector from another source.

    Pitch

    I tried this change locally and it allows me to do what I want it to do: https://github.com/facebookresearch/xformers/compare/main...nh2liu:xformers:patch-1

    Not sure if this has wider implications, since this code seems to have been around for a while, but the comment # If decoder: either use the encoder ouput, or just decode, both options are possible indicates that this may be a bug.

    Alternatives

    • A NOOP encoder would also allow this functionality.

    opened by nh2liu 2
  • build from source failed

    🐛 Bug

    Command

    pip install ninja
    pip install -v -U git+https://github.com/facebookresearch/[email protected]#egg=xformers

    ERROR INFO

    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:650:66:   required from ‘void _GLOBAL__N__7fac2228_12_attention_cu_724ba955_12677::launch_attention(at::Tensor&, at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, float, at::PhiloxCudaState) [with bool compute_logsumexp = true]’
    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:793:92:   required from here
    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:612:58: warning: ‘at::GenericPackedTensorAccessor<T, N, PtrTraits, index_t> at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 3; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations]
      612 |     return attn_bias.packed_accessor<scalar_t, 3>();
          |                                                          ^
    /usr/local/lib/python3.8/dist-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
      247 |   GenericPackedTensorAccessor<T,N,PtrTraits,index_t> packed_accessor() const & {
          | ^ ~~~~~~~~~~~~~
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1901, in _run_ninja_build
        subprocess.run(
      File "/usr/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py", line 301, in <module>
        setuptools.setup(
      File "/usr/local/lib/python3.8/dist-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py", line 68, in run
        return orig.install.run(self)
      File "/usr/lib/python3.8/distutils/command/install.py", line 589, in run
        self.run_command('build')
      File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
        _build_ext.build_ext.run(self)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
        build_ext.build_extensions(self)
      File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
        _build_ext.build_ext.build_extensions(self)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
        _build_ext.build_extension(self, ext)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
        objects = self.compiler.compile(sources,
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1573, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1917, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    Running setup.py install for xformers: finished with status 'error'
    

    ERROR: Command errored out with exit status 1: /usr/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-lj6j_c0s/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/oppoer/.local/include/python3.8/xformers Check the logs for full command output.
    WARNING: You are using pip version 21.2.4; however, version 22.3.1 is available.
    You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.

    Environment

    My docker image is: nvcr.io/nvidia/pytorch:22.06-py3

    Collecting environment information...
    PyTorch version: 1.13.0a0+936e930
    Is debug build: False
    CUDA used to build PyTorch: 11.8
    ROCM used to build PyTorch: N/A

    OS: Ubuntu 20.04.5 LTS (x86_64)
    GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    Clang version: Could not collect
    CMake version: version 3.24.1
    Libc version: glibc-2.31

    Python version: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0] (64-bit runtime)
    Python platform: Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-glibc2.29
    Is CUDA available: True
    CUDA runtime version: 11.8.89
    GPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB
    Nvidia driver version: 470.129.06
    cuDNN version: Probably one of the following:
    /usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True

    Versions of relevant libraries:
    [pip3] functorch==1.13.0a0+936e930
    [pip3] mypy-extensions==0.4.3
    [pip3] numpy==1.22.2
    [pip3] pytorch-quantization==2.1.2
    [pip3] torch==1.13.0a0+936e930
    [pip3] torch-tensorrt==1.3.0a0
    [pip3] torchtext==0.13.0a0+fae8e8c
    [pip3] torchvision==0.15.0a0
    [conda] Could not collect

    opened by GxjGit 7
  • Unable to Build from latest

    🐛 Bug

    Command

    cd xformers
    git pull
    git submodule update --recursive --remote
    pip install -e .
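
One thing that may be worth checking here (an assumption, not something the report confirms): `git submodule update --remote` checks out the tip of each submodule's remote-tracking branch instead of the commit pinned by the xFormers repository, so `third_party/cutlass` can end up newer than the sources that include it. The sketch below flags submodules whose checked-out commit differs from the pinned one by parsing `git submodule status` output, which marks such submodules with a leading `+`. Both function names are hypothetical helpers, not part of xFormers.

```python
import subprocess
from typing import List


def drifted_submodules(status_output: str) -> List[str]:
    """Return the paths of submodules whose checked-out commit differs from
    the commit recorded in the superproject.

    `git submodule status` prefixes such lines with '+'. Other prefixes:
    ' ' (in sync), '-' (not initialized), 'U' (merge conflicts).
    """
    drifted = []
    for line in status_output.splitlines():
        if line.startswith("+"):
            # Line shape: "+<sha1> <path> (<describe output>)"
            fields = line[1:].split()
            drifted.append(fields[1])
    return drifted


def submodule_status(repo_dir: str = ".") -> str:
    """Run `git submodule status --recursive` in repo_dir and return its stdout."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `print(drifted_submodules(submodule_status("xformers")))` lists any drifted submodules. Running `git submodule update --init --recursive` (without `--remote`) checks the submodules back out at the pinned commits.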
    

    To Reproduce

    Steps to reproduce the behavior:

    1. Pull latest from git (at hash f82722f61f972c02ebc54431e3e4717f21b3e9b9)
    2. Pull latest submodules
    3. Build

    Expected behavior

    The build completes successfully.

    Environment

    Collecting environment information...
    PyTorch version: 1.12.1+cu116
    Is debug build: False
    CUDA used to build PyTorch: 11.6
    ROCM used to build PyTorch: N/A
    
    OS: Ubuntu 20.04.5 LTS (x86_64)
    GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    Clang version: 10.0.0-4ubuntu1
    CMake version: version 3.25.0
    Libc version: glibc-2.31
    
    Python version: 3.8.13 (default, Mar 28 2022, 11:38:47)  [GCC 7.5.0] (64-bit runtime)
    Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.17
    Is CUDA available: True
    CUDA runtime version: 11.6.124
    GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060 SUPER
    Nvidia driver version: 526.47
    cuDNN version: Probably one of the following:
    /usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.1
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True
    
    Versions of relevant libraries:
    [pip3] mypy-extensions==0.4.3
    [pip3] numpy==1.23.2
    [pip3] pytorch-lightning==1.7.5
    [pip3] torch==1.12.1+cu116
    [pip3] torchaudio==0.12.1+cu116
    [pip3] torchdynamo==1.12.0
    [pip3] torchmetrics==0.9.3
    [pip3] torchvision==0.13.1+cu116
    [conda] numpy                     1.23.2                   pypi_0    pypi
    [conda] pytorch-lightning         1.7.5                    pypi_0    pypi
    [conda] torch                     1.12.1+cu116             pypi_0    pypi
    [conda] torchaudio                0.12.1+cu116             pypi_0    pypi
    [conda] torchdynamo               1.12.0                   pypi_0    pypi
    [conda] torchmetrics              0.9.3                    pypi_0    pypi
    [conda] torchvision               0.13.1+cu116             pypi_0    pypi
    
    • PyTorch Version (e.g., 1.0): 1.12.1+cu116
    • OS (e.g., Linux): WSL
    • How you installed PyTorch (conda, pip, source): pip install -e .
    • Build command you used (if compiling from source): pip install -e .
    • Python version: 3.8.13
    • CUDA/cuDNN version: 11.6
    • GPU models and configuration: NVIDIA GeForce RTX 2060 SUPER
    • Any other relevant information: It worked on a previous commit

    Additional context

    Error message from compiler:

        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: namespace "cutlass::gemm::warp" has no member "WarpSize"
    
        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: type name is not allowed
    
        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: the global scope has no "value"
    
        3 errors detected in the compilation of "/home/jonno/xformers/xformers/csrc/attention/cuda/fmha/attention_forward_generic.cu".
        /home/jonno/anaconda3/envs/dyn/lib/python3.8/site-packages/torch/utils/cpp_extension.py:820: UserWarning: There are no g++ version bounds defined for CUDA version 11.6
          warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
        error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
    
    opened by JonnoFTW 3
Releases(v0.0.13)
  • v0.0.13(Sep 26, 2022)

  • v0.0.12(Aug 8, 2022)

    [0.0.12] - 2022-08-08

    Fixed

    • Removed duplicated biases in the FusedMLP layers [#317]
    • Rotary embeddings respecting input types [#326]
    • Poolformer style instantiating useless projection layers [#349]
    • Fix layer position not being properly tracked, causing extra layernorms for programmatic xformers [#348]
    • Pass use_triton flag to LayerNorm module [#336]

    Added

    • Four blocksparsity layouts from DeepSpeed [#320]
    • Support several initialization options [#312]
    • Conv2DFeedforward feedforward part [#321]
    • VisualAttention [#329]
    • Automatic blocksparse for causal attention [#334]
    • Better hierarchical transformer generation [#345]
    • Fused operations with AOTAutograd/NVFuser, integration into MLP [#357]
    • Refactor LRA code to use PyTorch Lightning [#343]
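
To illustrate what "automatic blocksparse for causal attention" means at the layout level, here is a minimal sketch under stated assumptions: query and key blocks have the same size, and the layout is a 0/1 matrix over blocks of the kind a block-sparse kernel consumes. The function names are hypothetical; this is not the code from #334.

```python
from typing import List


def causal_block_layout(seq_len: int, block_size: int) -> List[List[int]]:
    """Hypothetical helper: block-level mask for causal attention.

    layout[i][j] == 1 when query block i may attend to key block j, which for
    equal query/key block sizes is the lower block-triangle (j <= i).
    Diagonal blocks are only partially visible, so a kernel still needs an
    element-wise causal mask inside them.
    """
    if seq_len % block_size != 0:
        raise ValueError("seq_len must be a multiple of block_size")
    n_blocks = seq_len // block_size
    return [[1 if j <= i else 0 for j in range(n_blocks)] for i in range(n_blocks)]


def layout_density(layout: List[List[int]]) -> float:
    """Fraction of blocks that are computed; tends to 1/2 as the block count grows."""
    total = sum(len(row) for row in layout)
    return sum(sum(row) for row in layout) / total
```

With a sequence of 64 and blocks of 16, only 10 of the 16 blocks need computing, which is where the savings over dense attention come from.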
  • v0.0.11(May 30, 2022)

    [0.0.11] - 2022-05-30

    Fixed

    • Fix some TorchScript compatibility issues [#246]
    • Fix FourierMix being compatible with AMP [#258]
    • Better asserts on QKV dimensions [#264]
    • Better performance for FusedMLP and FusedLinearLayer [#283]
    • Deepnorm init missing self-attention [#284]

    Added

    • Simplicial Embeddings [#259]
    • Mem efficient attention, FW pass [#267]
    • MHA benchmark
    • MLP benchmark
    • Move all triton kernels to triton v2 [#272]
    • Mem efficient attention, BW pass [#281]
    • Metaformer support [#294]
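
The memory-efficient attention entries rest on a general observation: softmax attention can be computed over key/value chunks while keeping only a running maximum, a running normalizer and a running weighted sum, so the full score matrix never has to be stored. The pure-Python sketch below shows that general technique for a single query and checks it against the naive formula. It is not xFormers' CUDA kernel, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def naive_attention(q: Sequence[float], keys, values) -> Vector:
    """Reference: softmax(q . k / sqrt(d))-weighted sum of values, one query."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dv = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / z for i in range(dv)]


def chunked_attention(q: Sequence[float], keys, values, chunk_size: int) -> Vector:
    """Same result, visiting keys/values chunk by chunk ("online softmax")."""
    d = len(q)
    dv = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dv
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vi for a, vi in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Peak memory for the scores is now proportional to `chunk_size` rather than to the number of keys, at the cost of a rescaling step per chunk.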
  • v0.0.10(Mar 15, 2022)

    Fixed

    • Expose bias flag for feedforwards, same default as Timm [#220]
    • Update eps value for layernorm, same default as torch [#221]
    • PreNorm bugfix, only one input was normalized [#233]

    Added

    • Add DeepNet (DeepNorm) residual path and init [#227]
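
DeepNorm, as described in the DeepNet paper, up-weights the skip branch of a post-layer-norm residual by a constant alpha and initializes some weights with a reduced gain beta. A minimal sketch, assuming the encoder-only / decoder-only constants from that paper and a layer norm without learned parameters; the names are hypothetical and this is not xFormers' API.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def deepnorm_constants(num_layers: int) -> Tuple[float, float]:
    """(alpha, beta) for an encoder-only or decoder-only stack of N layers:
    alpha = (2N)^(1/4), beta = (8N)^(-1/4).
    beta is the gain used when initializing the feed-forward, value and
    output projection weights (initialization itself is not shown here)."""
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Plain layer norm without learned scale/shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def deepnorm_residual(x: Vector, sublayer: Callable[[Vector], Vector], alpha: float) -> Vector:
    """Post-LN residual with an up-weighted skip branch: LN(alpha * x + f(x))."""
    fx = sublayer(x)
    return layer_norm([alpha * a + b for a, b in zip(x, fx)])
```

For an 8-layer stack this gives alpha = 2 and beta = 1/sqrt(8), roughly 0.354.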
  • v0.0.9(Feb 9, 2022)

    Added

    • Compositional Attention [#41]
    • Experimental Ragged attention [#189]
    • Mixture of Experts [#181]
    • BlockSparseTensor [#202]
    • nd-tensor support for triton softmax [#210]

    Fixed

    • bugfix Favor, single feature map [#183]
    • sanity check blocksparse settings [#207]
    • Fixed some picklability issues [#204]
  • v0.0.8(Jan 7, 2022)

  • v0.0.7(Nov 30, 2021)

  • v0.0.6(Nov 24, 2021)

    Fixed

    • Fix self attention optimization not being triggered, broken residual path [#119]
    • Improve speed by not using contiguous Tensors when not needed [#119]

    Added

    • Attention mask wrapper [#113]
    • ViT comparison benchmark [#117]
  • v0.0.5(Nov 18, 2021)

  • v0.0.4(Nov 17, 2021)

    • Fixing causality not being respected by the scaled dot product attention
    • Fixing Favor causal trainability
    • Enabling FusedLayerNorm by default if Triton is available
    • Fixing Favor with fp16
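
For reference, "respecting causality" in scaled dot-product attention means each query position only scores keys at or before it. Skipping future keys is equivalent to setting their scores to minus infinity before the softmax. A pure-Python sketch for illustration; it is not xFormers' implementation, and the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention where position i only attends to j <= i."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Causal mask: only keys 0..i are scored for query i.
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in range(i + 1)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) / z for c in range(len(v[0]))])
    return out
```

A quick check of correctness: changing a later value row must leave the outputs of earlier positions unchanged, and the first position can only return its own value row.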
  • v0.03(Nov 5, 2021)

  • v0.0.2(Nov 1, 2021)

    [0.0.2] - 2021-11-01

    Fixed

    • More robust blocksparse [#24]

    Added

    • Rotary embeddings [#32]
    • More flexible layernorm [#50]
    • More flexible blockfactory config (key deduplication)
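
Rotary embeddings encode position by rotating pairs of feature dimensions by a position-dependent angle, so the dot product of a rotated query and key depends only on their relative offset. The sketch below pairs adjacent dimensions as in the RoFormer paper and assumes the common base of 10000; some implementations instead pair the first and second halves of the vector. The names are hypothetical and this is not xFormers' module.

```python
import math
from typing import List


def rotary(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by angle position * base^(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("dimension must be even")
    out = []
    for i in range(0, d, 2):
        # i is the element index, so base ** (-i / d) equals base^(-2 * pair / d).
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[i], x[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))
```

Because each pair undergoes a pure rotation, vector norms are preserved, and `dot(rotary(q, m), rotary(k, n))` depends only on `m - n`.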
Owner
Facebook Research
A Word Level Transformer layer based on PyTorch and 🤗 Transformers.

Transformer Embedder A Word Level Transformer layer based on PyTorch and 🤗 Transformers. How to use Install the library from PyPI: pip install transf

Riccardo Orlando 27 Nov 20, 2022
ADCS - Automatic Defect Classification System (ADCS) for SSMC

Table of Contents Table of Contents ADCS Overview Summary Operator's Guide Demo System Design System Logic Training Mode Production System Flow Folder

Tam Zher Min 2 Jun 24, 2022
A Telegram bot to add notes to Flomo.

flomo bot: use a Telegram bot to send notes to your Flomo. You need a server that can access Telegram. Steps: create a new bot with @BotFather and get the token; get the API from the Flomo website, link https://flomoapp.com/mine?source=in

Zhen 44 Dec 30, 2022
Text to speech toolkit. An easy-to-use Chinese speech synthesis toolbox, including a speech encoder, speech synthesizer, vocoder and visualization module.

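ttskit Text To Speech Toolkit: a speech synthesis toolbox. Installation: pip install -U ttskit. Note: dependencies that may need to be installed separately: torch, version requirement torch=1.6.0,=1.7.1; install the CUDA or CPU build of torch that suits your environment. ttskit's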

KDD 483 Jan 04, 2023
The proliferation of disinformation across social media has led to the application of deep learning techniques to detect fake news.

Fake News Detection Overview The proliferation of disinformation across social media has led to the application of deep learning techniques to detect fak

Kushal Shingote 1 Feb 08, 2022
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

Google Research Datasets 740 Dec 24, 2022
NeMo: a toolkit for conversational AI

NVIDIA NeMo Introduction NeMo is a toolkit for creating Conversational AI applications. NeMo product page. Introductory video. The toolkit comes with

NVIDIA Corporation 5.3k Jan 04, 2023
Phomber is an information-gathering tool that reverse-searches phone numbers and gets their details, written in Python 3.

An information-gathering tool that reverse-searches phone numbers and gets their details! What is phomber? Phomber is one of the best tools available fo

S41R4J 121 Dec 27, 2022
Two-stage text summarization with BERT and BART

Two-Stage Text Summarization Description We experiment with a 2-stage summarization model on CNN/DailyMail dataset that combines the ability to filter

Yukai Yang (Alexis) 6 Oct 22, 2022
Mkdocs + material + cool stuff

Modern-Python-Doc-Example mkdocs + material + cool stuff Doc is live here Features out of the box amazing good looking website thanks to mkdocs.org an

Francesco Saverio Zuppichini 61 Oct 26, 2022
A curated list of FOSS tools to improve the Hacker News experience

Awesome-Hackernews Hacker News is a social news website focusing on computer technologies, hacking and startups. It promotes any content likely to "gr

Bryton Lacquement 141 Dec 27, 2022
TLA - Twitter Linguistic Analysis

TLA - Twitter Linguistic Analysis Tool for linguistic analysis of communities TLA is built using PyTorch, Transformers and several other State-of-the-

Tushar Sarkar 47 Aug 14, 2022
Write Python in Urdu - اردو میں کوڈ لکھیں ("write code in Urdu")

UrduPython Write simple Python in Urdu. How to Use Write Urdu code in سامپل۔پے The mappings are as follows: "۔": ".", "،":

Saad A. Bazaz 26 Nov 27, 2022
This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

Word-Level Coreference Resolution This is a repository with the code to reproduce the experiments described in the paper of the same name, which was a

79 Dec 27, 2022
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers

Microsoft 105 Jan 08, 2022
NLP, Machine learning

Netflix-recommendation-system NLP, Machine learning About Recommendation algorithms are at the core of the Netflix product. It provides their members

Harshith VH 6 Jan 12, 2022
The project lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques

Unsupervised technique to Glossary and Definition Extraction Code Files GPT2-DefinitionModel.ipynb - GPT-2 model for definition generation. Data_Gener

Prakhar Mishra 28 May 25, 2021
Implementing SimCSE(paper, official repository) using TensorFlow 2 and KR-BERT.

KR-BERT-SimCSE Implementing SimCSE(paper, official repository) using TensorFlow 2 and KR-BERT. Training Unsupervised python train_unsupervised.py --mi

Jeong Ukjae 27 Dec 12, 2022
The Classical Language Toolkit

Notice: This Git branch (dev) contains the CLTK's upcoming major release (v. 1.0.0). See https://github.com/cltk/cltk/tree/master and https://docs.clt

Classical Language Toolkit 754 Jan 09, 2023
SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

The goal is to create a single, flexible, and user-friendly toolkit that can be used to easily develop state-of-the-art speech technologies, including systems for speech recognition, speaker recognit

SpeechBrain 5.1k Jan 09, 2023