Memory efficient transducer loss computation

Overview

Introduction

This project implements the optimization techniques proposed in the paper "Improving RNN Transducer Modeling for End-to-End Speech Recognition" to reduce the memory consumption of computing the transducer loss.
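One source of the savings can be seen with simple element counting. The sketch below is plain Python with made-up lengths; it is an illustration, not a benchmark. It compares the number of joint-network output elements in a padded (N, T, U + 1, V) tensor with the packed (sum_all_TU, V) layout described under Usage. It assumes each utterance has U_i = target_lengths[i] + 1 decoder positions, and the helper names are hypothetical.

```python
def padded_elements(logit_lengths, target_lengths, vocab_size):
    """Elements in a padded (N, T, U + 1, V) joint-network output."""
    n = len(logit_lengths)
    t_max = max(logit_lengths)
    u_max = max(target_lengths) + 1  # +1 for the initial blank/start position
    return n * t_max * u_max * vocab_size


def packed_elements(logit_lengths, target_lengths, vocab_size):
    """Elements in a packed (sum_all_TU, V) joint-network output (no padding)."""
    sum_all_tu = sum(t * (u + 1) for t, u in zip(logit_lengths, target_lengths))
    return sum_all_tu * vocab_size


# Hypothetical batch with utterances of very different lengths.
logit_lengths = [100, 50, 25]
target_lengths = [20, 10, 5]
vocab_size = 500

padded = padded_elements(logit_lengths, target_lengths, vocab_size)
packed = packed_elements(logit_lengths, target_lengths, vocab_size)
print(padded, packed, padded / packed)
```

The gap grows with the spread of utterance lengths in a batch, because padding is sized by the longest T and the longest U.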

How does it differ from the RNN-T loss from torchaudio?

It produces the same output as torchaudio for the same input, so optimized_transducer should be equivalent to torchaudio.functional.rnnt_loss().

This project is more memory efficient and potentially faster (TODO: this needs benchmarks).

Also, torchaudio accepts only the output of nn.Linear, whereas optimized_transducer also supports the output of log-softmax (set the option from_log_softmax to True in this case).
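To make the distinction concrete: the output of nn.Linear is a row of unnormalized scores for each (t, u) position, while the output of log-softmax is already a normalized log-distribution over the vocabulary. The plain-Python sketch below (hypothetical values, no PyTorch) shows the difference. That the loss applies this normalization itself when from_log_softmax is False is an assumption about its internals, not something shown here.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one vocabulary row."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


# Hypothetical nn.Linear output for a single (t, u) position, vocabulary size 4.
logits = [2.0, -1.0, 0.5, 3.0]
log_probs = log_softmax(logits)

# Raw scores do not form a probability distribution...
print(sum(math.exp(x) for x in logits))
# ...but log-softmax output does: the probabilities sum to 1.
print(sum(math.exp(x) for x in log_probs))
# Normalizing an already normalized row again leaves it (numerically) unchanged.
print(max(abs(a - b) for a, b in zip(log_softmax(log_probs), log_probs)))
```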

How does it differ from warp-transducer?

It borrows the methods of computing alpha and beta from warp-transducer. Therefore, optimized_transducer produces the same alpha and beta as warp-transducer for the same input.
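For readers unfamiliar with the terms, alpha and beta are the forward and backward variables over the (T, U + 1) lattice of one utterance. The plain-Python sketch below illustrates the standard recursions under a common convention in which beta includes the final blank. It is not optimized_transducer's C++/CUDA code. The target ids, sizes and random log-probabilities are hypothetical. Both directions should give the same total log-likelihood.

```python
import math
import random

NEG_INF = -math.inf


def logaddexp(a, b):
    """log(exp(a) + exp(b)), safe when either argument is -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def compute_alpha(lp, y, blank):
    """Forward variables; lp[t][u][k] holds log-probs of shape (T, U + 1, V)."""
    T, U = len(lp), len(y)
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Reach (t, u) by a blank from (t - 1, u) or by label y[u - 1] from (t, u - 1).
            a = alpha[t - 1][u] + lp[t - 1][u][blank] if t > 0 else NEG_INF
            b = alpha[t][u - 1] + lp[t][u - 1][y[u - 1]] if u > 0 else NEG_INF
            alpha[t][u] = logaddexp(a, b)
    return alpha


def compute_beta(lp, y, blank):
    """Backward variables; beta includes the final blank, so beta[0][0] = log P(y | x)."""
    T, U = len(lp), len(y)
    beta = [[NEG_INF] * (U + 1) for _ in range(T)]
    beta[T - 1][U] = lp[T - 1][U][blank]
    for t in reversed(range(T)):
        for u in reversed(range(U + 1)):
            if t == T - 1 and u == U:
                continue
            a = beta[t + 1][u] + lp[t][u][blank] if t < T - 1 else NEG_INF
            b = beta[t][u + 1] + lp[t][u][y[u]] if u < U else NEG_INF
            beta[t][u] = logaddexp(a, b)
    return beta


def random_log_probs(T, U, V, seed=0):
    """Random normalized log-probs of shape (T, U + 1, V) for illustration."""
    rng = random.Random(seed)
    lp = []
    for _ in range(T):
        frame = []
        for _ in range(U + 1):
            row = [rng.gauss(0.0, 1.0) for _ in range(V)]
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            frame.append([x - lse for x in row])
        lp.append(frame)
    return lp


T, V, blank = 4, 5, 0
y = [1, 3, 2]  # hypothetical target token ids (blank excluded)
U = len(y)
lp = random_log_probs(T, U, V)
alpha = compute_alpha(lp, y, blank)
beta = compute_beta(lp, y, blank)

loss_from_alpha = -(alpha[T - 1][U] + lp[T - 1][U][blank])
loss_from_beta = -beta[0][0]
print(loss_from_alpha, loss_from_beta)  # equal up to floating-point rounding
```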

However, warp-transducer produces different gradients on CPU and CUDA for the same input. See https://github.com/HawkAaron/warp-transducer/issues/93

This project produces consistent gradients on CPU and CUDA for the same input, just as torchaudio does. (We borrow the gradient computation formula from torchaudio.)
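As an illustration of a closed-form, device-independent gradient, here is a plain-Python sketch of the standard gradient of the transducer loss with respect to the pre-softmax logits. For each position it is the softmax probability times the node posterior, minus the posterior of each outgoing blank or label edge. It is derived from the textbook recursions, not copied from torchaudio, so treat the exact form used by either library as an assumption. The sizes, targets and random values are hypothetical. The result is checked against central finite differences.

```python
import math
import random

NEG_INF = -math.inf


def logaddexp(a, b):
    """log(exp(a) + exp(b)), safe when either argument is -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def alpha_beta(lp, y, blank):
    """Forward/backward variables over the (T, U + 1) lattice of one utterance."""
    T, U = len(lp), len(y)
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    beta = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t > 0:  # arrive by emitting blank at (t - 1, u)
                alpha[t][u] = logaddexp(alpha[t][u], alpha[t - 1][u] + lp[t - 1][u][blank])
            if u > 0:  # arrive by emitting y[u - 1] at (t, u - 1)
                alpha[t][u] = logaddexp(alpha[t][u], alpha[t][u - 1] + lp[t][u - 1][y[u - 1]])
    beta[T - 1][U] = lp[T - 1][U][blank]  # the final blank ends every path
    for t in reversed(range(T)):
        for u in reversed(range(U + 1)):
            if t < T - 1:
                beta[t][u] = logaddexp(beta[t][u], beta[t + 1][u] + lp[t][u][blank])
            if u < U:
                beta[t][u] = logaddexp(beta[t][u], beta[t][u + 1] + lp[t][u][y[u]])
    return alpha, beta


def loss_and_grad(logits, y, blank):
    """Loss -log P(y | x) and its gradient w.r.t. the pre-softmax logits."""
    lp = [[log_softmax(row) for row in frame] for frame in logits]
    T, U = len(lp), len(y)
    alpha, beta = alpha_beta(lp, y, blank)
    log_z = beta[0][0]
    grad = []
    for t in range(T):
        frame = []
        for u in range(U + 1):
            occupancy = alpha[t][u] + beta[t][u] - log_z  # log posterior of node (t, u)
            g = [math.exp(x + occupancy) for x in lp[t][u]]  # softmax term
            if t < T - 1:  # blank edge to (t + 1, u)
                g[blank] -= math.exp(alpha[t][u] + lp[t][u][blank] + beta[t + 1][u] - log_z)
            elif u == U:  # final blank that terminates the path
                g[blank] -= math.exp(alpha[t][u] + lp[t][u][blank] - log_z)
            if u < U:  # label edge to (t, u + 1)
                g[y[u]] -= math.exp(alpha[t][u] + lp[t][u][y[u]] + beta[t][u + 1] - log_z)
            frame.append(g)
        grad.append(frame)
    return -log_z, grad


# Compare against central finite differences on a tiny random example.
rng = random.Random(0)
T, V, blank, y = 3, 4, 0, [2, 1]  # hypothetical sizes and target ids
logits = [[[rng.gauss(0.0, 1.0) for _ in range(V)] for _ in range(len(y) + 1)] for _ in range(T)]
loss, grad = loss_and_grad(logits, y, blank)

eps, max_err = 1e-6, 0.0
for t in range(T):
    for u in range(len(y) + 1):
        for k in range(V):
            logits[t][u][k] += eps
            up, _ = loss_and_grad(logits, y, blank)
            logits[t][u][k] -= 2 * eps
            down, _ = loss_and_grad(logits, y, blank)
            logits[t][u][k] += eps  # restore the original value
            max_err = max(max_err, abs((up - down) / (2 * eps) - grad[t][u][k]))
print(loss, max_err)
```

Because the formula only involves alpha, beta and the log-probs, the same arithmetic can be evaluated on any device, which is the property the paragraph above relies on.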

optimized_transducer uses less memory than warp-transducer and is potentially faster (TODO: this needs benchmarks).

Installation

You can install it via pip:

pip install optimized_transducer

To check that optimized_transducer was installed successfully, please run

python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"

which should print the version of the installed optimized_transducer, e.g., 1.2.

Installation FAQ

What operating systems are supported?

It has been tested on Ubuntu 18.04. It should also work on macOS and other Unix-like systems. It may work on Windows, though it has not been tested.

How to display the installation log?

Use

pip install --verbose optimized_transducer

How to reduce installation time?

Use

export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -j to make.

Which versions of PyTorch are supported?

It has been tested on PyTorch >= 1.5.0. It may work on PyTorch < 1.5.0.
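The tested minimums listed in this FAQ (PyTorch >= 1.5.0, Python >= 3.6) are easy to check programmatically, but comparing version strings directly is a trap. The sketch below parses the leading numeric components instead. The helper names are hypothetical. In practice you would pass `torch.__version__`, a real PyTorch attribute.

```python
import re
import sys


def version_tuple(version):
    """Turn a version string such as "1.10.0+cu111" into (1, 10, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_tested_torch(version):
    """True if the PyTorch version is >= 1.5.0, the oldest version listed as tested."""
    return version_tuple(version) >= (1, 5, 0)


# A plain string comparison gets this wrong: "1.10.0" sorts before "1.5.0".
print("1.10.0" >= "1.5.0")
print(is_tested_torch("1.10.0+cu111"))
print(sys.version_info >= (3, 6))  # the Python requirement from this FAQ
```

With PyTorch installed, `is_tested_torch(torch.__version__)` checks the current environment.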

How to install a CPU version of optimized_transducer?

Use

export OT_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF"
export OT_MAKE_ARGS="-j"
pip install --verbose optimized_transducer

It will pass -DCMAKE_BUILD_TYPE=Release -DOT_WITH_CUDA=OFF to cmake.

What Python versions are supported?

Python >= 3.6 is known to work. It may work for Python 2.7, though it is not tested.

Where to get help if I have problems with the installation?

Please file an issue at https://github.com/csukuangfj/optimized_transducer/issues and describe your problem there.

Usage

optimized_transducer expects the output of the joint network to have shape (sum_all_TU, V), NOT (N, T, U, V). It is a concatenation of the 2-D tensors (T_1 * U_1, V), (T_2 * U_2, V), ..., (T_N * U_N, V), where T_i = logit_lengths[i] and U_i = target_lengths[i] + 1. Note: (T_1 * U_1, V) is just the reshape of the 3-D tensor (T_1, U_1, V).
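Because each utterance contributes a contiguous block of T_i * U_i rows, locating an utterance's rows only requires a running sum. The plain-Python sketch below makes two assumptions. First, U_i = target_lengths[i] + 1, matching `decoder_out[i, :target_lengths[i]+1, :]` in the usage example. Second, rows within each block are in row-major order (t outer, u inner), as produced by `reshape(-1, D)`. The helper names are hypothetical.

```python
def packed_offsets(logit_lengths, target_lengths):
    """Start row of each utterance in the packed (sum_all_TU, V) tensor."""
    offsets = [0]
    for t_i, len_i in zip(logit_lengths, target_lengths):
        offsets.append(offsets[-1] + t_i * (len_i + 1))  # T_i * U_i rows
    return offsets  # offsets[-1] == sum_all_TU


def packed_row(offsets, target_lengths, i, t, u):
    """Row of the packed tensor that holds position (t, u) of utterance i."""
    u_i = target_lengths[i] + 1
    return offsets[i] + t * u_i + u  # row-major reshape of (T_i, U_i, V)


logit_lengths = [4, 2]
target_lengths = [2, 1]
offsets = packed_offsets(logit_lengths, target_lengths)
print(offsets)  # [0, 12, 16]
print(packed_row(offsets, target_lengths, 1, 1, 0))  # 14
```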

Suppose your original joint network looks somewhat like the following:

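import torch
import torchaudio

# Placeholder sizes and inputs so that the snippet runs; replace them with your own.
N, T, U, D, V = 2, 10, 5, 16, 30  # U = max target length + 1
blank_id = 0
linear = torch.nn.Linear(D, V)
logit_lengths = torch.tensor([10, 8], dtype=torch.int32)
target_lengths = torch.tensor([4, 3], dtype=torch.int32)
targets = torch.randint(1, V, (N, U - 1), dtype=torch.int32)

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network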

encoder_out = encoder_out.unsqueeze(2) # Now encoder out is (N, T, 1, D)
decoder_out = decoder_out.unsqueeze(1) # Now decoder out is (N, 1, U, D)

x = encoder_out + decoder_out # x is of shape (N, T, U, D)
activation = torch.tanh(x)

logits = linear(activation) # linear is an instance of `nn.Linear`.

loss = torchaudio.functional.rnnt_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
)

You need to change it to the following:

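import torch
import optimized_transducer

# Placeholder sizes and inputs so that the snippet runs; replace them with your own.
N, T, U, D, V = 2, 10, 5, 16, 30  # U = max target length + 1
blank_id = 0
linear = torch.nn.Linear(D, V)
logit_lengths = torch.tensor([10, 8], dtype=torch.int32)
target_lengths = torch.tensor([4, 3], dtype=torch.int32)
targets = torch.randint(1, V, (N, U - 1), dtype=torch.int32)

encoder_out = torch.rand(N, T, D) # from the encoder
decoder_out = torch.rand(N, U, D) # from the decoder, i.e., the prediction network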

encoder_out_list = [encoder_out[i, :logit_lengths[i], :] for i in range(N)]
decoder_out_list = [decoder_out[i, :target_lengths[i]+1, :] for i in range(N)]

x = [e.unsqueeze(1) + d.unsqueeze(0) for e, d in zip(encoder_out_list, decoder_out_list)]
x = [p.reshape(-1, D) for p in x]
x = torch.cat(x)

activation = torch.tanh(x)
logits = linear(activation) # linear is an instance of `nn.Linear`.

loss = optimized_transducer.transducer_loss(
    logits=logits,
    targets=targets,
    logit_lengths=logit_lengths,
    target_lengths=target_lengths,
    blank=blank_id,
    reduction="mean",
    from_log_softmax=False,
)

Caution: We used from_log_softmax=False in the above example since logits is the output of nn.Linear.

Hint: If logits is the output of log-softmax, you should use from_log_softmax=True.

In most cases, you should pass the output of nn.Linear to compute the loss, i.e., use from_log_softmax=False, to save memory.

If you want to apply some operations to the output of log-softmax before passing it to optimized_transducer.transducer_loss(), use from_log_softmax=True. Be aware, however, that this increases memory usage.
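To put the memory increase into rough numbers, assume that from_log_softmax=True means an additional (sum_all_TU, V) tensor, the log-softmax output, must stay alive for the backward pass. The extra memory is then about sum_all_TU * V * bytes_per_element. The sketch below is an estimate under that assumption, using float32 and a hypothetical batch. It is not a measurement of optimized_transducer.

```python
def packed_rows(logit_lengths, target_lengths):
    """sum_all_TU: total number of rows in the packed joint-network output."""
    return sum(t * (u + 1) for t, u in zip(logit_lengths, target_lengths))


def extra_log_softmax_bytes(logit_lengths, target_lengths, vocab_size, bytes_per_element=4):
    """Estimated size of one extra (sum_all_TU, V) tensor; 4 bytes assumes float32."""
    return packed_rows(logit_lengths, target_lengths) * vocab_size * bytes_per_element


# Hypothetical batch: 32 utterances, 500 encoder frames and 50 target tokens each, V = 500.
logit_lengths = [500] * 32
target_lengths = [50] * 32
extra = extra_log_softmax_bytes(logit_lengths, target_lengths, vocab_size=500)
print(f"{extra / 2**30:.2f} GiB")  # about 1.52 GiB for this batch
```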

For more usages, please refer to

For developers

As a developer, you don't need to use pip install optimized_transducer. To make development easier, you can use

git clone https://github.com/csukuangfj/optimized_transducer.git
cd optimized_transducer
mkdir build
cd build
cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
make -j
export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

I usually create a file path.sh inside the build directory, containing

export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH

so what you need to do is

cd optimized_transducer/build
source path.sh

# Then you are ready to run Python tests
python3 ../optimized_transducer/python/tests/test_compute_transducer_loss.py

# You can also use "import optimized_transducer" in your Python projects

To run all Python tests, use

cd optimized_transducer/build
ctest --output-on-failure
Comments
  • Issue with optimized-transducer installation

    I started installing k2, lhotse and icefall. So far I have been able to test k2, and it works perfectly; lhotse also works. However, when I tried to install icefall, I got a strange error from optimized-transducer. The log is below.

    Collecting kaldilm
      Using cached kaldilm-1.11-cp38-cp38-linux_x86_64.whl
    Collecting kaldialign
      Using cached kaldialign-0.2-cp38-cp38-linux_x86_64.whl
    Requirement already satisfied: sentencepiece>=0.1.96 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 3)) (0.1.96)
    Requirement already satisfied: tensorboard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 4)) (2.7.0)
    Requirement already satisfied: typeguard in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from -r requirements.txt (line 5)) (2.13.3)
    Collecting optimized_transducer
      Using cached optimized_transducer-1.3.tar.gz (47 kB)
    Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.8.1)
    Requirement already satisfied: werkzeug>=0.11.15 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.0.2)
    Requirement already satisfied: numpy>=1.12.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.21.2)
    Requirement already satisfied: protobuf>=3.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.19.3)
    Requirement already satisfied: wheel>=0.26 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.37.1)
    Requirement already satisfied: setuptools>=41.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (58.0.4)
    Requirement already satisfied: grpcio>=1.24.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.43.0)
    Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.4.6)
    Requirement already satisfied: absl-py>=0.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (1.0.0)
    Requirement already satisfied: google-auth<3,>=1.6.3 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.3.3)
    Requirement already satisfied: requests<3,>=2.21.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (2.27.1)
    Requirement already satisfied: markdown>=2.6.8 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (3.3.6)
    Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from tensorboard->-r requirements.txt (line 4)) (0.6.1)
    Requirement already satisfied: six in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from absl-py>=0.4->tensorboard->-r requirements.txt (line 4)) (1.16.0)
    Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.2.4)
    Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.2.8)
    Requirement already satisfied: rsa<5,>=3.1.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (4.8)
    Requirement already satisfied: requests-oauthlib>=0.7.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (1.3.0)
    Requirement already satisfied: importlib-metadata>=4.4 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (4.10.1)
    Requirement already satisfied: zipp>=0.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard->-r requirements.txt (line 4)) (3.7.0)
    Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->-r requirements.txt (line 4)) (0.4.8)
    Requirement already satisfied: charset-normalizer~=2.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2.0.10)
    Requirement already satisfied: certifi>=2017.4.17 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (2021.10.8)
    Requirement already satisfied: idna<4,>=2.5 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (3.3)
    Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->-r requirements.txt (line 4)) (1.26.8)
    Requirement already satisfied: oauthlib>=3.0.0 in /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard->-r requirements.txt (line 4)) (3.1.1)
    Building wheels for collected packages: optimized-transducer
      Building wheel for optimized-transducer (setup.py): started
      Building wheel for optimized-transducer (setup.py): finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-qa004082
         cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/
    Complete output (153 lines):
    running bdist_wheel
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/optimized_transducer
    copying optimized_transducer/python/optimized_transducer/__init__.py -> build/lib.linux-x86_64-3.8/optimized_transducer
    copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer
    running build_ext
    For fast compilation, run:
    export OT_MAKE_ARGS="-j"; python setup.py install
    Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    build command is:

              cd build/temp.linux-x86_64-3.8
    
              cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
              make  _optimized_transducer
    

    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 6.5.0
    -- The CUDA compiler identification is NVIDIA 11.1.74
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
    -- Adding arch 35
    -- Adding arch 50
    -- Adding arch 60
    -- Adding arch 61
    -- Adding arch 70
    -- Adding arch 75
    -- Adding arch 80
    -- Adding arch 86
    -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
    -- Downloading pybind11
    -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12")
    -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
      Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
      Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
      command to set the policy and suppress this warning.

    Environment variable CUDA_ROOT is set to:
    
      /cm/shared/apps/cuda11.1/toolkit/11.1.0
    
    For compatibility, CMake is ignoring the variable.
    

    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.

    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1")
    -- Caffe2: CUDA detected: 11.1
    -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0
    -- Caffe2: Header version is: 11.1
    -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
    CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
      Caffe2: Cannot find cuDNN library.  Turning the option off
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)

    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
      Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
      libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)

    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log".
    make: *** No rule to make target `_optimized_transducer'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in <module>
        setuptools.setup(
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/wheel/bdist_wheel.py", line 299, in run
        self.run_command('build')
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension
        raise Exception(
    Exception:
    Build optimized_transducer failed. Please check the error message.
    You can ask for help by creating an issue on GitHub.

    Click: https://github.com/csukuangfj/optimized_transducer/issues/new


    ERROR: Failed building wheel for optimized-transducer
    Running setup.py clean for optimized-transducer
    Failed to build optimized-transducer
    Installing collected packages: optimized-transducer, kaldilm, kaldialign
        Running setup.py install for optimized-transducer: started
        Running setup.py install for optimized-transducer: finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; __file__='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer
         cwd: /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/
    Complete output (155 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.8
    creating build/lib.linux-x86_64-3.8/optimized_transducer
    copying optimized_transducer/python/optimized_transducer/__init__.py -> build/lib.linux-x86_64-3.8/optimized_transducer
    copying optimized_transducer/python/optimized_transducer/transducer_loss.py -> build/lib.linux-x86_64-3.8/optimized_transducer
    running build_ext
    For fast compilation, run:
    export OT_MAKE_ARGS="-j"; python setup.py install
    Setting PYTHON_EXECUTABLE to /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    build command is:

                cd build/temp.linux-x86_64-3.8
    
                cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173
    
                make  _optimized_transducer
    
    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 6.5.0
    -- The CUDA compiler identification is NVIDIA 11.1.74
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++
    -- Check for working CXX compiler: /cm/shared/apps/gcc6/6.5.0/bin/g++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Check for working CUDA compiler: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
    -- Adding arch 35
    -- Adding arch 50
    -- Adding arch 60
    -- Adding arch 61
    -- Adding arch 70
    -- Adding arch 75
    -- Adding arch 80
    -- Adding arch 86
    -- OT_COMPUTE_ARCHS: 35;50;60;61;70;75;80;86
    -- Downloading pybind11
    -- pybind11 is downloaded to /tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python (found version "3.8.12")
    -- Found PythonLibs: /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    CMake Warning (dev) at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:29 (find_package):
      Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
      Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
      command to set the policy and suppress this warning.
    
      Environment variable CUDA_ROOT is set to:
    
        /cm/shared/apps/cuda11.1/toolkit/11.1.0
    
      For compatibility, CMake is ignoring the variable.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    This warning is for project developers.  Use -Wno-dev to suppress it.
    
    -- Found CUDA: /cm/shared/apps/cuda11.1/toolkit/11.1.0 (found version "11.1")
    -- Caffe2: CUDA detected: 11.1
    -- Caffe2: CUDA nvcc is: /cm/shared/apps/cuda11.1/toolkit/11.1.0/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /cm/shared/apps/cuda11.1/toolkit/11.1.0
    -- Caffe2: Header version is: 11.1
    -- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH)
    CMake Warning at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
      Caffe2: Cannot find cuDNN library.  Turning the option off
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- /cm/shared/apps/cuda11.1/toolkit/11.1.0/lib64/libnvrtc.so shorthash is 1f6b333a
    -- Automatic GPU detection failed. Building for common architectures.
    -- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;8.0;8.6;8.6+PTX
    -- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
    CMake Error at /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
      Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
      libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
    Call Stack (most recent call first):
      /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
      cmake/torch.cmake:11 (find_package)
      CMakeLists.txt:130 (include)
    
    
    -- Configuring incomplete, errors occurred!
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeOutput.log".
    See also "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/build/temp.linux-x86_64-3.8/CMakeFiles/CMakeError.log".
    make: *** No rule to make target `_optimized_transducer'.  Stop.
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 101, in <module>
        setuptools.setup(
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/install.py", line 61, in run
        return orig.install.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/install.py", line 545, in run
        self.run_command('build')
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py", line 60, in build_extension
        raise Exception(
    Exception:
    Build optimized_transducer failed. Please check the error message.
    You can ask for help by creating an issue on GitHub.
    
    Click:
    	https://github.com/csukuangfj/optimized_transducer/issues/new
    
    ----------------------------------------
    

    ERROR: Command errored out with exit status 1: /home/local/QCRI/ahussein/anaconda3/envs/k2/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"'; file='"'"'/tmp/pip-install-jw6digfq/optimized-transducer_865f3ecab82f4f25914b71cca4901173/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-mcbah0p8/install-record.txt --single-version-externally-managed --compile --install-headers /home/local/QCRI/ahussein/anaconda3/envs/k2/include/python3.8/optimized-transducer Check the logs for full command output.

    opened by AmirHussein96 8
  • Warprnnt gradient for CPU

    @csukuangfj Just wanted to note that the gradient is not incorrect for CPU vs. GPU. The instructions clearly state that on CPU you need to provide log_softmax(joint-logits), whereas on GPU you should provide only the joint logits, since the CUDA kernel efficiently computes the log_softmax internally.

    Anyway, yours is also an efficient implementation written in C++. Could you benchmark the solutions if you have time? Even a naive benchmark would give some hint about relative speed. Your memory-efficient implementation is also very interesting: it reduces speed but saves a lot of memory.
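The comment turns on which tensor the loss receives: either the raw joint-network logits or their log-softmax over the vocabulary axis. Below is a minimal pure-Python sketch of that normalization for a (T, U+1, V) joint output stored as nested lists. It is an illustration only, not code from warp-transducer or optimized_transducer, and the helper names are hypothetical. Log-softmax is idempotent, so applying it again to values that are already log-probabilities leaves them unchanged.

```python
import math
from typing import List

Row = List[float]


def log_softmax(logits: Row) -> Row:
    """Numerically stable log-softmax over one vocabulary vector."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def normalize_joint(joint: List[List[Row]]) -> List[List[Row]]:
    """Apply log_softmax to every (t, u) cell of a (T, U+1, V) joint output."""
    return [[log_softmax(cell) for cell in row] for row in joint]


if __name__ == "__main__":
    joint = [[[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]]  # T=1, U+1=2, V=3
    log_probs = normalize_joint(joint)
    # Each cell now sums to 1 in probability space.
    print([sum(math.exp(v) for v in cell) for cell in log_probs[0]])
```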

    opened by titu1994 2
  • "ModuleNotFoundError: No module named '_optimized_transducer'" when testing.

    I installed optimized_transducer as follows:

    git clone https://github.com/csukuangfj/optimized_transducer.git
    cd optimized_transducer
    mkdir build
    cd build
    cmake -DOT_BUILD_TESTS=ON -DCMAKE_BUILD_TYPE=Release ..
    export PYTHONPATH=$PWD/../optimized_transducer/python:$PWD/lib:$PYTHONPATH
    

    The cmake log is as follows:

    -- Enabled languages: CXX;CUDA
    -- The CXX compiler identification is GNU 7.5.0
    -- The CUDA compiler identification is NVIDIA 10.1.243
    -- Check for working CXX compiler: /usr/bin/c++
    -- Check for working CXX compiler: /usr/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
    -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
    -- Detecting CUDA compiler ABI info
    -- Detecting CUDA compiler ABI info - done
    -- Autodetected CUDA architecture(s):  7.0
    -- OT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_70,code=sm_70
    -- OT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75
    -- Skipping arch 35
    -- Skipping arch 50
    -- Skipping arch 60
    -- Skipping arch 61
    -- Adding arch 70
    -- Skipping arch 75
    -- OT_COMPUTE_ARCHS: 70
    -- Downloading pybind11
    -- pybind11 is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/pybind11-src
    -- pybind11 v2.6.0
    -- Found PythonInterp: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python (found version "3.8.11")
    -- Found PythonLibs: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/libpython3.8.so
    -- Performing Test HAS_FLTO
    -- Performing Test HAS_FLTO - Success
    -- Python executable: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/bin/python
    -- Looking for C++ include pthread.h
    -- Looking for C++ include pthread.h - found
    -- Looking for pthread_create
    -- Looking for pthread_create - not found
    -- Looking for pthread_create in pthreads
    -- Looking for pthread_create in pthreads - not found
    -- Looking for pthread_create in pthread
    -- Looking for pthread_create in pthread - found
    -- Found Threads: TRUE
    -- Found CUDA: /usr/local/cuda (found version "10.1")
    -- Caffe2: CUDA detected: 10.1
    -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
    -- Caffe2: CUDA toolkit directory: /usr/local/cuda
    -- Caffe2: Header version is: 10.1
    -- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
    -- Found cuDNN: v7.6.2  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
    -- Autodetected CUDA architecture(s):  7.0
    -- Added CUDA NVCC flags for: -gencode;arch=compute_70,code=sm_70
    -- Found Torch: /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torch/lib/libtorch.so
    -- PyTorch version: 1.7.0+cu101
    -- PyTorch cuda version: 10.1
    -- Use FetchContent provided by k2
    -- Downloading googletest
    
    -- googletest is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-src
    -- googletest's binary dir is /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/googletest-build
    -- The C compiler identification is GNU 7.5.0
    -- Check for working C compiler: /usr/bin/cc
    -- Check for working C compiler: /usr/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Downloading moderngpu
    -- moderngpu is downloaded to /ceph-meixu/luomingshuang/optimized_transducer/build/_deps/moderngpu-src
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /ceph-meixu/luomingshuang/optimized_transducer/build
    

    But when I run python optimized_transducer/python/tests/test_compute_transducer_loss.py to test it, I get the following error:

    /ceph-meixu/luomingshuang/anaconda3/envs/k2-python/lib/python3.8/site-packages/torchaudio/backend/utils.py:53: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
      warnings.warn(
    Traceback (most recent call last):
      File "optimized_transducer/python/tests/test_compute_transducer_loss.py", line 8, in <module>
        import optimized_transducer
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/ceph-meixu/luomingshuang/optimized_transducer/optimized_transducer/python/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ModuleNotFoundError: No module named '_optimized_transducer'
    

    How can I solve it? Thanks!
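The traceback shows that Python found the pure-Python package through PYTHONPATH but did not find the compiled extension `_optimized_transducer`. The commands above run cmake but no build step such as `make`. One possibility is therefore that no extension library was ever produced under `build/lib`; this is an assumption, not something confirmed in the thread. The standard-library sketch below (helper names are hypothetical) can show where Python would import a module from, and whether a directory contains files that look like compiled extensions.

```python
import importlib.machinery
import importlib.util
import os
from typing import List, Optional


def module_origin(name: str) -> Optional[str]:
    """Return where Python would load top-level module `name` from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def extension_candidates(directory: str, name: str) -> List[str]:
    """List files in `directory` named like a compiled extension module `name`."""
    if not os.path.isdir(directory):
        return []
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)  # e.g. ".so" variants
    return sorted(
        f for f in os.listdir(directory)
        if f.startswith(name) and f.endswith(suffixes)
    )


if __name__ == "__main__":
    # Run from the repository root; "build/lib" matches the PYTHONPATH used above.
    print("_optimized_transducer:", module_origin("_optimized_transducer"))
    print("build/lib:", extension_candidates("build/lib", "_optimized_transducer"))
```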

    opened by luomingshuang 2
  • Update transducer-loss.h

    I found that https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/csrc/transducer-loss.h#L17 and https://github.com/csukuangfj/optimized_transducer/blob/0c75a5712f709024165fe62360dd25905cca8c68/optimized_transducer/python/tests/test_compute_transducer_loss.py#L61 were inconsistent. I think the former was incorrect, so I fixed it here. @csukuangfj, what do you think?

    opened by shanguanma 2
  • fix for CMakeLists.txt

    When I run make -j in the build directory, the following error occurs: error: #error C++14 or later compatible compiler is required to use ATen. So I added the following two commands to CMakeLists.txt, after which make -j runs successfully.

    set(CMAKE_CXX_STANDARD 14)
    set(CMAKE_CXX_STANDARD_REQUIRED ON)
    

    I'm not sure whether the above two commands are necessary for CMakeLists.txt in all environments.

    opened by luomingshuang 1
  • Fix installation on macOS.

    To fix the following error when running

    python3 -c "import optimized_transducer; print(optimized_transducer.__version__)"
    

    on macOS:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/__init__.py", line 1, in <module>
        from .transducer_loss import TransducerLoss, transducer_loss  # noqa
      File "/Users/fangjun/py38/lib/python3.8/site-packages/optimized_transducer/transducer_loss.py", line 3, in <module>
        import _optimized_transducer
    ImportError: dlopen(/Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so, 2): Symbol not found: _THPVariableClass
      Referenced from: /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
      Expected in: flat namespace
     in /Users/fangjun/py38/lib/python3.8/site-packages/_optimized_transducer.cpython-38-darwin.so
    
    opened by csukuangfj 0
  • Disable warp level parallel reduction

    For some reason, using warps produces incorrect alpha and beta for large values of sum_all_TU.

    We disable warp-level parallel reduction for now and use the method from https://github.com/HawkAaron/warp-transducer to compute alpha and beta.

    Will revisit the issues about warps after gaining more experience with CUDA programming.
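For reference, below is a sequential pure-Python sketch of the standard transducer alpha (forward) and beta (backward) recursions in log space. It illustrates what computing alpha and beta means. It is not the warp-transducer CUDA kernel or this project's implementation, and the function names are hypothetical. It assumes `log_probs[t][u][k]` already holds log-probabilities of shape (T, U+1, V).

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")
Grid = List[List[float]]


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def compute_alpha_beta(
    log_probs: Sequence[Sequence[Sequence[float]]],
    labels: Sequence[int],
    blank: int = 0,
) -> Tuple[Grid, Grid]:
    """Log-space forward (alpha) and backward (beta) variables.

    log_probs has shape (T, U + 1, V) where U = len(labels).
    """
    T, U1 = len(log_probs), len(labels) + 1
    alpha = [[NEG_INF] * U1 for _ in range(T)]
    beta = [[NEG_INF] * U1 for _ in range(T)]

    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U1):
            if t == 0 and u == 0:
                continue
            a = NEG_INF
            if t > 0:  # reach (t, u) by emitting blank at (t - 1, u)
                a = alpha[t - 1][u] + log_probs[t - 1][u][blank]
            if u > 0:  # reach (t, u) by emitting labels[u - 1] at (t, u - 1)
                a = log_add(a, alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]])
            alpha[t][u] = a

    # Every path ends by emitting blank at the final cell (T - 1, U).
    beta[T - 1][U1 - 1] = log_probs[T - 1][U1 - 1][blank]
    for t in reversed(range(T)):
        for u in reversed(range(U1)):
            if t == T - 1 and u == U1 - 1:
                continue
            b = NEG_INF
            if t < T - 1:  # leave (t, u) by emitting blank
                b = beta[t + 1][u] + log_probs[t][u][blank]
            if u < U1 - 1:  # leave (t, u) by emitting labels[u]
                b = log_add(b, beta[t][u + 1] + log_probs[t][u][labels[u]])
            beta[t][u] = b

    return alpha, beta
```

Both directions should agree on the total log-likelihood: `alpha[T-1][U] + log_probs[T-1][U][blank]` equals `beta[0][0]` up to rounding. That makes this a convenient reference for spotting a parallel kernel that goes wrong on large inputs.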

    opened by csukuangfj 0
  • transducer grad compute formula

    The formula for the gradient in warprnnt_numba and in warp_transducer (CPU) is as follows:

        import numpy as np

        def rnnt_grads(log_probs, alphas, betas, labels, blank):
            """Gradient of the RNN-T loss w.r.t. log_probs of shape (T, U+1, V)."""
            T, U, _ = log_probs.shape
            grads = np.full(log_probs.shape, -float("inf"))
            log_like = betas[0, 0]  # == alphas[T - 1, U - 1] + betas[T - 1, U - 1]

            # grad to last blank transition
            grads[T - 1, U - 1, blank] = alphas[T - 1, U - 1]
            grads[: T - 1, :, blank] = alphas[: T - 1, :] + betas[1:, :]

            # grad to label transitions
            for u, l in enumerate(labels):
                grads[:, u, l] = alphas[:, u] + betas[:, u + 1]

            return -np.exp(grads + log_probs - log_like)
    

    This is not the same as the gradient in torchaudio, optimized_transducer, and warp_transducer (GPU). But you said that the warp_transducer CPU gradient is the same as that of optimized_transducer and torchaudio. How is that achieved?
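The formula above gives a gradient with respect to log_probs. If log_probs is itself log_softmax(logits), the chain rule through log-softmax gives dL/dx_j = dL/dy_j - softmax(x)_j * sum_i dL/dy_i. For the formula above, summing grads over the vocabulary at a cell (t, u) gives -exp(alphas[t, u] + betas[t, u] - log_like), because betas[t, u] is the log-sum of the blank and label continuations. With respect to the logits, the gradient is therefore grads[t, u, k] + softmax(logits[t, u])[k] * exp(alphas[t, u] + betas[t, u] - log_like).

Below is a minimal pure-Python sketch of this chain-rule step for one cell (hypothetical names). Whether torchaudio or optimized_transducer organize their kernels exactly this way is not confirmed here.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_wrt_logits(logits: Sequence[float], grad_logp: Sequence[float]) -> List[float]:
    """Backpropagate dL/d(log_softmax(logits)) to dL/d(logits) for one (t, u) cell.

    For y = log_softmax(x): dL/dx_j = dL/dy_j - softmax(x)_j * sum_i dL/dy_i.
    """
    probs = softmax(logits)
    total = sum(grad_logp)
    return [g - p * total for g, p in zip(grad_logp, probs)]
```

Applying this to each (t, u) cell of `grads` above, with `logits` being the raw joint output, yields a gradient with respect to the un-normalized logits. Note that the result sums to zero over the vocabulary.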

    opened by zh794390558 9
  • install error

    1. CUDA_cublas_LIBRARY not found error when compiling (my CUDA version is 10.2).
    2. /usr/include/c++/7/bits/basic_string.tcc(1067): error: expression must have pointer type detected during: instantiation of "std::basic_string<_CharT, _Traits, _Alloc>::_Rep *std::basic_string<_CharT, _Traits, _Alloc>::_Rep::_S_create(std::basic_string<_CharT, _Traits, _Alloc>::size_type, std::basic_string<_CharT, _Traits, _Alloc>::size_type, const _Alloc &) [with _CharT=char16_t, _Traits=std::char_traits<char16_t>, _Alloc=std::allocator<char16_t>]"

    To fix the above two problems, I had to use root to modify some settings of the Linux system. Is there a better solution?

    opened by zmqwer 0
  • loss value and decode library?

    Thanks very much for your great project! I have two questions: 1. How large is the transducer loss for a well-performing or converged model? 2. Is there any fast decoding solution? The beam search decoding modules I found in many projects are extremely slow.
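On the decoding question, the simplest transducer decoding strategy is greedy search. At each frame it repeatedly takes the arg-max symbol, and it moves to the next frame once blank wins. Below is a minimal sketch under stated assumptions. `joint(t, hyp)` is a hypothetical callable standing in for the encoder, prediction network, and joiner together. `max_sym_per_frame` is an assumed cap that keeps one frame from emitting symbols indefinitely. Neither is an optimized_transducer API.

```python
from typing import Callable, List, Sequence


def greedy_search(
    num_frames: int,
    joint: Callable[[int, Sequence[int]], Sequence[float]],
    blank: int = 0,
    max_sym_per_frame: int = 3,
) -> List[int]:
    """Greedy transducer decoding.

    joint(t, hyp) (hypothetical) returns scores over the vocabulary for
    encoder frame t, given the non-blank tokens emitted so far.
    """
    hyp: List[int] = []
    for t in range(num_frames):
        for _ in range(max_sym_per_frame):
            scores = joint(t, hyp)
            k = max(range(len(scores)), key=scores.__getitem__)  # arg-max
            if k == blank:
                break  # blank: advance to the next frame
            hyp.append(k)  # non-blank: emit and stay on frame t
    return hyp
```

Greedy search avoids maintaining and re-scoring a beam of hypotheses, so it is usually much cheaper than beam search, typically at some cost in accuracy.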

    opened by xiongjun19 10
Releases (v1.4)
Owner
Fangjun Kuang
What's past is past.