DeepSpeech is an open-source, embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

Overview

Project DeepSpeech

DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.

Documentation for installation, usage, and training models is available on deepspeech.readthedocs.io.

For the latest release, including pre-trained models and checkpoints, see the latest release on GitHub.

For contribution guidelines, see CONTRIBUTING.rst.

For contact and support information, see SUPPORT.rst.
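
A minimal inference sketch using the deepspeech pip package (the 0.7+ API): the model, scorer, and audio file names below are placeholders, and the WAV is assumed to be mono 16-bit PCM at the model's sample rate.

    import wave

    import numpy as np
    from deepspeech import Model

    # Load the acoustic model and (optionally) the external scorer
    # shipped with each release.
    model = Model("deepspeech-0.7.4-models.pbmm")
    model.enableExternalScorer("deepspeech-0.7.4-models.scorer")

    # The API expects 16-bit PCM samples as an int16 buffer.
    with wave.open("audio.wav", "rb") as f:
        audio = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

    print(model.stt(audio))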

Comments
  • Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0,  0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 31 , 12, 2048]

    For support and discussions, please use our Discourse forums.

    If you've found a bug, or have a feature request, then please create an issue with the following information:

    • Have I written custom code (as opposed to running examples on an unmodified clone of the repository): no
    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
    • TensorFlow installed from (our builds, or upstream TensorFlow): pip
    • TensorFlow version (use command below): 1.15
    • Python version: 3.5
    • Bazel version (if compiling from source):
    • GCC/Compiler version (if compiling from source):
    • CUDA/cuDNN version: 10.0
    • GPU model and memory: 4x GTX 1080 Ti
    • Exact command to reproduce:
    [email protected]:~/projects/DeepSpeech$ more .compute_msprompts
    #!/bin/bash
    
    set -xe
    
    #apt-get install -y python3-venv libopus0
    
    #python3 -m venv /tmp/venv
    #source /tmp/venv/bin/activate
    
    #pip install -U setuptools wheel pip
    #pip install .
    #pip uninstall -y tensorflow
    #pip install tensorflow-gpu==1.14
    
    #mkdir -p ../keep/summaries
    
    data="${SHARED_DIR}/data"
    fis="${data}/LDC/fisher"
    swb="${data}/LDC/LDC97S62/swb"
    lbs="${data}/OpenSLR/LibriSpeech/librivox"
    cv="${data}/mozilla/CommonVoice/en_1087h_2019-06-12/clips"
    npr="${data}/NPR/WAMU/sets/v0.3"
    
    python -u DeepSpeech.py \
      --train_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/treino_filtered_alphabet.csv \
      --dev_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/dev_filtered_alphabet.csv \
      --test_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/teste_filtered_alphabet.csv \
      --train_batch_size 12 \
      --dev_batch_size 24 \
      --test_batch_size 24 \
      --scorer ~/projects/corpora/deepspeech-pretrained-ptbr/kenlm.scorer \
      --alphabet_config_path ~/projects/corpora/deepspeech-pretrained-ptbr/alphabet.txt \
      --train_cudnn \
      --n_hidden 2048 \
      --learning_rate 0.0001 \
      --dropout_rate 0.40 \
      --epochs 150 \
      --noearly_stop \
      --audio_sample_rate 8000 \
      --save_checkpoint_dir ~/projects/corpora/deepspeech-fulltrain-ptbr  \
      --use_allow_growth \
      --log_level 0
    

    I'm getting the following error when training on my pt-BR 8 kHz dataset. I have tried downgrading and upgrading CUDA, cuDNN, the NVIDIA drivers, and Ubuntu (16.04 and 18.04), and the error persists. I have also tried datasets with two different clip lengths, 6 s and 15 s; both contain 8 kHz audio.

    [email protected]:~/projects/DeepSpeech$ bash .compute_msprompts
    + data=/data
    + fis=/data/LDC/fisher
    + swb=/data/LDC/LDC97S62/swb
    + lbs=/data/OpenSLR/LibriSpeech/librivox
    + cv=/data/mozilla/CommonVoice/en_1087h_2019-06-12/clips
    + npr=/data/NPR/WAMU/sets/v0.3
    + python -u DeepSpeech.py --train_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/treino_filtered_alphabet.csv --dev_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/dev_filtered_alphabet.csv --test_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/teste_filtered_alphabet.csv --train_batch_size 12 --dev_batch_size 24 --test_batch_size 24 --scorer /home/andre/projects/corpora/deepspeech-pretrained-ptbr/kenlm.scorer --alphabet_config_path /home/andre/projects/corpora/deepspeech-pretrained-ptbr/alphabet.txt --train_cudnn --n_hidden 2048 --learning_rate 0.0001 --dropout_rate 0.40 --epochs 150 --noearly_stop --audio_sample_rate 8000 --save_checkpoint_dir /home/andre/projects/corpora/deepspeech-fulltrain-ptbr --use_allow_growth --log_level 0
    2020-06-18 12:30:07.508455: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2020-06-18 12:30:07.531012: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3597670000 Hz
    2020-06-18 12:30:07.531588: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5178d70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-06-18 12:30:07.531608: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    2020-06-18 12:30:07.533960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
    2020-06-18 12:30:09.563468: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5416390 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
    2020-06-18 12:30:09.563492: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.563497: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.563501: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.563505: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.570577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:05:00.0
    2020-06-18 12:30:09.571728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:06:00.0
    2020-06-18 12:30:09.572862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:09:00.0
    2020-06-18 12:30:09.573993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:0a:00.0
    2020-06-18 12:30:09.574226: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:09.575280: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2020-06-18 12:30:09.576167: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2020-06-18 12:30:09.576401: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2020-06-18 12:30:09.577541: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2020-06-18 12:30:09.578426: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2020-06-18 12:30:09.581112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-06-18 12:30:09.589736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
    2020-06-18 12:30:09.589770: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:09.594742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-06-18 12:30:09.594757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 1 2 3
    2020-06-18 12:30:09.594763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N Y Y Y
    2020-06-18 12:30:09.594767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   Y N Y Y
    2020-06-18 12:30:09.594770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   Y Y N Y
    2020-06-18 12:30:09.594774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   Y Y Y N
    2020-06-18 12:30:09.600428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 10478 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
    2020-06-18 12:30:09.602038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:1 with 10481 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
    2020-06-18 12:30:09.603572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:2 with 10481 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
    2020-06-18 12:30:09.605112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:3 with 10481 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
    swig/python detected a memory leak of type 'Alphabet *', no destructor found.
    W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
    2020-06-18 12:30:10.102127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:05:00.0
    2020-06-18 12:30:10.103272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:06:00.0
    2020-06-18 12:30:10.104379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:09:00.0
    2020-06-18 12:30:10.105484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:0a:00.0
    2020-06-18 12:30:10.105521: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:10.105533: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2020-06-18 12:30:10.105562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2020-06-18 12:30:10.105574: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2020-06-18 12:30:10.105586: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2020-06-18 12:30:10.105597: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2020-06-18 12:30:10.105610: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-06-18 12:30:10.114060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_types(iterator)`.
    W0618 12:30:10.218584 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_types(iterator)`.
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_shapes(iterator)`.
    W0618 12:30:10.218781 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_shapes(iterator)`.
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_classes(iterator)`.
    W0618 12:30:10.218892 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_classes(iterator)`.
    WARNING:tensorflow:
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    
    W0618 12:30:10.324707 139639980619584 lazy_loader.py:50]
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0618 12:30:10.326326 139639980619584 deprecation.py:506] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0618 12:30:10.326584 139639980619584 deprecation.py:506] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py:246: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    W0618 12:30:10.401312 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py:246: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
    W0618 12:30:11.297271 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
    2020-06-18 12:30:11.458650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:05:00.0
    2020-06-18 12:30:11.459790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:06:00.0
    2020-06-18 12:30:11.460897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:09:00.0
    2020-06-18 12:30:11.462003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:0a:00.0
    2020-06-18 12:30:11.462041: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:11.462071: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2020-06-18 12:30:11.462085: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2020-06-18 12:30:11.462097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2020-06-18 12:30:11.462109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2020-06-18 12:30:11.462121: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2020-06-18 12:30:11.462133: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-06-18 12:30:11.470539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
    2020-06-18 12:30:11.470679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-06-18 12:30:11.470694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 1 2 3
    2020-06-18 12:30:11.470699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N Y Y Y
    2020-06-18 12:30:11.470703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   Y N Y Y
    2020-06-18 12:30:11.470707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   Y Y N Y
    2020-06-18 12:30:11.470710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   Y Y Y N
    2020-06-18 12:30:11.476196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10478 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
    2020-06-18 12:30:11.477355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10481 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
    2020-06-18 12:30:11.478490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10481 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
    2020-06-18 12:30:11.479608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10481 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
    D Session opened.
    I Could not find best validating checkpoint.
    I Could not find most recent checkpoint.
    I Initializing all variables.
    2020-06-18 12:30:12.233482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    I STARTING Optimization
    Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
    2020-06-18 12:30:14.672316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    Epoch 0 |   Training | Elapsed Time: 0:00:16 | Steps: 33 | Loss: 18.239303
    2020-06-18 12:30:30.589204: E tensorflow/stream_executor/dnn.cc:588] CUDNN_STATUS_EXECUTION_FAILED
    in tensorflow/stream_executor/cuda/cuda_dnn.cc(1778): 'cudnnRNNForwardTrainingEx( cudnn.handle(), rnn_desc.handle(), input_desc.data_handle(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.data_handle(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
    2020-06-18 12:30:30.589243: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at cudnn_rnn_ops.cc:1517 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
    Traceback (most recent call last):
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
        return fn(*args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
        target_list, run_metadata)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
      (0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[{{node tower_0/cudnn_lstm/CudnnRNNV3_1}}]]
      (1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[{{node tower_0/cudnn_lstm/CudnnRNNV3_1}}]]
             [[tower_2/CTCLoss/_147]]
    1 successful operations.
    2 derived errors ignored.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 968, in run_script
        absl.app.run(main)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 940, in main
        train()
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 608, in train
        train_loss, _ = run_set('train', epoch, train_init_op)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 568, in run_set
        feed_dict=feed_dict)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
        run_metadata_ptr)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
        feed_dict_tensor, options, run_metadata)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
        run_metadata)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
      (0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[node tower_0/cudnn_lstm/CudnnRNNV3_1 (defined at /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
      (1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[node tower_0/cudnn_lstm/CudnnRNNV3_1 (defined at /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
             [[tower_2/CTCLoss/_147]]
    1 successful operations.
    2 derived errors ignored.
    
    Original stack trace for 'tower_0/cudnn_lstm/CudnnRNNV3_1':
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 968, in run_script
        absl.app.run(main)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 940, in main
        train()
    
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 487, in train
        gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 313, in get_tower_results
        avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 240, in calculate_mean_edit_distance_and_loss
        logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 191, in create_model
        output, output_state = rnn_impl(layer_3, seq_length, previous_state, reuse)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 129, in rnn_impl_cudnn_rnn
        sequence_lengths=seq_length)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
        outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
        outputs = call_fn(cast_inputs, *args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
        return converted_call(f, options, args, kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
        return _call_unconverted(f, args, kwargs, options)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
        return f(*args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 440, in call
        training)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 518, in _forward
        seed=self._seed)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1132, in _cudnn_rnn
        outputs, output_h, output_c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv3(**args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 2051, in cudnn_rnnv3
        time_major=time_major, name=name)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
        op_def=op_def)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
        attrs, op_def, compute_device)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
        op_def=op_def)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
        self._traceback = tf_stack.extract_stack()
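
    A minimal way to check whether this failure comes from the TensorFlow/cuDNN stack rather than from DeepSpeech itself is to run a bare contrib CudnnLSTM forward pass with the same shapes as in the error message. This is only a diagnostic sketch, assuming TF 1.15 with a GPU build:

    import numpy as np
    import tensorflow as tf  # TF 1.15, GPU build

    # Mirror the failing shapes from the error message:
    # [max_seq_length, batch_size, input_size] = [63, 12, 2048], one layer of 2048 units.
    inputs = tf.placeholder(tf.float32, [63, 12, 2048])  # time-major
    lstm = tf.contrib.cudnn_rnn.CudnnLSTM(num_layers=1, num_units=2048)
    outputs, _ = lstm(inputs)

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # mirrors --use_allow_growth
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        out = sess.run(outputs, {inputs: np.random.randn(63, 12, 2048).astype(np.float32)})
        print(out.shape)  # (63, 12, 2048) if the cuDNN call succeeds

    If this minimal graph also fails with CUDNN_STATUS_EXECUTION_FAILED, the problem lies in the CUDA/cuDNN installation or GPU memory rather than in the training script.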
    
    upstream-issue 
    opened by andrenatal 155
  • Add support for netstandard and net core in dotnet client

    Added additional targets for dotnet client to support netstandard2.0, netstandard2.1, netcoreapp3.1.

    I'm not sure if anything else is required; we probably need to update the NuGet package. Let me know if something is missing or if I need to add anything more.

    opened by stepkillah 113
  • Use this model for Urdu language

    I wanted to use this model for the Urdu language, but I found this in the FAQ: "DeepSpeech's requirements for the data are that the transcripts match the [a-z ]+ regex, and that the audio is stored in WAV (PCM) files."

    How can I design a neural network for speech transcription for languages like Urdu?
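
    The [a-z ]+ restriction in the FAQ describes only the default English alphabet; training accepts whatever characters are listed in alphabet.txt. As a rough sketch (the CSV path and transcript column name follow the usual DeepSpeech training CSV layout, and are assumptions here), an alphabet file for Urdu script could be derived from the training transcripts:

    import csv

    # Collect every character used in the transcripts (assumed CSV layout:
    # wav_filename, wav_filesize, transcript).
    chars = set()
    with open("train.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chars.update(row["transcript"])

    # One character per line, as alphabet.txt expects.
    with open("alphabet.txt", "w", encoding="utf-8") as f:
        for c in sorted(chars):
            f.write(c + "\n")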

    enhancement Priority: P4 
    opened by MalikMahnoor 79
  • Electron Windows build (electron-builder) is not finding the deepspeech.node binding

    I'm using electron-builder to package my Electron app into an installer. It works great on Mac and Linux, but the Windows version cannot find the DeepSpeech native binding file.

    I am not sure if this is a bug that would need to be resolved in the DeepSpeech module, or in electron-builder, or in electron itself.

    I could follow up with a small test example to demonstrate the problem.

    Basically, after creating the Windows exe installer (npm run dist from electron-builder), if I find the executable in my file system and run it directly from Git Bash, I can see the error messages in the console, and I receive this:

    electron/js2c/asar.js:140
          if (!isAsar) return old.apply(this, arguments);
                                  ^
    
    Error: The specified module could not be found.
    \\?\C:\Users\Dan\AppData\Local\Programs\mytestapp\resources\app.asar.unpacked\node_modules\deepspeech\lib\binding\v0.7.4\win32-x64\electron-v9.0\deepspeech.node
        at process.func [as dlopen] (electron/js2c/asar.js:140:31)
        at Object.Module._extensions..node (internal/modules/cjs/loader.js:1034:18)
        at Object.func [as .node] (electron/js2c/asar.js:149:18)
        at Module.load (internal/modules/cjs/loader.js:815:32)
        at Module._load (internal/modules/cjs/loader.js:727:14)
        at Function.Module._load (electron/js2c/asar.js:769:28)
        at Module.require (internal/modules/cjs/loader.js:852:19)
        at require (internal/modules/cjs/helpers.js:74:18)
        at Object.<anonymous> (C:\Users\Dan\AppData\Local\Programs\mytestapp\resources\app.asar\node_modules\deepspeech\index.js:18:17)
        at Module._compile (internal/modules/cjs/loader.js:967:30)
    

    What's weird is this file actually does exist:

    C:\Users\Dan\AppData\Local\Programs\mytestapp\resources\app.asar.unpacked\node_modules\deepspeech\lib\binding\v0.7.4\win32-x64\electron-v9.0\deepspeech.node
    

    Maybe it's the junk at the start that causes the problem; I'm not sure.

    \\?
    \C:\
    

    I kind of suspect electron-builder's app.asar package format is probably where the problem lies, and I may file another bug report there too and reference this one.

    bug 
    opened by dsteinman 69
  • No working download links found for ds_ctcdecoder==training/deepspeech_training/VERSION

    Hello,

    when I try to use setup.py from version 0.7.4 onward, it always raises this error:

    No local packages or working download links found for ds_ctcdecoder==training/deepspeech_training/VERSION error: Could not find suitable distribution for Requirement.parse('ds_ctcdecoder==training/deepspeech_training/VERSION')

    With version 0.7.3 and older it finds the ds_ctcdecoder, but it then complains that I need numpy 1.16; when I install 1.16, it complains that I need numpy 1.13.3 because of other modules, and so on. That's why I think my only chance to use DeepSpeech is the newest version.

    I'm on Windows 10 with Python 3.6.

    Thanks in advance!

    bug help wanted good first bug 
    opened by SirZontax 65
  • Error Non-UTF-8 code starting with '\x83' in file deepspeech on line 2 when doing inferences after training a french model

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
    • TensorFlow installed from (our builds, or upstream TensorFlow): mozilla tensorflow
    • TensorFlow version (use command below): tensorflow-gpu 1.13
    • Python version: 3.6
    • Bazel version (if compiling from source): 0.19.2
    • GCC/Compiler version (if compiling from source):
    • CUDA/cuDNN version: 10.0
    • GPU model and memory: NVIDIA K80
    • Exact command to reproduce:

    I trained a French model on a small French dataset, and when I tried to run inference with the exported model like this: python3.6 deepspeech --model ~/results/model_export/output_graph.pb --alphabet ~/Deepspeech/data/alphabet.txt --lm ~/DeepSpeech/data/lm/lm.binary --trie ~/DeepSpeech/data/lm/trie --audio test.wav -t I got this error: SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details. Any suggestions to resolve this, please?

    opened by testdeepv 62
  • The CTC decoder timesteps now correspond to the timesteps of the most probable CTC path, instead of the earliest timesteps of all possible paths.

    This follows issue #3180.

    I suggest a new way of handling the timesteps produced by the CTC decoder. There is no strange heuristic, and I think the logic is clear: when fusing two different paths leading to the same prefix, not only fuse the probabilities (the probabilities are added), but also fuse the timestep sequences (for the last letter in the sequence, choose the timestep from the most probable path).

    The places where two different paths leading to the same prefix are fused are the places where log_sum_exp is called, because this function fuses the probabilities. So timesteps are now fused at the same places.

    The other change is that each PathTrie node would now store the full sequence of timesteps. This is because one prefix can be an ancestor of another and their timesteps on a given node can differ. Having the full sequence of timesteps in each node, we have no need to duplicate a node with different timesteps, and it is much simpler like that. Moreover, it makes sense to store the full sequence of timesteps, because the combined probabilities are also stored there. The total probability is not the sum of the probability of each output token, and, in the same way, the correct sequence of timesteps is not the concatenation of the timestep of each output token.

    Since I need to compare the probability of different paths (to keep the timesteps of the most probable one), it is important to compare paths of the same length (e.g., paths from the beginning up to the current time). So, exactly the same way as is done for the probabilities, I need to know the timesteps at the previous time step, and store the timesteps at the current time step separately.

    In the end, timesteps are handled in a way very similar to the way probabilities are handled.
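
    As a small illustration of this fusion rule (the real decoder is C++; the names here are only illustrative), probabilities are combined in log space while the timestep sequence of the more probable contributing path is kept:

    import math

    def log_sum_exp(a, b):
        # Numerically stable log(exp(a) + exp(b)).
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    def fuse(path_a, path_b):
        # Each path is (log_prob, timesteps) for the same prefix.
        log_p = log_sum_exp(path_a[0], path_b[0])
        # Keep the timesteps of the most probable contributing path.
        timesteps = path_a[1] if path_a[0] >= path_b[0] else path_b[1]
        return (log_p, timesteps)

    # Two hypothetical paths reaching the same prefix:
    print(fuse((math.log(0.3), [2, 7]), (math.log(0.1), [2, 9])))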

    Results on an example

    To evaluate the resulting timesteps, I first take the argmax of my logits. In my example, this gives:

    tou_________________________________________________ss  les  _a__mouurreuux  de  se_p_ort__ diiivverrr_  ss'enn__ _r_é___j_uuii__rr__on_t____    aa_v_eecc_   l''aap__p_rroo___chhee_____    de_  ll'hhi__v_e_rr____  et   la   rre__p_rri_ssee  dee  la  c_ouppee ddu  mon_deee      ss__kk_i___      less    ii_mm_a__ggees_   de   _g_ll_i_ss__ssee____________________             ree__t_rrou_vveennt    uunee    __pllaa___cee____      de    _cchhooixx__  d_ans_  lles  _pp_a____ggeess        ssspp_o_r_t_ii_vees_  de  v_o_s_ _jourrnnaauxx  ttéé_l_é___v_ii___ss_é__s__      ddeeu_x_   __é___pprreeuu_vvees___________________          _auu_jjoouurrdd''hhuuii___       _o____nno___rr_o_____d__a___mm________ ___s___a_n______t__a______  _q__a___tt__e___rr_i____n__a_____     __p_r____mmie_r__  s__a___l__o___m___    _g_é____ant__  de   lla  _c_ou_ppee  ddu   m_on___deee____________________________________________________
    

    As the logits are the only input of the decoder, I base my evaluation on them instead of comparing with the audio file directly. It is known that the CTC loss does not guarantee alignment between the audio file and the logits, so the best thing the decoder can do is fit the logits as well as it can. This is reasonable because, in practice, the logits are aligned quite well with the audio file.

    Then, for each word, I take the part of the logits corresponding to the output timesteps, take the argmax (as said above), and print the corresponding decoded text.

    Finally, I assume that good timesteps should lead to a good match between the word and its corresponding text decoded from the argmax of the logits.
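
    A sketch of that per-word check (the alphabet list and blank index are assumptions, not code from this PR): slice the logits over a word's reported timestep range, take the per-frame argmax, and collapse CTC repeats and blanks:

    import numpy as np

    def argmax_decode(logits_slice, alphabet, blank_id):
        # logits_slice: [time, num_classes] covering one word's timestep range.
        ids = logits_slice.argmax(axis=-1)
        out, prev = [], None
        for i in ids:
            if i != prev and i != blank_id:  # collapse repeats, drop blanks
                out.append(alphabet[i])
            prev = i
        return "".join(out)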

    Before this PR, the result in my example is (text between slashes is from the logits argmax; spaces are trimmed):

    [WordScoreRange(word=tous /tou/, score=None, ranges=((0, 4),)),   
     WordScoreRange(word=les /les/, score=None, ranges=((55, 59),)),       
     WordScoreRange(word=amoureux /amoureu/, score=None, ranges=((60, 74),)),
     WordScoreRange(word=de /d/, score=None, ranges=((75, 78),)),           
     WordScoreRange(word=sport /seport/, score=None, ranges=((80, 90),)),           
     WordScoreRange(word=divers /diver/, score=None, ranges=((91, 102),)),
     WordScoreRange(word=s'en /s'en/, score=None, ranges=((103, 111),)),  
     WordScoreRange(word=réjouiront /réjuiron/, score=None, ranges=((112, 136),)),
     WordScoreRange(word=avec /avec/, score=None, ranges=((141, 153),)),  
     WordScoreRange(word=l'approche /l'approche/, score=None, ranges=((155, 179),)),
     WordScoreRange(word=de /de/, score=None, ranges=((185, 191),)),  
     WordScoreRange(word=l'hiver /l'hive/, score=None, ranges=((192, 206),)),
     WordScoreRange(word=et /e/, score=None, ranges=((208, 215),)),
     WordScoreRange(word=la /la/, score=None, ranges=((217, 221),)),
     WordScoreRange(word=reprise /reprise/, score=None, ranges=((222, 238),)),
     WordScoreRange(word=de /de/, score=None, ranges=((240, 243),)),
     WordScoreRange(word=la /la/, score=None, ranges=((244, 248),)),
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((249, 257),)),
     WordScoreRange(word=du /d/, score=None, ranges=((258, 261),)),
     WordScoreRange(word=monde /monde/, score=None, ranges=((263, 270),)),
     WordScoreRange(word=de /e/, score=None, ranges=((271, 275),)),
     WordScoreRange(word=ski /ski/, score=None, ranges=((276, 286),)),
     WordScoreRange(word=les /les/, score=None, ranges=((290, 298),)),
     WordScoreRange(word=images /image/, score=None, ranges=((300, 316),)),
     WordScoreRange(word=de /d/, score=None, ranges=((318, 322),)),
     WordScoreRange(word=glisse /glisse/, score=None, ranges=((324, 341),)),
     WordScoreRange(word=retrouvent /retrouvent/, score=None, ranges=((363, 394),)),
     WordScoreRange(word=une /une/, score=None, ranges=((395, 402),)),
     WordScoreRange(word=place /place/, score=None, ranges=((404, 419),)),
     WordScoreRange(word=de /de/, score=None, ranges=((425, 432),)),
     WordScoreRange(word=choix /choix/, score=None, ranges=((433, 445),)),
     WordScoreRange(word=dans /dans/, score=None, ranges=((448, 455),)),
     WordScoreRange(word=les /les/, score=None, ranges=((457, 462),)),
     WordScoreRange(word=pages /pages/, score=None, ranges=((463, 478),)),
     WordScoreRange(word=sportives /sportives/, score=None, ranges=((479, 506),)),
     WordScoreRange(word=de /de/, score=None, ranges=((508, 511),)),
     WordScoreRange(word=vos /vos/, score=None, ranges=((512, 518),)),
     WordScoreRange(word=journaux /journaux/, score=None, ranges=((520, 532),)),
     WordScoreRange(word=télévisés /télévisé/, score=None, ranges=((533, 559),)),
     WordScoreRange(word=deux /deux/, score=None, ranges=((563, 575),)),
     WordScoreRange(word=épreuves /épreuve/, score=None, ranges=((577, 598),)),
     WordScoreRange(word=aujourd'hui /aujourd'hu/, score=None, ranges=((618, 649),)),
     WordScoreRange(word=on /o/, score=None, ranges=((654, 665),)),
     WordScoreRange(word=a /am/, score=None, ranges=((685, 699),)),
     WordScoreRange(word=santa /santa/, score=None, ranges=((703, 726),)),
     WordScoreRange(word=caterina /qaterina/, score=None, ranges=((729, 756),)),
     WordScoreRange(word=premier /prmier/, score=None, ranges=((762, 783),)),
     WordScoreRange(word=salon /salom/, score=None, ranges=((785, 803),)),
     WordScoreRange(word=géant /géant/, score=None, ranges=((805, 818),)),
     WordScoreRange(word=de /d/, score=None, ranges=((819, 823),)),
     WordScoreRange(word=la /l/, score=None, ranges=((825, 829),)),
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((831, 841),)),
     WordScoreRange(word=du /d/, score=None, ranges=((842, 846),)),
     WordScoreRange(word=monde /mon/, score=None, ranges=((848, 857),))]
    

    After this PR, the result in my example is :

    [WordScoreRange(word=tous /tous/, score=None, ranges=((0, 54),)), 
     WordScoreRange(word=les /les/, score=None, ranges=((56, 59),)),        
     WordScoreRange(word=amoureux /amoureux/, score=None, ranges=((62, 75),)),
     WordScoreRange(word=de /de/, score=None, ranges=((77, 79),)),          
     WordScoreRange(word=sport /seport/, score=None, ranges=((81, 91),)),           
     WordScoreRange(word=divers /diver/, score=None, ranges=((92, 103),)),
     WordScoreRange(word=s'en /s'en/, score=None, ranges=((105, 113),)),  
     WordScoreRange(word=réjouiront /réjuiront/, score=None, ranges=((114, 140),)),
     WordScoreRange(word=avec /avec/, score=None, ranges=((145, 155),)),  
     WordScoreRange(word=l'approche /l'approche/, score=None, ranges=((158, 185),)),
     WordScoreRange(word=de /de/, score=None, ranges=((189, 192),)),  
     WordScoreRange(word=l'hiver /l'hiver/, score=None, ranges=((194, 212),)),
     WordScoreRange(word=et /et/, score=None, ranges=((214, 216),)),                                                               
     WordScoreRange(word=la /la/, score=None, ranges=((219, 221),)),
     WordScoreRange(word=reprise /reprise/, score=None, ranges=((224, 239),)),                                                                                                                         
     WordScoreRange(word=de /de/, score=None, ranges=((241, 244),)),            
     WordScoreRange(word=la /la/, score=None, ranges=((246, 248),)),              
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((250, 258),)),
     WordScoreRange(word=du /du/, score=None, ranges=((259, 262),)),            
     WordScoreRange(word=monde /monde/, score=None, ranges=((264, 272),)),
     WordScoreRange(word=de //, score=None, ranges=((273, 275),)),
     WordScoreRange(word=ski /ski/, score=None, ranges=((278, 289),)),
     WordScoreRange(word=les /les/, score=None, ranges=((295, 299),)),
     WordScoreRange(word=images /images/, score=None, ranges=((303, 318),)),
     WordScoreRange(word=de /de/, score=None, ranges=((321, 323),)),
     WordScoreRange(word=glisse /glisse/, score=None, ranges=((327, 362),)),
     WordScoreRange(word=retrouvent /retrouvent/, score=None, ranges=((375, 394),)),
     WordScoreRange(word=une /une/, score=None, ranges=((398, 403),)),
     WordScoreRange(word=place /place/, score=None, ranges=((409, 424),)),
     WordScoreRange(word=de /de/, score=None, ranges=((430, 432),)),
     WordScoreRange(word=choix /choix/, score=None, ranges=((437, 448),)),
     WordScoreRange(word=dans /dans/, score=None, ranges=((450, 456),)),
     WordScoreRange(word=les /les/, score=None, ranges=((458, 462),)),
     WordScoreRange(word=pages /pages/, score=None, ranges=((465, 479),)),
     WordScoreRange(word=sportives /sportives/, score=None, ranges=((487, 507),)),
     WordScoreRange(word=de /de/, score=None, ranges=((509, 511),)),
     WordScoreRange(word=vos /vos/, score=None, ranges=((513, 519),)),
     WordScoreRange(word=journaux /journaux/, score=None, ranges=((521, 533),)),
     WordScoreRange(word=télévisés /télévisés/, score=None, ranges=((535, 562),)),
     WordScoreRange(word=deux /deux/, score=None, ranges=((568, 576),)),
     WordScoreRange(word=épreuves /épreuves/, score=None, ranges=((581, 618),)),
     WordScoreRange(word=aujourd'hui /aujourd'hui/, score=None, ranges=((629, 654),)),
     WordScoreRange(word=on /onoro/, score=None, ranges=((662, 680),)),
     WordScoreRange(word=a /am/, score=None, ranges=((685, 699),)),
     WordScoreRange(word=santa /santa/, score=None, ranges=((703, 726),)),
     WordScoreRange(word=caterina /qaterina/, score=None, ranges=((729, 761),)),
     WordScoreRange(word=premier /prmier/, score=None, ranges=((768, 783),)),
     WordScoreRange(word=salon /salom/, score=None, ranges=((785, 803),)),
     WordScoreRange(word=géant /géant/, score=None, ranges=((808, 820),)),
     WordScoreRange(word=de /de/, score=None, ranges=((822, 824),)),
     WordScoreRange(word=la /l/, score=None, ranges=((826, 829),)),
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((833, 842),)),
     WordScoreRange(word=du /d/, score=None, ranges=((843, 846),)),
     WordScoreRange(word=monde /mond/, score=None, ranges=((850, 858),))]
    

    We can see that before this PR, there are 17 words where the timesteps are too early (about a one-letter shift; it is visible at the end but not at the beginning of words because I have trimmed spaces). After this PR, the fit is almost perfect. For some reason, there are still 3 remaining errors, all in the last 4 words.

    opened by godefv 55
  • Language model incorrectly drops spaces for out-of-vocabulary words

    Mozilla DeepSpeech will sometimes create long runs of text with no spaces:

    omiokaarforfthelastquarterwastoget
    

    This happens even with short audio clips (4 seconds) from a native American English speaker, recorded using a high-quality microphone on Mac OS X laptops. I've isolated the problem to the interaction with the language model rather than the acoustic model or the length of the audio clips, as the problem goes away when the language model is turned off.

    The problem might be related to encountering out-of-vocabulary terms.

    I’ve put together test files with results that show the issue is related to the language model somehow rather than the length of the audio or the acoustic model.

    I’ve provided 10 chunked WAV files at 16 kHz, 16-bit depth, each 4 seconds long, that are a subset of a fuller 15-minute audio file (I have not provided that full 15-minute file, as a few shorter reproducible chunks are sufficient to reproduce the problem):

    https://www.dropbox.com/sh/3qy65r6wo8ldtvi/AAAAVinsD_kcCi8Bs6l3zOWFa?dl=0

    The audio segments deliberately include occasional out-of-vocabulary terms, mostly technical, such as “OKR”, “EdgeStore”, “CAPE”, etc.

    Also in that folder are several text files that show the output with the standard language model being used, showing the garbled words together (chunks_with_language_model.txt):

    Running inference for chunk 1
    so were trying again a maybeialstart this time
    
    Running inference for chunk 2
    omiokaarforfthelastquarterwastoget
    
    Running inference for chunk 3
    to car to state deloedmarchinstrumnalha
    
    Running inference for chunk 4
    a tonproductcaseregaugesomd produce sidnelfromthat
    
    Running inference for chunk 5
    i am a to do that you know 
    
    Running inference for chunk 6
    we finish the kepehandlerrwend finished backfileprocessing 
    
    Running inference for chunk 7
    and is he teckdatthatwewould need to do to split the cape 
    
    Running inference for chunk 8
    out from sir handler and i are on new 
    
    Running inference for chunk 9
    he is not monolithic am andthanducotingswrat 
    
    Running inference for chunk 10
    relizationutenpling paws on that until it its a product signal
    

    Then, I’ve provided similar output with the language model turned off (chunks_without_language_model.txt):

    Running inference for chunk 1
    so we're tryng again ah maybe alstart this time
    
    Running inference for chunk 2
    omiokaar forf the last quarter was to get
    
    Running inference for chunk 3
    oto car to state deloed march in strumn alha
    
    Running inference for chunk 4
    um ton product  caser egauges somd produc sidnel from that
    
    Running inference for chunk 5
    am ah to do that ou nowith
    
    Running inference for chunk 6
    we finishd the kepe handlerr wend finished backfile processinga
    
    Running inference for chunk 7
    on es eteckdat that we would need to do to split the kae ha
    
    Running inference for chunk 8
    rout frome sir hanler and ik ar on newh
    
    Running inference for chunk 9
    ch las not monoliic am andthan ducotings wrat 
    
    Running inference for chunk 10
    relization u en pling a pas on that until it its a product signal
    

    I’ve included both these files in the shared Dropbox folder link above.

    Here’s what the correct transcript should be, manually done (chunks_correct_manual_transcription.txt):

    So, we're trying again, maybe I'll start this time.
    
    So my OKR for the last quarter was to get AutoOCR to a state that we could
    launch an external alpha, and product could sort of gauge some product signal
    from that. To do that we finished the CAPE handler, we finished backfill 
    processing, we have some tech debt that we would need to do to split the CAPE 
    handler out from the search handler and make our own new handler so its not
    monolithic, and do some things around CAPE utilization. We are kind of putting
    a pause on that until we get some product signal.
    

    This shows the language model is the source of this problem; I’ve seen anecdotal reports on the official message boards and in blog posts that this is a widespread problem. Perhaps when the language model hits an unknown n-gram, it ends up combining the words together rather than retaining the spaces between them.
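
    If that hypothesis holds, it should be visible in raw KenLM scores: an out-of-vocabulary token takes a large log-probability penalty, which can push the beam search toward a spaceless character run instead of a hypothesis that introduces an unknown word. Below is a minimal sketch of that check (not the DeepSpeech decoder itself), assuming the kenlm Python package and the release lm.binary in the working directory:

    import kenlm

    model = kenlm.Model("lm.binary")

    for sentence in ["my goal for the last quarter", "my okr for the last quarter"]:
        # full_scores yields one (log10 probability, n-gram length, is_oov)
        # tuple per word, plus one for the end-of-sentence token.
        words = sentence.split() + ["</s>"]
        for (logprob, ngram_len, oov), word in zip(model.full_scores(sentence), words):
            print("%12s  log10 p = %7.2f  oov = %s" % (word, logprob, oov))

    If "okr" is flagged as OOV with a much lower score than "goal", that supports the theory that unknown words are being penalized so heavily that spaceless merges win the beam search.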

    Discussion around this bug started on the standard DeepSpeech discussion forum: https://discourse.mozilla.org/t/text-produced-has-long-strings-of-words-with-no-spaces/24089/13 https://discourse.mozilla.org/t/longer-audio-files-with-deep-speech/22784/3

    • Have I written custom code (as opposed to running examples on an unmodified clone of the repository):

    The standard client.py was slightly modified to segment the longer 15-minute audio clip into 4-second blocks (a minimal sketch of this kind of segmentation is included after the reproduction commands below).

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):

    Mac OS X 10.12.6 (16G1036)

    • TensorFlow installed from (our builds, or upstream TensorFlow):

    Both Mozilla DeepSpeech and TensorFlow were installed into a virtualenv setup via the following requirements.txt file:

    tensorflow==1.4.0
    deepspeech==0.1.0
    numpy==1.13.3
    scipy==0.19.1
    webrtcvad==2.0.10
    
    • TensorFlow version (use command below):
    ('v1.4.0-rc1-11-g130a514', '1.4.0')
    
    • Python version:
    Python 2.7.13
    
    • Bazel version (if compiling from source):

    Did not compile from source.

    • GCC/Compiler version (if compiling from source):

    Same; did not compile from source.

    • CUDA/cuDNN version:

    Used CPU only version

    • GPU model and memory:

    Used CPU only version

    • Exact command to reproduce:

    I haven't provided my full modified client.py that segments longer audio, but to run the standard deepspeech command with a language model against a known 4-second audio clip from the Dropbox folder shared above, you can run the following:

    # Set $DEEPSPEECH to where full Deep Speech checkout is; note that my own git checkout
    # for the `deepspeech` runner is at git sha fef25e9ea6b0b6d96dceb610f96a40f2757e05e4
    deepspeech $DEEPSPEECH/models/output_graph.pb chunk_2_length_4.0_s.wav $DEEPSPEECH/models/alphabet.txt $DEEPSPEECH/models/lm.binary $DEEPSPEECH/models/trie
    
    # Similar command to run without language model -- spaces retained for unknown words:
    deepspeech $DEEPSPEECH/models/output_graph.pb chunk_2_length_4.0_s.wav $DEEPSPEECH/models/alphabet.txt 
    

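    As referenced above, here is a minimal Python 3 sketch of the kind of fixed-length segmentation the modified client.py performs, using only the standard-library wave module. File names are placeholders, and the voice-activity-detection step suggested by the webrtcvad entry in the requirements.txt above is omitted:

    import wave

    CHUNK_SECONDS = 4

    with wave.open("full_recording.wav", "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * CHUNK_SECONDS
        index = 1
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of file
                break
            with wave.open("chunk_%d.wav" % index, "wb") as dst:
                # Copy rate/width/channels from the source; the wave module
                # patches nframes on close to match what was written.
                dst.setparams(params)
                dst.writeframes(frames)
            index += 1
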
    This is clearly a bug and not a feature :)

    opened by BradNeuberg 54
  • Adapting engine to any Custom Language

    Adapting engine to any Custom Language

    I was wondering what kinds of modifications would be needed to use this engine for languages other than English (other than a new language model and a new words.txt file)? In particular, I am interested in whether it could be used with a Cyrillic script, given that "data in the transcripts must match the [a-z ]+ regex", and if so, how hard it would be to adapt. I think I could work around this by creating a transliterator that converts text from Cyrillic script to the [a-z ]+ format (a sketch follows below), but it would be preferable if the engine could use a Cyrillic script directly.

    Thanks in advance
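
    For the workaround mentioned above, a minimal transliteration sketch; the table is hypothetical and deliberately partial, and a real one would need to cover the target language's full alphabet (note that multi-character outputs like "zh" make the mapping lossy in reverse):

    # Hypothetical, partial Cyrillic-to-Latin table; extend it to the full
    # alphabet of the target language before using it on real transcripts.
    TRANSLIT = {
        "а": "a", "б": "b", "в": "v", "г": "g", "д": "d",
        "е": "e", "ж": "zh", "з": "z", "и": "i", "н": "n",
        "о": "o", "р": "r", "с": "s", "т": "t",
    }

    def to_latin(text):
        # Characters missing from the table pass through unchanged, so
        # leftovers are easy to spot when validating against [a-z ]+.
        return "".join(TRANSLIT.get(ch, ch) for ch in text.lower())

    print(to_latin("Добар дан"))  # -> "dobar dan"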

    question 
    opened by istojan 54
  • Support for Windows

    Support for Windows

    I'm still editing the docs, preparing for CUDA, and finishing the C# examples.

    IMPORTANT NOTE: I have not tried to train on Windows yet; my initial goal is to enable inference with the clients on Windows.
    Thanks to @reuben and @lissyx, who helped me a lot.

    Fixes #1123

    Epic 
    opened by carlfm01 51
  • Generate trie lm::FormatLoadException

    Generate trie lm::FormatLoadException

    I'm following this tutorial: https://discourse.mozilla.org/t/tutorial-how-i-trained-a-specific-french-model-to-control-my-robot/22830 to create a French model.

    The problem is when generating the trie file with this command:

    ./generate_trie data/cassia/alphabet.txt data/cassia/lm.binary data/cassia/vocabulary.txt data/cassia/trie

    I have this output :

    terminate called after throwing an instance of 'lm::FormatLoadException' what(): native_client/kenlm/lm/binary_format.cc:131 in void lm::ngram::MatchCheck(lm::ngram::ModelType, unsigned int, const lm::ngram::Parameters&) threw FormatLoadException. The binary file was built for probing hash tables but the inference code is trying to load trie with quantization and array-compressed pointers Abandon (core dumped)

    I tried several times to regenerate my lm.binary with KenLM (./build_binary -T -s words.arpa lm.binary), but I still get the same error.
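
    The exception itself points at the likely cause: build_binary produces KenLM's default "probing" hash-table layout unless a data structure is named, while this generate_trie build expects a trie with quantization and array-compressed pointers. A sketch of such a rebuild follows; the flag values are an assumption based on the error text, so verify them against the DeepSpeech documentation for your version:

    # Rebuild lm.binary with KenLM's trie data structure instead of the
    # default probing layout; -q 8 quantizes probabilities and -a 255
    # array-compresses pointers, which is what the exception above says
    # the loader expects. Paths and flag values are assumptions.
    import subprocess

    subprocess.run(
        ["./build_binary", "-a", "255", "-q", "8", "trie",
         "words.arpa", "lm.binary"],
        check=True,
    )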

    opened by yoann1995 49
Releases: v0.10.0-alpha.3

Owner: Mozilla ("This technology could fall into the right hands.")