DeepSpeech is an open-source, embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.

Overview

Project DeepSpeech

DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.

Documentation for installation, usage, and training models is available on deepspeech.readthedocs.io.

For the latest release, including pre-trained models and checkpoints, see the latest release on GitHub.

For contribution guidelines, see CONTRIBUTING.rst.

For contact and support information, see SUPPORT.rst.
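
A minimal inference sketch using the deepspeech pip package (the 0.7+ API): the model, scorer, and audio file names below are placeholders, and the WAV is assumed to be mono 16-bit PCM at the model's sample rate.

    import wave

    import numpy as np
    from deepspeech import Model

    # Load the acoustic model and (optionally) the external scorer
    # shipped with each release.
    model = Model("deepspeech-0.7.4-models.pbmm")
    model.enableExternalScorer("deepspeech-0.7.4-models.scorer")

    # The API expects 16-bit PCM samples as an int16 buffer.
    with wave.open("audio.wav", "rb") as f:
        audio = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)

    print(model.stt(audio))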

Comments
  • Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0,  0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 31 , 12, 2048]

    For support and discussions, please use our Discourse forums.

    If you've found a bug, or have a feature request, then please create an issue with the following information:

    • Have I written custom code (as opposed to running examples on an unmodified clone of the repository): no
    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
    • TensorFlow installed from (our builds, or upstream TensorFlow): pip
    • TensorFlow version (use command below): 1.15
    • Python version: 3.5
    • Bazel version (if compiling from source):
    • GCC/Compiler version (if compiling from source):
    • CUDA/cuDNN version: 10.0
    • GPU model and memory: 4x GTX 1080 Ti
    • Exact command to reproduce:
    [email protected]:~/projects/DeepSpeech$ more .compute_msprompts
    #!/bin/bash
    
    set -xe
    
    #apt-get install -y python3-venv libopus0
    
    #python3 -m venv /tmp/venv
    #source /tmp/venv/bin/activate
    
    #pip install -U setuptools wheel pip
    #pip install .
    #pip uninstall -y tensorflow
    #pip install tensorflow-gpu==1.14
    
    #mkdir -p ../keep/summaries
    
    data="${SHARED_DIR}/data"
    fis="${data}/LDC/fisher"
    swb="${data}/LDC/LDC97S62/swb"
    lbs="${data}/OpenSLR/LibriSpeech/librivox"
    cv="${data}/mozilla/CommonVoice/en_1087h_2019-06-12/clips"
    npr="${data}/NPR/WAMU/sets/v0.3"
    
    python -u DeepSpeech.py \
      --train_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/treino_filtered_alphabet.csv \
      --dev_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/dev_filtered_alphabet.csv \
      --test_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/teste_filtered_alphabet.csv \
      --train_batch_size 12 \
      --dev_batch_size 24 \
      --test_batch_size 24 \
      --scorer ~/projects/corpora/deepspeech-pretrained-ptbr/kenlm.scorer \
      --alphabet_config_path ~/projects/corpora/deepspeech-pretrained-ptbr/alphabet.txt \
      --train_cudnn \
      --n_hidden 2048 \
      --learning_rate 0.0001 \
      --dropout_rate 0.40 \
      --epochs 150 \
      --noearly_stop \
      --audio_sample_rate 8000 \
      --save_checkpoint_dir ~/projects/corpora/deepspeech-fulltrain-ptbr  \
      --use_allow_growth \
      --log_level 0
    

    I'm getting the following error when training on my pt-BR 8 kHz dataset. I have tried downgrading and upgrading CUDA, cuDNN, the NVIDIA drivers, and Ubuntu (16.04 and 18.04), and the error persists. I have also tried datasets with two different clip lengths, 6 s and 15 s; both contain 8 kHz audio.

    [email protected]:~/projects/DeepSpeech$ bash .compute_msprompts
    + data=/data
    + fis=/data/LDC/fisher
    + swb=/data/LDC/LDC97S62/swb
    + lbs=/data/OpenSLR/LibriSpeech/librivox
    + cv=/data/mozilla/CommonVoice/en_1087h_2019-06-12/clips
    + npr=/data/NPR/WAMU/sets/v0.3
    + python -u DeepSpeech.py --train_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/treino_filtered_alphabet.csv --dev_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/dev_filtered_alphabet.csv --test_files /home/andre/projects/corpora/20200404084521_msprompts_90_6s/deepspeech/teste_filtered_alphabet.csv --train_batch_size 12 --dev_batch_size 24 --test_batch_size 24 --scorer /home/andre/projects/corpora/deepspeech-pretrained-ptbr/kenlm.scorer --alphabet_config_path /home/andre/projects/corpora/deepspeech-pretrained-ptbr/alphabet.txt --train_cudnn --n_hidden 2048 --learning_rate 0.0001 --dropout_rate 0.40 --epochs 150 --noearly_stop --audio_sample_rate 8000 --save_checkpoint_dir /home/andre/projects/corpora/deepspeech-fulltrain-ptbr --use_allow_growth --log_level 0
    2020-06-18 12:30:07.508455: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2020-06-18 12:30:07.531012: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3597670000 Hz
    2020-06-18 12:30:07.531588: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5178d70 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
    2020-06-18 12:30:07.531608: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
    2020-06-18 12:30:07.533960: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
    2020-06-18 12:30:09.563468: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5416390 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
    2020-06-18 12:30:09.563492: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.563497: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.563501: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.563505: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): GeForce GTX 1080 Ti, Compute Capability 6.1
    2020-06-18 12:30:09.570577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:05:00.0
    2020-06-18 12:30:09.571728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:06:00.0
    2020-06-18 12:30:09.572862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:09:00.0
    2020-06-18 12:30:09.573993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:0a:00.0
    2020-06-18 12:30:09.574226: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:09.575280: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2020-06-18 12:30:09.576167: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2020-06-18 12:30:09.576401: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2020-06-18 12:30:09.577541: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2020-06-18 12:30:09.578426: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2020-06-18 12:30:09.581112: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-06-18 12:30:09.589736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
    2020-06-18 12:30:09.589770: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:09.594742: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-06-18 12:30:09.594757: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 1 2 3
    2020-06-18 12:30:09.594763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N Y Y Y
    2020-06-18 12:30:09.594767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   Y N Y Y
    2020-06-18 12:30:09.594770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   Y Y N Y
    2020-06-18 12:30:09.594774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   Y Y Y N
    2020-06-18 12:30:09.600428: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 10478 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
    2020-06-18 12:30:09.602038: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:1 with 10481 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
    2020-06-18 12:30:09.603572: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:2 with 10481 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
    2020-06-18 12:30:09.605112: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:3 with 10481 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
    swig/python detected a memory leak of type 'Alphabet *', no destructor found.
    W WARNING: You specified different values for --load_checkpoint_dir and --save_checkpoint_dir, but you are running training and testing in a single invocation. The testing step will respect --load_checkpoint_dir, and thus WILL NOT TEST THE CHECKPOINT CREATED BY THE TRAINING STEP. Train and test in two separate invocations, specifying the correct --load_checkpoint_dir in both cases, or use the same location for loading and saving.
    2020-06-18 12:30:10.102127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:05:00.0
    2020-06-18 12:30:10.103272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:06:00.0
    2020-06-18 12:30:10.104379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:09:00.0
    2020-06-18 12:30:10.105484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:0a:00.0
    2020-06-18 12:30:10.105521: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:10.105533: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2020-06-18 12:30:10.105562: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2020-06-18 12:30:10.105574: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2020-06-18 12:30:10.105586: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2020-06-18 12:30:10.105597: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2020-06-18 12:30:10.105610: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-06-18 12:30:10.114060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_types(iterator)`.
    W0618 12:30:10.218584 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_types(iterator)`.
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_shapes(iterator)`.
    W0618 12:30:10.218781 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_shapes(iterator)`.
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_classes(iterator)`.
    W0618 12:30:10.218892 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use `tf.compat.v1.data.get_output_classes(iterator)`.
    WARNING:tensorflow:
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    
    W0618 12:30:10.324707 139639980619584 lazy_loader.py:50]
    The TensorFlow contrib module will not be included in TensorFlow 2.0.
    For more information, please see:
      * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
      * https://github.com/tensorflow/addons
      * https://github.com/tensorflow/io (for I/O related ops)
    If you depend on functionality not listed there, please file an issue.
    
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0618 12:30:10.326326 139639980619584 deprecation.py:506] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    W0618 12:30:10.326584 139639980619584 deprecation.py:506] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
    Instructions for updating:
    Call initializer instance with the dtype argument instead of passing it to the constructor
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py:246: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    W0618 12:30:10.401312 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py:246: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use tf.where in 2.0, which has the same broadcast rule as np.where
    WARNING:tensorflow:From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
    W0618 12:30:11.297271 139639980619584 deprecation.py:323] From /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
    Instructions for updating:
    Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
    2020-06-18 12:30:11.458650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:05:00.0
    2020-06-18 12:30:11.459790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:06:00.0
    2020-06-18 12:30:11.460897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:09:00.0
    2020-06-18 12:30:11.462003: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
    name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
    pciBusID: 0000:0a:00.0
    2020-06-18 12:30:11.462041: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2020-06-18 12:30:11.462071: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    2020-06-18 12:30:11.462085: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
    2020-06-18 12:30:11.462097: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
    2020-06-18 12:30:11.462109: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
    2020-06-18 12:30:11.462121: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
    2020-06-18 12:30:11.462133: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    2020-06-18 12:30:11.470539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
    2020-06-18 12:30:11.470679: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
    2020-06-18 12:30:11.470694: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 1 2 3
    2020-06-18 12:30:11.470699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N Y Y Y
    2020-06-18 12:30:11.470703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   Y N Y Y
    2020-06-18 12:30:11.470707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   Y Y N Y
    2020-06-18 12:30:11.470710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   Y Y Y N
    2020-06-18 12:30:11.476196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10478 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
    2020-06-18 12:30:11.477355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10481 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
    2020-06-18 12:30:11.478490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10481 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
    2020-06-18 12:30:11.479608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10481 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:0a:00.0, compute capability: 6.1)
    D Session opened.
    I Could not find best validating checkpoint.
    I Could not find most recent checkpoint.
    I Initializing all variables.
    2020-06-18 12:30:12.233482: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
    I STARTING Optimization
    Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
    2020-06-18 12:30:14.672316: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
    Epoch 0 |   Training | Elapsed Time: 0:00:16 | Steps: 33 | Loss: 18.239303
    2020-06-18 12:30:30.589204: E tensorflow/stream_executor/dnn.cc:588] CUDNN_STATUS_EXECUTION_FAILED
    in tensorflow/stream_executor/cuda/cuda_dnn.cc(1778): 'cudnnRNNForwardTrainingEx( cudnn.handle(), rnn_desc.handle(), input_desc.data_handle(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.data_handle(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
    2020-06-18 12:30:30.589243: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at cudnn_rnn_ops.cc:1517 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
    Traceback (most recent call last):
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
        return fn(*args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
        target_list, run_metadata)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
        run_metadata)
    tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
      (0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[{{node tower_0/cudnn_lstm/CudnnRNNV3_1}}]]
      (1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[{{node tower_0/cudnn_lstm/CudnnRNNV3_1}}]]
             [[tower_2/CTCLoss/_147]]
    1 successful operations.
    2 derived errors ignored.
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 968, in run_script
        absl.app.run(main)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 940, in main
        train()
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 608, in train
        train_loss, _ = run_set('train', epoch, train_init_op)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 568, in run_set
        feed_dict=feed_dict)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
        run_metadata_ptr)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
        feed_dict_tensor, options, run_metadata)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
        run_metadata)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
      (0) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[node tower_0/cudnn_lstm/CudnnRNNV3_1 (defined at /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
      (1) Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 2048, 2048, 1, 63, 12, 2048]
             [[node tower_0/cudnn_lstm/CudnnRNNV3_1 (defined at /home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
             [[tower_2/CTCLoss/_147]]
    1 successful operations.
    2 derived errors ignored.
    
    Original stack trace for 'tower_0/cudnn_lstm/CudnnRNNV3_1':
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 968, in run_script
        absl.app.run(main)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
        _run_main(main, args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
        sys.exit(main(argv))
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 940, in main
        train()
    
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 487, in train
        gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 313, in get_tower_results
        avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 240, in calculate_mean_edit_distance_and_loss
        logits, _ = create_model(batch_x, batch_seq_len, dropout, reuse=reuse, rnn_impl=rnn_impl)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 191, in create_model
        output, output_state = rnn_impl(layer_3, seq_length, previous_state, reuse)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/deepspeech_training/train.py", line 129, in rnn_impl_cudnn_rnn
        sequence_lengths=seq_length)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/layers/base.py", line 548, in __call__
        outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 854, in __call__
        outputs = call_fn(cast_inputs, *args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 234, in wrapper
        return converted_call(f, options, args, kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 439, in converted_call
        return _call_unconverted(f, args, kwargs, options)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/autograph/impl/api.py", line 330, in _call_unconverted
        return f(*args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 440, in call
        training)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py", line 518, in _forward
        seed=self._seed)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1132, in _cudnn_rnn
        outputs, output_h, output_c, _, _ = gen_cudnn_rnn_ops.cudnn_rnnv3(**args)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_cudnn_rnn_ops.py", line 2051, in cudnn_rnnv3
        time_major=time_major, name=name)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
        op_def=op_def)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
        return func(*args, **kwargs)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
        attrs, op_def, compute_device)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
        op_def=op_def)
      File "/home/andre/projects/DeepSpeech/venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
        self._traceback = tf_stack.extract_stack()
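
    A minimal way to check whether this failure comes from the TensorFlow/cuDNN stack rather than from DeepSpeech itself is to run a bare contrib CudnnLSTM forward pass with the same shapes as in the error message. This is only a diagnostic sketch, assuming TF 1.15 with a GPU build:

    import numpy as np
    import tensorflow as tf  # TF 1.15, GPU build

    # Mirror the failing shapes from the error message:
    # [max_seq_length, batch_size, input_size] = [63, 12, 2048], one layer of 2048 units.
    inputs = tf.placeholder(tf.float32, [63, 12, 2048])  # time-major
    lstm = tf.contrib.cudnn_rnn.CudnnLSTM(num_layers=1, num_units=2048)
    outputs, _ = lstm(inputs)

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # mirrors --use_allow_growth
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())
        out = sess.run(outputs, {inputs: np.random.randn(63, 12, 2048).astype(np.float32)})
        print(out.shape)  # (63, 12, 2048) if the cuDNN call succeeds

    If this minimal graph also fails with CUDNN_STATUS_EXECUTION_FAILED, the problem lies in the CUDA/cuDNN installation or GPU memory rather than in the training script.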
    
    upstream-issue 
    opened by andrenatal 155
  • Add support for netstandard and net core in dotnet client

    Added additional targets for dotnet client to support netstandard2.0, netstandard2.1, netcoreapp3.1.

    I'm not sure if anything else is required; we probably need to update the NuGet package. Let me know if something is missing or if I need to add anything more.

    opened by stepkillah 113
  • Use this model for Urdu language

    I wanted to use this model for the Urdu language, but I found this in the FAQ: "DeepSpeech's requirements for the data are that the transcripts match the [a-z ]+ regex, and that the audio is stored in WAV (PCM) files."

    How can I design a neural network for speech transcription for languages like Urdu?
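
    The [a-z ]+ restriction in the FAQ describes only the default English alphabet; training accepts whatever characters are listed in alphabet.txt. As a rough sketch (the CSV path and transcript column name follow the usual DeepSpeech training CSV layout, and are assumptions here), an alphabet file for Urdu script could be derived from the training transcripts:

    import csv

    # Collect every character used in the transcripts (assumed CSV layout:
    # wav_filename, wav_filesize, transcript).
    chars = set()
    with open("train.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            chars.update(row["transcript"])

    # One character per line, as alphabet.txt expects.
    with open("alphabet.txt", "w", encoding="utf-8") as f:
        for c in sorted(chars):
            f.write(c + "\n")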

    enhancement Priority: P4 
    opened by MalikMahnoor 79
  • Electron Windows build (electron-builder) is not finding the deepspeech.node binding

    I'm using electron-builder to package my Electron app into an installer. It works great on Mac and Linux, but the Windows version cannot find the DeepSpeech native binding file.

    I am not sure if this is a bug that would need to be resolved in the DeepSpeech module, or in electron-builder, or in electron itself.

    I could follow up with a small test example to demonstrate the problem.

    Basically, after creating the Windows exe installer (npm run dist from electron-builder), if I find the executable in my file system and run it directly from Git Bash, I can see the error messages in the console, and I receive this:

    electron/js2c/asar.js:140
          if (!isAsar) return old.apply(this, arguments);
                                  ^
    
    Error: The specified module could not be found.
    \\?\C:\Users\Dan\AppData\Local\Programs\mytestapp\resources\app.asar.unpacked\node_modules\deepspeech\lib\binding\v0.7.4\win32-x64\electron-v9.0\deepspeech.node
        at process.func [as dlopen] (electron/js2c/asar.js:140:31)
        at Object.Module._extensions..node (internal/modules/cjs/loader.js:1034:18)
        at Object.func [as .node] (electron/js2c/asar.js:149:18)
        at Module.load (internal/modules/cjs/loader.js:815:32)
        at Module._load (internal/modules/cjs/loader.js:727:14)
        at Function.Module._load (electron/js2c/asar.js:769:28)
        at Module.require (internal/modules/cjs/loader.js:852:19)
        at require (internal/modules/cjs/helpers.js:74:18)
        at Object.<anonymous> (C:\Users\Dan\AppData\Local\Programs\mytestapp\resources\app.asar\node_modules\deepspeech\index.js:18:17)
        at Module._compile (internal/modules/cjs/loader.js:967:30)
    

    What's weird is this file actually does exist:

    C:\Users\Dan\AppData\Local\Programs\mytestapp\resources\app.asar.unpacked\node_modules\deepspeech\lib\binding\v0.7.4\win32-x64\electron-v9.0\deepspeech.node
    

    Maybe it's the junk at the start that causes the problem; I'm not sure.

    \\?
    \C:\
    

    I kind of suspect electron-builder's app.asar package format is probably where the problem lies, and I may file another bug report there too and reference this one.

    bug 
    opened by dsteinman 69
  • No working download links found for ds_ctcdecoder==training/deepspeech_training/VERSION

    Hello,

    when I try to use setup.py from version 0.7.4 onward, it always raises this error:

    No local packages or working download links found for ds_ctcdecoder==training/deepspeech_training/VERSION error: Could not find suitable distribution for Requirement.parse('ds_ctcdecoder==training/deepspeech_training/VERSION')

    With version 0.7.3 and older it finds the ds_ctcdecoder, but it then complains that I need numpy 1.16; when I install 1.16, it complains that I need numpy 1.13.3 because of other modules, and so on. That's why I think my only chance to use DeepSpeech is the newest version.

    I'm on Windows 10 with Python 3.6.

    Thanks in advance!

    bug help wanted good first bug 
    opened by SirZontax 65
  • Error Non-UTF-8 code starting with '\x83' in file deepspeech on line 2 when doing inferences after training a french model

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
    • TensorFlow installed from (our builds, or upstream TensorFlow): mozilla tensorflow
    • TensorFlow version (use command below): tensorflow-gpu 1.13
    • Python version: 3.6
    • Bazel version (if compiling from source): 0.19.2
    • GCC/Compiler version (if compiling from source):
    • CUDA/cuDNN version: 10.0
    • GPU model and memory: NVIDIA K80
    • Exact command to reproduce:

    I trained a French model on a small French dataset, and when I tried to run inference with the exported model like this: python3.6 deepspeech --model ~/results/model_export/output_graph.pb --alphabet ~/Deepspeech/data/alphabet.txt --lm ~/DeepSpeech/data/lm/lm.binary --trie ~/DeepSpeech/data/lm/trie --audio test.wav -t I got this error: SyntaxError: Non-UTF-8 code starting with '\x83' in file deepspeech on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details. Any suggestions to resolve this, please?

    opened by testdeepv 62
  • The CTC decoder timesteps now correspond to the timesteps of the most probable CTC path, instead of the earliest timesteps of all possible paths.

    This follows issue #3180.

    I suggest a new way of handling the timesteps produced by the CTC decoder. There is no strange heuristic, and I think the logic is clear: when fusing two different paths leading to the same prefix, not only fuse the probabilities (the probabilities are added), but also fuse the timestep sequences (for the last letter in the sequence, choose the timestep from the most probable path).

    The places where two different paths leading to the same prefix are fused are the places where log_sum_exp is called, because this function fuses the probabilities. So timesteps are now fused at the same places.

    The other change is that each PathTrie node would now store the full sequence of timesteps. This is because one prefix can be an ancestor of another and their timesteps on a given node can differ. Having the full sequence of timesteps in each node, we have no need to duplicate a node with different timesteps, and it is much simpler like that. Moreover, it makes sense to store the full sequence of timesteps, because the combined probabilities are also stored there. The total probability is not the sum of the probability of each output token, and, in the same way, the correct sequence of timesteps is not the concatenation of the timestep of each output token.

    Since I need to compare the probability of different paths (to keep the timesteps of the most probable one), it is important to compare paths of the same length (e.g., paths from the beginning up to the current time). So, exactly the same way as is done for the probabilities, I need to know the timesteps at the previous time step, and store the timesteps at the current time step separately.

    In the end, timesteps are handled in a way very similar to the way probabilities are handled.
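
    As a small illustration of this fusion rule (the real decoder is C++; the names here are only illustrative), probabilities are combined in log space while the timestep sequence of the more probable contributing path is kept:

    import math

    def log_sum_exp(a, b):
        # Numerically stable log(exp(a) + exp(b)).
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    def fuse(path_a, path_b):
        # Each path is (log_prob, timesteps) for the same prefix.
        log_p = log_sum_exp(path_a[0], path_b[0])
        # Keep the timesteps of the most probable contributing path.
        timesteps = path_a[1] if path_a[0] >= path_b[0] else path_b[1]
        return (log_p, timesteps)

    # Two hypothetical paths reaching the same prefix:
    print(fuse((math.log(0.3), [2, 7]), (math.log(0.1), [2, 9])))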

    Results on an example

    To evaluate the resulting timesteps, I first take the argmax of my logits. In my example, this gives:

    tou_________________________________________________ss  les  _a__mouurreuux  de  se_p_ort__ diiivverrr_  ss'enn__ _r_é___j_uuii__rr__on_t____    aa_v_eecc_   l''aap__p_rroo___chhee_____    de_  ll'hhi__v_e_rr____  et   la   rre__p_rri_ssee  dee  la  c_ouppee ddu  mon_deee      ss__kk_i___      less    ii_mm_a__ggees_   de   _g_ll_i_ss__ssee____________________             ree__t_rrou_vveennt    uunee    __pllaa___cee____      de    _cchhooixx__  d_ans_  lles  _pp_a____ggeess        ssspp_o_r_t_ii_vees_  de  v_o_s_ _jourrnnaauxx  ttéé_l_é___v_ii___ss_é__s__      ddeeu_x_   __é___pprreeuu_vvees___________________          _auu_jjoouurrdd''hhuuii___       _o____nno___rr_o_____d__a___mm________ ___s___a_n______t__a______  _q__a___tt__e___rr_i____n__a_____     __p_r____mmie_r__  s__a___l__o___m___    _g_é____ant__  de   lla  _c_ou_ppee  ddu   m_on___deee____________________________________________________
    

    As the logits are the only input of the decoder, I base my evaluation on them instead of comparing with the audio file directly. It is known that the CTC loss does not guarantee alignment between the audio file and the logits, so the best thing the decoder can do is fit the logits as well as it can. This is reasonable because, in practice, the logits are aligned quite well with the audio file.

    Then, for each word, I take the part of the logits corresponding to the output timesteps, take the argmax (as said above), and print the corresponding decoded text.

    Finally, I assume that good timesteps should lead to a good match between the word and its corresponding text decoded from the argmax of the logits.
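
    A sketch of that per-word check (the alphabet list and blank index are assumptions, not code from this PR): slice the logits over a word's reported timestep range, take the per-frame argmax, and collapse CTC repeats and blanks:

    import numpy as np

    def argmax_decode(logits_slice, alphabet, blank_id):
        # logits_slice: [time, num_classes] covering one word's timestep range.
        ids = logits_slice.argmax(axis=-1)
        out, prev = [], None
        for i in ids:
            if i != prev and i != blank_id:  # collapse repeats, drop blanks
                out.append(alphabet[i])
            prev = i
        return "".join(out)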

    Before this PR, the result in my example is (text between slashes is from the logits argmax; spaces are trimmed):

    [WordScoreRange(word=tous /tou/, score=None, ranges=((0, 4),)),   
     WordScoreRange(word=les /les/, score=None, ranges=((55, 59),)),       
     WordScoreRange(word=amoureux /amoureu/, score=None, ranges=((60, 74),)),
     WordScoreRange(word=de /d/, score=None, ranges=((75, 78),)),           
     WordScoreRange(word=sport /seport/, score=None, ranges=((80, 90),)),           
     WordScoreRange(word=divers /diver/, score=None, ranges=((91, 102),)),
     WordScoreRange(word=s'en /s'en/, score=None, ranges=((103, 111),)),  
     WordScoreRange(word=réjouiront /réjuiron/, score=None, ranges=((112, 136),)),
     WordScoreRange(word=avec /avec/, score=None, ranges=((141, 153),)),  
     WordScoreRange(word=l'approche /l'approche/, score=None, ranges=((155, 179),)),
     WordScoreRange(word=de /de/, score=None, ranges=((185, 191),)),  
     WordScoreRange(word=l'hiver /l'hive/, score=None, ranges=((192, 206),)),
     WordScoreRange(word=et /e/, score=None, ranges=((208, 215),)),
     WordScoreRange(word=la /la/, score=None, ranges=((217, 221),)),
     WordScoreRange(word=reprise /reprise/, score=None, ranges=((222, 238),)),
     WordScoreRange(word=de /de/, score=None, ranges=((240, 243),)),
     WordScoreRange(word=la /la/, score=None, ranges=((244, 248),)),
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((249, 257),)),
     WordScoreRange(word=du /d/, score=None, ranges=((258, 261),)),
     WordScoreRange(word=monde /monde/, score=None, ranges=((263, 270),)),
     WordScoreRange(word=de /e/, score=None, ranges=((271, 275),)),
     WordScoreRange(word=ski /ski/, score=None, ranges=((276, 286),)),
     WordScoreRange(word=les /les/, score=None, ranges=((290, 298),)),
     WordScoreRange(word=images /image/, score=None, ranges=((300, 316),)),
     WordScoreRange(word=de /d/, score=None, ranges=((318, 322),)),
     WordScoreRange(word=glisse /glisse/, score=None, ranges=((324, 341),)),
     WordScoreRange(word=retrouvent /retrouvent/, score=None, ranges=((363, 394),)),
     WordScoreRange(word=une /une/, score=None, ranges=((395, 402),)),
     WordScoreRange(word=place /place/, score=None, ranges=((404, 419),)),
     WordScoreRange(word=de /de/, score=None, ranges=((425, 432),)),
     WordScoreRange(word=choix /choix/, score=None, ranges=((433, 445),)),
     WordScoreRange(word=dans /dans/, score=None, ranges=((448, 455),)),
     WordScoreRange(word=les /les/, score=None, ranges=((457, 462),)),
     WordScoreRange(word=pages /pages/, score=None, ranges=((463, 478),)),
     WordScoreRange(word=sportives /sportives/, score=None, ranges=((479, 506),)),
     WordScoreRange(word=de /de/, score=None, ranges=((508, 511),)),
     WordScoreRange(word=vos /vos/, score=None, ranges=((512, 518),)),
     WordScoreRange(word=journaux /journaux/, score=None, ranges=((520, 532),)),
     WordScoreRange(word=télévisés /télévisé/, score=None, ranges=((533, 559),)),
     WordScoreRange(word=deux /deux/, score=None, ranges=((563, 575),)),
     WordScoreRange(word=épreuves /épreuve/, score=None, ranges=((577, 598),)),
     WordScoreRange(word=aujourd'hui /aujourd'hu/, score=None, ranges=((618, 649),)),
     WordScoreRange(word=on /o/, score=None, ranges=((654, 665),)),
     WordScoreRange(word=a /am/, score=None, ranges=((685, 699),)),
     WordScoreRange(word=santa /santa/, score=None, ranges=((703, 726),)),
     WordScoreRange(word=caterina /qaterina/, score=None, ranges=((729, 756),)),
     WordScoreRange(word=premier /prmier/, score=None, ranges=((762, 783),)),
     WordScoreRange(word=salon /salom/, score=None, ranges=((785, 803),)),
     WordScoreRange(word=géant /géant/, score=None, ranges=((805, 818),)),
     WordScoreRange(word=de /d/, score=None, ranges=((819, 823),)),
     WordScoreRange(word=la /l/, score=None, ranges=((825, 829),)),
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((831, 841),)),
     WordScoreRange(word=du /d/, score=None, ranges=((842, 846),)),
     WordScoreRange(word=monde /mon/, score=None, ranges=((848, 857),))]
    

    After this PR, the result in my example is :

    [WordScoreRange(word=tous /tous/, score=None, ranges=((0, 54),)), 
     WordScoreRange(word=les /les/, score=None, ranges=((56, 59),)),        
     WordScoreRange(word=amoureux /amoureux/, score=None, ranges=((62, 75),)),
     WordScoreRange(word=de /de/, score=None, ranges=((77, 79),)),          
     WordScoreRange(word=sport /seport/, score=None, ranges=((81, 91),)),           
     WordScoreRange(word=divers /diver/, score=None, ranges=((92, 103),)),
     WordScoreRange(word=s'en /s'en/, score=None, ranges=((105, 113),)),  
     WordScoreRange(word=réjouiront /réjuiront/, score=None, ranges=((114, 140),)),
     WordScoreRange(word=avec /avec/, score=None, ranges=((145, 155),)),  
     WordScoreRange(word=l'approche /l'approche/, score=None, ranges=((158, 185),)),
     WordScoreRange(word=de /de/, score=None, ranges=((189, 192),)),  
     WordScoreRange(word=l'hiver /l'hiver/, score=None, ranges=((194, 212),)),
     WordScoreRange(word=et /et/, score=None, ranges=((214, 216),)),                                                               
     WordScoreRange(word=la /la/, score=None, ranges=((219, 221),)),
     WordScoreRange(word=reprise /reprise/, score=None, ranges=((224, 239),)),                                                                                                                         
     WordScoreRange(word=de /de/, score=None, ranges=((241, 244),)),            
     WordScoreRange(word=la /la/, score=None, ranges=((246, 248),)),              
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((250, 258),)),
     WordScoreRange(word=du /du/, score=None, ranges=((259, 262),)),            
     WordScoreRange(word=monde /monde/, score=None, ranges=((264, 272),)),
     WordScoreRange(word=de //, score=None, ranges=((273, 275),)),
     WordScoreRange(word=ski /ski/, score=None, ranges=((278, 289),)),
     WordScoreRange(word=les /les/, score=None, ranges=((295, 299),)),
     WordScoreRange(word=images /images/, score=None, ranges=((303, 318),)),
     WordScoreRange(word=de /de/, score=None, ranges=((321, 323),)),
     WordScoreRange(word=glisse /glisse/, score=None, ranges=((327, 362),)),
     WordScoreRange(word=retrouvent /retrouvent/, score=None, ranges=((375, 394),)),
     WordScoreRange(word=une /une/, score=None, ranges=((398, 403),)),
     WordScoreRange(word=place /place/, score=None, ranges=((409, 424),)),
     WordScoreRange(word=de /de/, score=None, ranges=((430, 432),)),
     WordScoreRange(word=choix /choix/, score=None, ranges=((437, 448),)),
     WordScoreRange(word=dans /dans/, score=None, ranges=((450, 456),)),
     WordScoreRange(word=les /les/, score=None, ranges=((458, 462),)),
     WordScoreRange(word=pages /pages/, score=None, ranges=((465, 479),)),
     WordScoreRange(word=sportives /sportives/, score=None, ranges=((487, 507),)),
     WordScoreRange(word=de /de/, score=None, ranges=((509, 511),)),
     WordScoreRange(word=vos /vos/, score=None, ranges=((513, 519),)),
     WordScoreRange(word=journaux /journaux/, score=None, ranges=((521, 533),)),
     WordScoreRange(word=télévisés /télévisés/, score=None, ranges=((535, 562),)),
     WordScoreRange(word=deux /deux/, score=None, ranges=((568, 576),)),
     WordScoreRange(word=épreuves /épreuves/, score=None, ranges=((581, 618),)),
     WordScoreRange(word=aujourd'hui /aujourd'hui/, score=None, ranges=((629, 654),)),
     WordScoreRange(word=on /onoro/, score=None, ranges=((662, 680),)),
     WordScoreRange(word=a /am/, score=None, ranges=((685, 699),)),
     WordScoreRange(word=santa /santa/, score=None, ranges=((703, 726),)),
     WordScoreRange(word=caterina /qaterina/, score=None, ranges=((729, 761),)),
     WordScoreRange(word=premier /prmier/, score=None, ranges=((768, 783),)),
     WordScoreRange(word=salon /salom/, score=None, ranges=((785, 803),)),
     WordScoreRange(word=géant /géant/, score=None, ranges=((808, 820),)),
     WordScoreRange(word=de /de/, score=None, ranges=((822, 824),)),
     WordScoreRange(word=la /l/, score=None, ranges=((826, 829),)),
     WordScoreRange(word=coupe /coupe/, score=None, ranges=((833, 842),)),
     WordScoreRange(word=du /d/, score=None, ranges=((843, 846),)),
     WordScoreRange(word=monde /mond/, score=None, ranges=((850, 858),))]
    

    We can see that before this PR, there are 17 words where the timesteps are too early (about a one-letter shift; it is visible at the end but not at the beginning of words because I have trimmed spaces). After this PR, the fit is almost perfect. For some reason, there are still 3 remaining errors, all in the last 4 words.

    opened by godefv 55
  • Language model incorrectly drops spaces for out-of-vocabulary words

    Mozilla DeepSpeech will sometimes create long runs of text with no spaces:

    omiokaarforfthelastquarterwastoget
    

    This happens even with short audio clips (4 seconds) from a native American English speaker, recorded using a high-quality microphone on Mac OS X laptops. I've isolated the problem to the interaction with the language model rather than the acoustic model or the length of the audio clips, as the problem goes away when the language model is turned off.

    The problem might be related to encountering out-of-vocabulary terms.

    I’ve put together test files with results that show the issue is related to the language model somehow rather than the length of the audio or the acoustic model.

    I’ve provided 10 chunked WAV files at 16 kHz, 16-bit depth, each 4 seconds long, that are a subset of a fuller 15-minute audio file (I have not provided that full 15-minute file, as a few shorter reproducible chunks are sufficient to reproduce the problem):

    https://www.dropbox.com/sh/3qy65r6wo8ldtvi/AAAAVinsD_kcCi8Bs6l3zOWFa?dl=0

    The audio segments deliberately include occasional out-of-vocabulary terms, mostly technical, such as “OKR”, “EdgeStore”, “CAPE”, etc.

    Also in that folder are several text files that show the output with the standard language model being used, showing the garbled words together (chunks_with_language_model.txt):

    Running inference for chunk 1
    so were trying again a maybeialstart this time
    
    Running inference for chunk 2
    omiokaarforfthelastquarterwastoget
    
    Running inference for chunk 3
    to car to state deloedmarchinstrumnalha
    
    Running inference for chunk 4
    a tonproductcaseregaugesomd produce sidnelfromthat
    
    Running inference for chunk 5
    i am a to do that you know 
    
    Running inference for chunk 6
    we finish the kepehandlerrwend finished backfileprocessing 
    
    Running inference for chunk 7
    and is he teckdatthatwewould need to do to split the cape 
    
    Running inference for chunk 8
    out from sir handler and i are on new 
    
    Running inference for chunk 9
    he is not monolithic am andthanducotingswrat 
    
    Running inference for chunk 10
    relizationutenpling paws on that until it its a product signal
    

    Then, I’ve provided similar output with the language model turned off (chunks_without_language_model.txt):

    Running inference for chunk 1
    so we're tryng again ah maybe alstart this time
    
    Running inference for chunk 2
    omiokaar forf the last quarter was to get
    
    Running inference for chunk 3
    oto car to state deloed march in strumn alha
    
    Running inference for chunk 4
    um ton product  caser egauges somd produc sidnel from that
    
    Running inference for chunk 5
    am ah to do that ou nowith
    
    Running inference for chunk 6
    we finishd the kepe handlerr wend finished backfile processinga
    
    Running inference for chunk 7
    on es eteckdat that we would need to do to split the kae ha
    
    Running inference for chunk 8
    rout frome sir hanler and ik ar on newh
    
    Running inference for chunk 9
    ch las not monoliic am andthan ducotings wrat 
    
    Running inference for chunk 10
    relization u en pling a pas on that until it its a product signal
    

    I’ve included both these files in the shared Dropbox folder link above.

    Here’s what the correct transcript should be, manually done (chunks_correct_manual_transcription.txt):

    So, we're trying again, maybe I'll start this time.
    
    So my OKR for the last quarter was to get AutoOCR to a state that we could
    launch an external alpha, and product could sort of gauge some product signal
    from that. To do that we finished the CAPE handler, we finished backfill 
    processing, we have some tech debt that we would need to do to split the CAPE 
    handler out from the search handler and make our own new handler so its not
    monolithic, and do some things around CAPE utilization. We are kind of putting
    a pause on that until we get some product signal.
    

    This shows the language model is the source of this problem; I’ve seen anecdotal reports on the official message boards and in blog posts that this is a widespread problem. Perhaps when the language model hits an unknown n-gram, it ends up combining the words together rather than retaining the spaces between them.
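
    If that hypothesis holds, it should be visible in raw KenLM scores: an out-of-vocabulary token takes a large log-probability penalty, which can push the beam search toward a spaceless character run instead of a hypothesis that introduces an unknown word. Below is a minimal sketch of that check (not the DeepSpeech decoder itself), assuming the kenlm Python package and the release lm.binary in the working directory:

    import kenlm

    model = kenlm.Model("lm.binary")

    for sentence in ["my goal for the last quarter", "my okr for the last quarter"]:
        # full_scores yields one (log10 probability, n-gram length, is_oov)
        # tuple per word, plus one for the end-of-sentence token.
        words = sentence.split() + ["</s>"]
        for (logprob, ngram_len, oov), word in zip(model.full_scores(sentence), words):
            print("%12s  log10 p = %7.2f  oov = %s" % (word, logprob, oov))

    If "okr" is flagged as OOV with a much lower score than "goal", that supports the theory that unknown words are being penalized so heavily that spaceless merges win the beam search.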

    Discussion around this bug started on the standard DeepSpeech discussion forum: https://discourse.mozilla.org/t/text-produced-has-long-strings-of-words-with-no-spaces/24089/13 https://discourse.mozilla.org/t/longer-audio-files-with-deep-speech/22784/3

    • Have I written custom code (as opposed to running examples on an unmodified clone of the repository):

    The standard client.py was slightly modified to segment the longer 15-minute audio clip into 4-second blocks (a minimal sketch of this kind of segmentation is included after the reproduction commands below).

    • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):

    Mac OS X 10.12.6 (16G1036)

    • TensorFlow installed from (our builds, or upstream TensorFlow):

    Both Mozilla DeepSpeech and TensorFlow were installed into a virtualenv setup via the following requirements.txt file:

    tensorflow==1.4.0
    deepspeech==0.1.0
    numpy==1.13.3
    scipy==0.19.1
    webrtcvad==2.0.10
    
    • TensorFlow version (use command below):
    ('v1.4.0-rc1-11-g130a514', '1.4.0')
    
    • Python version:
    Python 2.7.13
    
    • Bazel version (if compiling from source):

    Did not compile from source.

    • GCC/Compiler version (if compiling from source):

    Same; did not compile from source.

    • CUDA/cuDNN version:

    Used CPU only version

    • GPU model and memory:

    Used CPU only version

    • Exact command to reproduce:

    I haven't provided my full modified client.py that segments longer audio, but to run the standard deepspeech command with a language model against a known 4-second audio clip from the Dropbox folder shared above, you can run the following:

    # Set $DEEPSPEECH to where full Deep Speech checkout is; note that my own git checkout
    # for the `deepspeech` runner is at git sha fef25e9ea6b0b6d96dceb610f96a40f2757e05e4
    deepspeech $DEEPSPEECH/models/output_graph.pb chunk_2_length_4.0_s.wav $DEEPSPEECH/models/alphabet.txt $DEEPSPEECH/models/lm.binary $DEEPSPEECH/models/trie
    
    # Similar command to run without language model -- spaces retained for unknown words:
    deepspeech $DEEPSPEECH/models/output_graph.pb chunk_2_length_4.0_s.wav $DEEPSPEECH/models/alphabet.txt 
    

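    As referenced above, here is a minimal Python 3 sketch of the kind of fixed-length segmentation the modified client.py performs, using only the standard-library wave module. File names are placeholders, and the voice-activity-detection step suggested by the webrtcvad entry in the requirements.txt above is omitted:

    import wave

    CHUNK_SECONDS = 4

    with wave.open("full_recording.wav", "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * CHUNK_SECONDS
        index = 1
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of file
                break
            with wave.open("chunk_%d.wav" % index, "wb") as dst:
                # Copy rate/width/channels from the source; the wave module
                # patches nframes on close to match what was written.
                dst.setparams(params)
                dst.writeframes(frames)
            index += 1
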
    This is clearly a bug and not a feature :)

    opened by BradNeuberg 54
  • Adapting engine to any Custom Language

    Adapting engine to any Custom Language

    I was wondering what kinds of modifications would be needed to use this engine for languages other than English (other than a new language model and a new words.txt file)? In particular, I am interested in whether it could be used with a Cyrillic script, given that "data in the transcripts must match the [a-z ]+ regex", and if so, how hard it would be to adapt. I think I could work around this by creating a transliterator that converts text from Cyrillic script to the [a-z ]+ format (a sketch follows below), but it would be preferable if the engine could use a Cyrillic script directly.

    Thanks in advance
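
    For the workaround mentioned above, a minimal transliteration sketch; the table is hypothetical and deliberately partial, and a real one would need to cover the target language's full alphabet (note that multi-character outputs like "zh" make the mapping lossy in reverse):

    # Hypothetical, partial Cyrillic-to-Latin table; extend it to the full
    # alphabet of the target language before using it on real transcripts.
    TRANSLIT = {
        "а": "a", "б": "b", "в": "v", "г": "g", "д": "d",
        "е": "e", "ж": "zh", "з": "z", "и": "i", "н": "n",
        "о": "o", "р": "r", "с": "s", "т": "t",
    }

    def to_latin(text):
        # Characters missing from the table pass through unchanged, so
        # leftovers are easy to spot when validating against [a-z ]+.
        return "".join(TRANSLIT.get(ch, ch) for ch in text.lower())

    print(to_latin("Добар дан"))  # -> "dobar dan"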

    question 
    opened by istojan 54
  • Support for Windows

    Support for Windows

    I'm still editing the docs, preparing for CUDA, and finishing the C# examples.

    IMPORTANT NOTE: I have not tried to train on Windows yet; my initial goal is to enable inference with the clients on Windows.
    Thanks to @reuben and @lissyx, who helped me a lot.

    Fixes #1123

    Epic 
    opened by carlfm01 51
  • Generate trie lm::FormatLoadException

    Generate trie lm::FormatLoadException

    I'm following this tutorial: https://discourse.mozilla.org/t/tutorial-how-i-trained-a-specific-french-model-to-control-my-robot/22830 to create a French model.

    The problem is when generating the trie file with this command:

    ./generate_trie data/cassia/alphabet.txt data/cassia/lm.binary data/cassia/vocabulary.txt data/cassia/trie

    I have this output :

    terminate called after throwing an instance of 'lm::FormatLoadException' what(): native_client/kenlm/lm/binary_format.cc:131 in void lm::ngram::MatchCheck(lm::ngram::ModelType, unsigned int, const lm::ngram::Parameters&) threw FormatLoadException. The binary file was built for probing hash tables but the inference code is trying to load trie with quantization and array-compressed pointers Abandon (core dumped)

    I tried several times to regenerate my lm.binary with KenLM (./build_binary -T -s words.arpa lm.binary), but I still get the same error.
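
    The exception itself points at the likely cause: build_binary produces KenLM's default "probing" hash-table layout unless a data structure is named, while this generate_trie build expects a trie with quantization and array-compressed pointers. A sketch of such a rebuild follows; the flag values are an assumption based on the error text, so verify them against the DeepSpeech documentation for your version:

    # Rebuild lm.binary with KenLM's trie data structure instead of the
    # default probing layout; -q 8 quantizes probabilities and -a 255
    # array-compresses pointers, which is what the exception above says
    # the loader expects. Paths and flag values are assumptions.
    import subprocess

    subprocess.run(
        ["./build_binary", "-a", "255", "-q", "8", "trie",
         "words.arpa", "lm.binary"],
        check=True,
    )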

    opened by yoann1995 49
Releases: v0.10.0-alpha.3

Owner: Mozilla ("This technology could fall into the right hands.")