kapre: Keras Audio Preprocessors

Last update: Dec 29, 2022

Overview

Kapre

Keras Audio Preprocessors: compute STFT, ISTFT, mel-spectrograms, and more on the GPU in real time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

- You can optimize DSP parameters.
- Model deployment becomes much simpler and more consistent.
- Your code and model have fewer dependencies.

vs. Your own implementation

- Quick and easy!
- Consistent with 1D/2D TensorFlow batch shapes.
- Data-format agnostic (channels_first and channels_last).
- Less error-prone: Kapre layers are tested against Librosa (STFT, decibel, etc.), which is (trust me) trickier than you think.
- Kapre layers extend the default tf.signal implementation with APIs such as:
  - A perfectly invertible STFT and InverseSTFT pair
  - Mel-spectrogram with more options
- Reproducibility: Kapre is available on pip with versioning.
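To illustrate what "perfectly invertible" means, here is a NumPy sketch (not Kapre's implementation) of an STFT/inverse-STFT pair. It assumes a periodic Hann window, hop = n_fft // 4, and overlap-add normalized by the summed squared window; the function names are hypothetical. Away from the signal edges, the round trip reproduces the input up to floating-point error.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Frame a 1D signal, apply a periodic Hann window, and take the real FFT of each frame."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape: (n_frames, n_fft // 2 + 1)

def istft(spec, n_fft=512, hop=128):
    """Invert stft() by windowed overlap-add, normalized by the summed squared window."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=-1) * window
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    nonzero = norm > 1e-8
    out[nonzero] /= norm[nonzero]
    return out

rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)
reconstructed = istft(stft(signal))
# Compare only the interior, where every sample is covered by several frames.
interior = slice(512, 3584)
print(np.max(np.abs(signal[interior] - reconstructed[interior])))  # close to 0
```

Dividing by the summed squared window (rather than relying on a specific window/hop combination) is what makes the inversion exact wherever that sum is nonzero.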

Workflow with Kapre

- Preprocess your audio dataset: resample the audio to the right sampling rate and store the audio signals (waveforms).
- In your ML model, add a Kapre layer, e.g., kapre.time_frequency.STFT(), as the first layer of the model.
- The data loader simply loads audio signals and feeds them into the model.
- In your hyperparameter search, include DSP parameters such as n_fft to boost performance.
- When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!
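The data-loader step above only has to hand raw waveforms to the model. A minimal NumPy sketch of that batching, assuming channels_last input of shape (batch, n_samples, n_channels) and fixed-length clips: shorter clips are zero-padded and longer ones cropped. `make_batch` and `target_len` are hypothetical names, not part of Kapre.

```python
import numpy as np

def make_batch(waveforms, target_len):
    """Stack (n_samples, n_channels) waveforms into a (batch, target_len, n_channels) array.

    Clips longer than target_len are cropped; shorter clips are zero-padded at the end.
    """
    batch = []
    for wav in waveforms:
        wav = wav[:target_len]
        pad = target_len - wav.shape[0]
        batch.append(np.pad(wav, ((0, pad), (0, 0))))
    return np.stack(batch).astype(np.float32)

# Two stereo clips of different lengths, assumed to be already resampled to 44.1 kHz.
clips = [np.ones((40000, 2)), np.ones((50000, 2))]
x = make_batch(clips, target_len=44100)
print(x.shape)  # (2, 44100, 2)
```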

Installation

pip install kapre

API Documentation

Please refer to Kapre API Documentation at https://kapre.readthedocs.io

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), e.g., a 1-second audio signal at 44.1 kHz.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2048, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy')  # for single-label classification

# Train it with raw audio sample inputs.
# For example, you may have (hypothetical) functions that load your data as below.
x = load_x()  # e.g., x.shape = (10000, 44100, 6), matching input_shape (channels_last)
y = load_y()  # e.g., y.shape = (10000, 10) for 10-class classification
# then..
model.fit(x, y)
# Done!

See the Jupyter notebook in the example folder for more.
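To predict the shape the STFT layer hands to Conv2D, the framing arithmetic below may help. It is a sketch assuming Kapre follows tf.signal.stft framing with pad_end=False and win_length defaulting to n_fft; the helper name is hypothetical. Under these assumptions it reproduces the shapes reported in the comments on this page, e.g., 3 frames and 513 bins for a 2048-sample input with n_fft=1024 and hop_length=512, and 397 frames for 205000 samples with n_fft=2048 and hop_length=512.

```python
def stft_output_shape(n_samples, n_fft, hop_length, n_channels, win_length=None):
    """Return (n_frames, n_freq_bins, n_channels) for a channels_last STFT output.

    Assumes no padding (pad_end=False): frames of win_length samples are taken
    every hop_length samples, and each frame yields n_fft // 2 + 1 frequency bins.
    """
    win_length = win_length or n_fft
    n_frames = 1 + (n_samples - win_length) // hop_length
    return (n_frames, n_fft // 2 + 1, n_channels)

# Settings close to the one-shot example: 1 s of 6-channel audio at 44.1 kHz.
print(stft_output_shape(44100, n_fft=2048, hop_length=1024, n_channels=6))  # (42, 1025, 6)
```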

Citation

Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}

Comments

Migrated functions to tf.keras

This PR addresses #52 by removing the dependency on keras and switching to tensorflow.keras

Proposed version is 0.1.6 because of pull request #56

In particular, #56 keeps the dependency on keras with from keras.utils.conv_utils import conv_output_length

opened by douglas125 27
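For reference, the helper that #56 imports only computes the output length of a 1D convolution, so one way to drop the keras import would be a small local function. The sketch below is assumed to match the formula of Keras' conv_output_length; verify it against the Keras version you target before relying on it.

```python
def conv_output_length(input_length, filter_size, padding, stride, dilation=1):
    """Output length of a 1D convolution, following the usual Keras convention."""
    if input_length is None:
        return None
    if padding not in {"same", "valid", "full", "causal"}:
        raise ValueError(f"Unknown padding: {padding}")
    dilated_filter_size = filter_size + (filter_size - 1) * (dilation - 1)
    if padding in {"same", "causal"}:
        output_length = input_length
    elif padding == "valid":
        output_length = input_length - dilated_filter_size + 1
    else:  # "full"
        output_length = input_length + dilated_filter_size - 1
    # Ceiling division by the stride.
    return (output_length + stride - 1) // stride

print(conv_output_length(44100, 2048, "valid", 1024))  # 42
print(conv_output_length(44100, 2048, "same", 1024))   # 44
```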
Spectrogram?

I have an older version of Kapre that has time_frequency.Spectrogram, which is trainable.

However, the new version of Kapre doesn't have Spectrogram anymore. Why?

opened by turian 8
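Some background on how an STFT can be trainable at all: it can be written as a convolution whose kernels are DFT basis functions (cosines and sines, optionally multiplied by a window), and treating those kernels as weights makes them learnable. The NumPy sketch below builds such kernels and checks them against np.fft.rfft; it illustrates the idea only and is not the code of any Kapre release.

```python
import numpy as np

def dft_kernels(n_fft):
    """Return real and imaginary DFT kernels of shape (n_fft, n_fft // 2 + 1)."""
    n = np.arange(n_fft)[:, None]           # time index (rows)
    k = np.arange(n_fft // 2 + 1)[None, :]  # frequency index (columns)
    angle = 2 * np.pi * n * k / n_fft
    return np.cos(angle), -np.sin(angle)

n_fft = 64
real_k, imag_k = dft_kernels(n_fft)
frame = np.random.default_rng(1).standard_normal(n_fft)

# Projecting one frame onto the kernels is a matrix product; sliding the kernels
# along a signal with a stride of hop_length is a strided 1D convolution.
spectrum = frame @ real_k + 1j * (frame @ imag_k)
print(np.allclose(spectrum, np.fft.rfft(frame)))  # True
```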
Melspectrogram can't be set to 'trainable_fb=False'

Melspectrogram can't be set to trainable_fb=False: after I set trainable_fb=False and trainable_kernel=False, it does not seem to work. The layer is still trainable.

opened by zhh6 8
htk=True for mel frequencies

We noticed the current implementation of the mel_frequencies function (based on Librosa) doesn't include the htk=True option, which is handy when training CNNs because the frequency scale is then fully logarithmic, which, in principle, makes more sense for frequency-invariant convolutional filters.

What was the motivation for removing this? Any chance it can be added?

opened by justinsalamon 8
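For comparison, the two mel scales can be sketched in NumPy, assuming the standard definitions (as used by Librosa's hz_to_mel with htk=True and htk=False): HTK uses 2595 * log10(1 + f / 700) at every frequency, while the Slaney variant is exactly linear below 1 kHz and logarithmic above. Both are roughly linear at low frequencies and logarithmic at high ones; they differ in where and how the transition happens. The function names are hypothetical.

```python
import numpy as np

def hz_to_mel_htk(f):
    """HTK mel scale: logarithmic in (1 + f / 700) at all frequencies."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def hz_to_mel_slaney(f):
    """Slaney mel scale: linear below 1 kHz, logarithmic above."""
    f = np.asarray(f, dtype=float)
    f_sp = 200.0 / 3                   # Hz per mel in the linear region
    min_log_hz = 1000.0                # start of the log region
    min_log_mel = min_log_hz / f_sp    # 15 mels
    logstep = np.log(6.4) / 27.0
    linear = f / f_sp
    log_part = min_log_mel + np.log(np.maximum(f, min_log_hz) / min_log_hz) / logstep
    return np.where(f >= min_log_hz, log_part, linear)

freqs = np.array([0.0, 500.0, 1000.0, 4000.0, 8000.0])
print(hz_to_mel_htk(freqs).round(1))
print(hz_to_mel_slaney(freqs).round(2))
```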
Added parallel STFT implementation

Hi!

As I commented in #98, I added a parallel STFT implementation based on the map_fn function, following the indications of @zaccharieramzi here.

I've added a use_parallel_stft param (disabled by default) that enables this function. I've put this param in as many functions as I could. I also added test cases for every function I could, including a specific test that checks that the result of tf.signal.stft equals the result of the parallel_stft function.

Hopefully this serves us well while TensorFlow resolves its issues with the FFT implementation.

opened by JPery 7
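The equivalence the PR tests (a per-example mapped STFT matching the batched one) can be illustrated framework-free. This NumPy sketch, which is not the PR's code, computes a batched STFT in one vectorized call and again one example at a time (the role map_fn plays in TensorFlow), then compares the two. Function names are hypothetical.

```python
import numpy as np

def batched_stft(x, n_fft=256, hop=64):
    """Vectorized STFT of a (batch, n_samples) array -> (batch, n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (x.shape[-1] - n_fft) // hop
    # Index matrix selecting every frame at once: shape (n_frames, n_fft).
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.fft.rfft(x[..., idx] * window, axis=-1)

def per_example_stft(x, n_fft=256, hop=64):
    """Apply the same STFT to each example separately, like mapping over the batch axis."""
    return np.stack([batched_stft(example[None, :], n_fft, hop)[0] for example in x])

signals = np.random.default_rng(2).standard_normal((4, 2048))
print(np.allclose(batched_stft(signals), per_example_stft(signals)))  # True
```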
Amplitude-to-decibel conversion produces different results on different batches

Related to #16, I found another issue that contributes to different prediction results depending on batch size (and the batches themselves). In particular, it occurs when converting spectrograms to decibels.

https://github.com/keunwoochoi/kapre/blob/master/kapre/backend_keras.py#L17

The maximum is taken over the entire tensor, instead of per example in the batch. This results in different normalization when the examples in a batch are different.
bug

opened by auroracramer 7
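The effect described above can be reproduced with a small NumPy sketch (not Kapre's code) of amplitude-to-decibel conversion referenced to the maximum. Taking the maximum over the whole batch makes an example's output depend on its batch-mates, whereas reducing over every axis except the batch axis does not. The function name and the amin/dynamic_range defaults are assumptions.

```python
import numpy as np

def amplitude_to_db(x, amin=1e-5, dynamic_range=80.0, per_example=True):
    """Convert magnitudes to dB relative to the maximum, then clip to a dynamic range.

    x has shape (batch, time, freq, channels). With per_example=True the reference
    maximum is computed separately for each item in the batch.
    """
    db = 20.0 * np.log10(np.maximum(x, amin))
    axes = tuple(range(1, x.ndim)) if per_example else None
    db = db - np.max(db, axis=axes, keepdims=True)
    return np.maximum(db, -dynamic_range)

rng = np.random.default_rng(3)
quiet = rng.uniform(0.0, 0.1, size=(1, 10, 8, 1))
loud = rng.uniform(0.0, 10.0, size=(1, 10, 8, 1))
batch = np.concatenate([quiet, loud])

# Per-example: the quiet clip gives the same result alone or batched with the loud clip.
print(np.allclose(amplitude_to_db(quiet), amplitude_to_db(batch)[:1]))  # True
# Whole-batch maximum: the quiet clip's output changes when the loud clip is present.
print(np.allclose(amplitude_to_db(quiet, per_example=False),
                  amplitude_to_db(batch, per_example=False)[:1]))  # False
```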
Inverse Spectrogram and Mel-Spectrogram Layer?

Namaste!

kapre has become an integral part of all my audio Deep Learning experiments. Powerful! Thanks for providing such a great software!

I was thinking... I guess it would make sense to have layers for inverse spectrogram and inverse mel-spectrogram. Thinking about Autoencoders, this would be even more powerful. I know that reconstructing samples from spectrograms is not the best, but it is possible to a certain degree.

What do you think about that feature request?

Best, Tristan

opened by AI-Guru 7
Hey! The input is too short!

Hi,

I'm encountering an assertion problem when calling your code with a TensorFlow backend.

input_shape = (44100,1)

Could this be a problem with "channels_first" / "channels_last"?

Best, Alex

opened by slychief 6
Pip?

It seems you were on pip, but are no longer. Is there anything I could do to help get kapre back on there? We want to use this library in a commercial application, and for our process pip packages are easier to support than a git repository.

opened by ff-rfeather 6

trainable_stft error

I followed your example, but something seems to be missing in the layer definition (trainable_stft or similar). Can you provide an example that resolves the error?

# 6 channels (!), maybe 1-sec audio signal
input_shape = (6, 44100)
sr = 44100
model = Sequential()
model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=src_shape,
                         border_mode='same', sr=sr, n_mels=128,
                         fmin=0.0, fmax=sr/2, power=1.0,
                         return_decibel=False, trainable_fb=False,
                         trainable_kernel=False
                         name='trainable_stft'))

  File "<ipython-input-24-cea5588ddf1e>", line 13
    name='trainable_stft'))
       ^
SyntaxError: invalid syntax

opened by sildeag 6

`STFT` layer output shape deviates from `STFTTflite` layer in batch dimension
Use Case

I want to convert an STFT layer in my model to STFTTflite to deploy it to my mobile device. In the documentation, I found that another dimension is added to account for complex numbers. But I also encountered a behaviour that is not documented.

Expected Behaviour

from tensorflow import keras
import kapre

input_shape = (2048, 1)  # mono signal

model = keras.models.Sequential()  # TFLite-incompatible model
model.add(kapre.STFT(n_fft=1024, hop_length=512, input_shape=input_shape))

tflite_model = keras.models.Sequential()  # TFLite-compatible model
tflite_model.add(kapre.STFTTflite(n_fft=1024, hop_length=512, input_shape=input_shape))

model has the output shape (None, 3, 513, 1). Therefore, tflite_model should have the output shape (None, 3, 513, 1, 2).

Observed Behaviour

The output shape of tflite_model is (1, 3, 513, 1, 2) instead of (None, 3, 513, 1, 2).

Problem Solution

If this behaviour is unwanted:

Change the model output format so that the batch dimension is correctly shaped.

Otherwise:

Explain in the documentation why the batch dimension is shaped to 1.

Explain in the documentation how to include this layer into models which expect the batch dimension to be shaped None.
opened by PhilippMatthes 5
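As the issue notes, the TFLite-compatible layer represents the complex STFT with an extra trailing axis of size 2. The NumPy sketch below illustrates that layout and the conversion back to complex values; it assumes the real part sits at index 0 and the imaginary part at index 1 (check this against the Kapre docs) and uses the (batch, time, freq, channels) shape from the issue with the batch fixed to 1.

```python
import numpy as np

# A complex STFT with the shape from the issue: (batch, time, freq, channels).
rng = np.random.default_rng(4)
complex_stft = rng.standard_normal((1, 3, 513, 1)) + 1j * rng.standard_normal((1, 3, 513, 1))

# Split into a real-valued array with a trailing axis of size 2: [..., 0] real, [..., 1] imaginary.
real_imag = np.stack([complex_stft.real, complex_stft.imag], axis=-1)
print(real_imag.shape)  # (1, 3, 513, 1, 2)

# Recombine, e.g., to compare against the output of the regular complex STFT.
restored = real_imag[..., 0] + 1j * real_imag[..., 1]

# Magnitude computed directly from the real/imaginary representation.
magnitude = np.sqrt(real_imag[..., 0] ** 2 + real_imag[..., 1] ** 2)
```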

Problem incorporating SpecAugment in the training process

Hi,

I'm trying to add a SpecAug layer in the training process of a CNN using the code below:


CLIP_DURATION = 5 
SAMPLING_RATE = 41000
NUM_CHANNELS = 1

INPUT_SHAPE = ((CLIP_DURATION * SAMPLING_RATE), NUM_CHANNELS)

melgram = get_melspectrogram_layer(input_shape = INPUT_SHAPE,
                          n_fft = 2048,
                          hop_length = 512,
                          return_decibel=True,
                          n_mels = 40,
                          mel_f_min = 500,
                          mel_f_max = 15000,
                          input_data_format='channels_last',
                          output_data_format='channels_last')

spec_augment = SpecAugment(freq_mask_param=5,
                          time_mask_param=10,
                          n_freq_masks=2,
                          n_time_masks=3,
                          mask_value=-100)   

model = Sequential()
model.add(melgram)
model.add(spec_augment)

The CNN summary looks like this:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 melspectrogram (Sequential)  (None, 397, 40, 1)       0         
                                                                 
 spec_augment_1 (SpecAugment  (None, 397, 40, 1)       0         
 )                                                               
                                                                 
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________

Compiling and fitting the model

model.compile(loss = 'sparse_categorical_crossentropy', optimizer='adam', metrics = 'accuracy')

early_stop = EarlyStopping(monitor='loss', patience=5)

reduce_LR = ReduceLROnPlateau(monitor="val_loss",factor=0.1,patience=4)

checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')

model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])

Then I get the following error:

Epoch 1/50
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-e58a056ab523> in <module>
      7 checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')
      8 
----> 9 model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])

6 frames
/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py in tf___apply_masks_to_axis(self, x, axis, mask_param, n_masks)
     78                 try:
     79                     do_return = True
---> 80                     retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
     81                 except:
     82                     do_return = False

TypeError: in user code:

    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
        y_pred = self(x, training=True)
    File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_filepzvfxhgz.py", line 63, in tf__call
        ag__.if_stmt((ag__.ld(training) in (None, False)), if_body_2, else_body_2, get_state_2, set_state_2, ('do_return', 'retval_'), 2)
    File "/tmp/__autograph_generated_filepzvfxhgz.py", line 58, in else_body_2
        retval_ = ag__.converted_call(ag__.ld(tf).map_fn, (), dict(elems=ag__.ld(x), fn=ag__.ld(self)._apply_spec_augment, dtype=ag__.ld(tf).float32, fn_output_signature=ag__.ld(tf).float32), fscope)
    File "/tmp/__autograph_generated_filef27o6c1f.py", line 44, in tf___apply_spec_augment
        ag__.if_stmt((ag__.ld(self).n_time_masks >= 1), if_body_1, else_body_1, get_state_1, set_state_1, ('x',), 1)
    File "/tmp/__autograph_generated_filef27o6c1f.py", line 39, in if_body_1
        x = ag__.converted_call(ag__.ld(self)._apply_masks_to_axis, (ag__.ld(x),), dict(axis=ag__.ld(time_axis), mask_param=ag__.ld(self).time_mask_param, n_masks=ag__.ld(self).n_time_masks), fscope)
    File "/tmp/__autograph_generated_file3vip8w4x.py", line 80, in tf___apply_masks_to_axis
        retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)

    TypeError: Exception encountered when calling layer "spec_augment_1" (type SpecAugment).
    
    in user code:
    
        File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 299, in call  *
            elems=x, fn=self._apply_spec_augment, dtype=tf.float32, fn_output_signature=tf.float32
        File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 273, in _apply_spec_augment  *
            x = self._apply_masks_to_axis(
        File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 254, in _apply_masks_to_axis  *
            return tf.where(mask, self.mask_value, x)
    
        TypeError: Input 'e' of 'SelectV2' Op has type float32 that does not match type int32 of argument 't'.
    
    
    Call arguments received by layer "spec_augment_1" (type SpecAugment):
      • x=tf.Tensor(shape=(None, 397, 40, 1), dtype=float32)
      • training=True
      • kwargs=<class 'inspect._empty'>

The shape of X_train is

(2182, 205000, 1)

I'm using TensorFlow 2.9.2 and Python 3.7.15.

When I remove the SpecAugment layer, everything runs fine. I've tested using only the melspectrogram plus a MobileNet at the end, and it runs smoothly. The problem apparently is related to the SpecAugment layer.

Do you have any idea what could be going wrong here? I appreciate any guidance related to the problem. Best regards.

opened by nnbuainain 2
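The last line of the traceback points at a dtype mismatch: tf.where needs both value arguments to share one dtype, and the integer mask_value=-100 presumably becomes an int32 tensor next to the float32 spectrogram. Passing mask_value=-100.0 looks like the likely fix (an assumption, not a confirmed answer). For reference, the masking itself can be sketched in NumPy; the function below is hypothetical, loosely modeled on the SpecAugment parameters in the issue. NumPy would silently promote dtypes, so the sketch casts the mask value explicitly.

```python
import numpy as np

def apply_masks(spec, axis, mask_param, n_masks, mask_value, rng):
    """Mask n_masks random contiguous bands of width in [0, mask_param] along `axis`.

    spec has shape (time, freq, channels). mask_value is cast to spec's dtype,
    mirroring the requirement that both branches of tf.where share one dtype.
    """
    spec = spec.copy()
    fill = np.asarray(mask_value, dtype=spec.dtype)
    length = spec.shape[axis]
    for _ in range(n_masks):
        width = rng.integers(0, mask_param + 1)
        start = rng.integers(0, max(1, length - width + 1))
        index = [slice(None)] * spec.ndim
        index[axis] = slice(start, start + width)
        spec[tuple(index)] = fill
    return spec

rng = np.random.default_rng(5)
melgram = rng.standard_normal((397, 40, 1)).astype(np.float32)
# Time masks (axis 0), then frequency masks (axis 1), with a float mask value.
augmented = apply_masks(melgram, axis=0, mask_param=10, n_masks=3, mask_value=-100.0, rng=rng)
augmented = apply_masks(augmented, axis=1, mask_param=5, n_masks=2, mask_value=-100.0, rng=rng)
print(augmented.dtype, augmented.shape)  # float32 (397, 40, 1)
```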

Full-integer quantization and kapre layers
I am training a model which includes the mel-spectrogram block from get_melspectrogram_layer() right after the input layer. Training goes well, and I am able to change the specific mel-spec-layers to their TFLite-counterparts (STFTTflite, MagnitudeTflite) afterwards. I have checked also that the model performs as well as before.

The model also performs as expected when converting it to .tflite using dynamic range quantization. However, with full-integer quantization, the model loses its accuracy (see https://www.tensorflow.org/lite/performance/post_training_quantization#integer_only).

I suppose the mel-spectrogram starts to differ significantly because, in full-integer quantization, the input values are projected to a new range (int8). Is there any way to make it work with full-integer quantization?

I guess I need to separate the mel-spec-layer from the model as a preprocessing step in order to succeed with full-integer quantization, i.e., apply the input quantization to the output values of mel-spec layer. But then I would have to deploy two models to the edge device, where the input goes first to the mel-spec-block and then to the rest of the model (?).

I am using TensorFlow 2.7.0 and kapre 0.3.7.

Here is my code for testing the tflite-model:

preds = []
# Test and evaluate the TFLite-converted model on unseen test data
for i, sample in enumerate(X_test_full_scaled):
    X = sample
    if input_details['dtype'] == np.int8:
        input_scale, input_zero_point = input_details["quantization"]
        X = sample / input_scale + input_zero_point
    X = X.reshape((1, 8000, 1)).astype(input_details["dtype"])
    interpreter.set_tensor(input_index, X)
    interpreter.invoke()
    pred = interpreter.get_tensor(output_index)
    output_scale, output_zero_point = output_details['quantization']
    if output_details['dtype'] == np.int8:
        pred = pred.astype(np.float32)
        pred = (pred - output_zero_point) * output_scale
    pred = np.argmax(pred, axis=1)[0]
    preds.append(pred)
preds = np.array(preds)
opened by eppane 3
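The input scaling in the test loop follows the affine int8 scheme TFLite uses: q = round(x / scale) + zero_point, clipped to [-128, 127], and x is approximately (q - zero_point) * scale. A NumPy sketch with made-up scale and zero-point values is below. The loop above casts with astype after dividing, which truncates toward zero instead of rounding and does not clip to the int8 range; ruling that out may be worthwhile, though it is an assumption, not a diagnosis of the accuracy drop. The function names are hypothetical.

```python
import numpy as np

def quantize_int8(x, scale, zero_point):
    """Affine int8 quantization: q = round(x / scale) + zero_point, clipped to [-128, 127]."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q, scale, zero_point):
    """Map int8 values back to floats: x is approximately (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

# Made-up quantization parameters for a waveform scaled to roughly [-1, 1].
scale, zero_point = 1.0 / 127, 0
waveform = np.sin(np.linspace(0, 20 * np.pi, 8000)).astype(np.float32)

q = quantize_int8(waveform, scale, zero_point)
restored = dequantize_int8(q, scale, zero_point)
# For in-range inputs, rounding keeps the error within half a quantization step.
print(np.max(np.abs(restored - waveform)) <= scale / 2 + 1e-6)  # True
```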
Calling Magnitude() and Phase() simultaneously

Hi,

I am looking to call Magnitude() and Phase() simultaneously for the same STFT input and concatenate the magnitude and phase before feeding into the convolution layers in my CNN sequential Keras model.

Is this possible?

Best,

Yang

opened by HsuanYang-Wang 1
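Branching like this needs the Keras functional API rather than Sequential, because a Sequential model is a single linear stack of layers. The data flow can be sketched framework-free in NumPy: take one complex STFT, compute its magnitude and phase, and concatenate them along the channel axis (channels_last assumed). In a Keras model, the same roles would presumably be played by Magnitude(), Phase(), and a Concatenate layer.

```python
import numpy as np

# Complex STFT output, shape (batch, time, freq, channels), channels_last.
rng = np.random.default_rng(6)
stft = rng.standard_normal((2, 42, 1025, 1)) + 1j * rng.standard_normal((2, 42, 1025, 1))

magnitude = np.abs(stft)   # magnitude of each complex bin
phase = np.angle(stft)     # phase in radians, in [-pi, pi]

# Concatenate along the channel axis so a Conv2D sees magnitude and phase as 2 channels.
features = np.concatenate([magnitude, phase], axis=-1)
print(features.shape)  # (2, 42, 1025, 2)
```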
about kapre.utils

Hi, when I used "from kapre.utils import Normalization2D", I got an error saying No module named 'kapre.utils'. I looked at your package and found that there is indeed no utils.py. I am wondering how to solve it.

Best wishes, Daisy

opened by YiningWang2 1
Function missing in updated version

I noticed there is a function "kapre.utils.Normalization2D" in the old version, but I cannot find it in the updated version. Why? Are there any alternative functions?

opened by v3551G 1
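As a possible stand-in, here is a NumPy sketch of per-example standardization of a spectrogram batch (zero mean and unit variance per example, or per example, frequency bin, and channel). It is an assumption about what is needed, not a reimplementation of the removed layer, whose exact options are not described here; inside a Keras model, the same arithmetic could live in a custom layer or a Lambda layer. The function name is hypothetical.

```python
import numpy as np

def standardize(spec, per_freq=False, eps=1e-8):
    """Standardize each example of a (batch, time, freq, channels) array.

    per_freq=False: one mean/std per example.
    per_freq=True: a separate mean/std per example, frequency bin, and channel (reduce over time only).
    """
    axes = (1,) if per_freq else (1, 2, 3)
    mean = spec.mean(axis=axes, keepdims=True)
    std = spec.std(axis=axes, keepdims=True)
    return (spec - mean) / (std + eps)

batch = np.random.default_rng(7).uniform(-80.0, 0.0, size=(4, 100, 40, 1))
out = standardize(batch)
print(np.allclose(out.mean(axis=(1, 2, 3)), 0.0), np.allclose(out.std(axis=(1, 2, 3)), 1.0))  # True True
```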
trainable DSP parameters

Hello contributors and community.

I love your repo! It makes things so much easier for me! Although having the precomputation in the model is already great, I'd like to know how you can optimize DSP parameters. It looks like this is a feature from old versions (e.g., 0.2), and by default I don't see any trainable params in this layer.

Could you please state if this is still available and how to use it?

happy hacking Paul

opened by bytosaur 2

Releases (Kapre-0.3.7)

Kapre-0.3.7 (Jan 21, 2022)
Add SpecAugment layer

Kapre-0.3.6 (Nov 14, 2021)
Bugfix (TFLite)

Kapre-0.3.5 (Mar 18, 2021)
Add TFLite-compatible STFT layer

Kapre-0.3.4 (Sep 29, 2020)

Bugfix for get_window_fn()
0.3.3 (Sep 15, 2020)

- `kapre.augmentation` is added
- `kapre.time_frequency.ConcatenateFrequencyMap` is added
- `kapre.composed.get_frequency_aware_conv2d` is added
- In `STFT` and `InverseSTFT`, the keyword argument `window_fn` is renamed to `window_name`; it now expects a string value instead of a function.
- With this update, models with Kapre layers can be loaded from the h5 file format.
- `kapre.backend.get_window_fn` is added
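The rename matters for saving: Keras stores each layer's configuration (from get_config()) as JSON in the saved file, and a string survives that round trip while a Python function object does not. A stdlib-only sketch, with a hypothetical config dict and a hypothetical name-to-window lookup standing in for something like kapre.backend.get_window_fn:

```python
import json
import math

def hann_window(length):
    """Periodic Hann window (hypothetical stand-in for a real window function)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]

# Hypothetical lookup from window names to window functions (not Kapre's API).
WINDOWS = {"hann_window": hann_window}

# A layer config that holds the window by name is plain JSON...
config = {"n_fft": 512, "hop_length": 128, "window_name": "hann_window"}
restored = json.loads(json.dumps(config))
# ...and the function is looked up again by name when the layer is rebuilt.
window_fn = WINDOWS[restored["window_name"]]
print(len(window_fn(restored["n_fft"])))  # 512

# Storing the function object itself fails at the serialization step.
try:
    json.dumps({"window_fn": hann_window})
except TypeError as err:
    print("not serializable:", err)
```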


0.3.2 (Aug 30, 2020)

- `kapre.signal.Frame` and `kapre.signal.Energy` are added
- `kapre.signal.LogmelToMFCC` is added
- `kapre.signal.MuLawEncoder` and `kapre.signal.MuLawDecoder` are added
- `kapre.composed.get_stft_magnitude_layer()` is added
- Documentation is hosted at https://kapre.readthedocs.io/
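For reference, mu-law companding (the transform behind a mu-law encoder/decoder pair) compresses amplitudes logarithmically before quantizing them, so quiet samples get finer steps than loud ones. The NumPy sketch below uses the standard formula with mu = 255 (256 levels); the function names and the exact quantization convention are assumptions, not Kapre's API.

```python
import numpy as np

def mu_law_encode(x, quantization_channels=256):
    """Compress x in [-1, 1] with mu-law and quantize to integers in [0, mu]."""
    mu = quantization_channels - 1
    x = np.clip(x, -1.0, 1.0)
    companded = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.floor((companded + 1) / 2 * mu + 0.5).astype(np.int64)

def mu_law_decode(q, quantization_channels=256):
    """Invert mu_law_encode, mapping integers in [0, mu] back to [-1, 1]."""
    mu = quantization_channels - 1
    companded = 2 * (q.astype(np.float64) / mu) - 1
    return np.sign(companded) * np.expm1(np.abs(companded) * np.log1p(mu)) / mu

x = np.linspace(-1.0, 1.0, 1001)
q = mu_law_encode(x)
print(q.min(), q.max())  # 0 255
# The largest round-trip error occurs near +/-1, where the quantization steps are coarsest.
print(np.max(np.abs(mu_law_decode(q) - x)))
```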


0.3.1 (Aug 21, 2020)
InverseSTFT, etc.

0.3.0 (Aug 16, 2020)

Breaking and simplifying changes for TensorFlow 2.0, plus more tests. Some features are removed. New layer: STFT(). New approach for more complicated representations: see kapre.composed.
v0.1.8 (May 18, 2020)

Added Delta layer
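Delta (time-derivative) features are commonly computed with a regression over neighboring frames: d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2). A NumPy sketch of that formula, assuming edge-replicate padding and win_length = 2N + 1 = 5 (the layer's actual defaults and padding mode may differ); the function name is hypothetical.

```python
import numpy as np

def delta(features, win_length=5):
    """Regression deltas along axis 0 of a (time, ...) array, with edge padding."""
    n = (win_length - 1) // 2
    denom = 2 * sum(i * i for i in range(1, n + 1))
    pad_width = [(n, n)] + [(0, 0)] * (features.ndim - 1)
    padded = np.pad(features, pad_width, mode="edge")
    t = features.shape[0]
    out = np.zeros_like(features, dtype=float)
    for i in range(1, n + 1):
        # padded[n + i : n + i + t] holds c[t + i]; padded[n - i : n - i + t] holds c[t - i].
        out += i * (padded[n + i : n + i + t] - padded[n - i : n - i + t])
    return out / denom

ramp = np.arange(10, dtype=float)[:, None] * 3.0  # slope of 3 per frame
print(delta(ramp)[2:-2].ravel())  # interior frames: [3. 3. 3. 3. 3. 3.]
```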
v0.1.7 (Feb 20, 2020)


Owner

Keunwoo Choi

MIR, machine learning, music recommendation.
