kapre: Keras Audio Preprocessors

Overview

Kapre

Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU real-time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

  • You can optimize DSP parameters
  • Your model deployment becomes much simpler and consistent.
  • Your code and model has less dependencies

vs. Your own implementation

  • Quick and easy!
  • Consistent with 1D/2D tensorflow batch shapes
  • Data format agnostic (channels_first and channels_last)
  • Less error prone - Kapre layers are tested against Librosa (stft, decibel, etc) - which is (trust me) trickier than you think.
  • Kapre layers have some extended APIs from the default tf.signals implementation such as..
    • A perfectly invertible STFT and InverseSTFT pair
    • Mel-spectrogram with more options
  • Reproducibility - Kapre is available on pip with versioning

Workflow with Kapre

  1. Preprocess your audio dataset. Resample the audio to the right sampling rate and store the audio signals (waveforms).
  2. In your ML model, add Kapre layer e.g. kapre.time_frequency.STFT() as the first layer of the model.
  3. The data loader simply loads audio signals and feed them into the model
  4. In your hyperparameter search, include DSP parameters like n_fft to boost the performance.
  5. When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!

Installation

pip install kapre

API Documentation

Please refer to Kapre API Documentation at https://kapre.readthedocs.io

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), maybe 1-sec audio signal, for an example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 6, 44100)
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!

Citation

Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}
Comments
  • Migrated functions to tf.keras

    Migrated functions to tf.keras

    This PR addresses #52 by removing the dependency on keras and switching to tensorflow.keras

    Proposed version is 0.1.6 because of pull request #56

    In particular, #56 keeps the dependency on keras with from keras.utils.conv_utils import conv_output_length

    opened by douglas125 27
  • Spectrogram?

    Spectrogram?

    I have an older version of Kapre that has time_frequency.Spectrogram, which is trainable.

    However, the new version of Kapre doesn't have Spectrogram anymore. Why?

    opened by turian 8
  • Melspectrogram cant be set 'trainable_fb=False'

    Melspectrogram cant be set 'trainable_fb=False'

    Melspectrogram cant be set 'trainable_fb=False',after I set trainable_fb=False,trainable_kernel=False,but seems like it doesnot work.It is still trainable.

    opened by zhh6 8
  • htk=true for mel frequencies

    htk=true for mel frequencies

    We noticed the current implenetation of the mel_frequencies function (based on Librosa) doesn't include the htk=True option, which is handy when training CNNs because then the frequency scale is fully logarithmic which, in principle, makes more sense for frequency invariant convolutional filters.

    What was the motivation for removing this? Any chance it can be added?

    opened by justinsalamon 8
  • Added parallel STFT implementation

    Added parallel STFT implementation

    Hi!

    As I comented in #98, I added a parallel STFT implementation based on the map_fn function following the indications of @zaccharieramzi here.

    I've added a use_parallel_stft param (disabled by default) that allows to use this function. I've put this param in as many functions as I can. I also added test cases for every function I can, including an specific test that checks that the result of the tf.signal.stft is equals to the result of the parallel_stft function.

    Hope this could serve us well meanwhile tensorflow resolves its issues with the fft implementation.

    opened by JPery 7
  • Amplitude-to-decibel conversion produces different results on different batches

    Amplitude-to-decibel conversion produces different results on different batches

    Related to #16, I found another issue that contributes to different prediction results depending on batch size (and the batches themselves). In particular, it occurs when using converting spectrograms to decibels.

    https://github.com/keunwoochoi/kapre/blob/master/kapre/backend_keras.py#L17

    The maximum is taken over the entire tensor, instead of per example in the batch. This results in different normalization when the examples in a batch are different.

    bug 
    opened by auroracramer 7
  • Inverse Spectrogram and Mel-Spectrogram Layer?

    Inverse Spectrogram and Mel-Spectrogram Layer?

    Namaste!

    kapre has become an integral part of all my audio Deep Learning experiments. Powerful! Thanks for providing such a great software!

    I was thinking... I guess it would make sense to have layers for inverse spectrogram and inverse mel-spectrogram. Thinking about Autoencoders, this would be even more powerful. I know that reconstructing samples from spectrograms is not the best, but it is possible to a certain degree.

    What do you think about that feature request?

    Best, Tristan

    opened by AI-Guru 7
  • Hey! The input is too short!

    Hey! The input is too short!

    Hi,

    I'm encountering an assertion problem when calling your code with a Tensorflow backend.

    input_shape = (44100,1)

    Could this be a be a problem with "channels_first" / "channels_last"?

    Best, Alex

    opened by slychief 6
  • Pip?

    Pip?

    It seems you were on pip, but are no longer. Is there anything I could do to help get kapre back on there? We want to use this library in a commercial application, and for our process pip packages are easier to support than a git repository.

    opened by ff-rfeather 6
  • trainable_stft error

    trainable_stft error

    Following your example but missing layer definition trainable_stft or something, can you provide example with error resolution?

    `# 6 channels (!), maybe 1-sec audio signal
    input_shape = (6, 44100) 
    sr = 44100
    model = Sequential()
    model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=src_shape,
                             border_mode='same', sr=sr, n_mels=128,
                             fmin=0.0, fmax=sr/2, power=1.0,
                             return_decibel=False, trainable_fb=False,
                             trainable_kernel=False
                             name='trainable_stft'))`
    
      File "<ipython-input-24-cea5588ddf1e>", line 13
        name='trainable_stft'))
           ^
    SyntaxError: invalid syntax
    
    opened by sildeag 6
  • `STFT` layer output shape deviates from `STFTTflite` layer in batch dimension

    `STFT` layer output shape deviates from `STFTTflite` layer in batch dimension

    Use Case

    I want to convert a STFT layer in my model to a STFTTflite to deploy it to my mobile device. In the documentation I found that another dimension is added to account for complex numbers. But I also encountered a behaviour that is not documented.

    Expected Behaviour

    input_shape = (2048, 1)  # mono signal
    
    model = keras.models.Sequential()  # TFLite incompatible model
    model.add(kapre.STFT(n_fft=1024, hop_length=512, input_shape=input_shape))
    
    tflite_model = keras.models.Sequential()  # TFLite compatible model
    tflite_model.add(kapre.STFTTflite(n_fft=1024, hop_length=512, input_shape=input_shape))
    

    model has the output shape (None, 3, 513, 1). Therefore, tflite_model should have the output shape (None, 3, 513, 1, 2).

    Observed Behaviour

    The output shape of tflite_model is (1, 3, 513, 1, 2) instead of (None, 3, 513, 1, 2).

    Problem Solution

    • If this behaviour is unwanted:
      • Change the model output format so that the batch dimension is correctly shaped.
    • Otherwise:
      • Explain in the documentation why the batch dimension is shaped to 1.
      • Explain in the documentation how to include this layer into models which expect the batch dimension to be shaped None.
    opened by PhilippMatthes 5
  • Problem incorporating SpecAugument in the training process

    Problem incorporating SpecAugument in the training process

    Hi,

    I'm trying to add a SpecAug layer in the training process of a CNN using the code below:

    
    CLIP_DURATION = 5 
    SAMPLING_RATE = 41000
    NUM_CHANNELS = 1
    
    INPUT_SHAPE = ((CLIP_DURATION * SAMPLING_RATE), NUM_CHANNELS)
    
    melgram = get_melspectrogram_layer(input_shape = INPUT_SHAPE,
                              n_fft = 2048,
                              hop_length = 512,
                              return_decibel=True,
                              n_mels = 40,
                              mel_f_min = 500,
                              mel_f_max = 15000,
                              input_data_format='channels_last',
                              output_data_format='channels_last')
    
    spec_augment = SpecAugment(freq_mask_param=5,
                              time_mask_param=10,
                              n_freq_masks=2,
                              n_time_masks=3,
                              mask_value=-100)   
    
    model = Sequential()
    model.add(melgram)
    model.add(spec_augment)
    

    The CNN summary looks like this:

    Model: "sequential_2"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     melspectrogram (Sequential)  (None, 397, 40, 1)       0         
                                                                     
     spec_augment_1 (SpecAugment  (None, 397, 40, 1)       0         
     )                                                               
                                                                     
    =================================================================
    Total params: 0
    Trainable params: 0
    Non-trainable params: 0
    _________________________________________________________________
    

    Compiling and fitting the model

    model.compile(loss = 'sparse_categorical_crossentropy', optimizer='adam', metrics = 'accuracy')
    
    early_stop = EarlyStopping(monitor='loss', patience=5)
    
    reduce_LR = ReduceLROnPlateau(monitor="val_loss",factor=0.1,patience=4)
    
    checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')
    
    model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])
    

    Then I get the following error:

    Epoch 1/50
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    [<ipython-input-35-e58a056ab523>](https://localhost:8080/#) in <module>
          7 checkpointer = ModelCheckpoint(filepath = 'saved_models/bird_song_classification.hdf5')
          8 
    ----> 9 model.fit(X_train, y_train, validation_data = (X_val, y_val), epochs = 50, batch_size = 32, callbacks = [early_stop, checkpointer, reduce_LR])
    
    6 frames
    [/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py](https://localhost:8080/#) in tf___apply_masks_to_axis(self, x, axis, mask_param, n_masks)
         78                 try:
         79                     do_return = True
    ---> 80                     retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
         81                 except:
         82                     do_return = False
    
    TypeError: in user code:
    
        File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1051, in train_function  *
            return step_function(self, iterator)
        File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1040, in step_function  **
            outputs = model.distribute_strategy.run(run_step, args=(data,))
        File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1030, in run_step  **
            outputs = model.train_step(data)
        File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 889, in train_step
            y_pred = self(x, training=True)
        File "/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py", line 67, in error_handler
            raise e.with_traceback(filtered_tb) from None
        File "/tmp/__autograph_generated_filepzvfxhgz.py", line 63, in tf__call
            ag__.if_stmt((ag__.ld(training) in (None, False)), if_body_2, else_body_2, get_state_2, set_state_2, ('do_return', 'retval_'), 2)
        File "/tmp/__autograph_generated_filepzvfxhgz.py", line 58, in else_body_2
            retval_ = ag__.converted_call(ag__.ld(tf).map_fn, (), dict(elems=ag__.ld(x), fn=ag__.ld(self)._apply_spec_augment, dtype=ag__.ld(tf).float32, fn_output_signature=ag__.ld(tf).float32), fscope)
        File "/tmp/__autograph_generated_filef27o6c1f.py", line 44, in tf___apply_spec_augment
            ag__.if_stmt((ag__.ld(self).n_time_masks >= 1), if_body_1, else_body_1, get_state_1, set_state_1, ('x',), 1)
        File "/tmp/__autograph_generated_filef27o6c1f.py", line 39, in if_body_1
            x = ag__.converted_call(ag__.ld(self)._apply_masks_to_axis, (ag__.ld(x),), dict(axis=ag__.ld(time_axis), mask_param=ag__.ld(self).time_mask_param, n_masks=ag__.ld(self).n_time_masks), fscope)
        File "/tmp/__autograph_generated_file3vip8w4x.py", line 80, in tf___apply_masks_to_axis
            retval_ = ag__.converted_call(ag__.ld(tf).where, (ag__.ld(mask), ag__.ld(self).mask_value, ag__.ld(x)), None, fscope)
    
        TypeError: Exception encountered when calling layer "spec_augment_1" (type SpecAugment).
        
        in user code:
        
            File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 299, in call  *
                elems=x, fn=self._apply_spec_augment, dtype=tf.float32, fn_output_signature=tf.float32
            File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 273, in _apply_spec_augment  *
                x = self._apply_masks_to_axis(
            File "/usr/local/lib/python3.7/dist-packages/kapre/augmentation.py", line 254, in _apply_masks_to_axis  *
                return tf.where(mask, self.mask_value, x)
        
            TypeError: Input 'e' of 'SelectV2' Op has type float32 that does not match type int32 of argument 't'.
        
        
        Call arguments received by layer "spec_augment_1" (type SpecAugment):
          • x=tf.Tensor(shape=(None, 397, 40, 1), dtype=float32)
          • training=True
          • kwargs=<class 'inspect._empty'>
    

    The shape of X_train is

    (2182, 205000, 1)
    

    I'm using Tensorflow 2.9.2, and Python 3.7.15

    When I remove the SpecAug layer everything runs fine. I've tested using only the melspec + a mobile net at the end and it runs smooth. The problem is apparently related to SpecAug layer.

    Do you have any idea what could be going wrong here? I appreciate any guidance related to the problem. Best regards.

    opened by nnbuainain 2
  • Full-integer quantization and kapre layers

    Full-integer quantization and kapre layers

    I am training a model which includes the mel-spectrogram block from get_melspectrogram_layer() right after the input layer. Training goes well, and I am able to change the specific mel-spec-layers to their TFLite-counterparts (STFTTflite, MagnitudeTflite) afterwards. I have checked also that the model performs as well as before.

    The model also perfoms as expected when converting the model to .tflite using dynamic range quantization. However, when using full-integer quantization, the model loses its accuracy (see (https://www.tensorflow.org/lite/performance/post_training_quantization#integer_only).

    I suppose the mel-spec starts to significantly differ as in full-integer quantization, the input values are projected to new range (int8). Is there any way to make it work with full-integer quantization?

    I guess I need to separate the mel-spec-layer from the model as a preprocessing step in order to succeed with full-integer quantization, i.e., apply the input quantization to the output values of mel-spec layer. But then I would have to deploy two models to the edge device, where the input goes first to the mel-spec-block and then to the rest of the model (?).

    I am using TensorFlow 2.7.0 and kapre 0.3.7.

    Here is my code for testing the tflite-model:

    preds = []
    # Test and evaluate the TFLite-converted model on unseen test data
    for i, sample in enumerate(X_test_full_scaled):
        X = sample
        
        if input_details['dtype'] == np.int8:
            input_scale, input_zero_point = input_details["quantization"]
            X = sample / input_scale + input_zero_point
        
        X = X.reshape((1, 8000, 1)).astype(input_details["dtype"])
        
        interpreter.set_tensor(input_index, X)
        interpreter.invoke()
        pred = interpreter.get_tensor(output_index)
        
        output_scale, output_zero_point = output_details['quantization']
        if output_details['dtype'] == np.int8:
            pred = pred.astype(np.float32)
            pred = (pred - output_zero_point) * output_scale
        
        pred = np.argmax(pred, axis=1)[0]
        preds.append(pred)
    
    preds = np.array(preds)
    
    opened by eppane 3
  • Calling Magnitude() and Phase() simultaneously

    Calling Magnitude() and Phase() simultaneously

    Hi,

    I am looking to call Magnitude() and Phase() simultaneously for the same STFT input and concatenate the magnitude and phase before feeding into the convolution layers in my CNN sequential Keras model.

    Is this possible?

    Best,

    Yang

    opened by HsuanYang-Wang 1
  • about kapre.utiils

    about kapre.utiils

    Hi, when i used "from kapre.utils import Normalization2D", I met this error which said No module named 'kapre.utils'. I see your package, and found that there is surely no utils.py. I am wondering how to slove it.

    Best wishes, Daisy

    opened by YiningWang2 1
  • Function missing in updated version

    Function missing in updated version

    I noticed there is a functon "kapre.utils.Normalization2D" in the old version, while I cannot find it in the updated version. Why? Is there have any alternative functions?

    opened by v3551G 1
  • trainable DSP parameters

    trainable DSP parameters

    hello contributers and community.

    I love your repo! It's eases so much for me! Although having the precomputation in the model is already great I'd like to know how you can optimize DSP parameters. It looks like that this is a feature from old versions (e.g. 0.2) and by default I dont see any trainable params in this layer.

    Could you please state if this is still available and how to use it?

    happy hacking Paul

    opened by bytosaur 2
Releases(Kapre-0.3.7)
Owner
Keunwoo Choi
MIR, machine learning, music recommendation.
Keunwoo Choi
FinGAT: A Financial Graph Attention Networkto Recommend Top-K Profitable Stocks

FinGAT: A Financial Graph Attention Networkto Recommend Top-K Profitable Stocks This is our implementation for the paper: FinGAT: A Financial Graph At

Yu-Che Tsai 64 Dec 13, 2022
Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness

Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness This repository contains the code used for the exper

H.R. Oosterhuis 28 Nov 29, 2022
On-device wake word detection powered by deep learning.

Porcupine Made in Vancouver, Canada by Picovoice Porcupine is a highly-accurate and lightweight wake word engine. It enables building always-listening

Picovoice 2.8k Dec 29, 2022
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 06, 2022
PyTorch implementations of deep reinforcement learning algorithms and environments

Deep Reinforcement Learning Algorithms with PyTorch This repository contains PyTorch implementations of deep reinforcement learning algorithms and env

Petros Christodoulou 4.7k Jan 04, 2023
SelfAugment extends MoCo to include automatic unsupervised augmentation selection.

SelfAugment extends MoCo to include automatic unsupervised augmentation selection. In addition, we've included the ability to pretrain on several new datasets and included a wandb integration.

Colorado Reed 24 Oct 26, 2022
Log4j JNDI inj. vuln scanner

Log-4-JAM - Log 4 Just Another Mess Log4j JNDI inj. vuln scanner Requirements pip3 install requests_toolbelt Usage # make sure target list has http/ht

Ashish Kunwar 66 Nov 09, 2022
Direct design of biquad filter cascades with deep learning by sampling random polynomials.

IIRNet Direct design of biquad filter cascades with deep learning by sampling random polynomials. Usage git clone https://github.com/csteinmetz1/IIRNe

Christian J. Steinmetz 55 Nov 02, 2022
Cross-Document Coreference Resolution

Cross-Document Coreference Resolution This repository contains code and models for end-to-end cross-document coreference resolution, as decribed in ou

Arie Cattan 29 Nov 28, 2022
Intrusion Detection System using ensemble learning (machine learning)

IDS-ML implementation of an intrusion detection system using ensemble machine learning methods Data set This project is carried out using the UNSW-15

4 Nov 25, 2022
PerfFuzz: Automatically Generate Pathological Inputs for C/C++ programs

PerfFuzz Performance problems in software can arise unexpectedly when programs are provided with inputs that exhibit pathological behavior. But how ca

Caroline Lemieux 125 Nov 18, 2022
Image Segmentation Evaluation

Image Segmentation Evaluation Martin Keršner, [email protected] Evaluation

Martin Kersner 273 Oct 28, 2022
Contrastive Multi-View Representation Learning on Graphs

Contrastive Multi-View Representation Learning on Graphs This work introduces a self-supervised approach based on contrastive multi-view learning to l

Kaveh 208 Dec 23, 2022
Code for SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

The Second Situated Interactive MultiModal Conversations (SIMMC 2.0) Challenge 2021 Welcome to the Second Situated Interactive Multimodal Conversation

Facebook Research 81 Nov 22, 2022
Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark (ICCV 2021)

Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark (ICCV 2021) Kun Wang, Zhenyu Zhang, Zhiqiang Yan, X

kunwang 66 Nov 24, 2022
Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral]

Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral] Learning to Disambiguate Strongly In

Zicong Fan 40 Dec 22, 2022
Official repository for "On Generating Transferable Targeted Perturbations" (ICCV 2021)

On Generating Transferable Targeted Perturbations (ICCV'21) Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Fatih Porikli Paper:

Muzammal Naseer 46 Nov 17, 2022
deep_image_prior_extension

Code for "Is Deep Image Prior in Need of a Good Education?" Project page: https://jleuschn.github.io/docs.educated_deep_image_prior/. Supplementary Ma

riccardo barbano 7 Jan 09, 2022
PyTorch implementation of ECCV 2020 paper "Foley Music: Learning to Generate Music from Videos "

Foley Music: Learning to Generate Music from Videos This repo holds the code for the framework presented on ECCV 2020. Foley Music: Learning to Genera

Chuang Gan 30 Nov 03, 2022
Segmentation models with pretrained backbones. Keras and TensorFlow Keras.

Python library with Neural Networks for Image Segmentation based on Keras and TensorFlow. The main features of this library are: High level API (just

Pavel Yakubovskiy 4.2k Jan 09, 2023