OpenL3: Open-source deep audio and image embeddings

Overview

OpenL3 is an open-source Python library for computing deep audio and image embeddings.

Please refer to the documentation for detailed instructions and examples.

UPDATE: OpenL3 now has TensorFlow 2 support!

The audio and image embedding models provided here are published as part of [1], and are based on the Look, Listen and Learn approach [2]. For details about the embedding models and how they were trained, please see:

Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.

Installing OpenL3

Dependencies

libsndfile

OpenL3 depends on the pysoundfile module to load audio files, which depends on the non-Python library libsndfile. On Windows and macOS, these will be installed via pip and you can therefore skip this step. However, on Linux this must be installed manually via your platform's package manager. For Debian-based distributions (such as Ubuntu), this can be done by simply running

apt-get install libsndfile1

Alternatively, if you are using conda, you can install libsndfile simply by running

conda install -c conda-forge libsndfile

For more detailed information, please consult the pysoundfile installation documentation.

TensorFlow

Starting with openl3>=0.4.0, OpenL3 has been upgraded to use TensorFlow 2. Because TensorFlow 2 includes GPU support in the standard package, tensorflow>=2.0.0 is included as a dependency and no longer needs to be installed separately.
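
If you want to verify that the bundled TensorFlow can see your GPU, a quick check like the following works (a minimal sketch; tf.config.list_physical_devices is available in tensorflow>=2.1):

    import tensorflow as tf

    print(tf.__version__)                          # should be >= 2.0.0
    print(tf.config.list_physical_devices('GPU'))  # non-empty list if a GPU is visible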

If you are interested in using TensorFlow 1.x, please install using pip install 'openl3<=0.3.1'.

TensorFlow 1.x & OpenL3 <= v0.3.1

Because TensorFlow 1.x comes in CPU-only and GPU variants, we leave it up to the user to install the version that best fits their use case.

On most platforms, either of the following commands should properly install TensorFlow:

pip install "tensorflow<1.14" # CPU-only version
pip install "tensorflow-gpu<1.14" # GPU version

For more detailed information, please consult the TensorFlow installation documentation.

Installing OpenL3

The simplest way to install OpenL3 is by using pip, which will also install the additional required dependencies if needed. To install OpenL3 using pip, simply run

pip install openl3

To install the latest version of OpenL3 from source:

  1. Clone or pull the latest version, only retrieving the main branch to avoid downloading the branch where we store the model weight files (these will be properly downloaded during installation).

     git clone git@github.com:marl/openl3.git --branch main --single-branch
    
  2. Install using pip to handle Python dependencies. The installation also downloads model files, which requires a stable network connection.

     cd openl3
     pip install -e .
    

Using OpenL3

To help you get started with OpenL3, please see the tutorial.
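
As a quick preview of the core audio API, the following minimal sketch computes frame-level embeddings for a single file ('audio.wav' is a placeholder path; the keyword values shown are among the documented choices):

    import soundfile as sf
    import openl3

    # Load audio as a NumPy array along with its sample rate
    audio, sr = sf.read('audio.wav')

    # Compute one embedding per analysis window, plus its timestamps
    emb, ts = openl3.get_audio_embedding(audio, sr,
                                         content_type='music',
                                         input_repr='mel256',
                                         embedding_size=512)
    print(emb.shape)  # (n_frames, 512)
    print(ts.shape)   # (n_frames,)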

Acknowledging OpenL3

Please cite the following papers when using OpenL3 in your work:

[1] Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.

[2] Look, Listen and Learn
Relja Arandjelović and Andrew Zisserman
IEEE International Conference on Computer Vision (ICCV), Venice, Italy, Oct. 2017.

Model Weights License

The model weights are made available under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Comments
  • Implement image embedding API

    Add the image embedding API to the library. This should be fairly similar to the existing audio API. I'll add a candidate interface once I've given it more thought.
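
    For reference, a sketch of the image API as it eventually shipped (see the v0.3.0 release notes below); the parameter names mirror the audio API, and the exact values shown are assumptions:

      import numpy as np
      import openl3

      # A placeholder RGB frame; real inputs come from image or video files
      image = np.random.rand(224, 224, 3)

      emb = openl3.get_image_embedding(image, content_type='music',
                                       input_repr='mel256', embedding_size=512)
      print(emb.shape)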

    enhancement 
    opened by auroracramer 26
  • Refactor code and models to support TF 2.x and tf.keras

    At some point in the somewhat near future, we should establish support for TF 2.x and tf.keras. The main reasons for this are:

    • To remain compatible with new releases of TF and Keras (the official version of which is now tf.keras) and make use of bug fixes, etc. As we have found (#42, #43), installing newer versions of either package breaks installation and usage.
    • To address multiple vulnerabilities contained in tensorflow < 1.15.2.
    • To simplify the installation process; since TF 2.x now includes support for both CPU and GPU, we can now directly include tensorflow in the project dependencies, (as brought up in #39).

    A priori, it seems like the main things to do are:

    • Updating the dependencies in setup.py to include tensorflow
    • Modifying the model definitions to be tf.keras compatible
    • Porting the model files to a format that can be loaded by tf.keras with TF 2.x

    The main concern that comes to mind is the regression tests. We have already seen that tensorflow > 1.13 causes regression tests to fail. I imagine that this will only worsen as we introduce not only a new major release of TF, but also a divergence between Keras and tf.keras. @justinsalamon, what are your thoughts?

    opened by auroracramer 16
  • Add batch processing mode

    Something else to consider is a batch processing mode, i.e. making more efficient use of the GPU by predicting multiple files at once.

    Probably the least messy option would be to separate some of the interior code of get_audio_embedding into its own functions and add a get_audio_embedding_batch function that calls most of the same functions. We would also have a process_audio_file_batch function.

    I thought about changing get_audio_embedding so that it can take either a single audio array or a list of audio arrays (with a list of corresponding sample rates). While this might consolidate multiple use cases into one function, it'd probably get pretty messy, so it's probably best we don't do this (see the sketch below).

    Regarding the visual frame embedding extraction, we could ask the same question, though there might be more nuance depending on whether we allow individual images to be processed (I think we should). In the case of videos, multiple frames are already provided at once. So it raises the question (to me at least) of whether get_vframe_embedding (as I'm currently calling it) should support both a single frame and multiple frames. This also raises the question of whether we allow frames of multiple sizes.
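
    For reference, the batch interface that eventually shipped for audio (per the "Add batch processing functionality" PR and the v0.3.0 release notes below) does accept lists; a sketch with placeholder file names:

      import soundfile as sf
      import openl3

      audio1, sr1 = sf.read('file1.wav')  # placeholder paths
      audio2, sr2 = sf.read('file2.wav')

      # Lists of audio arrays and sample rates are batched through the network
      emb_list, ts_list = openl3.get_audio_embedding([audio1, audio2], [sr1, sr2],
                                                     batch_size=32)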

    Thoughts?

    opened by auroracramer 10
  • tensorflow 2.1 doesn't require separate pip installs for GPU and CPU

    Thanks for this great package! We love to use it!

    You state

    Because Tensorflow comes in CPU-only and GPU variants, we leave it up to the user to install the version that best fits their usecase.

    This is no longer the case as of 2.1, so you could (if 2.1 is supported) make tensorflow part of the standard requirements.

    opened by faroit 8
  • skimage submodules not imported correctly, regression tests fail

    skimage uses lazy imports, so we need to import each submodule explicitly (e.g. import skimage.transform; skimage.transform.rescale(X, s) instead of import skimage; skimage.transform.rescale(X, s)).
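
    Concretely, a minimal sketch of the fix:

      import numpy as np
      import skimage.transform  # the submodule must be imported explicitly

      X = np.random.rand(64, 64)
      Y = skimage.transform.rescale(X, 0.5)  # works once the submodule is bound
      print(Y.shape)  # (32, 32)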

    opened by auroracramer 7
  • Output file format and naming convention

    I have some questions about how to deal with embedding outputs:

    • Should we include the timestamps? If so do we save it in the same file?
    • What format should we use?
      • h5: Nice compression options, but since these typically shouldn't be large, it might be more annoying to deal with than other options
      • npy/npz: Standard approach, can easily load numpy arrays directly (sketched below)
      • JAMS: Using JAMS would help expand its use and would give a natural way to associate the timestamps with each embedding, but storing all of the values as text might be cumbersome and make the files big, especially for long recordings
    • Should we use the embedding type to name the embedding? e.g. example_audio_openl3_6144emb_linear_music.<ext> Or should we just keep it simple?
      • It might be good if the user is comparing different embeddings, but it might be cumbersome if people just want to use a single type of embeddings. Of course we could add an option for this, but adding another option for something like this might be excessive.
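
    For reference, the npz option can carry both arrays in one file, roughly as the released CLI does; the key names here are assumptions:

      import numpy as np

      emb = np.zeros((10, 512), dtype=np.float32)  # placeholder embeddings
      ts = np.arange(10) * 0.1                     # placeholder timestamps

      np.savez('example_audio.npz', embedding=emb, timestamps=ts)

      data = np.load('example_audio.npz')
      print(data['embedding'].shape, data['timestamps'].shape)
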
    opened by auroracramer 6
  • Fix API documentation and build

    Fix image embedding size in load_image_embedding_model() docstring, mock missing tensorflow.keras modules in doc/conf.py to fix API documentation build, and remove pin on sphinx version. Addresses #60 and #71.

    opened by auroracramer 4
  • Openl3 0.4.0 - Support for Tensorflow 2

    Figured I would push this out while I'm waiting for something else to build.

    Related PR containing updated models: https://github.com/marl/openl3/pull/61

    Setup Changes

    • Openl3 now requires tensorflow>=2.0.0 and installs it by default (there is no longer a separate GPU package)
    • Now requires kapre>=0.3.5 - TODO: make sure we have the exact minimum kapre version - I remember checking git blame, but haven't tested anything
    • keras as a standalone package was removed from dependencies (we're using tf.keras)
    • travis.yml: removed python 2.7 & 3.5 and added 3.7 & 3.8 since tensorflow only supports 3.6-3.8
      • needed to install Cython first for python 3.8 in order to install skimage (RuntimeError: Cython >= 0.23.4 is required to build scikit-image from git checkout)

    Doc Changes

    • Changed tensorflow dependency message to reflect updates
    • Added "Choosing an Audio Frontend (CPU / GPU)" section to tutorial.rst

    Code Changes

    • core.py
      • added params: get_audio_embedding(frontend='auto'), process_audio_file(frontend='auto'), process_video_file(audio_frontend='auto')
      • Added function preprocess_audio(y, sr, input_repr=None) that encapsulates the librosa frontend (as well as preprocessing for the kapre frontend)
        • for librosa, you pass the input_repr; for kapre inputs, you leave input_repr=None (see the sketch after this list)
    • cli.py - added cli flag (--audio-frontend)
    • models.py
      • added param load_audio_embedding_model(frontend='kapre')
      • using new kapre composite layer helpers get_stft_magnitude_layer
      • disabled latest mag2db code and patched in the legacy version (kapre_v0_1_4_magnitude_to_decibel)
      • kapre is now technically an optional dependency (will only try to import if we try to load a model with kapre frontend)
        • we still install it with setup.py, but if someone wanted to, they could install everything manually without kapre and openl3 should still work for the librosa frontend
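
    A usage sketch of the new frontend parameter ('audio.wav' is a placeholder path, and the keyword values are assumptions based on the changes above):

      import soundfile as sf
      import openl3

      audio, sr = sf.read('audio.wav')

      # Load a model that expects librosa-computed input features
      model = openl3.models.load_audio_embedding_model(
          input_repr='mel256', content_type='music',
          embedding_size=512, frontend='librosa')

      # get_audio_embedding runs the librosa preprocessing internally
      emb, ts = openl3.get_audio_embedding(audio, sr, model=model,
                                           frontend='librosa')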

    Test Changes

    • we now have separate regression data for kapre/librosa
    • added tests for frontend model following the existing model tests
    • converted some tests to use pytest.mark.parametrize to avoid doubling the length of the tests for testing frontends

    Dev Util Changes

    • added tests/generate_regression.py which generates new regression data
    • added tests/package_weights.py which takes the weights files in the openl3 package folder and gzips them for git push
    • added tests/migration/remove_layers.py which lets us strip out the spectrogram (or any other) layers
    • tests/migration/ has a few other analysis things/notebooks that were used early on in the frontend testing

    Before merging:

    • double check dependency versions
    • are the pinned versions still valid? might need some help with this one
    • Change models download url in setup.py to main repo (currently it's pointing at my fork so I could test with travis)
    • should we integrate changes from https://github.com/marl/openl3/pull/55?
    • should we run the classifier comparison one more time right before merging as a safety check? idk
    opened by beasteers 4
  • Add batch processing functionality

    Adds batch processing functionality to all embedding computation functions and file processing functions, allowing for one or more inputs to be processed. When possible, multiple inputs are put in the same input batch to the network for inference.

    opened by auroracramer 4
  • Add image embedding API

    Adds image embedding API, including functions for processing both images and videos in addition to audio files. Additionally changes the CLI to account for different modalities of inputs (i.e. audio, image, or video).

    opened by auroracramer 4
  • API reference in documentation missing

    When going to https://openl3.readthedocs.io/en/latest/api.html I only see the headers

    Core functionality
    Models functionality
    

    with nothing under each header. Expected would be a list of classes and functions and the associated documentation. At least those APIs that are mentioned in the tutorial.

    opened by jonnor 4
  • Clarification on input representation

    I was just reading through the source code in openl3 > core.py and noticed something in the _librosa_linear_frontend and _librosa_mel_frontend functions. It seems librosa.power_to_db() is being used on a magnitude spectrum, not a power spectrum? Should it instead be using librosa.amplitude_to_db()?
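
    For context, librosa implements amplitude_to_db(S) as power_to_db(S**2), so applying power_to_db directly to a magnitude spectrogram halves the dB values; a small sketch:

      import numpy as np
      import librosa

      S = np.random.rand(5, 5) + 1e-3          # a stand-in magnitude spectrogram

      a = librosa.amplitude_to_db(S, ref=1.0)  # 20 * log10(S)
      b = librosa.power_to_db(S**2, ref=1.0)   # identical for magnitudes
      c = librosa.power_to_db(S, ref=1.0)      # 10 * log10(S)

      print(np.allclose(a, b))      # True
      print(np.allclose(a, 2 * c))  # True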

    opened by alisonbma 0
  • Example of fine-tuning the audio sub-network.

    I want to fine-tune the audio sub-network to fit my audio classification problem. To this aim, I plan to use the _construct_linear_audio_network, _construct_mel128_audio_network, and _construct_mel256_audio_network functions to load the pre-trained Keras model and then append one or more fully-connected layers to perform the classification.

    However, I don't understand the input shape of these models. According to models.py, the input shape is input_shape = (1, asr * audio_window_dur), where asr = 48000 and audio_window_dur = 1; what is asr, and why does it have that value? Can you please provide an example of using the Keras model starting from a .wav file?

    I really appreciate any help you can provide.
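
    For what it's worth: asr is the audio sample rate (48 kHz) and audio_window_dur is the 1-second analysis window, so each input is a (1, 48000) array (one channel by 48000 samples). A hedged sketch of appending a classifier head via the public loader (n_classes and all data here are placeholders):

      import numpy as np
      import tensorflow as tf
      import openl3

      # Public loader (loads pre-trained weights for you)
      base = openl3.models.load_audio_embedding_model(
          input_repr='mel256', content_type='music', embedding_size=512)
      base.trainable = False  # freeze; set True to fine-tune end-to-end

      n_classes = 10
      clf = tf.keras.Sequential([
          base,
          tf.keras.layers.Dense(128, activation='relu'),
          tf.keras.layers.Dense(n_classes, activation='softmax'),
      ])
      clf.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

      # Each example is one second of 48 kHz mono audio, shaped (1, 48000)
      x = np.random.rand(4, 1, 48000).astype('float32')
      y = np.random.randint(0, n_classes, size=4)
      clf.fit(x, y, epochs=1)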

    opened by mattiacampana 0
  • Extract activation from lower audio layers

    Hi, I was wondering how I can extract activations from the lower audio layers. I guess "embeddings" is the same as "MaxPool_3"? And if that's correct, do "MaxPool", "MaxPool_1", and "MaxPool_2" correspond to the first, second, and third max-pooling layers in the audio ConvNet, as described in Arandjelović and Zisserman 2018 (https://arxiv.org/abs/1712.06651)?
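
    One way to probe intermediate layers with tf.keras (a sketch; 'max_pooling2d_2' is a guessed layer name, so check model.summary() for the real ones):

      import numpy as np
      import tensorflow as tf
      import openl3

      model = openl3.models.load_audio_embedding_model(
          input_repr='mel256', content_type='music', embedding_size=6144)
      model.summary()  # inspect the actual layer names first

      layer = model.get_layer('max_pooling2d_2')  # assumed name; substitute yours
      probe = tf.keras.Model(inputs=model.input, outputs=layer.output)

      acts = probe.predict(np.random.rand(1, 1, 48000).astype('float32'))
      print(acts.shape)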

    opened by seunggookim 1
  • M1 macOS installation problem

    Hi, I am using an M1 MacBook, and when I try to install openl3, the installation fails while pip builds h5py from source, even though h5py is already installed in my virtual environment. The build output ends with errors like:

      building 'h5py.defs' extension
      ...
      h5py/defs.c:16556:56: error: too few arguments to function call, expected 3, have 2
          __pyx_t_1 = H5Oget_info(__pyx_v_loc_id, __pyx_v_oinfo); ...
      H5Opublic.h:497:15: note: 'H5Oget_info3' declared here
      [similar "too few arguments" errors follow for H5Oget_info_by_name,
       H5Oget_info_by_idx, H5Ovisit, H5Ovisit_by_name, and H5Sencode]
      2 warnings and 6 errors generated.
      error: command '/usr/bin/clang' failed with exit code 1

    note: This error originates from a subprocess, and is likely not a problem with pip.

    pip then rolls back the h5py uninstall and reports:

      error: legacy-install-failure
      × Encountered error while trying to install package. ╰─> h5py
      note: This is an issue with the package mentioned above, not pip. hint: See above for output from the failure.

    My python version: 3.8.13

    By the way, I have tried building from source, but this h5py problem still exists.

    opened by yy945635407 3
Releases (v0.4.1)
  • v0.4.1(Aug 6, 2021)

    Release version 0.4.1 of OpenL3.

    • Add librosa as an explicit dependency
    • Remove upper limit pinning for scikit-image dependency
    • Fix version number typo in README
    • Update TensorFlow information in README
  • v0.4.0(Aug 6, 2021)

    Release version 0.4.0 of OpenL3.

    • Upgraded to tensorflow>=2.0.0. TensorFlow is now included as a dependency since the standard package supports both CPU and GPU.
    • Upgraded to kapre>=0.3.5. Reverted magnitude scaling method to match kapre<=0.1.4 as that's what the model was trained on.
    • Removed Python 2 and 3.5 support, as they are not supported by TensorFlow 2 (and added 3.7 & 3.8)
    • Add librosa frontend, and allow frontend to be configurable between kapre and librosa
      • Added frontend='kapre' parameter to get_audio_embedding, process_audio_file, and load_audio_embedding_model
      • Added audio_frontend='kapre' parameter to process_video_file and the CLI
      • Added frontend='librosa' flag to load_audio_embedding_model for use with a librosa or other external frontend
      • Added a openl3.preprocess_audio function that computes the input features needed for each frontend
    • Model .h5 files no longer have Kapre layers in them and are all loadable with tf.keras
    • Made the skimage and moviepy.video.io.VideoFileClip imports lazy
    • Added new regression data for both Kapre 0.3.5 and Librosa
    • Parameterized some of the tests to reduce duplication
    • Added developer helpers for regression data, weight packaging, and .h5 file manipulation
  • v0.4.0rc2(May 30, 2021)

  • v0.4.0rc1(May 30, 2021)

  • v0.4.0rc0(May 30, 2021)

  • v0.3.1(Feb 28, 2020)

    Release version 0.3.1 of OpenL3.

    • Require keras>=2.0.9,<2.3.0 in dependencies to avoid forced installation of TF 2.x during pip installation.
    • Update README and installation docs to explicitly state that we do not yet support TF 2.x and to offer a working dependency combination.
    • Require kapre==0.1.4 in dependencies to avoid installing tensorflow>=1.14, which breaks regression tests.
  • v0.3.1rc0(Feb 28, 2020)

    Release candidate 0 of version 0.3.1.

    • Require keras>=2.0.9,<2.3.0 in dependencies to avoid forced installation of TF 2.x during pip installation.
    • Update README and installation docs to explicitly state that we do not yet support TF 2.x and to offer a working dependency combination.
    • Require kapre==0.1.4 in dependencies to avoid installing tensorflow>=1.14, which breaks regression tests.
  • v0.3.0(Jan 23, 2020)

    Release version 0.3.0 of OpenL3.

    • Rename audio related embedding functions to indicate that they are specific to audio.
    • Add image embedding functionality to API and CLI.
    • Add video processing functionality to API and CLI.
    • Add batch processing functionality to API and CLI to more efficiently process multiple inputs.
    • Update documentation with new functionality.
    • Address build issues with updated dependencies.
  • v0.3.0rc0(Jan 23, 2020)

    Release candidate 0 of version 0.3.0.

    • Rename audio related embedding functions to indicate that they are specific to audio.
    • Add image embedding functionality to API and CLI.
    • Add video processing functionality to API and CLI.
    • Add batch processing functionality to API and CLI to more efficiently process multiple inputs.
    • Update documentation with new functionality.
    • Address build issues with updated dependencies.
  • v0.2.0(Apr 18, 2019)

    Release version 0.2.0 of OpenL3.

    • Update embedding models with ones that have been trained with the kapre bug fixed.
    • Allow loaded models to be passed in and used in process_file and get_embedding.
    • Rename get_embedding_model to load_embedding_model.
  • v0.2.0rc0(Apr 13, 2019)

    Release candidate 0 of version 0.2.0

    • Update embedding models with ones that have been trained with the kapre bug fixed.
    • Allow loaded models to be passed in and used in process_file and get_embedding.
    • Rename get_embedding_model to load_embedding_model.
  • v0.1.1(Mar 7, 2019)

    Release of v0.1.1 of OpenL3.

    Update kapre to fix issue with dynamic range normalization for decibel computation when computing spectrograms.

  • v0.1.1rc1(Mar 6, 2019)

  • v0.1.1rc0(Feb 21, 2019)

    Release candidate 0 of version 0.1.1

    Update kapre to fix issue with dynamic range normalization for decibel computation when computing spectrograms.

  • v0.1.0(Nov 22, 2018)

  • v0.1.0rc6(Nov 20, 2018)

  • v0.1.0rc5(Nov 20, 2018)

  • v0.1.0rc4(Nov 20, 2018)

    Release candidate 4 of version 0.1.0

    This release also updates the PyPI keywords, and moves the model files directly into the module directory (instead of creating a subdirectory) to make the pip installation process easier when installing with PyPI.

  • v0.1.0rc3(Nov 20, 2018)

  • v0.1.0rc2(Nov 20, 2018)

  • v0.1.0rc1(Nov 20, 2018)

  • v0.1.0rc0(Nov 20, 2018)

Owner
Music and Audio Research Laboratory - NYU