OpenL3: Open-source deep audio and image embeddings

Overview

OpenL3 is an open-source Python library for computing deep audio and image embeddings.

Please refer to the documentation for detailed instructions and examples.

UPDATE: OpenL3 now has TensorFlow 2 support!

The audio and image embedding models provided here are published as part of [1], and are based on the Look, Listen and Learn approach [2]. For details about the embedding models and how they were trained, please see:

Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.

Installing OpenL3

Dependencies

libsndfile

OpenL3 depends on the pysoundfile module to load audio files, which depends on the non-Python library libsndfile. On Windows and macOS, these will be installed via pip and you can therefore skip this step. However, on Linux this must be installed manually via your platform's package manager. For Debian-based distributions (such as Ubuntu), this can be done by simply running

apt-get install libsndfile1

Alternatively, if you are using conda, you can install libsndfile simply by running

conda install -c conda-forge libsndfile

For more detailed information, please consult the pysoundfile installation documentation.

TensorFlow

Starting with openl3>=0.4.0, OpenL3 has been upgraded to use TensorFlow 2. Because TensorFlow 2 includes GPU support in the standard package, tensorflow>=2.0.0 is included as a dependency and no longer needs to be installed separately.
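
If you want to verify that the bundled TensorFlow can see your GPU, a quick check like the following works (a minimal sketch; tf.config.list_physical_devices is available in tensorflow>=2.1):

    import tensorflow as tf

    print(tf.__version__)                          # should be >= 2.0.0
    print(tf.config.list_physical_devices('GPU'))  # non-empty list if a GPU is visible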

If you are interested in using TensorFlow 1.x, please install using pip install 'openl3<=0.3.1'.

TensorFlow 1.x & OpenL3 <= v0.3.1

Because TensorFlow 1.x comes in CPU-only and GPU variants, we leave it up to the user to install the version that best fits their use case.

On most platforms, either of the following commands should properly install TensorFlow:

pip install "tensorflow<1.14" # CPU-only version
pip install "tensorflow-gpu<1.14" # GPU version

For more detailed information, please consult the TensorFlow installation documentation.

Installing OpenL3

The simplest way to install OpenL3 is by using pip, which will also install the additional required dependencies if needed. To install OpenL3 using pip, simply run

pip install openl3

To install the latest version of OpenL3 from source:

  1. Clone or pull the latest version, only retrieving the main branch to avoid downloading the branch where we store the model weight files (these will be properly downloaded during installation).

     git clone git@github.com:marl/openl3.git --branch main --single-branch
    
  2. Install using pip to handle Python dependencies. The installation also downloads model files, which requires a stable network connection.

     cd openl3
     pip install -e .
    

Using OpenL3

To help you get started with OpenL3, please see the tutorial.
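
As a quick preview of the core audio API, the following minimal sketch computes frame-level embeddings for a single file ('audio.wav' is a placeholder path; the keyword values shown are among the documented choices):

    import soundfile as sf
    import openl3

    # Load audio as a NumPy array along with its sample rate
    audio, sr = sf.read('audio.wav')

    # Compute one embedding per analysis window, plus its timestamps
    emb, ts = openl3.get_audio_embedding(audio, sr,
                                         content_type='music',
                                         input_repr='mel256',
                                         embedding_size=512)
    print(emb.shape)  # (n_frames, 512)
    print(ts.shape)   # (n_frames,)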

Acknowledging OpenL3

Please cite the following papers when using OpenL3 in your work:

[1] Look, Listen and Learn More: Design Choices for Deep Audio Embeddings
Jason Cramer, Ho-Hsiang Wu, Justin Salamon, and Juan Pablo Bello.
IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 3852–3856, Brighton, UK, May 2019.

[2] Look, Listen and Learn
Relja Arandjelović and Andrew Zisserman
IEEE International Conference on Computer Vision (ICCV), Venice, Italy, Oct. 2017.

Model Weights License

The model weights are made available under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Comments
  • Implement image embedding API

    Add the image embedding API to the library. This should be fairly similar to the existing audio API. I'll add a candidate interface once I've given it more thought.
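
    For reference, a sketch of the image API as it eventually shipped (see the v0.3.0 release notes below); the parameter names mirror the audio API, and the exact values shown are assumptions:

      import numpy as np
      import openl3

      # A placeholder RGB frame; real inputs come from image or video files
      image = np.random.rand(224, 224, 3)

      emb = openl3.get_image_embedding(image, content_type='music',
                                       input_repr='mel256', embedding_size=512)
      print(emb.shape)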

    enhancement 
    opened by auroracramer 26
  • Refactor code and models to support TF 2.x and tf.keras

    At some point in the somewhat near future, we should establish support for TF 2.x and tf.keras. The main reasons for this are:

    • To remain compatible with new releases of TF and Keras (the official version of which is now tf.keras) and make use of bug fixes, etc. As we have found (#42, #43), installing newer versions of either package breaks installation and usage.
    • To address multiple vulnerabilities contained in tensorflow < 1.15.2.
    • To simplify the installation process; since TF 2.x now includes support for both CPU and GPU, we can now directly include tensorflow in the project dependencies, (as brought up in #39).

    A priori, it seems like the main things to do are:

    • Updating the dependencies in setup.py to include tensorflow
    • Modifying the model definitions to be tf.keras compatible
    • Porting the model files to a format that can be loaded by tf.keras with TF 2.x

    The main concern that comes to mind is the regression tests. We have already seen that tensorflow > 1.13 causes regression tests to fail. I imagine that this will only worsen as we introduce not only a new major release of TF, but also a divergence between Keras and tf.keras. @justinsalamon, what are your thoughts?

    opened by auroracramer 16
  • Add batch processing mode

    Something else to consider is a batch processing mode, i.e. making more efficient use of the GPU by predicting multiple files at once.

    Probably the least messy option would be to separate some of the interior code of get_audio_embedding into its own functions and add a get_audio_embedding_batch function that calls most of the same functions. We would also have a process_audio_file_batch function.

    I thought about changing get_audio_embedding so that it can take either a single audio array or a list of audio arrays (with a list of corresponding sample rates). While this might consolidate multiple use cases into one function, it'd probably get pretty messy, so it's probably best we don't do this (see the sketch below).

    Regarding the visual frame embedding extraction, we could ask the same question, though there might be more nuance depending on whether we allow individual images to be processed (I think we should). In the case of videos, multiple frames are already provided at once. So it raises the question (to me at least) of whether get_vframe_embedding (as I'm currently calling it) should support both a single frame and multiple frames. This also raises the question of whether we allow frames of multiple sizes.
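
    For reference, the batch interface that eventually shipped for audio (per the "Add batch processing functionality" PR and the v0.3.0 release notes below) does accept lists; a sketch with placeholder file names:

      import soundfile as sf
      import openl3

      audio1, sr1 = sf.read('file1.wav')  # placeholder paths
      audio2, sr2 = sf.read('file2.wav')

      # Lists of audio arrays and sample rates are batched through the network
      emb_list, ts_list = openl3.get_audio_embedding([audio1, audio2], [sr1, sr2],
                                                     batch_size=32)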

    Thoughts?

    opened by auroracramer 10
  • tensorflow 2.1 doesn't require separate pip installs for GPU and CPU

    Thanks for this great package! We love to use it!

    You state

    Because Tensorflow comes in CPU-only and GPU variants, we leave it up to the user to install the version that best fits their usecase.

    This is no longer the case as of 2.1, so you could (if 2.1 is supported) make tensorflow part of the standard requirements.

    opened by faroit 8
  • skimage submodules not imported correctly, regression tests fail

    skimage uses lazy imports, so we need to import each submodule explicitly (e.g. import skimage.transform; skimage.transform.rescale(X, s) instead of import skimage; skimage.transform.rescale(X, s)).
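
    Concretely, a minimal sketch of the fix:

      import numpy as np
      import skimage.transform  # the submodule must be imported explicitly

      X = np.random.rand(64, 64)
      Y = skimage.transform.rescale(X, 0.5)  # works once the submodule is bound
      print(Y.shape)  # (32, 32)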

    opened by auroracramer 7
  • Output file format and naming convention

    I have some questions about how to deal with embedding outputs:

    • Should we include the timestamps? If so do we save it in the same file?
    • What format should we use?
      • h5: Nice compression options, but since these typically shouldn't be large, it might be more annoying to deal with than other options
      • npy/npz: Standard approach, can easily load numpy arrays directly (sketched below)
      • JAMS: Using JAMS would help expand its use and would give a natural way to associate the timestamps with each embedding, but storing all of the values as text might be cumbersome and make the files big, especially for long recordings
    • Should we use the embedding type to name the embedding? e.g. example_audio_openl3_6144emb_linear_music.<ext> Or should we just keep it simple?
      • It might be good if the user is comparing different embeddings, but it might be cumbersome if people just want to use a single type of embeddings. Of course we could add an option for this, but adding another option for something like this might be excessive.
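
    For reference, the npz option can carry both arrays in one file, roughly as the released CLI does; the key names here are assumptions:

      import numpy as np

      emb = np.zeros((10, 512), dtype=np.float32)  # placeholder embeddings
      ts = np.arange(10) * 0.1                     # placeholder timestamps

      np.savez('example_audio.npz', embedding=emb, timestamps=ts)

      data = np.load('example_audio.npz')
      print(data['embedding'].shape, data['timestamps'].shape)
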
    opened by auroracramer 6
  • Fix API documentation and build

    Fix image embedding size in load_image_embedding_model() docstring, mock missing tensorflow.keras modules in doc/conf.py to fix API documentation build, and remove pin on sphinx version. Addresses #60 and #71.

    opened by auroracramer 4
  • Openl3 0.4.0 - Support for Tensorflow 2

    Figured I would push this out while I'm waiting for something else to build.

    Related PR containing updated models: https://github.com/marl/openl3/pull/61

    Setup Changes

    • Openl3 now requires tensorflow>=2.0.0 and installs it by default (there is no longer a separate GPU package)
    • Now requires kapre>=0.3.5 - TODO: make sure we have the exact minimum kapre version - I remember checking git blame, but haven't tested anything
    • keras as a standalone package was removed from dependencies (we're using tf.keras)
    • travis.yml: removed python 2.7 & 3.5 and added 3.7 & 3.8 since tensorflow only supports 3.6-3.8
      • needed to install Cython first for python 3.8 in order to install skimage (RuntimeError: Cython >= 0.23.4 is required to build scikit-image from git checkout)

    Doc Changes

    • Changed tensorflow dependency message to reflect updates
    • Added "Choosing an Audio Frontend (CPU / GPU)" section to tutorial.rst

    Code Changes

    • core.py
      • added params: get_audio_embedding(frontend='auto'), process_audio_file(frontend='auto'), process_video_file(audio_frontend='auto')
      • Added function preprocess_audio(y, sr, input_repr=None) that encapsulates the librosa frontend (as well as preprocessing for the kapre frontend)
        • for librosa, you pass the input_repr; for kapre inputs, you leave input_repr=None (see the sketch after this list)
    • cli.py - added cli flag (--audio-frontend)
    • models.py
      • added param load_audio_embedding_model(frontend='kapre')
      • using new kapre composite layer helpers get_stft_magnitude_layer
      • disabled latest mag2db code and patched in the legacy version (kapre_v0_1_4_magnitude_to_decibel)
      • kapre is now technically an optional dependency (will only try to import if we try to load a model with kapre frontend)
        • we still install it with setup.py, but if someone wanted to, they could install everything manually without kapre and openl3 should still work for the librosa frontend
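
    A usage sketch of the new frontend parameter ('audio.wav' is a placeholder path, and the keyword values are assumptions based on the changes above):

      import soundfile as sf
      import openl3

      audio, sr = sf.read('audio.wav')

      # Load a model that expects librosa-computed input features
      model = openl3.models.load_audio_embedding_model(
          input_repr='mel256', content_type='music',
          embedding_size=512, frontend='librosa')

      # get_audio_embedding runs the librosa preprocessing internally
      emb, ts = openl3.get_audio_embedding(audio, sr, model=model,
                                           frontend='librosa')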

    Test Changes

    • we now have separate regression data for kapre/librosa
    • added tests for frontend model following the existing model tests
    • converted some tests to use pytest.mark.parametrize to avoid doubling the length of the tests for testing frontends

    Dev Util Changes

    • added tests/generate_regression.py which generates new regression data
    • added tests/package_weights.py which takes the weights files in the openl3 package folder and gzips them for git push
    • added tests/migration/remove_layers.py which lets us strip out the spectrogram (or any other) layers
    • tests/migration/ has a few other analysis things/notebooks that were used early on in the frontend testing

    Before merging:

    • double check dependency versions
    • are the pinned versions still valid? might need some help with this one
    • Change models download url in setup.py to main repo (currently it's pointing at my fork so I could test with travis)
    • should we integrate changes from https://github.com/marl/openl3/pull/55?
    • should we run the classifier comparison one more time right before merging as a safety check? idk
    opened by beasteers 4
  • Add batch processing functionality

    Adds batch processing functionality to all embedding computation functions and file processing functions, allowing for one or more inputs to be processed. When possible, multiple inputs are put in the same input batch to the network for inference.

    opened by auroracramer 4
  • Add image embedding API

    Adds image embedding API, including functions for processing both images and videos in addition to audio files. Additionally changes the CLI to account for different modalities of inputs (i.e. audio, image, or video).

    opened by auroracramer 4
  • API reference in documentation missing

    When going to https://openl3.readthedocs.io/en/latest/api.html I only see the headers

    Core functionality
    Models functionality
    

    with nothing under each header. Expected would be a list of classes and functions and the associated documentation. At least those APIs that are mentioned in the tutorial.

    opened by jonnor 4
  • Clarification on input representation

    I was just reading through the source code in openl3 > core.py and noticed something in the _librosa_linear_frontend and _librosa_mel_frontend functions. It seems librosa.power_to_db() is being used on a magnitude spectrum, not a power spectrum? Should it instead be using librosa.amplitude_to_db()?
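
    For context, librosa implements amplitude_to_db(S) as power_to_db(S**2), so applying power_to_db directly to a magnitude spectrogram halves the dB values; a small sketch:

      import numpy as np
      import librosa

      S = np.random.rand(5, 5) + 1e-3          # a stand-in magnitude spectrogram

      a = librosa.amplitude_to_db(S, ref=1.0)  # 20 * log10(S)
      b = librosa.power_to_db(S**2, ref=1.0)   # identical for magnitudes
      c = librosa.power_to_db(S, ref=1.0)      # 10 * log10(S)

      print(np.allclose(a, b))      # True
      print(np.allclose(a, 2 * c))  # True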

    opened by alisonbma 0
  • Example of fine-tuning the audio sub-network.

    I want to fine-tune the audio sub-network to fit my audio classification problem. To this aim, I plan to use the _construct_linear_audio_network, _construct_mel128_audio_network, and _construct_mel256_audio_network functions to load the pre-trained Keras model and then append one or more fully-connected layers to perform the classification.

    However, I don't understand the input shape of these models. According to models.py, the input shape is input_shape = (1, asr * audio_window_dur), where asr = 48000 and audio_window_dur = 1; what is asr, and why does it have that value? Can you please provide an example of using the Keras model starting from a .wav file?

    I really appreciate any help you can provide.
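
    For what it's worth: asr is the audio sample rate (48 kHz) and audio_window_dur is the 1-second analysis window, so each input is a (1, 48000) array (one channel by 48000 samples). A hedged sketch of appending a classifier head via the public loader (n_classes and all data here are placeholders):

      import numpy as np
      import tensorflow as tf
      import openl3

      # Public loader (loads pre-trained weights for you)
      base = openl3.models.load_audio_embedding_model(
          input_repr='mel256', content_type='music', embedding_size=512)
      base.trainable = False  # freeze; set True to fine-tune end-to-end

      n_classes = 10
      clf = tf.keras.Sequential([
          base,
          tf.keras.layers.Dense(128, activation='relu'),
          tf.keras.layers.Dense(n_classes, activation='softmax'),
      ])
      clf.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

      # Each example is one second of 48 kHz mono audio, shaped (1, 48000)
      x = np.random.rand(4, 1, 48000).astype('float32')
      y = np.random.randint(0, n_classes, size=4)
      clf.fit(x, y, epochs=1)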

    opened by mattiacampana 0
  • Extract activation from lower audio layers

    Hi, I was wondering how I can extract activations from the lower audio layers. I guess "embeddings" is the same as "MaxPool_3"? And if that's correct, do "MaxPool", "MaxPool_1", and "MaxPool_2" correspond to the first, second, and third max-pooling layers in the audio ConvNet, as described in Arandjelović and Zisserman 2018 (https://arxiv.org/abs/1712.06651)?
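
    One way to probe intermediate layers with tf.keras (a sketch; 'max_pooling2d_2' is a guessed layer name, so check model.summary() for the real ones):

      import numpy as np
      import tensorflow as tf
      import openl3

      model = openl3.models.load_audio_embedding_model(
          input_repr='mel256', content_type='music', embedding_size=6144)
      model.summary()  # inspect the actual layer names first

      layer = model.get_layer('max_pooling2d_2')  # assumed name; substitute yours
      probe = tf.keras.Model(inputs=model.input, outputs=layer.output)

      acts = probe.predict(np.random.rand(1, 1, 48000).astype('float32'))
      print(acts.shape)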

    opened by seunggookim 1
  • M1 macOS installation problem

    Hi, I am using an M1 MacBook, and when I try to install openl3, the installation fails while pip builds h5py from source, even though h5py is already installed in my virtual environment. The build output ends with errors like:

      building 'h5py.defs' extension
      ...
      h5py/defs.c:16556:56: error: too few arguments to function call, expected 3, have 2
          __pyx_t_1 = H5Oget_info(__pyx_v_loc_id, __pyx_v_oinfo); ...
      H5Opublic.h:497:15: note: 'H5Oget_info3' declared here
      [similar "too few arguments" errors follow for H5Oget_info_by_name,
       H5Oget_info_by_idx, H5Ovisit, H5Ovisit_by_name, and H5Sencode]
      2 warnings and 6 errors generated.
      error: command '/usr/bin/clang' failed with exit code 1

    note: This error originates from a subprocess, and is likely not a problem with pip.

    pip then rolls back the h5py uninstall and reports:

      error: legacy-install-failure
      × Encountered error while trying to install package. ╰─> h5py
      note: This is an issue with the package mentioned above, not pip. hint: See above for output from the failure.

    My python version: 3.8.13

    By the way, I have tried building from source, but this h5py problem still exists.

    opened by yy945635407 3
Releases (v0.4.1)
  • v0.4.1(Aug 6, 2021)

    Release version 0.4.1 of OpenL3.

    • Add librosa as an explicit dependency
    • Remove upper limit pinning for scikit-image dependency
    • Fix version number typo in README
    • Update TensorFlow information in README
  • v0.4.0(Aug 6, 2021)

    Release version 0.4.0 of OpenL3.

    • Upgraded to tensorflow>=2.0.0. TensorFlow is now included as a dependency since the standard package supports both CPU and GPU.
    • Upgraded to kapre>=0.3.5. Reverted magnitude scaling method to match kapre<=0.1.4 as that's what the model was trained on.
    • Removed Python 2 and 3.5 support, as they are not supported by TensorFlow 2 (and added 3.7 & 3.8)
    • Add librosa frontend, and allow frontend to be configurable between kapre and librosa
      • Added frontend='kapre' parameter to get_audio_embedding, process_audio_file, and load_audio_embedding_model
      • Added audio_frontend='kapre' parameter to process_video_file and the CLI
      • Added frontend='librosa' flag to load_audio_embedding_model for use with a librosa or other external frontend
      • Added a openl3.preprocess_audio function that computes the input features needed for each frontend
    • Model .h5 files no longer have Kapre layers in them and are all loadable with tf.keras
    • Made the skimage and moviepy.video.io.VideoFileClip imports lazy
    • Added new regression data for both Kapre 0.3.5 and Librosa
    • Parameterized some of the tests to reduce duplication
    • Added developer helpers for regression data, weight packaging, and .h5 file manipulation
  • v0.4.0rc2(May 30, 2021)

  • v0.4.0rc1(May 30, 2021)

  • v0.4.0rc0(May 30, 2021)

  • v0.3.1(Feb 28, 2020)

    Release version 0.3.1 of OpenL3.

    • Require keras>=2.0.9,<2.3.0 in dependencies to avoid forced installation of TF 2.x during pip installation.
    • Update README and installation docs to explicitly state that we do not yet support TF 2.x and to offer a working dependency combination.
    • Require kapre==0.1.4 in dependencies to avoid installing tensorflow>=1.14, which breaks regression tests.
  • v0.3.1rc0(Feb 28, 2020)

    Release candidate 0 of version 0.3.1.

    • Require keras>=2.0.9,<2.3.0 in dependencies to avoid forced installation of TF 2.x during pip installation.
    • Update README and installation docs to explicitly state that we do not yet support TF 2.x and to offer a working dependency combination.
    • Require kapre==0.1.4 in dependencies to avoid installing tensorflow>=1.14, which breaks regression tests.
  • v0.3.0(Jan 23, 2020)

    Release version 0.3.0 of OpenL3.

    • Rename audio related embedding functions to indicate that they are specific to audio.
    • Add image embedding functionality to API and CLI.
    • Add video processing functionality to API and CLI.
    • Add batch processing functionality to API and CLI to more efficiently process multiple inputs.
    • Update documentation with new functionality.
    • Address build issues with updated dependencies.
  • v0.3.0rc0(Jan 23, 2020)

    Release candidate 0 of version 0.3.0.

    • Rename audio related embedding functions to indicate that they are specific to audio.
    • Add image embedding functionality to API and CLI.
    • Add video processing functionality to API and CLI.
    • Add batch processing functionality to API and CLI to more efficiently process multiple inputs.
    • Update documentation with new functionality.
    • Address build issues with updated dependencies.
  • v0.2.0(Apr 18, 2019)

    Release version 0.2.0 of OpenL3.

    • Update embedding models with ones that have been trained with the kapre bug fixed.
    • Allow loaded models to be passed in and used in process_file and get_embedding.
    • Rename get_embedding_model to load_embedding_model.
  • v0.2.0rc0(Apr 13, 2019)

    Release candidate 0 of version 0.2.0

    • Update embedding models with ones that have been trained with the kapre bug fixed.
    • Allow loaded models to be passed in and used in process_file and get_embedding.
    • Rename get_embedding_model to load_embedding_model.
  • v0.1.1(Mar 7, 2019)

    Release of v0.1.1 of OpenL3.

    Update kapre to fix issue with dynamic range normalization for decibel computation when computing spectrograms.

  • v0.1.1rc1(Mar 6, 2019)

  • v0.1.1rc0(Feb 21, 2019)

    Release candidate 0 of version 0.1.1

    Update kapre to fix issue with dynamic range normalization for decibel computation when computing spectrograms.

  • v0.1.0(Nov 22, 2018)

  • v0.1.0rc6(Nov 20, 2018)

  • v0.1.0rc5(Nov 20, 2018)

  • v0.1.0rc4(Nov 20, 2018)

    Release candidate 4 of version 0.1.0

    This release also updates the PyPI keywords, and moves the model files directly into the module directory (instead of creating a subdirectory) to make the pip installation process easier when installing with PyPI.

  • v0.1.0rc3(Nov 20, 2018)

  • v0.1.0rc2(Nov 20, 2018)

  • v0.1.0rc1(Nov 20, 2018)

  • v0.1.0rc0(Nov 20, 2018)

Owner
Music and Audio Research Laboratory - NYU