Python I/O for STEM audio files

Overview

stempeg = stems + ffmpeg


Python package to read and write STEM audio files. Technically, stems are audio containers that combine multiple audio streams and metadata in a single audio file. This makes them ideal for playing back multitrack audio, where users can select the audio sub-stream during playback (supported, e.g., by VLC).

Under the hood, stempeg uses ffmpeg for reading and writing multistream audio. Optionally, MP4Box is used to create STEM files that are compatible with Native Instruments hardware and software.

Features

  • robust and fast interface to ffmpeg for reading and writing any supported format from/to numpy.
  • reading supports seeking and duration.
  • control over container, codec, and bitrate when compressed audio is written.
  • store multi-track audio within audio formats by aggregating streams into channels (concatenation of pairs of stereo channels).
  • support for internal ffmpeg resampling during read and write.
  • create mp4 stems compatible with Native Instruments Traktor.
  • uses multiprocessing to speed up reading substreams and writing multiple files.

Installation

1. Installation of ffmpeg Library

stempeg relies on ffmpeg (>= 3.2 is suggested).

The installation of ffmpeg differs among operating systems. If you use Anaconda, you can install ffmpeg on Windows/Mac/Linux using the following command:

conda install -c conda-forge ffmpeg

Note that for better encoding quality, it is recommended to install ffmpeg with libfdk-aac codec support as follows:

  • MacOS: use homebrew: brew install ffmpeg --with-fdk-aac
  • Ubuntu/Debian Linux: See installation script here.
  • Docker: docker pull jrottenberg/ffmpeg

1a. (optional) Installation of MP4Box

If you plan to write stem files with full compatibility with Native Instruments Traktor DJ hardware and software, you need to install MP4Box.

  • MacOS: use homebrew: brew install gpac
  • Ubuntu/Debian Linux: apt-get install gpac

Further installation instructions for all operating systems can be found here.

2. Installation of the stempeg package

A) Installation via PyPI using pip

pip install stempeg

B) Installation via conda

conda install -c conda-forge stempeg

Usage

[Figure: stempeg_scheme (flow chart)]

Reading audio

Stempeg can read multi-stream and single-stream audio files, so it can replace your normal audio loaders for 1d or 2d (mono/stereo) arrays.

By default, read_stems assumes that multiple substreams can exist (default reader=stempeg.StreamsReader()). To support multiple stems even when the audio container doesn't support multiple streams (e.g. WAV), streams can be mapped to multiple pairs of channels. In that case, reader=stempeg.ChannelsReader() can be passed. Also see: stempeg.ChannelsWriter.

import stempeg
S, rate = stempeg.read_stems(stempeg.example_stem_path())

S is a numpy tensor that contains the time-domain signals scaled to [-1..1]. The shape is (stems, samples, channels). Detailed documentation of read_stems can be viewed here. Note that a small stems excerpt from The Easton Ellises, licensed under Creative Commons CC BY-NC-SA 3.0, is included and can be accessed using stempeg.example_stem_path().
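The mapping between streams and pairs of channels can be pictured as a plain array reshape. The following is a minimal NumPy sketch, not stempeg's actual implementation: the helper names stems_to_channels and channels_to_stems are hypothetical, and it assumes that every stem has the same number of channels.

```python
import numpy as np


def stems_to_channels(stems):
    """Pack (stems, samples, channels) into (samples, stems * channels)."""
    n_stems, n_samples, n_channels = stems.shape
    # Move the sample axis first, then merge the stem and channel axes so
    # that each stem occupies a consecutive group of channels.
    return stems.transpose(1, 0, 2).reshape(n_samples, n_stems * n_channels)


def channels_to_stems(audio, nb_channels=2):
    """Unpack (samples, stems * nb_channels) into (stems, samples, nb_channels)."""
    n_samples, total = audio.shape
    if total % nb_channels != 0:
        raise ValueError("channel count is not a multiple of nb_channels")
    grouped = audio.reshape(n_samples, total // nb_channels, nb_channels)
    return grouped.transpose(1, 0, 2)


# Five stereo stems with 1000 samples each, random values in [-1, 1).
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(5, 1000, 2))

packed = stems_to_channels(S)                        # shape (1000, 10)
restored = channels_to_stems(packed, nb_channels=2)  # shape (5, 1000, 2)
```

In this layout, channels 2*i and 2*i + 1 of the packed array hold the left and right channel of stem i, which is why a container without multi-stream support can still carry all stems.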

Reading individual streams

Individual substreams of the stem file can be read by passing the corresponding stem id (starting from 0):

S, rate = stempeg.read_stems(stempeg.example_stem_path(), stem_id=[0, 1])

Read excerpts (set seek position)

Excerpts from the stem instead of the full file can be read by providing start (start) and duration (duration) in seconds to read_stems:

S, _ = stempeg.read_stems(stempeg.example_stem_path(), start=1, duration=1.5)
# read from second 1.0 to second 2.5
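
In terms of samples, start and duration correspond to a slice along the time axis. The sketch below shows that arithmetic on an array that is already in memory. The excerpt helper is hypothetical and not part of stempeg, and rounding to the nearest sample is an assumption; ffmpeg's own seeking may differ by a few samples depending on the codec.

```python
import numpy as np


def excerpt(stems, rate, start=0.0, duration=None):
    """Slice a (stems, samples, channels) array by time given in seconds."""
    first = int(round(start * rate))
    if duration is None:
        # No duration given: keep everything from the start position on.
        return stems[:, first:, :]
    last = first + int(round(duration * rate))
    return stems[:, first:last, :]


rate = 44100
S = np.zeros((5, 3 * rate, 2))  # three seconds of silence, five stereo stems
part = excerpt(S, rate, start=1, duration=1.5)  # second 1.0 to second 2.5
```

Passing start and duration to read_stems avoids decoding the rest of the file in the first place, which slicing after a full read does not.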

Writing audio

As seen in the flow chart above, stempeg supports multiple ways to write multi-track audio.

Write multi-channel audio

stempeg.write_audio can be used for single-stream, multi-channel audio files. Stempeg wraps a number of ffmpeg parameters to resample the output sample rate and adjust the audio codec, if necessary.

stempeg.write_audio(path="out.mp4", data=S, sample_rate=44100.0, output_sample_rate=48000.0, codec='aac', bitrate=256000)

Writing multi-stream audio

Writing stem files from a numpy tensor can be done with:

stempeg.write_stems(path="output.stem.mp4", data=S, sample_rate=44100, writer=stempeg.StreamsWriter())

As seen in the flow chart above, stempeg supports multiple ways to write multi-stream audio. Each method takes a different set of parameters. To select a method, pass one of the following writers:

  • stempeg.FilesWriter: Stems will be saved into multiple files. For the naming, basename(path) is ignored and only the parent of path and its extension are used.
  • stempeg.ChannelsWriter: Stems will be saved as multiple channels.
  • stempeg.StreamsWriter (default): Stems will be saved into a single multi-stream file.
  • stempeg.NIStemsWriter: Stems will be saved into a single multi-stream file. Additionally, Native Instruments Stems compatible metadata is added. This requires the installation of MP4Box.
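
The naming rule described for stempeg.FilesWriter can be illustrated with pathlib. This is only a sketch of the rule as stated above (parent directory plus extension, basename ignored) and not stempeg's own code: the helper stem_file_paths is hypothetical, and it falls back to the stem index when no names are given.

```python
from pathlib import PurePosixPath


def stem_file_paths(path, n_stems, stem_names=None):
    """Return one output path per stem, ignoring the basename of `path`."""
    p = PurePosixPath(path)
    if stem_names is None:
        # Without names, number the files by stem index.
        stem_names = [str(i) for i in range(n_stems)]
    if len(stem_names) != n_stems:
        raise ValueError("need exactly one name per stem")
    # Only the parent directory and the extension of `path` are reused.
    return [str(p.parent / (name + p.suffix)) for name in stem_names]


paths = stem_file_paths("output/ignored.mp3", 2)
named = stem_file_paths("output/ignored.mp3", 2, stem_names=["drums", "bass"])
```

Here paths contains output/0.mp3 and output/1.mp3, while named contains output/drums.mp3 and output/bass.mp3.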

⚠️ Warning: Muxing stems using ffmpeg leads to multi-stream files that are not compatible with Native Instruments hardware or software. Please use MP4Box if you use the stempeg.NIStemsWriter().

For more information on writing stems, see stempeg.write_stems. For an example that documents the advanced features of the writer, see readwrite.py.

Use the command line tools

stempeg provides a convenient CLI tool to convert a stem to multiple WAV files. The -s switch sets the start, the -t switch sets the duration. File names that contain spaces need to be quoted.

stem2wav "The Easton Ellises - Falcon 69.stem.mp4" -s 1.0 -t 2.5

F.A.Q

How can I improve the reading performance?

If read_stems is called repeatedly, each call makes two system calls: one to get the file info and one for the actual reading. To speed this up, you can pass the Info object to read_stems, provided that the number of streams, the number of channels, and the sample rate are identical.

file_path = stempeg.example_stem_path()
info = stempeg.Info(file_path)
S, _ = stempeg.read_stems(file_path, info=info)

How can the quality of the encoded stems be increased?

For encoding, it is recommended to use the Fraunhofer AAC encoder (libfdk_aac), which is not included in the default ffmpeg builds. Note that the conda version currently does not include fdk-aac. If libfdk_aac is not installed, stempeg will use the default aac codec, which will result in slightly inferior audio quality.
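
Whether a given ffmpeg build ships libfdk_aac can be checked from the output of ffmpeg -encoders. The following is a minimal sketch that assumes ffmpeg is on PATH. The helper names are hypothetical, the sample listing is abbreviated and purely illustrative, and this is not necessarily how stempeg itself performs the check.

```python
import subprocess


def parse_encoders(listing):
    """Extract encoder names from the text printed by `ffmpeg -encoders`."""
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        # Encoder rows start with a six-character flag field (V, A or S
        # first), followed by the encoder name. Legend rows such as
        # "V..... = Video" are skipped via the "=" check.
        if (len(parts) >= 2 and len(parts[0]) == 6
                and parts[0][0] in "VAS" and parts[1] != "="):
            names.add(parts[1])
    return names


def pick_aac_encoder(encoders):
    """Prefer libfdk_aac and fall back to ffmpeg's native aac encoder."""
    return "libfdk_aac" if "libfdk_aac" in encoders else "aac"


def installed_encoders():
    """Query the local ffmpeg binary (requires ffmpeg on PATH)."""
    out = subprocess.run(
        ["ffmpeg", "-hide_banner", "-encoders"],
        capture_output=True, text=True, check=True,
    )
    return parse_encoders(out.stdout)


# Abbreviated, illustrative listing used instead of calling ffmpeg.
sample = """Encoders:
 V..... = Video
 A..... = Audio
 ------
 A....D aac                  AAC (Advanced Audio Coding)
 A....D libmp3lame           libmp3lame MP3 (MPEG audio layer 3)
"""
choice = pick_aac_encoder(parse_encoders(sample))
```

On a real system, pick_aac_encoder(installed_encoders()) returns the codec name that could then be passed as the codec argument of a writer.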

Comments
  • stempeg 2.0


    This addresses #27 and implements a new ffmpeg backend. I chose ffmpeg-python for reading and writing. Here the audio is piped directly to stdin instead of writing temporary files with pysoundfile and converting them in a separate process call.

    Part of the code was copied from Spleeter's audio backend. First benchmarks of the input piping indicate that this method is twice as fast as my previous "tmpfile based method".

    Saving stems still requires saving temporary files, since the complex filter cannot be carried out using python-ffmpeg. This enabled a new API. Here the idea was not to come up with presets and do all the checks to cover all use cases, but instead to let users do this themselves. This means more errors for users, but it's way easier to maintain. E.g. if a user wants to write multistream audio as .wav files, an error will be thrown, since this container does not support multiple streams. The user would instead have to use streams_as_multichannel.

    This PR furthermore introduces a significant number of new features:

    Audio Loading

    • Loading audio now uses the same API as Spleeter's audio loading backend
    • A target samplerate can be specified to resample audio on-the-fly and return the resampled audio
    • An option stems_from_multichannel was added to load stems that are aggregated into multichannel audio (concatenation of pairs of stereo channels), see more info on audio writing
    • substream titles can be read from the Info object.

    Audio Writing

    • stems can now be saved as substreams, aggregated into channels or saved as multiple files.
    • titles for each substream can now be embedded into metadata
    • in addition to write_stems (which is a preset to achieve compatibility with NI stems), there is also write_streams (supports writing as multichannel or multiple files). And in case stempeg is used for just stereo files, write_audio can be used (again, this is API-compatible with Spleeter).

    The procedure for writing stream files may be quite complex, as it varies depending on the specified output container format. Basically, there are two possible cases:

    1.) container supports multiple streams (mp4/m4a, opus, mka)
    2.) container does not support multiple streams (wav, mp3, flac)

    For 1.) we provide three options:

    1a.) streams will be saved as substreams when streams_as_multichannel=False (default)
    1b.) streams will be aggregated into channels and saved as a multichannel file. Here the audio tensor of shape=(streams, samples, 2) will be converted to single-stream multichannel audio of shape (samples, streams*2). This option is activated using streams_as_multichannel=True
    1c.) streams will be saved as multiple files when streams_as_files is active

    For 2.), when the container does not support multiple streams, there are two options:

    2a.) streams_as_multichannel has to be set to True (see 1b), otherwise an error will be raised. Note that this only works for wav and flac. The file ending of path determines the container (but not the codec!).
    2b.) streams_as_files can be set so that multiple files will be created

    Example / Use Cases

    """Opens a stem file and saves (re-encodes) back to a stem file
    """
    import argparse
    import stempeg
    
    
    if __name__ == '__main__':
        parser = argparse.ArgumentParser()
        parser.add_argument(
            'input',
        )
        args = parser.parse_args()
    
        # load stems
        stems, rate = stempeg.read_stems(args.input)
    
        # load stems,
        # resample to 96000 Hz,
        # use multiprocessing
        stems, rate = stempeg.read_stems(
            args.input,
            sample_rate=96000,
            multiprocess=True
        )
    
        # --> stems now has `shape=(stems, samples, channels)`
    
        # save stems from tensor as multi-stream mp4
        stempeg.write_stems(
            "test.stem.m4a",
            stems,
            sample_rate=96000
        )
    
        # save stems as dict for convenience
        stems = {
            "mix": stems[0],
            "drums": stems[1],
            "bass": stems[2],
            "other": stems[3],
            "vocals": stems[4],
        }
        # keys will be automatically used
    
        # write stems from the dict (keys are used as stem names)
        stempeg.write_stems(
            "test.stem.m4a",
            data=stems,
            sample_rate=96000
        )
    
        # `write_stems` is a preset for the following settings
        # here the output signal is resampled to 44100 Hz and AAC codec is used
        stempeg.write_stems(
            "test.stem.m4a",
            stems,
            sample_rate=96000,
            writer=stempeg.StreamsWriter(
                codec="aac",
                output_sample_rate=44100,
                bitrate="256000",
                stem_names=['mix', 'drums', 'bass', 'other', 'vocals']
            )
        )
    
        # Native Instruments compatible stems
        stempeg.write_stems(
            "test_traktor.stem.m4a",
            stems,
            sample_rate=96000,
            writer=stempeg.NIStemsWriter(
                stems_metadata=[
                    {"color": "#009E73", "name": "Drums"},
                    {"color": "#D55E00", "name": "Bass"},
                    {"color": "#CC79A7", "name": "Other"},
                    {"color": "#56B4E9", "name": "Vocals"}
                ]
            )
        )
    
        # let's write as multistream opus (supports only 48000 Hz)
        stempeg.write_stems(
            "test.stem.opus",
            stems,
            sample_rate=96000,
            writer=stempeg.StreamsWriter(
                output_sample_rate=48000,
                codec="opus"
            )
        )
    
        # writing to wav requires to convert streams to multichannel
        stempeg.write_stems(
            "test.wav",
            stems,
            sample_rate=96000,
            writer=stempeg.ChannelsWriter(
                output_sample_rate=48000
            )
        )
    
        # # stempeg also supports to load merged-multichannel streams using
        stems, rate = stempeg.read_stems(
            "test.wav",
            reader=stempeg.ChannelsReader(nb_channels=2)
        )
    
        # mp3 does not support multiple streams,
        # therefore we have to use `stempeg.FilesWriter`
        # outputs are named ["output/0.mp3", "output/1.mp3"]
        # for named files, provide a dict or use `stem_names`
        # also apply multiprocessing
        stempeg.write_stems(
            ("output", ".mp3"),
            stems,
            sample_rate=rate,
            writer=stempeg.FilesWriter(
                multiprocess=True,
                output_sample_rate=48000,
                stem_names=["mix", "drums", "bass", "other", "vocals"]
            )
        )
    
    enhancement 
    opened by faroit 28
  • Is this not working on windows?


    import glob, os
    import stempeg
    import os.path
    
    train_path = "path_to_train/"
    os.chdir(train_path)
    for file in glob.glob("*.stem.mp4"):
        file_path = train_path + file
        print(os.path.isfile(file_path))
        S, rate = stempeg.read_stems(file_path)
    
    

    Even though isfile returns True, read_stems throws 'FileNotFoundError: [WinError 2] '

    opened by westside 17
  •  Ffprobe command returns non-zero exit status 3221225478


    I am running it on anaconda. It seems to work perfectly on colab. However on anaconda it fails.

    The behavior is weird as well. I ran the command on bash and it runs correctly.

    I have a loop that runs through all the stem files, and it breaks after a random number of iterations with the error stated below. I believe this could be a multiprocessing issue. Could it be that the file is already being used by another process?

    File "", line 1, in runfile('C:/Users/w1572032/.spyder-py3/temp.py', wdir='C:/Users/w1572032/.spyder-py3')

    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile execfile(filename, namespace)

    File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile exec(compile(f.read(), filename, 'exec'), namespace)

    File "C:/Users/w1572032/.spyder-py3/temp.py", line 28, in t = np.copy(track.targets['vocals'].audio.T)

    File "C:\ProgramData\Anaconda3\lib\site-packages\musdb\audio_classes.py", line 113, in audio audio = source.audio

    File "C:\ProgramData\Anaconda3\lib\site-packages\musdb\audio_classes.py", line 47, in audio filename=self.path, stem_id=self.stem_id

    File "C:\ProgramData\Anaconda3\lib\site-packages\stempeg\read.py", line 90, in read_stems FFinfo = FFMPEGInfo(filename)

    File "C:\ProgramData\Anaconda3\lib\site-packages\stempeg\read.py", line 19, in init self.json_info = read_info(self.filename)

    File "C:\ProgramData\Anaconda3\lib\site-packages\stempeg\read.py", line 54, in read_info out = sp.check_output(cmd)

    File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 336, in check_output **kwargs).stdout

    File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 418, in run output=stdout, stderr=stderr)

    CalledProcessError: Command '['ffprobe', 'C:\Users\w1572032\Desktop\musdb18\train\BigTroubles - Phantom.stem.mp4', '-v', 'error', '-print_format', 'json', '-show_format', '-show_streams']' returned non-zero exit status 3221225478.

    opened by vinspatel 9
  • OSX quicklook support


    🥳 osx seems to support stem files and has a UI to select the stem right from the quicklook window:


    However, it seems that it uses some specific metadata to read the stem track name. Currently I don't know how to do that with ffmpeg, but it would be great to find out if there is a way to support this.

    enhancement help wanted 
    opened by faroit 8
  • Stems write - Format not recognised


    Hello,

    As you stated in the documentation the stems write doesn't always work well. I am using this command with ffmpeg to create a STEM file:

    ffmpeg -i ~/mix.wav -i ~/drums.wav -i ~/vocals.wav -map 0 -map 1 -map 2 -c:a libfdk_aac -metadata:s:0 title=mix -metadata:s:1 title=drums -metadata:s:2 title=vocals ~/output.stem.mp4
    

    I then tried to read it back using the musdb library and it works well. I was wondering if this could be included in your library to finally make it work properly.

    Unfortunately, I do not have much time to work more on this and open a pull request, but I made a simple implementation in case it could be of any help. Also check this homebrew-ffmpeg if the right codecs are not installed properly in the official ffmpeg distribution.

    opened by shoegazerstella 8
  • Freeze when loading mp4 multi-stem file


    I am using the musdb package and convert the mp4 files containing multiple audio sources to wave files, as shown here:

    https://github.com/f90/Wave-U-Net/blob/master/Datasets.py#L132

    But randomly during conversion (so with potentially any file), conversion just freezes forever. After interrupting the process I can read the following error:

    Traceback (most recent call last):

    File "/opt/local/pycharm/helpers/pydev/pydevd.py", line 1668, in main()

    File "/opt/local/pycharm/helpers/pydev/pydevd.py", line 1662, in main globals = debugger.run(setup['file'], None, None, is_module)

    File "/opt/local/pycharm/helpers/pydev/pydevd.py", line 1072, in run pydev_imports.execfile(file, globals, locals) # execute the script

    File "/mnt/daten/PycharmProjects/Wave-U-Net/Training.py", line 326, in @ex.automain

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/sacred/experiment.py", line 137, in automain self.run_commandline()

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/sacred/experiment.py", line 260, in run_commandline return self.run(cmd_name, config_updates, named_configs, {}, args)

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/sacred/experiment.py", line 209, in run run()

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/sacred/run.py", line 221, in call self.result = self.main_function(*args)

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/sacred/config/captured_function.py", line 46, in captured_function result = wrapped(*args, **kwargs)

    File "/mnt/daten/PycharmProjects/Wave-U-Net/Training.py", line 348, in dsd_100_experiment dsd_train, dsd_test = Datasets.getMUSDB(model_config["musdb_path"]) # List of (mix, acc, bass, drums, other, vocal) tuples

    File "/mnt/daten/PycharmProjects/Wave-U-Net/Datasets.py", line 149, in getMUSDB vocal_audio = track.targets["vocals"].audio

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/musdb/audio_classes.py", line 113, in audio audio = source.audio

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/musdb/audio_classes.py", line 47, in audio filename=self.path, stem_id=self.stem_id

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/stempeg/read.py", line 91, in read_stems FFinfo = FFMPEGInfo(filename)

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/stempeg/read.py", line 19, in init self.json_info = read_info(self.filename)

    File "/home/daniel/tf-env-waveunet/local/lib/python2.7/site-packages/stempeg/read.py", line 55, in read_info out = sp.check_output(cmd)

    File "/usr/lib/python2.7/subprocess.py", line 567, in check_output process = Popen(stdout=PIPE, *popenargs, **kwargs)

    File "/usr/lib/python2.7/subprocess.py", line 711, in init errread, errwrite)

    File "/usr/lib/python2.7/subprocess.py", line 1319, in _execute_child data = _eintr_retry_call(os.read, errpipe_read, 1048576)

    File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call return func(*args)

    KeyboardInterrupt

    Process finished with exit code 1

    It seems that the ffmpeg/ffprobe process that identifies the stems within the mp4 file never returns, or returns empty output, or something of that sort, so that the stempeg library waits forever for a response at sp.check_output. It doesn't look like there is a timeout for waiting for the ffmpeg output either. Plus, ffmpeg is called with -v error; maybe that is suppressing errors that we should react to?

    Any idea of how to fix this?

    opened by f90 7
  • fix dithering when exporting to float


    In the last version of stempeg the method to load PCM through python-ffmpeg resulted in small differences to what I obtained with the previous version of stempeg.

    This PR reverts the casting procedure for int16 to float32 + normalization [-1, 1]. This is important since dithering errors did significantly influence separation scores in regression tests.

    With this PR, ffmpeg pipes int16 into the numpy buffer, which is then converted and normalized to float in numpy. This gives the same results as I used to have before, when I used temporary wav files and converted to float32 using soundfile. Furthermore, when using int16 pipes, conversion is slightly faster.

    Also ping @romi1502 and @mmoussallam since this function was originally derived from Spleeter's code. You might want to change it there as well.

    opened by faroit 6
  • A loading error in Win System.


    My data has a name format like 'xxxx - xxxx.stem.m64', but stempeg cannot recognize the blank space before the "-", so it raises an error saying there is no such file.

    The dataset is actually the MUSDB18-7 set.

    I solved it with the following code, which deletes the blank space before the "-". But I hope there is a better way to solve it.

    index = track_name.index('-')
    track_name = track_name[:index-1] + track_name[index:]
    
    opened by igo312 6
  • Centralize detection of ffmpeg executables


    In some distributions (e.g. NixOS) ffmpeg is not necessarily in PATH. Unifying detection of the ffmpeg and ffprobe executables makes it easier for packagers to patch the package to accommodate such situations.

    opened by bgamari 5
  • Tests failing with wrong shapes


    Hi author(s),

    I'm trying to run the tests included in this package, but the assert statements on the shapes of the stems are failing. The tests expect a shape of (5, 265216, 2) but the file has a shape of (5, 267264, 2).

    Is this a bug or have the files been updated without updating the tests?

    Thanks!

    opened by jaidevd 5
  • allow ffmpeg format to be optional


    carefully reviews the proposals made by @romi1502 in #39 and reverts the fix. To allow regression tests to pass for dependencies such as musdb or museval, the old behaviour can be used with

    stempeg.read_stems(..., ffmpeg_format="s16le")

    opened by faroit 3
  • warnings.warning() does not exist


    Bug Description: When using stempeg as part of musdb, I encountered the following error:

            stem_durations = np.array([t.shape[0] for t in stems])
            if not (stem_durations == stem_durations[0]).all():
    >           warnings.warning("Stems differ in length and were shortend")
    E           AttributeError: module 'warnings' has no attribute 'warning'
    
    /usr/local/lib/python3.9/site-packages/stempeg/read.py:299: AttributeError
    

    warning() does not exist after checking the warnings package.

    Suggested Solution: warnings.warning() -> warnings.warn() since warn() exists.

    opened by jeswan 1
  • 16 bit flac output conversion?


    Is there a way to convert the 4 stem output files from the new Open-Unmix UMX using Stempeg to output 16 bit flac files instead of the 24 bit flac files I am currently getting using it?

    Thanks, Rog

    enhancement help wanted 
    opened by Mixerrog 3
  • Support reading from file-like objects


    supporting file-like objects to read and decode in-memory data would be a useful enhancement. There may be problems, as suggested here, though: https://github.com/kkroening/ffmpeg-python/issues/292

    enhancement 
    opened by faroit 0
Releases(v0.2.3)
  • v0.2.3(Jan 30, 2021)

    Version 0.2 is a rewrite of stempeg that focuses on speed and performance while also adding a number of new features. Furthermore, stempeg can now read and write stem files in three different ways to make the best use of the different audio containers. For example, since pcm/wav doesn't support multiple audio streams, stempeg can instead read and write streams aggregated into multiple pairs of stereo channels.

    Audio Loading

    • Underlying reading backend is now based on python-ffmpeg.
    • With this new backend, the creation of temporary files is reduced, since audio is piped directly into numpy via stdio. This leads to a loading time improvement of 20%-30%.
    • A target sample rate can be specified to resample audio on-the-fly using ffmpeg.
    • An optional stems_from_multichannel was added to load stems that are aggregated into multichannel audio (concatenation of pairs of stereo channels), see more info on audio writing.
    • substream titles metadata can be read from the Info object.
    • Loading audio now uses the same API as Spleeter's audio loading backend.

    Audio Writing

    This new version stabilizes writing support by adding writer methods that can be passed to stempeg.write_stems() to save multi-stream audio. The choice of the writing method mainly depends on the audio container and codec. E.g. some containers support multiple stems (mp4/m4a, opus, mka) whereas others do not (wav, mp3...).

    • stempeg.FilesWriter saves stems into multiple files. This writer can be sped up using multiprocess=True, which writes the stems in parallel.
    • stempeg.ChannelsWriter saves as multiple channels. Stems will be multiplexed into channels and saved as a single multichannel file. E.g. an audio tensor of shape=(stems, samples, 2) will be converted to single-stream multichannel audio of shape (samples, stems*2).
    • stempeg.StreamsWriter saves into a single multi-stream file.
    • stempeg.NIStemsWriter saves into a single multi-stream file. With it, one can finally create stem files that are fully compatible with Native Instruments stems. For this, MP4Box has to be installed. See more info here.

    Furthermore the following features were added:

    • Names for each substream can now be embedded into metadata.
    • stempeg can be used to just write normal audio files (mono and multichannel) using write_audio, which is also fully API-compatible with Spleeter's audio backend.

    For more information, see the updated documentation. Thanks to @mmoussallam, @romi1502, @Rhymen, @nlswrnr, and @axeldelafosse.

  • v0.1.8(Jul 9, 2019)

    The seeking issue (#21) was not fully fixed. This release should address the remaining issues that occur with chunked loading when very small float numbers are used as the start parameter.

  • v0.1.7(Jul 8, 2019)

    Fixes a bug (#18) that occurs when start or duration uses very small float numbers (1e-6), which are converted literally into strings that keep the scientific notation.

    Also addresses #21 and adds an additional check for ffmpeg and ffprobe before actually reading any files.

  • v0.1.6(Mar 13, 2019)

  • v0.1.5(Mar 13, 2019)

  • v0.1.4(Nov 10, 2018)

    There was a bug in the earlier versions of stempeg that didn't respect the set out_type in the stem reader. This was fixed and the output defaults to np.float64.

    Thanks to

    @hexafraction

  • v0.1.3(Feb 18, 2018)

    Adds some code and warnings to detect the ffmpeg version and warn users when a version older than 3.0 is used, since such versions add additional silence to the output files when encoding.

    Also addressing #3

  • v0.1.2(Dec 20, 2017)

    This release fixes #1 by checking the available ffmpeg encoders and picking aac if libfdk_aac is not available.

    Also, the ffmpeg errors are now visible.

Owner
Fabian-Robert Stöter
Audio-ML researcher