Fast Python reader and editor for ASAM MDF / MF4 (Measurement Data Format) files

Overview

asammdf is a fast parser and editor for ASAM (Association for Standardization of Automation and Measuring Systems) MDF (Measurement Data Format) files.

asammdf supports MDF versions 2 (.dat), 3 (.mdf) and 4 (.mf4).

asammdf works on Python >= 3.7 (for Python 2.7, 3.4 and 3.5 see the 4.x.y releases)

Project goals

The main goals for this library are:

  • to be faster than the other Python based mdf libraries
  • to have a clean and easy-to-understand code base
  • to have minimal third-party dependencies

Features

  • create new mdf files from scratch

  • append new channels

  • read unsorted MDF v3 and v4 files

  • read CAN and LIN bus logging files

  • extract CAN and LIN signals from anonymous bus logging measurements

  • filter a subset of channels from original mdf file

  • cut measurement to specified time interval

  • convert to different mdf version

  • export to HDF5, Matlab (v4, v5 and v7.3), CSV and parquet

  • merge multiple files sharing the same internal structure

  • read and save mdf version 4.10 files containing zipped data blocks

  • space optimizations for saved files (no duplicated blocks)

  • split large data blocks (configurable size) for mdf version 4

  • full support (read, append, save) for the following map types (multidimensional array channels):

    • mdf version 3 channels with CDBLOCK

    • mdf version 4 structure channel composition

    • mdf version 4 channel arrays with CNTemplate storage and one of the array types:

      • 0 - array
      • 1 - scaling axis
      • 2 - look-up
  • add and extract attachments for mdf version 4

  • handle large files (for example merging two files, each with 14000 channels and 5GB size, on a RaspberryPi)

  • extract channel data, master channel and extra channel information as Signal objects for unified operations with v3 and v4 files

  • time domain operation using the Signal class

    • Pandas data frames are good if all the channels have the same time base
    • a measurement will usually have channels from different sources at different rates
    • the Signal class facilitates operations with such channels
  • graphical interface to visualize channels and perform operations with the files
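
To illustrate why channels sampled at different rates need a common time base before they can be combined, here is a minimal sketch in plain Python. It does not use asammdf or its Signal class; `interpolate` is a hypothetical helper and the sample values are made up.

```python
from bisect import bisect_right


def interpolate(timestamps, samples, new_timestamps):
    """Linearly interpolate (timestamps, samples) onto new_timestamps.

    Values outside the original time range are clamped to the first/last sample.
    """
    result = []
    for t in new_timestamps:
        if t <= timestamps[0]:
            result.append(samples[0])
        elif t >= timestamps[-1]:
            result.append(samples[-1])
        else:
            # Index of the first original timestamp strictly greater than t
            i = bisect_right(timestamps, t)
            t0, t1 = timestamps[i - 1], timestamps[i]
            s0, s1 = samples[i - 1], samples[i]
            result.append(s0 + (s1 - s0) * (t - t0) / (t1 - t0))
    return result


# Two made-up channels: one sampled every 1.0 s, one every 0.5 s
slow_t, slow_s = [0.0, 1.0, 2.0], [10.0, 20.0, 30.0]
fast_t, fast_s = [0.0, 0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 3.0, 4.0, 5.0]

# Bring the slow channel onto the fast channel's time base, then combine them
slow_on_fast = interpolate(slow_t, slow_s, fast_t)
combined = [a + b for a, b in zip(slow_on_fast, fast_s)]
```

A single DataFrame would force both channels onto one index up front; keeping each channel with its own timestamps postpones that choice until an operation actually needs it.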

Major features not implemented (yet)

  • for version 3

    • functionality related to sample reduction block: the samples reduction blocks are simply ignored
  • for version 4

    • experimental support for MDF v4.20 column oriented storage
    • functionality related to sample reduction block: the samples reduction blocks are simply ignored
    • handling of channel hierarchy: channel hierarchy is ignored
    • full handling of bus logging measurements: currently only CAN and LIN bus logging are implemented with the ability to get signals defined in the attached CAN/LIN database (.arxml or .dbc). Signals can also be extracted from an anonymous bus logging measurement by providing a CAN or LIN database (.dbc or .arxml)
    • handling of unfinished measurements (mdf 4): finalization is attempted when the file is loaded, however not all the finalization steps are supported
    • full support for remaining mdf 4 channel arrays types
    • xml schema for MDBLOCK: most metadata stored in the comment blocks will not be available
    • full handling of event blocks: events are transferred to the new files (in case of calling methods that return new MDF objects) but no new events can be created
    • channels with default X axis: the default X axis is ignored and the channel group's master channel is used
    • attachment encryption/decryption using user provided encryption/decryption functions; this is not part of the MDF v4 spec and is only supported by this library

Usage

from asammdf import MDF

mdf = MDF('sample.mdf')
speed = mdf.get('WheelSpeed')
speed.plot()

important_signals = ['WheelSpeed', 'VehicleSpeed', 'VehicleAcceleration']
# get short measurement with a subset of channels from 10s to 12s
short = mdf.filter(important_signals).cut(start=10, stop=12)

# convert to version 4.10 and save to disk
short.convert('4.10').save('important signals.mf4')

# plot some channels from a huge file
efficient = MDF('huge.mf4')
for signal in efficient.select(['Sensor1', 'Voltage3']):
    signal.plot()

Check the examples folder for extended usage demo, or the documentation http://asammdf.readthedocs.io/en/master/examples.html

https://canlogger.csselectronics.com/canedge-getting-started/log-file-tools/asammdf-api/

Documentation

http://asammdf.readthedocs.io/en/master

And a nicely written tutorial on the CSS Electronics site

Contributing & Support

Please have a look over the contributing guidelines

If you enjoy this library please consider making a donation to the numpy project or to danielhrisca using Liberapay.

Contributors

Thanks to all who contributed with commits to asammdf:

Installation

asammdf is available on PyPI and conda-forge:

pip install asammdf
# for the GUI 
pip install asammdf[gui]
# or for anaconda
conda install -c conda-forge asammdf

In case a wheel is not present for your OS/Python version and you lack the proper compiler setup to compile the C extension code, you can simply copy-paste the package code to your site-packages. In this way the Python fallback code will be used instead of the compiled C extension code.

Dependencies

asammdf uses the following libraries

  • numpy : the heart that makes all tick
  • numexpr : for algebraic and rational channel conversions
  • wheel : for installation in virtual environments
  • pandas : for DataFrame export
  • canmatrix : to handle CAN/LIN bus logging measurements
  • natsort
  • lxml : for canmatrix arxml support
  • lz4 : to speed up the disk IO performance

optional dependencies needed for exports

  • h5py : for HDF5 export
  • scipy : for Matlab v4 and v5 .mat export
  • hdf5storage : for Matlab v7.3 .mat export
  • fastparquet : for parquet export
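
Since these packages are optional, it can help to check which export formats are usable in the current environment before calling an export. The sketch below uses only the standard library; the mapping mirrors the list above, and `available_exports` is a hypothetical helper, not part of asammdf.

```python
from importlib.util import find_spec

# Mapping taken from the list above: optional package -> export it enables
EXPORT_DEPENDENCIES = {
    "h5py": "HDF5",
    "scipy": "Matlab v4/v5 .mat",
    "hdf5storage": "Matlab v7.3 .mat",
    "fastparquet": "parquet",
}


def available_exports(dependencies=EXPORT_DEPENDENCIES):
    """Return the export formats whose optional package can be imported."""
    return [
        export_format
        for package, export_format in dependencies.items()
        if find_spec(package) is not None  # None means the package is missing
    ]


print(available_exports())
```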

other optional dependencies

  • PyQt5 : for GUI tool
  • pyqtgraph : for GUI tool and Signal plotting
  • matplotlib : as fallback for Signal plotting
  • cChardet : to detect non-standard Unicode encodings
  • chardet : to detect non-standard Unicode encodings
  • pyqtlet : for GPS window
  • isal : for faster zlib compression/decompression

Benchmarks

http://asammdf.readthedocs.io/en/master/benchmarks.html

Comments
  • Fix Mat file Export

    Python version

    Please run the following snippet and write the output here

    'python=2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)]'
    'os=Windows-7-6.1.7601-SP1'
    

    Code

    for file_name in os.listdir(dir_mdf_file):
        mdf_parser = asammdf.mdf.MDF(name=file_name, memory='low', version='4.00')
        mdf_file = mdf_parser.export('mat', file_name)
    
    

    Traceback

    (<type 'exceptions.TypeError'>, TypeError("'generator' object has no attribute '__getitem__'",), <traceback object at 0x000000000E247A88>)
    

    Error 2: (<type 'exceptions.ValueError'>, ValueError('array-shape mismatch in array 7',), <traceback object at 0x0000000014D9B0C8>)

    Description

    While trying to convert MDF files to .mat files using the MDF module, I am getting these two errors. I assume that Error 2 may be caused by NumPy. Can anyone explain why this error occurs while exporting? Can I get help on this?

    bug 
    opened by bhagivinni 122
  • Converting large files

    When converting large files to parquet or a Pandas DataFrame, I always get a memory error. I was wondering if it is possible to have some kind of low-memory mode or, even better, a streaming mode.

    Essentially I want parquet files, but I saw that asammdf converts the .dat, .mf4, etc. files to a Pandas DataFrame under the hood anyway and uses the result to export to parquet.

    So I was playing around with the code, trying to cast the columns to more appropriate dtypes.

    def _downcast(self, src_series):
        if np.issubdtype(src_series.dtype, np.unsignedinteger):
            res_series = pd.to_numeric(src_series, downcast='unsigned')
        elif np.issubdtype(src_series.dtype, np.signedinteger):
            if src_series.min() < 0:
                res_series = pd.to_numeric(src_series, downcast='signed')
            else:
                res_series = pd.to_numeric(src_series, downcast='unsigned')
        elif np.issubdtype(src_series.dtype, np.floating):
            res_series = pd.to_numeric(src_series, downcast='float')
        else:
            res_series = src_series.astype('category')
    
        return res_series
    

    It saves some memory, but unfortunately this is not enough. I have some files that are 5 GB or larger. When converted to a DataFrame they get inflated beyond 20 GB.

    Any help is appreciated.
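
    One way to approximate the streaming mode asked for here is to process the measurement in consecutive time windows instead of all at once. The following is only a sketch of the windowing step in plain Python; `time_windows` is a hypothetical helper, and whether per-window processing keeps memory low enough for a given file is not verified here.

```python
def time_windows(start, stop, step):
    """Yield consecutive (window_start, window_stop) pairs covering [start, stop]."""
    if step <= 0:
        raise ValueError("step must be positive")
    window_start = start
    while window_start < stop:
        # The last window is shortened so it never runs past `stop`
        window_stop = min(window_start + step, stop)
        yield window_start, window_stop
        window_start = window_stop


# Example: a 25 s measurement processed in 10 s pieces
for window_start, window_stop in time_windows(0, 25, 10):
    print(window_start, window_stop)
```

    Each pair could then be passed to `mdf.cut(start=..., stop=...)`, as in the Usage section, and the resulting piece exported on its own. Samples that fall exactly on a window boundary may appear in two neighbouring pieces and would need deduplication.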

    enhancement 
    opened by Nimi42 44
  • 'right_shift' not supported

    Hello, I ran into an issue when loading an MDF file. It seems to come from numpy, and the issue does not exist in version 2.5.3. Below is the error information:

    site-packages\asammdf\mdf4.py", line 2912, in get vals = vals >> bit_offset

    ufunc 'right_shift' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

    At this point vals is an ndarray that contains a list.
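
    For context, the failing line shifts raw values right by the channel's bit offset, an operation that is only defined for integers. The sketch below shows the same idea in plain Python; `extract_bit_field` is a hypothetical helper and not asammdf's implementation.

```python
def extract_bit_field(raw_values, bit_offset, bit_count):
    """Extract `bit_count` bits starting at `bit_offset` from each raw integer."""
    mask = (1 << bit_count) - 1
    return [(value >> bit_offset) & mask for value in raw_values]


# 0b10110100: bits 2..4 are 0b101
print(extract_bit_field([0b10110100], bit_offset=2, bit_count=3))

# Shifting a non-integer fails, comparable to the ufunc error in this report
try:
    1.5 >> 1
except TypeError as error:
    print("shift rejected:", error)
```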

    opened by lznzhou33zzz 34
  • Reading Channel Causes Crash

    Python version

    ('python=3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit '
     '(AMD64)]')
    'os=Windows-10-10.0.19043-SP0'
    'numpy=1.22.3'
    ldf is not supported
    xls is not supported
    xlsx is not supported
    yaml is not supported
    'asammdf=7.0.7'

    Code

    MDF version

    4.10

    Code snippet

        data = MDF(mf4_file)
        sig = data.get('MIPI_CSI2_Frame.DataBytes')  # type: Signal Error occurs here
        real_sam = sig.samples  # type: ndarray
        offset = real_sam[0].size - 25
    

    OR

        it_data = data.iter_get(name='MIPI_CSI2_Frame.DataBytes', samples_only=True, raw=True)
        next(it_data) # error occurs here
    

    Traceback

    Traceback (most recent call last):
      File "C:\Users\xxxx\PycharmProjects\MF4Convert\venv\lib\site-packages\asammdf\blocks\mdf_v4.py", line 7442, in _get_scalar
        vals = extract(signal_data, 1, vals - vals[0])
    SystemError: <built-in function __import__> returned a result with an error set
    
    Process finished with exit code -1073741819 (0xC0000005) 
    

    Description

    The file is very large (~3.5GB); trying to read data from the mentioned channel using either the iteration method or the Signal methods causes an error. A similar file, ~1.7GB in size, reads fine. Both load into the asammdf tool just fine, including the file causing the crash; however, that file causes the same error as above whenever I try to export data with the GUI.

    Reading a different channel in this file does not result in the error.

    The machine I'm using has 32GB of RAM and plenty of storage space.

    opened by lkuza2 31
  • In MDF files samples and timestamps length mismatch

    I am using this tool to analyze MDF files and I am getting this error:

    asammdf.utils.MdfException: <SIGNAL_NAME> samples and timestamps length mismatch (58 vs 67)

    I used ETAS tools to check the signal, and everything seemed fine there. What is going wrong here?

    Fixed - waiting for next release 
    opened by AkbarAlam 31
  • multiple actions for file handling

    Hello, too many actions are needed to handle files.

    Example: I have done an acquisition on a vehicle and want to post-process part of the file with Matlab. Currently, I first have to cut the file to keep a time interval, which creates a new file. Then I have to filter into another file to keep only a short list of channels. Finally, I have to export to a last .mat file. The same applies if we need to transmit data to a supplier who works with other tools.

    It could be easier and faster to merge the 'filter', 'export', 'convert', 'cut' and 'resample' functions. This would let us choose, in a single tab, which channels to handle and which actions to apply to them.

    1. Choose channels: the idea is to have a channel selection list with 'clear', 'load' and 'save' functions in order to select easily. It would also be great to add a list of the selected channels beside it, as you do for the 'search' function, to get an easy overview of what is selected. (To be fast, no specific channel selected could mean that the whole file has to be used.)

    2. Choose action(s) and output format: on the right, we could find all the time interval or sampling management options and all the output formats.

    This way, handling files could be easy and fast.

    I tried to summarize in an example: image

    Your opinion ?

    opened by gaetanrocq 30
  • Bug in reading .dat file

    Python version: 3.6.1

    from asammdf import MDF
    
    def merge_mdfs(files):
        return MDF.merge(files).resample(0.1)
    
    def main():
        # Here I have code to read in a series of '.dat' files as a list into the variable 'files'
        merge_file = merge_mdfs(files)
    
        # The variable 'merge_file' contains channel names with the corrupt data
    

    Code

    MDF version

    4.7.8

    Code snippet

    return MDF.merge(files).resample(0.1)

    Traceback

    This code doesn't produce an error or a traceback

    Description

    Let me preface by saying that I am relatively new to python and even newer to this library, but I understand the importance of collaboration so I will try my best to help. Since I work for a big company, I can't share exactly everything I am working on, especially the data files I am using. However, I will do my best to supply all available information I can to resolve this issue.

    The issue I'm having is that when I read in a series of '.dat' files, sometimes the data gets read in perfectly, but other times the data gets all messed up and values that were not in the original data find their way in.

    Example: I am reading in acceleration data from an accelerometer. The min and max values of this data trace are confirmed by one of the other tools my company uses to plot data to be max: ~6.5, min: ~-1.5 (units = m/s^2). When I read a series of these same files in, I get a max of the same value and a min of ~-13 m/s^2. When I go in and look at the data, there are more data points than there should be, and the data doesn't flow like what I would expect to see (i.e. a lot of repeating values).

    Please let me know if anyone needs more information to help solve this issue. I will try my best to supply any additional information requested.

    Thanks for supporting this awesome library! :)

    bug 
    opened by drhannah94 30
  • MDF4 get_group() breaks data

    Python version

    ('python=3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 21:26:53) [MSC v.1916 32 bit '
     '(Intel)]')
    'os=Windows-10-10.0.18362-SP0'
    'numpy=1.16.1'
    'asammdf=5.10.4.dev0'
    

    Code

    MDF version

    4.00

    Code snippet

    from asammdf import MDF
    
    if __name__ == "__main__":
       mdf = MDF("path_to_file.MF4")
       df = mdf.get_group(5)
    

    Traceback

    Traceback (most recent call last):
    File "C:/myPath/main.py", line 5, in <module>
      df = mdf.get_group(5)
    File "myPath\venv\lib\site-packages\asammdf\mdf.py", line 3352, in get_group
      ignore_value2text_conversions=ignore_value2text_conversions,
    File "myPath\venv\lib\site-packages\asammdf\mdf.py", line 3423, in to_dataframe
      mdf = self.filter(channels)
    File "myPath\venv\lib\site-packages\asammdf\mdf.py", line 1699, in filter
      copy_master=False,
    File "myPath\venv\lib\site-packages\asammdf\blocks\mdf_v4.py", line 3954, in get
      _dtype = dtype(channel.dtype_fmt)
    ValueError: field '_level_1_structure._level_2_structure' occurs more than once
    

    So I added the following code to MDF4 -> _read_channels() -> while ch_addr https://github.com/danielhrisca/asammdf/blob/88e2e67a18a77c4ee437907a3f67603397b6eac0/asammdf/blocks/mdf_v4.py#L798

                if channel.component_addr:
                    channel.name += "_struct"
    

    I mentioned this in my email to you. This fixes the duplicate channel names. Now my Traceback is the following:

    C:\myPath\venv\lib\site-packages\pandas\core\indexes\numeric.py:443: RuntimeWarning: invalid value encountered in equal
      return ((left == right) | (self._isnan & other._isnan)).all()
    

    I was looking for the reason for this. It looks like you create a new filtered mdf file here, when using get_group() https://github.com/danielhrisca/asammdf/blob/88e2e67a18a77c4ee437907a3f67603397b6eac0/asammdf/mdf.py#L3423

    This new mdf has a fragmented data block. The first data_block fragment has a different total size and also a different samples_size than the following fragments. image

    When I check the returned pandas dataframe, the data is correct for the first ~6500 rows, then garbage follows. If I use the get method to retrieve a single Signal from the original MDF object, the data is okay.

    The file is rather large (160MB). Please tell me if you need it to find the cause.

    opened by zariiii9003 28
  • Slow reading from External Hard Drive

    I am working on a data analysis of multiple files stored on an external hard drive. The files are large (1-7GB). I was using mdfreader before and have recently migrated to asammdf. I am finding fairly similar behaviour when working with files on the local hard drive, but a huge difference when loading from the external drive.

    For instance: just loading a 5GB file, and getting the samples for one signal.

    asammdf: 329.76s
    mdfreader (with no_data_loading = True): 6.42s

    However, running with the file on the local hard drive:

    asammdf: 14.22s
    mdfreader: 3.79s

    Is there any way to improve this difference?

    opened by sirlagg 27
  • Loading Data is slow since asammdf 5.x.x

    Hi Daniel,

    since I updated asammdf to 5.x.x (used 5.0.3, 5.0.4 and 5.1.0dev so far), loading data is dramatically slower than before!

    My general procedure for loading data into a pandas dataframe is:

    • loading the file
    • fetching (asammdf.get()) the samples, unit, comment and conversion of all channels using group and index information from channels_db
    • interpolation to the same raster (0.1s)
    • creating a pandas dataframe for the samples, units, comment

    For an mdf3 file with a size of 1.7 GB this needs 12 minutes to load! Doing the same with mdfreader only needs 45s.

    With asammdf 4.7.11 this only needs 26s. Obviously a huge difference.

    I think this is due to the removed memory option. As mentioned in asammdf's documentation, I can speed up data loading by using configure and tuning the read_fragment_size parameter. Is there a way to have the same loading speed as with the old memory='full' option?

    Regards, legout

    opened by legout 27
  • cannot fit 'int' into an index-sized integer

    Python version

    'python=3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 19:16:44) \n[GCC 7.3.0]'
    'os=Linux-4.20.12-arch1-1-ARCH-x86_64-with-arch'
    'numpy=1.16.2'
    'asammdf=5.0.2'

    Code

    MDF version

    4.10

    Code snippet

    Any list of valid channels will result in that error.

    with MDF(file) as mdf:
        mdf.filter(channels=["Ams_Mp"])
    

    Traceback

    ---------------------------------------------------------------------------
    OverflowError                             Traceback (most recent call last)
    <ipython-input-63-cb54d2207089> in <module>
          1 with MDF(file) as mdf:
    ----> 2     mdf.filter(channels=["Ams_Mp"])
    
    ~/.conda/envs/datatools/lib/python3.6/site-packages/asammdf/mdf.py in filter(self, channels, version)
       1587                 info=None
       1588                 comment="">
    -> 1589         <Signal SIG:
       1590                 samples=[ 12.  12.  12.  12.  12.]
       1591                 timestamps=[0 1 2 3 4]
    
    ~/.conda/envs/datatools/lib/python3.6/site-packages/asammdf/blocks/mdf_v4.py in _load_data(self, group, record_offset, record_count)
       1230 
       1231         Returns
    -> 1232         -------
       1233         data : bytes
       1234             aggregated raw data
    
    OverflowError: cannot fit 'int' into an index-sized integer
    

    Description

    Hello!

    I recently discovered an issue with the new version of asammdf (5.0.2); in version 4.7.9 it worked fine for me. As soon as I try to filter or select certain channels, the aforementioned error is raised.

    opened by ofesseler 27
  • Add support for loading data partially according to integer or boolean arrays

    Python version

    Please run the following snippet and write the output here

    ('python=3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:50:36) [MSC '
     'v.1929 64 bit (AMD64)]')
    'os=Windows-10'
    'numpy=1.23.4'
    'asammdf=7.1.0'
    

    Description

    It is currently possible to load partial data using the mdf.get method: https://github.com/danielhrisca/asammdf/blob/fda5d791ac0a78943eb3dcf8899811cdcda34b82/asammdf/blocks/mdf_v4.py#L6427-L6428

    This allows loading a range of data in similar fashion to np.array([1,2,3,4])[record_offset:record_offset + record_count], which saves some RAM if you only need a certain range.

    It would be nice to extend the function to also support filtering the data using

    • integer arrays, np.array([0, 1, 2, 8, 9], dtype=int).
    • boolean arrays, np.array([0, 1, 0, 1], dtype=bool).

    This would allow quite advanced filtering without having to load all the data to RAM.

    For inspiration h5py supports these and quite a bit of numpy's fancy indexing: https://docs.h5py.org/en/stable/high/dataset.html?highlight=indexing#fancy-indexing
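
    As a rough illustration of how such indexing could be mapped onto range-based loading, the sketch below turns an integer index list or a boolean mask into contiguous (record_offset, record_count) runs. Both helpers are hypothetical and written in plain Python; they are not part of asammdf, and the input indexes are assumed to be sorted and unique.

```python
def mask_to_indexes(mask):
    """Convert a boolean mask into the list of selected record indexes."""
    return [index for index, keep in enumerate(mask) if keep]


def indexes_to_ranges(indexes):
    """Group sorted, unique record indexes into (record_offset, record_count) runs."""
    ranges = []
    for index in indexes:
        if ranges and index == ranges[-1][0] + ranges[-1][1]:
            # The index directly continues the previous run: extend it
            offset, count = ranges[-1]
            ranges[-1] = (offset, count + 1)
        else:
            # Gap found: start a new run of length 1
            ranges.append((index, 1))
    return ranges


# The two examples from this request
print(indexes_to_ranges([0, 1, 2, 8, 9]))
print(indexes_to_ranges(mask_to_indexes([0, 1, 0, 1])))
```

    Each run could then be loaded with one ranged read, so only the selected records would need to be held in RAM.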

    opened by Illviljan 2
  • Create a MF4 file from CSV and AVI files

    Python version

    ('python=3.10.8 | packaged by conda-forge | (main, Nov 24 2022, 14:07:00) [MSC '
     'v.1916 64 bit (AMD64)]')
    'os=Windows-10-10.0.19044-SP0'
    'numpy=1.23.5'
    'asammdf=7.2.0'

    Code

    MDF version

    4.00

    Code snippet

    #%% Imports
    import pandas as pd
    import logging
    from asammdf import MDF
    from os import listdir
    from os.path import isfile, join
    from pathlib import Path
    
    #%% Directories
    csv_directory = "in/csvfile.csv"
    avi_directory = "in/avifile.avi"
    mf4_directory = "out/mf4file.mf4"
    
    #%% Create empty mdf object
    mdf  = MDF()
    
    #%% Append data from csv file
    df_csv = pd.read_csv(csv_directory)
    mdf.append(df_csv)
    
    #%% Attach video
    with open(avi_directory, 'rb') as f:
        data = f.read()
        index = mdf.attach(data, file_name='front_camera.avi', compression=False, embedded=False)
        mdf.groups[-1].channels[-1].data_block_addr = index
        mdf.attachments[0].file_name = Path(str(mdf.attachments[0].file_name).replace("FROM_", "", 1))
    
    #%% Save to mf4
    mdf.save(mf4_directory)
    

    Description

    Hi all!

    I have some vehicle data from CAN Bus written in csv and a synchronized video with matching number of frames. I would like to display both the signals and the associated video frames in a tool, which requires mf4 as input. Therefore, I have setup the code above to convert csv files and avi files into mf4 format by recycling a similar issue #316. The code does not trigger an error and appends the csv flawlessly to the mdf object, but it doesn't work for the avi attachment (neither size nor channel of mf4 output changes - the mf4 only contains signals from csv input). In debug mode I can see that the avi is successfully read to a binary file (data = b'RIFFX\xceM\x01AVI LIST\xec\x11\x00\x00hdrlavih8\x00\x00\x005\x82 ... ).

    I'm quite new to this topic and am struggling to find documentation and threads to this kind of task. Any help is highly appreciated!

    opened by LinusUng 1
  • No .mat file created using export function, no errors as well

    Python version

    Please run the following snippet and write the output here

    ('python=3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 '
     'bit (AMD64)]')
    'os=Windows-10-10.0.19045-SP0'
    'numpy=1.23.1'
    ldf is not supported
    xls is not supported
    xlsx is not supported
    yaml is not supported
    'asammdf=7.1.1'

    Code

    from asammdf import MDF
    from pathlib import Path

    import tkinter as tk
    from tkinter import filedialog

    mdf_extension = ".MF4"
    input_folder = "input"
    output_folder = "output"

    path = Path(__file__).parent.absolute()

    root = tk.Tk()
    root.withdraw()

    mf4_path = tuple(input("MF4_File_Path_Apaix").split())

    root = tk.Tk()
    root.withdraw()

    dbc_path = filedialog.askopenfilenames(parent=root, title='Choose DBC file(s)', filetypes=[("dbc files", ".dbc")])

    if not dbc_path:
        exit()

    logfiles = list(mf4_path)
    dbc_files = {"CAN": [(dbc, 0) for dbc in dbc_path]}

    mdf = MDF.concatenate(logfiles)

    mdf_scaled = mdf.extract_bus_logging(dbc_files)
    mdf_scaled.export(fmt='mat', filename=Path('C:\Temp\APAIX\MDF4\matlab.mat'))

    MDF version

    4.11

    Code snippet

    no error

    Traceback

    no traceback

    Description

    Hi, I am trying to convert CAN data in MF4 from a CSS logger to a .mat file. Currently it works fine with CSV, but the files are too heavy and it would be easier to load the data into our data analysis software as .mat. Unfortunately, the export function doesn't output any .mat file. I made sure to have scipy installed and checked the other issues on GitHub.

    opened by FlorentTKS 6
  • Feature request: Use enums instead of constant definitions

    I wanted to ask if you thought about re-organizing the code to use Enums instead of constant integers. For example: Instead of

    CONVERSION_TYPE_NON = 0
    CONVERSION_TYPE_LIN = 1
    CONVERSION_TYPE_RAT = 2
    CONVERSION_TYPE_ALG = 3
    CONVERSION_TYPE_TABI = 4
    CONVERSION_TYPE_TAB = 5
    CONVERSION_TYPE_RTAB = 6
    CONVERSION_TYPE_TABX = 7
    CONVERSION_TYPE_RTABX = 8
    CONVERSION_TYPE_TTAB = 9
    CONVERSION_TYPE_TRANS = 10
    CONVERSION_TYPE_BITFIELD = 11
    
    CONVERSION_TYPE_TO_STRING = dict(
        enumerate(
            [
                "NON",
                "LIN",
                "RAT",
                "ALG",
                "TABI",
                "TAB",
                "RTAB",
                "TABX",
                "RTABX",
                "TTAB",
                "TRANS",
                "BITFIELD",
            ]
        )
    )
    

    Maybe something like:

    class ConversionType(Enum):
        NON = 0
        LIN = 1
        RAT = 2
        ALG = 3
        TABI = 4
        TAB = 5
        RTAB = 6
        TABX = 7
        RTABX = 8
        TTAB = 9
        TRANS = 10
        BITFIELD = 11
    
    
    CONVERSION_TYPE_TO_STRING = dict(
        enumerate(e.name for e in ConversionType)
    )
    

    I think this would have the benefit of the user seeing the "pretty" conversion type (like TABX, LIN) instead of a number that they maybe do not know what it means. It should also help with checking and comparing the values to the known enum values.

    The same could be done with a lot of the constants in the v4_constants.py file (and v2 and v3 as well probably)
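
    A minimal sketch of how this could stay compatible with code that still compares against plain integers, assuming `IntEnum` from the standard library is acceptable. The names mirror the proposal above; the module-level alias is only an illustration.

```python
from enum import IntEnum


class ConversionType(IntEnum):
    NON = 0
    LIN = 1
    RAT = 2
    ALG = 3
    TABI = 4
    TAB = 5
    RTAB = 6
    TABX = 7
    RTABX = 8
    TTAB = 9
    TRANS = 10
    BITFIELD = 11


# Keyed by the member value, so it does not depend on declaration order
CONVERSION_TYPE_TO_STRING = {member.value: member.name for member in ConversionType}

# An old-style constant can remain as an alias during a transition
CONVERSION_TYPE_LIN = ConversionType.LIN

print(ConversionType(7).name)    # look up a member from a raw integer
print(CONVERSION_TYPE_LIN == 1)  # IntEnum members still compare equal to ints
```

    Building the lookup from `member.value` rather than `enumerate` keeps it correct even if the values are ever not consecutive.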

    opened by eblis 2
  • Support for loading files from Azure/ any other fsspec compatible file stores

    Python version

    ('python=3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC '
     '10.4.0]')
    'os=Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.35'
    'numpy=1.23.4'

    Code

    MDF version

    3.20

    Code snippet 1

       import adlfs
       fs = adlfs.AzureBlobFileSystem(account_name="account_name", sas_token="sas_token")
       MDF(fs)
    

    Traceback.

    TypeError                                 Traceback (most recent call last)
    Cell In [9], line 2
          1 file = fs.open('test/mdf/test.mdf', "rb")
    ----> 2 MDF(fs)
    
    File /opt/conda/lib/python3.10/site-packages/asammdf/mdf.py:292, in MDF.__init__(self, name, version, channels, **kwargs)
        289     do_close = True
        291 else:
    --> 292     name = original_name = Path(name)
        293     if not name.is_file() or not name.exists():
        294         raise MdfException(f'File "{name}" does not exist')
    
    File /opt/conda/lib/python3.10/pathlib.py:960, in Path.__new__(cls, *args, **kwargs)
        958 if cls is Path:
        959     cls = WindowsPath if os.name == 'nt' else PosixPath
    --> 960 self = cls._from_parts(args)
        961 if not self._flavour.is_supported:
        962     raise NotImplementedError("cannot instantiate %r on your system"
        963                               % (cls.__name__,))
    
    File /opt/conda/lib/python3.10/pathlib.py:594, in PurePath._from_parts(cls, args)
        589 @classmethod
        590 def _from_parts(cls, args):
        591     # We need to call _parse_args on the instance, so as to get the
        592     # right flavour.
        593     self = object.__new__(cls)
    --> 594     drv, root, parts = self._parse_args(args)
        595     self._drv = drv
        596     self._root = root
    
    File /opt/conda/lib/python3.10/pathlib.py:578, in PurePath._parse_args(cls, args)
        576     parts += a._parts
        577 else:
    --> 578     a = os.fspath(a)
        579     if isinstance(a, str):
        580         # Force-cast str subclasses to str (issue #21127)
        581         parts.append(str(a))
    
    TypeError: expected str, bytes or os.PathLike object, not AzureBlobFileSystem
    
    

    Code snippet 2

       import adlfs
       fs = adlfs.AzureBlobFileSystem(account_name="account_name", sas_token="sas_token")
       file = fs.open('apitest/mdf/test.mdf', "rb")
       MDF(file)
    

    Traceback.

    ---------------------------------------------------------------------------
    MdfException                              Traceback (most recent call last)
    Cell In [11], line 2
          1 file = fs.open('test/mdf/test.mdf', "rb")
    ----> 2 MDF(file)
    
    File /opt/conda/lib/python3.10/site-packages/asammdf/mdf.py:265, in MDF.__init__(self, name, version, channels, **kwargs)
        262         do_close = True
        264     else:
    --> 265         raise MdfException(
        266             f"{type(name)} is not supported as input for the MDF class"
        267         )
        269 elif isinstance(name, zipfile.ZipFile):
        271     archive = name
    
    MdfException: <class 'adlfs.spec.AzureBlobFile'> is not supported as input for the MDF class
    
    

    Description

    The MDF class fails to identify files streamed from cloud stores as files. I've tested this with a file on Azure Blob Storage.

    A simple fix that works on my fork of this repo is to add the code below (see https://github.com/neerajd12/asammdf/commit/3bff61a84c9a9764310a0b332738c97d5e1d36aa):

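    from fsspec.spec import AbstractBufferedFile

    # New branch: accept any fsspec buffered file as an already-open stream
    if isinstance(name, AbstractBufferedFile):
        original_name = None
        file_stream = name
        do_close = False
    # Existing BytesIO check in MDF.__init__ follows
    if isinstance(name, BytesIO):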
    
    

    This works for any file system supported by fsspec.

    Hope this helps anyone using Azure, AWS, etc.
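
Where the installed asammdf release does not include such a branch, one possible workaround is to copy the remote stream into an in-memory `BytesIO`, which the `MDF` constructor already checks for according to the snippet above. The following is a minimal sketch, assuming the whole file fits in memory. `to_bytes_io` is a hypothetical helper (not part of asammdf, adlfs or fsspec), and the input stream here is a local stand-in for the object returned by `fs.open(...)`.

```python
import io


def to_bytes_io(stream, chunk_size=1 << 20):
    """Copy a readable binary stream into an in-memory BytesIO buffer.

    Hypothetical helper for illustration; `stream` only needs a
    read(size) method that returns bytes.
    """
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


# Stand-in for a remote file object such as fs.open('...', "rb");
# the payload is placeholder bytes, not a real MDF file.
remote_like = io.BytesIO(b"placeholder bytes")
local_copy = to_bytes_io(remote_like)

# Assumption: with asammdf installed and a real measurement file,
# local_copy could then be handed to the constructor as MDF(local_copy).
```

The trade-off is memory use: the full file is downloaded and buffered before parsing, whereas the `AbstractBufferedFile` branch proposed above lets the reader work on the remote stream directly.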

    opened by neerajd12 3
Releases(7.2.0)
Owner
Daniel Hrisca