ObjTables: Tools for creating and reusing high-quality spreadsheets

Overview

PyPI package Documentation Test results Test coverage Code analysis License Analytics

ObjTables: Tools for creating and reusing high-quality spreadsheets

ObjTables is a toolkit which makes it easy to use spreadsheets (e.g., XLSX workbooks) to work with complex datasets by combining spreadsheets with rigorous schemas and an object-relational mapping system (ORM; similar to Active Record (Ruby), Django (Python), Doctrine (PHP), Hibernate (Java), Propel (PHP), SQLAlchemy (Python), etc.). This combination enables users to use programs such as Microsoft Excel, LibreOffice Calc, and OpenOffice Calc to view and edit spreadsheets and use schemas and the ObjTables software to validate the syntax and semantics of datasets, compare and merge datasets, and parse datasets into object-oriented data structures for further querying and analysis with languages such as Python.

ObjTables makes it easy to:

  • Use collections of tables (e.g., an XLSX workbook) to represent complex data consisting of multiple related objects of multiple types (e.g., rows of worksheets), each with multiple attributes (e.g., columns).
  • Use complex data types (e.g., numbers, strings, numerical arrays, symbolic mathematical expressions, chemical structures, biological sequences, etc.) within tables.
  • Use progams such as Excel and LibreOffice as a graphical interface for viewing and editing complex datasets.
  • Use embedded tables and grammars to encode relational information into columns and groups of columns of tables.
  • Define clear schemas for tabular datasets.
  • Use schemas to rigorously validate tabular datasets.
  • Use schemas to parse tabular datasets into data structures for further analysis in languages such as Python.
  • Compare, merge, split, revision, and migrate tabular datasets.

The ObjTables toolkit includes five components:

  • Format for schemas for tabular datasets
  • Numerous data types
  • Format for tabular datasets
  • Software tools for parsing, validating, and manipulating tabular datasets
  • Python package for more flexibility and analysis

Please see https://objtables.org for more information.

Installing the command-line program and Python API

Please see the documentation.

Examples, tutorials, and documentation

Please see the user documentation, developer documentation, and tutorials.

License

ObjTables is released under the MIT license.

Development team

ObjTables was developed by the Karr Lab at the Icahn School of Medicine at Mount Sinai in New York, USA and the Applied Mathematics and Computer Science, from Genomes to the Environment research unit at the National Research Institute for Agriculture, Food and Environment in Jouy en Josas, FR.

Questions and comments

Please contact the developers with any questions or comments.

Comments
  • json / yaml export - complicated data structure

    json / yaml export - complicated data structure

    Hi Jonathan,

    again a comment regarding file conversion (same data file, examples/biochemical_models/data.xlsx); now it concerns json and yaml export via

    obj-tables convert schema.csv data.xlsx data.json obj-tables convert schema.csv data.xlsx data.yml

    From the original tables, I had expected a data structure such as

    [ [model_1], [compound1, compound2, compound3, ..] [reaction1, reaction2] ]

    where the entries point to each other via their ids: e.g., reaction1 would have an attribute "Model", with value "e_coli", which matches the id of model1.

    Now I saw that the the attributes (by which objects point to other objects) do not contain ids (or "variable names"), but the objects themselves. Specifically, the data structure starts with compound1, which contains (as an attribute) a data structure describing model1, which in turn contains all compounds and reactions (which then, again, contain "simple" instances of the model). In the end, there are all other compounds (with no model or reaction information at all.

    I imagine that this data structure does the job for exporting / importing the python data objects, but it is difficult to make sense of. Can you have a look at this again and see if my solution (described above) would also work for you? (that is, representing a table by a list of relatively "flat" objects, whose attributes can be strings or lists, but not objects themselves, and which instead point to other objects via ids?)

    Thank you!

    All the best, Wolf

    enhancement question discussion 
    opened by liebermeister 18
  • Support schema migration

    Support schema migration

    Stored Models, such as spreadsheets and delimited files on disk, can become incompatible with updates to model definitions in obj_model. Create a utility that migrates stored models to be compatible with a modified model. E.g., Django has a migration utility, which was originally called fabric.

    enhancement 
    opened by artgoldberg 8
  • csv / tsv export - only one table is exported

    csv / tsv export - only one table is exported

    Hi Jonathan,

    the example file examples/biochemical_models/data.xlsx contains three data tables, but when I convert it to csv or tsv, e.g. using

    obj-tables convert schema.csv data.xlsx data.csv

    the output file contains only a singel table (Reaction), which is the last table in the original xlsx file. I think by default, all three tables should be exported as one multi-table csv / tsv file. Ideally, the tables should appear in the same order as in the original file, but that's only for convenience - if it's difficult to implement this, the order may also change.

    Thank you!

    Best, Wolf

    question 
    opened by liebermeister 5
  • 'NumpyArrayAttribute' object has no attribute 'shape'

    'NumpyArrayAttribute' object has no attribute 'shape'

    When trying to validate that two obj_tables.obj_math.NumpyArrayAttribute are of same shape, I got the error that they do not have an attribute shape. What am I doing wrong.

    question 
    opened by paulflang 5
  • Query functions (both class & instance) for attributes.

    Query functions (both class & instance) for attributes.

    Class methods

    • cls.get_related_attributes() - returns names of all RelatedAttributes of the class
    • cls.get_scalar_attributes() - returns all LiteralAttributes of the class
    • cls.get_attributes() - returns names of all Attributes
    • cls.get_related_name(attribute) - for a given RelatedAttribute of this class, get the related name

    Instance methods

    • self.get_empty_scalar_attributes() - returns LiteralAttributes that are set to None
    • self.get_nonempty_scalar_attributes() - opposite of above
    • self.get_empty_related_attributes() - returns RelatedAttributes that are set to None or []
    • self.get_nonempty_related_attributes() - opposite of above
    enhancement 
    opened by johnsekar 5
  • Parse comments in tabular schema files

    Parse comments in tabular schema files

    Hi Jonathan,

    I just tried to put comment lines (with the usual %/ .. /% syntax) into a schema file, in order to visually structure the file. Apparently, comments in schema files are not supported so far. Would it be possible / desirable to allow such comment lines (which the parser could simply ignore), just for convenience?

    (This is low priority)

    Thank you!

    Best, Wolf

    enhancement future 
    opened by liebermeister 4
  • Better error message for set_value()

    Better error message for set_value()

    Hi Jonathan

    In def set_value(self, obj, new_value), at ../obj_model/obj_model/core.py:3884: ValueError an example error looks like: ValueError: Attribute '<wc_lang.core.RateLaw object at 0x7fbb31678a58>' of '<wc_lang.core.RateLawEquation object at 0x7fbb31678b00>' must be None

    I think it would be better if the first value was the string value of the attribute and the 2nd was something like the classname, id and name of the Model. Thanks

    enhancement 
    opened by artgoldberg 4
  • Allow overriding of attributes when inheriting from classes in wc_kb

    Allow overriding of attributes when inheriting from classes in wc_kb

    Currently, overriding attributes managed by the ManyToX relationship manager gives for example the following error: ValueError: Attribute "obj_tables.core" of class "Evidence" inherited from "object" must be a subclass of StringAttribute because the attribute is already defined in the superclass.

    question 
    opened by paulflang 3
  • How to define the name of a schema?

    How to define the name of a schema?

    Hi Jonathan,

    on the documentation page, in "Tabular format for datasets" I find

    "schema: Use this key to annotate the name of the schema of the table. If this key is set, ObjTables will check that its value matches the name of the schema used to interpret the table."

    It's not clear to me how the name of the schema is defined in the schema file itself. Is there a metadata item "name='..'"? Could you maybe clarify this in the documentation?

    Thank you!

    Best, Wolf

    documentation 
    opened by liebermeister 2
  • Retain order of tables in multi.csv or multi.tsv file

    Retain order of tables in multi.csv or multi.tsv file

    Hi Jonathan,

    I noticed that, when importing a multi-table tsv ObjTables file, the tables become reordered alphabetically. Can this be changed, to keep the original order? (I understand that alphabetical order is a good solution if the tables come from multiple tsv files).

    Thank you!

    Best, Wolf

    enhancement future 
    opened by liebermeister 2
  • obj-tables command line tool throws an error

    obj-tables command line tool throws an error

    Hi Jonathan,

    I'm trying out the obj-tables command line tool, command

    obj-tables init-schema schema.csv schema.py

    and get the error message

    Traceback (most recent call last): File "/usr/local/bin/obj-tables", line 10, in sys.exit(main()) File "/usr/local/lib/python3.6/dist-packages/obj_tables/main.py", line 270, in main app.run() File "/usr/local/lib/python3.6/dist-packages/cement/core/foundation.py", line 916, in run return_val = self.controller._dispatch() File "/usr/local/lib/python3.6/dist-packages/cement/ext/ext_argparse.py", line 808, in _dispatch return func() File "/usr/local/lib/python3.6/dist-packages/obj_tables/main.py", line 125, in _default utils.init_schema(args.in_file, out_filename=args.out_file) File "/usr/local/lib/python3.6/dist-packages/obj_tables/utils.py", line 445, in init_schema wb = wc_utils.workbook.io.read(filename) File "/usr/local/lib/python3.6/dist-packages/wc_utils/workbook/io.py", line 889, in read reader = reader_cls(path) File "/usr/local/lib/python3.6/dist-packages/wc_utils/workbook/io.py", line 705, in init path)) ValueError: path 'schema.csv' must have one glob pattern '*' in its base name

    Can you make sense of it?

    Thank you!

    All the best, Wolf

    opened by liebermeister 2
  • Improve parsing of expressions by using ASTs

    Improve parsing of expressions by using ASTs

    Support linear expressions not written in canonical polynomial form, such as x * 2. Expand on the work in LinearParsedExpressionValidator which uses Python's ast to parse and analyze expressions.

    enhancement 
    opened by artgoldberg 0
  • Documentation warning about aliased references

    Documentation warning about aliased references

    Because obj tables Model instances are indexed by unique keys, unexpected behavior can arise from aliased references. E.g., suppose that a person using wc_lang creates two reactions Models that use the same flux_bounds (another Model) in a whole-cell model model. Let the reaction ids be 'ex_m1' & 'ex_m2'. Then these expressions refer to the two reactions

    model.reactions.get_one(id='ex_m1')
    model.reactions.get_one(id='ex_m2')
    

    And these expressions refer to the FluxBounds Model that they share.

    model.reactions.get_one(id='ex_m1').flux_bounds
    model.reactions.get_one(id='ex_m2').flux_bounds
    

    Thus, any changes to one flux_bounds entity will change the other. These two references to one FluxBounds Model are called 'aliases' in Computer Science, which has studied aliasing deeply, especially in computer language and compiler theory (google scholar "mark wegman aliasing").

    Aliasing is likely to cause problems for obj model users that don't expect aliasing, i.e., that don't realize that multiple references point to the instance. E.g., model.reactions.get_one(id='ex_m1').flux_bounds.min *= 10 would also change the value of model.reactions.get_one(id='ex_m2').flux_bounds.min which might not be expected.

    The only possible technical change obj tables could make to address this would be to avoid reusing objects just on the basis of data values in unique keys. However, that would likely have other consequence.

    Other than that, there's nothing technical obj tables can do to address this. Warnings aren't appropriate, and would almost certainly be ignored; and errors aren't appropriate. So the best thing to do is alert users to this issue in the obj tables documentation.

    discussion 
    opened by artgoldberg 0
  • Uncontrolled columns - import and export

    Uncontrolled columns - import and export

    Hi Jonathan,

    I'm sorry for bringing this up again, it's about the "uncontrolled" columns (those that do not start with a "!").

    If I remember correctly, the parser does not ignore them, but reads them and attaches their content to the data objects as "comments" or extra information. For my own intended usage of ObjTables (for the results of Enzyme Cost Minimisation) it would be important that this information can also be exported again, such that after a round of import and export, the "uncontrolled" columns would still exist. I think this is not the case right now.

    Could you implement this?

    Thank you!!

    Best, Wolf

    enhancement future 
    opened by liebermeister 1
  • Grouped columns - optional alternative syntax?

    Grouped columns - optional alternative syntax?

    Hi Jonathan,

    we talked about this already, but I'd like to bring this up again.

    Since SBtab does not support grouped tables, it will not be able to import ObjTables files that use this feature. Therefore, it would be great if there was an option in ObjTables that allows for import and export of data files in an alternative form. Instead of

     | !GROUP |       |
    

    !A | !X | !Y |!B

    this form could read, for example,

    !A | !::GROUP::X | !::GROUP::Y | !B

    It's not urgent, but it would be good to know whether you would be willing to implement this in principle.

    Thank you!

    Best, Wolf

    enhancement future 
    opened by liebermeister 4
  • Add more concise documentation of arguments for the attributes

    Add more concise documentation of arguments for the attributes

    Hi Jonathan,

    In the list of datatypes on the website, it could be helpful to give an example string for each datatype (or at least those where strings are not lengthy). In most cases the sytax is obvious, but with Boolean values, lists, or dates, for example, the syntax may be less clear. If this makes the website to crowded (which I think it may do), maybe there could be an extra dataset file that contains one item for each datatype, just for the sake of an example?

    Best, Wolf

    future documentation 
    opened by liebermeister 5
Releases(1.0.11)
Owner
Karr whole-cell modeling lab
Developing whole-cell computational models to predict and engineer biology.
Karr whole-cell modeling lab
Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

pyexcel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenue ge

1.1k Dec 29, 2022
A Python module for creating Excel XLSX files.

XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula

John McNamara 3.1k Dec 29, 2022
Excel-report-evaluator - A simple Python GUI application to aid with bulk evaluation of Microsoft Excel reports.

Excel Report Evaluator Simple Python GUI with Tkinter for evaluating Microsoft Excel reports (.xlsx-Files). Usage Start main.py and choose one of the

Alexander H. 1 Dec 29, 2021
Xiaobo Zhang 30 Jan 08, 2023
According to the received excel file (.xlsx,.xlsm,.xltx,.xltm), it converts to word format with a given table structure and formatting

According to the received excel file (.xlsx,.xlsm,.xltx,.xltm), it converts to word format with a given table structure and formatting

Diakonov Andrey 2 Feb 18, 2022
A suite of utilities for converting to and working with CSV, the king of tabular file formats.

csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats. It is inspired by pdftk, GDAL and th

wireservice 5.2k Dec 31, 2022
A wrapper library to read, manipulate and write data in xlsx and xlsm format using openpyxl

pyexcel-xlsx - Let you focus on data, instead of xlsx format pyexcel-xlsx is a tiny wrapper library to read, manipulate and write data in xlsx and xls

110 Nov 16, 2022
Library to create spreadsheet files compatible with MS Excel 97/2000/XP/2003 XLS files, on any platform.

xlwt This is a library for developers to use to generate spreadsheet files compatible with Microsoft Excel versions 95 to 2003. The package itself is

1k Dec 24, 2022
A set of Python scripts for finding threats in Office365

Py365 A collection of scripts for finding threats in Office365 Risky Rules A tool for finding risky or suspicious inbox rules - more detail in this po

Martin Rothe 49 May 18, 2022
xlwings is a BSD-licensed Python library that makes it easy to call Python from Excel and vice versa. It works with Microsoft Excel on Windows and macOS.

xlwings - Make Excel fly with Python! xlwings (Open Source) xlwings is a BSD-licensed Python library that makes it easy to call Python from Excel and

xlwings 2.5k Jan 06, 2023
Reads Data from given Excel File and exports Single PDFs and a complete PDF grouped by Gateway

E-Shelter Excel2QR Reads Data from given Excel File and exports Single PDFs and a complete PDF grouped by Gateway Features Reads Excel 2021 Export Sin

Stefan Knaak 1 Nov 13, 2021
xlwings is a BSD-licensed Python library that makes it easy to call Python from Excel and vice versa. It works with Microsoft Excel on Windows and macOS. Sign up for the newsletter or follow us on twitter via

xlwings - Make Excel fly with Python! xlwings CE xlwings CE is a BSD-licensed Python library that makes it easy to call Python from Excel and vice ver

xlwings 2.5k Jan 06, 2023
Use a docx as a jinja2 template

python-docx-template Use a docx as a jinja2 template Introduction This package uses 2 major packages : python-docx for reading, writing and creating s

Eric Lapouyade 1.4k Dec 28, 2022
PyExcelerate - Accelerated Excel XLSX Writing Library for Python 2/3

PyExcelerate Accelerated Excel XLSX writing library for Python master: dev: test coverage: Authors: Kevin Wang and Kevin Zhang Copyright 2015 Kevin Wa

448 Dec 28, 2022
ExcelPeek is a tool designed to help investigate potentially malicious Microsoft Excel files.

ExcelPeek is a tool designed to help investigate potentially malicious Microsoft Excel files.

James Slaughter 37 Apr 16, 2022
Universal Office Converter - Convert between any document format supported by LibreOffice/OpenOffice.

Automated conversion and styling using LibreOffice Universal Office Converter (unoconv) is a command line tool to convert any document format that Lib

2.4k Jan 03, 2023
Upload an Excel/CSV file ( < 200 MB) and produce a short summary of the data.

Data-Analysis-Report Deployed App 1. What is this app? Upload an excel/csv file and produce a summary report of the data. 2. Where to upload? How to p

Easwaran T H 0 Feb 26, 2022
PowerShell module to import/export Excel spreadsheets, without Excel

PowerShell + Excel = Better Together Automate Excel via PowerShell without having Excel installed. Runs on Windows, Linux and MAC. Creating Tables, Pi

Doug Finke 2k Dec 30, 2022
Transpiler for Excel formula like language to Python. Support script and module mode

Transpiler for Excel formula like language to Python. Support script and module mode (formulas are functions).

Edward Villegas-Pulgarin 1 Dec 07, 2021
Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.

Tablib: format-agnostic tabular dataset library _____ ______ ___________ ______ __ /_______ ____ /_ ___ /___(_)___ /_ _ __/_ __ `/__ _

Jazzband 4.2k Dec 30, 2022