A simple XLSX/CSV reader - to dictionary converter

Overview

Test Python package codecov Upload Python Package to PyPI PythonVersions Sourcery Black Snyk Downloads Twitter Follow

sheet2dict

A simple XLSX/CSV reader - to dictionary converter

Installing

To install the package from pip, first run:

python3 -m pip install --no-cache-dir sheet2dict

Required pip packages for sheet2doc: csv, openpyxl

Usage

This library has 2 main features: reading a spreadsheet files and converting them to array of python dictionaries.

- XLSX

Use xlsx_to_dict() method when converting form spreadsheets.
Supported file formats for spreadsheets are: .xlsx,.xlsm,.xltx,.xltm
Spreadsheets with multiple worksheets are supported. If no sheet is specified, the active sheet is selected. If there is only one sheet, it is considered active.

# Import the library
from sheet2dict import Worksheet

# Create an object
ws = Worksheet()

# Convert active sheet (without specifying sheet name)
ws.xlsx_to_dict(path='inventory.xlsx')

# Convert the 'Main Warehouse' sheet of the 'inventory.xslx' spreadsheet file.
ws.xlsx_to_dict(path='inventory.xlsx', select_sheet='Main Warehouse')

# object.header returns first row with the data in a spreadsheet 
print(ws.header)

# object.sheet_items returns converted rows as dictionaries in the array 
print(ws.sheet_items)

You can parse data when worksheet is an object

# Import the library
from sheet2dict import Worksheet

# Example: read spreadsheet as object
path = 'inventory.xlsx'
xlsx_file = open(path, 'rb')
xlsx_file = BytesIO(xlsx_file.read())

# Parse spreadsheet from object
ws = Worksheet()
ws.xlsx_to_dict(path=xlsx_file)
print(ws.header)

- CSV

Use csv_to_dict() method when converting form csv.
CSV is a format with many variations, better handle encodings and delimiters on user side and not within module itself.

# Import the library
from sheet2dict import Worksheet

# Create an object
ws = Worksheet()

# Read CSV file
csv_file = open('inventory.csv', 'r', encoding='utf-8-sig')

# Convert 
ws.csv_to_dict(csv_file=csv_file, delimiter=';')

# object.header returns first row with the data in a spreadsheet 
print(ws.header)

# object.sheet_items returns converted rows as dictionaries in the array 
print(ws.sheet_items)

- Other functions

Worksheet object.header returns first row with the data in a spreadsheet

>> ws.header {'country': 'SK', 'city': 'Bratislava', 'citizens': '400000', 'random_field': 'cc'}">
Python 3.9.1
[Clang 12.0.0 (clang-1200.0.32.28)] on darwin
>>> from sheet2dict import Worksheet
>>> ws = Worksheet()
>>> ws.xlsx_to_dict(path="inventory.xlsx")

>>> ws.header
{'country': 'SK', 'city': 'Bratislava', 'citizens': '400000', 'random_field': 'cc'}

Worksheet object.sanitize_sheet_items removes None or empty dictionary keys from sheet_items

>> ws.sheet_items [ {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', None: '22', 'random_field': 'cc'}, {'country': 'UK', 'city': 'London', 'citizens': '2000000', None: '33', 'random_field': 'cc'} ] >>> ws.sanitize_sheet_items [ {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', 'random_field': 'cc'}, {'country': 'UK', 'city': 'London', 'citizens': '2000000', 'random_field': 'cc'} ]">
>>> from sheet2dict import Worksheet
>>> ws = Worksheet()
>>> ws.xlsx_to_dict(path="inventory.xlsx")

>>> ws.sheet_items
[
  {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', None: '22', 'random_field': 'cc'},
  {'country': 'UK', 'city': 'London', 'citizens': '2000000', None: '33', 'random_field': 'cc'}
]

>>> ws.sanitize_sheet_items
[
  {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', 'random_field': 'cc'},
  {'country': 'UK', 'city': 'London', 'citizens': '2000000', 'random_field': 'cc'}
]

Contributing and Code of Conduct

Contributing to sheet2dict

As an open source project, sheet2dict welcomes contributions of many forms.
Please read and follow our Contributing to sheet2dict

Contributors:

  • Thanks to 白一百 (bái-yī-bǎi) for making sheet2dict work with multi-sheet Excel files.

Code of Conduct

As a contributor, you can help us keep the sheet2dict project open and inclusive.
Please read and follow our Code of Conduct

You might also like...
SCTYMN is a GitHub repository that includes some simple scripts(currently only python scripts) that can be useful.

Simple Codes That You Might Need SCTYMN is a GitHub repository that includes some simple scripts(currently only python scripts) that can be useful. In

Żmija is a simple universal code generation tool.

Żmija Żmija is a simple universal code generation tool. It is intended to be used as a means to generate code that is both efficient and easily mainta

 Simple yet powerful CAD (Computer Aided Design) library, written with Python.
Simple yet powerful CAD (Computer Aided Design) library, written with Python.

Py-MADCAD it's time to throw parametric softwares out ! Simple yet powerful CAD (Computer Aided Design) library, written with Python. Installation

A simple USI Shogi Engine written in python using python-shogi.

Revengeshogi My attempt at creating a USI Shogi Engine in python using python-shogi. Current State of Engine Currently only generating random moves us

A simple document management REST based API for collaboratively interacting with documents

documan_api A simple document management REST based API for collaboratively interacting with documents.

Django-Text-to-HTML-converter - The simple Text to HTML Converter using Django framework

Django-Text-to-HTML-converter This is the simple Text to HTML Converter using Dj

Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

pyexcel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenue ge

A flask extension using pyexcel to read, manipulate and write data in different excel formats: csv, ods, xls, xlsx and xlsm.

Flask-Excel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenu

Scrapes mcc-mnc.com and outputs 3 files with the data (JSON, CSV & XLSX)

mcc-mnc.com-webscraper Scrapes mcc-mnc.com and outputs 3 files with the data (JSON, CSV & XLSX) A Python script for web scraping mcc-mnc.com Link: mcc

LightCSV - This CSV reader is implemented in just pure Python.

LightCSV Simple light CSV reader This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles

JSON and CSV data for Swahili dictionary with over 16600+ words

kamusi JSON and CSV data for swahili dictionary with over 16600+ words. This repo consists of data from swahili dictionary with about 16683 words toge

Sheet Data Image/PDF-to-CSV Converter

Sheet Data Image/PDF-to-CSV Converter

DB-Drive-CSV - This is app is can be used to access CSV file as JSON from Google Drive.

DB Drive CSV This is app is can be used to access CSV file as JSON from Google Drive. How To Use Create file/ upload file to Google Drive There's 2 fi

Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.
Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.

scc4onnx Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel

A Python module for creating Excel XLSX files.
A Python module for creating Excel XLSX files.

XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula

A Python module for creating Excel XLSX files.
A Python module for creating Excel XLSX files.

XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula

É uma API feita em Python e Flask que pesquisa informações em uma tabela .xlsx e retorna o resultado.
É uma API feita em Python e Flask que pesquisa informações em uma tabela .xlsx e retorna o resultado.

API de rastreamento de pacotes É uma API feita em Python e Flask que pesquisa informações de rastreamento de pacotes em uma tabela .xlsx e retorna o r

Transfers a image file(.png) to an Excel file(.xlsx)
Transfers a image file(.png) to an Excel file(.xlsx)

Transfers a image file(.png) to an Excel file(.xlsx)

An XLSX spreadsheet renderer for Django REST Framework.

drf-renderer-xlsx provides an XLSX renderer for Django REST Framework. It uses OpenPyXL to create the spreadsheet and returns the data.

Comments
  • Empty cells in xlsx become the non-empty string

    Empty cells in xlsx become the non-empty string "None" in the dictionary

    Thank you for writing this. I have recently moved from using csv files to excel files and this has made the transition easier.

    One issue I have encountered is that empty cells are not being returned as a Falsey value like None, or the empty string ''.

    From what I have been able to understand, openpyxl reads the empty cell as None, and the following code in sheet2dict uses str to convert this into the string 'None'.

    https://github.com/Pytlicek/sheet2dict/blob/93bf731d327d620755ffd5585aedb0b23a17a6a8/sheet2dict/main.py#L30-L34

    I have filtered my own dictionaries to turn 'None' into '' but that will cause issues when any of my sheet really do contain the word None.

    I just did some quick testing, and it seems that the csv.DictReader() in the python standard library uses the empty string '' for missing values in the a csv file, so that might be a good thing to do for these cases.

    enhancement 
    opened by JoshuaCapel 5
  • Code Proposal: Allow User to Specify Sheet (Work with multi-sheet Excel Files)

    Code Proposal: Allow User to Specify Sheet (Work with multi-sheet Excel Files)

    This library is awesome! I'm using it a lot since it makes working with Excel files very simple.

    I have a proposal: Allow the user to specify the sheet they want to modify.

    I write a lot of functional programming and I can't remember if it's possible to overwrite a class function at import, so instead I've modified the sheet2dict to return a specified sheet when openpyxl opens the file.

    in main.py:

    def xlsx_to_dict(self, path, select_sheet=None):
            """
            Read a Worksheet and return it as array of dictionaries
            :param self: Worksheet Object
            :param path: Path to XLSX file
            :return: Array of rows as dictionaries
            """
            book = load_workbook(path)
            if select_sheet == None:
                  sheet = book.active
            else:
                  sheet = book[select_sheet]
    

    The user would then read an Excel file like this:

    ws =Worksheet()
    ws.xlsx_to_dict(path=<insert path>, select_sheet='<sheetname>'
    

    I don't know if this is the most elegant way to select a sheet, but it works for me. I suppose to import all worksheets by default - I think the maximum is 32 - into a larger dictionary, <edit I didn't complete my thought>, that there could be a clever way to put dictionaries in dictionaries, but this might be too convoluted since I assume most users will just use one sheet. Another option is just to continue creating Worksheet() objects with different names and sheets selected.

    enhancement 
    opened by bai-yi-bai 3
  • Add an option to parse all worksheets in one function call

    Add an option to parse all worksheets in one function call

    Added an option to parse all sheets from a workbook at once while keeping the way existing features worked untouched.

    Had to modify sheet_items to a dict with sheet names as keys and sheet's objects as values.

    enhancement 
    opened by 68Kamil68 2
  • Update README.md

    Update README.md

    Changed "the first sheet" to "active sheet" to avoid confusion if someone saves a spreadsheet and the active worksheet is, fe. second in line. Also, what is considered an active worksheet when there is only one worksheet in the spreadsheet. Add an example with and without the "select_sheet" option.

    documentation 
    opened by Pytlicek 1
Releases(0.1.5)
Owner
Tomas Pytel
PyConSK Organizer, Python Developer & DevOps
Tomas Pytel
Materi workshop "Light up your Python!" Himpunan Mahasiswa Sistem Informasi Fakultas Ilmu Komputer Universitas Singaperbangsa Karawang, 4 September 2021 (Online via Zoom).

Workshop Python UNSIKA 2021 Materi workshop "Light up your Python!" Himpunan Mahasiswa Sistem Informasi Fakultas Ilmu Komputer Universitas Singaperban

Eka Putra 20 Mar 24, 2022
Mayan EDMS is a document management system.

Mayan EDMS is a document management system. Its main purpose is to store, introspect, and categorize files, with a strong emphasis on preserving the contextual and business information of documents.

3 Oct 02, 2021
Lightweight, configurable Sphinx theme. Now the Sphinx default!

What is Alabaster? Alabaster is a visually (c)lean, responsive, configurable theme for the Sphinx documentation system. It is Python 2+3 compatible. I

Jeff Forcier 670 Dec 19, 2022
This is the repository that includes the code material for the ESweek 2021 for the Education Class Lecture A3 "Learn to Drive (and Race!) Autonomous Vehicles"

ESweek2021_educationclassA3 This is the repository that includes the code material for the ESweek 2021 for the Education Class Lecture A3 "Learn to Dr

F1TENTH Autonomous Racing Community 29 Dec 06, 2022
YAML metadata extension for Python-Markdown

YAML metadata extension for Python-Markdown This extension adds YAML meta data handling to markdown with all YAML features. As in the original, metada

Nikita Sivakov 14 Dec 30, 2022
epub2sphinx is a tool to convert epub files to ReST for Sphinx

epub2sphinx epub2sphinx is a tool to convert epub files to ReST for Sphinx. It uses Pandoc for converting HTML data inside epub files into ReST. It cr

Nihaal 8 Dec 15, 2022
DataRisk Detection Learning Resources

DataRisk Detection Learning Resources Data security: Based on the "data-centric security system" position, it generally refers to the entire security

Liao Wenzhe 59 Dec 05, 2022
Demonstration that AWS IAM policy evaluation docs are incorrect

The flowchart from the AWS IAM policy evaluation documentation page, as of 2021-09-12, and dating back to at least 2018-12-27, is the following: The f

Ben Kehoe 15 Oct 21, 2022
Highlight Translator can help you translate the words quickly and accurately.

Highlight Translator can help you translate the words quickly and accurately. By only highlighting, copying, or screenshoting the content you want to translate anywhere on your computer (ex. PDF, PPT

Coolshan 48 Dec 21, 2022
PowerApps-docstring is a console based, pipeline ready application that automatically generates user and technical documentation for Power Apps.

powerapps-docstring PowerApps-docstring is a console based, pipeline ready application that automatically generates user and technical documentation f

Sebastian Muthwill 30 Nov 23, 2022
100 numpy exercises (with solutions)

100 numpy exercises This is a collection of numpy exercises from numpy mailing list, stack overflow, and numpy documentation. I've also created some p

Nicolas P. Rougier 9.5k Dec 30, 2022
Paper and Code for "Curriculum Learning by Optimizing Learning Dynamics" (AISTATS 2021)

Curriculum Learning by Optimizing Learning Dynamics (DoCL) AISTATS 2021 paper: Title: Curriculum Learning by Optimizing Learning Dynamics [pdf] [appen

Tianyi Zhou 15 Dec 06, 2022
A python package to avoid writing and maintaining duplicated python docstrings.

docstring-inheritance is a python package to avoid writing and maintaining duplicated python docstrings.

Antoine Dechaume 15 Dec 07, 2022
Sphinx Bootstrap Theme

Sphinx Bootstrap Theme This Sphinx theme integrates the Bootstrap CSS / JavaScript framework with various layout options, hierarchical menu navigation

Ryan Roemer 584 Nov 16, 2022
An interview engine for businesses, interview those who are actually qualified and are worth your time!

easyInterview V0.8B An interview engine for businesses, interview those who are actually qualified and are worth your time! Quick Overview You/the com

Vatsal Shukla 1 Nov 19, 2021
Essential Document Generator

Essential Document Generator Dead Simple Document Generation Whether it's testing database performance or a new web interface, we've all needed a dead

Shane C Mason 59 Nov 11, 2022
Markdown documentation generator from Google docstrings

mkgendocs A Python package for automatically generating documentation pages in markdown for Python source files by parsing Google style docstring. The

Davide Nunes 44 Dec 18, 2022
This is the data scrapped of all the pitches made up potential startup's to established bussiness tycoons of India with all the details of Investments made, equity share, Name of investor etc.

SharkTankInvestor This is the data scrapped of all the pitches made up potential startup's to established bussiness tycoons of India with all the deta

Subradip Poddar 2 Aug 02, 2022
Types that make coding in Python quick and safe.

Type[T] Types that make coding in Python quick and safe. Type[T] works best with Python 3.6 or later. Prior to 3.6, object types must use comment type

Contains 17 Aug 01, 2022
Openapi-core is a Python library that adds client-side and server-side support for the OpenAPI Specification v3.

Openapi-core is a Python library that adds client-side and server-side support for the OpenAPI Specification v3.

A 186 Dec 30, 2022