LightCSV - A CSV reader implemented in pure Python.

Overview

LightCSV

Python 3.8 Python 3.9 Code style: black

Simple light CSV reader

This CSV reader is implemented in pure Python. It lets you specify a separator, a quote character, and column titles (or take the first row as titles). Nothing more, nothing less.

Usage

Usage is pretty straightforward:

from lightcsv import LightCSV

for row in LightCSV().read_file("myfile.csv"):
    print(row)

This opens a file named myfile.csv and iterates over it, returning each row as a key-value dictionary. Line endings can be either \n or \r\n. The file is opened in text mode with UTF-8 encoding.

You can also supply your own stream (i.e. an open file instead of a filename). This is useful, for example, to read a file with a different encoding:

from lightcsv import LightCSV

with open("myfile.csv", encoding="latin-1") as f:  # any encoding accepted by open()
    for row in LightCSV().read(f):
        print(row)

NOTE: Blank lines at any point in the file will be ignored.

Parameters

LightCSV can be parametrized during initialization to fine-tune its behaviour.

The following example shows initialization with default parameters:

from lightcsv import LightCSV

myCSV_reader = LightCSV(
    separator=",",
    quote_char='"',
    field_names=None,
    strict=True,
    has_headers=False
)

Available settings:

  • separator: character used as separator (defaults to ,)
  • quote_char: character used to quote strings (defaults to ").
    This char can be escaped by duplicating it.
  • field_names: can be any iterable or sequence of str (e.g. a list of strings).
    If set, these are used as column titles (dictionary keys) and also set the expected number of columns.
  • strict: sets whether the parser runs in strict mode or not.
    In strict mode the parser raises a ValueError exception if a cell cannot be decoded or the number of columns doesn't match. In non-strict mode unrecognized cells are returned as strings. If there are more columns than expected, the extra ones are ignored. If there are fewer, the dictionary simply contains fewer values.
  • has_headers: whether the first row should be taken as column titles or not.
    If set, field_names cannot be specified. If not set, and no field names are specified, dictionary keys will be just the column positions of the cells.
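
The way strict and field_names interact can be illustrated with a small stand-alone sketch. This is not LightCSV's actual code: row_to_dict is a hypothetical helper written only for illustration, and the use of 0-based column positions as keys is an assumption.

```python
from typing import Any, Dict, Optional, Sequence


def row_to_dict(
    cells: Sequence[Any],
    field_names: Optional[Sequence[str]] = None,
    strict: bool = True,
) -> Dict[Any, Any]:
    """Hypothetical helper: map one row of decoded cells to a dictionary."""
    if field_names is None:
        # No column titles: keys are the column positions (0-based here,
        # which is an assumption of this sketch).
        return dict(enumerate(cells))
    if strict and len(cells) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} columns, got {len(cells)}"
        )
    # zip() stops at the shorter input, so extra cells are dropped and
    # missing cells simply produce fewer keys.
    return dict(zip(field_names, cells))
```

With strict=False, a row of three cells and two field names keeps only the first two cells, while a row with a single cell yields a dictionary with a single key.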

Data types recognized

The parser tries to match the following types, in this order:

  • None (empty values). Unlike the standard csv reader, it returns None (null) for empty values.
    Empty strings ("") are recognized correctly.
  • str (strings): anything that is quoted with the quote char. The default quote char is ".
    If the string contains a quote, it must be escaped by duplicating it, e.g. "HELLO ""WORLD""" decodes to the string HELLO "WORLD".
  • int (integers): an integer with a preceding optional sign.
  • float: any float recognized by Python
  • datetime: a datetime in ISO format (with 'T' or whitespace in the middle), like 2022-02-02 22:02:02
  • date: a date in ISO format, like 2022-02-02
  • time: a time in ISO format, like 22:02:02

If all these parsing attempts fail, a string is returned, unless strict is set to True. In the latter case, a ValueError exception is raised.
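
The recognition order above can be sketched using only the standard library. This is a minimal illustration, not LightCSV's real implementation: parse_cell, RE_INT and _iso_datetime are hypothetical names, and the exact matching rules of the library may differ.

```python
import re
from datetime import date, datetime, time
from typing import Any

RE_INT = re.compile(r"[+-]?\d+$")  # integer with an optional leading sign


def _iso_datetime(chunk: str) -> datetime:
    # datetime.fromisoformat() also accepts a bare date, so insist on a
    # time part to keep "datetime" and "date" distinct.
    if len(chunk) <= 10:
        raise ValueError("no time part")
    return datetime.fromisoformat(chunk)


def parse_cell(chunk: str, quote_char: str = '"', strict: bool = True) -> Any:
    """Hypothetical stand-in for the documented recognition order."""
    if chunk == "":
        return None
    if len(chunk) >= 2 and chunk[0] == quote_char and chunk[-1] == quote_char:
        # A doubled quote char inside the string stands for a single one.
        return chunk[1:-1].replace(quote_char * 2, quote_char)
    if RE_INT.match(chunk):
        return int(chunk)
    # Remaining types, tried in the documented order.
    for convert in (float, _iso_datetime, date.fromisoformat, time.fromisoformat):
        try:
            return convert(chunk)
        except ValueError:
            continue
    if strict:
        raise ValueError(f"cannot decode cell: {chunk!r}")
    return chunk
```

In this sketch an unquoted word such as hello raises ValueError in strict mode and comes back unchanged as a string when strict=False.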

Implementing your own type recognizer

You can implement your own deserialization by subclassing LightCSV and overriding the method parse_obj().

For example, suppose we want to recognize hexadecimal integers in the format 0xNNN.... We can implement it this way:

import re
from lightcsv import LightCSV

RE_HEXA = re.compile(r'0[xX][0-9A-Fa-f]+$')  # matches 0xNNNN (hexadecimal digits only)


class CSVHexRecognizer(LightCSV):
    def parse_obj(self, lineno: int, chunk: str):
        if RE_HEXA.match(chunk):
            return int(chunk[2:], 16)
        
        return super().parse_obj(lineno, chunk)

As you can see, you have to override parse_obj(). If your match fails, call the parent implementation through super().parse_obj() and return its result.


Why

Python's built-in csv module is a bit over-engineered for simple tasks, and one normally doesn't need all the bells and whistles. With LightCSV you just pass a filename and iterate over its rows.

Decoding empty cells as None is needed very often, and doing it with the standard csv module can be really cumbersome. The standard module tries hard to cover many corner cases (if that's your case, this tool might not be suitable for you).
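
For comparison, here is a small sketch of what the standard csv module returns for empty cells and the extra step needed to turn them into None. The sample data and variable names are made up for illustration.

```python
import csv
import io

# Sample data: one unquoted empty cell per row, plus a quoted empty string.
data = 'name,age,nickname\nAda,36,\nBob,,""\n'

rows = list(csv.DictReader(io.StringIO(data)))

# The standard reader yields every cell as a string, so empty cells and
# explicitly quoted empty strings both arrive as "".
cleaned = [
    {key: (value if value != "" else None) for key, value in row.items()}
    for row in rows
]
```

Note that after this conversion an empty cell and a quoted empty string can no longer be told apart, and numbers are still strings that need converting by hand.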

Owner
Jose Rodriguez
Computer Scientist. Software Engineer. Opinions expressed here are solely my own and not necessarily those of my employer.