A fast streaming JSON parser for Python that generates SAX-like events using yajl

json-streamer

jsonstreamer provides a SAX-like push parser via the JSONStreamer class and an 'object' parser via the ObjectStreamer class, which emits the top-level entities in any JSON object. It is based on the fast C library 'yajl'. It is well suited to parsing JSON that streams in over a network, or JSON objects that are too large to hold in memory all at once.

Dependencies

git clone git@github.com:lloyd/yajl.git
cd yajl
./configure && make install

Setup

pip3 install jsonstreamer

Also available on PyPI - https://pypi.python.org/pypi/jsonstreamer

Example

Shell

python -m jsonstreamer.jsonstreamer < some_file.json

Code

variables which contain the input we want to parse

json_object = """
    {
        "fruits":["apple","banana", "cherry"],
        "calories":[100,200,50]
    }
"""
json_array = """[1,2,true,[4,5],"a"]"""

a catch-all event listener function which prints the events

def _catch_all(event_name, *args):
    print('\t{} : {}'.format(event_name, args))

JSONStreamer Example

Event listeners receive event payloads as arguments, so each listener's signature must match the event it handles.

JSONStreamer provides the following events:

  • doc_start
  • doc_end
  • object_start
  • object_end
  • array_start
  • array_end
  • key - this also carries the name of the key as a string param
  • value - this also carries the value as a string|int|float|boolean|None param
  • element - this also carries the value as a string|int|float|boolean|None param
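To make the event order concrete, here is a minimal pure-Python sketch that emits the same event names while walking already-parsed JSON. It is only an illustration: it uses the standard library's json.loads rather than jsonstreamer or yajl, so it is not streaming, and emit_events is a hypothetical helper, not part of the library. It also assumes that 'value' fires for a scalar that follows a key and 'element' for a scalar inside an array, which is how the list above reads.

```python
import json


def emit_events(text, listener):
    """Parse text with the stdlib, then call listener(event_name, *args)
    in a SAX-like order. Illustration only, not jsonstreamer itself."""

    def walk(node, in_array):
        if isinstance(node, dict):
            listener('object_start')
            for key, value in node.items():
                listener('key', key)
                walk(value, in_array=False)
            listener('object_end')
        elif isinstance(node, list):
            listener('array_start')
            for item in node:
                walk(item, in_array=True)
            listener('array_end')
        else:
            # Assumption: scalars in arrays are 'element', scalars after a key are 'value'.
            listener('element' if in_array else 'value', node)

    listener('doc_start')
    walk(json.loads(text), in_array=False)
    listener('doc_end')


events = []
emit_events('{"name": "apple", "sizes": [1, 2]}',
            lambda name, *args: events.append((name, args)))
for name, args in events:
    print('{} : {}'.format(name, args))
```

The real JSONStreamer produces these events incrementally as input arrives, which this sketch does not attempt to reproduce.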

Listener signatures must match the event they handle.

For the events doc_start, doc_end, object_start, object_end, array_start and array_end, the listener takes no parameters:

def listener():
    pass

Or, if your listener is a method on a class, it takes only the additional 'self' parameter:

def listener(self):
    pass

For the events key, value and element, the listener must also accept the payload as a parameter:

def key_listener(key_string):
    pass

import and run jsonstreamer on 'json_object'

from jsonstreamer import JSONStreamer 

print("\nParsing the json object:")
streamer = JSONStreamer() 
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_object[0:10]) #note that partial input is possible
streamer.consume(json_object[10:])
streamer.close()

output

Parsing the json object:
    doc_start : ()
    object_start : ()
    key : ('fruits',)
    array_start : ()
    element : ('apple',)
    element : ('banana',)
    element : ('cherry',)
    array_end : ()
    key : ('calories',)
    array_start : ()
    element : (100,)
    element : (200,)
    element : (50,)
    array_end : ()
    object_end : ()
    doc_end : ()

run jsonstreamer on 'json_array'

print("\nParsing the json array:")
streamer = JSONStreamer() #can't reuse old object, make a fresh one
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_array[0:5])
streamer.consume(json_array[5:])
streamer.close()

output

Parsing the json array:
    doc_start : ()
    array_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    array_start : ()
    element : (4,)
    element : (5,)
    array_end : ()
    element : ('a',)
    array_end : ()
    doc_end : ()

ObjectStreamer Example

ObjectStreamer provides the following events:

  • object_stream_start
  • object_stream_end
  • array_stream_start
  • array_stream_end
  • pair
  • element

import and run ObjectStreamer on 'json_object'

from jsonstreamer import ObjectStreamer

print("\nParsing the json object:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_object[0:9])
object_streamer.consume(json_object[9:])
object_streamer.close()

output

Parsing the json object:
    object_stream_start : ()
    pair : (('fruits', ['apple', 'banana', 'cherry']),)
    pair : (('calories', [100, 200, 50]),)
    object_stream_end : ()

run the ObjectStreamer on the 'json_array'

print("\nParsing the json array:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_array[0:4])
object_streamer.consume(json_array[4:])
object_streamer.close()

output - note that the events are different for an array

Parsing the json array:
    array_stream_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    element : ([4, 5],)
    element : ('a',)
    array_stream_end : ()
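The difference between the two root types can be sketched in plain Python. This is an illustration of the event shapes shown above, not the library's implementation: emit_top_level is a hypothetical helper, and it parses the whole document with json.loads instead of streaming it.

```python
import json


def emit_top_level(text, listener):
    """Call listener(event_name, *args) once per top-level entity.
    Illustration only; the real ObjectStreamer works on partial input."""
    root = json.loads(text)
    if isinstance(root, dict):
        listener('object_stream_start')
        for key, value in root.items():
            listener('pair', (key, value))  # one (key, value) tuple per top-level key
        listener('object_stream_end')
    elif isinstance(root, list):
        listener('array_stream_start')
        for item in root:
            listener('element', item)  # nested arrays arrive whole, e.g. [4, 5]
        listener('array_stream_end')
    else:
        raise ValueError('expected a JSON object or array at the top level')


top_level_events = []
emit_top_level('[1,2,true,[4,5],"a"]',
               lambda name, *args: top_level_events.append((name, args)))
for name, args in top_level_events:
    print('{} : {}'.format(name, args))
```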

Example on attaching listeners for various events

ob_streamer = ObjectStreamer()

def pair_listener(pair):
    print('Explicit listener: Key: {} - Value: {}'.format(pair[0],pair[1]))
    
ob_streamer.add_listener('pair', pair_listener) #same for JSONStreamer
ob_streamer.consume(json_object)

ob_streamer.remove_listener(pair_listener) #if you need to remove the listener explicitly

Even easier way of attaching listeners

class MyClass:
    
    def __init__(self):
        self._obj_streamer = ObjectStreamer() #same for JSONStreamer
        
        # this automatically finds listeners in this class and attaches them if they are named
        # using the convention '_on_eventname'; note the method names in this class
        self._obj_streamer.auto_listen(self) 
    
    def _on_object_stream_start(self):
        print('Root Object Started')
        
    def _on_pair(self, pair):
        print('Key: {} - Value: {}'.format(pair[0],pair[1]))
        
    def parse(self, data):
        self._obj_streamer.consume(data)
        
        
m = MyClass()
m.parse(json_object)
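For readers curious how naming-convention discovery of this kind can work, here is a minimal sketch. It is not jsonstreamer's actual implementation: TinyEventSource, Recorder and fire are hypothetical names used only for illustration, and the only thing taken from the example above is the '_on_eventname' convention.

```python
class TinyEventSource:
    """Hypothetical event source that attaches methods named '_on_<event>'."""

    PREFIX = '_on_'

    def __init__(self):
        self._listeners = {}

    def add_listener(self, event_name, listener):
        self._listeners.setdefault(event_name, []).append(listener)

    def auto_listen(self, observer):
        # Inspect the observer and register every callable whose name
        # starts with the prefix under the remainder of that name.
        for attr in dir(observer):
            if attr.startswith(self.PREFIX):
                method = getattr(observer, attr)
                if callable(method):
                    self.add_listener(attr[len(self.PREFIX):], method)

    def fire(self, event_name, *args):
        # Events without listeners are silently ignored.
        for listener in self._listeners.get(event_name, []):
            listener(*args)


class Recorder:
    def __init__(self):
        self.seen = []

    def _on_object_stream_start(self):
        self.seen.append('start')

    def _on_pair(self, pair):
        self.seen.append(pair)


source = TinyEventSource()
recorder = Recorder()
source.auto_listen(recorder)
source.fire('object_stream_start')
source.fire('pair', ('fruits', ['apple']))
source.fire('array_stream_start')  # no matching method, so nothing happens
print(recorder.seen)
```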

Troubleshooting

  • If you get an OSError('Yajl cannot be found.'), please ensure that libyajl is available in the relevant directory. For example, on macOS, /usr/local/lib should contain "libyajl.dylib". The equivalent library file is libyajl.so on Linux and yajl.dll on Windows.
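As a quick diagnostic, the standard library's ctypes.util.find_library can report whether the system's default search mechanism sees a yajl library at all. This is a sketch for troubleshooting only. jsonstreamer's own lookup in load_lib may search differently, so a hit here does not guarantee the package will load it, and locate_yajl is a hypothetical helper name.

```python
import ctypes.util


def locate_yajl():
    """Return the name or path the platform reports for yajl, or None."""
    return ctypes.util.find_library('yajl')


found = locate_yajl()
if found is None:
    print('yajl was not found on the default library search path')
else:
    print('yajl found as: {}'.format(found))
```

If this prints that yajl was not found, check that the install step completed and, on Linux, that the linker cache has been refreshed (for example with ldconfig).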
Comments
  • Trouble using 'jsonstreamer' with 'yajl-2' on Ubuntu 14.04

    Hey @kashifrazzaqui

    I have been trying to use your json-streamer library to implement a streaming API.

    As directed, I installed yajl on my Ubuntu 14.04 system and verified its presence and correct installation (see [1] and [2]).

    Still, running the command python3 -m jsonstreamer.jsonstreamer < test.json gives me the following:

      File "/usr/local/lib/python3.4/dist-packages/jsonstreamer/yajl/parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.
    

    Following up in https://github.com/lloyd/yajl/issues/190, it seems that there might be an issue in the parse.py file itself. Maybe it is looking for yajl1 and not yajl2?

    Any pointers on this one? Help appreciated.


    [1] Running gcc -lyajl yields:

    [email protected]:~$ gcc -lyajl
    ....
    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
    (.text+0x20): undefined reference to `main'
    collect2: error: ld returned 1 exit status
    

    [2] And sudo ldconfig -p | grep yajl results in:

    [email protected]:~$ sudo ldconfig -p | grep yajl
        libyajl.so.2 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libyajl.so.2
    
    opened by jigyasa-grover 10
  • Ensure exception __str__ methods return strings

    Hi there,

    Issues that throw JSONStreamerException classes are difficult to debug because there is no expectation that a str will be returned. This makes debugging a PITA.

    awesome_module.py", line 51, in map_step
        url + '\n' + str(e))
    TypeError: __str__ returned non-string (type bytes)
    
    opened by mach-kernel 3
  • Missing tests & tags

    PyPI has 1.3.6, but without tests.

    GitHub only has a tag for v1.0.0, so I can't use that.

    Could you tag v1.3.6 on GitHub so I can use it to get the tests and finish https://build.opensuse.org/package/show/home:jayvdb:py-new/python-jsonstreamer after https://github.com/kashifrazzaqui/again/issues/8 is also fixed?

    opened by jayvdb 2
  • SyntaxError: invalid syntax

    Traceback (most recent call last):
      File "test_jsonstreamer.py", line 3, in <module>
        from jsonstreamer import JSONStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/jsonstreamer.py", line 12, in <module>
        from again import events
      File "/usr/local/lib/python2.7/dist-packages/again/__init__.py", line 4, in <module>
        from .events import EventSource, AsyncEventSource
      File "/usr/local/lib/python2.7/dist-packages/again/events.py", line 49
        yield from each(*args, **kwargs)
                 ^
    SyntaxError: invalid syntax

    python --version
    Python 2.7.3

    opened by tuhaolam 2
  • Want to split a 22M JSON file into smaller files to track a problem

    I have a large JSON file that has an error somewhere. I want to split the JSON file into smaller files that are also valid JSON so that I can find out where the error is. Is this possible with your package?

    opened by winash12 1
  • Trouble using 'jsonstreamer' with 'yajl' on Windows 10

    Hey @kashifrazzaqui

    I have been trying to use your json-streamer library to implement a streaming API.

    As directed, I built and installed yajl on my Windows 10 system as shown below:

    C:\Users\mianand\Downloads\lloyd-yajl-2.1.0-0-ga0ecdde\lloyd-yajl-66cb08c\build>nmake install

    Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
    Copyright (C) Microsoft Corporation. All rights reserved.

    [ 30%] Built target yajl_s
    [ 60%] Built target yajl
    [ 66%] Built target yajl_test
    [ 72%] Built target gen-extra-close
    [ 78%] Built target json_reformat
    [ 84%] Built target json_verify
    [ 90%] Built target parse_config
    [100%] Built target perftest
    Install the project...
    -- Install configuration: "Release"
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.dll
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl_s.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_parse.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_gen.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_common.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_tree.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_version.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/share/pkgconfig/yajl.pc
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_reformat.exe
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_verify.exe

    Still, importing the package in a conda environment with Python 3.6 gives me the following:

    from jsonstreamer import JSONStreamer
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\jsonstreamer.py", line 14, in <module>
        from .yajl.parse import YajlParser, YajlListener, YajlError
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 32, in <module>
        yajl = load_lib()
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.

    Any pointers on this one? Help appreciated.

    opened by mitendraanand 1
  • Not looking for yajl.dll when loading Yajl

    In the method load_lib(), there is never an attempt to load Yajl from yajl.dll, which is the name of the Yajl library on Windows. I think it would be rather easy to add this and make this package useful on Windows as well.

    opened by Groomtar 1
  • pypi version ahead of master branch

    Please update the PyPI entry of json-streamer https://pypi.python.org/pypi/jsonstreamer/1.3.6 and consider linking there from the short text description here.

    opened by johnyf 1
  • outdated pypi package

    Hi,

    Could you update the PyPI package? As far as I can see, there have been some commits since the last PyPI upload. Also, I think it is a bit confusing that there is one tagged release, 1.0, while the PyPI package has version number 1.3.6, and both are almost a year older than some important fixes, e.g. the exponential floats. (I can install the file on my own, but I think it would be nice to update the releases.)

    opened by dvolgyes 0
Releases(v1.3.8)
Owner
Kashif Razzaqui
https://medium.com/@kashifrazzaqui