Python client for the Socrata Open Data API

Overview

PyPI version Build Status Code Coverage

sodapy

sodapy is a python client for the Socrata Open Data API.

Installation

You can install with pip install sodapy.

If you want to install from source, then clone this repository and run python setup.py install from the project root.

Requirements

At its core, this library depends heavily on the Requests package. All other requirements can be found in requirements.txt. sodapy is currently compatible with Python 3.5, 3.6, 3.7 and 3.8.

Documentation

The official Socrata Open Data API docs provide thorough documentation of the available methods, as well as other client libraries. A quick list of eligible domains to use with this API is available via the Socrata Discovery API or Socrata's Open Data Network.

This library supports writing directly to datasets with the Socrata Open Data API. For write operations that use data transformations in the Socrata Data Management Experience (the user interface for creating datasets), use the Socrata Data Management API. For more details on when to use SODA vs the Data Management API, see the Data Management API documentation. A Python SDK for the Socrata Data Management API can be found at socrata-py.

Examples

There are some jupyter notebooks in the examples directory with usage examples of sodapy in action.

Interface

Table of Contents

client

Import the library and set up a connection to get started.

>>> from sodapy import Socrata
>>> client = Socrata(
        "sandbox.demo.socrata.com",
        "FakeAppToken",
        username="[email protected]",
        password="mypassword",
        timeout=10
    )

username and password are only required for creating or modifying data. An application token isn't strictly required (can be None), but queries executed from a client without an application token will be subjected to strict throttling limits. You may want to increase the timeout seconds when making large requests. To create a bare-bones client:

>>> client = Socrata("sandbox.demo.socrata.com", None)

A client can also be created with a context manager to obviate the need for teardown:

>>> with Socrata("sandbox.demo.socrata.com", None) as client:
>>>    # do some stuff

The client, by default, makes requests over HTTPS. To modify this behavior, or to make requests through a proxy, take a look here.

datasets(limit=0, offset=0)

Retrieve datasets associated with a particular domain. The optional limit and offset keyword args can be used to retrieve a subset of the datasets. By default, all datasets are returned.

>>> client.datasets()
[{"resource" : {"name" : "Approved Building Permits", "id" : "msk6-43c6", "parent_fxf" : null, "description" : "Data of approved building/construction permits",...}, {resource : {...}}, ...]

get(dataset_identifier, content_type="json", **kwargs)

Retrieve data from the requested resources. Filter and query data by field name, id, or using SoQL keywords.

>>> client.get("nimj-3ivp", limit=2)
[{u'geolocation': {u'latitude': u'41.1085', u'needs_recoding': False, u'longitude': u'-117.6135'}, u'version': u'9', u'source': u'nn', u'region': u'Nevada', u'occurred_at': u'2012-09-14T22:38:01', u'number_of_stations': u'15', u'depth': u'7.60', u'magnitude': u'2.7', u'earthquake_id': u'00388610'}, {...}]

>>> client.get("nimj-3ivp", where="depth > 300", order="magnitude DESC", exclude_system_fields=False)
[{u'geolocation': {u'latitude': u'-15.563', u'needs_recoding': False, u'longitude': u'-175.6104'}, u'version': u'9', u':updated_at': 1348778988, u'number_of_stations': u'275', u'region': u'Tonga', u':created_meta': u'21484', u'occurred_at': u'2012-09-13T21:16:43', u':id': 132, u'source': u'us', u'depth': u'328.30', u'magnitude': u'4.8', u':meta': u'{\n}', u':updated_meta': u'21484', u'earthquake_id': u'c000cnb5', u':created_at': 1348778988}, {...}]

>>> client.get("nimj-3ivp/193", exclude_system_fields=False)
{u'geolocation': {u'latitude': u'21.6711', u'needs_recoding': False, u'longitude': u'142.9236'}, u'version': u'C', u':updated_at': 1348778988, u'number_of_stations': u'136', u'region': u'Mariana Islands region', u':created_meta': u'21484', u'occurred_at': u'2012-09-13T11:19:07', u':id': 193, u'source': u'us', u'depth': u'300.70', u'magnitude': u'4.4', u':meta': u'{\n}', u':updated_meta': u'21484', u':position': 193, u'earthquake_id': u'c000cmsq', u':created_at': 1348778988}

>>> client.get("nimj-3ivp", region="Kansas")
[{u'geolocation': {u'latitude': u'38.10', u'needs_recoding': False, u'longitude': u'-100.6135'}, u'version': u'9', u'source': u'nn', u'region': u'Kansas', u'occurred_at': u'2010-09-19T20:52:09', u'number_of_stations': u'15', u'depth': u'300.0', u'magnitude': u'1.9', u'earthquake_id': u'00189621'}, {...}]

get_all(dataset_identifier, content_type="json", **kwargs)

Read data from the requested resource, paginating over all results. Accepts the same arguments as get(). Returns a generator.

>>> client.get_all("nimj-3ivp")
<generator object Socrata.get_all at 0x7fa0dc8be7b0>

>>> for item in client.get_all("nimj-3ivp"):
...     print(item)
...
{'geolocation': {'latitude': '-15.563', 'needs_recoding': False, 'longitude': '-175.6104'}, 'version': '9', ':updated_at': 1348778988, 'number_of_stations': '275', 'region': 'Tonga', ':created_meta': '21484', 'occurred_at': '2012-09-13T21:16:43', ':id': 132, 'source': 'us', 'depth': '328.30', 'magnitude': '4.8', ':meta': '{\n}', ':updated_meta': '21484', 'earthquake_id': 'c000cnb5', ':created_at': 1348778988}
...

>>> import itertools
>>> items = client.get_all("nimj-3ivp")
>>> first_five = list(itertools.islice(items, 5))
>>> len(first_five)
5

get_metadata(dataset_identifier, content_type="json")

Retrieve the metadata associated with a particular dataset.

>>> client.get_metadata("nimj-3ivp")
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "http://foo.bar.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}

update_metadata(dataset_identifier, update_fields, content_type="json")

Update the metadata for a particular dataset. update_fields should be a dictionary containing only the metadata keys that you wish to overwrite.

Note: Invalid payloads to this method could corrupt the dataset or visualization. See this comment for more information.

>>> client.update_metadata("nimj-3ivp", {"attributionLink": "https://anothertest.com"})
{"newBackend": false, "licenseId": "CC0_10", "publicationDate": 1436655117, "viewLastModified": 1451289003, "owner": {"roleName": "administrator", "rights": [], "displayName": "Brett", "id": "cdqe-xcn5", "screenName": "Brett"}, "query": {}, "id": "songs", "createdAt": 1398014181, "category": "Public Safety", "publicationAppendEnabled": true, "publicationStage": "published", "rowsUpdatedBy": "cdqe-xcn5", "publicationGroup": 1552205, "displayType": "table", "state": "normal", "attributionLink": "https://anothertest.com", "tableId": 3523378, "columns": [], "metadata": {"rdfSubject": "0", "renderTypeConfig": {"visible": {"table": true}}, "availableDisplayTypes": ["table", "fatrow", "page"], "attachments": ... }}

download_attachments(dataset_identifier, content_type="json", download_dir="~/sodapy_downloads")

Download all attachments associated with a dataset. Return a list of paths to the downloaded files.

>>> client.download_attachments("nimj-3ivp", download_dir="~/Desktop")
    ['/Users/xmunoz/Desktop/nimj-3ivp/FireIncident_Codes.PDF', '/Users/xmunoz/Desktop/nimj-3ivp/AccidentReport.jpg']

create(name, **kwargs)

Create a new dataset. Optionally, specify keyword args such as:

  • description description of the dataset
  • columns list of fields
  • category dataset category (must exist in /admin/metadata)
  • tags list of tag strings
  • row_identifier field name of primary key
  • new_backend whether to create the dataset in the new backend

Example usage:

>>> columns = [{"fieldName": "delegation", "name": "Delegation", "dataTypeName": "text"}, {"fieldName": "members", "name": "Members", "dataTypeName": "number"}]
>>> tags = ["politics", "geography"]
>>> client.create("Delegates", description="List of delegates", columns=columns, row_identifier="delegation", tags=tags, category="Transparency")
{u'id': u'2frc-hyvj', u'name': u'Foo Bar', u'description': u'test dataset', u'publicationStage': u'unpublished', u'columns': [ { u'name': u'Foo', u'dataTypeName': u'text', u'fieldName': u'foo', ... }, { u'name': u'Bar', u'dataTypeName': u'number', u'fieldName': u'bar', ... } ], u'metadata': { u'rowIdentifier': 230641051 }, ... }

publish(dataset_identifier, content_type="json")

Publish a dataset after creating it, i.e. take it out of 'working copy' mode. The dataset id id returned from create will be used to publish.

>>> client.publish("2frc-hyvj")
{u'id': u'2frc-hyvj', u'name': u'Foo Bar', u'description': u'test dataset', u'publicationStage': u'unpublished', u'columns': [ { u'name': u'Foo', u'dataTypeName': u'text', u'fieldName': u'foo', ... }, { u'name': u'Bar', u'dataTypeName': u'number', u'fieldName': u'bar', ... } ], u'metadata': { u'rowIdentifier': 230641051 }, ... }

set_permission(dataset_identifier, permission="private", content_type="json")

Set the permissions of a dataset to public or private.

>>> client.set_permission("2frc-hyvj", "public")
<Response [200]>

upsert(dataset_identifier, payload, content_type="json")

Create a new row in an existing dataset.

>>> data = [{'Delegation': 'AJU', 'Name': 'Alaska', 'Key': 'AL', 'Entity': 'Juneau'}]
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 0, u'By SID': 0, u'Rows Created': 1, u'By RowIdentifier': 0}

Update/Delete rows in a dataset.

>>> data = [{'Delegation': 'sfa', ':id': 8, 'Name': 'bar', 'Key': 'doo', 'Entity': 'dsfsd'}, {':id': 7, ':deleted': True}]
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 1, u'Rows Updated': 1, u'By SID': 2, u'Rows Created': 0, u'By RowIdentifier': 0}

upsert's can even be performed with a csv file.

>>> data = open("upsert_test.csv")
>>> client.upsert("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 1, u'By SID': 1, u'Rows Created': 0, u'By RowIdentifier': 0}

replace(dataset_identifier, payload, content_type="json")

Similar in usage to upsert, but overwrites existing data.

>>> data = open("replace_test.csv")
>>> client.replace("eb9n-hr43", data)
{u'Errors': 0, u'Rows Deleted': 0, u'Rows Updated': 0, u'By SID': 0, u'Rows Created': 12, u'By RowIdentifier': 0}

create_non_data_file(params, file_obj)

Creates a new file-based dataset with the name provided in the files tuple. A valid file input would be:

files = (
    {'file': ("gtfs2", open('myfile.zip', 'rb'))}
)
>>> with open(nondatafile_path, 'rb') as f:
>>>     files = (
>>>         {'file': ("nondatafile.zip", f)}
>>>     )
>>>     response = client.create_non_data_file(params, files)

replace_non_data_file(dataset_identifier, params, file_obj)

Same as create_non_data_file, but replaces a file that already exists in a file-based dataset.

Note: a table-based dataset cannot be replaced by a file-based dataset. Use create_non_data_file in order to replace.

>>>  with open(nondatafile_path, 'rb') as f:
>>>      files = (
>>>          {'file': ("nondatafile.zip", f)}
>>>      )
>>>      response = client.replace_non_data_file(DATASET_IDENTIFIER, {}, files)

delete(dataset_identifier, row_id=None, content_type="json")

Delete an individual row.

>>> client.delete("nimj-3ivp", row_id=2)
<Response [200]>

Delete the entire dataset.

>>> client.delete("nimj-3ivp")
<Response [200]>

close()

Close the session when you're finished.

>>> client.close()

Run tests

$ pytest

Contributing

See CONTRIBUTING.md.

Meta

This package uses semantic versioning.

Source and wheel distributions are available on PyPI. Here is how I create those releases.

python3 setup.py bdist_wheel
python3 setup.py sdist
twine upload dist/*
Owner
Cristina
ACAB
Cristina
A Slack bot for playing Texas Hold 'Em where the currency is various workout tasks e.g. pushups

A Slack app/bot for playing Texas Hold 'Em where the currency is various workout tasks e.g. pushups. The intent is to make the workday more fun & active for remote teams.

Kyle McIntyre 3 Sep 19, 2022
A wrapper for The Movie Database API v3 and v4 that only uses the read access token (not api key).

fulltmdb A wrapper for The Movie Database API v3 and v4 that only uses the read access token (not api key). Installation Use the package manager pip t

Jacob Hale 2 Sep 26, 2021
Maestral is an open-source Dropbox client written in Python.

Maestral - A light-weight and open-source Dropbox client for macOS and Linux

2.6k Jan 03, 2023
Takes upcoming items from a Google Calendar and posts them to Slack.

Google Calendar to Slack by Jason Snell - [email protected] This Python s

6 Aug 21, 2022
Code to help me strengthen my bot army

discord-bot-manager an api to help you manage your other bots auth lazy: using the browser dev tools, capture a post call and view the Authorization h

Riley Snyder 2 Mar 18, 2022
Telegram bot to download almost all from Instagram

Instagram Manager Bot The most advanced Instagram Downloader Bot. Please fork this repository don't import code Made with Python3 (C) @subinps Copyrig

SUBIN 300 Dec 30, 2022
Program that automates the bump of the Disboard Bot. Done 100% in Python with PyAutoGUI library

Auto-Discord-Bump Program that automates the bump of the Disboard Bot done 100% in python with PyAutoGUI How to configue You will need 3 things before

Mateus 1 Dec 19, 2021
Scrapes an instagram user's photos and videos

Instagram Scraper instagram-scraper is a command-line application written in Python that scrapes and downloads an instagram user's photos and videos.

7.3k Nov 18, 2022
LHXP・Official "LH - Cyber Security" Discord Leveling-Bot

LHXP・Official "LH - Cyber Security" Discord Leveling-Bot Based on nsde/NOVΛLIX Feature Overview /clear @user Requires admin permission Purges all XP

Felix・onlix 2 Mar 09, 2022
A simple API wrapper for the Tenor API

Gifpy A simple API wrapper for the Tenor API Installation Python 3.9 or higher is recommended python3 -m pip install gifpy Clone repository: $ git cl

Juan Ignacio Battiston 4 Dec 22, 2021
One of Best renamer bot with python

🌀 One of Best renamer bot repo Please Give a ☆ if You like This Open Source and Don't Forget to Follow Me On Github For More Repos And Codes. Scrappe

1 Dec 14, 2021
Python API to interact with Uwazi

Python Uwazi API Quick Start To use the API install the requirements pip3 install -r requirements.txt and use it like this: uwazi_adapter = UwaziAdap

HURIDOCS 2 Dec 16, 2021
Zero2 Discord bot is written with Discord.py using Python.

Zero2 Discord bot is written with Discord.py using Python.

Siva Avanish 4 Nov 08, 2021
AWS SQS event redrive Lambda With Python

AWS SQS event redrive Lambda This repository contains one simple AWS Lambda function in Python to redrive AWS SQS events from source queue to destinat

1 Oct 19, 2021
A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

1 Oct 28, 2021
Python client for Arista eAPI

Arista eAPI Python Library The Python library for Arista's eAPI command API implementation provides a client API work using eAPI and communicating wit

Arista Networks EOS+ 124 Nov 23, 2022
Roaster - this gui app + program bundle roasts.

Roaster - this gui app + program bundle roasts.

Harsh ADV) 1 Jan 04, 2022
PaddleOCR推理的pytorch实现和模型转换

PaddleOCR2Pytorch 简介 ”真·白嫖“PaddleOCR 注意 PytorchOCR由PaddleOCR-2.0rc1+动态图版本移植。 特性 高质量推理模型,准确的识别效果 超轻量ptocr_mobile移动端系列 通用ptocr_server系列 支持中英文数字组合识别、竖排文本

519 Jan 08, 2023
Updater for PGCG (Paradox Game Converters Group) converters written in Python.

Updater Updater for PGCG (Paradox Game Converters Group) converters written in Python. Needs to be put inside an "Updater" directory in the root conve

Paradox Game Converters 2 Jan 10, 2022
Bitbucket Server API Wrapper

A simple wrapper for the Atlassian's Bitbucket Server / Bitbucket Datacenter (formerly Stash) REST API, written in Python.

Schweitzer Engineering Laboratories 4 Jan 06, 2023