Geocode rows in a SQLite database table

Overview

geocode-sqlite

PyPI Changelog Tests License

Geocode rows from a SQLite table

Installation

Install this tool using pip or pipx:

# install inside a virtualenv
pip install geocode-sqlite

# install globally
pipx install geocode-sqlite

Usage

Let's say you have a spreadsheet with addresses in it, and you'd like to map those locations. First, create a SQLite database and insert rows from that spreadsheet using sqlite-utils.

sqlite-utils insert data.db data data.csv --csv

Now, geocode it using OpenStreetMap's Nominatum geocoder.

geocode-sqlite nominatum data.db data \
 --location="{address}, {city}, {state} {zip}" \
 --delay=1 \
 --user-agent="this-is-me"

In the command above, you're using Nominatum, which is free and only asks for a unique user agent (--user-agent).

This will connect to a database (data.db) and read all rows from the table data (skipping any that already have both a latitude and longitude column filled).

You're also telling the geocoder how to extract a location query (--location) from a row of data, using Python's built-in string formatting, and setting a rate limit (--delay) of one request per second.

For each row where geocoding succeeds, latitude and longitude will be populated. If you hit an error, or a rate limit, run the same query and pick up where you left off.

Under the hood, this package uses the excellent geopy library, which is stable and thoroughly road-tested. If you need help understanding a particular geocoder's options, consult geopy's documentation.

Supported Geocoders

The CLI currently supports these geocoders:

  • bing
  • googlev3
  • mapquest (and open-mapquest)
  • nominatum

More will be added soon.

Common arguments and options

Each geocoder needs to know where to find the data it's working with. These are the first two arguments:

  • database: a path to a SQLite file, which must already exist
  • table: the name of a table, in that database, which exists and has data to geocode

From there, we have a set of options passed to every geocoder:

  • location: a string format that will be expanded with each row to build a full query, to be geocoded
  • delay: a delay between each call (some services require this)
  • latitude: latitude column name
  • longitude: longitude column name

Each geocoder takes additional, specific arguments beyond these, such as API keys. Again, geopy's documentation is an excellent resource.

Python API

The command line interface aims to support the most common options for each geocoder. For more find-grained control, use the Python API.

As with the CLI, this assumes you already have a SQLite database and a table of location data.

from geocode_sqlite import geocode_table
from geopy.geocoders import Nominatum

# create a geocoder instance, with some extra options
nominatum = Nominatum(user_agent="this-is-me", domain="nominatum.local.dev", scheme="http")

# assuming our database is in the same directory
count = geocode_table("data.db", "data", query_template="{address}, {city}, {state} {zip}")

# when it's done
print(f"Geocoded {count} rows")

Any geopy geocoder can be used with the Python API.

Development

To contribute to this tool, first checkout the code. Then create a new virtual environment:

cd geocode-sqlite
python -m venv .venv
source .venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies and tests:

pip install -e '.[test]'

To run the tests:

pytest

Please remember that this library is mainly glue code between other well-tested projects, specifically: click, geopy and sqlite-utils. Tests should focus on making sure those parts fit together correctly. We can assume the parts themselves already work.

To that end, there is a test geocoder included: geocode_sqlite.testing.DummyGeocoder. That geocoder works with an included dataset of In-N-Out Burger locations provided by AllThePlaces. It works like a normal GeoPy geocoder, except it will only return results for In-N-Out locations using the included database.

Comments
  • fixing

    fixing "ImportError" in the 6 lines and 7

    hi Chris Amico. I Using GNU/Linux os And in GNU/Linux, I can not run the cli.py file. But with a small change I made, this problem was solved. If you like, please merging.

    opened by EmadDeve20 4
  • Add OpenCage option to CLI

    Add OpenCage option to CLI

    I'm a contractor for OpenCage geocoder. We'd like to add an opencage option to the CLI. OpenCage is already available in geopy so I have just copied the existing geocoder code with the relevant options.

    • Added OpenCage as an option to CLI
    • Added OpenCage to README
    • Added OpenCage test to Makefile
    opened by sbscully 2
  • Better argument ergonomics

    Better argument ergonomics

    Looking at this with fresh eyes, this whole bit is dumb:

    Note the order of options: There are two sets of options we need to pass.

    The first concerns the data we're geocoding. We need to say where our database is and what table we're using, and optionally, how to extract a location query.

    Then, we need to say what geocoder we're using, and pass in any options needed to initalize it. This will be different for each geocoder we want to use.

    Ideally, it should be something like this:

    geocode-sqlite nominatum  data.db data \
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me"
    

    Maybe this would be ok:

    geocode-sqlite
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me" \
    nominatum data.db data
    

    But the first way is definitely better.

    opened by eyeseast 2
  • Spatialite helper?

    Spatialite helper?

    Would it be worth helping convert lat/lng fields to a spatial index? https://docs.datasette.io/en/stable/spatialite.html#spatial-indexing-latitude-longitude-columns

    Here's the whole code block:

    import sqlite3
    conn = sqlite3.connect('museums.db')
    # Lead the spatialite extension:
    conn.enable_load_extension(True)
    conn.load_extension('/usr/local/lib/mod_spatialite.dylib')
    # Initialize spatial metadata for this database:
    conn.execute('select InitSpatialMetadata(1)')
    # Add a geometry column called point_geom to our museums table:
    conn.execute("SELECT AddGeometryColumn('museums', 'point_geom', 4326, 'POINT', 2);")
    # Now update that geometry column with the lat/lon points
    conn.execute('''
        UPDATE events SET
        point_geom = GeomFromText('POINT('||"longitude"||' '||"latitude"||')',4326);
    ''')
    # Now add a spatial index to that column
    conn.execute('select CreateSpatialIndex("museums", "point_geom");')
    # If you don't commit your changes will not be persisted:
    conn.commit()
    conn.close()
    

    I don't know if it belongs in this package or not.

    question 
    opened by eyeseast 2
  • Use rate limiting by default?

    Use rate limiting by default?

    It's generally bad practice to throw a few hundred geocoding requests at a free service. Given that, is it worth setting a default delay of one second, and let people explicitly turn it off if needed?

    opened by eyeseast 2
  • Geocoder CLI supprt

    Geocoder CLI supprt

    Here are the geocoders currently supported by the CLI. I want to add the most common ones (which I use) first:

    • [x] Google
    • [x] MapBox
    • [x] OpenStreetMap
    • [x] MapQuest
    • [x] Bing
    • [ ] Geocode.Earth
    • [ ] Geocodio
    • [ ] Pelias
    • [ ] MapTiler

    What else?

    help wanted good first issue 
    opened by eyeseast 2
  • fix failing tests that exepect an envvar

    fix failing tests that exepect an envvar

    if MAPBOX_API_KEY is not set in the environment, test_pass_kwargs fails.

    noticed this when trying to run tests locally. the D prefix defers to values already set in the environment.

    you may not want to introduce a new dependency to your project (namely pytest-env), and I would understand that, but just thought I'd bring this to your attention with a suggested fix. take it or leave it! this is a very useful library. thanks!

    opened by noslouch 1
  • GeoJSON and SpatiaLite support

    GeoJSON and SpatiaLite support

    Closes #22 Closes #24 Closes #26

    Passing a --geojson flag will store results as a GeoJSON geometry, instead of in latitude and longitude columns.

    Using --spatialite will add a geometry column and store results as a SpatiaLite binary.

    This should make it easier to work with datasette-geojson and datasette-geojson-map.

    opened by eyeseast 1
  • Save results as geojson?

    Save results as geojson?

    Right now, results are saved in two columns: latitude and longitude.

    Now that datasette-geojson and datasette-geojson-map exist, it would be nice to have this plugin feed into those.

    Maybe it's an option: sqlite-geocode data.db table ... --geojson

    That would be backwards compatible at least.

    question 
    opened by eyeseast 1
  • Capture full geocoding results?

    Capture full geocoding results?

    opened by eyeseast 1
  • Need a progress bar

    Need a progress bar

    Especially when rate limiting, which is now the default, geocoding a table is slow. With a one-second delay, a 300 row table takes five minutes to finish, at best.

    opened by eyeseast 1
  • Consolidate code to geocode a list

    Consolidate code to geocode a list

    I do this in three places:

    • In geocode_table, which I wrote first
    • In the CLI
    • In geocode_list, which I needed for the progress bar

    This whole block is basically repeated: https://github.com/eyeseast/geocode-sqlite/blob/main/geocode_sqlite/utils.py#L50-L68. Part of the issue is that in the Python API, I'm using log.info and in the CLI I'm using click.echo. Maybe I can abstract that and pass in a print function.

    opened by eyeseast 0
  • Async support

    Async support

    Geopy has a set of tools to help with async here: https://geopy.readthedocs.io/en/latest/#async-mode

    Should definitely use those, especially when thinking about Datasette integration.

    opened by eyeseast 2
Releases(v0.8.2)
  • v0.8.2(Nov 7, 2022)

    What's Changed

    • Require requests. More forgiving timeout. by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/43

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.1...v0.8.2

    Source code(tar.gz)
    Source code(zip)
  • v0.8.1(Nov 6, 2022)

    What's Changed

    • Raw results in geocode_table, too by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/42

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.0...v0.8.1

    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Nov 6, 2022)

    What's Changed

    • Just a typo! by @shaver in https://github.com/eyeseast/geocode-sqlite/pull/40
    • Capture raw geocoding results by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/41

    New Contributors

    • @shaver made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/40

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.7.0...v0.8.0

    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Oct 31, 2022)

    The OpenCage geocoding service is now supported. Thanks @sbscully. This library is now tested on Python 3.11.

    What's Changed

    • Add OpenCage option to CLI by @sbscully in https://github.com/eyeseast/geocode-sqlite/pull/39

    New Contributors

    • @sbscully made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/39

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.1...v0.7.0

    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Mar 16, 2022)

    What's Changed

    • if no bbox options are passed, value is None by @noslouch in https://github.com/eyeseast/geocode-sqlite/pull/30
    • Pass --api-key to mapbox test by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/33

    New Contributors

    • @noslouch made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/30

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.0...v0.6.1

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Mar 11, 2022)

    What's Changed

    • GeoJSON and SpatiaLite support by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/23

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.2...v0.6.0

    Source code(tar.gz)
    Source code(zip)
  • v0.5.2(Mar 1, 2022)

    What's Changed

    • Require click >= 7.0 to hide commands by @1-Byte in https://github.com/eyeseast/geocode-sqlite/pull/20
    • Fix error with rowid tables by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/27

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.1...v0.5.2

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Apr 20, 2021)

  • v0.3.0(Apr 4, 2021)

  • v0.2.0(Apr 3, 2021)

    Arguments and options now follow a much more sensible order, following the example of other thing-to-sqlite tools. For example:

    geocode-sqlite nominatum geo.db innout_test \
    	--location "{full}, {city}, {state} {postcode}" \
    	--delay 1 \
    	--user-agent "geocode-sqlite"
    

    The order is now:

    1. geocode-sqlite
    2. geocoder, such as nominatum
    3. database path
    4. table name
    5. any additional options
    Source code(tar.gz)
    Source code(zip)
  • 0.1.2(Sep 8, 2020)

Owner
Chris Amico
Journalist / Developer in Boston
Chris Amico
A Python framework for building geospatial web-applications

Hey there, this is Greppo... A Python framework for building geospatial web-applications. Greppo is an open-source Python framework that makes it easy

Greppo 304 Dec 27, 2022
Geographic add-ons for Django REST Framework. Maintained by the OpenWISP Project.

Geographic add-ons for Django REST Framework. Maintained by the OpenWISP Project.

OpenWISP 982 Jan 06, 2023
Computer Vision in Python

Mahotas Python Computer Vision Library Mahotas is a library of fast computer vision algorithms (all implemented in C++ for speed) operating over numpy

Luis Pedro Coelho 792 Dec 20, 2022
Client library for interfacing with USGS datasets

USGS API USGS is a python module for interfacing with the US Geological Survey's API. It provides submodules to interact with various endpoints, and c

Amit Kapadia 104 Dec 30, 2022
Xarray backend to Copernicus Sentinel-1 satellite data products

xarray-sentinel WARNING: this product is a "technology preview" / pre-Alpha Xarray backend to explore and load Copernicus Sentinel-1 satellite data pr

B-Open 191 Dec 15, 2022
Processing and interpolating spatial data with a twist of machine learning

Documentation | Documentation (dev version) | Contact | Part of the Fatiando a Terra project About Verde is a Python library for processing spatial da

Fatiando a Terra 468 Dec 20, 2022
Spatial Interpolation Toolbox is a Python-based GUI that is able to interpolate spatial data in vector format.

Spatial Interpolation Toolbox This is the home to Spatial Interpolation Toolbox, a graphical user interface (GUI) for interpolating geographic vector

Michael Ward 2 Nov 01, 2021
prettymaps - A minimal Python library to draw customized maps from OpenStreetMap data.

A small set of Python functions to draw pretty maps from OpenStreetMap data. Based on osmnx, matplotlib and shapely libraries.

Marcelo de Oliveira Rosa Prates 9k Jan 08, 2023
Record railway train route profile with GNSS tools

Train route profile recording with GNSS technology based on ARDUINO platform Project target Develop GNSS recording tools based on the ARDUINO platform

tomcom 1 Jan 01, 2022
Extract GoPro highlights and GPMF data.

Python script that parses the gpmd stream for GOPRO moov track (MP4) and extract the GPS info into a GPX (and kml) file.

Chris Auron 2 May 13, 2022
Script that allows to download data with satellite's orbit height and create CSV with their change in time.

Satellite orbit height ◾ Requirements Python = 3.8 Packages listen in reuirements.txt (run pip install -r requirements.txt) Account on Space Track ◾

Alicja Musiał 2 Jan 17, 2022
A Django application that provides country choices for use with forms, flag icons static files, and a country field for models.

Django Countries A Django application that provides country choices for use with forms, flag icons static files, and a country field for models. Insta

Chris Beaven 1.2k Jan 03, 2023
Google maps for Jupyter notebooks

gmaps gmaps is a plugin for including interactive Google maps in the IPython Notebook. Let's plot a heatmap of taxi pickups in San Francisco: import g

Pascal Bugnion 747 Dec 19, 2022
A light-weight, versatile XYZ tile server, built with Flask and Rasterio :earth_africa:

Terracotta is a pure Python tile server that runs as a WSGI app on a dedicated webserver or as a serverless app on AWS Lambda. It is built on a modern

DHI GRAS 531 Dec 28, 2022
QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios

QLUSTER is a relative orbit design tool for formation flying satellite missions and space rendezvous scenarios, that I wrote in Python 3 for my own research and visualisation. It is currently unfinis

Samuel Low 9 Aug 23, 2022
A Python tool to display geolocation information in the traceroute.

IP2Trace Python IP2Trace Python is a Python tool allowing user to get IP address information such as country, region, city, latitude, longitude, zip c

IP2Location 22 Jan 08, 2023
peartree: A library for converting transit data into a directed graph for sketch network analysis.

peartree 🍐 🌳 peartree is a library for converting GTFS feed schedules into a representative directed network graph. The tool uses Partridge to conve

Kuan Butts 183 Dec 29, 2022
Search and download Copernicus Sentinel satellite images

sentinelsat Sentinelsat makes searching, downloading and retrieving the metadata of Sentinel satellite images from the Copernicus Open Access Hub easy

837 Dec 28, 2022
A proof-of-concept jupyter extension which converts english queries into relevant python code

Text2Code for Jupyter notebook A proof-of-concept jupyter extension which converts english queries into relevant python code. Blog post with more deta

DeepKlarity 2.1k Dec 29, 2022
Creates 3D geometries from 2D vector graphics, for use in geodynamic models

geomIO - creating 3D geometries from 2D input This is the Julia and Python version of geomIO, a free open source software to generate 3D volumes and s

3 Feb 01, 2022