Geocode rows in a SQLite database table

Overview

geocode-sqlite

PyPI Changelog Tests License

Geocode rows from a SQLite table

Installation

Install this tool using pip or pipx:

# install inside a virtualenv
pip install geocode-sqlite

# install globally
pipx install geocode-sqlite

Usage

Let's say you have a spreadsheet with addresses in it, and you'd like to map those locations. First, create a SQLite database and insert rows from that spreadsheet using sqlite-utils.

sqlite-utils insert data.db data data.csv --csv

Now, geocode it using OpenStreetMap's Nominatum geocoder.

geocode-sqlite nominatum data.db data \
 --location="{address}, {city}, {state} {zip}" \
 --delay=1 \
 --user-agent="this-is-me"

In the command above, you're using Nominatum, which is free and only asks for a unique user agent (--user-agent).

This will connect to a database (data.db) and read all rows from the table data (skipping any that already have both a latitude and longitude column filled).

You're also telling the geocoder how to extract a location query (--location) from a row of data, using Python's built-in string formatting, and setting a rate limit (--delay) of one request per second.

For each row where geocoding succeeds, latitude and longitude will be populated. If you hit an error, or a rate limit, run the same query and pick up where you left off.

Under the hood, this package uses the excellent geopy library, which is stable and thoroughly road-tested. If you need help understanding a particular geocoder's options, consult geopy's documentation.

Supported Geocoders

The CLI currently supports these geocoders:

  • bing
  • googlev3
  • mapquest (and open-mapquest)
  • nominatum

More will be added soon.

Common arguments and options

Each geocoder needs to know where to find the data it's working with. These are the first two arguments:

  • database: a path to a SQLite file, which must already exist
  • table: the name of a table, in that database, which exists and has data to geocode

From there, we have a set of options passed to every geocoder:

  • location: a string format that will be expanded with each row to build a full query, to be geocoded
  • delay: a delay between each call (some services require this)
  • latitude: latitude column name
  • longitude: longitude column name

Each geocoder takes additional, specific arguments beyond these, such as API keys. Again, geopy's documentation is an excellent resource.

Python API

The command line interface aims to support the most common options for each geocoder. For more find-grained control, use the Python API.

As with the CLI, this assumes you already have a SQLite database and a table of location data.

from geocode_sqlite import geocode_table
from geopy.geocoders import Nominatum

# create a geocoder instance, with some extra options
nominatum = Nominatum(user_agent="this-is-me", domain="nominatum.local.dev", scheme="http")

# assuming our database is in the same directory
count = geocode_table("data.db", "data", query_template="{address}, {city}, {state} {zip}")

# when it's done
print(f"Geocoded {count} rows")

Any geopy geocoder can be used with the Python API.

Development

To contribute to this tool, first checkout the code. Then create a new virtual environment:

cd geocode-sqlite
python -m venv .venv
source .venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies and tests:

pip install -e '.[test]'

To run the tests:

pytest

Please remember that this library is mainly glue code between other well-tested projects, specifically: click, geopy and sqlite-utils. Tests should focus on making sure those parts fit together correctly. We can assume the parts themselves already work.

To that end, there is a test geocoder included: geocode_sqlite.testing.DummyGeocoder. That geocoder works with an included dataset of In-N-Out Burger locations provided by AllThePlaces. It works like a normal GeoPy geocoder, except it will only return results for In-N-Out locations using the included database.

Comments
  • fixing

    fixing "ImportError" in the 6 lines and 7

    hi Chris Amico. I Using GNU/Linux os And in GNU/Linux, I can not run the cli.py file. But with a small change I made, this problem was solved. If you like, please merging.

    opened by EmadDeve20 4
  • Add OpenCage option to CLI

    Add OpenCage option to CLI

    I'm a contractor for OpenCage geocoder. We'd like to add an opencage option to the CLI. OpenCage is already available in geopy so I have just copied the existing geocoder code with the relevant options.

    • Added OpenCage as an option to CLI
    • Added OpenCage to README
    • Added OpenCage test to Makefile
    opened by sbscully 2
  • Better argument ergonomics

    Better argument ergonomics

    Looking at this with fresh eyes, this whole bit is dumb:

    Note the order of options: There are two sets of options we need to pass.

    The first concerns the data we're geocoding. We need to say where our database is and what table we're using, and optionally, how to extract a location query.

    Then, we need to say what geocoder we're using, and pass in any options needed to initalize it. This will be different for each geocoder we want to use.

    Ideally, it should be something like this:

    geocode-sqlite nominatum  data.db data \
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me"
    

    Maybe this would be ok:

    geocode-sqlite
     --location="{address}, {city}, {state} {zip}" \
     --delay=1 \
     --user-agent="this-is-me" \
    nominatum data.db data
    

    But the first way is definitely better.

    opened by eyeseast 2
  • Spatialite helper?

    Spatialite helper?

    Would it be worth helping convert lat/lng fields to a spatial index? https://docs.datasette.io/en/stable/spatialite.html#spatial-indexing-latitude-longitude-columns

    Here's the whole code block:

    import sqlite3
    conn = sqlite3.connect('museums.db')
    # Lead the spatialite extension:
    conn.enable_load_extension(True)
    conn.load_extension('/usr/local/lib/mod_spatialite.dylib')
    # Initialize spatial metadata for this database:
    conn.execute('select InitSpatialMetadata(1)')
    # Add a geometry column called point_geom to our museums table:
    conn.execute("SELECT AddGeometryColumn('museums', 'point_geom', 4326, 'POINT', 2);")
    # Now update that geometry column with the lat/lon points
    conn.execute('''
        UPDATE events SET
        point_geom = GeomFromText('POINT('||"longitude"||' '||"latitude"||')',4326);
    ''')
    # Now add a spatial index to that column
    conn.execute('select CreateSpatialIndex("museums", "point_geom");')
    # If you don't commit your changes will not be persisted:
    conn.commit()
    conn.close()
    

    I don't know if it belongs in this package or not.

    question 
    opened by eyeseast 2
  • Use rate limiting by default?

    Use rate limiting by default?

    It's generally bad practice to throw a few hundred geocoding requests at a free service. Given that, is it worth setting a default delay of one second, and let people explicitly turn it off if needed?

    opened by eyeseast 2
  • Geocoder CLI supprt

    Geocoder CLI supprt

    Here are the geocoders currently supported by the CLI. I want to add the most common ones (which I use) first:

    • [x] Google
    • [x] MapBox
    • [x] OpenStreetMap
    • [x] MapQuest
    • [x] Bing
    • [ ] Geocode.Earth
    • [ ] Geocodio
    • [ ] Pelias
    • [ ] MapTiler

    What else?

    help wanted good first issue 
    opened by eyeseast 2
  • fix failing tests that exepect an envvar

    fix failing tests that exepect an envvar

    if MAPBOX_API_KEY is not set in the environment, test_pass_kwargs fails.

    noticed this when trying to run tests locally. the D prefix defers to values already set in the environment.

    you may not want to introduce a new dependency to your project (namely pytest-env), and I would understand that, but just thought I'd bring this to your attention with a suggested fix. take it or leave it! this is a very useful library. thanks!

    opened by noslouch 1
  • GeoJSON and SpatiaLite support

    GeoJSON and SpatiaLite support

    Closes #22 Closes #24 Closes #26

    Passing a --geojson flag will store results as a GeoJSON geometry, instead of in latitude and longitude columns.

    Using --spatialite will add a geometry column and store results as a SpatiaLite binary.

    This should make it easier to work with datasette-geojson and datasette-geojson-map.

    opened by eyeseast 1
  • Save results as geojson?

    Save results as geojson?

    Right now, results are saved in two columns: latitude and longitude.

    Now that datasette-geojson and datasette-geojson-map exist, it would be nice to have this plugin feed into those.

    Maybe it's an option: sqlite-geocode data.db table ... --geojson

    That would be backwards compatible at least.

    question 
    opened by eyeseast 1
  • Capture full geocoding results?

    Capture full geocoding results?

    opened by eyeseast 1
  • Need a progress bar

    Need a progress bar

    Especially when rate limiting, which is now the default, geocoding a table is slow. With a one-second delay, a 300 row table takes five minutes to finish, at best.

    opened by eyeseast 1
  • Consolidate code to geocode a list

    Consolidate code to geocode a list

    I do this in three places:

    • In geocode_table, which I wrote first
    • In the CLI
    • In geocode_list, which I needed for the progress bar

    This whole block is basically repeated: https://github.com/eyeseast/geocode-sqlite/blob/main/geocode_sqlite/utils.py#L50-L68. Part of the issue is that in the Python API, I'm using log.info and in the CLI I'm using click.echo. Maybe I can abstract that and pass in a print function.

    opened by eyeseast 0
  • Async support

    Async support

    Geopy has a set of tools to help with async here: https://geopy.readthedocs.io/en/latest/#async-mode

    Should definitely use those, especially when thinking about Datasette integration.

    opened by eyeseast 2
Releases(v0.8.2)
  • v0.8.2(Nov 7, 2022)

    What's Changed

    • Require requests. More forgiving timeout. by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/43

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.1...v0.8.2

    Source code(tar.gz)
    Source code(zip)
  • v0.8.1(Nov 6, 2022)

    What's Changed

    • Raw results in geocode_table, too by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/42

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.8.0...v0.8.1

    Source code(tar.gz)
    Source code(zip)
  • v0.8.0(Nov 6, 2022)

    What's Changed

    • Just a typo! by @shaver in https://github.com/eyeseast/geocode-sqlite/pull/40
    • Capture raw geocoding results by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/41

    New Contributors

    • @shaver made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/40

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.7.0...v0.8.0

    Source code(tar.gz)
    Source code(zip)
  • v0.7.0(Oct 31, 2022)

    The OpenCage geocoding service is now supported. Thanks @sbscully. This library is now tested on Python 3.11.

    What's Changed

    • Add OpenCage option to CLI by @sbscully in https://github.com/eyeseast/geocode-sqlite/pull/39

    New Contributors

    • @sbscully made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/39

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.1...v0.7.0

    Source code(tar.gz)
    Source code(zip)
  • v0.6.1(Mar 16, 2022)

    What's Changed

    • if no bbox options are passed, value is None by @noslouch in https://github.com/eyeseast/geocode-sqlite/pull/30
    • Pass --api-key to mapbox test by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/33

    New Contributors

    • @noslouch made their first contribution in https://github.com/eyeseast/geocode-sqlite/pull/30

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.6.0...v0.6.1

    Source code(tar.gz)
    Source code(zip)
  • v0.6.0(Mar 11, 2022)

    What's Changed

    • GeoJSON and SpatiaLite support by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/23

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.2...v0.6.0

    Source code(tar.gz)
    Source code(zip)
  • v0.5.2(Mar 1, 2022)

    What's Changed

    • Require click >= 7.0 to hide commands by @1-Byte in https://github.com/eyeseast/geocode-sqlite/pull/20
    • Fix error with rowid tables by @eyeseast in https://github.com/eyeseast/geocode-sqlite/pull/27

    Full Changelog: https://github.com/eyeseast/geocode-sqlite/compare/v0.5.1...v0.5.2

    Source code(tar.gz)
    Source code(zip)
  • v0.5.0(Apr 20, 2021)

  • v0.3.0(Apr 4, 2021)

  • v0.2.0(Apr 3, 2021)

    Arguments and options now follow a much more sensible order, following the example of other thing-to-sqlite tools. For example:

    geocode-sqlite nominatum geo.db innout_test \
    	--location "{full}, {city}, {state} {postcode}" \
    	--delay 1 \
    	--user-agent "geocode-sqlite"
    

    The order is now:

    1. geocode-sqlite
    2. geocoder, such as nominatum
    3. database path
    4. table name
    5. any additional options
    Source code(tar.gz)
    Source code(zip)
  • 0.1.2(Sep 8, 2020)

Owner
Chris Amico
Journalist / Developer in Boston
Chris Amico
Program that shows all the details of the given IP address. Build with Python and ipinfo.io API

ip-details This is a program that shows all the details of the given IP address. Build with Python and ipinfo.io API Usage To use this program, run th

4 Mar 01, 2022
Manipulation and analysis of geometric objects

Shapely Manipulation and analysis of geometric objects in the Cartesian plane. Shapely is a BSD-licensed Python package for manipulation and analysis

3.1k Jan 03, 2023
Starlite-tile38 - Showcase using Tile38 via pyle38 in a Starlite application

Starlite-Tile38 Showcase using Tile38 via pyle38 in a Starlite application. Repo

Ben 8 Aug 07, 2022
A modern, geometric typeface by @chrismsimpson (last commit @ 85fa625 Jun 9, 2020 before deletion)

Metropolis A modern, geometric typeface. Influenced by other popular geometric, minimalist sans-serif typefaces of the new millenium. Designed for opt

Darius 183 Dec 25, 2022
A Jupyter - Leaflet.js bridge

ipyleaflet A Jupyter / Leaflet bridge enabling interactive maps in the Jupyter notebook. Usage Selecting a basemap for a leaflet map: Loading a geojso

Jupyter Widgets 1.3k Dec 27, 2022
Daily social mapping project in November 2021. Maps made using PyGMT whenever possible.

Daily social mapping project in November 2021. Maps made using PyGMT whenever possible.

Wei Ji 20 Nov 24, 2022
Histogram matching plugin for rasterio

rio-hist Histogram matching plugin for rasterio. Provides a CLI and python module for adjusting colors based on histogram matching in a variety of col

Mapbox 75 Sep 23, 2022
Digital Earth Australia notebooks and tools repository

Repository for Digital Earth Australia Jupyter Notebooks: tools and workflows for geospatial analysis with Open Data Cube and xarray

Geoscience Australia 335 Dec 24, 2022
Python renderer for OpenStreetMap with custom icons intended to display as many map features as possible

Map Machine project consists of Python OpenStreetMap renderer: SVG map generation, SVG and PNG tile generation, Röntgen icon set: unique CC-BY 4.0 map

Sergey Vartanov 0 Dec 18, 2022
WhiteboxTools Python Frontend

whitebox-python Important Note This repository is related to the WhiteboxTools Python Frontend only. You can report issues to this repo if you have pr

Qiusheng Wu 304 Dec 15, 2022
A multi-page streamlit app for the geospatial community.

A multi-page streamlit app for the geospatial community.

Qiusheng Wu 522 Dec 30, 2022
Streamlit Component for rendering Folium maps

streamlit-folium This Streamlit Component is a work-in-progress to determine what functionality is desirable for a Folium and Streamlit integration. C

Randy Zwitch 224 Dec 30, 2022
Fiona reads and writes geographic data files

Fiona Fiona reads and writes geographic data files and thereby helps Python programmers integrate geographic information systems with other computer s

987 Jan 04, 2023
Search and download Copernicus Sentinel satellite images

sentinelsat Sentinelsat makes searching, downloading and retrieving the metadata of Sentinel satellite images from the Copernicus Open Access Hub easy

837 Dec 28, 2022
Using SQLAlchemy with spatial databases

GeoAlchemy GIS Support for SQLAlchemy. Introduction GeoAlchemy is an extension of SQLAlchemy. It provides support for Geospatial data types at the ORM

109 Dec 01, 2022
pure-Python (Numpy optional) 3D coordinate conversions for geospace ecef enu eci

Python 3-D coordinate conversions Pure Python (no prerequistes beyond Python itself) 3-D geographic coordinate conversions and geodesy. API similar to

Geospace code 292 Dec 29, 2022
Python tools for geographic data

GeoPandas Python tools for geographic data Introduction GeoPandas is a project to add support for geographic data to pandas objects. It currently impl

GeoPandas 3.5k Jan 03, 2023
A Python framework for building geospatial web-applications

Hey there, this is Greppo... A Python framework for building geospatial web-applications. Greppo is an open-source Python framework that makes it easy

Greppo 304 Dec 27, 2022
Python interface to PROJ (cartographic projections and coordinate transformations library)

pyproj Python interface to PROJ (cartographic projections and coordinate transformations library). Documentation Stable: http://pyproj4.github.io/pypr

832 Dec 31, 2022
Implementation of Trajectory classes and functions built on top of GeoPandas

MovingPandas MovingPandas implements a Trajectory class and corresponding methods based on GeoPandas. Visit movingpandas.org for details! You can run

Anita Graser 897 Jan 01, 2023