Specification for storing geospatial vector data (point, line, polygon) in Parquet

Last update: Dec 27, 2022

GeoParquet

About

This repository defines how to store geospatial vector data (points, lines, polygons) in Apache Parquet, a popular columnar storage format for tabular data - see this vendor explanation for more on what that means. Our goal is to standardize how geospatial data is represented in Parquet, to further geospatial interoperability among tools using Parquet today, and hopefully to help push forward what's possible with 'cloud-native geospatial' workflows.

Warning: This is not (yet) a stable specification that can be relied upon. All 0.X releases are made to gather wider feedback, and we anticipate that some things may change. For now we reserve the right to make changes in backwards-incompatible ways (though we will try not to); see the versioning section below for more info. If you are excited about the potential, please collaborate with us by building implementations, weighing in on the issues and contributing PRs!

Early contributors include developers from GeoPandas, GeoTrellis, OpenLayers, Vis.gl, Voltron Data, Microsoft, Carto, Azavea, Planet & Unfolded. Anyone is welcome to join us, by building implementations, trying it out, giving feedback through issues and contributing to the spec via pull requests. Initial work started in the geo-arrow-spec GeoPandas repository, and that will continue on Arrow work in a compatible way, with this specification focused solely on Parquet.

Goals

There are a few core goals driving the initial development.

Establish a great geospatial format for workflows that excel with columnar data - Most data science and 'business intelligence' workflows have been moving towards columnar data, but current geospatial formats can not be as efficiently loaded as other data. So we aim to bring geospatial data best practices to one of the most popular formats, and hopefully establish a good pattern for how to do so.
Introduce columnar data formats to the geospatial world - Most of the geospatial world is not yet benefiting from all the breakthroughs in data analysis in the broader IT world, so we are excited to enable interesting geospatial analysis with a wider range of tools.
Enable interoperability among cloud data warehouses - BigQuery, Snowflake, Redshift and others all support spatial operations but importing and exporting data with existing formats can be problematic. All support and often recommend Parquet, so defining a solid GeoParquet can help enable interoperability.
Persist geospatial data from Apache Arrow - GeoParquet is developed in parallel with a GeoArrow spec, to enable cross-language in-memory analytics of geospatial information with Arrow. Parquet is already well-supported by Arrow as the key on-disk persistence format.

And our broader goal is to innovate with 'cloud-native vector' providing a stable base to try out new ideas for cloud-native & streaming workflows.

Features

A quick overview of what GeoParquet supports (or at least plans to support).

Multiple spatial reference systems - Many tools will use GeoParquet for high-performance analysis, so it's important to be able to use data in its native projection. But we do provide a clear default recommendation to better enable interoperability, giving a clear target for implementations that don't want to worry about projections.
Multiple geometry columns - There is a default geometry column, but additional geometry columns can be included.
Great compression / small files - Parquet is designed to compress very well, so data benefits by taking up less disk space & being more efficient over the network.
Work with both planar and spherical coordinates - Most cloud data warehouses support spherical coordinates, and so GeoParquet aims to help persist those and be clear about what is supported.
Great at read-heavy analytic workflows - Columnar formats enable cheap reading of a subset of columns, and Parquet in particular enables efficient filtering of chunks based on column statistics, so the format will perform well in a variety of modern analytic workflows.
Support for data partitioning - Parquet has a nice ability to partition data into different files for efficiency, and we aim to enable geospatial partitions.
Enable spatial indices - To enable top performance a spatial index is essential. This will be the focus of a future release.

GeoParquet is less well suited to some uses. Most notably, it is not a good choice for write-heavy interactions: a row-based format will work much better for backing a system that is constantly updating existing data and adding new data.

Roadmap

Our aim is to get to a 1.0.0 within 'months', not years. The rough plan is:

0.1 - Get the basics established, provide a target for implementations to start building against.
0.2 / 0.3 - Feedback from implementations, 3D coordinates support, geometry types, crs optional.
0.x - Several iterations based on feedback from implementations, spatial index best practices.
1.0.0-RC.1 - Aim for this when there are at least 6 implementations that all work interoperably and all feel good about the spec.
1.0.0 - Once there are 12(?) implementations in diverse languages we will lock in for 1.0

Our detailed roadmap is in the Milestones and we'll aim to keep it up to date.

Versioning

After we reach version 1.0 we will follow SemVer, so at that point any breaking change will require the spec to go to 2.0.0. Currently implementors should expect breaking changes, though at some point, hopefully relatively soon (0.4?), we will declare that we don't think there will be any more potential breaking changes. Though the full commitment to that won't be made until 1.0.0.

Current Implementations & Examples

Examples of GeoParquet files following the current spec can be found in the examples/ folder. There is also a larger sample dataset, nz-building-outlines.parquet, available on Google Cloud Storage.

Currently known libraries that can read and write GeoParquet files:

GeoPandas (Python)
sfarrow (R)
GDAL/OGR (C++, bindings in several languages)

Comments

Define polygon orientation rules
I think the standard should define polygon orientation.

1. Spherical edges case

With spherical edges on a sphere, the polygon definition is ambiguous if the system allows polygons larger than a hemisphere.

A sequence of vertices that defines a polygon boundary can describe either the polygon to the left of that line or the one to the right of it. E.g. the global coastline can define either the continents or the oceans. Systems that support polygons larger than a hemisphere usually use an orientation rule to resolve this ambiguity. E.g. MS SQL and Google BigQuery interpret the side to the left of the line as the content of the ring.

2. Planar edges case

The planar case does not have such ambiguity, but it is still a good idea to have a specific rule.

E.g. the GeoJSON RFC defines a rule consistent with the rule above:

o Polygon rings MUST follow the right-hand rule for orientation (counterclockwise external rings, clockwise internal rings).
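
To make the planar right-hand rule concrete, here is a minimal sketch that uses the shoelace formula to test a ring's winding and to rewind a polygon so the exterior ring is counterclockwise and interior rings are clockwise. The function names are hypothetical and not part of any GeoParquet tooling; rings are assumed to be closed lists of (x, y) tuples, and the spherical case is not covered.

```python
def signed_area(ring):
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_counterclockwise(ring):
    return signed_area(ring) > 0


def orient_polygon(rings):
    """Return rings with a counterclockwise exterior and clockwise holes."""
    exterior, *holes = rings
    if not is_counterclockwise(exterior):
        exterior = exterior[::-1]
    holes = [h[::-1] if is_counterclockwise(h) else h for h in holes]
    return [exterior, *holes]


# A unit square listed clockwise (assumed planar coordinates).
square_cw = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
print(is_counterclockwise(square_cw))
print(is_counterclockwise(orient_polygon([square_cw])[0]))
```

Reversing a ring flips the sign of its area, so rewinding does not change the geometry itself, only the vertex order.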
opened by mentin 35
Script to write nz-building-outlines to geoparquet 0.4.0
@cholmes asked me to write a script to update the nz-building-outlines file to GeoParquet version 0.4.0. This reads from the GeoPackage version of the data, which you can download from here (1.3GB).

Part of this is derived from the GeoPandas code here. But this additionally ~~uses pyogrio~~ (reverted because it failed to build on CI) and pygeos to try and speed things up a little bit. It's probably not a bad thing to have a Python script here because the GeoPandas release schedule is likely slower than our release schedule (at least thus far).

You can install the new dependencies with poetry install and run it with

poetry run python write_nz_building_outline.py \
    --input nz-building-outlines.gpkg \
    --output nz-building-outlines.parquet \
    --compression SNAPPY

This takes about 5 minutes on my computer. With Snappy compression, the 1.3GB GeoPackage file became 410MB in Parquet. (375MB with ZSTD compression).

To see the CLI options you can run

poetry run python write_nz_building_outline.py --help

Closes https://github.com/opengeospatial/geoparquet/issues/42
opened by kylebarron 21

Add validator script for Python based on JSON Schema

Implements https://github.com/opengeospatial/geoparquet/issues/23.

This PR adds a draft for a basic validator using Python. It checks the metadata of a geoparquet file using JSON Schema, but it can be extended to include specific custom validations.

Example

Try to validate this wrong metadata

metadata = {
    "version": "0.x.0",
    "primary_column": "geom",
    "columns": {
        "geometry": {
            "encoding": "WKT",
            "edges": "",
            "bbox": [180, 90, 200, -90],
        },
    },
}

$ python3 validator/validate.py examples/example-wrong.parquet
Validating file...
- [ERROR] $.version: '0.x.0' does not match '^0\\.1\\.[0-9]+$'
          INFO: The version of the geoparquet metadata standard used when writing.
- [ERROR] $.columns.geometry.encoding: 'WKT' is not one of ['WKB']
          INFO: Name of the geometry encoding format. Currently only 'WKB' is supported.
- [ERROR] $.columns.geometry.edges: '' is not one of ['planar', 'spherical']
          INFO: Name of the coordinate system for the edges. Must be one of 'planar' or 'spherical'. The default value is 'planar'.
- [ERROR] $.columns.geometry.bbox[2]: 200 is greater than the maximum of 180
          INFO: The eastmost constant longitude line that bounds the rectangle (xmax).
- [ERROR] $.columns.geometry: 'crs' is a required property
- [ERROR] $.primary_column: must be in $.columns
This is an invalid GeoParquet file.

The output for a correct GeoParquet file

$ python3 validator/validate.py examples/example.parquet
Validating file...
This is a valid GeoParquet file.
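
For readers who want to see what these checks amount to, the sketch below re-implements the six rules visible in the error output as plain Python. It is not the repository's validator/validate.py (which, per the description above, uses JSON Schema); the function name is hypothetical, and the rules are inferred only from the example messages, assuming a 2D bbox in longitude/latitude.

```python
import re


def check_geo_metadata(metadata):
    """Return the JSON paths that fail the rules shown in the example output."""
    errors = []
    if not re.match(r"^0\.1\.[0-9]+$", str(metadata.get("version", ""))):
        errors.append("$.version")
    columns = metadata.get("columns", {})
    # Assumed limits for a 2D bbox given as [xmin, ymin, xmax, ymax].
    limits = [(-180, 180), (-90, 90), (-180, 180), (-90, 90)]
    for name, column in columns.items():
        path = f"$.columns.{name}"
        if column.get("encoding") != "WKB":
            errors.append(f"{path}.encoding")
        # 'planar' is the documented default when the field is absent.
        if column.get("edges", "planar") not in ("planar", "spherical"):
            errors.append(f"{path}.edges")
        for i, value in enumerate(column.get("bbox", [])[:4]):
            low, high = limits[i]
            if not low <= value <= high:
                errors.append(f"{path}.bbox[{i}]")
        if "crs" not in column:
            errors.append(f"{path}.crs")
    if metadata.get("primary_column") not in columns:
        errors.append("$.primary_column")
    return errors


wrong = {
    "version": "0.x.0",
    "primary_column": "geom",
    "columns": {
        "geometry": {"encoding": "WKT", "edges": "", "bbox": [180, 90, 200, -90]},
    },
}
for path in check_geo_metadata(wrong):
    print("[ERROR]", path)
```

A schema-driven validator remains preferable in practice, since the rules then live in one declarative file rather than in code.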

opened by Jesus89 17

Correct the 'bbox' description for 3D geometries

In https://github.com/opengeospatial/geoparquet/pull/45 we explicitly allowed 3D geometries, but in that PR I forgot to update the bbox description to reflect this.

opened by jorisvandenbossche 16
Advertising geometry field "type" in Column metadata ?
A common use case is for a geometry column to hold a single geometry type (Point, LineString, Polygon, ...) for all its records. It could be good to have an optional "type" field name under https://github.com/opengeospatial/geoparquet/blob/main/format-specs/geoparquet.md#column-metadata to capture that when it is known.

This would help for example conversion between GeoParquet and GIS formats (shapefiles, geopackage) that have typically this information in their metadata.

Values for type could be the ones accepted by GeoJSON: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection (that would be extended to CircularString, CompoundCurve, CurvePolygon, MultiCurve, MultiSurface, PolyhedralSurface, TIN if we support ISO WKB, and with Z, M or ZM suffixes for other dimensionalities)

What to do when there are mixed geometry types? Some options:

do not set "type"

set "type": "mixed"

set "type": array of values, e.g. [ "Polygon", "MultiPolygon" ] (this one would be typical when converting from shapefiles where the polygonal type can hold both polygons and multipolygons )
opened by rouault 15
geoparquet coordinate reference system default / format / extension
There are a lot of options for how we approach coordinate reference systems.

GeoJSON only allows 4326 data. They started with more options, but then narrowed it down.

Simple Features for SQL defines an 'SRID table' where you are supposed to map number id's to crs well known text. PostGIS uses the same srid's as epsg, but oracle doesn't.

WKT doesn't include projection info, but that's seen as a weakness, and one main reason why ewkt came. I believe ewkt just assumes epsg / postgis default srid table, so it's not fully interoperable with Oracle.

STAC uses epsg, but you can set it to null and provide projjson or crs-wkt

OGC API - Features core specifies just WGS-84 (long, lat), using a URI like http://www.opengis.net/def/crs/OGC/1.3/CRS84, see crs info

And there's obviously more.

My general take is that we should have a default, and expect most things to use that. But should specify it in a way that it could be an extension in the future. So we shouldn't just say 'everything is 4326' just in the spec, but should have a field that says this field is always 4326 for the core spec, but in the future that field could have other values.

So I think we do the first version with just 4326, and then when people ask for more we can have an extension.

One thing I'm not sure about is whether we should use 'epsg' as the field. EPSG covers most projections people want, but not all. In geopackage they just create a whole srid table to then refer to, so the SRID's used are defined. Usually the full epsg database is included, but then users can add other options.

One potential option would be to follow ogc api - features and use URI's. I'm not sure how widely accepted that approach is, like if the full epsg database is already referenced online. So instead of 'epsg' as the field we'd have 'crs', and it's a string URI.
opened by cholmes 14
Spherical - orientation required, smaller-of-two-possible, or just recommended?
It looks like there are a couple options for the case where edges is spherical:

If edges is spherical, then counterclockwise orientation is required. (Or say that if it is not set then the default is counterclockwise instead of null - effectively the same, but maybe slightly better?)

If spherical and orientation is left blank then have implementations use the 'smaller-of-two-possible' rule, as used by bigquery, sqlserver.

We could also just 'recommend' its use, and not mention the smaller-of-two-possible rule. Though that seems far from ideal for me, as it doesn't tell implementations what to do if they get spherical data without it set.

Currently main does say to use the smaller-of-two-possible rule, but it is likely poorly described, as I wrote it and was just trying to capture something I don't 100% understand.

In #80 @jorisvandenbossche removed the smaller-of-two-possible rule, which I think is totally fine. But I'd like us to make an explicit decision about it.

Originally posted by @mentin in https://github.com/opengeospatial/geoparquet/issues/46#issuecomment-1105505534
opened by cholmes 12
Add basic valid and invalid tests for the json schema

Closes https://github.com/opengeospatial/geoparquet/issues/135 (rework of https://github.com/opengeospatial/geoparquet/pull/134 to focus on json schema for now)

opened by jorisvandenbossche 11
Array of geometry types

This renames the geometry_type property to geometry_types and updates the JSON type to be an array of strings (instead of a string or array of strings). The spec language has been updated to reflect that an empty array indicates that the geometry types are not known. The schema now enforces that the items in the list are one of the expected types and allows the Z suffix ~for everything except GeometryCollection~.
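
As an illustration of the new shape of this property, the sketch below checks a geometry_types value and converts an old-style geometry_type value into the array form. Both function names are hypothetical. The list of base types is taken from the GeoJSON-style names discussed in the issues, and writing the suffix as a trailing " Z" (with a space) is an assumption; the schema in format-specs/ is the authority.

```python
# Base names as listed in the geometry type discussion (GeoJSON-style).
BASE_TYPES = {
    "Point", "LineString", "Polygon",
    "MultiPoint", "MultiLineString", "MultiPolygon",
    "GeometryCollection",
}


def is_valid_geometry_types(value):
    """True if value is a list of known type names, each optionally suffixed ' Z'."""
    if not isinstance(value, list):
        return False
    for item in value:
        if not isinstance(item, str):
            return False
        # Assumption: the Z suffix is separated by a single space.
        base = item[:-2] if item.endswith(" Z") else item
        if base not in BASE_TYPES:
            return False
    return True


def upgrade_geometry_type(old_value):
    """Turn an old geometry_type (string, list, or None) into a geometry_types list."""
    if old_value is None:
        return []  # an empty array means the types are not known
    if isinstance(old_value, str):
        return [old_value]
    return list(old_value)


print(upgrade_geometry_type("Polygon"))
print(is_valid_geometry_types(["Polygon", "MultiPolygon"]))
```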

I updated the example.parquet file to use geometry_types instead of geometry_type. (I followed the readme, but am struggling to run the validator, so admit I haven't yet done that.)

I bumped the schema version to 0.5.0-dev. Ideally this would happen as part of the process for releasing a tag (create release branch, update version, create tag, generate release, bump version back to X.Y.Z-dev, merge branch).

Fixes #132. Fixes #133.

opened by tschaub 11
Feature identifiers

Has there been discussion around including an id_column or something similar in the file metadata? I think it would assist in round-tripping features from other formats if it were known which column represented the feature identifier.

It looks like GDAL has a FID layer creation option. But I'm assuming that the information about which column was used when writing would be lost when reading from the parquet file (@rouault would need to confirm).

I grant that this doesn't feel "geo" specific, and there may be existing conventions in Parquet that would be appropriate.

opened by tschaub 11
How to deal with dynamic CRS or CRS with ensemble datums (such as EPSG:4326)?
From https://github.com/opengeospatial/geoparquet/pull/25#issuecomment-1059016020. The spec has a required crs field that stores a WKT2:2019 string representation of the Coordinate Reference System.

We currently recommend using EPSG:4326 for the widest interoperability of the written files. However, this is a dynamic CRS, and in addition uses an ensemble datum. See https://gdal.org/user/coordinate_epoch.html for some context. In summary, when using coordinates with a dynamic CRS, you also need to know the point in time of the observation to know the exact location.

Some discussion topics related to this:

How do we deal with a dynamic CRS? We should probably give the option to include the "coordinate epoch" in the metadata (the point in time at which the coordinate is valid)

This coordinate epoch is not part of the actual CRS definition, so I think the most straightforward option is to specify an additional (optional) "epoch" field in the column metadata (next to "crs") that holds the epoch value as a decimal year (eg 2021.3).

This means we would only support a constant epoch per file. This is in line with the initial support for other formats in GDAL, and we can always later think about expanding this (eg additional field in the parquet file that has a epoch per geometry, or per vertex)
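
A minimal sketch of what this proposal could look like in practice: column metadata carrying an optional "epoch" as a decimal year, serialized as a JSON-encoded UTF-8 string as the file metadata requires. The helper name is hypothetical, the "version" value is only illustrative, and "crs" is omitted here purely to keep the example short.

```python
import json


def build_geo_metadata(primary_column, epoch=None):
    """Build 'geo' metadata with a single geometry column and an optional epoch."""
    column = {"encoding": "WKB"}
    if epoch is not None:
        # One constant coordinate epoch for the whole column, as a decimal year.
        column["epoch"] = float(epoch)
    return {
        "version": "0.2.0",  # illustrative value only
        "primary_column": primary_column,
        "columns": {primary_column: column},
    }


geo = build_geo_metadata("geometry", epoch=2021.3)
encoded = json.dumps(geo).encode("utf-8")  # JSON-encoded UTF-8 string
print(encoded.decode("utf-8"))
```

Leaving the field out entirely when no epoch is known keeps files written for static CRSs unchanged.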
opened by jorisvandenbossche 11
value not present vs null

From experience in STAC and openEO, it seems some implementations/programming languages have a hard time distinguishing between "not present" and "null". Specifying different meanings for null and not present, as is done for crs (unknown / CRS:84), might be a bad idea. Therefore, I'm putting out for discussion whether it's a good idea to do this, or whether we should instead use a string such as "unknown" in place of null.
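
To show why the distinction is easy to lose, here is a small Python sketch. A plain `dict.get("crs")` returns None both when the key is absent and when it is explicitly null, so a reader has to use a sentinel (or an `in` check) to tell the two apart. The function name and return strings are hypothetical.

```python
import json

_MISSING = object()  # sentinel that can never appear in parsed JSON


def describe_crs(column_metadata):
    """Distinguish an absent 'crs' key from an explicit null."""
    crs = column_metadata.get("crs", _MISSING)
    if crs is _MISSING:
        return "default (OGC:CRS84)"
    if crs is None:
        return "unknown"
    return "explicit"


absent = json.loads('{"encoding": "WKB"}')
null = json.loads('{"encoding": "WKB", "crs": null}')

print(absent.get("crs") == null.get("crs"))  # naive lookup cannot tell them apart
print(describe_crs(absent))
print(describe_crs(null))
```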

opened by m-mohr 1
Add test data covering various options in the spec

Related to https://github.com/opengeospatial/geoparquet/issues/123

This is incomplete (more parameters should be added) and still draft (the script should be cleaned-up, ensure to add this to the CI to check the generated files, validate with json schema, etc). But wanted to open a PR already to see where and how we want to go with this.

This is a script that writes a bunch of .parquet files, and then also saves the metadata as a separate json file (extracted from the .parquet files using the existing script scripts/update_example_schemas.py).

opened by jorisvandenbossche 0
Validator improvements
For 1.0.0 we should have a validator that:

Tests not just the metadata but looks at the data itself to make sure it matches the metadata

Is user-friendly, not requiring python. Ideally a web-page and/or an easily installable binary.

This could be building on the current validator in this repo, or could be a new project we reference, but we want to be sure something exists, so putting this issue in to track it.
opened by cholmes 4
Example data to test implementations

One idea that @jorisvandenbossche suggested is that we should have a set of data that shows the range of the specification, that implementors can use to make sure they're handling right, and which could be the basis of 'integration' testing.

This would include geoparquet files that have a variety of projections, geometry types (including multiple geometry types as in #119), plus things like multiple geometry columns, different edges values, etc. It could also be good to have a set of 'bad' files that can also be tested against.

opened by cholmes 0
Consider externalizability of metadata

When [Geo]Parquet files/sources are used within systems that treat them as tables (like Spark, Trino/Presto, Athena, etc.), basic Parquet metadata is tracked in a "catalog" (e.g., a Hive-compatible catalog like AWS Glue Catalog). The engine being used for querying uses metadata to limit the parts of files (and files themselves) that are scanned, but it only exposes the columnar content that's present, not the metadata. In some cases, metadata can be queried from the catalogs (e.g., from Athena), but the catalogs need additional work to support the metadata specified by GeoParquet (and this largely hasn't been done yet).

In the meantime, I'm curious if it makes sense to take the same metadata that's contained in the footer and externalize it into an alternate file (which could be serialized as Parquet, Avro, JSON, etc.). This would allow the query engines to register the metadata as a separate "table" (query-able as a standard source vs. requiring catalog support) and surface/take advantage of "table"-level information like CRS at query-time. At the moment, the CRS of a geometry column is something that needs to be determined out of band.
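
One way to picture the idea is the sketch below, which folds the "geo" metadata of several files into a single table-level summary that could then be written out as a sidecar file. Everything here is hypothetical: the function name, the summary layout, and the input (already-parsed "geo" dictionaries keyed by file name). It assumes 2D bboxes that do not cross the antimeridian, and it treats an absent crs and a null crs alike for simplicity.

```python
import json


def summarize_footers(footers):
    """footers: dict mapping file name -> parsed 'geo' metadata dict."""
    summary = {}
    for file_name, geo in footers.items():
        for column, meta in geo["columns"].items():
            entry = summary.setdefault(
                column,
                {"crs": meta.get("crs"), "crs_consistent": True, "bbox": None, "files": []},
            )
            entry["files"].append(file_name)
            if meta.get("crs") != entry["crs"]:
                entry["crs_consistent"] = False
            bbox = meta.get("bbox")
            if bbox is not None:  # assumed layout: [xmin, ymin, xmax, ymax]
                if entry["bbox"] is None:
                    entry["bbox"] = list(bbox)
                else:
                    old = entry["bbox"]
                    entry["bbox"] = [
                        min(old[0], bbox[0]), min(old[1], bbox[1]),
                        max(old[2], bbox[2]), max(old[3], bbox[3]),
                    ]
    return summary


footers = {
    "part-0.parquet": {"columns": {"geometry": {"encoding": "WKB", "bbox": [0, 0, 10, 10]}}},
    "part-1.parquet": {"columns": {"geometry": {"encoding": "WKB", "bbox": [5, -5, 20, 8]}}},
}
print(json.dumps(summarize_footers(footers), indent=2))
```

The resulting JSON could be registered as its own small table, which is the "query-able as a standard source" route described above.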

This is somewhat similar to #79, in that it doesn't look at GeoParquet sources as "files" ("tables" are often backed by many files), and could be seen as another reason to (de-)duplicate data from file footers into something that covers the whole set.

/cc @jorisvandenbossche and @kylebarron, since we talked a bit about this at FOSS4G.

opened by mojodna 4

Releases (v1.0.0-beta.1)

v1.0.0-beta.1 (Dec 15, 2022)
We're getting close to the first stable GeoParquet release! We may have one or two more beta releases after gathering feedback from implementors, but we don't have any other planned breaking changes before 1.0.0. Please give it a try and create issues with your feedback.

The 1.0.0-beta.1 release includes a number of metadata changes since the previous 0.4.0 release. One breaking change is that the previous geometry_type metadata property is now named geometry_types. The value is always an array of string geometry types (instead of sometimes a single string and sometimes an array). In addition, we've clarified a number of things in the specification and tightened up the JSON schema to improve validation. See below for a complete list of changes.

What's Changed

Add GeoParquet.jl, an implementation in Julia by @evetion in https://github.com/opengeospatial/geoparquet/pull/109

add R examples and geoarrow by @yeelauren in https://github.com/opengeospatial/geoparquet/pull/108

fix lint error by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/125

Include suggestion about feature identifiers by @tschaub in https://github.com/opengeospatial/geoparquet/pull/121

Consistent spelling of GeoParquet by @tschaub in https://github.com/opengeospatial/geoparquet/pull/142

Add link to gpq by @tschaub in https://github.com/opengeospatial/geoparquet/pull/139

Correct the 'bbox' description for 3D geometries by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/88

Clarify nesting and repetition of geometry columns by @tschaub in https://github.com/opengeospatial/geoparquet/pull/138

Clarify that bbox follows column's CRS by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/143

Clarify geographic vs non-geographic bbox values by @tschaub in https://github.com/opengeospatial/geoparquet/pull/145

Array of geometry types by @tschaub in https://github.com/opengeospatial/geoparquet/pull/140

More consistent spelling and punctuation for JSON types by @tschaub in https://github.com/opengeospatial/geoparquet/pull/149

Add Apache Sedona to known libraries by @jiayuasu in https://github.com/opengeospatial/geoparquet/pull/150

schema.json: update reference to projjson schema to v0.5 by @rouault in https://github.com/opengeospatial/geoparquet/pull/151

Require minimum length of 1 for primary_geometry #129 by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/153

Clean-up spec and JSON Schema by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/131

Clean-up README by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/156

The default value of the crs field is required #152 by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/154

Require at least one column by @m-mohr in https://github.com/opengeospatial/geoparquet/pull/158

Add basic valid and invalid tests for the json schema by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/141

Refer to RFC 2119 for definition of requirement levels by @tschaub in https://github.com/opengeospatial/geoparquet/pull/160

Read version number from the schema by @tschaub in https://github.com/opengeospatial/geoparquet/pull/159

New Contributors

@evetion made their first contribution in https://github.com/opengeospatial/geoparquet/pull/109

@yeelauren made their first contribution in https://github.com/opengeospatial/geoparquet/pull/108

@tschaub made their first contribution in https://github.com/opengeospatial/geoparquet/pull/121

@jiayuasu made their first contribution in https://github.com/opengeospatial/geoparquet/pull/150

@m-mohr made their first contribution in https://github.com/opengeospatial/geoparquet/pull/153

Full Changelog: https://github.com/opengeospatial/geoparquet/compare/v0.4.0...v1.0.0-beta.1
v0.4.0 (May 26, 2022)
What's Changed

Allow the "crs" to be unknown ("crs": null) by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/94

Use PROJJSON instead of WKT2:2019 by @brendan-ward in https://github.com/opengeospatial/geoparquet/pull/96

Move JSON Schema definition to format-specs/ by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/93

Bump json schema version to 0.4.0 by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/104

Updates for 0.4.0 release by @cholmes in https://github.com/opengeospatial/geoparquet/pull/105

Script to write nz-building-outlines to geoparquet 0.4.0 by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/87

Example: Use total_bounds for finding bounds of GeoDataFrame by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/91

README.md: mentions GDAL as implementation by @rouault in https://github.com/opengeospatial/geoparquet/pull/100

New Contributors

@brendan-ward made their first contribution in https://github.com/opengeospatial/geoparquet/pull/96

Full Changelog: https://github.com/opengeospatial/geoparquet/compare/v0.3.0...v0.4.0
v0.3.0 (Apr 27, 2022)
What's Changed

New orientation field to specify winding order for polygons by @felixpalmer @jorisvandenbossche and @cholmes in https://github.com/opengeospatial/geoparquet/pull/74, https://github.com/opengeospatial/geoparquet/pull/80, and https://github.com/opengeospatial/geoparquet/pull/83.

Full Changelog: https://github.com/opengeospatial/geoparquet/compare/v0.2.0...v0.3.0
v0.2.0 (Apr 19, 2022)
What's Changed

Add Apache license by @cholmes in https://github.com/opengeospatial/geoparquet/pull/38

Expand WKB encoding to ISO WKB to support 3D geometries by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/45

CRS field is now optional (with default to OGC:CRS84) by @alasarr in https://github.com/opengeospatial/geoparquet/pull/60

Add a "geometry_type" field per column by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/51

Add "epoch" field to optionally specify the coordinate epoch for a dynamic CRS by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/49

Add section on winding order by @felixpalmer in https://github.com/opengeospatial/geoparquet/pull/59

Add validator script for Python based on JSON Schema by @Jesus89 in https://github.com/opengeospatial/geoparquet/pull/58

Script to store JSON copy of metadata next to example Parquet files by @kylebarron in https://github.com/opengeospatial/geoparquet/pull/68

Readme enhancements by @jzb in https://github.com/opengeospatial/geoparquet/pull/53

geoparquet.md: refer to OGC spec for WKB instead of ISO by @rouault in https://github.com/opengeospatial/geoparquet/pull/54

Update validator with the latest spec changes by @Jesus89 in https://github.com/opengeospatial/geoparquet/pull/70

New Contributors

@cayetanobv made their first contribution in https://github.com/opengeospatial/geoparquet/pull/57

@rouault made their first contribution in https://github.com/opengeospatial/geoparquet/pull/54

@jzb made their first contribution in https://github.com/opengeospatial/geoparquet/pull/53

@felixpalmer made their first contribution in https://github.com/opengeospatial/geoparquet/pull/59

Full Changelog: https://github.com/opengeospatial/geoparquet/compare/v0.1.0...v0.2.0
v0.1.0 (Mar 8, 2022)
Initial Release of GeoParquet

This is our first release of the GeoParquet specification. It should provide a clear target for implementations interested in providing support for geospatial vector data with Parquet, as we iterate to a stable 1.0.0 spec.

Initial work started in the geo-arrow-spec GeoPandas repository, and that will continue on Arrow work in a compatible way, with this specification focused solely on Parquet.

What's Changed

Update geoparquet spec by @TomAugspurger in https://github.com/opengeospatial/geoparquet/pull/2

Attempt to align with geoarrow spec by @cholmes in https://github.com/opengeospatial/geoparquet/pull/4

Align "geo" key in metadata and example by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/5

Clarify the Parquet FileMetadata value formatting (UTF8 string, JSON-encoded) by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/6

Clarify that WKB means "standard" WKB encoding by @TomAugspurger in https://github.com/opengeospatial/geoparquet/pull/16

More explicitly mention the metadata is stored in the parquet FileMetaData by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/20

Readme enhancements by @cholmes in https://github.com/opengeospatial/geoparquet/pull/19

Optional column metadata field to store bounding box information by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/21

Clarify that additional top-level fields in the JSON metadata are allowed by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/28

CRS spec definition for version 0.1 by @alasarr in https://github.com/opengeospatial/geoparquet/pull/25

Update example parquet file by @TomAugspurger in https://github.com/opengeospatial/geoparquet/pull/24

Clean up TODOs in geoparquet.md by @TomAugspurger in https://github.com/opengeospatial/geoparquet/pull/31

"edges" field spec definition for version 0.1 by @Jesus89 in https://github.com/opengeospatial/geoparquet/pull/27

Add known libraries that support GeoParquet to README by @jorisvandenbossche in https://github.com/opengeospatial/geoparquet/pull/29

Updated warning in readme by @cholmes in https://github.com/opengeospatial/geoparquet/pull/33

New Contributors

@TomAugspurger made their first contribution in https://github.com/opengeospatial/geoparquet/pull/2

@cholmes made their first contribution in https://github.com/opengeospatial/geoparquet/pull/4

@jorisvandenbossche made their first contribution in https://github.com/opengeospatial/geoparquet/pull/5

@alasarr made their first contribution in https://github.com/opengeospatial/geoparquet/pull/25

@Jesus89 made their first contribution in https://github.com/opengeospatial/geoparquet/pull/27

Full Changelog: https://github.com/opengeospatial/geoparquet/commits/v0.1.0

Owner

Open Geospatial Consortium
