Summary statistics of geospatial raster datasets based on vector geometries.

Overview

rasterstats

BuildStatus

rasterstats is a Python module for summarizing geospatial raster datasets based on vector geometries. It includes functions for zonal statistics and interpolated point queries. The command-line interface allows for easy interoperability with other GeoJSON tools.

Documentation

For details on installation and usage, visit the documentation at http://pythonhosted.org/rasterstats.

What does it do?

Given a vector layer and a raster band, calculate the summary statistics of each vector geometry. For example, with a polygon vector layer and a digital elevation model (DEM) raster, compute the mean elevation of each polygon.

zones elevation

Command Line Quick Start

The command line interfaces to zonalstats and point_query are rio subcommands which read and write geojson features

$ fio cat polygon.shp | rio zonalstats -r elevation.tif

$ fio cat points.shp | rio pointquery -r elevation.tif

See the CLI Docs. for more detail.

Python Quick Start

For zonal statistics

>>> from rasterstats import zonal_stats
>>> stats = zonal_stats("tests/data/polygons.shp", "tests/data/slope.tif")
>>> stats[0].keys()
dict_keys(['min', 'max', 'mean', 'count'])
>>> [f['mean'] for f in stats]
[14.660084635416666, 56.60576171875]

and for point queries

>>> from rasterstats import point_query
>>> point = {'type': 'Point', 'coordinates': (245309.0, 1000064.0)}
>>> point_query(point, "tests/data/slope.tif")
[74.09817594635244]

Issues

Find a bug? Report it via github issues by providing

  • a link to download the smallest possible raster and vector dataset necessary to reproduce the error
  • python code or command to reproduce the error
  • information on your environment: versions of python, gdal and numpy and system memory
Comments
  • Wrong mean/std stats

    Wrong mean/std stats

    Hi,

    I think there is an error in calculating mean and standard deviation in rasterstats 0.10.3 Here is the statistics calculated in ArcMap for my region:
    Band COUNT MIN MAX MEAN STD MEDIAN 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 51615 3675 52475 10 151057 14939 44531 37989 1722 38133 11 151057 10063 56708 48523 3707 49289 12 151057 13024 46146 34272 5463 34959 13 151057 23169 59005 46253 3292 46804

    And here is the same statistics calculated using rasterstats (have a look at mean and std values for bands 9-13):
    Band count min max mean std percentile_50 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 23182 28669 52475 10 151057 14939 44531 9556 28485 38133 11 151057 10063 56708 20090 28673 49289 12 151057 13024 46146 5839 28953 34959 13 151057 23169 59005 17820 28623 46804

    The code to reproduce the results using rasterstats: for i in range(1, 14): stats = zonal_stats('F:/test.shp', "F:/test_2.dat", stats=['count','min', 'max', 'mean', 'std', 'percentile_50'], all_touched=False, band_num=i, nodata=0)

    Here is the data: https://www.dropbox.com/sh/e8g0jwr1ci0vp2r/AAATlwU2eFjtbtMCO3R5SB1Fa?dl=0

    Can you please let me know what the problem could be?

    opened by yurithefury 17
  • Geopandas and Zonal Statistic error

    Geopandas and Zonal Statistic error

    I run this code

    import geopandas as gpd
    from rasterstats import zonal_stats
    
    geodf = gpd.read_file("SHP/shape.shp")
    zonal_stats(geodf,"raster.tif")
    

    but I get this error:

    ParseException: Unknown type: 'FID'
    ParseException: Unexpected EOF parsing WKB
    Traceback (most recent call last):
    File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2820, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
    File "<ipython-input-43-3d15c4dddd4f>", line 1, in <module>
    zonal_stats(geodf,"raster.tif",stats='mean')
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/main.py", line 21, in zonal_stats
    return list(gen_zonal_stats(*args, **kwargs))
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/main.py", line 128, in gen_zonal_stats
    for i, feat in enumerate(features_iter):
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/io.py", line 107, in <genexpr>
    features_iter = (parse_feature(x) for x in obj)
    File "/usr/local/lib/python2.7/dist-packages/rasterstats/io.py", line 70, in parse_feature
    raise ValueError("Can't parse %s as a geojson Feature object" % obj)
    ValueError: Can't parse FID as a geojson Feature object
    

    The data frame looks like this:

    In[43]: geodf
    Out[42]: 
         FID                                      geometry
    
    0     0  POLYGON ((7.662965420348624 45.05600185397847,...
    1     1  POLYGON ((7.665210146514162 45.05607590098681,...
    2     2  POLYGON ((7.663353281768154 45.05601113630467,...
    3     3  POLYGON ((7.660229137844505 45.0570570693801, ...
    4     4  POLYGON ((7.660047499767138 45.0572381357898, ...
    ....
    

    Any suggestion how to solve this problem?

    bug 
    opened by LorenzoBottaccioli 14
  • Add groupby arg to gen_zonal_stats

    Add groupby arg to gen_zonal_stats

    This PR adds a groupby argument to gen_zonal_stats inspired by the ArcGIS gp.ZonalStatistics zone_field argument. The effect is that stats are computed by group, instead of by each individual feature.

    The groupby arg takes a string or a function. If the value is a string, it is assumed to be a field present in each feature properties. If the value is a function, the function's return value will be the groupby key.

    Caveats:

    • The major caveat of this feature is that groups containing overlapping geometry will yield unexpected results as all features are rasterized together...not individually. There are currently no warnings given for this use-case.
    • zone_arrays will be larger as the unioned bounds of the grouped features will make potentially large intermediate rasters
    • All zones will need to be able to fit into memory

    Also included in this PR is an example shapefile for use with testing.

    opened by brendancol 11
  • Partial polygon overlap in raster stats

    Partial polygon overlap in raster stats

    I am using the raster_stats module in python to calculate the mean carbon value for different sets of polygons. Here is the code that I am using:

    africa_sat = 'E:/Research_Work/data/processing_data/carbon_satcchi/data/raw_data/africa_carbon_1km.tif'
    name_clipshp = 'data/grid_clipped/' + name_hol + '_grid_cl.shp' # create object to point to clipped grid file
    
    africa_stats = raster_stats(name_clipshp, africa_sat,stats=['mean'])
    

    I have approximately 260 grid surface areas that I am employing. The grid areas have between ~10,000 polygons and ~40,000 polygons in them. Here is the error message I get for some of grids:

    Traceback (most recent call last): File "", line 16, in File "C:\Python27\lib\site-packages\rasterstats\main.py", line 153, in raster_stats rvds.SetGeoTransform(new_gt) AttributeError: 'NoneType' object has no attribute 'SetGeoTransform'

    I do not get this error message, if the polygon overlaps completely with the raster image (tiff file) or if the polygon does not at all overlap with the polygon. However, if only a part of the polygon overlaps with the raster I get the error that is above.

    Here is a link to a raster and shapefile that created the problem for me:

    https://www.dropbox.com/sh/c80v8v576imd09f/7pSGN-p9lS

    opened by engelmannjens 10
  • Rasterstats - Error - Rasterize Geom - Width and Height must be > 0

    Rasterstats - Error - Rasterize Geom - Width and Height must be > 0

    Hi Matthew,

    I was trying to do the basic zonal statistics and I cannot get around this error. I tried the process with different shapefiles (with polygons). The result was the same. I am running it with Jupyter-Lab

    Here is my env configuration with anaconda: python 3.7.7 rasterio 1.1.4 py37h02db82b_0 conda-forge rasterstats 0.14.0 py_0 requests 2.23.0 pyh8c360ce_2 conda-forge rioxarray 0.0.26 py_0 conda-forge xarray 0.15.1

    Here is my code:

    shapefile=r'G:\gadm\version36 simplified\gadm36_2_simplified.shp' temp_tif=r'E:\weather\data\gpm\late_source\temp.tif' from rasterstats import zonal_stats print(shapefile, temp_tif) zonal_stats(shapefile,temp_tif)

    Using those files: shapefile.zip temp.zip

    Returning this error: error.txt

    Objective: I would like to use the gid_2 as the code to output the zonal statistics for each polygon

    Thanks in advance

    opened by marcello-moreira 9
  • AttributeError when running zonal_stats

    AttributeError when running zonal_stats

    I'm trying to run zonal_stats and I'm getting the following error in the _init_ function of io.py (line 242): AttributeError: 'GeneratorContextManager' object has no attribute 'transform'

    I've recreated the error with a variety of raster and vector data on a fresh virtual environment with rasterstats version 0.13.0. I've experimented with using different combinations of rasterstats/rasterio versions but I'm still seeing the issue. Any insight would be greatly appreciated.

    Thanks!

    opened by HenryW95 9
  • Exit code -1073741819 (0xC0000005)

    Exit code -1073741819 (0xC0000005)

    Hi all, In django console, a simple code as :

    from rasterstats import zonal_stats, point_query
    stats = zonal_stats('Good Raster/wetlands.json', 'Good Raster/tempRaster.tif')
    

    I received : Process finished with exit code -1073741819 (0xC0000005) (Python Crash)

    Here is the files, (geojson and raster): https://www.dropbox.com/sh/gacm4iu00aama4v/AADUm0RD0LYD_0fUV6Mxo3QFa?dl=0

    Version : Windows 10 pro 64 bit PyCharm 2016.3.2 Python 2.7.12 Django 1.10.4 gdal 2.1.3 numpy 1.12.0+mkl rasterio 0.36.0 rasterstats 0.12

    Update: The entire Zonal Statistics project, with OGR, OSR, Gdal, numpy, geojson that I code myself, work good but too dam slow. I need a efficient Zonal Statistic with geojson and a raster. This lib could save me but a critical error occurred when I use it without any other code or import.

    Work in a virtual environment created by Pycharm. All in 32 bit

    Update2 error django console

    When ran into Shell from terminal, I got some error details.

    FATAL: CPLMalloc(): Out of memory allocating 516 bytes. Python crash

    error shell terminal

    Alex

    opened by Neavrys 8
  • Geo raster mini rasters

    Geo raster mini rasters

    @perrygeo Matthew I have changed the file so as to allow an option to use GeoRasters as output if georasters is installed on the system . GeoRasters have a function to write GeoTiffs. They should make life in general easier for the user...let me know if you want to incorporate this version.

    opened by ozak 8
  • New option: Cell area correction for unprojected data using a weighted mean based on cell latitude

    New option: Cell area correction for unprojected data using a weighted mean based on cell latitude

    This will allow users to generate stats using unprojected / WGS84 data while still accounting for cell area in the calculations used for the "mean" stat. It is just a weighted mean where every cell inside a feature is given a weight based on the latitude of the cells centroid using the haversine function.

    opened by sgoodm 7
  • Rasterstats install on Anaconda

    Rasterstats install on Anaconda

    I seem to be having trouble installing rasterstats successfully in the windows 64bit environment (Windows 7 or Server 2012). I am currently using Anaconda (python 2.7,also tried 3.4) to install, with either the rasterio install instructions (pip install GDAL-2.0.2-cp27-none-win32.whl and rasterio-0.34.0-cp27-cp27m-win32.whl) or via conda forge (https://github.com/conda-forge/rasterio-feedstock).

    In both cases gdal/rasterio will run as expected but when I try to install rasterstats (via rasterstats-0.10.3-py2-none-any.whl or pip install rasterstats) I receive an error "GeoVersion" or "GdalVersion" not found, accompanyied with a "python setup.py egg error code 1".

    I've checked that the corresponding package is installed and the version is up to date. Any ideas what may be causing the install issue?

    Thanks for your time. -Bob

    opened by ghost 7
  • memoryerror exception on raster file in specific projection

    memoryerror exception on raster file in specific projection

    I have a very simple test case, a shapefile, and a raster file in a latlon grid projection. When running stats function it generates an exception:

    python(36508,0x7fff7a4cc000) malloc: *** mach_vm_map(size=362320535494656) failed (error code=3)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    Traceback (most recent call last):
      File "./create_stats.py", line 56, in <module>
        main()
      File "./create_stats.py", line 52, in main
        create_stats(r, p, date)
      File "./create_stats.py", line 22, in create_stats
        geojson = zonal_stats(shp_file, tif_file, stats='mean sum', geojson_out=True)
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/main.py", line 21, in zonal_stats
        return list(gen_zonal_stats(*args, **kwargs))
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/main.py", line 136, in gen_zonal_stats
        fsrc = rast.read(bounds=geom_bounds)
      File "/net/ellis/usr/local/home/pavel/code/ricedss/.venv/lib/python2.7/site-packages/rasterstats/io.py", line 288, in read
        new_array = self.src.read(self.band, window=win, boundless=True, masked=masked)
      File "rasterio/_io.pyx", line 793, in rasterio._io.RasterReader.read (rasterio/_io.c:10053)
    MemoryError
    

    The test data are in http://oka.ags.io/scratch/

    failing code is

    stats = zonal_stats("RRD.shp", "RRD_rice_2014157.tif", stats='mean sum', geojson_out=True)
    

    working code on reprojected data works

    stats = zonal_stats("RRD_sinu.shp", "RRD_rice_2014157_sinu.tif", stats='mean sum', geojson_out=True)
    

    version is '0.10.0'

    opened by demiurg 7
  • ShapelyDeprecationWarning: 'type' attribute to 'geom_type' attribute

    ShapelyDeprecationWarning: 'type' attribute to 'geom_type' attribute

    Welcome to the python-rasterstats issue tracker. Thanks for putting together a bug report! By following the template below, we'll be better able to reproduce the problem on our end.

    If you don't have a bug specifically but a general support question, please visit https://gis.stackexchange.com/

    Describe the bug

    /usr/local/lib/python3.8/dist-packages/rasterstats/main.py:156: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.

    The warning that appears in the provided code refers to a change in the way that Shapely, a geometry processing library, handles the 'type' attribute of geometries. Instead of using the 'type' attribute, it is recommended to use the 'geom_type' attribute. To fix this warning, the code can be modified to use the 'geom_type' attribute instead of the 'type' attribute when necessary.

    To Reproduce Steps to reproduce the behavior:

    1. How did you install rasterstats and its dependencies?
    2. What datasets are necessary to reproduce the bug? Please provide links to example data if necessary.
    3. What code is necessary to reproduce the bug? Provide the code directly below or provide links to it.
    # Code to reproduce the error
    zonal_stats = rasterstats.zonal_stats(
        output_geojson, vi_array, 
        affine=affine, nodata=nodata,
        stats="min max mean ",
        geojson_out=True,
    ) 
    

    Feature Requests

    python-rasterstats is not currently accepting any feature requests via the issue tracker. If you'd like to add a backwards-compatible feature, please open a pull request - it doesn't need to be 100% ready but should include a working proof-of-concept, tests, and should not break the existing API.

    opened by marianobonelli 0
  • segfault when importing rasterio

    segfault when importing rasterio

    I'm not sure what is happening but as soon as I import rasterio I get a segfault (using the test data)

    >>> import rasterio # this will cause the segfault
    >>> from rasterstats import zonal_stats
    >>> 
    >>> stats = zonal_stats(
    ...     "polygons.shp",
    ...     "slope.tif",
    ... )
    [1]    3980 segmentation fault  python3
    

    name = "rasterstats" version = "0.17.0"

    name = "rasterio" version = "1.3.4"

    opened by TimMcCauley 1
  • Performance optimizations

    Performance optimizations

    A small collection of performance enhancements aimed at reducing the number of times the data is traversed.

    Also improved percentile computation by computing them all at once instead of one at a time (and avoid recreating the compressed array each time).

    opened by groutr 1
  • Make int64 casting optional

    Make int64 casting optional

    With 'large' raster datasets of integer datatype, rasterstats wastefully casts the ndarray read from the raster up to int64. In the extreme case, this uses 8 times more memory than necessary if the underlying datatype of the raster dataset is Byte.

    At least for categorical rasters and when using categorical=True, it seems desirable to disable this behaviour to save memory.

    In a not particularly extreme case, this saved us on the order of 20G of memory:

    The raster datasets covers all of Scotland and is of the Byte type. The individual features in the vector dataset are the Scottish country boundaries and therefore cover a decent portion of the raster area (and their bounding box an even larger portion).

    The raster dataset is 46478 * 69365 pixels, 1 byte per pixel = 3.00GiB uncompressed in memory if stored in an ndarray of dtype int8. It would be 24GiB in int64.

    opened by maxfenv 3
  • Resolve shapely deprecation warnings; use pytest.importorskip

    Resolve shapely deprecation warnings; use pytest.importorskip

    This PR resolves a few shapely deprecation warnings:

    • Use shapely.errors.ShapelyError (ReadingError was also only from shapely.errors)
    • Use geom.geom_type instead of geom.type
    • Fix inconsistent dimensionality error with test polygon
    • Use pytest.importorskip feature
    opened by mwtoews 1
  • missing `click` dependency

    missing `click` dependency

    Hi, this is more a proposal than a bug report. Since python-rasterstats directly imports click: https://github.com/perrygeo/python-rasterstats/blob/5da3110230df44e228ba21f16a9d31ab7e84e0d5/src/rasterstats/cli.py#L6

    and since there's a known issue with click<7.1 (#254) , adding a click>=7.1 in requirements.txt would be great.

    The following is a real case scenario in which it would come in handy:

    on a centos8/rocky8 server:

    $ python3 -mvenv --system-site-packages env
    $ env/bin/pip install rasterstats
    # ... various logs ...
    
    $ env/bin/python -c "import rasterstats"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/autofs/nfshomes/dbranchini/buttami/env/lib64/python3.6/site-packages/rasterstats/__init__.py", line 4, in <module>
        from rasterstats import cli
      File "/autofs/nfshomes/dbranchini/buttami/env/lib64/python3.6/site-packages/rasterstats/cli.py", line 28, in <module>
        @cligj.use_rs_opt
      File "/usr/lib/python3.6/site-packages/click/decorators.py", line 170, in decorator
        _param_memo(f, OptionClass(param_decls, **attrs))
      File "/usr/lib/python3.6/site-packages/click/core.py", line 1510, in __init__
        raise TypeError('Got secondary option for non boolean flag.')
    TypeError: Got secondary option for non boolean flag.
    

    (then after some googling I saw #254 and forced the update of click)

    opened by brancomat 2
Releases(0.17.0)
Owner
Matthew Perry
Matthew Perry
gpdvega is a bridge between GeoPandas and Altair that allows to seamlessly chart geospatial data

gpdvega gpdvega is a bridge between GeoPandas a geospatial extension of Pandas and the declarative statistical visualization library Altair, which all

Ilia Timofeev 49 Jul 25, 2022
A light-weight, versatile XYZ tile server, built with Flask and Rasterio :earth_africa:

Terracotta is a pure Python tile server that runs as a WSGI app on a dedicated webserver or as a serverless app on AWS Lambda. It is built on a modern

DHI GRAS 531 Dec 28, 2022
Read images to numpy arrays

mahotas-imread: Read Image Files IO with images and numpy arrays. Mahotas-imread is a simple module with a small number of functions: imread Reads an

Luis Pedro Coelho 67 Jan 07, 2023
Script that allows to download data with satellite's orbit height and create CSV with their change in time.

Satellite orbit height ◾ Requirements Python = 3.8 Packages listen in reuirements.txt (run pip install -r requirements.txt) Account on Space Track ◾

Alicja Musiał 2 Jan 17, 2022
A GUI widget for Linux to show current time in different timezones.

A GUI widget to show current time in different timezones (under development). To use this widget: Run scripts/startup.py Select a country. A list of t

B.Jothin kumar 11 Nov 10, 2022
This program analizes films database with adresses, and creates a folium map with closest films to the coordinates

Films-map-project UCU CS lab 1.2, 1st year This program analizes films database with adresses, and creates a folium map with closest films to the coor

Artem Moskovets 1 Feb 09, 2022
Python bindings and utilities for GeoJSON

geojson This Python library contains: Functions for encoding and decoding GeoJSON formatted data Classes for all GeoJSON Objects An implementation of

Jazzband 763 Dec 26, 2022
Python tools for geographic data

GeoPandas Python tools for geographic data Introduction GeoPandas is a project to add support for geographic data to pandas objects. It currently impl

GeoPandas 3.5k Jan 03, 2023
Hapi is a Python library for building Conceptual Distributed Model using HBV96 lumped model & Muskingum routing method

Current build status All platforms: Current release info Name Downloads Version Platforms Hapi - Hydrological library for Python Hapi is an open-sourc

Mostafa Farrag 15 Dec 26, 2022
Histogram matching plugin for rasterio

rio-hist Histogram matching plugin for rasterio. Provides a CLI and python module for adjusting colors based on histogram matching in a variety of col

Mapbox 75 Sep 23, 2022
Create Siege configuration files from Cloud Optimized GeoTIFF.

cogeo-siege Documentation: Source Code: https://github.com/developmentseed/cogeo-siege Description Create siege configuration files from Cloud Optimiz

Development Seed 3 Dec 01, 2022
pure-Python (Numpy optional) 3D coordinate conversions for geospace ecef enu eci

Python 3-D coordinate conversions Pure Python (no prerequistes beyond Python itself) 3-D geographic coordinate conversions and geodesy. API similar to

Geospace code 292 Dec 29, 2022
Download and process satellite imagery in Python using Sentinel Hub services.

Description The sentinelhub Python package allows users to make OGC (WMS and WCS) web requests to download and process satellite images within your Py

Sentinel Hub 659 Dec 23, 2022
ColoringMapAlgorithm-CSP- - Graphical Coloring of Countries with Condition Satisfaction Algorithm

ColoringMapAlgorithm-CSP- Condition Satisfaction Algorithm Output Condition

Kerem TAN 2 Jan 10, 2022
WhiteboxTools Python Frontend

whitebox-python Important Note This repository is related to the WhiteboxTools Python Frontend only. You can report issues to this repo if you have pr

Qiusheng Wu 304 Dec 15, 2022
Client library for interfacing with USGS datasets

USGS API USGS is a python module for interfacing with the US Geological Survey's API. It provides submodules to interact with various endpoints, and c

Amit Kapadia 104 Dec 30, 2022
iNaturalist observations along hiking trails

This tool reads the route of a hike and generates a table of iNaturalist observations along the trails. It also shows the observations and the route of the hike on a map. Moreover, it saves waypoints

7 Nov 11, 2022
A simple reverse geocoder that resolves a location to a country

Reverse Geocoder This repository holds a small web service that performs reverse geocoding to determine whether a user specified location is within th

4 Dec 25, 2021
Calculate the area inside of any GeoJSON geometry. This is a port of Mapbox's geojson-area for Python

geojson-area Calculate the area inside of any GeoJSON geometry. This is a port of Mapbox's geojson-area for Python. Installation $ pip install area U

Alireza 87 Dec 14, 2022
leafmap - A Python package for geospatial analysis and interactive mapping in a Jupyter environment.

A Python package for geospatial analysis and interactive mapping with minimal coding in a Jupyter environment

Qiusheng Wu 1.4k Jan 02, 2023