How to use COG's (Cloud optimized GeoTIFFs) with Rasterio

Related tags

Geolocationcog_how_to
Overview

How to use COG's (Cloud optimized GeoTIFFs) with Rasterio

According to Cogeo.org:

A Cloud Opdtimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing ​HTTP GET range requests to ask for just the parts of a file they need.

Think about the following case: You want to analyze the NDVI of your local 1km² park by using Sentinel 2 geoTIFF imaginery. Sentinel 2 satellite images cover very big regions. In the past, you had to download the whole file (100mb +) for band 4 (red) and the whole file for band 8 (near infrared) even that in fact, you need only a small portion of the data. That's why COG's (cloud optimized geoTIFFs) have been invented. With them, we ask the server to only send specific bytes of the image.

Cloud optimized geoTIFFs offer:

  • efficient imaginery data access
  • reduced duplication of data
  • legacy compatibility

COG's can be read just like normal geoTIFFs. In our example, we will use an AOI (area of interest), that is described in a geoJSON. We will also use sat-search to query the latest available Sentinel-2 satellite imaginery for our specific location. Then we will use Rasterio to perform a range request to download only the parts of the files we need. We will also use Pyproj to perform neccessary coordinate transformations. The cloud optimized Sentinel 2 imaginery is hosted in a AWS S3 repository.

Install libraries (matplotlib optional)

pip install rasterio pyproj sat-search matplotlib

Import libraries

from satsearch import Search
from datetime import datetime, timedelta
from pyproj import Transformer
from json import load

import rasterio
from rasterio.features import bounds

First, we need to open our geoJSON file and extract the geometry. To create a geoJSON, you can go to geojson.io. Do not make a very large geoJSON (a good size is 1x1km²), otherwise you might get an error later.

file_path = "path/to/your/file.geojson"
with open(file_path,"r") as fp:
    file_content = load(fp)
geometry = file_content["features"][0]["geometry"]

We will query for images not older than 60 days that contain less than 20% clouds.

# search last 60 days
current_date = datetime.now()
date_60_days_ago = current_date - timedelta(days=60)
current_date = current_date.strftime("%Y-%m-%d")
date_60_days_ago = date_60_days_ago.strftime("%Y-%m-%d")

# only request images with cloudcover less than 20%
query = {
    "eo:cloud_cover": {
        "lt": 20
        }
    }
search = Search(
    url='https://earth-search.aws.element84.com/v0',
    intersects=geometry,
    datetime=date_60_days_ago + "/" + current_date,
    collections=['sentinel-s2-l2a-cogs'],
    query=query
    )        
# grep latest red && nir
items = search.items()
latest_data = items.dates()[-1]
red = items[0].asset('red')["href"]
nir = items[0].asset('nir')["href"]
print(f"Latest data found that intersects geometry: {latest_data}")
print(f"Url red band: {red}")
print(f"Url nir band: {nir}")

Now we got the URLs of the most recent Sentinel 2 imaginery for our region. In the next step, we need to calculate which pixels to query from our geoTIFF server. The satellite image comes with 10980 x 10980 pixels. Every pixel represents 10 meter ground resolution. In order to calculate which pixels fall into our area of interest, we need to reproject our geoJSON coordinates into pixel row/col. With the recent Rasterio versions, we can read COGs by passing a rasterio.windows.Window (that specifies which row/col to query) to the read function. Before we can query, we need to open a virtual file(urls of a hosted file):

for geotiff_file in [red, nir]:
    with rasterio.open(geotiff_file) as geo_fp:

Then, we calculate the bounding box around our geometry and use the pyproj.Transformer to transform our geoJSON coordinates (EPSG 4326) into Sentinel Sat's EPSG 32633 projection.

        bbox = bounds(geometry)
        coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs) 
        # calculate pixels to be streamed in cog 
        coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
        coord_lower_right = coord_transformer.transform(bbox[1], bbox[2]) 

Now that we have the right coordinates, we can calculate from coordinates to pixels in our geoTIFF file using rasterio.

        pixel_upper_left = geo_fp.index(
            coord_upper_left[0], 
            coord_upper_left[1]
            )
        pixel_lower_right = geo_fp.index(
            coord_lower_right[0], 
            coord_lower_right[1]
            )
        
        for pixel in pixel_upper_left + pixel_lower_right:
            # If the pixel value is below 0, that means that
            # the bounds are not inside of our available dataset.
            if pixel < 0:
                print("Provided geometry extends available datafile.")
                print("Provide a smaller area of interest to get a result.")
                exit()

Now we are ready for the desired range request.

        # make http range request only for bytes in window
        window = rasterio.windows.Window.from_slices(
            (
            pixel_upper_left[0], 
            pixel_lower_right[0]
            ), 
            (
            pixel_upper_left[1], 
            pixel_lower_right[1]
            )
        )
        subset = geo_fp.read(1, window=window)

The subset object contains the desired data. We can access and vizualize it with:

        import matplotlib.pyplot as plt
        plt.imshow(subset, cmap="seismic")
        plt.colorbar()

red nir

I hope, I was able to show you how COG's work and that you are ready now to access your cloud optimized geoTIFF images in seconds compared to minutes in the past. Have a great day!

All together:

from satsearch import Search
from datetime import datetime, timedelta
from pyproj import Transformer
from json import load

import rasterio
from rasterio.features import bounds

file_path = "path/to/your/file.geojson"
with open(file_path,"r") as fp:
    file_content = load(fp)
geometry = file_content["features"][0]["geometry"]

# search last 60 days
current_date = datetime.now()
date_60_days_ago = current_date - timedelta(days=60)
current_date = current_date.strftime("%Y-%m-%d")
date_60_days_ago = date_60_days_ago.strftime("%Y-%m-%d")

# only request images with cloudcover less than 20%
query = {
    "eo:cloud_cover": {
        "lt": 20
        }
    }
search = Search(
    url='https://earth-search.aws.element84.com/v0',
    intersects=geometry,
    datetime=date_60_days_ago + "/" + current_date,
    collections=['sentinel-s2-l2a-cogs'],
    query=query
    )        
# grep latest red && nir
items = search.items()
latest_data = items.dates()[-1]
red = items[0].asset('red')["href"]
nir = items[0].asset('nir')["href"]
print(f"Latest data found that intersects geometry: {latest_data}")
print(f"Url red band: {red}")
print(f"Url nir band: {nir}")

for geotiff_file in [red, nir]:
    with rasterio.open(geotiff_file) as geo_fp:
        bbox = bounds(geometry)
        coord_transformer = Transformer.from_crs("epsg:4326", geo_fp.crs) 
        # calculate pixels to be streamed in cog 
        coord_upper_left = coord_transformer.transform(bbox[3], bbox[0])
        coord_lower_right = coord_transformer.transform(bbox[1], bbox[2]) 
        pixel_upper_left = geo_fp.index(
            coord_upper_left[0], 
            coord_upper_left[1]
            )
        pixel_lower_right = geo_fp.index(
            coord_lower_right[0], 
            coord_lower_right[1]
            )
        
        for pixel in pixel_upper_left + pixel_lower_right:
            # If the pixel value is below 0, that means that
            # the bounds are not inside of our available dataset.
            if pixel < 0:
                print("Provided geometry extends available datafile.")
                print("Provide a smaller area of interest to get a result.")
                exit()
        
        # make http range request only for bytes in window
        window = rasterio.windows.Window.from_slices(
            (
            pixel_upper_left[0], 
            pixel_lower_right[0]
            ), 
            (
            pixel_upper_left[1], 
            pixel_lower_right[1]
            )
        )
        subset = geo_fp.read(1, window=window)

        # vizualize
        import matplotlib.pyplot as plt
        plt.imshow(subset, cmap="seismic")
        plt.colorbar()
        plt.show()
Owner
Marvin Gabler
specialized in climate, data & risk | interested in nature, rockets and outer space | The earth's data for our world's future
Marvin Gabler
Python bindings and utilities for GeoJSON

geojson This Python library contains: Functions for encoding and decoding GeoJSON formatted data Classes for all GeoJSON Objects An implementation of

Jazzband 763 Dec 26, 2022
This program analizes films database with adresses, and creates a folium map with closest films to the coordinates

Films-map-project UCU CS lab 1.2, 1st year This program analizes films database with adresses, and creates a folium map with closest films to the coor

Artem Moskovets 1 Feb 09, 2022
Google Maps keeps old satellite imagery around for a while – this tool collects what's available for a user-specified region in the form of a GIF.

google-maps-at-88-mph The folks maintaining Google Maps regularly update the satellite imagery it serves its users, but outdated versions of the image

Noah Doersing 111 Sep 27, 2022
Documentation and samples for ArcGIS API for Python

ArcGIS API for Python ArcGIS API for Python is a Python library for working with maps and geospatial data, powered by web GIS. It provides simple and

Esri 1.4k Dec 30, 2022
Wraps GEOS geometry functions in numpy ufuncs.

PyGEOS PyGEOS is a C/Python library with vectorized geometry functions. The geometry operations are done in the open-source geometry library GEOS. PyG

362 Dec 23, 2022
gpdvega is a bridge between GeoPandas and Altair that allows to seamlessly chart geospatial data

gpdvega gpdvega is a bridge between GeoPandas a geospatial extension of Pandas and the declarative statistical visualization library Altair, which all

Ilia Timofeev 49 Jul 25, 2022
Minimum Bounding Box of Geospatial data

BBOX Problem definition: The spatial data users often are required to obtain the coordinates of the minimum bounding box of vector and raster data in

Ali Khosravi Kazazi 1 Sep 08, 2022
A modern, geometric typeface by @chrismsimpson (last commit @ 85fa625 Jun 9, 2020 before deletion)

Metropolis A modern, geometric typeface. Influenced by other popular geometric, minimalist sans-serif typefaces of the new millenium. Designed for opt

Darius 183 Dec 25, 2022
Python library to visualize circular plasmid maps

Plasmidviewer Plasmidviewer is a Python library to visualize plasmid maps from GenBank. This library provides only the function to visualize circular

Mori Hideto 9 Dec 04, 2022
Pure python WMS

Ogcserver Python WMS implementation using Mapnik. Depends Mapnik = 0.7.0 (and python bindings) Pillow PasteScript WebOb You will need to install Map

Mapnik 130 Dec 28, 2022
This repository contains the scripts to derivate the ENU and ECEF coordinates from the longitude, latitude, and altitude values encoded in the NAD83 coordinates.

This repository contains the scripts to derivate the ENU and ECEF coordinates from the longitude, latitude, and altitude values encoded in the NAD83 coordinates.

Luigi Cruz 1 Feb 07, 2022
A proof-of-concept jupyter extension which converts english queries into relevant python code

Text2Code for Jupyter notebook A proof-of-concept jupyter extension which converts english queries into relevant python code. Blog post with more deta

DeepKlarity 2.1k Dec 29, 2022
Tile Map Service and OGC Tiles API for QGIS Server

Tiles API Add tiles API to QGIS Server Tiles Map Service API OGC Tiles API Tile Map Service API - TMS The TMS API provides these URLs: /tms/? to get i

3Liz 6 Dec 01, 2021
A compilation of several single-beam bathymetry surveys of the Caribbean

Caribbean - Single-beam bathymetry This dataset is a compilation of several single-beam bathymetry surveys of the Caribbean ocean displaying a wide ra

Fatiando a Terra Datasets 0 Jan 20, 2022
Program that shows all the details of the given IP address. Build with Python and ipinfo.io API

ip-details This is a program that shows all the details of the given IP address. Build with Python and ipinfo.io API Usage To use this program, run th

4 Mar 01, 2022
Simple, concise geographical visualization in Python

Geographic visualizations for HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it? GeoViews is a Python library that ma

HoloViz 445 Jan 02, 2023
Open Data Cube analyses continental scale Earth Observation data through time

Open Data Cube Core Overview The Open Data Cube Core provides an integrated gridded data analysis environment for decades of analysis ready earth obse

Open Data Cube 410 Dec 13, 2022
pure-Python (Numpy optional) 3D coordinate conversions for geospace ecef enu eci

Python 3-D coordinate conversions Pure Python (no prerequistes beyond Python itself) 3-D geographic coordinate conversions and geodesy. API similar to

Geospace code 292 Dec 29, 2022
Hapi is a Python library for building Conceptual Distributed Model using HBV96 lumped model & Muskingum routing method

Current build status All platforms: Current release info Name Downloads Version Platforms Hapi - Hydrological library for Python Hapi is an open-sourc

Mostafa Farrag 15 Dec 26, 2022
leafmap - A Python package for geospatial analysis and interactive mapping in a Jupyter environment.

A Python package for geospatial analysis and interactive mapping with minimal coding in a Jupyter environment

Qiusheng Wu 1.4k Jan 02, 2023