Python spatial statistics

September 11, 2020 — July 26, 2023


Python tools for spatial statistics and spatiotemporal processes.

Figure 1

1 Background: GDAL

Key concept: As with R's spatial tooling, the cluster of projects that looms largest is GDAL/OGR and the related PROJ library. These are behemoth C/C++ libraries that have supported geospatial work for decades; they are very powerful, but also confusing and demanding to install. Python projects either wrap GDAL to make it usable, or implement parts of its functionality separately. If I can avoid needing the more esoteric GDAL features, I prefer tools that do not require GDAL, because installing it is horrible, especially cross-platform, and it is usually overkill for my work.

2 Rasterio

Rasterio: access to geospatial raster data

Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as satellite imagery and terrain models. Rasterio reads and writes these formats and provides a Python API based on Numpy N-dimensional arrays and GeoJSON.

Rasterio is built on GDAL, so it does depend on it. The binary wheels on PyPI bundle their own copy of GDAL, which in most cases avoids a separate GDAL install.
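What makes a gridded array "geospatial" is an affine transform from pixel indices to map coordinates. The following is a minimal sketch of that idea in plain Python for a north-up grid with no rotation. The `GridTransform` class and the coordinates are made up for illustration and this is not Rasterio's implementation; in Rasterio the corresponding information lives in `dataset.transform`, and `dataset.xy(row, col)` and `dataset.index(x, y)` do the lookups.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GridTransform:
    """North-up affine georeferencing (hypothetical helper, no rotation terms)."""

    west: float         # x coordinate of the left edge of the grid
    north: float        # y coordinate of the top edge of the grid
    cell_width: float   # pixel size along x, in map units
    cell_height: float  # pixel size along y, in map units

    def xy(self, row: int, col: int) -> tuple[float, float]:
        """Map coordinates of the centre of pixel (row, col)."""
        x = self.west + (col + 0.5) * self.cell_width
        y = self.north - (row + 0.5) * self.cell_height
        return x, y

    def index(self, x: float, y: float) -> tuple[int, int]:
        """(row, col) of the pixel that contains map coordinate (x, y)."""
        col = math.floor((x - self.west) / self.cell_width)
        row = math.floor((self.north - y) / self.cell_height)
        return row, col


# Illustrative quarter-degree grid; the origin is an arbitrary choice.
grid = GridTransform(west=150.0, north=-33.0, cell_width=0.25, cell_height=0.25)
centre = grid.xy(0, 0)
round_trip = grid.index(*grid.xy(2, 3))
```

Rows count downwards from the northern edge, which is why `y` decreases as `row` increases.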

3 Fiona

Fiona accesses vector formats like ESRI shapefiles in a Pythonic manner.

Fiona streams simple feature data to and from GIS formats like GeoPackage and Shapefile. Simple features are record, or row-like, and have a single geometry attribute. Fiona can read and write real-world simple feature data using multi-layered GIS formats, zipped and in-memory virtual file systems, from files on your hard drive or in cloud storage. This project includes Python modules and a command line interface (CLI).

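Fiona wraps OGR, the vector side of GDAL, so it does depend on GDAL. The binary wheels on PyPI bundle a copy of GDAL, which usually spares a separate install.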
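A simple feature record is easiest to picture as a GeoJSON-like mapping with one `geometry` and a bag of `properties`, which is roughly the shape of what Fiona yields when iterating over a collection. Here is a sketch using only the standard library. The two features are invented, and `iter_positions` and `bounding_box` are hypothetical helpers, not Fiona functions.

```python
import json

# An invented FeatureCollection with one point and one line.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "properties": {"name": "depot"},
   "geometry": {"type": "Point", "coordinates": [151.2, -33.9]}},
  {"type": "Feature",
   "properties": {"name": "route"},
   "geometry": {"type": "LineString",
                "coordinates": [[151.0, -34.0], [151.5, -33.5], [152.0, -33.8]]}}
]}
"""


def iter_positions(geometry):
    """Yield every (x, y) position of a Point or LineString geometry."""
    if geometry["type"] == "Point":
        yield tuple(geometry["coordinates"])
    elif geometry["type"] == "LineString":
        for position in geometry["coordinates"]:
            yield tuple(position)
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")


def bounding_box(features):
    """Return (min_x, min_y, max_x, max_y) over all features."""
    xs, ys = zip(*(p for f in features for p in iter_positions(f["geometry"])))
    return min(xs), min(ys), max(xs), max(ys)


features = json.loads(geojson_text)["features"]
names = [f["properties"]["name"] for f in features]
bbox = bounding_box(features)
```

Each record carries exactly one geometry, which is what makes features "row-like" and easy to stream one at a time.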

4 pyproj

Python interface to PROJ (cartographic projections and coordinate transformations library).

Like GDAL, PROJ can be tedious to compile. Useful for the more esoteric coordinate transformations.
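To make "coordinate transformation" concrete, here is a sketch of one of the simplest cases, the forward spherical Web Mercator projection (longitude/latitude in degrees to EPSG:3857 metres), written out by hand. It is for illustration only and the function name is my own. pyproj does the same job via `Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True).transform(lon, lat)`, and also handles datum shifts and the many projections that have no two-line formula.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Forward spherical Mercator: degrees in, metres out."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


origin = lonlat_to_web_mercator(0.0, 0.0)
east_edge_x, _ = lonlat_to_web_mercator(180.0, 0.0)
_, y_north = lonlat_to_web_mercator(10.0, 45.0)
_, y_south = lonlat_to_web_mercator(10.0, -45.0)
```

The formula diverges at the poles, which is why Web Mercator maps are cut off at roughly 85 degrees of latitude.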

5 Geostack


…is a toolkit for high performance geospatial processing, modelling and analysis.

Some highlights of Geostack include:

  • Range of programmable geospatial operations based on OpenCL, including map algebra, distance mapping and rasterisation.
  • Data IO for common geospatial types such as geotiff and shapefiles with no dependencies.
  • Implicit handling of geospatial alignment and projections, allowing easier coding of geospatial models.
  • Python bindings for interoperability with GDAL/RasterIO/xarray/NetCDF.
  • Built-in computational solvers including level set and network flow models.

More information and build guides are on our wiki.

Seems to support optional interfaces to GDAL and rasterio.

They only document conda installation; I suspect that means that installation in other packaging systems is punishing.
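For readers unfamiliar with the "map algebra" mentioned in the highlights: it means combining aligned rasters cell by cell. The sketch below shows the idea in plain Python with nested lists. It is emphatically not Geostack's API (which I have not reproduced here, and which runs such operations through OpenCL); `map_algebra` and the toy reflectance values are assumptions for illustration.

```python
def map_algebra(op, *rasters):
    """Apply `op` cell by cell to rasters that share the same shape."""
    shapes = {(len(r), len(r[0])) for r in rasters}
    if len(shapes) != 1:
        raise ValueError("rasters must be aligned to the same grid")
    return [[op(*cells) for cells in zip(*rows)] for rows in zip(*rasters)]


# Toy 2x2 reflectance bands (made-up numbers).
red = [[0.1, 0.2], [0.3, 0.4]]
nir = [[0.5, 0.6], [0.3, 0.4]]

# Normalised difference vegetation index, a classic map-algebra expression.
ndvi = map_algebra(lambda n, r: (n - r) / (n + r), nir, red)
```

The alignment check is the part a real toolkit automates: before the cell-wise step, rasters on different grids or projections have to be resampled onto a common one.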

6 Pangeo

Pangeo: A community platform for Big Data geoscience.

Figure 2

Pangeo is first and foremost a community promoting open, reproducible, and scalable science. This community provides documentation, develops and maintains software, and deploys computing infrastructure to make scientific research and programming easier. The Pangeo software ecosystem involves open source tools such as xarray, iris, dask, jupyter, and many other packages. There is no single software package called “pangeo”; rather, the Pangeo project serves as a coordination point between scientists, software, and computing infrastructure. On this website, scientists can find guides for accessing data and performing analysis using these tools (read the Guide for Scientists, browse the Pangeo Gallery, and learn about the Packages). Those interested in building infrastructure can find instructions for deploying Pangeo environments on HPC or cloud clusters (learn about the Technical Architecture or read the Deployment Setup Guides). For more general information, read About Pangeo, see the Funders and Collaborators, or read the Frequently Asked Questions.

These folks support Dask, Xarray, and probably other famous pieces of Python big data infrastructure.


7 PySAL

PySAL, the Python Spatial Analysis Library (libraries, really), incorporating:

  • Core spatial data structures, file IO. Construction and interactive editing of spatial weights matrices & graphs. Alpha shapes, spatial indices, and spatial-topological relationships.
  • Modules to conduct exploratory analysis of spatial and spatio-temporal data, including statistical testing on points, networks, and polygonal lattices. Also includes methods for spatial inequality and distributional dynamics.
  • Estimation of spatial relationships in data with a variety of linear, generalized-linear, generalized-additive, and nonlinear models.
  • Visualize patterns in spatial data to detect clusters, outliers, and hot-spots.

This seems to be a rich ecosystem; it is kind of dual to QGIS, in that it seems to put statistical analyses first and geography second. Personally I am curious about their spatial Gibbs sampler.
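To give a flavour of what "spatial weights" and "statistical testing on lattices" mean, here is a hand-rolled sketch of rook-contiguity weights on a regular grid and the Moran's I autocorrelation statistic computed from them. It uses only the standard library and is not PySAL code; the function names are mine. In PySAL the equivalents would be `libpysal.weights.lat2W` for the lattice weights and `esda.Moran` for the statistic, which additionally provides inference.

```python
def rook_neighbours(n_rows, n_cols):
    """Binary rook-contiguity weights for a lattice, as {cell: [neighbour cells]}."""
    neighbours = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            neighbours[r * n_cols + c] = [
                rr * n_cols + cc
                for rr, cc in candidates
                if 0 <= rr < n_rows and 0 <= cc < n_cols
            ]
    return neighbours


def morans_i(values, neighbours):
    """Moran's I with binary weights: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbours.values())  # total weight
    cross = sum(z[i] * z[j] for i, js in neighbours.items() for j in js)
    return (n / s0) * cross / sum(zi * zi for zi in z)


w = rook_neighbours(4, 4)
# Alternating cells: every neighbour differs, so strong negative autocorrelation.
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
# Left half 0, right half 1: most neighbours agree, so positive autocorrelation.
halves = [0 if c < 2 else 1 for r in range(4) for c in range(4)]

i_checkerboard = morans_i(checkerboard, w)
i_halves = morans_i(halves, w)
```

The checkerboard scores about -1 and the two-halves pattern scores about 2/3, matching the intuition that Moran's I is negative when neighbours disagree and positive when they cluster.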


8 QGIS

QGIS has a Python interface, and a lot of its functionality is exposed through it. See spatial statistics for more on that.

9 Incoming

  • Google Earth Engine is easy to access from Colaboratory, e.g. ee-api-colab-setup.ipynb; a fair amount of geospatial imagery processing is available in there.

  • TransBigData – for Transportation Spatio-Temporal Big Data

    TransBigData is a Python package developed for transportation spatio-temporal big data processing and analysis. TransBigData provides fast and concise methods for processing common traffic spatio-temporal big data such as Taxi GPS data, bicycle sharing data and bus GPS data. It includes general methods such as rasterization, data quality analysis, data pre-processing, data set counting, trajectory analysis, GIS processing, map base map loading, coordinate and distance calculation, and data visualization.
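Two of the operations listed in that blurb, distance calculation and rasterization of GPS points onto a grid, are simple enough to sketch by hand. The following uses only the standard library and is not TransBigData's API; the function names, the grid origin, the cell size and the GPS fixes are all invented for illustration.

```python
import math
from collections import Counter

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def grid_cell(lon, lat, lon0, lat0, cell_deg):
    """(column, row) index of the square cell of side `cell_deg` containing the point."""
    return (math.floor((lon - lon0) / cell_deg), math.floor((lat - lat0) / cell_deg))


# Invented GPS fixes as (lon, lat) pairs, in time order.
gps_points = [(113.81, 22.46), (113.83, 22.47), (113.96, 22.58), (113.97, 22.59), (113.98, 22.58)]

# Rasterize: count fixes per 0.1-degree cell.
counts = Counter(
    grid_cell(lon, lat, lon0=113.75, lat0=22.4, cell_deg=0.1) for lon, lat in gps_points
)

# Trajectory length: sum of distances between consecutive fixes.
path_length_m = sum(haversine_m(*a, *b) for a, b in zip(gps_points, gps_points[1:]))
```

Cells defined in degrees are not equal-area; for anything beyond a city-scale sketch, a projected grid would be the more careful choice.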

10 References

Hales, Nelson, Williams, et al. 2021. “The Grids Python Tool for Querying Spatiotemporal Multidimensional Water Data.” Water.
Rey, and Anselin. 2010. “PySAL: A Python Library of Spatial Analytical Methods.” In Handbook of Applied Spatial Analysis.