Foundation models for geoscience
2024-11-14 — 2025-11-12
Wherein a catalogue of planetary foundation models is presented, their multi‑temporal training, inclusion of Sentinel‑1 radar and diverse spectral bands is noted, and suitability for H100 fine‑tuning is indicated for inundation and burn‑scar tasks.
Foundation models for the whole planet, rather than classical NNs or classical geospatial techniques.
I’m currently the project lead for a CSIRO Geospatial Foundation Model research program, so I know a lot about the area, but I don’t have much time to blog about it yet.
What follows are notes I prepared for my team; they’re desperately in need of generalisation and updating.
1 Popular Geospatial Foundation Models
As of mid-2025.
The three most relevant state-of-the-art models for our tasks are Prithvi-V2, Clay, and SatlasPretrain. Each represents a different strategic choice in terms of data, scale, and ecosystem. I’m focusing here on satellite-observation models, but there are others.
| Feature | Prithvi-V2 (IBM/NASA) | Clay | SatlasPretrain (Allen AI) |
|---|---|---|---|
| Core Architecture & Size | | | |
| Training Data | | | |
| Required Bands | | | |
| Geographic Suitability (Australia) | | | |
| Applicability to Inundation | | | |
| Applicability to Burn Scar | | | |
| Ease of Use (on our 4x H100) | | | |
| Key Differentiator | Temporal Focus. Designed from the ground up for multi-temporal change detection. | Spectral Focus. Leverages more Sentinel-2 bands for tasks sensitive to specific spectral signatures. | Scale & Generalization. Trained on an unmatched diversity and volume of data; the most powerful general-purpose feature extractor. |
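To make the "ease of use" row concrete: the pattern for all three models on our 4x H100 box is to freeze (or partially unfreeze) the pretrained encoder and train a small decode head on labelled chips for a binary task like burn scars or inundation. The sketch below is a minimal, generic version of that pattern; `PretrainedEncoder` is a stand-in, not any real checkpoint loader (Prithvi ships with TerraTorch; Clay and SatlasPretrain have their own loaders), and the patch size, embedding dimension, and band count are assumptions for illustration.

```python
import torch
import torch.nn as nn

PATCH = 16      # assumed ViT patch size
EMBED = 1024    # assumed token embedding dimension
BANDS = 6       # e.g. six HLS bands; each model expects its own band set

class PretrainedEncoder(nn.Module):
    """Stand-in for a real GFM backbone: images -> (B, N, EMBED) patch tokens."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(BANDS, EMBED, kernel_size=PATCH, stride=PATCH)

    def forward(self, x):
        return self.proj(x).flatten(2).transpose(1, 2)

class SegHead(nn.Module):
    """Small decode head: patch tokens -> per-pixel class logits."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(EMBED, num_classes, kernel_size=1)

    def forward(self, tokens, hw):
        b, n, d = tokens.shape
        feats = tokens.transpose(1, 2).reshape(b, d, *hw)   # (B, D, H/PATCH, W/PATCH)
        return nn.functional.interpolate(
            self.conv(feats), scale_factor=PATCH, mode="bilinear"
        )

encoder, head = PretrainedEncoder(), SegHead()
for p in encoder.parameters():
    p.requires_grad = False   # linear-probe first; unfreeze later blocks if needed

opt = torch.optim.AdamW(head.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One training step on random tensors standing in for a chip/label dataloader.
images = torch.randn(4, BANDS, 224, 224)
masks = torch.randint(0, 2, (4, 224, 224))
logits = head(encoder(images), hw=(224 // PATCH, 224 // PATCH))
loss = loss_fn(logits, masks)
loss.backward()
opt.step()
```

Linear probing like this is cheap and gives a quick baseline; unfreezing the last few transformer blocks typically buys more accuracy at the cost of memory, which is where the H100s earn their keep.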
2 Modelling governing equations
See foundation models for PDEs and Schmude et al. (2024). For example, Prithvi WxC is a weather-model surrogate, trained by infilling masked “earth systems model” fields.
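As a rough illustration of what “surrogate via infilling” means operationally (this is not Prithvi WxC’s actual architecture; the toy conv backbone, the four channels, and the 50% mask ratio below are all assumptions), the pre-training objective hides a fraction of grid cells in a stack of atmospheric fields and trains the network to reconstruct them from what remains:

```python
import torch
import torch.nn as nn

class TinyInfiller(nn.Module):
    """Toy conv net standing in for the transformer backbone."""
    def __init__(self, channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 64, 3, padding=1), nn.GELU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.GELU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x, mask):
        # Zero out the hidden cells and append the mask so the model knows
        # which cells it is being asked to reconstruct.
        return self.net(torch.cat([x * (1 - mask), mask], dim=1))

model = TinyInfiller()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

fields = torch.randn(8, 4, 64, 64)                 # e.g. T, u, v, q on a lat/lon grid
mask = (torch.rand(8, 1, 64, 64) < 0.5).float()    # hide ~50% of grid cells

recon = model(fields, mask)
loss = ((recon - fields) ** 2 * mask).sum() / mask.sum()   # score only the masked cells
loss.backward()
opt.step()
```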
3 Incoming
- DeepMind’s AlphaEarth is out! See Alice Heiman’s story about it (Brown et al. 2025).
- Using Foundation Models for Earth Observation — Development Seed (interesting model comparison)
- IBM and NASA’s Prithvi project has several models within it (Jakubik et al. 2023; Schmude et al. 2024; Szwarcman et al. 2025):
  - NASA-IMPACT/Prithvi-EO-2.0: This repository contains details of the release of the Prithvi-EO-2.0 foundation model.
  - IBM and NASA are building an AI foundation model for weather and climate - IBM Research
  - NASA and IBM Openly Release Geospatial AI Foundation Model for NASA Earth Observation Data | NASA Earthdata
  - Prithvi EO 2.0 BurnScars Demo - a Hugging Face Space by ibm-nasa-geospatial
- An AWS marketing piece that helpfully walks through a workflow
- OlmoEarth: A new state-of-the-art Earth observation foundation model family | Ai2

  > OlmoEarth was designed to be easily extensible for bespoke applications across a wide range of problems. It can be leveraged within the OlmoEarth Platform to build customized, highly performant models serving organizations across the entire lifecycle, from raw data acquisition through labeling, fine-tuning, and production deployment.
  >
  > To demonstrate this in action, we’re releasing fine-tuned OlmoEarth models for real-world challenges such as mangrove classification, crop-type and cropland mapping, and forest-fire fuel classification. These were developed with organizations across multiple regions and are ready for adaptation to new areas.
  >
  > The next family of OlmoEarth foundation models, already in development and expected to be released next year, expands into new sectors like humanitarian response with support for weather data and additional modalities.
4 What is “multi-modal” for planets?
We are used to foundation models being multi-modal in the sense of handling text and images, or images and audio. For GFMs, the multi-modality we need to care about is sensor modality: satellite photos, wind gauges, ocean buoys, radar, LiDAR, etc. GFMs are distinct from general-domain foundation models (like GPT-4 or Stable Diffusion) because they are designed to solve challenges unique to geospatial data. The primary challenge is the need to “facilitate the processing of multi-modal data from different satellites” and sensor types. Overviews here: (Yang et al. 2025; Yu et al. 2025).

Key terms:
- Modality Heterogeneity: the fundamental disparities between data sources, including differences in “imaging physics, viewpoint, spatial and temporal resolution, spectral range, and noise”. A multi-modal GFM must be able to fuse optical imagery, radar (SAR), LiDAR point clouds, and textual or vector data.
- Distribution Shifts: sensors have different “spatial coverage” and “revisit frequency”. This results in data imbalance and sparsity across modalities, which can introduce biases and limit the model’s generalization capabilities.
- Semantic Gap: the “intrinsic disconnect between low-level pixel data and high-level conceptual understanding” (Yang et al. 2025). This gap is particularly wide in the geospatial domain. For example, recent research notes that even powerful Large Language Models (LLMs) struggle significantly with “qualitative spatial reasoning” and “executing spatial tasks from implicit textual descriptions involving coordinates”.
The GFM attempts to bridge this semantic gap by throwing data into it. The self-supervised, multi-modal pre-training of models is designed to learn a unified representation of these heterogeneous, sparse, and noisy data streams.
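One common design response, sketched below under loud assumptions (the two-sensor setup, band counts, and dimensions are illustrative rather than taken from any particular GFM; positional embeddings are omitted for brevity), is to give each sensor its own patch embedder plus a learned modality embedding, and to fuse the resulting tokens in one shared transformer encoder:

```python
import torch
import torch.nn as nn

EMBED, PATCH = 256, 16
MODALITIES = {"s2_optical": 10, "s1_sar": 2}   # channels per sensor (illustrative)

class MultiModalEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # One patch embedder per sensor, so each modality's band count and
        # statistics are handled separately before fusion.
        self.embedders = nn.ModuleDict({
            name: nn.Conv2d(ch, EMBED, kernel_size=PATCH, stride=PATCH)
            for name, ch in MODALITIES.items()
        })
        # A learned modality embedding tells the shared encoder which sensor
        # each token came from.
        self.modality_emb = nn.ParameterDict({
            name: nn.Parameter(torch.zeros(1, 1, EMBED)) for name in MODALITIES
        })
        layer = nn.TransformerEncoderLayer(d_model=EMBED, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, inputs):
        tokens = []
        for name, x in inputs.items():
            t = self.embedders[name](x).flatten(2).transpose(1, 2)   # (B, N, EMBED)
            tokens.append(t + self.modality_emb[name])
        # Concatenate tokens from all sensors into one sequence and fuse.
        return self.encoder(torch.cat(tokens, dim=1))

model = MultiModalEncoder()
batch = {
    "s2_optical": torch.randn(2, 10, 128, 128),   # Sentinel-2-like optical bands
    "s1_sar": torch.randn(2, 2, 128, 128),        # Sentinel-1-like VV/VH backscatter
}
fused = model(batch)   # (2, 128, 256): one joint token sequence across sensors
```

Masking and reconstructing tokens from that joint sequence is then the usual MAE-style self-supervised objective; it is also one reason such models can tolerate sensors with uneven coverage, since any subset of the sequence can be dropped during training.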
