...

Geospatial and Artificial Intelligence: From GIS to Foundation Models

September 16, 2025

By Janea Systems

  • Geospatial,

  • Artificial Intelligence,

  • Open Source

...

The Limitations of Traditional Geospatial Technologies

Geographic Information Systems (GIS), the foundational technology of the geospatial field, rely mainly on manual processes and rule-based analysis. Under this approach, a GIS analyst manually digitizes features from satellite imagery, creates rules such as "find all areas within 500 meters of water," and interprets results against predefined criteria.

This method works but is time-consuming and often struggles with the complexity and scale of modern geospatial data. For example, the Landsat program alone has been collecting imagery since 1972, and commercial constellations like Planet Labs capture daily global imagery at roughly 3-meter resolution. That’s petabytes of data generated continuously.

Traditional rule-based systems can’t keep up with data at this scale because they work with simple decision trees: "if the pixel is green and NDVI is above 0.4, classify it as vegetation." Real-world patterns, on the other hand, require a more nuanced approach – one that combines contextual understanding, multimodal awareness, and real-time analytics. In traditional GIS, these can only be provided by humans, and given the growing scale of geospatial data, that is becoming more and more challenging.
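To make the contrast concrete, here is a minimal sketch of exactly that kind of hard-coded rule, written with NumPy. The band values and the 0.4 threshold are illustrative, not from any real scene:

```python
# A minimal sketch of the rule-based classification described above.
# Band values and the 0.4 NDVI threshold are illustrative.
import numpy as np

# Toy 2x2 scene: per-pixel red and near-infrared reflectance.
red = np.array([[0.10, 0.30], [0.05, 0.25]])
nir = np.array([[0.50, 0.32], [0.45, 0.20]])

# NDVI = (NIR - Red) / (NIR + Red)
ndvi = (nir - red) / (nir + red)

# The hard-coded rule: NDVI above 0.4 -> vegetation.
vegetation_mask = ndvi > 0.4

print(vegetation_mask)  # -> [[ True False] [ True False]]
```

The rule fires or it doesn't; there is no notion of context, season, or sensor noise – which is precisely the limitation the rest of this article is about.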

The Promise of Foundation Models for Geospatial

The GIS community saw massive breakthroughs at the intersection of AI, NLP, and computer vision. ChatGPT revolutionized how people interact with text; models like DALL-E generate images in seconds from natural-language descriptions.

These breakthroughs, however, wouldn’t have been possible without so-called foundation models. These are large, pretrained AI systems designed to be incredibly versatile and adaptable across many different tasks and domains.

The power of foundation models comes from the scale of their training, their ability to generalize across tasks, and their multimodality. Trained on vast swaths of the internet and millions of images, foundation models can generalize effectively across tasks and across domains - vision, language, audio, or a combination.

Before foundation models, if you wanted an AI to do image recognition, you'd train one model specifically for that. Want language translation? You'd need a completely separate model trained from scratch. Want to detect objects in satellite imagery? Another specialized model.

Today, a foundation model that learned to understand language can also be fine-tuned to write code, translate languages, or answer questions - without being trained for each task from scratch.
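This "one backbone, many heads" reuse pattern can be illustrated with a toy sketch: a frozen feature extractor shared across tasks, with only a small task-specific head fit per task. Everything below (the random-projection backbone, the data, the head shapes) is a hypothetical stand-in, not a real pretrained model:

```python
# Toy illustration of fine-tuning: one frozen "pretrained" backbone is
# reused for two different tasks; only a tiny linear head is fit per task.
import numpy as np

rng = np.random.default_rng(42)
W_backbone = rng.normal(size=(8, 4))  # frozen "pretrained" weights

def backbone(x):
    # Frozen feature extractor shared by every downstream task.
    return np.tanh(x @ W_backbone)

# Two different downstream datasets (inputs and targets).
x_task_a, y_task_a = rng.normal(size=(32, 8)), rng.normal(size=32)
x_task_b, y_task_b = rng.normal(size=(32, 8)), rng.normal(size=32)

def fit_head(x, y):
    # "Fine-tuning": fit only a small linear head on frozen features.
    feats = backbone(x)
    head, *_ = np.linalg.lstsq(feats, y, rcond=None)
    return head

head_a = fit_head(x_task_a, y_task_a)  # e.g., a classification task
head_b = fit_head(x_task_b, y_task_b)  # e.g., a regression task

# Same backbone, different heads -> two task-specific models.
pred_a = backbone(x_task_a) @ head_a
```

The expensive part (the backbone) is trained once; adapting to a new task only requires fitting the cheap head – the economic insight that makes foundation models attractive.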

The potential for geospatial applications was obvious - imagine being able to ask "show me all the areas experiencing drought in California" or "identify potential flood zones near this river" in natural language.

Towards Foundation Models in Geospatial

The geospatial community had been experimenting with various forms of AI and machine learning since the 1990s. However, none of the developments before the 2020s could qualify as a true foundation model.

It was only the arrival of the transformer architecture - and models built on it, particularly GPT-3 and DALL-E - that offered a glimpse into the power of foundation models and their potential applications in geospatial.

The key insight was how these language models could perform tasks they weren't explicitly trained for. The same model that learned to complete text could also write poetry, debug code, translate languages, and engage in complex reasoning. This demonstrated the transfer learning potential that had been theoretically understood but never practically realized at such a scale.

For geospatial researchers, this was revelatory: Earth observation data shares many of the characteristics that made language models successful - massive datasets, rich patterns across multiple modalities, and the need for models that generalize across different tasks and data types.

Prithvi – The First True Foundation Model in GeoAI

Building on the insight that Earth observation data shares these characteristics with language data, geospatial researchers got to work.

The result was Prithvi (Figure 1) - a transformer-based geospatial foundation model trained on a massive dataset: over 4.2 million satellite image samples spanning seven years from the Harmonized Landsat and Sentinel-2 (HLS) dataset.

Fig. 1: An early screenshot of Prithvi (source)

Prithvi’s creators made sure to represent all land use and land cover classes, covered about 60% of urban regions globally, and included around 800 different ecoregions with multiple samples from each. The dataset ended up being more than three times larger than previous global datasets.
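Under the hood, Prithvi was pretrained with a masked-autoencoder objective: random patches of each image are hidden, and the model learns to reconstruct them from the visible remainder. A minimal, framework-free sketch of the masking step (the tile size, patch size, and 75% mask ratio here are illustrative assumptions):

```python
# Illustrative sketch of masked-autoencoder-style patch masking,
# the self-supervised pretraining recipe behind models like Prithvi.
# Tile size, patch size, and the 75% mask ratio are assumptions.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64))          # toy single-band satellite tile

def patchify(img, p=16):
    # Split an HxW image into non-overlapping p x p patches.
    h, w = img.shape
    return img.reshape(h // p, p, w // p, p).swapaxes(1, 2).reshape(-1, p, p)

patches = patchify(image)             # 16 patches of 16x16
n_masked = int(0.75 * len(patches))   # hide 75% of patches

masked_idx = rng.choice(len(patches), size=n_masked, replace=False)
visible_idx = np.setdiff1d(np.arange(len(patches)), masked_idx)

# During pretraining, the encoder sees only the visible patches, and the
# model is trained to reconstruct the masked ones.
print(len(visible_idx), "visible /", n_masked, "masked")
```

Because the reconstruction targets come from the imagery itself, this recipe needs no human labels – which is what makes training on millions of samples feasible.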

First of its kind, Prithvi ushered in an era of geospatial foundation models.

Beyond Prithvi: IBM’s TerraMind and Google’s Geospatial Reasoning

After Prithvi broke the ice, impressive follow-up models took the foundation model concept to the next level. Two notable examples are IBM’s TerraMind and Google’s Geospatial Reasoning models.

TerraMind: The Multimodal Powerhouse

TerraMind is probably the most ambitious geospatial foundation model unveiled so far, developed in collaboration between IBM, the European Space Agency (ESA), and Forschungszentrum Jülich in Germany.

Described as an "any-to-any" generative multimodal model, TerraMind doesn't just process one type of data, like Prithvi did with optical imagery. Instead, it can handle nine different data modalities simultaneously: optical imagery, Synthetic Aperture Radar (SAR), elevation models, weather data, textual annotations, land use maps, and more (see Figure 2 for an example).

Fig. 2: TerraMind’s “any-to-any” demonstration. Left to right: optical input, synthetic radar, a generated land use classification (source).

The scale is also staggering. TerraMind was trained on over 9 million spatiotemporally aligned multimodal samples, processing 500 billion tokens from the so-called "TerraMesh" dataset.

Even more impressively, TerraMind can generate missing data modalities. Suppose there’s optical imagery but no radar data for a particular area. The model can infer what the radar signature should look like based on its understanding of how these data types relate to each other. TerraMind can even feed such self-generated modalities back into its own downstream predictions, a technique its creators call "Thinking-in-Modalities" tuning.

Google's Geospatial Reasoning: The System Orchestrator

Google's Geospatial Reasoning models take a completely different approach. Instead of building one massive model, they've created an autonomous GIS system that orchestrates multiple foundation models and geospatial tools together.

In creating these models, Google leveraged its own existing infrastructure and built on the success of the self-supervised learning approach that gave us ChatGPT and DALL-E. The final product combines data from Google Earth, Maps, and Cloud with multiple specialized foundation models for tasks like flood forecasting, wildfire detection, and population dynamics analysis.

Users can query the model in natural language. For example, if prompted: "In the recent hurricane, how many medical facilities were damaged?", the model will automatically figure out what data sources to use, which models to apply, and how to combine everything to give you an answer (Figure 3).

Fig. 3: A screenshot from Google’s overview of its Geospatial Reasoning (source). Here, the model responds to the query “In the recent hurricane, how many medical facilities were damaged?” with: "1 hospital and 3 clinics were damaged, possibly impacting access to emergency services."
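The orchestration pattern behind such systems can be sketched as a toy pipeline: route the query to specialized tools, run them, and combine their outputs. The keyword routing and stub tools below are hypothetical illustrations of the pattern, not Google's actual design:

```python
# Hypothetical toy of the orchestrator pattern: a query is routed to
# specialized "tools" (stand-ins for foundation models / data services),
# and their outputs are combined into one answer.
def damage_assessment_tool(query):
    # Stand-in for a model that maps post-event imagery to damaged sites.
    return {"hospitals_damaged": 1, "clinics_damaged": 3}

def facility_lookup_tool(query):
    # Stand-in for a geospatial database of medical facilities.
    return {"facility_type": "medical"}

TOOLS = {
    "damaged": damage_assessment_tool,
    "medical": facility_lookup_tool,
}

def answer(query):
    # 1. Plan: pick tools whose trigger keyword appears in the query.
    selected = [tool for kw, tool in TOOLS.items() if kw in query.lower()]
    # 2. Act: run each selected tool.
    results = {}
    for tool in selected:
        results.update(tool(query))
    # 3. Combine: synthesize a single response.
    if "hospitals_damaged" in results:
        return (f"{results['hospitals_damaged']} hospital and "
                f"{results['clinics_damaged']} clinics were damaged.")
    return "No relevant tools found."

print(answer("In the recent hurricane, how many medical facilities were damaged?"))
```

A production system would replace the keyword matching with an LLM planner and the stubs with real models and data services, but the plan–act–combine loop is the same.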

The Future of Foundation Models in GeoAI

The future of geospatial foundation models may unfold along several exciting dimensions.

Open Source Development

We’re already witnessing an unprecedented democratization of geospatial AI. Prithvi led the way as an open-source release; TerraMind followed suit, and models like the Clay Foundation Model by Development Seed have joined the movement. This means academic institutions, government agencies, and smaller organizations are gaining access to sophisticated geospatial foundation models, including features like similarity search (Figure 4), without the massive computational cost of training from scratch.

Fig. 4: Similarity search results from the Clay Foundation Model (source): four rows of seven square aerial tiles, where the tiles in each row show similar terrain. Similarity search in GeoAI can be used to locate specific geographic elements such as pools or solar panels.
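Under the hood, similarity search typically compares embedding vectors produced by the model: tiles whose embeddings point in similar directions depict similar terrain. A minimal sketch using cosine similarity over made-up embeddings (random stand-ins for a real model's outputs):

```python
# Minimal sketch of embedding-based similarity search: given a query
# tile's embedding, rank all tiles by cosine similarity.
import numpy as np

rng = np.random.default_rng(7)
embeddings = rng.normal(size=(100, 16))   # 100 tiles, 16-dim embeddings

def top_k_similar(query_idx, k=5):
    # L2-normalize so a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[query_idx]
    # Rank by similarity, exclude the query tile itself, keep the k best.
    order = np.argsort(-sims)
    return [i for i in order if i != query_idx][:k]

matches = top_k_similar(0)
```

At real scale, the brute-force dot product would be replaced by an approximate nearest-neighbor index, but the embed-then-compare idea is identical.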

Intelligence at the Edge

Equally exciting is the potential of GeoAI for edge deployment. Drones conducting agricultural surveys, mobile phones used by field researchers, and even vehicle-mounted systems may soon be able to run geospatial AI models locally. This could enable real-time wildfire detection on remote monitoring stations or precision-agriculture decisions made in the field without cloud connectivity.

The Security Imperative

Among all the optimism and enthusiasm surrounding GeoAI, security looms as the biggest concern.

Geospatial foundation models can inadvertently memorize and reveal sensitive location data, personal information, or even enable sophisticated attacks. The field is actively developing countermeasures, such as federated learning, differential privacy techniques, and secure protocols, but security and privacy remain a critical area in the development of this technology.
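As one example of these countermeasures, differential privacy adds calibrated noise to query results so that no individual record can be singled out. A minimal sketch of the Laplace mechanism applied to a location count (the epsilon and sensitivity values are illustrative assumptions):

```python
# Illustrative Laplace mechanism: protect a count query (e.g., "how many
# devices were observed in this grid cell?") with differential privacy.
# The epsilon and sensitivity values here are assumed parameters.
import numpy as np

def private_count(true_count, epsilon=1.0, sensitivity=1.0, seed=None):
    # Noise scale b = sensitivity / epsilon yields epsilon-DP for counts.
    rng = np.random.default_rng(seed)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Each release of the same cell's count is perturbed, so an observer
# cannot tell whether any single individual was present.
noisy = private_count(42, epsilon=1.0, seed=0)
```

Smaller epsilon means more noise and stronger privacy; the art is choosing a budget that still leaves the aggregate geospatial statistics useful.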

For more information on the privacy and security challenges in GeoAI, head to our recent article, where we explore the most pressing issues and offer suggestions for countermeasures.

Ready to Turn Geospatial Data into Decisions?

Foundation models are redefining what’s possible in GeoAI. Janea Systems helps teams move beyond manual GIS and brittle rules into fast, scalable, and multimodal intelligence.

What we’ve delivered for leaders in mapping

Front-to-back mapping optimization (Azure Maps): backend style generation and selective server-side rendering for faster load times, cross-device consistency, and more cost-effective infrastructure — with reduced data costs for end users.

Deep-learning acceleration (Bing Maps QAS): automated query processing and model refactors delivering 100% automated error correction, 50x faster TensorFlow, 2x higher batch throughput, 7x faster training runs, and 30% speedup on dual-GPU pipelines.

Do you need a realistic plan to modernize mapping workflows with a path toward foundation-model capabilities? Let’s talk.

Frequently Asked Questions

What are foundation models?

Foundation models are large-scale AI systems trained on vast and diverse datasets, allowing them to generalize across multiple tasks and domains. Instead of building separate models for each task (like image recognition or translation), a single foundation model can be fine-tuned to perform many different functions, from analyzing language to interpreting satellite imagery.

What is GIS?

Geographic Information Systems (GIS) are tools used to capture, store, analyze, and visualize spatial data, such as maps and satellite imagery. Traditionally, GIS relies on manual processes and rule-based methods to interpret geospatial data, which can be time-consuming and limited when dealing with today’s massive and complex datasets.

How does AI enhance GIS?

AI, particularly foundation models, enhances GIS by enabling automated, scalable, and multimodal analysis of geospatial data. This allows users to ask natural language questions (e.g., “show me drought areas in California”), detect patterns across different data types, and generate insights much faster than manual or rule-based approaches. Examples like Prithvi, TerraMind, and Google’s Geospatial Reasoning demonstrate how AI is reshaping the future of geospatial analysis.
