Geospatial Domain
Overview
The geospatial domain provides spatial analysis capabilities covering distance measurement, coordinate transformations, spatial joins and overlays, interpolation, network routing, and spatial statistics. It is built on geopandas, shapely, and pyproj as required dependencies, with optional enhancement from scikit-gstat (for advanced kriging) and networkx (for network analysis).
Use this domain when you need to:
Compute distances between geographic points (Haversine/great-circle or Euclidean)
Reproject data between coordinate reference systems (e.g. WGS84 to UTM)
Join point datasets to polygon datasets by spatial relationship
Interpolate values across a spatial grid from sparse point observations
Find shortest paths or service area isochrones on a road/graph network
Measure spatial autocorrelation (clustering vs dispersion) in attribute values
All transformers are sklearn-compatible (BaseEstimator, TransformerMixin) and can be embedded
in scikit-learn pipelines. High-level convenience functions are also available for direct use.
Available Analyses
Analysis |
Class / Function |
Description |
|---|---|---|
Haversine distance |
|
Great-circle distance between lat/lon pairs |
Batch distance matrix |
|
All-pairs or nearest-k distance matrix |
CRS transformation |
|
Reproject coordinates between EPSG/PROJ systems |
Spatial join |
|
Intersect, contains, nearest-neighbour joins |
Spatial overlay |
|
Union, intersection, difference of polygon layers |
Point aggregation |
|
Count/summarise points falling within polygons |
IDW interpolation |
|
Inverse-distance weighted interpolation |
Kriging interpolation |
|
Variogram-based geostatistical interpolation |
Moran’s I |
|
Global spatial autocorrelation index |
Geary’s C |
|
Alternative spatial autocorrelation statistic |
Network shortest path |
|
Dijkstra/A* shortest path on a graph |
Network MST |
|
Minimum spanning tree |
Accessibility analysis |
|
Distance/time from each node to service locations |
Service isochrones |
|
Reachable area within time/distance threshold |
Autocorrelation pipeline |
|
Convenience wrapper for Moran’s I via pipeline |
Spatial clustering |
|
K-means or DBSCAN on geographic coordinates |
Convex hull / bounding box |
|
Geometric envelope calculations |
MCP Tool Reference
The geospatial domain exposes its functionality through Python API calls. The high-level functions below are the primary entry points.
calculate_spatial_distance
Compute distances between points in a DataFrame.
Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
pd.DataFrame |
required |
DataFrame with coordinate columns |
|
list[str] |
|
Names of the x/y (or lon/lat) columns |
|
list[str] |
|
Metrics: |
|
str |
|
Output as |
Return format
Returns a DataFrame with distance columns appended, or a distance matrix.
perform_spatial_join
Join two spatial datasets by geometric relationship.
Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
GeoDataFrame |
required |
Left dataset (points or polygons) |
|
GeoDataFrame |
required |
Right dataset to join against |
|
str |
|
Join predicate: |
|
str |
|
Join direction: |
|
float |
None |
Maximum distance for nearest joins |
Return format
SpatialJoinResult(
result_gdf=..., # GeoDataFrame with joined attributes
join_count=..., # Number of matched pairs
unmatched_left=..., # Indices with no match
join_type="intersects",
metadata={...}
)
perform_spatial_overlay
Combine two polygon layers with set operations.
Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
GeoDataFrame |
required |
First polygon layer |
|
GeoDataFrame |
required |
Second polygon layer |
|
str |
|
Operation: |
analyze_spatial_autocorrelation
Measure spatial autocorrelation in a variable across a point dataset.
Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
pd.DataFrame |
required |
DataFrame with coordinate and value columns |
|
list[str] |
|
Coordinate column names |
|
str |
required |
Column to test for spatial pattern |
|
str |
|
Method: |
|
int |
|
Number of spatial neighbours for weights matrix |
Return format
SpatialStatisticsResult(
statistic="morans_i",
value=0.42, # Moran's I: -1 (dispersion) to +1 (clustering)
p_value=0.001,
z_score=3.8,
interpretation="Strong positive spatial autocorrelation detected",
weights_summary={...}
)
optimize_route
Find the shortest path between two nodes in a spatial network.
Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
dict |
required |
Graph definition with nodes and edges |
|
str/int |
required |
Origin node identifier |
|
str/int |
required |
Destination node identifier |
|
str |
|
Edge attribute to minimise |
|
str |
|
Algorithm: |
Return format
RouteResult(
path=[node1, node2, ...],
total_distance=12.4,
total_time=None,
segments=[{"from": ..., "to": ..., "distance": ...}],
metadata={...}
)
generate_service_isochrones
Compute reachable areas from service locations within a distance/time budget.
Parameters
Parameter |
Type |
Default |
Description |
|---|---|---|---|
|
dict |
required |
Graph definition |
|
list |
required |
Origin node IDs for service points |
|
list[float] |
required |
Distance or time budgets for each isochrone band |
|
str |
|
|
Return format
IsochroneResult(
service_areas={threshold: [node_ids]},
coverage_stats={"fraction_covered": 0.72, ...},
metadata={...}
)
Method Details
Distance Calculations
Haversine (great-circle) distance accounts for Earth’s curvature and is the correct method
for geographic coordinates in degrees (lat/lon). The SpatialDistanceCalculator.haversine_distance
method supports scalar pairs, NumPy arrays (vectorised), and tuple-based calling conventions.
calc = SpatialDistanceCalculator()
dist_km = calc.haversine_distance(lat1=48.85, lon1=2.35, lat2=51.51, lon2=-0.13, unit="km")
Supported units: km, m, mi, nmi.
Euclidean distance is appropriate for projected coordinate systems (e.g., UTM metres) where distances are small enough that Earth’s curvature is negligible (typically <100 km).
When to choose haversine: Any time your coordinates are in degrees (EPSG:4326 / WGS84). When to choose Euclidean: Projected coordinates (metres), small geographic areas.
Coordinate Transformations
SpatialCoordinateTransformer wraps pyproj to reproject point or GeoDataFrame data between
coordinate reference systems.
transformer = SpatialCoordinateTransformer(
source_crs="EPSG:4326", # WGS84 geographic
target_crs="EPSG:32632", # UTM Zone 32N
)
projected_df = transformer.fit_transform(df_with_lon_lat)
Common CRS codes:
Code |
Description |
|---|---|
|
WGS84 geographic (lat/lon degrees) — GPS default |
|
Web Mercator (metres) — web tiles |
|
UTM Zone 32N (metres) — central Europe |
|
British National Grid |
Spatial Joins
Three join predicates are supported:
Predicate |
Meaning |
|---|---|
|
Geometries share any point |
|
Left geometry fully contains right geometry |
|
Left geometry is fully inside right geometry |
|
Joins each left feature to its closest right feature |
For nearest joins, distance_threshold limits the maximum allowable distance and prevents
spurious matches when no close neighbour exists.
aggregate_points_in_polygons is a convenience function that counts or summarises attribute
values of points falling within each polygon — useful for spatial binning.
Spatial Interpolation
IDW (Inverse Distance Weighting)
Estimates values at unsampled locations as a weighted average of nearby observations, with weights proportional to 1/distance^p. No distributional assumptions; fast.
Use IDW when:
You need a quick interpolation without stationarity assumptions
The spatial process is expected to be locally smooth
You have sparse but reliable point observations
Kriging
Kriging is a best linear unbiased estimator that models spatial dependence through a variogram. It provides interpolated values plus prediction variance (uncertainty estimates).
The VariogramModel class fits empirical variograms using one of four theoretical models:
Model |
Shape |
Best for |
|---|---|---|
|
Gradual flattening with finite range |
Most natural phenomena |
|
Asymptotic approach to sill |
Soil properties, precipitation |
|
Smooth curve, no nugget |
Very smooth processes |
|
No sill |
Unbounded processes |
When scikit-gstat is installed, advanced variogram fitting and model selection are available. Without it, a fallback implementation is used.
Interpretation: Low prediction variance indicates the interpolated value is well-constrained by nearby observations. High variance flags locations far from data points.
Spatial Statistics / Autocorrelation
Moran’s I
Global statistic measuring whether similar values cluster together spatially.
I near +1: values cluster (hot-spots and cold-spots)
I near 0: random spatial arrangement (no spatial pattern)
I near -1: values are dispersed (chess-board pattern)
The test uses a normal approximation. p_value from the z-score tests the null hypothesis
of complete spatial randomness.
Geary’s C
Complementary to Moran’s I, Geary’s C uses squared differences between neighbouring values. C = 1 means no spatial autocorrelation; C < 1 indicates positive autocorrelation.
Spatial weights matrix methods: knn (k-nearest neighbours, default k=8) or distance_band
(all neighbours within a threshold distance).
Network Analysis
NetworkX is required for network operations. Check availability with NETWORKX_AVAILABLE.
Shortest paths: Dijkstra’s algorithm (non-negative weights), Bellman-Ford (handles negative weights), or A* (with heuristic, faster on geographic networks).
Minimum spanning tree: Kruskal’s or Prim’s algorithm. Returns the subgraph connecting all nodes with minimum total edge weight.
Accessibility analysis: For each node, computes distance/cost to the nearest service location. Results show coverage statistics (fraction of nodes within each threshold band).
Isochrones: Service areas computed by expanding from service nodes until the budget is exhausted. Returns the set of reachable nodes per threshold.
Centrality measures (when include_centrality=True):
Measure |
Interpretation |
|---|---|
Degree centrality |
Fraction of nodes connected to this node |
Betweenness centrality |
How often a node lies on shortest paths between other pairs |
Closeness centrality |
Mean inverse distance to all other nodes |
Eigenvector centrality |
Influence score weighted by neighbour influence |
Composition
After geospatial analysis |
Chain to |
Purpose |
|---|---|---|
Distance matrix |
Regression/Modeling |
Spatial lag features for prediction |
Interpolated grid |
Statistical Analysis |
Test distribution of interpolated surface |
Moran’s I result |
Pattern Recognition |
Cluster detection based on confirmed autocorrelation |
Isochrone coverage |
Business Intelligence |
Segment customers by accessibility |
Spatial join result |
Statistical Analysis |
Compare attribute distributions across spatial zones |
Examples
Compute haversine distances between store locations and customers
from localdata_mcp.domains.geospatial_analysis import SpatialDistanceCalculator
calc = SpatialDistanceCalculator()
distances = calc.haversine_distance(
lat1=customer_df["lat"].values,
lon1=customer_df["lon"].values,
lat2=store_lat,
lon2=store_lon,
unit="km",
)
Run autocorrelation analysis via pipeline
result = analyze_spatial_autocorrelation(
data=df,
coordinate_columns=["longitude", "latitude"],
value_column="house_price",
method="moran",
k_neighbors=6,
)
print(f"Moran's I = {result.value:.3f}, p = {result.p_value:.4f}")
print(result.interpretation)
Spatial join: assign census polygon attributes to point data
from localdata_mcp.domains.geospatial_analysis import perform_spatial_join
joined = perform_spatial_join(
left_gdf=customer_points,
right_gdf=census_polygons,
join_type="intersects",
how="left",
)
# Each customer point now carries attributes from the census polygon it falls within
Interpolate sensor readings onto a regular grid
from localdata_mcp.domains.geospatial_analysis import SpatialInterpolator
interpolator = SpatialInterpolator(method="kriging", variogram_model="spherical")
result = interpolator.interpolate(
points=sensor_locations, # list of (x, y) tuples
values=sensor_readings, # np.ndarray of observed values
grid_resolution=100,
)
# result.interpolated_values: grid of estimated values
# result.prediction_variance: uncertainty per grid cell
Route optimisation on a delivery network
route = optimize_route(
network_data={"nodes": node_list, "edges": edge_list},
origin="depot_A",
destination="customer_42",
weight="travel_time_minutes",
)
print(f"Route: {route.path}, total time: {route.total_distance} min")