Spatial statistics

Spatial statistics are widely used in geography, geology, environmental science, sociology, urban planning, epidemiology, and economics. We have previously contributed to an effort to provide a spatial statistics guide for spatial transcriptomics that explains the basic concepts and methods and is equally applicable to spatial proteomics data: pasta spatial statistics guide

spatiomic provides a set of functions to calculate spatial statistics as well. In this notebook, we show how to use these methods on spatial proteomics data. spatiomic builds on top of esda and pysal, which can also be used directly for further analysis, since spatiomic.spatial.spatial_weights returns a pysal W object.

[1]:
import esda
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from colorcet import coolwarm
from matplotlib.colors import ListedColormap
from skimage.filters import threshold_otsu
from skimage.transform import downscale_local_mean

import spatiomic as so

Data loading

[2]:
image_id = "245 g1"
[3]:
# read the image and convert the dimension order to XYC/channel-last
img_data = so.data.read().read_tiff(
    f"./data/mouse_cgn/{image_id}.tif",
    input_dimension_order="CYX",
)
img_data = downscale_local_mean(img_data, (4, 4, 1))

# load the clustered image
clustered_img = np.load(f"./data/mouse_cgn/{image_id}-labelled.npy")

# load the channel list
channels = list(pd.read_csv("./data/mouse_cgn/channel_names.csv", header=0).T.to_numpy()[0])

img_data.shape, clustered_img.shape
[3]:
((438, 438, 35), (438, 438))

Calculate spatial weights

[4]:
neighborhood_offset = so.spatial.neighborhood_offset(
    neighborhood_type="queen",
    order=2,
    exclude_inner_order=1,
)

Create a pysal weights object for your image based on the neighborhood offset.

[5]:
spatial_weights = so.spatial.spatial_weights(
    data_shape=img_data.shape[:2],
    neighborhood_offset=neighborhood_offset,
)
Creating spatial weights for each offset: 100%|██████████| 16/16 [00:00<00:00, 1520.19it/s]

Let’s visualize the spatial weights matrix for an example point.

[6]:
image = np.zeros_like(clustered_img).flatten()
point_idx = 16000
image[
    spatial_weights.to_adjlist(drop_islands=True)[spatial_weights.to_adjlist(drop_islands=True).focal == point_idx][
        "neighbor"
    ].to_numpy()
] = 1
image[point_idx] = 2
image = image.reshape(clustered_img.shape)

plt.imshow(image[33:40, 229:236])
plt.gca().set_axis_off()
../_images/tutorials_spatial_11_0.png

Calculate the autocorrelation

[7]:
correlation = so.spatial.autocorrelation().predict(
    data=clustered_img,
    spatial_weights=spatial_weights,
)
Calculating spatial autocorrelation for each channel/cluster: 100%|██████████| 25/25 [00:24<00:00,  1.02it/s]
[8]:
correlation.sort_values("morans_i", ascending=False).plot.bar(
    x="cluster",
    y="morans_i",
    figsize=(12, 6),
    title="Moran's I for each cluster",
)
[8]:
<Axes: title={'center': "Moran's I for each cluster"}, xlabel='cluster'>
../_images/tutorials_spatial_14_1.png

Local statistics

Let’s take a look at OAT1, a marker of the proximal tubule, especially its brush border.

[9]:
plt.imshow(img_data[:, :, channels.index("OAT1")], cmap="gray")
plt.gca().set_axis_off()
../_images/tutorials_spatial_17_0.png

Getis-Ord Gi*

Let’s try to find local hotspots using Getis-Ord Gi*. For this, we will downscale the image to speed up the computation.

[10]:
image_downscaled = downscale_local_mean(
    img_data[:, :, channels.index("OAT1") : channels.index("OAT1") + 1],
    (4, 4, 1),
)
[11]:
spatial_weights_circle = so.spatial.spatial_weights(
    data_shape=image_downscaled.shape[:2],
    neighborhood_offset=so.spatial.neighborhood_offset(
        neighborhood_type="circle",
        order=10,
        exclude_inner_order=0,
    ),
)
Creating spatial weights for each offset: 100%|██████████| 316/316 [00:00<00:00, 20182.73it/s]
[12]:
local_autocorrelation_gi = so.spatial.local_autocorrelation().predict(
    data=image_downscaled,
    spatial_weights=spatial_weights_circle,
    method="getisord",
    star=True,
)
local_autocorrelation_gi.columns
/var/folders/vg/vt1jjh256dn1g4sd6pknwxbs_qmnws/T/ipykernel_74345/1097534021.py:1: UserWarning: Data is not in float64 format. Converting to float64 required by `esda`.
  local_autocorrelation_gi = so.spatial.local_autocorrelation().predict(
[12]:
Index(['local_statistic', 'z_score', 'p_value', 'expected_value'], dtype='object')
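Beyond the raw statistic, the returned `z_score` and `p_value` columns can be thresholded to flag significant hot and cold spots. A minimal sketch at a two-sided 5% significance level, with dummy values standing in for `local_autocorrelation_gi`:

```python
import pandas as pd

# Dummy stand-in for the Getis-Ord result table returned above.
result = pd.DataFrame({
    "z_score": [3.1, -2.5, 0.4],
    "p_value": [0.001, 0.01, 0.7],
})

# Significant hotspots: high positive z-score and small p-value;
# significant coldspots: strongly negative z-score and small p-value.
hotspots = (result["z_score"] > 1.96) & (result["p_value"] < 0.05)
coldspots = (result["z_score"] < -1.96) & (result["p_value"] < 0.05)
print(hotspots.sum(), coldspots.sum())  # 1 1
```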

We can visualize the hotspots and, through simple thresholding, obtain an approximate segmentation of the proximal tubule.

[13]:
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.imshow(
    local_autocorrelation_gi["local_statistic"].to_numpy().reshape(image_downscaled.shape[:2]),
    cmap=ListedColormap(coolwarm),
)
ax.set_axis_off()
../_images/tutorials_spatial_24_0.png

We can do the same with the local Moran’s I.

[14]:
local_autocorrelation_moran = so.spatial.local_autocorrelation().predict(
    data=image_downscaled,
    spatial_weights=spatial_weights_circle,
    method="moran",
    permutation_count=100,
)
local_autocorrelation_moran.columns
/var/folders/vg/vt1jjh256dn1g4sd6pknwxbs_qmnws/T/ipykernel_74345/1680012595.py:1: UserWarning: Data is not in float64 format. Converting to float64 required by `esda`.
  local_autocorrelation_moran = so.spatial.local_autocorrelation().predict(
[14]:
Index(['local_statistic', 'quadrant', 'p_value', 'z_score', 'expected_value'], dtype='object')
[15]:
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.imshow(
    local_autocorrelation_moran["local_statistic"].to_numpy().reshape(image_downscaled.shape[:2]),
    cmap=ListedColormap(coolwarm),
)
ax.set_axis_off()
../_images/tutorials_spatial_27_0.png

Local Heteroskedasticity

LOSH can help us find regions where signals change fast, e.g., borders. Since it is sensitive to the intensity of the specific signal, let’s look at nuclei and binarize our data.

[16]:
nuclei_thresholded = (
    img_data[..., channels.index("DRAQ5") : channels.index("DRAQ5") + 1]
    > threshold_otsu(img_data[..., channels.index("DRAQ5") : channels.index("DRAQ5") + 1]) * 0.5
)
[17]:
plt.imshow(nuclei_thresholded, cmap="gray")
plt.gca().set_axis_off()
../_images/tutorials_spatial_31_0.png

We will use a small first-order queen neighborhood, so that only the eight immediately adjacent pixels count as neighbors.

[18]:
spatial_weights_losh = so.spatial.spatial_weights(
    data_shape=nuclei_thresholded.shape[:2],
    neighborhood_offset=so.spatial.neighborhood_offset(
        neighborhood_type="queen",
        order=1,
        exclude_inner_order=0,
    ),
)
Creating spatial weights for each offset: 100%|██████████| 8/8 [00:00<00:00, 1307.81it/s]
[19]:
losh = so.spatial.local_heteroskedasticity().predict(
    data=nuclei_thresholded,
    channel_names=["DRAQ5"],
    spatial_weights=spatial_weights_losh,
)

Now we can visualize the border region.

[20]:
plt.imshow(
    losh.loc["DRAQ5"].losh.reshape(nuclei_thresholded.shape[:2]) > 4,
    cmap="gray",
)
plt.gca().set_axis_off()
../_images/tutorials_spatial_36_0.png

Join counts and vicinity graph

[21]:
df_vicinity_composition, df_p_values = so.spatial.vicinity_composition(
    data=clustered_img,
    ignore_repeated=True,
    permutations=100,
)
df_vicinity_composition.head()
[21]:
0 1 2 3 4 5 6 7 8 9 ... 15 16 17 18 19 20 21 22 23 24
0 0 731 96 180 393 131 178 94 435 206 ... 233 25 46 63 237 184 121 68 176 62
1 522 0 996 789 3599 1031 829 69 149 250 ... 263 82 39 199 144 151 111 18 628 31
2 94 1012 0 933 1545 207 191 234 33 19 ... 29 48 69 58 98 31 103 27 425 37
3 214 792 937 0 1279 253 326 133 162 48 ... 120 82 81 143 153 61 125 152 297 72
4 393 3704 1614 1243 0 324 164 233 338 49 ... 130 54 28 260 227 81 327 25 1217 552

5 rows × 25 columns
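Conceptually, the table counts how often pixels of one cluster border pixels of another. A toy stand-in that only counts horizontally adjacent pairs of differing labels (the actual function uses the full neighborhood structure):

```python
import numpy as np

# Toy cluster image: count (left, right) label pairs for
# horizontally adjacent pixels with different cluster labels.
labels = np.array([[0, 0, 1],
                   [2, 1, 1]])
left, right = labels[:, :-1].ravel(), labels[:, 1:].ravel()
pairs = {}
for a, b in zip(left, right):
    if a != b:
        pairs[(a, b)] = pairs.get((a, b), 0) + 1
print(pairs)  # {(0, 1): 1, (2, 1): 1}
```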

The association between cluster 9 and cluster 6 seems to be significant.

[22]:
df_p_values.iloc[9, 6].round(4)
[22]:
np.float64(0.0099)
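This is a permutation-based pseudo p-value: with 100 permutations, the smallest attainable value is (0 + 1) / (100 + 1) ≈ 0.0099, reached when no permuted statistic is as extreme as the observed one. A quick check of that floor:

```python
# Pseudo p-value under permutation inference: with 100 permutations
# and zero permuted statistics at least as extreme as the observed
# one, the p-value bottoms out at 1 / 101.
permutations = 100
exceed = 0
p_pseudo = (exceed + 1) / (permutations + 1)
print(round(p_pseudo, 4))  # 0.0099
```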
[23]:
vicinity_graph = so.spatial.vicinity_graph(
    data=df_vicinity_composition,
)
[24]:
fig = so.plot.spatial_graph(
    vicinity_graph,
    edge_threshold=0.15,
)
fig.set_size_inches(5, 5)
fig.set_dpi(100)
../_images/tutorials_spatial_42_0.png

Interoperability with esda

Use esda to compute bivariate spatial statistics for combinations of immunofluorescence markers. We can reuse the same spatial weights object as before.

[25]:
np.random.seed(3407)
moran_bv = esda.moran.Moran_BV(
    img_data[:, :, channels.index("cjun")],
    img_data[:, :, channels.index("Vimentin")],
    spatial_weights,
    transformation="r",
    permutations=100,
)
np.round(moran_bv.I, 4), np.round(moran_bv.p_sim, 4)
[25]:
(np.float64(0.3642), np.float64(0.0099))
[26]:
np.random.seed(3407)
lees_l_estimator = esda.lee.Spatial_Pearson(connectivity=spatial_weights.to_sparse(), permutations=100)
lees_l_estimator.fit(
    img_data[:, :, channels.index("cjun")].reshape(-1, 1),
    img_data[:, :, channels.index("Vimentin")].reshape(-1, 1),
)
np.round(lees_l_estimator.association_[0, 1], 4)
[26]:
np.float64(0.3427)