Spatial statistics¶

Spatial statistics are widely used in geography, geology, environmental science, sociology, urban planning, epidemiology and economics. We have previously participated in an effort to provide a spatial statistics guide for spatial transcriptomics that explains the basic concepts and methods and is also applicable to spatial proteomics data: pasta spatial statistics guide

spatiomic also provides a set of functions to calculate spatial statistics, too. In this notebook, we will show how to use these methods on spatial proteomics data. spatiomic builds on top of esda and pysal, which can also be used for further analysis as spatiomic.spatial.spatial_weights returns a pysal W object.

[1]:

import esda
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from colorcet import coolwarm
from matplotlib.colors import ListedColormap
from skimage.filters import threshold_otsu
from skimage.transform import downscale_local_mean

import spatiomic as so

Data loading¶

[2]:

image_id = "245 g1"

[3]:

# read the image and convert the dimension order to XYC/channel-last
img_data = so.data.read().read_tiff(
    f"./data/mouse_cgn/{image_id}.tif",
    input_dimension_order="CYX",
)
img_data = downscale_local_mean(img_data, (4, 4, 1))

# load the clustered image
clustered_img = np.load(f"./data/mouse_cgn/{image_id}-labelled.npy")

# load the channel list
channels = list(pd.read_csv("./data/mouse_cgn/channel_names.csv", header=0).T.to_numpy()[0])

img_data.shape, clustered_img.shape

[3]:

((438, 438, 35), (438, 438))

Calculate spatial weights¶

[4]:

neighborhood_offset = so.spatial.neighborhood_offset(
    neighborhood_type="queen",
    order=2,
    exclude_inner_order=1,
)

Create a pysal weights object for your image based on the neighborhood offset.

[5]:

spatial_weights = so.spatial.spatial_weights(
    data_shape=img_data.shape[:2],
    neighborhood_offset=neighborhood_offset,
)

Creating spatial weights for each offset: 100%|██████████| 16/16 [00:00<00:00, 1520.19it/s]

Let’s visualize the spatial weights matrix for an example point.

[6]:

image = np.zeros_like(clustered_img).flatten()
point_idx = 16000
image[
    spatial_weights.to_adjlist(drop_islands=True)[spatial_weights.to_adjlist(drop_islands=True).focal == point_idx][
        "neighbor"
    ].to_numpy()
] = 1
image[point_idx] = 2
image = image.reshape(clustered_img.shape)

plt.imshow(image[33:40, 229:236])
plt.gca().set_axis_off()

Calculate the autocorrelation¶

[7]:

correlation = so.spatial.autocorrelation().predict(
    data=clustered_img,
    spatial_weights=spatial_weights,
)

Calculating spatial autocorrelation for each channel/cluster: 100%|██████████| 25/25 [00:24<00:00,  1.02it/s]

[8]:

correlation.sort_values("morans_i", ascending=False).plot.bar(
    x="cluster",
    y="morans_i",
    figsize=(12, 6),
    title="Moran's I for each cluster",
)

[8]:

<Axes: title={'center': "Moran's I for each cluster"}, xlabel='cluster'>

Local statistics¶

Let’s take a look at OAT1, a marker of the proximale tubule, especially the brush border.

[9]:

plt.imshow(img_data[:, :, channels.index("OAT1")], cmap="gray")
plt.gca().set_axis_off()

Getis-Ord Gi*¶

Let’s try to find local hotspots using Getis-Ord Gi*. For this, we will downscale the image to speed up the computation.

[10]:

image_downscaled = downscale_local_mean(
    img_data[:, :, channels.index("OAT1") : channels.index("OAT1") + 1],
    (4, 4, 1),
)

[11]:

spatial_weights_circle = so.spatial.spatial_weights(
    data_shape=image_downscaled.shape[:2],
    neighborhood_offset=so.spatial.neighborhood_offset(
        neighborhood_type="circle",
        order=10,
        exclude_inner_order=0,
    ),
)

Creating spatial weights for each offset: 100%|██████████| 316/316 [00:00<00:00, 20182.73it/s]

[12]:

local_autocorrelation_gi = so.spatial.local_autocorrelation().predict(
    data=image_downscaled,
    spatial_weights=spatial_weights_circle,
    method="getisord",
    star=True,
)
local_autocorrelation_gi.columns

/var/folders/vg/vt1jjh256dn1g4sd6pknwxbs_qmnws/T/ipykernel_74345/1097534021.py:1: UserWarning: Data is not in float64 format. Converting to float64 required by `esda`.
  local_autocorrelation_gi = so.spatial.local_autocorrelation().predict(

[12]:

Index(['local_statistic', 'z_score', 'p_value', 'expected_value'], dtype='object')

We can visualize the hotspots and through simple thresholding, we can get an approximate segmentation of the proximale tubule.

[13]:

fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.imshow(
    local_autocorrelation_gi["local_statistic"].to_numpy().reshape(image_downscaled.shape[:2]),
    cmap=ListedColormap(coolwarm),
)
ax.set_axis_off()

We can do the same with the local Moran’s I.

[14]:

local_autocorrelation_moran = so.spatial.local_autocorrelation().predict(
    data=image_downscaled,
    spatial_weights=spatial_weights_circle,
    method="moran",
    permutation_count=100,
)
local_autocorrelation_moran.columns

/var/folders/vg/vt1jjh256dn1g4sd6pknwxbs_qmnws/T/ipykernel_74345/1680012595.py:1: UserWarning: Data is not in float64 format. Converting to float64 required by `esda`.
  local_autocorrelation_moran = so.spatial.local_autocorrelation().predict(

[14]:

Index(['local_statistic', 'quadrant', 'p_value', 'z_score', 'expected_value'], dtype='object')

[15]:

fig, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.imshow(
    local_autocorrelation_moran["local_statistic"].to_numpy().reshape(image_downscaled.shape[:2]),
    cmap=ListedColormap(coolwarm),
)
ax.set_axis_off()

Local Heteroskedasticity¶

LOSH can help us find regions were signals change fast, e.g., borders. Since this is sensitive to the intensity of the specific signal, let’s look at nuclei and binarize our data.

[16]:

nuclei_thresholded = (
    img_data[..., channels.index("DRAQ5") : channels.index("DRAQ5") + 1]
    > threshold_otsu(img_data[..., channels.index("DRAQ5") : channels.index("DRAQ5") + 1]) * 0.5
)

[17]:

plt.imshow(nuclei_thresholded, cmap="gray")
plt.gca().set_axis_off()

We will use a queen neighborhood that does not reach far.

[18]:

spatial_weights_losh = so.spatial.spatial_weights(
    data_shape=nuclei_thresholded.shape[:2],
    neighborhood_offset=so.spatial.neighborhood_offset(
        neighborhood_type="queen",
        order=1,
        exclude_inner_order=0,
    ),
)

Creating spatial weights for each offset: 100%|██████████| 8/8 [00:00<00:00, 1307.81it/s]

[19]:

losh = so.spatial.local_heteroskedasticity().predict(
    data=nuclei_thresholded,
    channel_names=["DRAQ5"],
    spatial_weights=spatial_weights_losh,
)

Now we can visualize the border region.

[20]:

plt.imshow(
    losh.loc["DRAQ5"].losh.reshape(nuclei_thresholded.shape[:2]) > 4,
    cmap="gray",
)
plt.gca().set_axis_off()

Join counts and vicinity graph¶

[21]:

df_vicinity_composition, df_p_values = so.spatial.vicinity_composition(
    data=clustered_img,
    ignore_repeated=True,
    permutations=100,
)
df_vicinity_composition.head()

[21]:

	0	1	2	3	4	5	6	7	8	9	...	15	16	17	18	19	20	21	22	23	24
0	0	731	96	180	393	131	178	94	435	206	...	233	25	46	63	237	184	121	68	176	62
1	522	0	996	789	3599	1031	829	69	149	250	...	263	82	39	199	144	151	111	18	628	31
2	94	1012	0	933	1545	207	191	234	33	19	...	29	48	69	58	98	31	103	27	425	37
3	214	792	937	0	1279	253	326	133	162	48	...	120	82	81	143	153	61	125	152	297	72
4	393	3704	1614	1243	0	324	164	233	338	49	...	130	54	28	260	227	81	327	25	1217	552

5 rows × 25 columns

The association between cluster 9 and cluster 6 seems to be significant.

[22]:

df_p_values.iloc[9, 6].round(4)

[22]:

np.float64(0.0099)

[23]:

vicinity_graph = so.spatial.vicinity_graph(
    data=df_vicinity_composition,
)

[24]:

fig = so.plot.spatial_graph(
    vicinity_graph,
    edge_threshold=0.15,
)
fig.set_size_inches(5, 5)
fig.set_dpi(100)

Interoperability with `esda`¶

Use esda to get the spatial bivariate statistics for combinations of immunofluorescence markers. We can use the same spatial weights object as before.

[25]:

np.random.seed(3407)
moran_bv = esda.moran.Moran_BV(
    img_data[:, :, channels.index("cjun")],
    img_data[:, :, channels.index("Vimentin")],
    spatial_weights,
    transformation="r",
    permutations=100,
)
np.round(moran_bv.I, 4), np.round(moran_bv.p_sim, 4)

[25]:

(np.float64(0.3642), np.float64(0.0099))

[26]:

np.random.seed(3407)
lees_l_estimator = esda.lee.Spatial_Pearson(connectivity=spatial_weights.to_sparse(), permutations=100)
lees_l_estimator.fit(
    img_data[:, :, channels.index("cjun")].reshape(-1, 1),
    img_data[:, :, channels.index("Vimentin")].reshape(-1, 1),
)
np.round(lees_l_estimator.association_[0, 1], 4)

[26]:

np.float64(0.3427)

[ ]: