Tutorial 6: DLPFC interpolation

This tutorial primarily describes the process of using samples 151673 and 151675 to impute 151674.

Environment Configuration & Package Loading

[1]:

import os
import torch
import pandas as pd
import scanpy as sc
from GenOT import genot
import warnings

warnings.filterwarnings("ignore")
# Run device, by default, the package is implemented on 'cpu'. We recommend using GPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# the location of R, which is necessary for mclust algorithm. Please replace the path below with local R installation path
os.environ['R_HOME'] = 'C:/Program Files/R/R-4.4.1'
os.environ['PATH'] = 'C:/Program Files/R/R-4.4.1/bin/x64;' + os.environ['PATH']

C:\ProgramData\anaconda3\envs\pytorch\lib\site-packages\scipy\__init__.py:177: UserWarning: A NumPy version >=1.18.5 and <1.26.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"

Data Loading

We load three different Visium spatial transcriptomics datasets from the DLPFC (Dorsolateral Prefrontal Cortex) region. These datasets will be used for interpolation and analysis.

[2]:

section_id = '151673'
input_dir = os.path.join('../Data', section_id)
adata1 = sc.read_visium(path=input_dir, count_file=section_id + '_filtered_feature_bc_matrix.h5', load_images=True)
adata1.var_names_make_unique(join="++")
print(adata1)
Ann_df = pd.read_csv(os.path.join(input_dir, section_id + '_truth.txt'), sep='\t', header=None, index_col=0)
Ann_df.columns = ['Ground Truth']
Ann_df[Ann_df.isna()] = "unknown"
adata1.obs['Ground Truth'] = Ann_df.loc[adata1.obs_names, 'Ground Truth'].astype('category')

section_id = '151675'
input_dir = os.path.join('../Data', section_id)
adata2 = sc.read_visium(path=input_dir, count_file=section_id + '_filtered_feature_bc_matrix.h5', load_images=True)
adata2.var_names_make_unique(join="++")
print(adata2)
Ann_df = pd.read_csv(os.path.join(input_dir, section_id + '_truth.txt'), sep='\t', header=None, index_col=0)
Ann_df.columns = ['Ground Truth']
Ann_df[Ann_df.isna()] = "unknown"
adata2.obs['Ground Truth'] = Ann_df.loc[adata2.obs_names, 'Ground Truth'].astype('category')

section_id = '151674'
input_dir = os.path.join('../Data', section_id)
adata3 = sc.read_visium(path=input_dir, count_file=section_id + '_filtered_feature_bc_matrix.h5', load_images=True)
# adata3 = sc.pp.subsample(adata3, n_obs=3000, random_state=0, copy=True)
adata3.var_names_make_unique(join="++")
print(adata3)
Ann_df = pd.read_csv(os.path.join(input_dir, section_id + '_truth.txt'), sep='\t', header=None, index_col=0)
Ann_df.columns = ['Ground Truth']
Ann_df[Ann_df.isna()] = "unknown"
adata3.obs['Ground Truth'] = Ann_df.loc[adata3.obs_names, 'Ground Truth'].astype('category')

AnnData object with n_obs × n_vars = 3639 × 33538
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'
AnnData object with n_obs × n_vars = 3592 × 33538
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'
AnnData object with n_obs × n_vars = 3673 × 33538
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

Normalize Data

Normalize the gene expression data for each AnnData object using normalize_sparse. This step is crucial for consistent downstream analysis.

[3]:

from GenOT.utils import normalize_sparse

adata1 = normalize_sparse(adata1)
adata2 = normalize_sparse(adata2)
adata3 = normalize_sparse(adata3)

Visualize adata1, adata2, and adata3.

Visualize the spatial distribution of ‘Ground Truth’ annotations for each dataset. This helps in understanding the tissue morphology and layer organization.

[4]:

sc.pl.spatial(adata1, img_key=None, color='Ground Truth')
sc.pl.spatial(adata2, img_key=None, color='Ground Truth')
sc.pl.spatial(adata3, img_key=None, color='Ground Truth')

_images/Tutorial_6_DLPFC_interpolation_8_0.png

_images/Tutorial_6_DLPFC_interpolation_8_1.png

_images/Tutorial_6_DLPFC_interpolation_8_2.png

Align the input datasets adata1 and adata2 using the PASTE2 algorithm.

PASTE2 is used to spatially align two AnnData objects by finding an optimal transformation that minimizes the distance between corresponding spots.

[5]:

from GenOT.utils import PASTE2_align_spatial_data

adata1.obsm['spatial'], adata2.obsm['spatial'] = PASTE2_align_spatial_data(adata1, adata2, visualize=True)

PASTE2 starts...
Starting GLM-PCA...
Iteration: 0 | deviance=4.2825E+6
Iteration: 1 | deviance=4.2825E+6
Iteration: 2 | deviance=4.1865E+6
Iteration: 3 | deviance=4.0313E+6
Iteration: 4 | deviance=3.9559E+6
Iteration: 5 | deviance=3.9226E+6
Iteration: 6 | deviance=3.9048E+6
Iteration: 7 | deviance=3.8942E+6
Iteration: 8 | deviance=3.8876E+6
Iteration: 9 | deviance=3.8831E+6
Iteration: 10 | deviance=3.8797E+6
Iteration: 11 | deviance=3.8769E+6
Iteration: 12 | deviance=3.8747E+6
Iteration: 13 | deviance=3.8728E+6
Iteration: 14 | deviance=3.8712E+6
Iteration: 15 | deviance=3.8699E+6
Iteration: 16 | deviance=3.8687E+6
Iteration: 17 | deviance=3.8677E+6
Iteration: 18 | deviance=3.8668E+6
Iteration: 19 | deviance=3.8660E+6
Iteration: 20 | deviance=3.8653E+6
Iteration: 21 | deviance=3.8647E+6
Iteration: 22 | deviance=3.8641E+6
Iteration: 23 | deviance=3.8636E+6
Iteration: 24 | deviance=3.8631E+6
Iteration: 25 | deviance=3.8626E+6
Iteration: 26 | deviance=3.8622E+6
Iteration: 27 | deviance=3.8618E+6
GLM-PCA finished.
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.629470e+02|0.000000e+00|0.000000e+00
    1|3.336742e+01|3.883416e+00|1.295796e+02
    2|1.202396e+01|1.775077e+00|2.134346e+01
    3|1.056753e+01|1.378212e-01|1.456430e+00
    4|1.053774e+01|2.827189e-03|2.979219e-02
    5|1.052598e+01|1.117887e-03|1.176685e-02
    6|1.051415e+01|1.124751e-03|1.182580e-02
    7|1.050555e+01|8.188502e-04|8.602469e-03
    8|1.049879e+01|6.437843e-04|6.758955e-03
    9|1.049461e+01|3.979337e-04|4.176159e-03
   10|1.049173e+01|2.742749e-04|2.877619e-03
   11|1.048998e+01|1.667681e-04|1.749395e-03
   12|1.048868e+01|1.242946e-04|1.303686e-03
   13|1.048781e+01|8.350228e-05|8.757556e-04
   14|1.048705e+01|7.169439e-05|7.518629e-04
   15|1.048655e+01|4.810681e-05|5.044744e-04
   16|1.048621e+01|3.218690e-05|3.375186e-04
   17|1.048601e+01|1.956590e-05|2.051682e-04
   18|1.048588e+01|1.186060e-05|1.243688e-04
   19|1.048581e+01|6.549029e-06|6.867190e-05
   20|1.048570e+01|1.094125e-05|1.147266e-04
   21|1.048559e+01|9.945426e-06|1.042837e-04
   22|1.048552e+01|6.964700e-06|7.302851e-05
   23|1.048549e+01|3.334384e-06|3.496264e-05
   24|1.048547e+01|1.783350e-06|1.869926e-05
   25|1.048546e+01|1.107618e-06|1.161388e-05
   26|1.048544e+01|1.171810e-06|1.228695e-05
   27|1.048543e+01|1.719212e-06|1.802667e-05
   28|1.048540e+01|2.407388e-06|2.524242e-05
   29|1.048536e+01|3.734546e-06|3.915807e-05
   30|1.048535e+01|1.443932e-06|1.514013e-05
   31|1.048534e+01|5.351043e-07|5.610751e-06
   32|1.048534e+01|4.176353e-07|4.379047e-06
   33|1.048533e+01|1.684452e-07|1.766204e-06
   34|1.048533e+01|1.453831e-07|1.524390e-06
   35|1.048533e+01|4.463739e-07|4.680376e-06
   36|1.048532e+01|3.445573e-07|3.612796e-06
   37|1.048532e+01|8.397153e-07|8.804680e-06
   38|1.048531e+01|4.483258e-07|4.700835e-06
   39|1.048531e+01|2.094770e-07|2.196431e-06
   40|1.048531e+01|5.465983e-08|5.731251e-07
   41|1.048531e+01|8.697478e-09|9.119574e-08
   42|1.048531e+01|6.659006e-09|6.982173e-08
 1000|1.048531e+01|8.122292e-10|8.516473e-09

_images/Tutorial_6_DLPFC_interpolation_10_1.png

Get unique marker genes from adata3 based on its ‘Ground Truth’ annotations. These marker genes will be used to ensure consistent gene sets across all datasets.

[6]:

from GenOT.utils import get_unique_marker_genes

unique_marker_genes = get_unique_marker_genes(adata3, annotation_column_name='Ground Truth', n_top_genes=10)

Using annotation column 'Ground Truth' for marker gene analysis...
Unique values in annotation column: ['Layer_3', 'Layer_1', 'WM', 'Layer_5', 'Layer_6', 'Layer_2', 'Layer_4', 'unknown']
Extracted 66 unique marker genes from the top 10 genes per annotation group.
First 10 extracted marker genes: ['AGR3', 'ATP1A1', 'AZGP1', 'B3GALT2', 'C11orf87', 'CA10', 'CARTPT', 'CBX6', 'CLDN11', 'CLDND1']...

Data Preprocessing

This section preprocesses the data by ensuring all datasets contain the same set of genes, specifically the marker genes identified earlier.

[7]:

common_genes = set(unique_marker_genes) & set(adata1.var_names) & set(adata2.var_names) & set(adata3.var_names)
matching_genes = list(common_genes)

adata1 = adata1[:, matching_genes].copy()
adata2 = adata2[:, matching_genes].copy()
adata3 = adata3[:, matching_genes].copy()

Run GenOT

This section initializes and trains the GenOT DualEncoder, which learns a shared latent space representation for the input spatial transcriptomics datasets.

[8]:

# define model
Encoder = genot.DualEncoder(adata1, adata2, device=device, pca_n=16)

WARNING: adata.X seems to be already log-transformed.
WARNING: adata.X seems to be already log-transformed.

[9]:

adata1, adata2 = Encoder.train_encoder()

Training DualEncoder on 2 datasets with 700 epochs...

Training: 100%|██████████| 700/700 [00:22<00:00, 31.65it/s]

Training completed!

Compute the spatial barycenter.

[10]:

from GenOT.OTutils import compute_spatial_barycenter

Xb_s, s_transport_plans = compute_spatial_barycenter(adata1, adata2, num_barycenters=adata3.n_obs)

It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.526740e+07|0.000000e+00|0.000000e+00
    1|1.386171e+07|1.014077e-01|1.405685e+06
    2|1.386171e+07|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.560258e+07|0.000000e+00|0.000000e+00
    1|1.393915e+07|1.193348e-01|1.663426e+06
    2|1.393915e+07|0.000000e+00|0.000000e+00
It.  |Err
-------------------
    0|1.269066e+07|
    0|1.608939e+05|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.315560e+07|0.000000e+00|0.000000e+00
    1|1.751202e+06|6.512327e+00|1.140440e+07
    2|1.735259e+06|9.187614e-03|1.594289e+04
    3|1.733098e+06|1.247302e-03|2.161696e+03
    4|1.733098e+06|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.338249e+07|0.000000e+00|0.000000e+00
    1|1.864172e+06|6.178786e+00|1.151832e+07
    2|1.850695e+06|7.282146e-03|1.347703e+04
    3|1.850147e+06|2.964959e-04|5.485609e+02
    4|1.849901e+06|1.328432e-04|2.457467e+02
    5|1.849901e+06|0.000000e+00|0.000000e+00
    1|6.255801e+06|
    1|4.585554e+04|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.385275e+07|0.000000e+00|0.000000e+00
    1|5.144138e+04|2.682920e+02|1.380131e+07
    2|2.491653e+04|1.064548e+00|2.652485e+04
    3|2.491579e+04|2.983232e-05|7.432956e-01
    4|2.491512e+04|2.685066e-05|6.689873e-01
    5|2.491499e+04|4.946091e-06|1.232318e-01
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.403153e+07|0.000000e+00|0.000000e+00
    1|4.188514e+04|3.340003e+02|1.398965e+07
    2|2.517658e+04|6.636546e-01|1.670855e+04
    3|2.517566e+04|3.647349e-05|9.182442e-01
    4|2.517561e+04|2.322090e-06|5.846001e-02
    2|3.023500e+05|
    2|4.879279e+03|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.388917e+07|0.000000e+00|0.000000e+00
    1|6.904005e+04|2.001756e+02|1.382013e+07
    2|1.799695e+04|2.836208e+00|5.104310e+04
    3|1.799676e+04|1.082792e-05|1.948675e-01
    4|1.799676e+04|3.308993e-09|5.955114e-05
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.406795e+07|0.000000e+00|0.000000e+00
    1|5.281460e+04|2.653649e+02|1.401514e+07
    2|1.820903e+04|1.900461e+00|3.460556e+04
    3|1.820878e+04|1.407460e-05|2.562813e-01
    4|1.820878e+04|1.846640e-08|3.362506e-04
    3|5.511423e+04|
    3|9.412721e+02|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.388972e+07|0.000000e+00|0.000000e+00
    1|6.805087e+04|2.031079e+02|1.382167e+07
    2|1.764421e+04|2.856839e+00|5.040666e+04
    3|1.764384e+04|2.088678e-05|3.685230e-01
    4|1.764384e+04|3.249027e-08|5.732530e-04
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.406850e+07|0.000000e+00|0.000000e+00
    1|5.442088e+04|2.575129e+02|1.401408e+07
    2|1.795211e+04|2.031447e+00|3.646877e+04
    3|1.795180e+04|1.764359e-05|3.167342e-01
    4|1.795180e+04|1.418568e-15|2.546585e-11
    4|3.770926e+04|
    4|6.411836e+02|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.388990e+07|0.000000e+00|0.000000e+00
    1|6.866108e+04|2.012966e+02|1.382124e+07
    2|1.746666e+04|2.930979e+00|5.119442e+04
    3|1.746637e+04|1.661041e-05|2.901235e-01
    4|1.746637e+04|1.394787e-07|2.436186e-03
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.406868e+07|0.000000e+00|0.000000e+00
    1|5.194040e+04|2.698620e+02|1.401674e+07
    2|1.780608e+04|1.917004e+00|3.413432e+04
    3|1.780604e+04|1.973525e-06|3.514067e-02
    5|3.067786e+04|
    5|5.162343e+02|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.389002e+07|0.000000e+00|0.000000e+00
    1|6.706564e+04|2.061109e+02|1.382296e+07
    2|1.736066e+04|2.863081e+00|4.970498e+04
    3|1.736023e+04|2.463403e-05|4.276526e-01
    4|1.736023e+04|2.511478e-08|4.359985e-04
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.406880e+07|0.000000e+00|0.000000e+00
    1|5.212078e+04|2.689270e+02|1.401668e+07
    2|1.769896e+04|1.944849e+00|3.442182e+04
    3|1.769848e+04|2.712819e-05|4.801278e-01
    4|1.769848e+04|7.511497e-08|1.329421e-03
    6|2.587306e+04|
    6|4.395153e+02|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.389011e+07|0.000000e+00|0.000000e+00
    1|6.686594e+04|2.067307e+02|1.382324e+07
    2|1.727665e+04|2.870307e+00|4.958929e+04
    3|1.727644e+04|1.212775e-05|2.095243e-01
    4|1.727644e+04|1.301304e-09|2.248190e-05
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.406889e+07|0.000000e+00|0.000000e+00
    1|5.520170e+04|2.538634e+02|1.401369e+07
    2|1.762440e+04|2.132118e+00|3.757730e+04
    3|1.762402e+04|2.151800e-05|3.792336e-01
    4|1.762402e+04|5.608835e-08|9.885023e-04
    7|2.122793e+04|
    7|3.599307e+02|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.389017e+07|0.000000e+00|0.000000e+00
    1|6.878399e+04|2.009391e+02|1.382139e+07
    2|1.723114e+04|2.991841e+00|5.155284e+04
    3|1.723068e+04|2.661829e-05|4.586513e-01
    4|1.723068e+04|1.650662e-08|2.844203e-04
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.406895e+07|0.000000e+00|0.000000e+00
    1|5.388683e+04|2.600833e+02|1.401507e+07
    2|1.756622e+04|2.067640e+00|3.632061e+04
    3|1.756541e+04|4.625594e-05|8.125043e-01
    4|1.756540e+04|2.487714e-07|4.369770e-03
    8|1.793257e+04|
    8|3.004944e+02|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.389022e+07|0.000000e+00|0.000000e+00
    1|6.684841e+04|2.067868e+02|1.382337e+07
    2|1.719598e+04|2.887443e+00|4.965242e+04
    3|1.719476e+04|7.100974e-05|1.220996e+00
    4|1.719476e+04|2.335213e-07|4.015342e-03
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.406900e+07|0.000000e+00|0.000000e+00
    1|5.331611e+04|2.628789e+02|1.401568e+07
    2|1.752571e+04|2.042165e+00|3.579040e+04
    3|1.752557e+04|7.752226e-06|1.358622e-01
    9|1.618058e+04|
    9|2.707699e+02|

Run decoder

Initialize and train the GenOT Decoder. The decoder learns to reconstruct gene expression from the latent space embeddings.

[11]:

decoder = genot.Decoder(
    input_size=adata1.obsm['emb'].shape[1],  # Input dimension
    output_size=adata1.X.shape[1],  # Output dimension
)
# Train the decoder
trained_decoder = decoder.train_decoder(adata1, adata2, decoder, epochs=500, batch_size=2048)

100%|██████████| 500/500 [00:33<00:00, 14.73it/s]

[12]:

# Extract the learned embeddings from adata1 and adata2.
embd0 = adata1.obsm['emb']
embd1 = adata2.obsm['emb']

Compute the embedding barycenter for the latent representations.

[13]:

from GenOT.OTutils import compute_emb_barycenter

Xb, e_transport_plans = compute_emb_barycenter(
    adata1, adata2,
    weight1=0.5,
    alpha=0.5,
    num_barycenters=adata3.n_obs
)

It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|4.741920e+01|0.000000e+00|0.000000e+00
    1|3.864771e+01|2.269601e-01|8.771488e+00
    2|3.864771e+01|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|3.640727e+01|0.000000e+00|0.000000e+00
    1|2.842808e+01|2.806795e-01|7.979181e+00
    2|2.840253e+01|8.997296e-04|2.555460e-02
    3|2.840253e+01|0.000000e+00|0.000000e+00
It.  |Err
-------------------
    0|7.186842e+03|
    0|4.668437e+02|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.402055e+01|0.000000e+00|0.000000e+00
    1|1.855133e+00|6.557708e+00|1.216542e+01
    2|1.846197e+00|4.840139e-03|8.935850e-03
    3|1.846197e+00|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.316487e+01|0.000000e+00|0.000000e+00
    1|1.514189e+00|7.694340e+00|1.165068e+01
    2|1.470494e+00|2.971447e-02|4.369494e-02
    3|1.470494e+00|0.000000e+00|0.000000e+00
    1|2.553234e+03|
    1|5.600021e+01|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.406848e+01|0.000000e+00|0.000000e+00
    1|9.724391e-01|1.346721e+01|1.309604e+01
    2|9.579550e-01|1.511980e-02|1.448409e-02
    3|9.579550e-01|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.349701e+01|0.000000e+00|0.000000e+00
    1|1.055438e+00|1.178807e+01|1.244157e+01
    2|9.762700e-01|8.109177e-02|7.916746e-02
    3|9.762700e-01|0.000000e+00|0.000000e+00
    2|2.584962e+02|
    2|1.200261e+01|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.411329e+01|0.000000e+00|0.000000e+00
    1|9.411063e-01|1.399649e+01|1.317219e+01
    2|9.329591e-01|8.732595e-03|8.147155e-03
    3|9.284088e-01|4.901245e-03|4.550359e-03
    4|9.284088e-01|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.354182e+01|0.000000e+00|0.000000e+00
    1|1.035858e+00|1.207305e+01|1.250596e+01
    2|9.505066e-01|8.979548e-02|8.535120e-02
    3|9.504068e-01|1.050017e-04|9.979437e-05
    4|9.504068e-01|0.000000e+00|0.000000e+00
    3|1.603657e+02|
    3|7.572746e+00|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.412767e+01|0.000000e+00|0.000000e+00
    1|9.264099e-01|1.424991e+01|1.320126e+01
    2|9.173938e-01|9.827951e-03|9.016102e-03
    3|9.173938e-01|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.355619e+01|0.000000e+00|0.000000e+00
    1|1.018920e+00|1.230447e+01|1.253727e+01
    2|9.383659e-01|8.584497e-02|8.055399e-02
    3|9.383658e-01|9.151875e-08|8.587807e-08
    4|1.328084e+02|
    4|6.005610e+00|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.413600e+01|0.000000e+00|0.000000e+00
    1|9.202902e-01|1.436037e+01|1.321571e+01
    2|9.097421e-01|1.159466e-02|1.054815e-02
    3|9.097421e-01|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.356453e+01|0.000000e+00|0.000000e+00
    1|1.015590e+00|1.235630e+01|1.254894e+01
    2|9.305206e-01|9.142149e-02|8.506958e-02
    3|9.305204e-01|1.618941e-07|1.506457e-07
    5|1.236219e+02|
    5|5.707096e+00|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.414319e+01|0.000000e+00|0.000000e+00
    1|9.092224e-01|1.455526e+01|1.323396e+01
    2|9.018503e-01|8.174382e-03|7.372069e-03
    3|9.010745e-01|8.609547e-04|7.757843e-04
    4|9.010745e-01|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.357171e+01|0.000000e+00|0.000000e+00
    1|9.996367e-01|1.257664e+01|1.257207e+01
    2|9.238908e-01|8.198572e-02|7.574585e-02
    3|9.238907e-01|8.258421e-08|7.629878e-08
    6|1.038752e+02|
    6|4.729515e+00|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.414928e+01|0.000000e+00|0.000000e+00
    1|9.025142e-01|1.467763e+01|1.324677e+01
    2|8.962023e-01|7.042916e-03|6.311878e-03
    3|8.962023e-01|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.357780e+01|0.000000e+00|0.000000e+00
    1|9.924141e-01|1.268159e+01|1.258539e+01
    2|9.177212e-01|8.138959e-02|7.469295e-02
    3|9.177211e-01|6.514683e-08|5.978662e-08
    7|1.004071e+02|
    7|4.477300e+00|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.415446e+01|0.000000e+00|0.000000e+00
    1|8.970260e-01|1.477932e+01|1.325743e+01
    2|8.931338e-01|4.357849e-03|3.892142e-03
    3|8.931338e-01|0.000000e+00|0.000000e+00
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.358298e+01|0.000000e+00|0.000000e+00
    1|9.911492e-01|1.270428e+01|1.259183e+01
    2|9.114000e-01|8.750175e-02|7.974910e-02
    3|9.114000e-01|1.066095e-07|9.716392e-08
    8|9.807398e+01|
    8|4.318798e+00|
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.415895e+01|0.000000e+00|0.000000e+00
    1|8.926285e-01|1.486208e+01|1.326632e+01
    2|8.882007e-01|4.985114e-03|4.427782e-03
    3|8.881973e-01|3.886970e-06|3.452396e-06
It.  |Loss        |Relative loss|Absolute loss
------------------------------------------------
    0|1.358747e+01|0.000000e+00|0.000000e+00
    1|9.849741e-01|1.279475e+01|1.260250e+01
    2|9.074867e-01|8.538674e-02|7.748734e-02
    3|9.074867e-01|3.718391e-08|3.374390e-08
    9|8.044219e+01|
    9|3.651166e+00|

Since the spatial barycenter and embedding barycenter reside in different spaces (not one-to-one cell correspondences), we align them by constraining the embedding barycenter. Refer to the [update_embedding_barycenter] function for details.

[14]:

from GenOT.OTutils import update_embedding_barycenter

Xb = update_embedding_barycenter(Xb_s, Xb, s_transport_plans, e_transport_plans)

Use the trained decoder to transform the embedding barycenter into gene expression values.

[15]:

new_embedding = torch.tensor(Xb, dtype=torch.float32)
new_embedding = new_embedding.to("cuda" if torch.cuda.is_available() else "cpu")

trained_decoder.eval()
with torch.no_grad():
    reconstructed_features = trained_decoder(new_embedding)

reconstructed_gene_expression = reconstructed_features.cpu().numpy()

print("Reconstructed Features Shape:", reconstructed_gene_expression.shape)

Reconstructed Features Shape: (3673, 66)

Create a new adata object where the gene expression matrix equals the generated expression values, and spatial coordinates match the spatial barycenter.

[16]:

new_adata = adata3.copy()
X = reconstructed_gene_expression.copy()
thresholds = X.max(axis=0) / 2
for i, threshold in enumerate(thresholds):
    X[:, i][X[:, i] < threshold] = 0
new_adata.X = X
new_adata.obsm['spatial'] = Xb_s

Map the generated gene expression data to the target dataset using the [create_mapped_adata] function.

[17]:

from GenOT.OTutils import create_mapped_adata

mapped_adata = create_mapped_adata(new_adata, adata3, threshold_denominator=2)

[18]:

import matplotlib.pyplot as plt
import scanpy as sc

import random

genes_of_interest = list(common_genes)[:10]

fig, axes = plt.subplots(1, 10, figsize=(35, 5))
plt.subplots_adjust(wspace=0.3)

for i, gene in enumerate(genes_of_interest):
    sc.pl.spatial(
        new_adata,
        color=gene,
        ax=axes[i],
        title=gene,
        show=False,
        size=1.5,
        spot_size=100,
        cmap='viridis',
        frameon=False,
        img_key=None
    )

plt.tight_layout()
plt.show()

_images/Tutorial_6_DLPFC_interpolation_35_0.png

Visualize the mapped data, original datasets (adata1/151673 and adata2/151675), and the imputed dataset (adata3/151674)

[19]:

import matplotlib.pyplot as plt
import scanpy as sc

import random

genes_of_interest = list(common_genes)[:10]

fig, axes = plt.subplots(1, 10, figsize=(35, 5))
plt.subplots_adjust(wspace=0.3)

for i, gene in enumerate(genes_of_interest):
    sc.pl.spatial(
        mapped_adata,
        color=gene,
        ax=axes[i],
        title=gene,
        show=False,
        size=1.5,
        spot_size=100,
        cmap='viridis',
        frameon=False
    )

plt.tight_layout()
plt.show()

_images/Tutorial_6_DLPFC_interpolation_37_0.png

[20]:

fig, axes = plt.subplots(1, 10, figsize=(35, 5))
plt.subplots_adjust(wspace=0.3)

for i, gene in enumerate(genes_of_interest):
    sc.pl.spatial(
        adata1,
        color=gene,
        ax=axes[i],
        img_key=None,
        title=gene,
        show=False,
        size=1.5,
        spot_size=100,
        cmap='viridis',
        frameon=False
    )

plt.tight_layout()
plt.show()

_images/Tutorial_6_DLPFC_interpolation_38_0.png

[21]:

fig, axes = plt.subplots(1, 10, figsize=(35, 5))
plt.subplots_adjust(wspace=0.3)

for i, gene in enumerate(genes_of_interest):
    sc.pl.spatial(
        adata2,
        color=gene,
        img_key=None,
        ax=axes[i],
        show=False,
        size=1.5,
        spot_size=100,
        cmap='viridis',
        frameon=False
    )

plt.tight_layout()
plt.show()

_images/Tutorial_6_DLPFC_interpolation_39_0.png

[22]:

fig, axes = plt.subplots(1, 10, figsize=(35, 5))
plt.subplots_adjust(wspace=0.3)

for i, gene in enumerate(genes_of_interest):
    sc.pl.spatial(
        adata3,
        img_key=None,
        color=gene,
        ax=axes[i],
        title=gene,
        show=False,
        size=1.5,
        spot_size=100,
        cmap='viridis',
        frameon=False
    )

plt.tight_layout()
plt.show()

_images/Tutorial_6_DLPFC_interpolation_40_0.png

Visualize the PCP4 gene expression across datasets.

[23]:

sc.pl.spatial(adata1, img_key=None, color='PCP4', cmap='viridis', size=2, spot_size=50)
sc.pl.spatial(adata2, img_key=None, color='PCP4', cmap='viridis', size=2.5, spot_size=50)
sc.pl.spatial(adata3, img_key=None, color='PCP4', cmap='viridis', size=2.5, spot_size=50)
sc.pl.spatial(new_adata, img_key=None, color='PCP4', cmap='viridis', size=2.5, spot_size=50)
sc.pl.spatial(mapped_adata, img_key=None, color='PCP4', cmap='viridis', size=2.5, spot_size=50)

_images/Tutorial_6_DLPFC_interpolation_42_0.png

_images/Tutorial_6_DLPFC_interpolation_42_1.png

_images/Tutorial_6_DLPFC_interpolation_42_2.png

_images/Tutorial_6_DLPFC_interpolation_42_3.png

_images/Tutorial_6_DLPFC_interpolation_42_4.png

[24]:

from GenOT.plotting import compare_spatial_expression_all_genes
# Compare the spatial expression patterns between the mapped (imputed) data and the original target data.
# This function calculates and plots similarity metrics (e.g., SSIM) for all genes.
mean_ssim, scores_df = compare_spatial_expression_all_genes(
    adata1=mapped_adata,
    adata1_name="Mapped Data",
    adata2=adata3,
    adata2_name="Original Data",
    max_genes=None,
    output_csv=None,
    plot_file=None,
    figsize=(10, 6)
)

Starting comparison for 66 common genes...

Calculating gene SSIM scores: 100%|██████████| 66/66 [01:02<00:00,  1.05it/s]


==================================================
Analysis complete! Processed 66 genes
Mean SSIM: 0.9873 ± 0.0033
Median SSIM: 0.9871
==================================================

_images/Tutorial_6_DLPFC_interpolation_43_3.png

[25]:

from GenOT.plotting import calculate_gene_embedding_metrics, plot_gene_embedding_metrics_distributions
# Calculate various metrics to evaluate the quality of the gene embeddings derived from the imputation.
# These metrics assess how well the imputed gene expression preserves the original's gene-embedding relationships.
df = calculate_gene_embedding_metrics(adata3, mapped_adata, n_pca_components=16)
# Plot the distributions of these calculated gene embedding metrics.
plot_gene_embedding_metrics_distributions(df)

--- Starting gene embedding metric calculation ---
Original adata shape: (3673, 66)
Original mapped_adata shape: (3673, 66)
Transposed adata_g1 shape (genes x cells): (66, 3673)
Transposed adata_g2 shape (genes x cells): (66, 3673)
Found 66 common genes for analysis.
adata_g1 shape after common gene filtering: (66, 3673)
adata_g2 shape after common gene filtering: (66, 3673)

Performing PCA (16 dimensions) for both datasets...
Gene embedding dimensions for Dataset 1: (66, 16)
Gene embedding dimensions for Dataset 2: (66, 16)

Calculating similarity metrics: PCC, Cosine Similarity, RMSE, JS Divergence...

--- Metric calculation complete ---

--- Generating distribution plots for similarity metrics ---
Average PCC: 0.5518
Average Cosine Similarity: 0.5618
Average RMSE: 9.7619
Average JS Divergence: 0.4180

_images/Tutorial_6_DLPFC_interpolation_44_1.png


--- Visualization complete ---

[25]:

[25]:

[25]: