Questions about the processing of Xenium samples

Hi authors,

Thanks for your incredible contributions and efforts in this amazing work.

After looking into the code and the Xenium samples downloaded from [HF](https://huggingface.co/datasets/MahmoodLab/hest), I found the following issues:
- **About spot size in pooling**: There are 65 Xenium samples in HEST. They are converted into pseudo-Visium ST data with `pool_transcripts_xenium(transcript_df, dict['pixel_size_um_estimated'], key_x='he_x', key_y='he_y', spot_size_um=spot_size_um)`, producing the ST file `st/{sample_id}.h5ad`. For each ST file, I calculated `pixel_distance_between_two_adjacent_spots * meta_info['pixel_size_um_estimated']` from the spatial coordinates stored in the file. The value is `100um` for most Xenium samples, but for two samples, `TENX122` and `NCBI784`, it differs significantly from `100um`. Concretely, for `NCBI784`, `pixel_distance_between_two_adjacent_spots = 274.784593593837` (derived from `st/{sample_id}.h5ad`) and `meta_info['pixel_size_um_estimated'] = 0.2125` (from `HEST_v1_1_0.csv`), which implies a `spot_size_um` of about `58.39`. So I would like to ask: is `spot_size_um` set to `100um` for all Xenium samples when pooling `transcripts/{sample_id}_transcripts.parquet`? And is there a particular reason for choosing `spot_size_um=100` for Xenium samples?
- **Inclusion criteria for Xenium samples in benchmark datasets**: For example, `task 1` (`oncotree=IDC`) includes 4 samples, *i.e.*, `TENX95`, `TENX99`, `NCBI783`, and `NCBI785`. However, `HEST_v1_1_0.csv` lists 7 Xenium samples: `TENX99`, `TENX98`, `TENX97`, `TENX95`, `NCBI785`, `NCBI784`, `NCBI783`. Three of them (`TENX98`, `TENX97`, and `NCBI784`) are therefore excluded from the benchmark dataset. The same happens for `task 4` (`oncotree=SKCM`). Could you clarify the inclusion criteria for Xenium samples in the benchmark datasets?
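
For reference, here is a minimal sketch of the spacing check described in the first point. It is not HEST code: the helper name `estimate_spot_spacing_um` is hypothetical, and it assumes the spot coordinates are available as an `(n, 2)` array of pixel positions (for an `.h5ad` file I would expect these under `adata.obsm['spatial']`, but that location is an assumption). The example runs on a synthetic grid rather than on a real sample.

```python
import numpy as np


def estimate_spot_spacing_um(coords_px, pixel_size_um):
    """Median nearest-neighbour distance between spots, converted to micrometres.

    coords_px: (n, 2) array of spot centres in pixels (hypothetical input;
               e.g. taken from the ST file's spatial coordinates).
    pixel_size_um: micrometres per pixel, e.g. `pixel_size_um_estimated`.
    """
    coords = np.asarray(coords_px, dtype=float)
    # Brute-force pairwise distances: O(n^2) memory, fine for a small sketch.
    # For a full sample, subsample the spots or use a KD-tree instead.
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # ignore each spot's distance to itself
    nearest = dists.min(axis=1)
    return float(np.median(nearest)) * pixel_size_um


# Synthetic 5x5 grid whose pitch corresponds to 100 um at 0.2125 um/px.
pixel_size_um = 0.2125
pitch_px = 100.0 / pixel_size_um
grid = np.array([(i * pitch_px, j * pitch_px) for i in range(5) for j in range(5)])
spacing_um = estimate_spot_spacing_um(grid, pixel_size_um)

# The NCBI784 numbers quoted above, multiplied directly.
ncbi784_spacing_um = 274.784593593837 * 0.2125
```

On the synthetic grid the helper recovers the 100 um pitch, while the `NCBI784` figures multiply out to roughly 58.39 um, which is the discrepancy I am asking about.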

Looking forward to hearing from you. Thanks!

Best,
Pei

Questions about the processing of Xenium samples #122