Digital Pathology | VALIANT
Advanced Lab for Immersive AI Translation (VALIANT)

PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit /valiant/2025/04/23/pyspatial-a-high-speed-whole-slide-image-pathomics-toolkit/ Wed, 23 Apr 2025 14:06:33 +0000 /valiant/?p=4174

Yang, Yuechen; Wang, Yu; Yao, Tianyuan; Deng, Ruining; Yin, Mengmeng; Zhao, Shilin; Yang, Haichun; Huo, Yuankai. “PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit.” IS&T International Symposium on Electronic Imaging Science and Technology 37, no. 12 (2025): HPCI-177.

Analyzing Whole Slide Images (WSIs) is an important part of modern digital pathology because it allows researchers to extract many features from tissue samples. However, traditional methods using tools like CellProfiler can be slow and involve several steps: dividing the WSI into smaller patches, extracting features from each patch, and then combining those features back into the full image.

To make this process faster and simpler, we developed PySpatial, a high-speed toolkit designed specifically for analyzing WSIs. PySpatial improves the usual process by working directly on selected regions of the image, which reduces unnecessary steps. It uses special techniques like rtree-based spatial indexing and matrix-based computation to quickly find and process these regions.
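The region queries that spatial indexing accelerates can be illustrated with a small sketch. PySpatial itself builds on the `rtree` package; the naive linear-scan index below is only a stand-in to show the query semantics, and the region names and coordinates are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """An annotated region on the slide, described by its bounding box."""
    name: str
    bbox: tuple  # (min_x, min_y, max_x, max_y) in slide coordinates

def intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def query(regions, window):
    """Return regions whose bounding boxes overlap the query window.

    An R-tree answers this kind of query in roughly O(log n); the linear
    scan here only illustrates what is being asked of the index.
    """
    return [r for r in regions if intersects(r.bbox, window)]

regions = [
    Region("glomerulus_1", (100, 100, 250, 260)),
    Region("artery_1", (400, 50, 520, 180)),
    Region("glomerulus_2", (900, 900, 1020, 1010)),
]
hits = query(regions, (0, 0, 300, 300))
print([r.name for r in hits])  # only glomerulus_1 falls inside the window
```

Operating directly on indexed regions like this is what lets the toolkit skip the patch-split-and-recombine round trip.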

We tested PySpatial on two datasets—one involving Perivascular Epithelioid Cell (PEC) tumors and another from the Kidney Precision Medicine Project (KPMP)—and found major improvements in performance. For small and scattered features in the PEC data, PySpatial was almost 10 times faster than CellProfiler. For larger structures like glomeruli and arteries in the KPMP data, it was twice as fast.

These results show that PySpatial can speed up large-scale WSI analysis while keeping accuracy high, making it a useful tool for advancing digital pathology.

]]>
HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis /valiant/2024/11/21/hats-hierarchical-adaptive-taxonomy-segmentation-for-panoramic-pathology-image-analysis/ Thu, 21 Nov 2024 16:48:50 +0000 /valiant/?p=3301

Deng, R.; Liu, Q.; Cui, C.; Yao, T.; Xiong, J.; Bao, S.; Li, H.; Yin, M.; Wang, Y.; Zhao, S.; Tang, Y.; Yang, H.; Huo, Y. “HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis.” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 15004 LNCS, 2024, pp. 155-166.

Segmenting large, detailed images of tissue samples in computational pathology is very challenging because tissues have complex structures at different scales. For example, in kidney pathology, you have larger regions like the cortex and medulla, as well as smaller structures like glomeruli, tubules, blood vessels, and various cell types.

To tackle this, we developed a new method called Hierarchical Adaptive Taxonomy Segmentation (HATs). HATs is designed to accurately identify and label these different kidney structures in large panoramic images by using a detailed understanding of kidney anatomy. The key features of HATs include:

  1. A spatial-relationship technique that uses the anatomy of 15 different tissue types to create a flexible model applicable across scales, from large regions down to individual cells.
  2. A simplified way of representing these tissue structures in a matrix format, making it easier to process the whole panoramic image.
  3. Integration of a new AI model (EfficientSAM) to help extract features from the images without needing manual inputs, making the method more adaptable and efficient.
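One way to picture the matrix representation of the taxonomy is as an ancestor-membership matrix over the class hierarchy. This is an illustrative sketch under assumptions, not the paper's actual formulation: the class subset, the `parent` map, and the encoding below are made up for demonstration.

```python
# Hypothetical subset of kidney classes, grouped as in the taxonomy:
# regions -> functional units -> cells.
classes = ["cortex", "medulla", "glomerulus", "tubule", "podocyte", "mesangial"]
parent = {"glomerulus": "cortex", "tubule": "cortex",
          "podocyte": "glomerulus", "mesangial": "glomerulus"}

def ancestor_matrix(classes, parent):
    """A[i][j] = 1 if class j is class i itself or one of its ancestors.

    Encoding the taxonomy this way lets hierarchy-aware reasoning be
    expressed as plain matrix operations over per-class predictions.
    """
    idx = {c: k for k, c in enumerate(classes)}
    A = [[0] * len(classes) for _ in classes]
    for c in classes:
        node = c
        while node is not None:
            A[idx[c]][idx[node]] = 1
            node = parent.get(node)
    return A

A = ancestor_matrix(classes, parent)
print(A[classes.index("podocyte")])  # podocyte -> glomerulus -> cortex
```

A row then tells you, for any cell-level prediction, which coarser classes it implies, which is the kind of cross-scale consistency the taxonomy tree is meant to enforce.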

Our tests showed that HATs effectively combines clinical knowledge and imaging techniques to accurately segment over 15 types of tissue structures. The code for HATs is available at .

Fig. 1.

Knowledge transformation from kidney anatomy to a hierarchical taxonomy tree. This figure demonstrates the transformation of intricate clinical anatomical relationships within the kidney into a hierarchical taxonomy tree. (a) Pathologists examine histopathology in accordance with kidney anatomy. (b) This study revisits kidney anatomy using a hierarchical semantic taxonomy for panoramic segmentation, covering 15 classes across regions, units, and cells. The tree incorporates spatial relationships into a semi-supervised learning paradigm and uses hierarchical scale information as prior knowledge to weigh the relationship between classes.

]]>
Consensus tissue domain detection in spatial omics data using multiplex image labeling with regional morphology (MILWRM) /valiant/2024/11/21/consensus-tissue-domain-detection-in-spatial-omics-data-using-multiplex-image-labeling-with-regional-morphology-milwrm/ Thu, 21 Nov 2024 16:46:39 +0000 /valiant/?p=3298

Kaur, H.; Heiser, C.N.; McKinley, E.T.; Ventura-Antunes, L.; Harris, C.R.; Roland, J.T.; Farrow, M.A.; Selden, H.J.; Pingry, E.L.; Moore, J.F.; Ehrlich, L.I.R.; Shrubsole, M.J.; Spraggins, J.M.; Coffey, R.J.; Lau, K.S.; Vandekar, S.N. “Consensus tissue domain detection in spatial omics data using multiplex image labeling with regional morphology (MILWRM).” Communications Biology, Volume 7, Issue 1, 2024, Article 1295.

New molecular imaging methods can capture detailed genetic and protein information directly from tissues, allowing scientists to study diseases while keeping the original structure of the tissue intact. By combining this molecular data with traditional tissue images, researchers can learn more about how different parts of tissues are affected by diseases. However, making sense of all this complex data, especially when comparing many samples, is challenging.

To help with this, we created MILWRM, a Python tool that can quickly find and label different areas within tissue samples. MILWRM analyzes images and groups similar parts of the tissue together, making it easier to identify specific regions.

We tested MILWRM on various tissue samples, including human colon polyps, lymph nodes, mouse kidneys, and mouse brain slices. The tool was able to distinguish different types of polyps and identify unique areas in the brain based on their molecular characteristics. MILWRM helps researchers understand the structure and molecular features of tissues, making it a valuable tool for studying diseases.

Fig. 1: The workflow of the MILWRM pipeline.

MILWRM begins by constructing a tissue labeler object from all the sample slides, which undergo data preprocessing, serialization, and subsampling to create a randomly subsampled dataset used for k-means model construction. This subsampled data is used to select an optimal number of tissue domains (k) via the adjusted inertia method. Finally, a k-means model is constructed, and each pixel is assigned a tissue domain (TD). Each TD has a distinct domain profile describing its molecular features. MILWRM also provides quality control metrics such as confidence scores (created with BioRender.com).
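The clustering step at the heart of the pipeline can be sketched with a minimal k-means over pixel feature vectors. This is a toy Lloyd's-algorithm implementation, not MILWRM's code, and the two-feature "pixels" below are fabricated; the real tool clusters multiplexed marker intensities and uses adjusted inertia to pick k.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: assign each pixel feature vector to one of k
    tissue-domain centroids (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: sum((p - q) ** 2
                                        for p, q in zip(pt, centroids[c])))
                  for pt in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return labels

# Two well-separated marker-intensity clusters stand in for tissue domains.
pixels = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.8), (0.95, 0.85)]
labels = kmeans(pixels, k=2)
print(labels)  # the two low-intensity pixels share one label, the two high-intensity pixels share the other
```

Each cluster's centroid then plays the role of a TD's domain profile: the typical molecular signature of pixels assigned to that domain.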

]]>
Identification and multimodal characterization of a specialized epithelial cell type associated with Crohn’s disease /valiant/2024/09/22/identification-and-multimodal-characterization-of-a-specialized-epithelial-cell-type-associated-with-crohns-disease/ Sun, 22 Sep 2024 15:41:06 +0000 /valiant/?p=3032
Li, Jia, Simmons, Alan J., Hawkins, Caroline V., Chiron, Sophie, Ramirez-Solano, Marisol A., Tasneem, Naila, Kaur, Harsimran, Xu, Yanwen, Revetta, Frank, Vega, Paige N., Bao, Shunxing, Cui, Can, Tyree, Regina N., Raber, Larry W., Conner, Anna N., Pilat, Jennifer M., Jacobse, Justin, McNamara, Kara M., Allaman, Margaret M., Raffa, Gabriella A., Gobert, Alain P., Asim, Mohammad, Goettel, Jeremy A., Choksi, Yash A., Beaulieu, Dawn B., Dalal, Robin L., Horst, Sara N., Pabla, Baldeep S., Huo, Yuankai, Landman, Bennett A., Roland, Joseph T., Scoville, Elizabeth A., Schwartz, David A., Washington, M. Kay, Shyr, Yu, Wilson, Keith T., Coburn, Lori A., Lau, Ken S., & Liu, Qi. (2024). Identification and multimodal characterization of a specialized epithelial cell type associated with Crohn’s disease. Nature Communications, 15(1), 7204.
This study investigates Crohn’s disease (CD), a chronic inflammatory condition affecting both the gastrointestinal system and other parts of the body due to immune system dysregulation. By analyzing over 202,000 cells from 170 tissue samples across 83 patients, the researchers identified a specific epithelial cell type, termed ‘LND,’ present in both the terminal ileum and ascending colon. These LND cells, which show high expression of genes related to antimicrobial response and immune regulation (such as LCN2, NOS2, and DUOX2), were found to be rare in individuals without inflammatory bowel disease (IBD) but significantly expanded in patients with active CD.

Further in-situ RNA and protein imaging confirmed the presence of LND cells, which interact closely with immune cells and express genes linked to CD susceptibility, suggesting their involvement in the disease’s immune dysfunction. Additionally, the study identified early and late subpopulations of LND cells, each with distinct developmental trajectories. Interestingly, patients with a higher ratio of late-to-early LND cells were more likely to respond positively to anti-TNF treatment, a common therapy for CD. These findings highlight a potentially pathogenic role for LND cells in CD and provide new insights into disease mechanisms and treatment responses.

Single-cell landscape in Crohn’s disease and non-IBD controls. A Schematic for processing endoscopic and surgical samples from TI and AC for non-IBD controls, inactive and active CD patients. B Summary of the number of samples in each group. C UMAP of 155,093 cells from endoscopy samples colored by cell clusters. D Dotplot showing markers for each cell type. E UMAP of 155,093 cells colored by tissue origin, TI (brown) or AC (blue). F Proportion of each cell cluster in TI (brown) and AC (blue) samples. G UMAP of 155,093 cells colored by disease status: controls (tan), inactive (green), or active CD (purple). H MDS plot of cell compositional differences across all endoscopy specimens.
]]>
Mitigating Over-Saturated Fluorescence Images Through a Semi-Supervised Generative Adversarial Network /valiant/2024/09/22/mitigating-over-saturated-fluorescence-images-through-a-semi-supervised-generative-adversarial-network/ Sun, 22 Sep 2024 03:58:25 +0000 /valiant/?p=2994 Bao, Shunxing, Guo, Junlin, Lee, Ho Hin, Deng, Ruining, Cui, Can, Remedios, Lucas W., Liu, Quan, Yang, Qi, Xu, Kaiwen, Yu, Xin, Li, Jia, & Li, Yike. (2024). Mitigating over-saturated fluorescence images through a semi-supervised generative adversarial network. In Proceedings of the 21st IEEE International Symposium on Biomedical Imaging (ISBI 2024), Athens, Greece, May 27-30, 2024. https://doi.org/10.1109/ISBI56570.2024.10635687

This study addresses a key challenge in multiplex immunofluorescence (MxIF) imaging, a technique used in biomedical research to provide detailed insights into cell structures and spatial organization. While MxIF imaging, such as using DAPI staining to identify cell nuclei and CD20 staining for cell membranes, is invaluable for understanding cell composition, it suffers from saturation artifacts. These artifacts occur when certain areas of the image become overly bright, making it difficult to analyze individual cells accurately. Existing methods for correcting these saturation issues, like gamma correction, often fall short because they assume uniform saturation, which is rarely the case in practice.
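The limitation of uniform corrections can be shown in a few lines. The sketch below applies the standard gamma transform to normalized intensities in [0, 1]; the numbers are fabricated, and the point is only that a single global curve cannot recover detail that was clipped at the sensor maximum.

```python
def gamma_correct(pixels, gamma):
    """Global gamma correction: every pixel goes through the same curve,
    so the transform cannot adapt to locally varying saturation."""
    return [p ** (1.0 / gamma) for p in pixels]

# Two regions with normalized intensities in [0, 1]: one properly
# exposed, one clipped (saturated) at the maximum value.
normal_region = [0.2, 0.4, 0.6]
saturated_region = [1.0, 1.0, 1.0]

print(gamma_correct(normal_region, gamma=2.2))     # mid-tones are brightened
print(gamma_correct(saturated_region, gamma=2.2))  # stays pinned at 1.0
# Once intensity detail is clipped, a global curve cannot restore it,
# which motivates a learned, spatially aware correction instead.
```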

The authors propose a novel solution using a hybrid generative adversarial network (GAN) called HD-mixGAN, which combines two different types of neural networks (CycleGAN and Pix2pixHD) to correct saturation artifacts. This approach takes advantage of both small datasets where paired (before and after) images are available and larger datasets that only have unpaired images of over-saturated regions. By generating synthetic data from the unpaired datasets using a CycleGAN and combining it with real data, the model effectively learns to correct saturation artifacts, improving the overall image quality.

The method was tested in a task to detect cell nuclei, where it significantly outperformed traditional methods, improving the accuracy (F1 score) by 6%. This approach represents the first focused effort to address saturation issues in multi-round MxIF imaging, providing a data-driven solution that enhances the accuracy of single-cell analysis. The study also makes its code and implementation freely available, facilitating further research and applications in this area.

]]>
Deep Learning-Based Open Source Toolkit for Eosinophil Detection in Pediatric Eosinophilic Esophagitis /valiant/2024/06/20/deep-learning-based-open-source-toolkit-for-eosinophil-detection-in-pediatric-eosinophilic-esophagitis/ Thu, 20 Jun 2024 17:14:02 +0000 /valiant/?p=2581 Juming Xiong, Yilin Liu, Ruining Deng, Regina N. Tyree, Hernan Correa, Girish Hiremath, Yaohong Wang, and Yuankai Huo. “Deep Learning-Based Open Source Toolkit for Eosinophil Detection in Pediatric Eosinophilic Esophagitis.” Proceedings of SPIE Medical Imaging 2024: Digital and Computational Pathology, vol. 12933, 129330X, 2024, San Diego, California.

Eosinophilic Esophagitis (EoE) is a chronic, immune/antigen-mediated esophageal disease characterized by symptoms related to esophageal dysfunction and histological evidence of eosinophil-dominant inflammation. Due to the complex microscopic representation of EoE in imaging, current manual identification methods are labor-intensive and prone to inaccuracies.

This study introduces an open-source toolkit, named Open-EoE, designed for end-to-end whole slide image (WSI) level eosinophil (Eos) detection with a single command via Docker. The toolkit supports three state-of-the-art deep learning-based object detection models and optimizes performance through an ensemble learning strategy, enhancing precision and reliability.

Experimental results demonstrate that Open-EoE can efficiently detect Eos on a testing set of 289 WSIs. At the widely accepted diagnostic threshold of ≥15 Eos per high power field (HPF) for EoE, Open-EoE achieved an accuracy of 91%, showing good consistency with pathologist evaluations. This suggests a promising avenue for integrating machine learning methodologies into the diagnostic process for EoE.
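The slide-level decision rule behind the ≥15 Eos/HPF threshold is simple to state in code. The function below is an illustrative sketch of how per-HPF counts are aggregated into a diagnosis, not the toolkit's actual implementation, and the count lists are fabricated.

```python
def eoe_positive(eos_counts_per_hpf, threshold=15):
    """Standard histological criterion for EoE: positive when the peak
    eosinophil count in any high power field reaches the threshold."""
    return max(eos_counts_per_hpf) >= threshold

print(eoe_positive([3, 7, 22, 9]))   # peak of 22 Eos/HPF meets the >=15 criterion
print(eoe_positive([2, 5, 11, 14]))  # peak of 14 falls short
```

Because the call depends only on the peak count, small per-field detection errors matter most near the threshold, which is why the ensemble's precision is emphasized.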

[Pipeline diagram: original WSI (input) → sliding window → object detection → ensemble → aggregation → maximum Eos count / HPF, with detected Eos overlaid on the WSI (output).]

Figure 1: Overview of the Open-EoE toolkit. The inputs are original WSIs at 40× magnification, while the outputs are the maximum Eos count and the detected Eos bounding boxes, which can overlay on the original WSIs.
]]>
Leverage Weakly Annotation to Pixel-wise Annotation via Zero-shot Segment Anything Model for Molecular-empowered Learning /valiant/2024/06/20/leverage-weakly-annotation-to-pixel-wise-annotation-via-zero-shot-segment-anything-model-for-molecular-empowered-learning/ Thu, 20 Jun 2024 16:05:10 +0000 /valiant/?p=2576 Xueyuan Li, Ruining Deng, Yucheng Tang, Shunxing Bao, Haichun Yang, and Yuankai Huo. “Leverage Weakly Annotation to Pixel-wise Annotation via Zero-shot Segment Anything Model for Molecular-empowered Learning.” Proceedings of SPIE Medical Imaging 2024: Digital and Computational Pathology, vol. 12933, 129330K, 2024, San Diego, California.

Precise identification of multiple cell classes in high-resolution Giga-pixel whole slide imaging (WSI) is essential for various clinical applications. Building an AI model for this purpose usually requires pixel-level annotations, which are time-consuming, require domain expertise (e.g., pathologists), and are prone to errors, particularly when differentiating intricate cell types (e.g., podocytes and mesangial cells) through visual inspection alone.

A recent study found that lay annotators could sometimes outperform domain experts in labeling tasks when provided with additional immunofluorescence (IF) images for reference, a method termed molecular-empowered learning. However, the manual delineation required for these annotations remains a resource-intensive task.

This paper explores bypassing pixel-level delineation by using the recent segment anything model (SAM) with weak box annotations in a zero-shot learning approach. SAM’s capability to generate pixel-level annotations from box annotations is leveraged to train a segmentation model. The findings indicate that SAM-assisted molecular-empowered learning (SAM-L) reduces the annotation effort required from lay annotators by necessitating only weak box annotations, without compromising annotation accuracy or the performance of the deep learning-based segmentation.

This research represents a significant advancement in making the annotation process for training pathological image segmentation more accessible, relying solely on non-expert annotators.

Figure 1: Major idea of this work. The top-left panel illustrates the conventional annotation process employing only PAS images for pathological segmentation. In contrast, the top-right panel shows molecular-informed annotation utilizing both PAS and IF images, yielding superior annotation quality by lay annotators compared to the top-left process. The bottom panel demonstrates the proposed SAM-L annotation method, which leverages box annotations to accomplish pixel-level segmentation results, using these boxes as prompts for zero-shot SAM segmentation. This holistic approach enhances annotation quality and paves the way for more precise and resilient segmentation models.
]]>
Cell Spatial Analysis in Crohn’s Disease: Unveiling Local Cell Arrangement Pattern with Graph-based Signatures /valiant/2024/06/20/cell-spatial-analysis-in-crohns-disease-unveiling-local-cell-arrangement-pattern-with-graph-based-signatures/ Thu, 20 Jun 2024 15:08:06 +0000 /valiant/?p=2551 Shunxing Bao, Sichen Zhu, Vasantha L. Kolachala, Lucas W. Remedios, Yeonjoo Hwang, Yutong Sun, Ruining Deng, Can Cui, Rendong Zhang, Yike Li, Jia Li, Joseph T. Roland, Qi Liu, Ken S. Lau, Subra Kugathasan, Peng Qiu, Keith T. Wilson, Lori A. Coburn, Bennett A. Landman, and Yuankai Huo. “Cell Spatial Analysis in Crohn’s Disease: Unveiling Local Cell Arrangement Pattern with Graph-based Signatures.” Proceedings of SPIE Medical Imaging 2024: Digital and Computational Pathology, vol. 12933, 1293314, 2024, San Diego, California.

Crohn’s disease (CD) is a chronic, relapsing inflammatory condition that affects various segments of the gastrointestinal tract. The activity of CD is determined through histological findings, particularly by examining the density of neutrophils in Hematoxylin and Eosin (H&E) stained images. However, a deeper understanding of morphometry and local cell arrangements beyond simple cell counting and tissue morphology is needed.

To address this, researchers characterized six distinct cell types from H&E images and developed a novel method to analyze the local spatial signature of each cell. This method involves creating a 10-cell neighborhood matrix, which represents the arrangement of neighboring cells around each individual cell. By utilizing t-SNE for non-linear spatial projection, the study presented these arrangements in scatter-plot and Kernel Density Estimation contour-plot formats. The analysis examined patterns of differences in the cellular environment, focusing on the odds ratio of spatial patterns between active CD and control groups, using data collected from two research institutes.
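The neighborhood matrix described above can be sketched directly: for each cell, count the cell types among its k nearest neighbors. This is an illustrative reimplementation of the idea, not the study's code; the cell types and coordinates below are fabricated, and the real analysis uses k = 10 neighbors before projecting the rows with t-SNE.

```python
from math import dist
from collections import Counter

def neighborhood_signature(cells, k=10):
    """For each (coordinates, cell_type) pair, count the cell types among
    its k nearest neighbors. Each resulting Counter is one cell's local
    spatial signature (one row of the neighborhood matrix)."""
    signatures = []
    for i, (xy_i, _) in enumerate(cells):
        neighbors = sorted((j for j in range(len(cells)) if j != i),
                           key=lambda j: dist(xy_i, cells[j][0]))[:k]
        signatures.append(Counter(cells[j][1] for j in neighbors))
    return signatures

cells = [((0, 0), "neutrophil"), ((1, 0), "neutrophil"),
         ((0, 1), "lymphocyte"), ((9, 9), "epithelial")]
sigs = neighborhood_signature(cells, k=2)
print(sigs[0])  # composition of the two nearest neighbors of cell 0
```

Stacking these per-cell signatures gives the matrix whose rows are embedded and compared between active CD and control groups.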

The findings revealed heterogeneous nearest-neighbor patterns, indicating distinct tendencies of cell clustering, especially in the rectum region. These variations underscore the influence of data heterogeneity on cell spatial arrangements in CD patients. Additionally, the observed spatial distribution disparities between the two research sites highlight the importance of collaborative efforts among healthcare organizations to ensure comprehensive analysis.

All tools used in the research analysis pipeline are available at .

Figure 1: We present four patches collected from two sample slides, where one slide is CD normal and the other is diagnosed as CD active. Neutrophils can be found in both types of tissue. Counting the density of neutrophils is one of the pivotal biomarkers used to identify CD activity. Additionally, pathologists have the capability to zoom in and observe the morphology of individual nuclei or clusters. Our objective is to explore and quantify a more comprehensive morphometric pattern within the localized cellular arrangement.
]]>
High-performance Data Management for Whole Slide Image Analysis in Digital Pathology /valiant/2024/06/20/high-performance-data-management-for-whole-slide-image-analysis-in-digital-pathology/ Thu, 20 Jun 2024 15:05:06 +0000 /valiant/?p=2548 Haoju Leng, Ruining Deng, Shunxing Bao, Dazheng Fang, Bryan A. Millis, Yucheng Tang, Haichun Yang, Xiao Wang, Yifan Peng, Lipeng Wan, and Yuankai Huo. “High-performance Data Management for Whole Slide Image Analysis in Digital Pathology.” Proceedings of SPIE Medical Imaging 2024: Digital and Computational Pathology, vol. 12933, 129330Y, 2024, San Diego, California.

When dealing with giga-pixel digital pathology in whole-slide imaging (WSI), a significant portion of data records is relevant during each analysis operation. The computational bottleneck often lies in the input-output (I/O) system, especially during patch-level processing, which imposes a substantial I/O load on the computer system. However, this data management process can be further optimized by parallelizing it, as patch-level image processes are typically independent across different patches.
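The independence of patch-level work is what makes parallelization pay off. The sketch below illustrates that property with a standard-library thread pool; it does not show the ADIOS2 API, and `process_patch` with its toy patches is a hypothetical stand-in for a real per-patch analysis step.

```python
from concurrent.futures import ThreadPoolExecutor

def process_patch(patch):
    """Stand-in for a per-patch analysis step (e.g., feature extraction).
    Each patch touches only its own pixels, so reads and processing can
    proceed concurrently without coordination between patches."""
    return sum(patch) / len(patch)  # toy statistic: mean patch intensity

patches = [[10, 20, 30], [40, 50, 60], [5, 5, 5]]
with ThreadPoolExecutor(max_workers=4) as pool:
    means = list(pool.map(process_patch, patches))
print(means)  # [20.0, 50.0, 5.0] -- map preserves patch order
```

In the real pipeline the expensive part is the I/O behind each patch read, which is exactly where a data-management layer like ADIOS2 intervenes.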

This paper discusses efforts to address this data access challenge by implementing the Adaptable IO System version 2 (ADIOS2). The focus is on constructing and releasing a digital pathology-centric pipeline using ADIOS2, which streamlines data management across WSIs. Additionally, strategies have been developed to reduce data retrieval times.

The performance evaluation covers two key scenarios: (1) a CPU-based image analysis scenario (“CPU scenario”) and (2) a GPU-based deep learning framework scenario (“GPU scenario”). The findings reveal significant outcomes. In the CPU scenario, ADIOS2 achieves an impressive two-fold speed-up compared to the brute-force approach. In the GPU scenario, its performance is on par with the state-of-the-art GPU I/O acceleration framework, NVIDIA Magnum IO GPU Direct Storage (GDS).

This study represents one of the initial instances of utilizing ADIOS2 in the field of digital pathology. The source code for this implementation has been made publicly available at .

Figure 1. The comparison between the current prevalent I/O methods and the ADIOS2 framework approach in digital pathology image analysis pipelines.
]]>
Nucleus subtype classification using inter-modality learning /valiant/2024/06/20/nucleus-subtype-classification-using-inter-modality-learning/ Thu, 20 Jun 2024 15:02:04 +0000 /valiant/?p=2545 Lucas W. Remedios, Shunxing Bao, Samuel W. Remedios, Ho Hin Lee, Leon Cai, Thomas Li, Ruining Deng, Can Cui, Jia Li, Qi Liu, Ken S. Lau, Joseph T. Roland, Mary K. Washington, Lori A. Coburn, Keith T. Wilson, Yuankai Huo, and Bennett A. Landman. “Nucleus subtype classification using inter-modality learning.” Proceedings of SPIE Medical Imaging 2024: Digital and Computational Pathology, vol. 12933, 129330F, 2024, San Diego, California.

Understanding how cells communicate, co-locate, and interrelate is essential for grasping human physiology. Hematoxylin and eosin (H&E) staining is widely used in clinical studies and research. The Colon Nucleus Identification and Classification (CoNIC) Challenge recently advanced artificial intelligence to label six cell types on H&E stained colon tissues. However, this only covers a small fraction of potential cell classifications, missing various epithelial (progenitor, endocrine, goblet), lymphocyte (B cells, helper T cells, cytotoxic T cells), and connective tissue (fibroblasts, stromal) subtypes.

To address this limitation, the study proposes using inter-modality learning to label previously unclassifiable cell types on virtual H&E images. The researchers utilized multiplexed immunofluorescence (MxIF) histology imaging to identify 14 cell type subclasses. By performing style transfer, they synthesized virtual H&E images from MxIF and transferred the detailed labels from MxIF to these virtual H&E images. They then assessed the effectiveness of this learning approach.

The results demonstrated that helper T cells and progenitor cells could be identified with positive predictive values of 0.34±0.15 (prevalence 0.03±0.01) and 0.47±0.1 (prevalence 0.07±0.02) respectively on virtual H&E images. This method represents a promising step toward automating cell type annotation in digital pathology, significantly enhancing the capability to classify diverse cell types beyond current limitations.
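To read those numbers, recall how positive predictive value is computed. The function below is the standard definition, and the counts in the example are illustrative only (chosen to reproduce the reported 0.47, not taken from the paper).

```python
def positive_predictive_value(tp, fp):
    """PPV = TP / (TP + FP): of all cells the model labeled as a given
    type, the fraction that truly are that type."""
    return tp / (tp + fp)

# Illustrative counts: a PPV of 0.47 for progenitor cells at prevalence
# 0.07 means roughly half of predicted progenitors were correct --
# well above the ~7% a chance-level labeler would achieve.
print(round(positive_predictive_value(tp=47, fp=53), 2))  # 0.47
```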

Figure 1. We leveraged inter-modality learning to investigate identification of cells on virtual H&E staining that are traditionally viewed with specialized staining. The realistic quality of our virtual H&E holds at multiple scales (top row). Representative nuclei from each of our 14 classes in both virtual H&E and MxIF illustrate intensity and morphological variation across cell types (lower section). Green is used to denote the MxIF stain of interest, which is a different stain for each of the 14 cell types in this figure. While the signal to identify these classes of nuclei is present in MxIF, the nucleus classes are more difficult to distinguish on virtual H&E.
]]>