Deep Learning | VALIANT (Advanced Lab for Immersive AI Translation)

MedPTQ: a practical pipeline for real post-training quantization in 3D medical image segmentation
/valiant/2026/03/26/medptq-a-practical-pipeline-for-real-post-training-quantization-in-3d-medical-image-segmentation/ (Thu, 26 Mar 2026)
Chongyu Qu; Ritchie Zhao; Ye Yu; Bin Liu; Tianyuan Yao; Junchao Zhu; Bennett A. Landman; Yucheng Tang; Yuankai Huo (2026). Journal of Medical Imaging, 13(1), 014006.

This study focuses on making advanced deep learning models for medical imaging more efficient and practical to use, especially in settings with limited computing power. One common approach is quantization, which reduces the numerical precision (or bit-width) of a model’s calculations—for example, using 8-bit numbers instead of standard 32-bit ones—to shrink model size and speed up processing. However, many previous methods only simulate this lower precision without delivering real-world speedups. To address this gap, the researchers developed MedPTQ, an open-source pipeline that enables true 8-bit (INT8) quantization for complex 3D medical imaging models, such as U-Net and transformer-based architectures. Their method works in two stages: first, it uses a tool called TensorRT to simulate lower-precision computations using sample data, and then it converts this into real low-precision execution on GPUs (graphics processing units), which are commonly used for high-performance computing.
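To make the core idea concrete, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization, where a calibration set fixes the quantization scale. This illustrates the general technique, not the MedPTQ/TensorRT implementation; the function names and toy calibration data are our own.

```python
import numpy as np

def quantize_int8(w, calib):
    """Symmetric per-tensor INT8 quantization.

    The scale comes from the max absolute value seen in `calib`,
    mimicking how a calibration dataset fixes quantization ranges.
    """
    scale = np.abs(calib).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to approximate FP32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)   # toy FP32 weights
q, s = quantize_int8(w, w)                     # calibrate on the weights themselves
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()                  # bounded by half a quantization step
```

Real INT8 deployment stores `q` (4x smaller than FP32) and runs integer arithmetic on hardware; the rounding error per value is at most half of `s`.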

The results show that MedPTQ can significantly reduce model size (by up to nearly four times) and speed up processing (by almost three times) while maintaining almost the same accuracy as full-precision models, as measured by the Dice similarity coefficient—a standard metric for evaluating how well predicted image segments match the true regions. Importantly, the approach was tested across multiple types of models and datasets, including scans of the brain, abdomen, and entire body from CT and MRI imaging, demonstrating strong flexibility and reliability. Overall, this work shows that real, not just simulated, low-precision AI models can be effectively deployed in medical imaging, making them more accessible and efficient without sacrificing performance.

Fig. 1

We introduce MedPTQ, an open-source pipeline for real post-training quantization that converts FP32 PyTorch models into INT8 TensorRT engines. By leveraging TensorRT for real INT8 deployment, MedPTQ reduces model size and inference latency while preserving segmentation accuracy for efficient GPU deployment.

UNISELF: A unified network with instance normalization and self-ensembled lesion fusion for multiple sclerosis lesion segmentation
/valiant/2026/02/25/uniself-a-unified-network-with-instance-normalization-and-self-ensembled-lesion-fusion-for-multiple-sclerosis-lesion-segmentation/ (Wed, 25 Feb 2026)
Zhang, Jinwei; Zuo, Lianrui; Dewey, Blake E.; Remedios, Samuel W.; Liu, Yihao; Hays, Savannah P.; Pham, Dzung L.; Mowry, Ellen M.; Newsome, Scott Douglas; Calabresi, Peter Arthur; Saidha, Shiv; Carass, Aaron; & Prince, Jerry L. (2026). Medical Image Analysis, 109, 103954.

Multiple sclerosis (MS) causes lesions, or areas of damage, in the brain that can be seen on multicontrast magnetic resonance (MR) images. Automatically segmenting, or outlining, these lesions using deep learning (DL) can improve speed and consistency compared to manual tracing by experts. Although many DL methods perform well on data similar to what they were trained on, they often struggle when tested on new datasets from different hospitals or scanners, a problem known as poor out-of-domain generalization.

To address this issue, the researchers developed a new method called UNISELF. The goal of UNISELF is to achieve high segmentation accuracy within the original training domain while also performing well on data from different sources. UNISELF introduces a test-time self-ensembled lesion fusion strategy, which combines multiple predictions at test time to improve accuracy. It also uses test-time instance normalization (TTIN) of latent features, meaning it adjusts internal feature representations during testing to better handle domain shifts and missing input contrasts, such as when certain MR image types are unavailable.
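As a rough illustration of why test-time instance normalization helps with domain shift, the sketch below (plain NumPy, not the UNISELF code) normalizes each channel of a single feature map using the test instance's own statistics, so a global intensity shift and rescaling cancel out.

```python
import numpy as np

def instance_norm(feat, eps=1e-5):
    """Normalize each channel of one instance to zero mean, unit variance.

    feat: (C, H, W) latent feature map. Using the test instance's own
    statistics (rather than stored training statistics) absorbs global
    scanner/protocol shifts in intensity.
    """
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True)
    return (feat - mean) / (std + eps)

rng = np.random.default_rng(1)
feat = rng.normal(size=(4, 8, 8))
# Simulated domain shift: same features, rescaled and shifted intensities.
shifted = 3.0 * feat + 10.0
out_a, out_b = instance_norm(feat), instance_norm(shifted)
```

After normalization, `out_a` and `out_b` are nearly identical, which is the sense in which instance statistics make downstream layers robust to this kind of shift.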

The model was trained using data from the ISBI 2015 longitudinal MS segmentation challenge. On the official test dataset, UNISELF ranked among the top-performing methods. Importantly, when evaluated on out-of-domain datasets with different scanners, imaging protocols, and missing contrasts—including the MICCAI 2016 dataset, the UMCL dataset, and a private multisite dataset—UNISELF outperformed other benchmark models trained on the same ISBI data. These results suggest that UNISELF is both accurate and robust to real-world variations in MR imaging, making it a promising tool for automated MS lesion segmentation across diverse clinical settings.

Fig. 1. An illustration of the spatial augmentation, network input, and network output during training in UNISELF.

Multipath cycleGAN for harmonization of paired and unpaired low-dose lung computed tomography reconstruction kernels
/valiant/2025/11/23/multipath-cyclegan-for-harmonization-of-paired-and-unpaired-low-dose-lung-computed-tomography-reconstruction-kernels/ (Sun, 23 Nov 2025)
Krishnan, Aravind R.; Li, Thomas Z.; Remedios, Lucas W.; Kim, Michael E.; Gao, Chenyu; Rudravaram, Gaurav; McMaster, Elyssa M.; Saunders, Adam M.; Bao, Shunxing; Xu, Kaiwen; Zuo, Lianrui; Sandler, Kim Lori; Maldonado, Fabien; Huo, Yuankai; & Landman, Bennett Allan. (2025). Medical Physics, 52(11), e70120.

CT scans can look noticeably different depending on the reconstruction kernel used to process the images. These kernels change how sharp or noisy an image appears, which can lead to big differences in important measurements—such as how much emphysema is present in the lungs. While it’s fairly easy to make images consistent when they come from the same type of scanner, this becomes much harder in studies that collect scans from many hospitals and manufacturers. Because each manufacturer uses different kernels, the measurements can become inconsistent, making it difficult to compare results. To fix this, we need a way to standardize all CT images so they look as if they were created using the same reference kernel.

In this study, we tested whether we could train a computer model to do this standardization using both paired data (scans from the same person processed with two different kernels from one manufacturer) and unpaired data (scans from different people and different manufacturers). Our goal was to use both types of data to create a shared representation of the images that allows for consistent comparisons across all scanners.

We created a deep learning model called a multipath cycleGAN, which can learn how to “translate” CT images from one kernel style to another. It uses a shared latent space (a common internal representation), along with several encoder–decoder pathways and discriminators that help the model learn from both paired and unpaired examples. We trained the model using CT scans from seven common reconstruction kernels from the National Lung Screening Trial, giving us 42 possible kernel combinations to harmonize.
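The cycle-consistency constraint at the heart of any cycleGAN can be sketched in a few lines. The "generators" below are toy linear maps standing in for the learned encoder–decoder pathways; only the loss structure reflects the actual method.

```python
import numpy as np

# Toy stand-ins for the kernel-translation generators: hard -> soft
# "smooths" intensities, soft -> hard is its approximate inverse.
def g_hard_to_soft(img, k=2.0):
    return img / k

def g_soft_to_hard(img, k=2.0):
    return img * k

def cycle_loss(img, fwd, bwd):
    """L1 cycle-consistency: translating A -> B -> A should recover A."""
    return np.abs(bwd(fwd(img)) - img).mean()

rng = np.random.default_rng(2)
ct_slice = rng.normal(size=(16, 16))  # fabricated image patch
loss = cycle_loss(ct_slice, g_hard_to_soft, g_soft_to_hard)
```

In training, this loss is minimized alongside adversarial losses from the discriminators, which is what lets unpaired scans (where no pixel-wise target exists) still constrain the translation.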

We then tested the model using hundreds of additional scans. For paired kernels, we looked at whether the model reduced differences in percent emphysema, and it did—performing better than comparison methods in several cases. For unpaired kernels, we converted all scans to look like they were processed with a reference soft kernel or a reference hard kernel and again measured emphysema levels. Our model reduced differences in many kernel types and performed similarly or better than existing approaches. We also checked whether harmonization preserved important anatomical structures such as lung vessels, muscle, and fat, and found that our method generally maintained these details.

Overall, our results show that combining paired and unpaired data in a shared latent space multipath cycleGAN can reduce errors in emphysema measurement and keep anatomical structures consistent. This approach offers a promising way to make CT scans from different scanners and reconstruction kernels more comparable, which is important for large research studies and long-term patient monitoring.

FIGURE 1

Reconstruction kernels influence the noise and resolution of the underlying anatomical structure in a computed tomography image. (a) Paired reconstruction kernels obtained from a given vendor exhibit a one-to-one pixel correspondence between the scans, which enables kernel harmonization. However, (b) across vendors, unpaired kernels show differences in anatomy, scan protocol, field of view, and reconstruction window. This creates additional difficulties that make harmonization a more challenging task.

DeepAndes: A Self-Supervised Vision Foundation Model for Multispectral Remote Sensing Imagery of the Andes
/valiant/2025/11/23/deepandes-a-self-supervised-vision-foundation-model-for-multispectral-remote-sensing-imagery-of-the-andes/ (Sun, 23 Nov 2025)
Guo, Junlin; Zimmer-Dauphinee, James R.; Nieusma, Jordan M.; Lu, Siqi; Liu, Quan; Deng, Ruining; Cui, Can; Yue, Jialin; Lin, Yizhe; Yao, Tianyuan; Xiong, Juming; Zhu, Junchao; Qu, Chongyu; Yang, Yuechen; Wilkes, Mitchell; Wang, Xiao; VanValkenburgh, Parker; Wernke, Steven A.; & Huo, Yuankai. (2025). IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 18, 26983-26999.

Archaeologists often use remote sensing, which involves studying landscapes through satellite imagery, to understand how past societies grew, interacted, and adapted over long periods of time. These large-scale surveys can reveal patterns that ground-based fieldwork alone cannot. Their power increases even more when combined with deep learning and computer vision, which help detect archaeological features automatically. However, traditional supervised deep learning methods struggle because they require huge amounts of detailed annotations, which are difficult and time-consuming to create for subtle archaeological features.

At the same time, new vision foundation models—large, general-purpose computer vision systems—have shown impressive performance using minimal annotations. But most of these models are designed for standard RGB images, not the multispectral satellite imagery (including eight different spectral bands) that archaeologists rely on for detecting subtle, buried, or eroded features.

To address this gap, the researchers created DeepAndes, a transformer-based vision foundation model specifically built for Andean archaeology. It was trained on three million multispectral satellite images and uses a customized version of the DINOv2 self-supervised learning algorithm, adapted to handle eight-band data. This makes DeepAndes the first foundation model tailored to the Andean region and its archaeological detection challenges.
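One practical question in adapting an RGB-pretrained backbone such as DINOv2 to eight-band imagery is how to initialize the patch-embedding layer. The NumPy sketch below shows one common recipe (tile the RGB kernels across the new bands and rescale); this is an assumption for illustration, not necessarily how DeepAndes initializes its weights.

```python
import numpy as np

def inflate_patch_embed(w_rgb, n_bands=8):
    """Adapt a pretrained 3-channel patch-embedding kernel to n_bands inputs.

    Recipe (an assumption here): tile the RGB kernels across the new
    bands, then rescale so the layer's expected output magnitude is
    roughly preserved.
    """
    out_c, in_c, kh, kw = w_rgb.shape            # e.g. (D, 3, patch, patch)
    reps = int(np.ceil(n_bands / in_c))
    w_new = np.tile(w_rgb, (1, reps, 1, 1))[:, :n_bands]
    return w_new * (in_c / n_bands)              # compensate for extra channels

rng = np.random.default_rng(3)
w_rgb = rng.normal(size=(32, 3, 14, 14)).astype(np.float32)  # toy pretrained kernel
w_ms = inflate_patch_embed(w_rgb)                # ready for 8-band patches
```

The rescaling by `in_c / n_bands` keeps the activations after inflation on roughly the same scale as the pretrained network expects, which tends to make self-supervised fine-tuning start from a sensible point.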

The team tested DeepAndes on tasks such as classifying difficult, imbalanced image datasets, retrieving specific types of images, and performing pixel-level semantic segmentation. Across all areas, the model outperformed systems trained from scratch or on smaller datasets, achieving higher F1 scores, mean average precision, and Dice scores, especially in few-shot learning situations where only a small number of labeled examples are available.

Overall, these results show that large-scale self-supervised pretraining can greatly improve archaeological remote sensing, helping researchers identify ancient sites and landscapes more accurately and efficiently.

Fig. 1.

Overview of DeepAndes. This figure shows the training dataset (a)–(d) and three domain-specific downstream tasks (e) using DeepAndes—a vision foundation model designed for multispectral satellite imagery in the Andes region. Particularly, (a) shows a large-scale map of the imagery used to train DeepAndes, highlighting various land cover types, with their area distribution shown in (c). (b) presents the unit sample patch [red box in (a), (b), (d)] with eight spectral bands. (d) illustrates image patching for DINOv2 training, with geospatial sampling densely covering different archaeological sites.

DeepPhysioRecon: Tracing peripheral physiology in low frequency fMRI dynamics
/valiant/2025/10/23/deepphysiorecon-tracing-peripheral-physiology-in-low-frequency-fmri-dynamics/ (Thu, 23 Oct 2025)
Bayrak, Roza G.; Hansen, Colin B.; Salas, Jorge Alberto; Ahmed, Nafis; Lyu, Ilwoo; Mather, Mara M.; Huo, Yuankai; Chang, Catie E. (2025). Imaging Neuroscience, 3, IMAG.a.163.

Many brain studies that use functional magnetic resonance imaging (fMRI) do not include measurements of basic body functions like breathing or heart rate, even though these physiological signals can strongly affect brain activity patterns. Natural changes in breathing and heart rate reflect important processes related to thinking, emotion, and overall health, and they can influence how fMRI signals are interpreted.

To address this gap, researchers developed DeepPhysioRecon, a deep learning model based on a Long Short-Term Memory (LSTM) network. This model can estimate continuous changes in breathing amplitude and heart rate directly from fMRI scans of the whole brain—without the need for separate sensors. The team tested how well the model works across different datasets and experimental conditions and showed that including these reconstructed physiological signals improves how fMRI data are analyzed and interpreted.

This work emphasizes the importance of understanding the connections between the brain and the body. It also introduces a practical, open-source tool that can make fMRI a more effective biomarker for studying human health, cognition, and emotion.

Fig. 1.

DeepPhysioRecon pipeline. The pipeline for estimating respiration volume (RV) and heart rate (HR) signals from fMRI time-series dynamics is shown. Regions of interest are defined using four published atlases that had been constructed from different imaging modalities, comprising areas in cerebral cortex, white matter, subcortex, and the ascending arousal network. ROI time-series signals are extracted from the fMRI volumes, detrended, bandpass filtered, and downsampled. The preprocessed signals are provided to a candidate network as input channels. A bidirectional LSTM network architecture is adapted for joint estimation. The outputs of the linear layers are the RV and HR signals.
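A couple of the preprocessing steps named in the caption can be sketched in NumPy. The bandpass filter is omitted for brevity, the naive decimation stands in for proper downsampling, and the function and variable names are ours rather than the pipeline's.

```python
import numpy as np

def preprocess_roi(ts, downsample=2):
    """Detrend and downsample one ROI time series (a simplified stand-in
    for the detrend / bandpass / downsample steps; the bandpass filter
    is omitted here)."""
    t = np.arange(len(ts))
    # Remove a linear scanner drift via a least-squares line fit.
    slope, intercept = np.polyfit(t, ts, 1)
    detrended = ts - (slope * t + intercept)
    # Naive decimation; a real pipeline would low-pass filter first.
    return detrended[::downsample]

rng = np.random.default_rng(4)
# Fabricated ROI signal: slow drift plus noise.
ts = 0.05 * np.arange(200) + rng.normal(scale=0.5, size=200)
out = preprocess_roi(ts)
```

After this step, each ROI contributes one cleaned channel to the network input, so the LSTM sees low-frequency fluctuations rather than scanner drift.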

ZeroReg3D: A zero-shot registration pipeline for 3D consecutive histopathology image reconstruction
/valiant/2025/09/26/zeroreg3d-a-zero-shot-registration-pipeline-for-3d-consecutive-histopathology-image-reconstruction/ (Fri, 26 Sep 2025)
Xiong, Juming; Deng, Ruining; Yue, Jialin; Lu, Siqi; Guo, Junlin; Lionts, Marilyn; Yao, Tianyuan; Cui, Can; Zhu, Junchao; & Qu, Chongyu. (2025). Journal of Medical Imaging, 12(4), 44002.

Histological analysis, which examines tissue structure under a microscope, is essential for understanding both normal biology and disease. While recent methods have improved the alignment of 2D tissue images, they often struggle to preserve the true 3D structure of tissues, limiting their usefulness in research and clinical applications. Creating accurate 3D models from 2D slices is challenging because tissues can deform, slicing can introduce artifacts, imaging techniques vary, and lighting can be inconsistent. Deep learning methods have shown promise but usually need large amounts of training data and often don’t generalize well to new datasets. Non-deep-learning approaches are more generalizable but often less accurate.

To address these issues, we developed ZeroReg3D, a “zero-shot” registration pipeline that combines deep learning-based keypoint matching with traditional non-deep-learning registration techniques. This approach reduces tissue deformation and sectioning artifacts without requiring extensive training data.
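Once keypoints are matched between adjacent sections, a classical least-squares (Kabsch/Procrustes) fit gives a rigid alignment. The sketch below shows that step in 2D on synthetic points; it illustrates the general technique rather than ZeroReg3D's actual registration code.

```python
import numpy as np

def rigid_from_keypoints(src, dst):
    """Least-squares rigid transform (rotation r, translation t) mapping
    matched keypoints src -> dst, via the Kabsch/Procrustes solution."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    u, _, vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(vt.T @ u.T))       # guard against reflections
    r = vt.T @ np.diag([1.0, d]) @ u.T
    t = dst.mean(0) - r @ src.mean(0)
    return r, t

# Synthetic check: rotate known points by 30 degrees, then recover the pose.
theta = np.deg2rad(30)
r_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 2.0]])
dst = src @ r_true.T + np.array([5.0, -3.0])
r, t = rigid_from_keypoints(src, dst)
```

Because this fit has a closed-form solution, it needs no training data, which is what makes pairing learned keypoint matching with classical optimization attractive for a zero-shot pipeline.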

Our evaluations show that ZeroReg3D improves 2D image alignment by about 10% compared to existing methods and produces high-quality 3D reconstructions from consecutive tissue sections. These results demonstrate that ZeroReg3D provides a reliable and accurate framework for reconstructing 3D tissue structure from 2D histological images.

In conclusion, ZeroReg3D successfully combines zero-shot deep learning with optimization-based registration to overcome challenges such as tissue deformation, slicing artifacts, staining differences, and uneven illumination, all without the need for retraining or fine-tuning.

Fig. 1

Overview. This figure shows a reconstructed 3D volume after alignment. The image sequence was stacked and subjected to 3D visualization to provide a comprehensive view.

Fast-RF-Shimming: Accelerate RF shimming in 7T MRI using deep learning
/valiant/2025/08/20/fast-rf-shimming-accelerate-rf-shimming-in-7t-mri-using-deep-learning/ (Wed, 20 Aug 2025)
Lu, Zhengyi; Liang, Hao; Lu, Ming; Wang, Xiao; Yan, Xinqiang; & Huo, Yuankai. (2025). Meta-Radiology, 3(3), 100166.

Ultrahigh field (UHF) Magnetic Resonance Imaging (MRI) provides much stronger signals than standard MRI, allowing for extremely detailed images that can help both doctors and researchers. However, using such high fields creates new problems. One of the biggest challenges is uneven radiofrequency (RF) fields, which can cause parts of the image to appear brighter or darker than they should. These irregularities reduce image quality and make it harder to use UHF MRI widely in clinical settings. Traditional methods, such as RF shimming with Magnitude Least Squares (MLS) optimization, can correct these uneven fields, but the process is very slow. Recently, machine learning methods have been explored to solve this problem faster, but they often require long training times and large amounts of data, and they limit model complexity.

In this study, we present a new machine learning approach called Fast-RF-Shimming, which works about 5000 times faster than the traditional MLS method. First, we use a technique called Adaptive Moment Estimation (Adam) to calculate reference RF shimming weights from multi-channel field data. Then, we train a Residual Network (ResNet), a type of deep learning model, to directly predict the best RF shimming outputs. To improve accuracy, the model includes a confidence parameter in its training process. Finally, we add an optional step called the Non-uniformity Field Detector (NFD), which checks for extreme unevenness and corrects it.
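The optimization target can be sketched as follows: given complex per-channel B1+ maps, find channel weights whose combined field magnitude is as uniform as possible. The toy solver below uses plain gradient descent where the paper's reference solution uses Adam, and the synthetic field maps are fabricated for illustration.

```python
import numpy as np

def mls_cost(b1, w, target=1.0):
    """Magnitude least squares cost: deviation of the combined |B1+|
    field from a uniform target. b1: (n_vox, n_ch) complex channel maps."""
    return np.mean((np.abs(b1 @ w) - target) ** 2)

def shim_gradient_descent(b1, target=1.0, lr=0.1, iters=500):
    """Plain gradient descent on the MLS cost (a simplified stand-in for
    the Adam-based reference solver)."""
    n_vox, n_ch = b1.shape
    w = np.ones(n_ch, dtype=complex) / n_ch
    for _ in range(iters):
        f = b1 @ w
        # Wirtinger gradient of the cost with respect to conj(w).
        resid = (np.abs(f) - target) * np.exp(1j * np.angle(f))
        w = w - lr * (b1.conj().T @ resid) / n_vox
    return w

rng = np.random.default_rng(5)
# Fabricated 8-channel B1+ maps over 500 voxels with random phases.
b1 = (0.12 + 0.02 * rng.normal(size=(500, 8))) * \
     np.exp(1j * rng.uniform(0, 2 * np.pi, size=(500, 8)))
w0 = np.ones(8, dtype=complex) / 8
cost_before = mls_cost(b1, w0)
w_opt = shim_gradient_descent(b1)
cost_after = mls_cost(b1, w_opt)
```

The iterative loop is exactly what makes classical shimming slow per subject; Fast-RF-Shimming amortizes it by training a network to predict good weights in a single forward pass.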

When compared to the standard MLS method, Fast-RF-Shimming not only runs much faster but also produces more accurate results. These findings suggest that this new framework offers a promising and practical solution for overcoming long-standing image quality issues in ultrahigh field MRI.

Tractography from T1-weighted MRI: Empirically exploring the clinical viability of streamline propagation without diffusion MRI
/valiant/2025/07/28/tractography-from-t1-weighted-mri-empirically-exploring-the-clinical-viability-of-streamline-propagation-without-diffusion-mri/ (Mon, 28 Jul 2025)
Cai, Leon Y.; Lee, Ho Hin; Johnson, Graham W.; Newlin, Nancy R.; Ramadass, Karthik; Kim, Michael E.; Archer, Derek B.; Hohman, Timothy J.; Jefferson, Angela L.; Begnoche, J. Patrick; Boyd, Brian D.; Taylor, Warren D.; Morgan, Victoria L.; Englot, Dario J.; Nath, Vishwesh; Chotai, Silky; Barquero, Laura; D’Archangel, Micah; Cutting, Laurie E.; Dawant, Benoit M.; Rheault, François; Moyer, Daniel C.; & Schilling, Kurt G. (2024). Imaging Neuroscience, 2, 1-20.

Over the last few decades, diffusion MRI (dMRI) streamline tractography has become the main way to estimate white matter (WM) pathways—the brain’s wiring—while a person is alive. But a big limitation is that this method usually needs a special type of scan called high angular resolution diffusion imaging (HARDI), which can be hard to get during regular medical care. This means tractography is mostly used in research settings and with certain groups of patients, limiting its use in everyday clinical practice and for rare or underfunded diseases. Because of this, having a tractography method that works with common clinical scans would be very important. Such a method would need to perform flexible tractography, use only standard clinical imaging as input, and be openly available for anyone to use.

In this study, we tested a new deep learning model that uses T1-weighted (T1w) MRI scans—common clinical images—to estimate brain pathways. We compared its performance with traditional dMRI-based tractography and atlas-based methods in healthy young people, older adults, and patients with epilepsy, depression, and brain cancer.

In healthy young people, our deep learning model showed slightly more error than traditional tractography, but the difference was small and less than errors seen with atlas-based methods. We also found that the model could replicate some important findings from previous dMRI studies in the clinical groups, especially for long-range brain connections that atlas methods miss, but not in all cases. These results suggest that deep learning using T1w MRI shows promise for clinical tractography, especially compared to atlas-based methods, but still needs improvement and careful testing before it can be widely used in hospitals.

Additionally, our findings raise new questions about how differences between dMRI and T1w MRI scans affect tractography results, and more research on this will help us better understand what brain features influence these measurements.
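The core mechanic of streamline tractography, repeatedly stepping along the local fiber orientation, can be sketched in a few lines. The circular direction field below is synthetic; in practice the per-voxel orientations come from dMRI models or, in this work, from a network applied to T1w MRI.

```python
import numpy as np

def propagate_streamline(seed, direction_field, step=0.05, n_steps=100):
    """Euler streamline propagation: repeatedly take a small step along
    the local fiber direction returned by `direction_field`."""
    pts = [np.asarray(seed, float)]
    for _ in range(n_steps):
        d = direction_field(pts[-1])
        d = d / np.linalg.norm(d)          # unit step direction
        pts.append(pts[-1] + step * d)
    return np.array(pts)

# Synthetic 2D field: directions tangent to circles about the origin,
# so the true streamline is a circle of radius 1.
field = lambda p: np.array([-p[1], p[0]])
track = propagate_streamline([1.0, 0.0], field)
radii = np.linalg.norm(track, axis=1)      # Euler drift: radius creeps outward
```

Even on this clean field the first-order integrator drifts slightly off the true circle, a small-scale analogue of why propagation errors (and the quality of the predicted orientations) matter so much for long-range connections.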

Fig 1

Tractograms (left view), right arcuate fasciculi (right view), left cinguli (left view), and cortical connectomes from traditional SD_STREAM tractography and the CoRNN method in a representative in-distribution HCP participant. Arrows denote visually appreciable differences between connectomes.

Mapping Three-Dimensional Tumor Heterogeneity through Deep Learning Inference of Spatial Transcriptomics from Routine Histopathology: A Proof-of-Concept Comparative Study
/valiant/2025/03/24/mapping-three-dimensional-tumor-heterogeneity-through-deep-learning-inference-of-spatial-transcriptomics-from-routine-histopathology-a-proof-of-concept-comparative-study/ (Mon, 24 Mar 2025)
Azher, Zarif L.; Srinivasan, Gokul; Yao, Keluo; Le, Minh-Khang; Lau, Ken S.; Kaur, Harsimran; Kolling, Fred; Vaickus, Louis; Lu, Xiaoying; & Levy, Joshua J. (2024). Proceedings of Machine Learning Research, 259, 73-85.

Spatial transcriptomics (ST) is a cutting-edge technology that allows scientists to study the amounts of genes and proteins in specific parts of tissues, giving a clearer picture than traditional methods, which can miss important information. By expanding this technology to look at tissues in three dimensions (3D), researchers can better understand complex biological processes that might be overlooked in two-dimensional (2D) studies. However, 3D ST is expensive and difficult to use on a large scale. To overcome this, we used deep learning (a type of artificial intelligence) to predict ST data from standard tissue samples, which is a cheaper and more accessible approach. In this study, we tested this method on 10 patients with colorectal cancer, analyzing 10 tissue samples per patient. Our results showed that the 3D approach provided more detailed insights into the cells compared to the traditional 2D approach. These findings open the door for future research to explore subtle 3D markers that could help detect cancer spread and recurrence.

An inverse design framework for optimizing tensile strength of composite materials based on a CNN surrogate for the phase field fracture model
/valiant/2025/03/24/an-inverse-design-framework-for-optimizing-tensile-strength-of-composite-materials-based-on-a-cnn-surrogate-for-the-phase-field-fracture-model/ (Mon, 24 Mar 2025)
Gao, Yuxiang; Duddu, Ravindra; Kolouri, Soheil; Gupta, Abhinav; & Prabhakar, Pavana. (2025). Composites Part A: Applied Science and Manufacturing, 192, 108758.

This paper introduces a new method for designing composite materials by using a convolutional neural network (CNN) to predict how the material will respond to stress based on images of its structure. The CNN model is paired with a simulator that generates these images from the material’s design parameters, keeping important information needed for optimization. This combination allows for efficient design improvements using gradient-based optimization methods. The results show that the approach can significantly increase the strength of materials beyond what was seen in the training data. Interestingly, the optimized designs for both unidirectional and bidirectional strength resemble common human-made patterns like hexagonal and diamond shapes. The method also works for materials with existing cracks, showing its potential for designing stronger materials by adding small amounts of second-phase material to improve tensile strength.
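The key enabler is that the surrogate is differentiable, so design parameters can be improved by gradient ascent on predicted strength. The sketch below replaces the CNN with a trivial differentiable function; everything here is a toy stand-in for the actual surrogate and phase field fracture model.

```python
import numpy as np

def surrogate_strength(x):
    """Toy differentiable surrogate standing in for the CNN: maps design
    parameters x to a scalar 'strength' (hypothetical; the real surrogate
    maps microstructure images to stress-strain response)."""
    return -np.sum((x - 0.7) ** 2) + 1.0

def surrogate_grad(x):
    """Analytic gradient of the toy surrogate (a CNN would supply this
    via automatic differentiation)."""
    return -2.0 * (x - 0.7)

def inverse_design(x0, lr=0.1, iters=100):
    """Gradient ascent on predicted strength, keeping parameters feasible."""
    x = np.asarray(x0, float).copy()
    for _ in range(iters):
        x = np.clip(x + lr * surrogate_grad(x), 0.0, 1.0)
    return x

x0 = np.array([0.2, 0.9, 0.5])   # fabricated initial design parameters
x_opt = inverse_design(x0)
```

The point of the surrogate is exactly this loop: each gradient step costs a cheap forward/backward pass through the network instead of a full phase field fracture simulation.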

Fig. 1. Schematic of the solid domain showing the representative volume element (RVE) of a fiber-reinforced composite microstructure and a diffused crack interface described by the phase field variable.
