AI | VALIANT

Factors influencing the effectiveness of artificial intelligence-assisted decision-making in medicine: a scoping review

waddelma — Tue, 26 May 2026 21:23:41 +0000

Jackson, Nicholas J.; Brown, Katherine E.; Miller, Rachael; Murrow, Matthew; Cauley, Michael R.; Collins, Benjamin X.; Novak, Laurie L.; Benda, Natalie C.; Ancker, Jessica S. (2026). Journal of the American Medical Informatics Association, 33(5), 1054–1064.

Research on artificial intelligence tools that help clinicians make decisions has had mixed results: sometimes these tools improve decision-making, and sometimes they do not, and it is not always clear why. This review looked at what factors affect how well AI-based clinical decision-support systems, or AI-CDS, work in medicine. The authors searched three medical databases and found 45 relevant studies out of 5,850 screened articles. They focused on how both clinician factors and technology design features influenced three things: how clinicians feel about AI, whether they accept AI recommendations, and how well they perform when using AI. The review found that experienced clinicians may gain less from AI support than less experienced clinicians, although the results were not consistent across studies. It also found that explainable AI, meaning AI that gives reasons for its suggestions, can increase trust, but it may also lead clinicians to trust incorrect recommendations too much, which can hurt performance when humans and AI work together. Clinicians’ existing attitudes toward AI also influenced whether they accepted its advice. Overall, the review suggests that future research should focus on “appropriate trust,” meaning clinicians should rely on AI only when the advice is actually trustworthy, rather than simply trying to increase trust in AI overall.

Figure 1.

PRISMA diagram for study inclusion.

Policy Search, Retrieval, and Composition via Task Similarity in Collaborative Agentic Systems

waddelma — Wed, 29 Apr 2026 03:52:07 +0000

Nath, Saptarshi; Peridis, Christos; Benjamin, Eseoghene; Liu, Xinran; Kolouri, Soheil; Kinnell, Peter; Li, Zexin; Liu, Cong; Dora, Shirin; Soltoggio, Andrea (2026). Proceedings of the AAAI Conference on Artificial Intelligence, 40(29), 24504–24512.

Agentic AI refers to systems that can set their own goals, adapt to new situations, and improve over time through experience. This study explores how such systems can learn more efficiently by sharing knowledge with one another, instead of learning everything from scratch. In particular, it looks at how an AI agent can decide what knowledge to reuse, which other agents to learn from, and how to incorporate that knowledge into its own decision-making process (often called a policy, or strategy for choosing actions).

The researchers introduce a new method called MOSAIC (Modular Sharing and Composition in Collective Learning). This approach allows agents to compare tasks using mathematical representations, select useful knowledge from others based on performance and similarity, and integrate it using flexible, modular neural network components. In simple terms, agents can “borrow” and adapt pieces of what others have already learned.
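The selection step described above can be illustrated with a minimal sketch. It ranks peer agents by cosine similarity between task embeddings and keeps only those that are both similar and currently performing better. The `Peer` structure, the `select_peers` function, the embedding vectors, and the similarity threshold are all hypothetical and chosen for illustration; they are not taken from the MOSAIC paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Peer:
    name: str
    task_embedding: List[float]  # hypothetical vector describing the peer's task
    reward: float                # hypothetical recent performance of the peer


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_peers(my_embedding, my_reward, peers, min_similarity=0.5):
    """Return peers worth borrowing from, most similar task first."""
    candidates = []
    for peer in peers:
        sim = cosine_similarity(my_embedding, peer.task_embedding)
        # Keep a peer only if its task is similar AND it outperforms us.
        if sim >= min_similarity and peer.reward > my_reward:
            candidates.append((sim, peer))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [peer for _, peer in candidates]


peers = [
    Peer("agent_a", [1.0, 0.0], reward=0.9),   # similar task, better reward
    Peer("agent_b", [0.0, 1.0], reward=0.95),  # better reward, dissimilar task
    Peer("agent_c", [0.9, 0.1], reward=0.2),   # similar task, worse reward
]
selected = select_peers([1.0, 0.1], my_reward=0.5, peers=peers)
print([p.name for p in selected])
```

Filtering on both criteria is what distinguishes selective sharing from indiscriminate sharing in this toy setting: a high-performing peer on an unrelated task is ignored.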

The results show that agents using MOSAIC learn faster and perform better than those learning alone or sharing information indiscriminately. In some cases, they can even solve problems that individual agents cannot. The study also finds that targeted, selective sharing reduces confusion between tasks and leads to a kind of self-organization, where agents working on easier problems help others tackle more complex ones. Overall, this work highlights the potential of collaborative learning strategies to make AI systems more efficient and adaptable.

ComCat: Expertise-Guided Context Generation to Enhance Code Comprehension

waddelma — Thu, 26 Mar 2026 19:50:56 +0000

Skyler Grandel; Scott Thomas Andersen; Yu Huang; Kevin Leach (2026). ACM Transactions on Software Engineering and Methodology, 35(3), Article 82.

Software maintenance makes up a large share of the total cost of software over its lifetime, and a big part of that cost comes from understanding existing code. One way to make code easier to understand is through documentation, especially comments that summarize what the code does or explain why it does it. In this work, we introduce ComCat, a system that uses large language models (LLMs, which are AI models trained on very large amounts of text) together with expert guidance to automatically generate useful comments for source code. ComCat is designed to choose the most relevant and informative comment for a specific piece of code. For C/C++ files, the system works in three steps: it first finds places where comments would be most helpful, then decides what kind of comment is needed, and finally writes the comment. In a study with human participants, ComCat’s comments improved code understanding on three software engineering tasks by up to 13% for most participants. The generated comments were also judged to be at least as accurate and readable as human-written comments, and they were preferred over standard ChatGPT-generated comments for up to 92% of code snippets. We also released a dataset containing code snippets, human-written comments, and human-labeled comment categories. Overall, ComCat shows that LLMs can be used to meaningfully improve how well people understand code.

Fig. 1.

ComCat pipeline and study procedure. We use three instances of HSR to inform ComCat’s design (1) and evaluate developer performance (2) and preference (3) with our tool. ComCat takes C/C++ code as input, using a Code Parser to identify code Snippets to be commented. These Snippets are classified, and the class of each Snippet is used in combination with our Template Catalog to create a prompt for each Snippet. These prompts are sent to ChatGPT, which outputs the commented code. This pipeline is informed by developer expertise, but it is fully automated and requires no human intervention.

A Vision-Based Deep Learning Framework for Monitoring and Recognition of Chemical Laboratory Operations

waddelma — Thu, 26 Mar 2026 19:09:53 +0000

Chuntao Guo; Jing Lin; Shunxing Bao; Xin Liu; Yaru Wang; Yunlin Chen (2026). Sensors, 26(4), 1106.

This study explores a way to automatically monitor how laboratory tasks are performed, focusing on pipetting—a common technique where small amounts of liquid are transferred using a pipette. Ensuring that such procedures are done correctly is important for safety and for producing reliable results, but it is difficult to track in real time because it involves complex hand movements, tool use, and multiple steps that vary between users. To address this, the researchers developed a vision-based artificial intelligence system that uses video recordings instead of physical sensors. The system first applies a YOLO-based model (a type of object detection algorithm) to identify human body positions and interactions with the pipette. It then uses a bidirectional long short-term memory (LSTM) network, a type of deep learning model designed to analyze sequences over time, to understand how the actions unfold step by step.

The results show that this approach can successfully distinguish between correct (standard) and incorrect (non-standard) pipetting behaviors, including different types of errors, and performs better than methods that analyze images one frame at a time without considering motion over time. Overall, the study demonstrates that AI systems using video analysis can provide a practical, non-contact way to monitor laboratory techniques, potentially improving quality control and extending to other manual procedures in scientific labs.

Figure 1. Representative incorrect pipetting behaviors and key challenges for vision-based QA in chemical laboratory environments.

Monitoring morphometric drift in lifelong learning segmentation of the spinal cord

waddelma — Thu, 26 Mar 2026 19:08:20 +0000

Enamundram Naga Karthik; Sandrine Bédard; Jan Valošek; Christoph S. Aigner; Elise Bannier; Josef Bednařík; Virginie Callot; Anna Combes; Armin Curt; Gergely David; Falk Eippert; Lynn Farner; Michael G. Fehlings; Patrick Freund; Tobias Granberg; Cristina Granziera; Ulrike Horn; Tomáš Horák; Suzanne Humphreys; Markus Hupp; Anne Kerbrat; Nawal Kinany; Shannon Kolind; Petr Kudlička; Anna Lebret; Lisa Eunyoung Lee; Caterina Mainero; Allan R. Martin; Megan McGrath; Govind Nair; Kristin P. O’Grady; Jiwon Oh; Russell Ouellette; Nikolai Pfender; Dario Pfyffer; Pierre-François Pradat; Alexandre Prat; Emanuele Pravatà; Daniel S. Reich; Ilaria Ricchi; Naama Rotem-Kohavi; Simon Schading-Sassenhausen; Maryam Seif; Andrew Smith; Seth A. Smith; Grace Sweeney; Roger Tam; Anthony Traboulsee; Constantina Andrada Treaba; Charidimos Tsagkas; Zachary Vavasour; Dimitri Van De Ville; Kenneth Arnold Weber II; Sarath Chandar; Julien Cohen-Adad (2026). Imaging Neuroscience, 4, Article a.1105.

This study looks at how measurements of the spinal cord, such as its cross-sectional area (the size of the cord when viewed in a slice), can be used as important indicators (biomarkers) for diagnosing and tracking neurological diseases like multiple sclerosis or spinal cord compression. Modern artificial intelligence methods can automatically identify and outline (segment) the spinal cord in MRI scans, but it is unclear whether these measurements stay consistent as models are updated with new data over time. This consistency is especially important when building “normal” reference values from healthy individuals.

To address this, the researchers developed a spinal cord segmentation model trained on a large and diverse dataset collected from 75 sites and over 1,600 participants, covering different MRI types and various spinal cord conditions. They also created a “lifelong learning” system that continuously monitors changes in measurements (called morphometric drift) whenever the model is updated. This system automatically runs through a workflow (via GitHub Actions, a workflow automation service) to track how measurements evolve over time.
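The idea of checking for drift between model versions can be sketched as follows. This is a minimal illustration, assuming drift is summarized as the relative change in the mean measurement on the same set of scans; the function names, the 2% tolerance, and the cross-sectional area values are hypothetical and do not come from the study, whose actual workflow runs in GitHub Actions.

```python
from statistics import mean


def relative_drift(old_values, new_values):
    """Relative change in the mean measurement between two model versions."""
    old_mean = mean(old_values)
    new_mean = mean(new_values)
    return (new_mean - old_mean) / old_mean


def check_drift(old_values, new_values, tolerance=0.02):
    """Return the drift and whether it stays within the (hypothetical) tolerance."""
    drift = relative_drift(old_values, new_values)
    return drift, abs(drift) <= tolerance


# Hypothetical cross-sectional areas (mm^2) for the same scans,
# measured from segmentations produced by two model versions.
csa_v1 = [70.0, 80.0, 75.0]
csa_v2 = [70.5, 80.4, 75.6]

drift, within_tolerance = check_drift(csa_v1, csa_v2)
print(f"relative drift: {drift:+.3%}, within tolerance: {within_tolerance}")
```

A check like this could run automatically on every model update, so that a shift large enough to affect reference values is noticed before a normative database is regenerated.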

The results showed that the new model performs very well, accurately identifying the spinal cord even in challenging cases such as severe compression or tissue damage, with a high Dice score (a measure of how closely the model’s segmentation matches the true anatomy) of 0.95. The monitoring system also proved useful for quickly detecting any changes in measurements between model versions. Importantly, the study found that updates to the model caused only minimal shifts in spinal cord measurements, meaning the results remain stable and reliable. This allowed the researchers to safely update an existing database of normal spinal cord measurements. Overall, this work provides a reliable and transparent way to maintain consistency in AI-based medical measurements as models evolve.
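For readers unfamiliar with the Dice score, the sketch below computes it for two binary masks using the standard definition, Dice = 2|A ∩ B| / (|A| + |B|). The flat-list mask representation and the example masks are illustrative assumptions, not data from the study.

```python
def dice_score(pred, truth):
    """Dice coefficient for two binary masks given as flat lists of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2 * intersection / total


# Hypothetical 6-pixel masks: the prediction marks one extra pixel.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 0, 0]
print(dice_score(predicted, reference))
```

A score of 1 means the two masks overlap exactly and 0 means they do not overlap at all, so the reported 0.95 indicates close agreement with the reference segmentation.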

Fig 1

Overview of the dataset and image characteristics. Representative axial slices of nine contrasts and the total of images used for each contrast in brackets, the orientation (axial/sagittal) along with the median resolution of images. The respective doughnut chart illustrates the proportion of clinical status among the scanned participants, including healthy controls (HC), patients with radiologically isolated syndrome (RIS), patients with multiple sclerosis (MS), and their different phenotypes, including primary progressive (PPMS) and relapsing-remitting (RRMS), patients with amyotrophic lateral sclerosis (ALS), patients with neuromyelitis optica spectrum disorder (NMOSD), pre-decompression acute traumatic SCI (AcuteSCI), post-decompression traumatic spinal cord injury (SCI), degenerative cervical myelopathy (DCM), and syringomyelia (SYR; not shown). Labels indicate the phenotype associated with the patient, with their respective colors shared across contrast sets.

Cost-effectiveness analysis of artificial intelligence-assisted risk stratification of indeterminate pulmonary nodules

waddelma — Thu, 26 Mar 2026 18:40:52 +0000

Caroline M. Godfrey; Ashley A. Leech; Kevin C. McGann; Jinyi Zhu; Hannah N. Marmor; Sophia Pena; Lyndsey C. Pickup; Fabien Maldonado; Evan C. Osmundson; Stacie B. Dusetzina; Eric L. Grogan; Stephen A. Deppen (2026). PLOS ONE, 21(3), e0343492.

Researchers evaluated whether artificial intelligence (AI) could help doctors better judge the cancer risk of indeterminate pulmonary nodules, which are small lung spots seen on a CT scan whose cause is not yet clear. These nodules are becoming more common as lung cancer screening and CT imaging are used more often. The team built a decision model to compare two approaches: clinician evaluation alone versus clinician evaluation supported by AI-based radiomics, a method that analyzes patterns in imaging data. They asked whether the AI approach would improve health outcomes and whether it would be worth the extra cost from a payer’s perspective over a patient’s lifetime. In their base case—a 60-year-old patient with a 1.1 cm nodule and a fairly high chance of cancer (65%)—AI support led to a small gain of 0.03 life-years and was cost-effective, with an incremental cost-effectiveness ratio of $4,485 per life-year gained. However, when the chance that the nodule was cancer was very low, below 5%, the AI approach no longer met a typical cost-effectiveness threshold of $100,000 per life-year gained. Overall, the study suggests that AI-assisted nodule assessment is cost-effective in settings where the likelihood of cancer is greater than 5%.
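The incremental cost-effectiveness ratio mentioned above is the extra cost of one strategy divided by the extra health benefit it delivers. The sketch below shows the arithmetic. The incremental cost of $134.55 is back-calculated here from the two rounded figures in the summary (0.03 life-years and $4,485 per life-year), so it is only an approximation and not a number reported by the study; the function names are hypothetical.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect <= 0:
        raise ValueError("this sketch only handles a positive health gain")
    return delta_cost / delta_effect


def is_cost_effective(ratio, threshold=100_000):
    """Compare an ICER with a willingness-to-pay threshold (dollars per life-year)."""
    return ratio <= threshold


delta_life_years = 0.03  # gain reported in the summary
delta_cost = 134.55      # ASSUMPTION: back-calculated as 0.03 * 4485, approximate

base_case_icer = icer(delta_cost, delta_life_years)
print(round(base_case_icer), is_cost_effective(base_case_icer))
```

The same calculation explains the low-prevalence result: when few nodules are cancerous, the health gain in the denominator shrinks toward zero and the ratio climbs past the $100,000 threshold.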

Fig 1. Decision Model Structure.

The decision tree structure models the risk-stratification of an indeterminate pulmonary nodule utilizing artificial intelligence-assistance compared to the clinician alone. Repeated portions of the model have been collapsed into subtrees (A-G) for readability, each of which represents a diagnostic or management pathway that appears in various parts of the model (‘A’ = surveillance; ‘B’ = PET-CT evaluation; ‘C’ = minimally invasive surgical (MIS) lobectomy; ‘D’ = low-risk surveillance; ‘E’ = intermediate-risk; ‘F’ = MIS wedge resection; ‘G’ = initial risk classification).

Img2ST-Net: efficient high-resolution spatial omics prediction from whole-slide histology images via fully convolutional image-to-image learning

waddelma — Wed, 28 Jan 2026 17:15:13 +0000

Zhu, Junchao; Deng, Ruining; Guo, Junlin; Yao, Tianyuan; Xiong, Juming; Qu, Chongyu; Yin, Mengmeng; Wang, Yu; Zhao, Shilin; Yang, Haichun; Xu, Daguang; Tang, Yucheng; & Huo, Yuankai (2025). Journal of Medical Imaging, 12(6), 61410.

Recent progress in multimodal artificial intelligence has shown that spatial transcriptomics data, which measure where genes are active within tissue, can potentially be generated from standard histology images, reducing the cost and time required for specialized experiments. However, newer spatial transcriptomics platforms such as Visium HD operate at very high resolution, down to about 8 micrometers, which creates major computational challenges. At this scale, traditional methods that predict gene expression one spot at a time become slow, unstable, and poorly suited to the extreme sparsity of gene expression, where many genes have very low or zero signal. To address this, the authors developed Img2ST-Net, a high-resolution framework that predicts spatial transcriptomics data from histology images using a fully convolutional neural network, meaning the model generates dense gene expression maps all at once rather than sequentially. The method represents high-resolution spatial transcriptomics as groups of small regions called superpixels and reframes the task as an image generation problem with hundreds or thousands of output channels, each corresponding to a gene. This approach improves efficiency and better preserves spatial structure in the tissue. To evaluate performance under sparse expression conditions, the authors also introduced SSIM-ST, a structural similarity-based metric designed specifically for high-resolution spatial transcriptomics. Testing on public breast and colorectal cancer Visium HD datasets at 8 and 16 micrometer resolution showed that Img2ST-Net outperformed existing methods in both prediction accuracy and spatial coherence, while reducing training time by up to 28 times compared with spot-based approaches. Additional analyses showed that contrastive learning further improved spatial fidelity. Overall, this work provides a scalable and biologically meaningful solution for predicting high-resolution spatial transcriptomics data and supports future large-scale and resolution-aware spatial omics modeling.

Fig. 1

Modeling paradigm for ST prediction. (a) Conventional patch-to-spot regression manner for Visium ST data: each WSI contains hundreds of 55 μm spots for the ST slide. A separate gene expression vector is predicted for each spot from its corresponding image patch. (b) Our proposed image-to-image prediction framework for Visium HD data: each WSI contains millions of 8 μm bins for the HD slide. A region-wise modeling strategy where each image region covers multiple bins is used to predict a high-resolution gene expression map, which enables more fine-grained and computationally efficient inference.

Evaluating cell AI foundation models in kidney pathology with human-in-the-loop enrichment

waddelma — Fri, 19 Dec 2025 16:47:48 +0000

Guo, J., Lu, S., Cui, C., Deng, R., Yao, T., Tao, Z., Lin, Y., Lionts, M., Liu, Q., Xiong, J., Wang, Y., Zhao, S., Chang, C. E., Wilkes, M., Fogo, A. B., Yin, M., Yang, H., & Huo, Y. (2025). Communications Medicine, 5(1), 495.

Large artificial intelligence foundation models are becoming important tools in healthcare, including digital pathology, where they help analyze medical images. Many of these models have been trained to handle complex tasks such as diagnosing diseases or measuring tissue features using very large and diverse datasets. However, it is less clear how well they perform on more focused tasks, such as identifying and outlining cell nuclei within images from a single organ like the kidney. This study examines how well current cell foundation models perform on this task and explores practical ways to improve them.

To do this, the researchers assembled a large dataset of 2,542 kidney whole slide images collected from multiple medical centers, covering different kidney diseases and even different species. They evaluated three widely used, state-of-the-art cell foundation models—Cellpose, StarDist, and CellViT—for their ability to segment cell nuclei. To improve performance without requiring extensive, time-consuming pixel-level annotations from experts, the team introduced a “human-in-the-loop” approach. This method combines predictions from multiple models to create higher-quality training labels and then refines a subset of difficult cases with corrections from pathologists. The models were fine-tuned using this enriched dataset, and their segmentation accuracy was carefully measured.

The results show that accurately segmenting cell nuclei in kidney pathology remains challenging and benefits from models that are more specifically tailored to this organ. Among the three models, CellViT showed the best initial performance, with an F1 score of 0.78. After fine-tuning with the improved training data, all models performed better, with StarDist reaching the highest F1 score of 0.82. Importantly, combining automatically generated labels from foundation models with a smaller set of pathologist-corrected “hard” image regions consistently improved performance across all models.
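The F1 scores quoted above combine precision (how many detected nuclei are real) and recall (how many real nuclei are detected) into a single number, their harmonic mean. The sketch below shows the standard formula. The counts are hypothetical, and how a predicted nucleus is matched to a reference nucleus (for example, by an overlap threshold) is not specified in this summary, so that step is left out.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = harmonic mean of precision and recall, computed from detection counts."""
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 nuclei correctly found, 20 spurious, 20 missed.
print(round(f1_score(80, 20, 20), 2))
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a model cannot score well by over-detecting or under-detecting nuclei.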

Overall, this study provides a clear benchmark for evaluating and improving cell AI foundation models in real-world pathology settings. It also demonstrates that high-quality nuclei segmentation can be achieved with much less expert annotation, supporting more efficient and scalable workflows in clinical pathology without sacrificing accuracy.

Fig. 1: Overall framework.

The upper panel (a–c) illustrates the diverse evaluation dataset consisting of 2542 kidney WSIs. (a) shows the number of kidney WSIs in publicly available cell nuclei datasets versus our evaluation dataset, which exceeds existing datasets by a large margin. (b) depicts the diverse data sources included in our dataset. (c) indicates that these WSIs were stained using Hematoxylin and Eosin (H&E), Periodic acid–Schiff methenamine (PASM), and Periodic acid–Schiff (PAS). Performance: Kidney cell nuclei instance segmentation was performed using three SOTA cell foundation models: Cellpose, StarDist, and CellViT. Model performance was evaluated based on qualitative human feedback for each prediction mask. Data Enrichment: A human-in-the-loop (HITL) design integrates prediction masks from performance evaluation into the model’s continual learning process, reducing reliance on pixel-level human annotation.

Biomedical data repositories require governance for artificial intelligence/machine learning applications at every step

waddelma — Fri, 19 Dec 2025 16:44:58 +0000

Clayton, E. W., Rose, S., Nebecker, C., Novak, L., Bensoussan, Y. E., Chen, Y., Collins, B. X., Cordes, A., Evans, B. J., Ferryman, K. S., Hurst, S., Jiang, X., Lee, A. Y., McWeeney, S., Parker, J., Bélisle-Pipon, J.-C., Rosenthal, E. S., Yin, Z., Yracheta, J. M., & Malin, B. A. (2025). JAMIA Open, 8(6), ooaf134.

This article examines the experience of the NIH’s Bridge2AI Program, which funded four large biomedical and behavioral datasets designed to be well documented and ready for use with artificial intelligence (AI) and machine learning (ML). The goal of these datasets is to encourage responsible and effective use of AI in research, but building them raised many ethical, legal, social, and practical challenges. The authors describe the key steps involved in creating and managing these AI-ready datasets, including deciding which data to collect and why, responding to public concerns, handling participant consent based on how the data were obtained, ensuring responsible future use, determining where and how data are stored, clarifying how much control participants have over data sharing, and setting rules for data access and downloading.

Across these steps, the projects faced important questions about long-term data storage, future uses of the data, and how to balance openness with privacy and participant protection. The authors highlight the different choices made by the four projects, such as how they gathered public input, selected data storage solutions, and defined criteria for who can access and download the data. Although the governance approaches varied, common themes emerged, suggesting shared best practices.

Overall, the article summarizes key lessons learned from the Bridge2AI Program about how to collect, manage, and govern large datasets intended for AI and ML. These insights can guide future initiatives in designing datasets that are not only technically useful for AI, but also ethically sound, socially responsible, and trustworthy.

Figure 1.

Steps in governance of data collection and decision-making and responsible use for the development of AI with greater attention to public concerns throughout. The first 2 steps—promoting responsible selection—address the primary work of the DGPs, while the remaining 4 steps—promoting responsible use—are crucial factors the DGPs must consider.

Human-centered design of an artificial intelligence monitoring system: the Vanderbilt Algorithmovigilance Monitoring and Operations System

waddelma — Sun, 23 Nov 2025 16:58:07 +0000

Salwei, Megan E.; Davis, Sharon E.; Reale, Carrie; Novak, Laurie Lovett; Walsh, Colin G.; Beebe, Russ; Nelson, Scott D.; Sundrani, Sameer; Rose, Susannah L.; Wright, Adam T.; Ripperger, Michael A.; Shave, Peter; & Embi, Peter J. (2025). JAMIA Open, 8(5), ooaf136.

As artificial intelligence (AI) becomes more common in healthcare, there is growing awareness that these systems need continuous oversight after they are put into use, a process known as algorithmovigilance. However, few tools exist to help hospitals consistently monitor and manage the performance of AI across their entire organization. In this study, we worked to understand what end users need from such a system while designing a new monitoring platform called the Vanderbilt Algorithmovigilance Monitoring and Operations System (VAMOS). To do this, we brought together a multidisciplinary team at Vanderbilt University Medical Center and held nine participatory design sessions with clinicians, leaders, and technical experts to create early prototypes. After developing a working version, we conducted eight additional interviews to gather feedback and used rapid qualitative analysis to refine the design. A multidisciplinary heuristic evaluation then helped identify more ways to improve the system. Through this human-centered, iterative process, we identified the key features an AI monitoring system must include, such as specific data displays, performance dashboards, expandable “accordion” summaries, and model-specific pages that meet the needs of a wide range of users. We also outlined general design principles for long-term AI monitoring, highlighting the challenge of supporting teams spread across the health system as they track performance issues and respond to signs of algorithm deterioration. Ultimately, VAMOS is intended to help healthcare organizations monitor AI tools in a systematic and proactive way, with the goal of improving care quality and ensuring patient safety.

Figure 1.

Overview of human-centered design process to develop VAMOS.

