Research Papers

ARXIV Cancer: unknown Method: U-Net CNN

From 100,000+ images to winning the first brain MRI foundation model challenges: Sharing lessons and models

Pedro M. Gordaliza, Jaume Banus, Benoît Gérin, Maxence Wynen, Nataliia Molchanova, Jonas Richiardi, Meritxell Bach Cuadra
Published 2026-01-19 15:43

This paper describes the development of foundation models for medical image analysis, specifically for 3D brain MRI. The authors' solution, built on a U-Net CNN architecture combined with anatomical priors and neuroimaging domain knowledge, ranked first in tracks of both SSL3D and FOMO25, the first brain MRI foundation model challenges, held at MICCAI 2025. The models trained 1-2 orders of magnitude faster and were 10 times smaller than competing transformer-based approaches.

Abstract

Developing Foundation Models for medical image analysis is essential to overcome the unique challenges of radiological tasks. The first challenges of this kind for 3D brain MRI, SSL3D and FOMO25, were held at MICCAI 2025. Our solution ranked first in tracks of both contests. It relies on a U-Net CNN architecture combined with strategies leveraging anatomical priors and neuroimaging domain knowledge. Notably, our models trained 1-2 orders of magnitude faster and were 10 times smaller than competing transformer-based approaches. Models are available here: https://github.com/jbanusco/BrainFM4Challenges.

ARXIV Cancer: lung cancer Method: Gradient-weighted Class Activation Mapping

Seeing Isn't Always Believing: Analysis of Grad-CAM Faithfulness and Localization Reliability in Lung Cancer CT Classification

Teerapong Panboonyuen
Published 2026-01-19 08:35

This study investigates the faithfulness and reliability of Grad-CAM, an explainable AI technique, for lung cancer CT classification on the IQ-OTH/NCCD dataset. Evaluating five architectures (ResNet-50, ResNet-101, DenseNet-161, EfficientNet-B0, and ViT-Base-Patch16-224), the study introduces a quantitative framework that combines localization accuracy, perturbation-based faithfulness, and explanation consistency. Grad-CAM highlights tumor regions effectively in most convolutional networks, but its fidelity degrades significantly for the Vision Transformer, raising concerns about the trustworthiness of such explanations in medical AI.

Abstract

Explainable Artificial Intelligence (XAI) techniques, such as Gradient-weighted Class Activation Mapping (Grad-CAM), have become indispensable for visualizing the reasoning process of deep neural networks in medical image analysis. Despite their popularity, the faithfulness and reliability of these heatmap-based explanations remain under scrutiny. This study critically investigates whether Grad-CAM truly represents the internal decision-making of deep models trained for lung cancer image classification. Using the publicly available IQ-OTH/NCCD dataset, we evaluate five representative architectures: ResNet-50, ResNet-101, DenseNet-161, EfficientNet-B0, and ViT-Base-Patch16-224, to explore model-dependent variations in Grad-CAM interpretability. We introduce a quantitative evaluation framework that combines localization accuracy, perturbation-based faithfulness, and explanation consistency to assess Grad-CAM reliability across architectures. Experimental findings reveal that while Grad-CAM effectively highlights salient tumor regions in most convolutional networks, its interpretive fidelity significantly degrades for Vision Transformer models due to non-local attention behavior. Furthermore, cross-model comparisons indicate substantial variability in saliency localization, implying that Grad-CAM explanations may not always correspond to the true diagnostic evidence used by the networks. This work exposes critical limitations of current saliency-based XAI approaches in medical imaging and emphasizes the need for model-aware interpretability methods that are both computationally sound and clinically meaningful. Our findings aim to inspire a more cautious and rigorous adoption of visual explanation tools in medical AI, urging the community to rethink what it truly means to "trust" a model's explanation.
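For readers unfamiliar with the mechanics: Grad-CAM weights each feature map of a chosen convolutional layer by the spatially averaged gradient of the target class score, sums the weighted maps, and applies a ReLU. The NumPy sketch below assumes the activations and gradients have already been extracted from a model (for example with framework hooks, not shown) and that the heatmap has the same shape as the image. The `deletion_drop` helper is a hypothetical, generic deletion-style perturbation probe, not the paper's faithfulness metric: it masks the most salient pixels and reports how much a caller-supplied `score_fn` falls.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one layer.

    activations: feature maps A with shape (K, H, W).
    gradients:   d(class score)/dA, same shape as activations.
    Returns an (H, W) map scaled to [0, 1].
    """
    # Channel importance: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum over channels, then ReLU keeps only positive evidence.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def deletion_drop(score_fn, image: np.ndarray, cam: np.ndarray,
                  fraction: float = 0.2) -> float:
    """Hypothetical faithfulness probe: zero the top `fraction` of pixels
    ranked by the CAM (assumed same shape as `image`) and return the drop
    in the class score. A faithful map should produce a large drop."""
    k = max(1, int(round(fraction * cam.size)))
    top = np.argsort(cam, axis=None)[-k:]  # flat indices of the k most salient pixels
    perturbed = image.copy()
    perturbed.flat[top] = 0.0
    return float(score_fn(image) - score_fn(perturbed))
```

Ranking pixels with `argsort` rather than thresholding a quantile keeps the number of masked pixels fixed even when many CAM values are tied at zero.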

ARXIV Cancer: general cancer Method: multimodal learning

A Generalist Foundation Model for Total-body PET/CT Enables Diagnostic Reporting and System-wide Metabolic Profiling

Wei Chen, Liang Wu, Shuyi Lu, Yuanyuan Sun, Wenkai Bi, Zilong Yuan, Yaoyao He, Feng Wang, Junchi Ma, Shuyong Liu, Zhaoping Cheng, Xiaoyan Hu, Jianfeng Qiu
Published 2026-01-19 08:30

This paper presents SDF-HOLO, a multimodal foundation model for total-body PET/CT imaging, pre-trained on more than 10,000 patients to handle heterogeneous anatomical and metabolic signals. The model decouples CT and PET representation learning with dual-stream encoders, couples them through a cross-modal interaction module, and uses hierarchical context modeling to capture long-range dependencies across the body. SDF-HOLO outperforms strong task-specific and clinical-reference baselines in tumor segmentation, low-dose lesion detection, and multilingual diagnostic report generation, and it enables system-wide metabolic profiling for precision oncology.

Abstract

Total-body PET/CT enables system-wide molecular imaging, but heterogeneous anatomical and metabolic signals, approximately 2 m axial coverage, and structured radiology semantics challenge existing medical AI models that assume single-modality inputs, localized fields of view, and coarse image-text alignment. We introduce SDF-HOLO (Systemic Dual-stream Fusion Holo Model), a multimodal foundation model for holistic total-body PET/CT, pre-trained on more than 10,000 patients. SDF-HOLO decouples CT and PET representation learning with dual-stream encoders and couples them through a cross-modal interaction module, allowing anatomical context to refine PET aggregation while metabolic saliency guides subtle morphological reasoning. To model long-range dependencies across the body, hierarchical context modeling combines efficient local windows with global attention. To bridge voxels and clinical language, we use anatomical segmentation masks as explicit semantic anchors and perform voxel-mask-text alignment during pre-training. Across tumor segmentation, low-dose lesion detection, and multilingual diagnostic report generation, SDF-HOLO outperforms strong task-specific and clinical-reference baselines while reducing localization errors and hallucinated findings. Beyond focal interpretation, the model enables system-wide metabolic profiling and reveals tumor-associated fingerprints of inter-organ metabolic network interactions, providing a scalable computational foundation for total-body PET/CT diagnostics and system-level precision oncology.

ARXIV Cancer: brain tumor Method: convolutional neural network

Exploiting Test-Time Augmentation in Federated Learning for Brain Tumor MRI Classification

Thamara Leandra de Deus Melo, Rodrigo Moreira, Larissa Ferreira Rodrigues Moreira, André Ricardo Backes
Published 2026-01-19 02:32

This study evaluates convolutional neural networks (CNNs) in a federated learning framework for brain tumor MRI classification, comparing models trained on original images with models trained on preprocessed images. Preprocessing alone yields negligible gains, but combining it with test-time augmentation (TTA) produces consistent, statistically significant improvements (p<0.001). The authors recommend TTA as the default inference strategy in federated medical imaging.

Abstract

Efficient brain tumor diagnosis is crucial for early treatment; however, it is challenging because of lesion variability and image complexity. We evaluated convolutional neural networks (CNNs) in a federated learning (FL) setting, comparing models trained on original versus preprocessed MRI images (resizing, grayscale conversion, normalization, filtering, and histogram equalization). Preprocessing alone yielded negligible gains; combined with test-time augmentation (TTA), it delivered consistent, statistically significant improvements in federated MRI classification (p<0.001). In practice, TTA should be the default inference strategy in FL-based medical imaging; when the computational budget permits, pairing TTA with light preprocessing provides additional reliable gains.
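A minimal sketch of test-time augmentation at inference: the (federated) model scores several transformed copies of each image and the class probabilities are averaged. The flip and rotation set and the `predict_proba` callable are assumptions for illustration; the abstract does not state which augmentations the authors used.

```python
import numpy as np


def predict_with_tta(predict_proba, image: np.ndarray) -> np.ndarray:
    """Average class probabilities over simple test-time augmentations.

    predict_proba: hypothetical callable mapping an (H, W) image to a vector
    of class probabilities, e.g. the aggregated global FL model.
    """
    views = [
        image,             # identity
        np.fliplr(image),  # horizontal flip
        np.flipud(image),  # vertical flip
        np.rot90(image),   # 90-degree counter-clockwise rotation
    ]
    probs = np.stack([predict_proba(v) for v in views])  # (n_views, n_classes)
    return probs.mean(axis=0)
```

The predicted label is then `np.argmax(predict_with_tta(model, image))`. Because TTA only changes inference, it can be applied to a federated model without altering the training protocol, at the cost of one forward pass per view.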

ARXIV Cancer: colorectal cancer Method: federated learning

Generalizable Hyperparameter Optimization for Federated Learning on Non-IID Cancer Images

Elisa Gonçalves Ribeiro, Rodrigo Moreira, Larissa Ferreira Rodrigues Moreira, André Ricardo Backes
Published 2026-01-19 02:24

This study investigates whether hyperparameters optimized on one cancer imaging dataset generalize to non-independent and identically distributed (non-IID) federated learning scenarios. It focuses on binary histopathology tasks for ovarian and colorectal cancers and uses centralized Bayesian hyperparameter optimization. The authors introduce a simple cross-dataset aggregation heuristic that averages the learning rates and takes the modal optimizer and batch size, which achieves competitive classification performance.

Abstract

Training deep learning models for cancer histopathology conflicts with privacy constraints in clinical settings. Federated Learning (FL) mitigates this by keeping data local; however, its performance depends on hyperparameter choices under non-independent and identically distributed (non-IID) client datasets. This paper examines whether hyperparameters optimized on one cancer imaging dataset generalize across non-IID federated scenarios. We consider binary histopathology tasks for ovarian and colorectal cancers. We perform centralized Bayesian hyperparameter optimization and transfer dataset-specific optima to the non-IID FL setup. The main contribution of this study is a simple cross-dataset aggregation heuristic that combines configurations by averaging the learning rates and taking the modal optimizers and batch sizes. This combined configuration achieves competitive classification performance.
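The aggregation heuristic is simple enough to sketch directly. This version assumes an arithmetic mean of learning rates and uses Python's `statistics.mode` for the optimizer and batch size; the dictionary keys and example values are hypothetical. The tie-breaking rule (the first value seen wins, Python 3.8+) is also an assumption, and it matters when only two datasets are combined.

```python
from statistics import mode


def aggregate_configs(configs):
    """Merge per-dataset optimal hyperparameters into one FL configuration.

    Learning rates are averaged; optimizer and batch size take the most
    frequent value. On ties, statistics.mode returns the first value seen.
    """
    return {
        "learning_rate": sum(c["learning_rate"] for c in configs) / len(configs),
        "optimizer": mode(c["optimizer"] for c in configs),
        "batch_size": mode(c["batch_size"] for c in configs),
    }


# Hypothetical per-dataset optima from a centralized Bayesian search.
optima = [
    {"learning_rate": 1e-3, "optimizer": "adam", "batch_size": 32},
    {"learning_rate": 3e-3, "optimizer": "adam", "batch_size": 64},
    {"learning_rate": 2e-3, "optimizer": "sgd", "batch_size": 32},
]
combined = aggregate_configs(optima)
```

Averaging suits a continuous hyperparameter such as the learning rate, while the mode is the natural choice for categorical settings (optimizer) and discrete ones (batch size), where an average may not be a valid option.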

ARXIV Cancer: unknown Method: large language models

Intelligent Documentation in Medical Education: Can AI Replace Manual Case Logging?

Nafiz Imtiaz Khan, Kylie Cleland, Vladimir Filkov, Roger Eric Goldman
Published 2026-01-19 01:45

This study explores whether large language models (LLMs) can automate procedural case log documentation in radiology training, where manual logging is time-consuming and inconsistent. The authors evaluate local and commercial LLMs, under instruction-based and chain-of-thought prompting, on extracting structured procedural information from 414 interventional radiology reports, measuring sensitivity, specificity, and F1-score alongside inference latency and token cost. The best models reach F1-scores approaching 0.87, suggesting substantial potential to reduce clerical burden and improve documentation consistency in medical education.

Abstract

Procedural case logs are a core requirement in radiology training, yet they are time-consuming to complete and prone to inconsistency when authored manually. This study investigates whether large language models (LLMs) can automate procedural case log documentation directly from free-text radiology reports. We evaluate multiple local and commercial LLMs under instruction-based and chain-of-thought prompting to extract structured procedural information from 414 curated interventional radiology reports authored by nine residents between 2018 and 2024. Model performance is assessed using sensitivity, specificity, and F1-score, alongside inference latency and token efficiency to estimate operational cost. Results show that both local and commercial models achieve strong extraction performance, with best F1-scores approaching 0.87, while exhibiting different trade-offs between speed and cost. Automation using LLMs has the potential to substantially reduce clerical burden for trainees and improve consistency in case logging. These findings demonstrate the feasibility of AI-assisted documentation in medical education and highlight the need for further validation across institutions and clinical workflows.
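The three reported metrics follow directly from confusion-matrix counts. The sketch below assumes each extracted case-log field is scored as a binary decision against a reference log; the counts in the example are invented for illustration and are not from the study.

```python
def extraction_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, precision, and F1 from confusion counts.
    Each ratio falls back to 0.0 when its denominator is zero."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


# Hypothetical counts for one procedural field across a set of reports.
m = extraction_metrics(tp=8, fp=2, tn=85, fn=5)
```

F1 is the harmonic mean of precision and sensitivity, which simplifies to 2TP / (2TP + FP + FN); for the hypothetical counts above that is 16/23, about 0.696. Unlike specificity, F1 ignores true negatives, so it is not inflated when most fields are correctly left empty.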

ARXIV Cancer: general cancer Method: vision-language model

Histopath-C: Towards Realistic Domain Shifts for Histopathology Vision-Language Adaptation

Mehrdad Noori, Gustavo Adolfo Vargas Hakim, David Osowiechi, Fereshteh Shakeri, Ali Bahri, Moslem Yazdanpanah, Sahar Dastani, Ismail Ben Ayed, Christian Desrosiers
Published 2026-01-18 17:06

This paper presents Histopath-C, a benchmark for domain shifts in histopathology images. Its framework applies realistic synthetic corruptions to any available dataset and evaluates Test-Time Adaptation (TTA) mechanisms on the fly. The authors also propose LATTE, a transductive, low-rank adaptation strategy that exploits multiple text templates to reduce the sensitivity of histopathology vision-language models to diverse text inputs. LATTE outperforms state-of-the-art TTA methods originally designed for natural images across a range of histopathology datasets.

Abstract

Medical vision-language models (VLMs) have shown remarkable performance in various medical imaging domains, such as histopathology, by leveraging pre-trained contrastive models that exploit visual and textual information. However, histopathology images can exhibit severe domain shifts, such as staining, contamination, blurring, and noise, which may substantially degrade a VLM's downstream performance. In this work, we introduce Histopath-C, a new benchmark with realistic synthetic corruptions designed to mimic real-world distribution shifts observed in digital histopathology. Our framework dynamically applies corruptions to any available dataset and evaluates Test-Time Adaptation (TTA) mechanisms on the fly. We then propose LATTE, a transductive, low-rank adaptation strategy that exploits multiple text templates, mitigating the sensitivity of histopathology VLMs to diverse text inputs. Our approach outperforms state-of-the-art TTA methods originally designed for natural images across a breadth of histopathology datasets, demonstrating the effectiveness of our proposed design for robust adaptation in histopathology images. Code and data are available at https://github.com/Mehrdad-Noori/Histopath-C.
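The abstract does not detail LATTE's transductive low-rank adaptation, so the sketch below shows only a simpler, widely used way to exploit multiple text templates in CLIP-style zero-shot classification: average each class's normalized text embeddings across templates into one prototype, then score images by cosine similarity. It is not LATTE. The embeddings are assumed to come from pre-trained image and text encoders (not shown), and the logit scale of 100 is an assumed, CLIP-like value.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def template_ensemble_logits(image_emb: np.ndarray, text_embs: np.ndarray,
                             logit_scale: float = 100.0) -> np.ndarray:
    """Zero-shot logits with prompt-template ensembling.

    image_emb: (N, D) image embeddings.
    text_embs: (C, T, D) embeddings of C class prompts under T templates.
    Returns (N, C) scaled cosine similarities.
    """
    # One prototype per class: mean of unit-norm template embeddings, renormalized.
    prototypes = l2_normalize(l2_normalize(text_embs).mean(axis=1))  # (C, D)
    return logit_scale * l2_normalize(image_emb) @ prototypes.T


# Toy 2-D embeddings: 2 classes x 2 templates, and 2 images.
texts = np.array([[[1.0, 0.0], [1.0, 0.1]],
                  [[0.0, 1.0], [0.1, 1.0]]])
images = np.array([[1.0, 0.05], [0.0, 1.0]])
logits = template_ensemble_logits(images, texts)
```

Averaging over templates makes each class prototype less dependent on the exact wording of any single prompt, which is the text-input sensitivity the abstract targets; LATTE goes further by adapting at test time.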

ARXIV Cancer: triple-negative breast cancer Method: mathematical model

Identifying Therapeutic Targets for Triple-Negative Breast Cancer using a Novel Mathematical Model of the Tumor Microenvironment

Kyle Adams, Julia Bruner, Salma Ameziane, Ashley Brown, Mohammed Gbadamosi, Helen Moore
Published 2026-01-18 15:30

This study focuses on triple-negative breast cancer (TNBC), an aggressive cancer with limited treatment options. The authors develop a novel mathematical model of the TNBC tumor microenvironment (TME) as a system of ordinary differential equations covering five interacting cell populations. Global sensitivity analysis identifies the parameters that most strongly influence tumor burden, and the associated pathways point to potential therapeutic strategies for TNBC.

Abstract

Triple-negative breast cancer (TNBC) is an aggressive disease with high mortality and limited treatment options, because it lacks the receptors for which targeted therapies are available. The tumor microenvironment (TME) plays a critical role in TNBC progression and therapeutic resistance. In this work, we developed a novel mathematical model to describe key cellular interactions within the TNBC TME, informed by current literature and expert input. Our model consists of a system of ordinary differential equations representing five interacting cell populations: M2 macrophages, cancer-associated fibroblasts, TNBC tumor cells, cytotoxic T lymphocytes, and regulatory T cells. We performed global sensitivity analysis to determine which model parameters most strongly influence tumor burden over a clinically relevant treatment timeframe. The pathways associated with the most influential parameters correspond to biological mechanisms that are consistent with known and emerging therapeutic strategies in TNBC, including stromal-mediated tumor support. These results highlight key regulatory interactions within the TNBC TME and provide a quantitative framework for hypothesis generation and future investigation of combination treatment strategies.

ARXIV Cancer: breast cancer Method: Pyramid Adaptive Atrous Convolution

An Innovative Framework for Breast Cancer Detection Using Pyramid Adaptive Atrous Convolution, Transformer Integration, and Multi-Scale Feature Fusion

Ehsan Sadeghi Pour, Mahdi Esmaeili, Morteza Romoozi
Published 2026-01-18 03:55

This study presents a framework for detecting malignant masses in mammographic images that integrates Pyramid Adaptive Atrous Convolution (PAAC) with Transformer architectures. It uses Multi-Scale Feature Fusion to improve feature extraction from benign and malignant tissue and combines Dice Loss with Focal Loss during training. On images drawn from INbreast, MIAS, and DDSM, the model reports an accuracy of 98.5%, outperforming baselines such as BreastNet, DeepMammo, Multi-Scale CNN, Swin-Unet, and SegFormer.

Abstract

Breast cancer is one of the most common cancers among women worldwide, and its accurate and timely diagnosis plays a critical role in improving treatment outcomes. This thesis presents an innovative framework for detecting malignant masses in mammographic images by integrating the Pyramid Adaptive Atrous Convolution (PAAC) and Transformer architectures. The proposed approach uses Multi-Scale Feature Fusion to enhance the extraction of features from benign and malignant tissues and combines the Dice Loss and Focal Loss functions to improve the model's learning process, reducing errors in binary breast cancer classification and achieving high accuracy and efficiency. In this study, a comprehensive dataset of breast cancer images from INbreast, MIAS, and DDSM was preprocessed with data augmentation and contrast enhancement and resized to 227x227 pixels for model training. Leveraging the Transformer's ability to model long-range dependencies with self-attention mechanisms, the proposed model achieved high accuracy in detecting cancerous masses, outperforming baseline models such as BreastNet, DeepMammo, Multi-Scale CNN, Swin-Unet, and SegFormer. The final evaluation results for the proposed model include an accuracy of 98.5%, sensitivity of 97.8%, specificity of 96.3%, F1-score of 98.2%, and overall precision of 97.9%. These metrics demonstrate a significant improvement over traditional methods and confirm the model's effectiveness in identifying cancerous masses in complex scenarios and large datasets. This model shows potential as a reliable and efficient tool for breast cancer diagnosis and can be effectively integrated into medical diagnostic systems.
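Combining Dice and Focal losses is a common way to handle class imbalance. The abstract does not give the exact formulation or weighting, so the NumPy sketch below assumes the standard soft Dice loss, the standard binary focal loss with commonly used defaults (`alpha=0.25`, `gamma=2.0`), and a weighted sum controlled by a hypothetical `dice_weight` parameter (equal weighting by default).

```python
import numpy as np


def dice_loss(probs: np.ndarray, targets: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice loss for binary targets; probs and targets share a shape."""
    intersection = np.sum(probs * targets)
    dice = (2.0 * intersection + eps) / (np.sum(probs) + np.sum(targets) + eps)
    return float(1.0 - dice)


def focal_loss(probs: np.ndarray, targets: np.ndarray, alpha: float = 0.25,
               gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Binary focal loss averaged over elements; down-weights easy examples."""
    probs = np.clip(probs, eps, 1.0 - eps)             # avoid log(0)
    p_t = np.where(targets == 1, probs, 1.0 - probs)   # probability of the true class
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def combined_loss(probs: np.ndarray, targets: np.ndarray,
                  dice_weight: float = 0.5) -> float:
    """Hypothetical weighting: dice_weight * Dice + (1 - dice_weight) * Focal."""
    return (dice_weight * dice_loss(probs, targets)
            + (1.0 - dice_weight) * focal_loss(probs, targets))
```

Dice loss measures overlap over the whole prediction and is insensitive to how many negatives there are, while the focal term's `(1 - p_t) ** gamma` factor concentrates the gradient on hard, misclassified pixels; the two are complementary under heavy imbalance.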

ARXIV Cancer: general cancer Method: diffusion model

DiffusionQC: Artifact Detection in Histopathology via Diffusion Model

Zhenzhen Wang, Zhongliang Zhou, Zhuoyu Wen, Jeong Hwan Kook, John B Wojcik, John Kang
Published 2026-01-18 02:59

The paper presents DiffusionQC, an approach that detects artifacts in histopathology images as outliers among clean images using a diffusion model. Training requires only a set of clean images, with no pixel-level artifact annotations or predefined artifact types. An added contrastive learning module enlarges the distribution separation between artifact and clean images, yielding an enhanced variant of the method. Empirical results indicate that DiffusionQC outperforms state-of-the-art methods and generalizes across stains while using significantly less data and annotation.

Abstract

Digital pathology plays a vital role across modern medicine, offering critical insights for disease diagnosis, prognosis, and treatment. However, histopathology images often contain artifacts introduced during slide preparation and digitization. Detecting and excluding them is essential to ensure reliable downstream analysis. Traditional supervised models typically require large annotated datasets, which are resource-intensive to build and do not generalize to novel artifact types. To address this, we propose DiffusionQC, which detects artifacts as outliers among clean images using a diffusion model. It requires only a set of clean images for training rather than pixel-level artifact annotations and predefined artifact types. Furthermore, we introduce a contrastive learning module to explicitly enlarge the distribution separation between artifact and clean images, yielding an enhanced version of our method. Empirical results demonstrate performance superior to the state of the art and cross-stain generalization, with significantly less data and annotation.
