Research Papers

ARXIV Cancer: brain tumor Method: spiking neural networks

Reliable Brain Tumor Segmentation Based on Spiking Neural Networks with Efficient Training

Aurora Pia Ghiardelli, Guangzhi Tang, Tao Sun
Published 2026-01-23 11:16

This study presents a framework for 3D brain tumor segmentation using spiking neural networks (SNNs) that emphasizes energy efficiency and reliability. A multi-view ensemble of sagittal, coronal, and axial models improves segmentation robustness and provides voxel-wise uncertainty estimation. Training uses Forward Propagation Through Time (FPTT) to reduce computational cost; experiments on BraTS 2017 and BraTS 2023 report competitive accuracy and an 87% reduction in FLOPs.

We propose a reliable and energy-efficient framework for 3D brain tumor segmentation using spiking neural networks (SNNs). A multi-view ensemble of sagittal, coronal, and axial SNN models provides voxel-wise uncertainty estimation and enhances segmentation robustness. To address the high computational cost in training SNN models for semantic image segmentation, we employ Forward Propagation Through Time (FPTT), which maintains temporal learning efficiency with significantly reduced computational cost. Experiments on the Multimodal Brain Tumor Segmentation Challenges (BraTS 2017 and BraTS 2023) demonstrate competitive accuracy, well-calibrated uncertainty, and an 87% reduction in FLOPs, underscoring the potential of SNNs for reliable, low-power medical IoT and Point-of-Care systems.
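
The multi-view ensemble idea can be illustrated with a minimal NumPy sketch. It assumes each view model has already produced per-class probabilities resampled to a common voxel grid, and it uses plain averaging with predictive entropy as the uncertainty score. Both choices are illustrative assumptions; the abstract does not state how the views are fused or how uncertainty is computed, and the function name is hypothetical.

```python
import numpy as np

def ensemble_with_uncertainty(view_probs):
    """Fuse per-view class probabilities and score voxel-wise uncertainty.

    view_probs: list of arrays shaped (C, D, H, W), one per view
    (e.g. sagittal, coronal, axial), normalised over the class axis.
    Averaging and entropy are assumptions, not the paper's stated method.
    """
    stacked = np.stack(view_probs, axis=0)      # (V, C, D, H, W)
    mean_probs = stacked.mean(axis=0)           # (C, D, H, W)
    eps = 1e-12                                 # avoids log(0)
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=0)
    labels = mean_probs.argmax(axis=0)          # (D, H, W)
    return labels, entropy

# Toy input: three views, four classes, an 8x8x8 volume.
rng = np.random.default_rng(0)
views = []
for _ in range(3):
    logits = rng.normal(size=(4, 8, 8, 8))
    exp = np.exp(logits - logits.max(axis=0, keepdims=True))
    views.append(exp / exp.sum(axis=0, keepdims=True))

labels, entropy = ensemble_with_uncertainty(views)
```

High-entropy voxels are those where the averaged prediction is spread across classes, which is one common way to flag regions for review.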

ARXIV Cancer: general cancer Method: hybrid encoder-decoder architecture

PanopMamba: Vision State Space Modeling for Nuclei Panoptic Segmentation

Ming Kang, Fung Fung Ting, Raphaël C. -W. Phan, Zongyuan Ge, Chee-Ming Ting
Published 2026-01-23 10:33

This paper presents PanopMamba, a novel hybrid encoder-decoder architecture designed for nuclei panoptic segmentation in histopathology images. The method addresses challenges such as detecting small objects and handling ambiguous boundaries by integrating Mamba and Transformer architectures with state space modeling. Experimental results on benchmark datasets demonstrate the effectiveness of PanopMamba in improving segmentation performance compared to existing methods.

Nuclei panoptic segmentation supports cancer diagnostics by integrating both semantic and instance segmentation of different cell types to analyze overall tissue structure and individual nuclei in histopathology images. Major challenges include detecting small objects, handling ambiguous boundaries, and addressing class imbalance. To address these issues, we propose PanopMamba, a novel hybrid encoder-decoder architecture that integrates Mamba and Transformer with additional feature-enhanced fusion via state space modeling. We design a multiscale Mamba backbone and a State Space Model (SSM)-based fusion network to enable efficient long-range perception in pyramid features, thereby extending the pure encoder-decoder framework while facilitating information sharing across multiscale features of nuclei. The proposed SSM-based feature-enhanced fusion integrates pyramid feature networks and dynamic feature enhancement across different spatial scales, enhancing the feature representation of densely overlapping nuclei in both semantic and spatial dimensions. To the best of our knowledge, this is the first Mamba-based approach for panoptic segmentation. Additionally, we introduce alternative evaluation metrics, including image-level Panoptic Quality ($i$PQ), boundary-weighted PQ ($w$PQ), and frequency-weighted PQ ($fw$PQ), which are specifically designed to address the unique challenges of nuclei segmentation and thereby mitigate the potential bias inherent in vanilla PQ. Experimental evaluations on two multiclass nuclei segmentation benchmark datasets, MoNuSAC2020 and NuInsSeg, demonstrate the superiority of PanopMamba for nuclei panoptic segmentation over state-of-the-art methods. Consequently, the robustness of PanopMamba is validated across various metrics, while the distinctiveness of PQ variants is also demonstrated. Code is available at https://github.com/mkang315/PanopMamba.
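
For reference, the vanilla PQ that the proposed variants build on is commonly defined as the sum of IoUs over matched segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted and a ground-truth segment match when their IoU exceeds 0.5. The sketch below implements only that standard definition; the $i$PQ, $w$PQ, and $fw$PQ variants are not specified in the abstract and are not reproduced here. The function name is hypothetical.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Standard (vanilla) Panoptic Quality for one class.

    matched_ious: IoU of each matched (prediction, ground truth) pair,
                  each expected to be > 0.5 under the usual matching rule.
    num_fp: unmatched predicted segments.
    num_fn: unmatched ground-truth segments.
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched pairs must have IoU > 0.5")
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # no segments at all; a convention chosen for this sketch
    return sum(matched_ious) / denom

# Three matches, one spurious prediction, one missed nucleus.
pq = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
```

In this toy case the IoU sum is 2.4 and the denominator is 4, so PQ is approximately 0.6.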

ARXIV Cancer: general cancer Method: interactive foundation model

VISTA-PATH: An interactive foundation model for pathology image segmentation and quantitative analysis in computational pathology

Peixian Liang, Songhao Li, Shunsuke Koga, Yutong Li, Zahra Alipour, Yucheng Tang, Daguang Xu, Zhi Huang
Published 2026-01-23 05:06

The paper introduces VISTA-PATH, an interactive foundation model for semantic segmentation of histopathology images, aimed at quantitative tissue analysis and clinical modeling. The model conditions on visual context, semantic tissue descriptions, and optional expert-provided spatial prompts to produce precise multi-class segmentations across diverse pathology images. VISTA-PATH outperforms existing segmentation foundation models on held-out and external benchmarks, supports dynamic human-in-the-loop refinement, and yields a Tumor Interaction Score (TIS) that is significantly associated with patient survival.

Accurate semantic segmentation of histopathology images is crucial for quantitative tissue analysis and downstream clinical modeling. Recent segmentation foundation models have improved generalization through large-scale pretraining, yet remain poorly aligned with pathology because they treat segmentation as a static visual prediction task. Here we present VISTA-PATH, an interactive, class-aware pathology segmentation foundation model designed to resolve heterogeneous structures, incorporate expert feedback, and produce pixel-level segmentations that are directly meaningful for clinical interpretation. VISTA-PATH jointly conditions segmentation on visual context, semantic tissue descriptions, and optional expert-provided spatial prompts, enabling precise multi-class segmentation across heterogeneous pathology images. To support this paradigm, we curate VISTA-PATH Data, a large-scale pathology segmentation corpus comprising over 1.6 million image-mask-text triplets spanning 9 organs and 93 tissue classes. Across extensive held-out and external benchmarks, VISTA-PATH consistently outperforms existing segmentation foundation models. Importantly, VISTA-PATH supports dynamic human-in-the-loop refinement by propagating sparse, patch-level bounding-box annotation feedback into whole-slide segmentation. Finally, we show that the high-fidelity, class-aware segmentation produced by VISTA-PATH makes it a preferred model for computational pathology. It improves tissue microenvironment analysis through the proposed Tumor Interaction Score (TIS), which exhibits strong and significant associations with patient survival. Together, these results establish VISTA-PATH as a foundation model that elevates pathology image segmentation from a static prediction to an interactive and clinically grounded representation for digital pathology. Source code and demo can be found at https://github.com/zhihuanglab/VISTA-PATH.

ARXIV Cancer: general cancer Method: attention-guided attribution

Cite-While-You-Generate: Training-Free Evidence Attribution for Multimodal Clinical Summarization

Qianqi Yan, Huy Nguyen, Sumana Srivatsa, Hari Bandi, Xin Eric Wang, Krishnaram Kenthapadi
Published 2026-01-23 02:01

This paper presents a training-free framework for evidence attribution in multimodal clinical summarization, focusing on the transparency of generated content. The proposed methods utilize decoder attentions to cite supporting text spans or images, addressing limitations of previous approaches. Evaluations demonstrate that the framework outperforms existing baselines in attribution accuracy across clinician-patient dialogues and radiology reports.

Trustworthy clinical summarization requires not only fluent generation but also transparency about where each statement comes from. We propose a training-free framework for generation-time source attribution that leverages decoder attentions to directly cite supporting text spans or images, overcoming the limitations of post-hoc or retraining-based methods. We introduce two strategies for multimodal attribution: a raw image mode, which directly uses image patch attentions, and a caption-as-span mode, which substitutes images with generated captions to enable purely text-based alignment. Evaluations on two representative domains, clinician-patient dialogues (CliConSummation) and radiology reports (MIMIC-CXR), show that our approach consistently outperforms embedding-based and self-attribution baselines, improving both text-level and multimodal attribution accuracy (e.g., +15% F1 over embedding baselines). Caption-based attribution achieves competitive performance with raw-image attention while being more lightweight and practical. These findings highlight attention-guided attribution as a promising step toward interpretable and deployable clinical summarization systems.
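
A rough sketch of attention-guided attribution follows. It assumes a decoder-to-source attention matrix for one generated sentence has already been extracted and averaged over heads and layers, and it cites the source span that receives the most attention mass. The aggregation rule (sum, then normalise) and the function name are illustrative assumptions; the abstract does not describe the exact scoring used.

```python
import numpy as np

def attribute_to_spans(attn, spans):
    """Pick the source span that a generated sentence attends to most.

    attn:  array (T_out, T_src); row i holds the attention that generated
           token i places on each source token (heads/layers pre-averaged).
    spans: list of half-open (start, end) token ranges over the source.
    Returns the index of the best span and the normalised span scores.
    """
    mass = attn.sum(axis=0)                              # per source token
    scores = np.array([mass[s:e].sum() for s, e in spans])
    scores = scores / scores.sum()                       # share per span
    return int(scores.argmax()), scores

# Two generated tokens attending over four source tokens in two spans.
attn = np.array([[0.1, 0.1, 0.6, 0.2],
                 [0.2, 0.1, 0.5, 0.2]])
best, scores = attribute_to_spans(attn, spans=[(0, 2), (2, 4)])
```

Here the second span collects 1.5 of the 2.0 total attention mass, so it would be the cited evidence. In the caption-as-span mode described above, an image would simply contribute one more text span built from its generated caption.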

ARXIV Cancer: unknown Method: federated learning

FeTTL: Federated Template and Task Learning for Multi-Institutional Medical Imaging

Abhijeet Parida, Antonia Alomar, Zhifan Jiang, Pooneh Roshanitabrizi, Austin Tapp, Ziyue Xu, Syed Muhammad Anwar, Maria J. Ledesma-Carbayo, Holger R. Roth, Marius George Linguraru
Published 2026-01-22 20:14

This paper presents Federated Template and Task Learning (FeTTL), a framework aimed at improving model performance in federated learning settings for medical imaging. The method addresses domain shift and data heterogeneity across medical institutions by learning a global template together with a task model. FeTTL was evaluated on retinal fundus optical disc segmentation and histopathological metastasis classification, where it significantly outperformed state-of-the-art federated learning baselines (p < 0.002).

Federated learning enables collaborative model training across geographically distributed medical centers while preserving data privacy. However, domain shifts and heterogeneity in data often lead to a degradation in model performance. Medical imaging applications are particularly affected by variations in acquisition protocols, scanner types, and patient populations. To address these issues, we introduce Federated Template and Task Learning (FeTTL), a novel framework designed to harmonize multi-institutional medical imaging data in federated environments. FeTTL learns a global template together with a task model to align data distributions among clients. We evaluated FeTTL on two challenging and diverse multi-institutional medical imaging tasks: retinal fundus optical disc segmentation and histopathological metastasis classification. Experimental results show that FeTTL significantly outperforms the state-of-the-art federated learning baselines (p-values <0.002) for optical disc segmentation and classification of metastases from multi-institutional data. Our experiments further highlight the importance of jointly learning the template and the task. These findings suggest that FeTTL offers a principled and extensible solution for mitigating distribution shifts in federated learning, supporting robust model deployment in real-world, multi-institutional environments.
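
As background for the federated setting, the sketch below shows a generic FedAvg-style aggregation step, in which the server averages client parameters weighted by local dataset size. This is not FeTTL itself: the template learning and its coupling with the task model are not described in enough detail here to reproduce, and the function name and parameter layout are hypothetical.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """Size-weighted average of client parameters (generic FedAvg step).

    client_params: list of dicts mapping parameter name -> NumPy array,
                   one dict per client, all with identical keys and shapes.
    client_sizes:  number of local training samples per client.
    """
    total = float(sum(client_sizes))
    averaged = {}
    for name in client_params[0]:
        averaged[name] = sum(
            params[name] * (size / total)
            for params, size in zip(client_params, client_sizes)
        )
    return averaged

# Two clients; the second holds three times as much data.
clients = [{"w": np.array([1.0, 1.0])}, {"w": np.array([3.0, 3.0])}]
global_params = federated_average(clients, client_sizes=[1, 3])
```

Only parameters leave each site, which is how the raw images stay local while the shared model still benefits from every institution's data.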

ARXIV Cancer: colon cancer Method: convolutional neural network

Phi-SegNet: Phase-Integrated Supervision for Medical Image Segmentation

Shams Nafisa Ali, Taufiq Hasan
Published 2026-01-22 16:00

The paper presents Phi-SegNet, a CNN-based architecture that enhances medical image segmentation by integrating phase-aware information. The approach addresses a limitation of existing models, which primarily encode spatial information, by incorporating frequency-domain representations at both the architectural and the supervision level for improved object localization. Evaluated on five public datasets, the model achieved state-of-the-art performance, with average relative improvements of 1.54% in intersection over union (IoU) and 0.98% in F1-score over the next best-performing model. The findings suggest that leveraging spectral priors can enhance segmentation frameworks across various imaging modalities.

Deep learning has substantially advanced medical image segmentation, yet achieving robust generalization across diverse imaging modalities and anatomical structures remains a major challenge. A key contributor to this limitation lies in how existing architectures, ranging from CNNs to Transformers and their hybrids, primarily encode spatial information while overlooking frequency-domain representations that capture rich structural and textural cues. Although a few recent studies have begun exploring spectral information at the feature level, supervision-level integration of frequency cues, crucial for fine-grained object localization, remains largely untapped. To this end, we propose Phi-SegNet, a CNN-based architecture that incorporates phase-aware information at both architectural and optimization levels. The network integrates Bi-Feature Mask Former (BFMF) modules that blend neighboring encoder features to reduce semantic gaps, and Reverse Fourier Attention (RFA) blocks that refine decoder outputs using phase-regularized features. A dedicated phase-aware loss aligns these features with structural priors, forming a closed feedback loop that emphasizes boundary precision. Evaluated on five public datasets spanning X-ray, US, histopathology, MRI, and colonoscopy, Phi-SegNet consistently achieved state-of-the-art performance, with an average relative improvement of 1.54 ± 1.26% in IoU and 0.98 ± 0.71% in F1-score over the next best-performing model. In cross-dataset generalization scenarios involving unseen datasets from the known domain, Phi-SegNet also exhibits robust and superior performance, highlighting its adaptability and modality-agnostic design. These findings demonstrate the potential of leveraging spectral priors in both feature representation and supervision, paving the way for generalized segmentation frameworks that excel in fine-grained object localization.
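
To make "phase-aware information" concrete, the sketch below splits a 2D image into Fourier amplitude and phase with NumPy and shows that the pair reconstructs the image exactly, while a phase-only reconstruction discards amplitude. It illustrates the general signal-processing idea only; it is not the BFMF or RFA module, nor the paper's phase-aware loss, and the function names are hypothetical.

```python
import numpy as np

def amplitude_and_phase(image):
    """Return the amplitude and phase of the 2D Fourier transform."""
    spectrum = np.fft.fft2(image)
    return np.abs(spectrum), np.angle(spectrum)

def phase_only_reconstruction(image):
    """Invert the spectrum with unit amplitude, keeping only phase."""
    _, phase = amplitude_and_phase(image)
    return np.real(np.fft.ifft2(np.exp(1j * phase)))

# Toy binary mask with a rectangular object.
mask = np.zeros((32, 32))
mask[8:24, 10:20] = 1.0

amp, phase = amplitude_and_phase(mask)
full_recon = np.real(np.fft.ifft2(amp * np.exp(1j * phase)))
phase_recon = phase_only_reconstruction(mask)
```

Phase-only reconstructions of natural and medical images tend to retain edges and object layout, which is the usual motivation for treating phase as a boundary-related cue.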

ARXIV Cancer: unknown Method: self-supervised learning

RadJEPA: Radiology Encoder for Chest X-Rays via Joint Embedding Predictive Architecture

Anas Anwarul Haq Khan, Mariam Husain, Kshitij Jadhav
Published 2026-01-22 12:11

This paper presents RadJEPA, a self-supervised framework designed for learning robust radiology encoders from chest X-ray images without relying on language supervision. The model utilizes a Joint Embedding Predictive Architecture to predict latent representations of masked image regions. Evaluation results demonstrate that RadJEPA outperforms existing state-of-the-art methods in various tasks, including disease classification and semantic segmentation.

Recent advances in medical vision-language models guide the learning of visual representations; however, this form of supervision is constrained by the availability of paired image-text data, raising the question of whether robust radiology encoders can be learned without relying on language supervision. In this work, we introduce RadJEPA, a self-supervised framework built on a Joint Embedding Predictive Architecture that learns without language supervision. Pre-trained solely on unlabeled chest X-ray images, the model learns to predict latent representations of masked image regions. This predictive objective differs fundamentally from both image-text pre-training and DINO-style self-distillation: rather than aligning global representations across views or modalities, RadJEPA explicitly models latent-space prediction. We evaluate the learned encoder on disease classification, semantic segmentation, and report generation tasks. Across benchmarks, RadJEPA achieves performance exceeding state-of-the-art approaches, including Rad-DINO.

ARXIV Cancer: general cancer Method: image-to-image translation

PMPBench: A Paired Multi-Modal Pan-Cancer Benchmark for Medical Image Synthesis

Yifan Chen, Fei Yin, Hao Chen, Jia Wu, Chao Li
Published 2026-01-22 11:58

This paper introduces PMPBench, a novel public dataset designed for paired multi-modal medical image synthesis across various cancer types. The dataset addresses limitations in existing resources by providing fully paired dynamic contrast-enhanced MR sequences and corresponding non-contrast and contrast-enhanced CT acquisitions. The goal is to facilitate AI-based image translation for synthesizing contrast-enhanced images from non-contrast scans, thereby improving clinical workflows in oncology. The authors also establish a benchmark for evaluating image-to-image translation methods using this dataset.

Contrast medium plays a pivotal role in radiological imaging, as it amplifies lesion conspicuity and improves detection for the diagnosis of tumor-related diseases. However, depending on the patient's health condition or the medical resources available, the use of contrast medium is not always feasible. Recent work has explored AI-based image translation to synthesize contrast-enhanced images directly from non-contrast scans, aiming to reduce side effects and streamline clinical workflows. Progress in this direction has been constrained by data limitations: (1) existing public datasets focus almost exclusively on brain-related paired MR modalities; (2) other collections include partially paired data but suffer from missing modalities/timestamps and imperfect spatial alignment; (3) explicit labeling of CT vs. CTC or DCE phases is often absent; (4) substantial resources remain private. To bridge this gap, we introduce the first public, fully paired, pan-cancer medical imaging dataset spanning 11 human organs. The MR data include complete dynamic contrast-enhanced (DCE) sequences covering all three phases (DCE1-DCE3), while the CT data provide paired non-contrast and contrast-enhanced acquisitions (CTC). The dataset is curated for anatomical correspondence, enabling rigorous evaluation of 1-to-1, N-to-1, and N-to-N translation settings (e.g., predicting DCE phases from non-contrast inputs). Built upon this resource, we establish a comprehensive benchmark. We report results from representative baselines of contemporary image-to-image translation. We release the dataset and benchmark to catalyze research on safe, effective contrast synthesis, with direct relevance to multi-organ oncology imaging workflows. Our code and dataset are publicly available at https://github.com/YifanChen02/PMPBench.

ARXIV Cancer: brain tumor Method: foundation models

Sub-Region-Aware Modality Fusion and Adaptive Prompting for Multi-Modal Brain Tumor Segmentation

Shadi Alijani, Fereshteh Aghaee Meibodi, Homayoun Najjaran
Published 2026-01-22 08:03

This paper presents a framework for adapting foundation models to multi-modal medical imaging, specifically for brain tumor segmentation. The framework combines sub-region-aware modality attention, which learns the best combination of modalities for each tumor sub-region, with adaptive prompt engineering to refine segmentation accuracy. Validation on the BraTS 2020 dataset shows significant improvements over baseline methods, particularly in the challenging necrotic core sub-region.

The successful adaptation of foundation models to multi-modal medical imaging is a critical yet unresolved challenge. Existing models often struggle to effectively fuse information from multiple sources and adapt to the heterogeneous nature of pathological tissues. To address this, we introduce a novel framework for adapting foundation models to multi-modal medical imaging, featuring two key technical innovations: sub-region-aware modality attention and adaptive prompt engineering. The attention mechanism enables the model to learn the optimal combination of modalities for each tumor sub-region, while the adaptive prompting strategy leverages the inherent capabilities of foundation models to refine segmentation accuracy. We validate our framework on the BraTS 2020 brain tumor segmentation dataset, demonstrating that our approach significantly outperforms baseline methods, particularly in the challenging necrotic core sub-region. Our work provides a principled and effective approach to multi-modal fusion and prompting, paving the way for more accurate and robust foundation model-based solutions in medical imaging.

ARXIV Cancer: skin cancer Method: convolutional neural network

A Machine Vision Approach to Preliminary Skin Lesion Assessments

Ali Khreis, Ro'Yah Radaideh, Quinn McGill
Published 2026-01-21 23:48

This study focuses on the early detection of malignant skin lesions to improve patient outcomes in aggressive skin cancers. It evaluates a system that combines the ABCD rule of dermoscopy with machine learning classification, using a 1,000-image subset of the HAM10000 dataset. A custom three-layer Convolutional Neural Network (CNN) trained from scratch achieved 78.5% accuracy and 86.5% recall, a 19-point accuracy improvement over traditional methods, while transfer learning with EfficientNet-B0 failed. The findings suggest that direct pixel-level learning can capture diagnostic patterns more effectively than handcrafted features.

Early detection of malignant skin lesions is critical for improving patient outcomes in aggressive, metastatic skin cancers. This study evaluates a comprehensive system for preliminary skin lesion assessment that combines the clinically established ABCD rule of dermoscopy (analyzing Asymmetry, Borders, Color, and Dermoscopic Structures) with machine learning classification. Using a 1,000-image subset of the HAM10000 dataset, the system implements an automated, rule-based pipeline to compute a Total Dermoscopy Score (TDS) for each lesion. This handcrafted approach is compared against various machine learning solutions, including traditional classifiers (Logistic Regression, Random Forest, and SVM) and deep learning models. While the rule-based system provides high clinical interpretability, results indicate a performance bottleneck when reducing complex morphology to five numerical features. Experimental findings show that transfer learning with EfficientNet-B0 failed significantly due to domain shift between natural and medical images. In contrast, a custom three-layer Convolutional Neural Network (CNN) trained from scratch achieved 78.5% accuracy and 86.5% recall on median-filtered images, representing a 19-point accuracy improvement over traditional methods. The results demonstrate that direct pixel-level learning captures diagnostic patterns beyond handcrafted features and that purpose-built lightweight architectures can outperform large pretrained models for small, domain-specific medical datasets.
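
The Total Dermoscopy Score mentioned above is conventionally computed with the published ABCD-rule weights: TDS = 1.3·A + 0.1·B + 0.5·C + 0.5·D, with scores below 4.75 read as benign, 4.75 to 5.45 as suspicious, and above 5.45 as highly suspicious for melanoma. The sketch below encodes that standard rule only. How the paper's automated pipeline derives each sub-score from images is not described here, so the inputs are assumed to be given, and the function names are hypothetical.

```python
def total_dermoscopy_score(asymmetry, border, colors, structures):
    """Standard ABCD-rule TDS from the four sub-scores.

    asymmetry:  0-2 (number of asymmetric axes)
    border:     0-8 (segments with an abrupt border cut-off)
    colors:     1-6 (distinct colours present)
    structures: 1-5 (distinct dermoscopic structures present)
    """
    return 1.3 * asymmetry + 0.1 * border + 0.5 * colors + 0.5 * structures

def triage(tds):
    """Map a TDS to the conventional three-way reading."""
    if tds < 4.75:
        return "benign"
    if tds <= 5.45:
        return "suspicious"
    return "highly suspicious"

tds = total_dermoscopy_score(asymmetry=1, border=4, colors=3, structures=3)
reading = triage(tds)
```

With these inputs the score is about 4.7, just under the 4.75 cut-off, so the lesion reads as benign. Cases this close to a threshold illustrate the performance bottleneck noted above when complex morphology is reduced to a handful of numbers.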
