Research Papers

ARXIV Cancer: unknown Method: mathematical modeling

A Predictive Model for Synergistic Oncolytic Virotherapy: Unveiling the Ping-Pong Mechanism and Optimal Timing of Combined Vesicular Stomatitis and Vaccinia Viruses

Joseph Malinzi, Amina Eladdadi, Rachid Ouifki, Raluca Eftimie, Anotida Madzvamuse, Helen M. Byrne
Published 2026-01-15 13:56

This study introduces a mathematical model to explore the synergistic effects of combining Vesicular Stomatitis Virus (VSV) and Vaccinia Virus (VV) in oncolytic virotherapy. The model elucidates a 'ping-pong' mechanism where VV enhances VSV replication by neutralizing interferon-$α$. Numerical simulations indicate that this combination can achieve complete tumor clearance in about 50 days, outperforming VV monotherapy. The research also identifies critical parameters for treatment efficacy and suggests an optimal timing strategy for administration.

We present a mathematical model that describes the synergistic mechanism of combined Vesicular Stomatitis Virus (VSV) and Vaccinia Virus (VV). The model captures the dynamic interplay between tumor cells, viral replication, and the interferon-mediated immune response, revealing a 'ping-pong' synergy where VV-infected cells produce B18R protein that neutralizes interferon-$α$, thereby enhancing VSV replication within the tumor. Numerical simulations demonstrate that this combination achieves complete tumor clearance in approximately 50 days, representing an 11% acceleration compared to VV monotherapy (56 days), while VSV alone fails to eradicate tumors. Through bifurcation analysis, we identify critical thresholds for viral burst size and B18R inhibition, while sensitivity analysis highlights infection rates and burst sizes as the most influential parameters for treatment efficacy. Temporal optimization reveals that therapeutic outcomes are maximized through immediate VSV administration followed by delayed VV injection within a 1-19 day window, offering a strategic approach to overcome the timing and dosing challenges inherent in OVT.
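
The quoted acceleration can be checked from the two clearance times, assuming it is measured relative to the VV monotherapy time (the abstract does not state the reference explicitly):

```latex
\[
\frac{56 - 50}{56} = \frac{6}{56} \approx 0.107 \approx 11\%
\]
```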

ARXIV Cancer: non-small cell lung cancer Method: multimodal deep learning

Handling Missing Modalities in Multimodal Survival Prediction for Non-Small Cell Lung Cancer

Filippo Ruffini, Camillo Maria Caruso, Claudia Tacconi, Lorenzo Nibid, Francesca Miccolis, Marta Lovino, Carlo Greco, Edy Ippolito, Michele Fiore, Alessio Cortellini, Bruno Beomonte Zobel, Giuseppe Perrone, Bruno Vincenzi, Claudio Marrocco, Alessandro Bria, Elisa Ficarra, Sara Ramella, Valerio Guarrasi, Paolo Soda
Published 2026-01-15 13:38

This study addresses the challenge of accurate survival prediction in Non-Small Cell Lung Cancer (NSCLC) by developing a missing-aware multimodal survival framework. The framework integrates various data types, including CT images, Whole-Slide Histopathology images, and clinical variables, to enhance overall survival modeling. The proposed method demonstrates resilience to missing data and outperforms traditional unimodal and fusion strategies, achieving a C-index of 73.30 with the optimal combination of modalities.

Accurate survival prediction in Non-Small Cell Lung Cancer (NSCLC) requires the integration of heterogeneous clinical, radiological, and histopathological information. While Multimodal Deep Learning (MDL) offers promise for precision prognosis and survival prediction, its clinical applicability is severely limited by small cohort sizes and the presence of missing modalities, often forcing complete-case filtering or aggressive imputation. In this work, we present a missing-aware multimodal survival framework that integrates Computed Tomography (CT), Whole-Slide Histopathology (WSI) Images, and structured clinical variables for overall survival modeling in unresectable stage II-III NSCLC. By leveraging Foundation Models (FM) for modality-specific feature extraction and a missing-aware encoding strategy, the proposed approach enables intermediate multimodal fusion under naturally incomplete modality profiles. The proposed architecture is resilient to missing modalities by design, allowing the model to utilize all available data without being forced to drop patients during training or inference. Experimental results demonstrate that intermediate fusion consistently outperforms unimodal baselines as well as early and late fusion strategies, with the strongest performance achieved by the fusion of WSI and clinical modalities (73.30 C-index). Further analyses of modality importance reveal an adaptive behavior in which less informative modalities, i.e., CT modality, are automatically down-weighted and contribute less to the final survival prediction.
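
For readers unfamiliar with the metric, the following is a minimal sketch of Harrell's concordance index (C-index) in plain Python. It is not the authors' evaluation code: the function name, the toy data, and the simplified handling of tied times are assumptions for illustration. The value is returned on a 0-1 scale; the 73.30 quoted above is presumably the same quantity expressed as a percentage.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable patient pairs ranked correctly.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk scores (higher = shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification: skip tied times; a pair is comparable only when
        # the shorter time corresponds to an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk for the earlier event: correct
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    return concordant / comparable


# Toy cohort (hypothetical values): the third patient is censored.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of comparable pairs.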

ARXIV Cancer: lung cancer Method: vector quantization

VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation

Sicheng Yang, Zhaohu Xing, Lei Zhu
Published 2026-01-15 07:09

This paper presents VQ-Seg, a novel approach for semi-supervised medical image segmentation that utilizes vector quantization to improve feature perturbation methods. The proposed Quantized Perturbation Module (QPM) replaces traditional dropout techniques, allowing for effective regularization without the need for manual tuning of hyperparameters. The method is evaluated on a large-scale lung cancer dataset, demonstrating superior performance compared to existing state-of-the-art techniques.

Consistency learning with feature perturbation is a widely used strategy in semi-supervised medical image segmentation. However, many existing perturbation methods rely on dropout and thus require careful manual tuning of the dropout rate, a sensitive hyperparameter that is often difficult to optimize and may lead to suboptimal regularization. To overcome this limitation, we propose VQ-Seg, the first approach to employ vector quantization (VQ) to discretize the feature space and introduce a novel and controllable Quantized Perturbation Module (QPM) that replaces dropout. Our QPM perturbs discrete representations by shuffling the spatial locations of codebook indices, enabling effective and controllable regularization. To mitigate potential information loss caused by quantization, we design a dual-branch architecture where the post-quantization feature space is shared by both image reconstruction and segmentation tasks. Moreover, we introduce a Post-VQ Feature Adapter (PFA) to incorporate guidance from a foundation model (FM), supplementing the high-level semantic information lost during quantization. Furthermore, we collect a large-scale Lung Cancer (LC) dataset comprising 828 CT scans annotated for central-type lung carcinoma. Extensive experiments on the LC dataset and other public benchmarks demonstrate the effectiveness of our method, which outperforms state-of-the-art approaches. Code available at: https://github.com/script-Yang/VQ-Seg.
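
The idea of perturbing a quantized feature map by shuffling codebook indices can be illustrated with a small plain-Python sketch. This is not the VQ-Seg implementation (see the repository above for that): the function names, the `ratio` parameter, and the toy codebook are all hypothetical, and the feature map is flattened to a list of positions for brevity.

```python
import random


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]


def shuffle_indices(indices, ratio, rng):
    """Permute the codebook indices at a random subset of positions.

    `ratio` (assumed knob) is the fraction of positions that take part in
    the shuffle, so it directly controls perturbation strength.
    """
    n = len(indices)
    chosen = rng.sample(range(n), int(round(ratio * n)))
    permuted = chosen[:]
    rng.shuffle(permuted)
    out = list(indices)
    for src, dst in zip(chosen, permuted):
        out[dst] = indices[src]  # move the index from position src to dst
    return out


# Toy 2-D features at six spatial positions and a three-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
features = [[0.1, 0.0], [0.9, 1.1], [2.1, 1.9],
            [0.0, 0.2], [1.0, 0.8], [1.9, 2.2]]
rng = random.Random(0)
indices = quantize(features, codebook)
perturbed = shuffle_indices(indices, ratio=0.5, rng=rng)
print(indices, perturbed)
```

Because the shuffle only moves indices around, the perturbed map uses exactly the same codebook entries as the original; only their spatial arrangement changes.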

ARXIV Cancer: non-small cell lung cancer Method: multiple instance learning

ReaMIL: Reasoning- and Evidence-Aware Multiple Instance Learning for Whole-Slide Histopathology

Hyun Do Jung, Jungwon Choi, Hwiyoung Kim
Published 2026-01-15 04:55

This paper presents ReaMIL, a multiple instance learning approach designed for whole-slide histopathology. The method incorporates a selection head that produces soft per-tile gates and is trained with a budgeted-sufficiency objective to enhance evidence efficiency without compromising performance. The results demonstrate that ReaMIL achieves high AUC scores on various cancer datasets, indicating improved class confidence with a minimal number of selected tiles.

We introduce ReaMIL (Reasoning- and Evidence-Aware MIL), a multiple instance learning approach for whole-slide histopathology that adds a light selection head to a strong MIL backbone. The head produces soft per-tile gates and is trained with a budgeted-sufficiency objective: a hinge loss that enforces the true-class probability to be $\geq τ$ using only the kept evidence, under a sparsity budget on the number of selected tiles. The budgeted-sufficiency objective yields small, spatially compact evidence sets without sacrificing baseline performance. Across TCGA-NSCLC (LUAD vs. LUSC), TCGA-BRCA (IDC vs. Others), and PANDA, ReaMIL matches or slightly improves baseline AUC and provides quantitative evidence-efficiency diagnostics. On NSCLC, it attains AUC 0.983 with a mean minimal sufficient K (MSK) $\approx 8.2$ tiles at $τ= 0.90$ and AUKC $\approx 0.864$, showing that class confidence rises sharply and stabilizes once a small set of tiles is kept. The method requires no extra supervision, integrates seamlessly with standard MIL training, and naturally yields slide-level overlays. We report accuracy alongside MSK, AUKC, and contiguity for rigorous evaluation of model behavior on WSIs.
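
A minimal sketch of how a budgeted-sufficiency objective and the minimal sufficient K could be computed is given below. The abstract specifies only a hinge on the true-class probability at threshold $τ$ under a sparsity budget; the hinge-style budget penalty, the weight `lam`, the function names, and the reading of MSK as "smallest K whose top-K tiles reach $τ$" are assumptions, not the authors' definitions.

```python
def budgeted_sufficiency_loss(p_true_kept, gates, tau=0.90, budget=8.0, lam=1.0):
    """Hinge on sufficiency plus a penalty for exceeding the tile budget.

    p_true_kept : true-class probability predicted from the kept tiles only
    gates       : soft per-tile gates in [0, 1]
    budget, lam : assumed form of the sparsity constraint (hypothetical)
    """
    sufficiency = max(0.0, tau - p_true_kept)      # zero once p >= tau
    over_budget = max(0.0, sum(gates) - budget)    # zero while within budget
    return sufficiency + lam * over_budget


def minimal_sufficient_k(confidence_by_k, tau=0.90):
    """Smallest K (1-indexed) whose top-K tiles reach confidence tau, else None."""
    for k, conf in enumerate(confidence_by_k, start=1):
        if conf >= tau:
            return k
    return None


# Hypothetical slide: confidence when keeping the top 1, 2, 3, 4 tiles.
msk = minimal_sufficient_k([0.40, 0.70, 0.92, 0.95], tau=0.90)
loss_ok = budgeted_sufficiency_loss(0.95, [1.0] * 5)
loss_bad = budgeted_sufficiency_loss(0.70, [1.0] * 10)
print(msk, loss_ok, loss_bad)
```

Averaging `minimal_sufficient_k` over slides would give a mean MSK in the spirit of the figure quoted above, under the stated interpretation.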

ARXIV Cancer: unknown Method: multimodal learning

MedVL-SAM2: A unified 3D medical vision-language model for multimodal reasoning and prompt-driven segmentation

Yang Xing, Jiong Wu, Savas Ozdemir, Ying Zhang, Yang Yang, Wei Shao, Kuang Gong
Published 2026-01-14 21:21

The paper presents MedVL-SAM2, a unified 3D medical vision-language model designed to enhance multimodal reasoning and prompt-driven segmentation in medical imaging. The model integrates image-level reasoning with pixel-level perception, enabling tasks such as report generation, visual question answering, and various forms of segmentation. It is trained on a large-scale dataset of 3D CT image-text pairs, achieving state-of-the-art performance across multiple tasks while demonstrating effective 3D visual grounding and cross-modal reasoning.

Recent progress in medical vision-language models (VLMs) has achieved strong performance on image-level text-centric tasks such as report generation and visual question answering (VQA). However, achieving fine-grained visual grounding and volumetric spatial reasoning in 3D medical VLMs remains challenging, particularly when aiming to unify these capabilities within a single, generalizable framework. To address this challenge, we propose MedVL-SAM2, a unified 3D medical multimodal model that concurrently supports report generation, VQA, and multi-paradigm segmentation, including semantic, referring, and interactive segmentation. MedVL-SAM2 integrates image-level reasoning and pixel-level perception through a cohesive architecture tailored for 3D medical imaging, and incorporates a SAM2-based volumetric segmentation module to enable precise multi-granular spatial reasoning. The model is trained in a multi-stage pipeline: it is first pre-trained on a large-scale corpus of 3D CT image-text pairs to align volumetric visual features with radiology-language embeddings. It is then jointly optimized with both language-understanding and segmentation objectives using a comprehensive 3D CT segmentation dataset. This joint training enables flexible interaction via language, point, or box prompts, thereby unifying high-level visual reasoning with spatially precise localization. Our unified architecture delivers state-of-the-art performance across report generation, VQA, and multiple 3D segmentation tasks. Extensive analyses further show that the model provides reliable 3D visual grounding, controllable interactive segmentation, and robust cross-modal reasoning, demonstrating that high-level semantic reasoning and precise 3D localization can be jointly achieved within a unified 3D medical VLM.

ARXIV Cancer: osteosarcoma Method: deep learning

Radiomics-Integrated Deep Learning with Hierarchical Loss for Osteosarcoma Histology Classification

Yaxi Chen, Zi Ye, Shaheer U. Saeed, Oliver Yu, Simin Ni, Jie Huang, Yipeng Hu
Published 2026-01-14 12:09

This study focuses on improving the histological classification of osteosarcoma by integrating radiomic features into a deep learning model. The authors propose a hierarchical loss approach for optimizing binary classification tasks, which enhances model performance and interpretability. Experimental results demonstrate significant improvements in classification accuracy on the TCIA OS Tumor Assessment dataset, establishing a new state-of-the-art performance for this application.

Osteosarcoma (OS) is an aggressive primary bone malignancy. Accurate histopathological assessment of viable versus non-viable tumor regions after neoadjuvant chemotherapy is critical for prognosis and treatment planning, yet manual evaluation remains labor-intensive, subjective, and prone to inter-observer variability. Recent advances in digital pathology have enabled automated necrosis quantification. Evaluation on test data sampled independently at the patient level revealed that deep learning model performance dropped significantly relative to the tile-level generalization ability reported in previous studies. First, this work proposes the use of radiomic features as additional input in model training. We show that, although these features are derived from the images themselves, such a multimodal input effectively improves classification performance, in addition to its added benefits in interpretability. Second, this work proposes to optimize two binary classification tasks with hierarchical classes (i.e. tumor-vs-non-tumor and viable-vs-non-viable), as opposed to the alternative "flat" three-class classification task (i.e. non-tumor, non-viable tumor, viable tumor), thereby enabling a hierarchical loss. We show that with such a hierarchical loss, using trainable weightings between the two tasks, the per-class performance can be improved significantly. Using the TCIA OS Tumor Assessment dataset, we experimentally demonstrate the benefits of each of the proposed new approaches and of their combination, setting what we consider a new state-of-the-art performance on this open dataset for this application. Code and trained models: https://github.com/YaxiiC/RadiomicsOS.git.

ARXIV Cancer: gastrointestinal cancer Method: knowledge distillation

Pairing-free Group-level Knowledge Distillation for Robust Gastrointestinal Lesion Classification in White-Light Endoscopy

Qiang Hu, Qimei Wang, Yingjie Guo, Qiang Li, Zhiwei Wang
Published 2026-01-14 06:24

This paper presents a novel framework called PaGKD for enhancing gastrointestinal lesion classification in white-light endoscopy by utilizing unpaired Narrow-Band Imaging (NBI) and white-light imaging (WLI) data. The method introduces a Pairing-free Group-level Knowledge Distillation approach that operates at the group level to distill knowledge across modalities without requiring paired images. Experimental results on four clinical datasets show that PaGKD significantly outperforms existing methods, achieving notable improvements in diagnostic performance.

White-Light Imaging (WLI) is the standard for endoscopic cancer screening, but Narrow-Band Imaging (NBI) offers superior diagnostic details. A key challenge is transferring knowledge from NBI to enhance WLI-only models, yet existing methods are critically hampered by their reliance on paired NBI-WLI images of the same lesion, a costly and often impractical requirement that leaves vast amounts of clinical data untapped. In this paper, we break this paradigm by introducing PaGKD, a novel Pairing-free Group-level Knowledge Distillation framework that enables effective cross-modal learning using unpaired WLI and NBI data. Instead of forcing alignment between individual, often semantically mismatched image instances, PaGKD operates at the group level to distill more complete and compatible knowledge across modalities. Central to PaGKD are two complementary modules: (1) Group-level Prototype Distillation (GKD-Pro) distills compact group representations by extracting modality-invariant semantic prototypes via shared lesion-aware queries; (2) Group-level Dense Distillation (GKD-Den) performs dense cross-modal alignment by guiding group-aware attention with activation-derived relation maps. Together, these modules enforce global semantic consistency and local structural coherence without requiring image-level correspondence. Extensive experiments on four clinical datasets demonstrate that PaGKD consistently and significantly outperforms state-of-the-art methods, achieving relative AUC improvements of 3.3%, 1.1%, 2.8%, and 3.2%, respectively, establishing a new direction for cross-modal learning from unpaired data.

ARXIV Cancer: colorectal cancer Method: equivariant convolutional neural network

Equi-ViT: Rotational Equivariant Vision Transformer for Robust Histopathology Analysis

Fuyao Chen, Yuexi Du, Elèonore V. Lieffrig, Nicha C. Dvornek, John A. Onofrey
Published 2026-01-14 04:03

This paper presents Equi-ViT, a novel Vision Transformer designed to improve histopathology analysis by incorporating rotational equivariance into its architecture. The method enhances the model's ability to handle variations in image orientation, which is critical in histopathology. Results on a colorectal cancer dataset indicate that Equi-ViT provides better data efficiency and robustness compared to standard ViTs.

Vision Transformers (ViTs) have gained rapid adoption in computational pathology for their ability to model long-range dependencies through self-attention, addressing the limitations of convolutional neural networks that excel at local pattern capture but struggle with global contextual reasoning. Recent pathology-specific foundation models have further advanced performance by leveraging large-scale pretraining. However, standard ViTs remain inherently non-equivariant to transformations such as rotations and reflections, which are ubiquitous variations in histopathology imaging. To address this limitation, we propose Equi-ViT, which integrates an equivariant convolution kernel into the patch embedding stage of a ViT architecture, imparting built-in rotational equivariance to learned representations. Equi-ViT achieves superior rotation-consistent patch embeddings and stable classification performance across image orientations. Our results on a public colorectal cancer dataset demonstrate that incorporating equivariant patch embedding enhances data efficiency and robustness, suggesting that equivariant transformers could potentially serve as more generalizable backbones for the application of ViT in histopathology, such as digital pathology foundation models.
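
To make the notion of rotational equivariance concrete, the sketch below checks the property for the simplest possible case: a convolution kernel that is invariant under 90-degree rotations. This is only an illustration in plain Python, not the equivariant kernel used in Equi-ViT; the helper names, the toy image, and the restriction to 90-degree rotations are assumptions.

```python
def rot90(m):
    """Rotate a 2-D list by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*m)][::-1]


def symmetrize(kernel):
    """Sum a square kernel over its four 90-degree rotations, so rot90(k) == k."""
    total, rotated = kernel, kernel
    for _ in range(3):
        rotated = rot90(rotated)
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, rotated)]
    return total


def correlate_valid(image, kernel):
    """Plain 'valid' cross-correlation of a 2-D image with a 2-D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + u][j + v] * kernel[u][v]
                for u in range(kh) for v in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


# Integer toy data keeps the comparison exact (no floating-point error).
image = [[(3 * i + 7 * j * j + i * j) % 11 for j in range(5)] for i in range(5)]
kernel = symmetrize([[1, 2, 0], [0, 3, -1], [4, 0, 1]])

rotate_then_filter = correlate_valid(rot90(image), kernel)
filter_then_rotate = rot90(correlate_valid(image, kernel))
print(rotate_then_filter == filter_then_rotate)
```

Rotating the input and then filtering gives the same feature map as filtering and then rotating, which is the behavior a standard, unconstrained patch embedding does not guarantee.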

ARXIV Cancer: brain cancer Method: wavelet diffusion model

POWDR: Pathology-preserving Outpainting with Wavelet Diffusion for 3D MRI

Fei Tan, Ashok Vardhan Addala, Bruno Astuto Arouche Nunes, Xucheng Zhu, Ravi Soni
Published 2026-01-14 00:20

The paper presents POWDR, a pathology-preserving outpainting framework designed to enhance 3D MRI datasets by addressing class imbalance and limited availability of pathology-rich cases. The method utilizes a conditioned wavelet diffusion model to generate anatomically plausible tissue while retaining real pathological regions. Evaluation on brain MRI datasets demonstrates improved tumor segmentation performance and confirms the method's effectiveness in generating diverse synthetic data for robust model development.

Medical imaging datasets often suffer from class imbalance and limited availability of pathology-rich cases, which constrains the performance of machine learning models for segmentation, classification, and vision-language tasks. To address this challenge, we propose POWDR, a pathology-preserving outpainting framework for 3D MRI based on a conditioned wavelet diffusion model. Unlike conventional augmentation or unconditional synthesis, POWDR retains real pathological regions while generating anatomically plausible surrounding tissue, enabling diversity without fabricating lesions. Our approach leverages wavelet-domain conditioning to enhance high-frequency detail and mitigate blurring common in latent diffusion models. We introduce a random connected mask training strategy to overcome conditioning-induced collapse and improve diversity outside the lesion. POWDR is evaluated on brain MRI using BraTS datasets and extended to knee MRI to demonstrate tissue-agnostic applicability. Quantitative metrics (FID, SSIM, LPIPS) confirm image realism, while diversity analysis shows significant improvement with random-mask training (cosine similarity reduced from 0.9947 to 0.9580; KL divergence increased from 0.00026 to 0.01494). Clinically relevant assessments reveal gains in tumor segmentation performance using nnU-Net, with Dice scores improving from 0.6992 to 0.7137 when adding 50 synthetic cases. Tissue volume analysis indicates no significant differences for CSF and GM compared to real images. These findings highlight POWDR as a practical solution for addressing data scarcity and class imbalance in medical imaging. The method is extensible to multiple anatomies and offers a controllable framework for generating diverse, pathology-preserving synthetic data to support robust model development.
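
The Dice scores quoted above measure overlap between a predicted and a reference segmentation mask. A minimal plain-Python sketch of the metric follows; the function name and the toy masks are illustrative assumptions, and the last line simply computes the absolute gain from the two reported scores.

```python
def dice_score(pred, target):
    """Dice coefficient for two flattened binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks: two voxels overlap, three voxels are set in each mask.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
score = dice_score(pred, target)

# Absolute Dice gain reported in the abstract when adding 50 synthetic cases.
absolute_gain = 0.7137 - 0.6992
print(score, absolute_gain)
```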

ARXIV Cancer: brain tumor Method: Bayesian U-Net

Variance-Penalized MC-Dropout as a Learned Smoothing Prior for Brain Tumour Segmentation

Satyaki Roy Chowdhury, Golrokh Mirzaei
Published 2026-01-13 19:50

This study presents UAMSA-UNet, an Uncertainty-Aware Multi-Scale Attention-based Bayesian U-Net designed for brain tumor segmentation. The method utilizes Monte Carlo Dropout to learn a data-driven smoothing prior, enhancing segmentation quality by reducing noise in tumor boundaries. Results indicate significant improvements in Dice Similarity Coefficient and mean IoU on the BraTS2023 and BraTS2024 datasets, while also achieving greater computational efficiency compared to existing models.

Brain tumor segmentation is essential for diagnosis and treatment planning, yet many CNN and U-Net based approaches produce noisy boundaries in regions of tumor infiltration. We introduce UAMSA-UNet, an Uncertainty-Aware Multi-Scale Attention-based Bayesian U-Net that instead leverages Monte Carlo Dropout to learn a data-driven smoothing prior over its predictions, while fusing multi-scale features and attention maps to capture both fine details and global context. Our smoothing-regularized loss augments binary cross-entropy with a variance penalty across stochastic forward passes, discouraging spurious fluctuations and yielding spatially coherent masks. On BraTS2023, UAMSA-UNet improves Dice Similarity Coefficient by up to 3.3% and mean IoU by up to 2.7% over U-Net; on BraTS2024, it delivers up to 4.5% Dice and 4.0% IoU gains over the best baseline. Remarkably, it also reduces FLOPs by 42.5% relative to U-Net++ while maintaining higher accuracy. These results demonstrate that, by combining multi-scale attention with a learned smoothing prior, UAMSA-UNet achieves both better segmentation quality and computational efficiency, and provides a flexible foundation for future integration with transformer-based modules for further enhanced segmentation results.
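
A loss of the kind described, binary cross-entropy plus a variance penalty over stochastic forward passes, might look like the plain-Python sketch below. The abstract does not give the exact formulation, so applying BCE to the mean prediction, averaging the variance over pixels, and the weight `lam` are assumptions made for illustration.

```python
import math


def variance_penalized_bce(passes, target, lam=0.1, eps=1e-7):
    """BCE on the mean MC-Dropout prediction plus a per-pixel variance penalty.

    passes : list of T lists of per-pixel probabilities, one list per
             stochastic forward pass (dropout kept active)
    target : list of 0/1 ground-truth labels, one per pixel
    lam    : weight of the variance term (hypothetical hyperparameter)
    """
    n_passes, n_pixels = len(passes), len(target)
    mean = [sum(p[i] for p in passes) / n_passes for i in range(n_pixels)]
    var = [
        sum((p[i] - mean[i]) ** 2 for p in passes) / n_passes
        for i in range(n_pixels)
    ]
    bce = 0.0
    for m, t in zip(mean, target):
        m = min(max(m, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce -= t * math.log(m) + (1 - t) * math.log(1.0 - m)
    return bce / n_pixels + lam * sum(var) / n_pixels


target = [1, 0]
stable = [[0.8, 0.2], [0.8, 0.2]]     # passes agree: zero variance
unstable = [[0.9, 0.1], [0.7, 0.3]]   # same mean, passes disagree
loss_stable = variance_penalized_bce(stable, target)
loss_unstable = variance_penalized_bce(unstable, target)
print(loss_stable, loss_unstable)
```

Both toy predictions have the same mean, so the cross-entropy terms are essentially equal and the difference in loss comes from the variance term, which penalizes disagreement between passes.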
