Research Papers

ARXIV Cancer: colorectal cancer Method: gated progressive fusion network

GPF-Net: Gated Progressive Fusion Learning for Polyp Re-Identification

Suncheng Xiang, Xiaoyang Wang, Junjie Jiang, Hejia Wang, Dahong Qian
Published 2025-12-25 02:40

This paper presents GPF-Net, a Gated Progressive Fusion network designed for the task of colonoscopic polyp re-identification. The method addresses the challenge of matching polyps from different views and cameras, which is crucial for colorectal cancer diagnosis. By employing a gated progressive fusion strategy, the architecture enhances feature interactions and improves the identification of small objects. Experimental results demonstrate the effectiveness of this approach compared to existing unimodal models.

Read abstract

Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras, which plays an important role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, the coarse resolution of high-level features of a specific polyp often leads to inferior results for small objects where detailed information is important. To address this challenge, we propose a novel architecture, named Gated Progressive Fusion network, to selectively fuse features from multiple levels using gates in a fully connected way for polyp ReID. On the basis of it, a gated progressive fusion strategy is introduced to achieve layer-wise refinement of semantic information through multi-level feature interactions. Experiments on standard benchmarks show the benefits of the multimodal setting over state-of-the-art unimodal ReID models, especially when combined with the specialized multimodal fusion strategy.

ARXIV Cancer: general cancer Method: vision-language model

A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding

Christina Liu, Alan Q. Wang, Joy Hsu, Jiajun Wu, Ehsan Adeli
Published 2025-12-24 20:30

The paper presents the Tool Bottleneck Framework (TBF) aimed at enhancing medical image understanding through the use of vision-language models (VLMs). TBF utilizes a learned Tool Bottleneck Model (TBM) to select and compose tools that extract clinically relevant features from images, improving prediction accuracy and interpretability. The framework is evaluated on histopathology and dermatology tasks, demonstrating performance comparable to or better than existing deep learning classifiers and tool-use frameworks, especially in data-limited scenarios.

Read abstract

Recent tool-use frameworks powered by vision-language models (VLMs) improve image understanding by grounding model predictions with specialized tools. Broadly, these frameworks leverage VLMs and a pre-specified toolbox to decompose the prediction task into multiple tool calls (often deep learning models) which are composed to make a prediction. The dominant approach to composing tools is using text, via function calls embedded in VLM-generated code or natural language. However, these methods often perform poorly on medical image understanding, where salient information is encoded as spatially-localized features that are difficult to compose or fuse via text alone. To address this, we propose a tool-use framework for medical image understanding called the Tool Bottleneck Framework (TBF), which composes VLM-selected tools using a learned Tool Bottleneck Model (TBM). For a given image and task, TBF leverages an off-the-shelf medical VLM to select tools from a toolbox that each extract clinically-relevant features. Instead of text-based composition, these tools are composed by the TBM, which computes and fuses the tool outputs using a neural network before outputting the final prediction. We propose a simple and effective strategy for TBMs to make predictions with any arbitrary VLM tool selection. Overall, our framework not only improves tool-use in medical imaging contexts, but also yields more interpretable, clinically-grounded predictors. We evaluate TBF on tasks in histopathology and dermatology and find that these advantages enable our framework to perform on par with or better than deep learning-based classifiers, VLMs, and state-of-the-art tool-use frameworks, with particular gains in data-limited regimes. Our code is available at https://github.com/christinaliu2020/tool-bottleneck-framework.

ARXIV Cancer: general cancer Method: transformer

TICON: A Slide-Level Tile Contextualizer for Histopathology Representation Learning

Varun Belagali, Saarthak Kapse, Pierre Marza, Srijan Das, Zilinghan Li, Sofiène Boutaj, Pushpak Pati, Srikar Yellapragada, Tarak Nath Nandi, Ravi K Madduri, Joel Saltz, Prateek Prasanna, Stergios Christodoulidis, Maria Vakalopoulou, Dimitris Samaras
Published 2025-12-24 18:58

The paper presents TICON, a transformer-based model designed to enhance the contextualization of tile representations in histopathology. By addressing the limitations of standard tile encoder pipelines, TICON produces rich embeddings that incorporate both local and global slide-level information. The model is pretrained using a masked modeling objective and demonstrates significant performance improvements across various computational pathology tasks, achieving state-of-the-art results on multiple benchmarks.

Read abstract

The interpretation of small tiles in large whole slide images (WSI) often needs a larger image context. We introduce TICON, a transformer-based tile representation contextualizer that produces rich, contextualized embeddings for ''any'' application in computational pathology. Standard tile encoder-based pipelines, which extract embeddings of tiles stripped from their context, fail to model the rich slide-level information essential for both local and global tasks. Furthermore, different tile-encoders excel at different downstream tasks. Therefore, a unified model is needed to contextualize embeddings derived from ''any'' tile-level foundation model. TICON addresses this need with a single, shared encoder, pretrained using a masked modeling objective to simultaneously unify and contextualize representations from diverse tile-level pathology foundation models. Our experiments demonstrate that TICON-contextualized embeddings significantly improve performance across many different tasks, establishing new state-of-the-art results on tile-level benchmarks (i.e., HEST-Bench, THUNDER, CATCH) and slide-level benchmarks (i.e., Patho-Bench). Finally, we pretrain an aggregator on TICON to form a slide-level foundation model, using only 11K WSIs, outperforming SoTA slide-level foundation models pretrained with up to 350K WSIs.

ARXIV Cancer: acute myeloid leukemia Method: metaheuristic algorithm

Transcriptome-Conditioned Personalized De Novo Drug Generation for AML Using Metaheuristic Assembly and Target-Driven Filtering

Abdullah G. Elafifi, Basma Mamdouh, Mariam Hanafy, Muhammed Alaa Eldin, Yosef Khaled, Nesma Mohamed El-Gelany, Tarek H. M. Abou-El-Enien
Published 2025-12-24 17:39

This study addresses the challenges of Acute Myeloid Leukemia (AML) by developing a computational framework that integrates patient-specific transcriptomics with drug discovery. The framework employs Weighted Gene Co-expression Network Analysis to identify key biomarkers and utilizes a novel metaheuristic algorithm for assembling drug-like ligands. The results indicate the potential for generating personalized drug candidates with favorable binding properties, contributing to precision oncology for AML.

Read abstract

Acute Myeloid Leukemia (AML) remains a clinical challenge due to its extreme molecular heterogeneity and high relapse rates. While precision medicine has introduced mutation-specific therapies, many patients still lack effective, personalized options. This paper presents a novel, end-to-end computational framework that bridges the gap between patient-specific transcriptomics and de novo drug discovery. By analyzing bulk RNA sequencing data from the TCGA-LAML cohort, the study utilized Weighted Gene Co-expression Network Analysis (WGCNA) to prioritize 20 high-value biomarkers, including metabolic transporters like HK3 and immune-modulatory receptors such as SIGLEC9. The physical structures of these targets were modeled using AlphaFold3, and druggable hotspots were quantitatively mapped via the DOGSiteScorer engine. Then developed a novel, reaction-first evolutionary metaheuristic algorithm as well as multi-objective optimization programming that assembles novel ligands from fragment libraries, guided by spatial alignment to these identified hotspots. The generative model produced structurally unique chemical entities with a strong bias toward drug-like space, as evidenced by QED scores peaking between 0.5 and 0.7. Validation through ADMET profiling and SwissDock molecular docking identified high-confidence candidates, such as Ligand L1, which achieved a binding free energy of -6.571 kcal/mol against the A08A96 biomarker. These results demonstrate that integrating systems biology with metaheuristic molecular assembly can produce pharmacologically viable, patient tailored leads, offering a scalable blueprint for precision oncology in AML and beyond

ARXIV Cancer: colorectal cancer Method: graph neural network

INSIGHT: Spatially resolved survival modelling from routine histology crosslinked with molecular profiling reveals prognostic epithelial-immune axes in stage II/III colorectal cancer

Piotr Keller, Mark Eastwood, Zedong Hu, Aimée Selten, Ruqayya Awan, Gertjan Rasschaert, Sara Verbandt, Vlad Popovici, Hubert Piessevaux, Hayley T Morris, Petros Tsantoulis, Thomas Alexander McKee, André D'Hoore, Cédric Schraepen, Xavier Sagaert, Gert De Hertogh, Sabine Tejpar, Fayyaz Minhas
Published 2025-12-24 14:36

The study introduces INSIGHT, a graph neural network designed to predict survival outcomes from routine histology images in stage II/III colorectal cancer. It was trained on datasets from TCGA and SURGEN, demonstrating superior prognostic performance compared to traditional pTNM staging. The model generates spatially resolved risk scores and identifies key histopathological features associated with patient outcomes.

Read abstract

Routine histology contains rich prognostic information in stage II/III colorectal cancer, much of which is embedded in complex spatial tissue organisation. We present INSIGHT, a graph neural network that predicts survival directly from routine histology images. Trained and cross-validated on TCGA (n=342) and SURGEN (n=336), INSIGHT produces patient-level spatially resolved risk scores. Large independent validation showed superior prognostic performance compared with pTNM staging (C-index 0.68-0.69 vs 0.44-0.58). INSIGHT spatial risk maps recapitulated canonical prognostic histopathology and identified nuclear solidity and circularity as quantitative risk correlates. Integrating spatial risk with data-driven spatial transcriptomic signatures, spatial proteomics, bulk RNA-seq, and single-cell references revealed an epithelium-immune risk manifold capturing epithelial dedifferentiation and fetal programs, myeloid-driven stromal states including $\mathrm{SPP1}^{+}$ macrophages and $\mathrm{LAMP3}^{+}$ dendritic cells, and adaptive immune dysfunction. This analysis exposed patient-specific epithelial heterogeneity, stratification within MSI-High tumours, and high-risk routes of CDX2/HNF4A loss and CEACAM5/6-associated proliferative programs, highlighting coordinated therapeutic vulnerabilities.

ARXIV Cancer: unknown Method: dual-stream vision transformer

A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI

Md Assaduzzaman, Nushrat Jahan Oyshi, Eram Mahamud
Published 2025-12-24 07:51

This study introduces a hybrid dual-stream deep learning framework for the classification of gastrointestinal diseases using endoscopic and histopathological imagery. The framework employs a teacher-student knowledge distillation approach, integrating a Swin Transformer and a compact Tiny-ViT model to enhance diagnostic accuracy while maintaining efficiency. The proposed method achieved high accuracy and interpretability, demonstrating its potential for practical clinical applications in gastrointestinal diagnostics.

Read abstract

The accurate classification of gastrointestinal diseases from endoscopic and histopathological imagery remains a significant challenge in medical diagnostics, mainly due to the vast data volume and subtle variation in inter-class visuals. This study presents a hybrid dual-stream deep learning framework built on teacher-student knowledge distillation, where a high-capacity teacher model integrates the global contextual reasoning of a Swin Transformer with the local fine-grained feature extraction of a Vision Transformer. The student network was implemented as a compact Tiny-ViT structure that inherits the teacher's semantic and morphological knowledge via soft-label distillation, achieving a balance between efficiency and diagnostic accuracy. Two carefully curated Wireless Capsule Endoscopy datasets, encompassing major GI disease classes, were employed to ensure balanced representation and prevent inter-sample bias. The proposed framework achieved remarkable performance with accuracies of 0.9978 and 0.9928 on Dataset 1 and Dataset 2 respectively, and an average AUC of 1.0000, signifying near-perfect discriminative capability. Interpretability analyses using Grad-CAM, LIME, and Score-CAM confirmed that the model's predictions were grounded in clinically significant tissue regions and pathologically relevant morphological cues, validating the framework's transparency and reliability. The Tiny-ViT demonstrated diagnostic performance with reduced computational complexity comparable to its transformer-based teacher while delivering faster inference, making it suitable for resource-constrained clinical environments. Overall, the proposed framework provides a robust, interpretable, and scalable solution for AI-assisted GI disease diagnosis, paving the way toward future intelligent endoscopic screening that is compatible with clinical practicality.

ARXIV Cancer: lung cancer Method: Dual-Graph Spatiotemporal Attention Network

DGSAN: Dual-Graph Spatiotemporal Attention Network for Pulmonary Nodule Malignancy Prediction

Xiao Yu, Zhaojie Fang, Guanyu Zhou, Yin Shen, Huoling Luo, Ye Li, Ahmed Elazab, Xiang Wan, Ruiquan Ge, Changmiao Wang
Published 2025-12-24 02:47

This study presents the Dual-Graph Spatiotemporal Attention Network (DGSAN) aimed at improving the prediction of pulmonary nodule malignancy in lung cancer. The method integrates temporal variations and multimodal data through a Global-Local Feature Encoder and a Hierarchical Cross-Modal Graph Fusion Module. Experimental results indicate that DGSAN significantly outperforms existing state-of-the-art methods in terms of classification accuracy and computational efficiency.

Read abstract

Lung cancer continues to be the leading cause of cancer-related deaths globally. Early detection and diagnosis of pulmonary nodules are essential for improving patient survival rates. Although previous research has integrated multimodal and multi-temporal information, outperforming single modality and single time point, the fusion methods are limited to inefficient vector concatenation and simple mutual attention, highlighting the need for more effective multimodal information fusion. To address these challenges, we introduce a Dual-Graph Spatiotemporal Attention Network, which leverages temporal variations and multimodal data to enhance the accuracy of predictions. Our methodology involves developing a Global-Local Feature Encoder to better capture the local, global, and fused characteristics of pulmonary nodules. Additionally, a Dual-Graph Construction method organizes multimodal features into inter-modal and intra-modal graphs. Furthermore, a Hierarchical Cross-Modal Graph Fusion Module is introduced to refine feature integration. We also compiled a novel multimodal dataset named the NLST-cmst dataset as a comprehensive source of support for related research. Our extensive experiments, conducted on both the NLST-cmst and curated CSTL-derived datasets, demonstrate that our DGSAN significantly outperforms state-of-the-art methods in classifying pulmonary nodules with exceptional computational efficiency.

ARXIV Cancer: breast cancer Method: multimodal mixed-supervision

NULLBUS: Multimodal Mixed-Supervision for Breast Ultrasound Segmentation via Nullable Global-Local Prompts

Raja Mallina, Bryar Shareef
Published 2025-12-23 21:30

This paper presents NullBUS, a multimodal mixed-supervision framework designed for breast ultrasound segmentation. The method utilizes nullable prompts to effectively learn from images with and without accompanying metadata, enhancing robustness in scenarios where data is incomplete. Evaluations on three public breast ultrasound datasets show that NullBUS achieves state-of-the-art performance, with a mean IoU of 0.8568 and a mean Dice of 0.9103.

Read abstract

Breast ultrasound (BUS) segmentation provides lesion boundaries essential for computer-aided diagnosis and treatment planning. While promptable methods can improve segmentation performance and tumor delineation when text or spatial prompts are available, many public BUS datasets lack reliable metadata or reports, constraining training to small multimodal subsets and reducing robustness. We propose NullBUS, a multimodal mixed-supervision framework that learns from images with and without prompts in a single model. To handle missing text, we introduce nullable prompts, implemented as learnable null embeddings with presence masks, enabling fallback to image-only evidence when metadata are absent and the use of text when present. Evaluated on a unified pool of three public BUS datasets, NullBUS achieves a mean IoU of 0.8568 and a mean Dice of 0.9103, demonstrating state-of-the-art performance under mixed prompt availability.

ARXIV Cancer: lung cancer Method: deep learning

Fairness Evaluation of Risk Estimation Models for Lung Cancer Screening

Shaurya Gaur, Michel Vitale, Alessa Hering, Johan Kwisthout, Colin Jacobs, Lena Philipp, Fennie van der Graaf
Published 2025-12-23 19:57

This study evaluates the fairness of two deep learning models for lung cancer risk estimation in screening high-risk individuals using low-dose CT scans. The models were assessed for performance disparities across demographic groups, revealing significant differences in AUROC and sensitivity between genders and racial groups. The findings underscore the need for improved monitoring of model performance to address potential biases in lung cancer screening.

Read abstract

Lung cancer is the leading cause of cancer-related mortality in adults worldwide. Screening high-risk individuals with annual low-dose CT (LDCT) can support earlier detection and reduce deaths, but widespread implementation may strain the already limited radiology workforce. AI models have shown potential in estimating lung cancer risk from LDCT scans. However, high-risk populations for lung cancer are diverse, and these models' performance across demographic groups remains an open question. In this study, we drew on the considerations on confounding factors and ethically significant biases outlined in the JustEFAB framework to evaluate potential performance disparities and fairness in two deep learning risk estimation models for lung cancer screening: the Sybil lung cancer risk model and the Venkadesh21 nodule risk estimator. We also examined disparities in the PanCan2b logistic regression model recommended in the British Thoracic Society nodule management guideline. Both deep learning models were trained on data from the US-based National Lung Screening Trial (NLST), and assessed on a held-out NLST validation set. We evaluated AUROC, sensitivity, and specificity across demographic subgroups, and explored potential confounding from clinical risk factors. We observed a statistically significant AUROC difference in Sybil's performance between women (0.88, 95% CI: 0.86, 0.90) and men (0.81, 95% CI: 0.78, 0.84, p < .001). At 90% specificity, Venkadesh21 showed lower sensitivity for Black (0.39, 95% CI: 0.23, 0.59) than White participants (0.69, 95% CI: 0.65, 0.73). These differences were not explained by available clinical confounders and thus may be classified as unfair biases according to JustEFAB. Our findings highlight the importance of improving and monitoring model performance across underrepresented subgroups, and further research on algorithmic fairness, in lung cancer screening.

ARXIV Cancer: general cancer Method: federated learning

FedPOD: the deployable units of training for federated learning

Daewoon Kim, Si Young Yie, Jae Sung Lee
Published 2025-12-23 18:57

This paper introduces FedPOD, a novel approach that ranked first in the 2024 Federated Tumor Segmentation Challenge, aimed at optimizing learning efficiency and communication costs in federated learning. The method enhances training efficiency by defining a round-wise task and incorporates a PID controller to model data distribution, reducing communication costs even with skewed data. FedPOD also addresses limitations of previous methods by including outlier participants and eliminating dependency on prior learning rounds, demonstrating comparable performance metrics to existing methods.

Read abstract

This paper proposes FedPOD, which ranked first in the 2024 Federated Tumor Segmentation (FeTS) Challenge, for optimizing learning efficiency and communication cost in federated learning among multiple clients. Inspired by FedPIDAvg, we define a round-wise task for FedPOD to enhance training efficiency. FedPIDAvg achieved performance improvement by incorporating the training loss reduction for prediction entropy as weights using differential terms. Furthermore, by modeling data distribution with a Poisson distribution and using a PID controller, it reduced communication costs even in skewed data distribution. However, excluding participants classified as outliers based on the Poisson distribution can limit data utilization. Additionally, PID controller requires the same participants to be maintained throughout the federated learning process as it uses previous rounds' learning information in the current round. In our approach, FedPOD addresses these issues by including participants excluded as outliers, eliminating dependency on previous rounds' learning information, and applying a method for calculating validation loss at each round. In this challenge, FedPOD presents comparable performance to FedPIDAvg in metrics of Dice score, 0.78, 0.71 and 0.72 for WT, ET and TC in average, and projected convergence score, 0.74 in average. Furthermore, the concept of FedPOD draws inspiration from Kubernetes' smallest computing unit, POD, designed to be compatible with Kubernetes auto-scaling. Extending round-wise tasks of FedPOD to POD units allows flexible design by applying scale-out similar to Kubernetes' auto-scaling. This work demonstrated the potentials of FedPOD to enhance federated learning by improving efficiency, flexibility, and performance in metrics.

Find the papers that actually matter