Research Papers

ARXIV Cancer: unknown Method: convolutional neural network

CAP-IQA: Context-Aware Prompt-Guided CT Image Quality Assessment

Kazi Ramisa Rifa, Jie Zhang, Abdullah Imran
Published 2026-01-04 17:30

The paper presents the Context-Aware Prompt-guided Image Quality Assessment (CAP-IQA) framework, which aims to improve CT image quality assessment by integrating text-level priors with instance-level context prompts. The method pairs a CNN-based visual encoder with a domain-specific text encoder to evaluate diagnostic visibility, anatomical clarity, and noise perception in abdominal CT images. On the 2023 LDCTIQA challenge benchmark, CAP-IQA reaches an overall correlation score of 2.8590, ahead of the top-ranked leaderboard team (2.7427). The model is also evaluated on an in-house dataset of 91,514 pediatric CT images to assess its generalizability.

Prompt-based methods, which encode medical priors through descriptive text, have been only minimally explored for CT Image Quality Assessment (IQA). While such prompts can embed prior knowledge about diagnostic quality, they often introduce bias by reflecting idealized definitions that may not hold under real-world degradations such as noise, motion artifacts, or scanner variability. To address this, we propose the Context-Aware Prompt-guided Image Quality Assessment (CAP-IQA) framework, which integrates text-level priors with instance-level context prompts and applies causal debiasing to separate idealized knowledge from factual, image-specific degradations. Our framework combines a CNN-based visual encoder with a domain-specific text encoder to assess diagnostic visibility, anatomical clarity, and noise perception in abdominal CT images. The model leverages radiology-style prompts and context-aware fusion to align semantic and perceptual representations. On the 2023 LDCTIQA challenge benchmark, CAP-IQA achieves an overall correlation score of 2.8590 (sum of PLCC, SROCC, and KROCC), surpassing the top-ranked leaderboard team (2.7427) by 4.24%. Moreover, our comprehensive ablation experiments confirm that prompt-guided fusion and the simplified encoder-only design jointly enhance feature alignment and interpretability. Furthermore, evaluation on an in-house dataset of 91,514 pediatric CT images demonstrates the true generalizability of CAP-IQA in assessing perceptual fidelity in a different patient population.
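The overall score quoted above is a sum of three correlation coefficients, so its maximum is 3. The sketch below shows one way such an aggregate could be computed and how the reported 4.24% margin follows from the two totals. The toy scores, the function names, and the use of Kendall's tau-a (no tie correction) are assumptions made for illustration; the challenge's official scoring script may differ.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall rank correlation (KROCC), tau-a variant (assumed here)."""
    total = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        total += (product > 0) - (product < 0)
    n_pairs = len(x) * (len(x) - 1) // 2
    return total / n_pairs


def overall_score(predicted, reference):
    """Sum of PLCC, SROCC, and KROCC, as described in the abstract."""
    return (pearson(predicted, reference)
            + spearman(predicted, reference)
            + kendall_tau_a(predicted, reference))


def relative_gain_percent(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100


# Hypothetical quality scores for five images (not data from the paper).
predicted = [0.5, 1.2, 2.1, 2.9, 3.8]
reference = [0.0, 1.0, 2.0, 3.0, 4.0]
toy_total = overall_score(predicted, reference)

# Totals reported in the abstract: CAP-IQA vs. the top leaderboard team.
reported_gain = relative_gain_percent(2.8590, 2.7427)
```

Because the toy predictions are strictly monotonic in the reference scores, both rank correlations equal 1 and only the PLCC term falls slightly short of it.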

ARXIV Cancer: general cancer Method: hypergraph diffusion

HyperNetWalk: A Unified Framework for Personalized and Population-Level Cancer Driver Gene Identification via Multi-Network Hypergraph Diffusion

Xueqing Xu, Yonghang Gao, Duanchen Sun, Ling-Yun Wu
Published 2026-01-04 02:49

The paper presents HyperNetWalk, a novel computational framework designed to identify cancer driver genes by integrating multiple biological networks and hypergraph diffusion. This method captures both personalized and cohort-level information through random walks on patient-specific subnetworks and refines predictions using hypergraph-based approaches. Evaluation across 12 TCGA cancer types shows that HyperNetWalk outperforms existing methods in identifying known driver genes and reveals cancer type-specific drivers, contributing to precision oncology.

Identifying cancer driver genes is crucial for understanding tumor biology and developing precision therapies. However, existing computational methods often rely on single biological networks or population-level mutation patterns, limiting their ability to identify patient-specific drivers and leverage the complementary information from multiple network types. Here, we present HyperNetWalk, a novel computational framework that integrates multiple biological networks and hypergraph diffusion to identify driver genes at both personalized and cohort levels. In the first stage, HyperNetWalk integrates protein-protein interaction networks, gene regulatory networks, and dynamic co-expression networks through sample-independent random walks on patient-specific subnetworks to capture topological importance and expression perturbation effects. In the second stage, it refines predictions through hypergraph-based random walks that leverage cross-sample information while preserving individual mutational contexts. Comprehensive evaluation on 12 TCGA cancer types demonstrates that HyperNetWalk achieves superior or competitive performance compared to state-of-the-art methods in both personalized and cohort-level predictions. Notably, HyperNetWalk successfully identifies known driver genes with high precision while revealing cancer type-specific drivers that reflect distinct biological mechanisms. Our framework provides a unified solution for personalized and population-based driver gene identification, offering valuable insights for precision oncology and therapeutic target discovery.

ARXIV Cancer: brain tumor Method: spectral-selective token mixer

S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss

Md. Sanaullah Chowdhury, Lameya Sabrin
Published 2026-01-03 21:03

This paper presents S2M-Net, a novel architecture for medical image segmentation that addresses the challenges of local precision, global context, and computational efficiency. The method incorporates a Spectral-Selective Token Mixer and a Morphology-Aware Adaptive Segmentation Loss to enhance performance while reducing the number of parameters. Evaluation across 16 medical imaging datasets shows S2M-Net achieving state-of-the-art results in polyp segmentation, surgical instrument segmentation, and brain tumor segmentation.

Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deployment on limited data and hardware, a trilemma that existing architectures fail to resolve. Convolutional networks provide local precision at $\mathcal{O}(n)$ cost but have limited receptive fields, while vision transformers achieve global context through $\mathcal{O}(n^2)$ self-attention at prohibitive computational expense, causing overfitting on small clinical datasets. We propose S2M-Net, a 4.7M-parameter architecture that achieves $\mathcal{O}(HW \log HW)$ global context through two synergistic innovations: (i) Spectral-Selective Token Mixer (SSTM), which exploits the spectral concentration of medical images via truncated 2D FFT with learnable frequency filtering and content-gated spatial projection, avoiding quadratic attention cost while maintaining global receptive fields; and (ii) Morphology-Aware Adaptive Segmentation Loss (MASL), which automatically analyzes structure characteristics (compactness, tubularity, irregularity, scale) to modulate five complementary loss components through constrained learnable weights, eliminating manual per-dataset tuning. Comprehensive evaluation on 16 medical imaging datasets spanning 8 modalities demonstrates state-of-the-art performance: 96.12% Dice on polyp segmentation, 83.77% on surgical instruments (+17.85% over the prior art), and 80.90% on brain tumors, with consistent 3-18% improvements over specialized baselines while using 3.5 to 6× fewer parameters than transformer-based methods.

ARXIV Cancer: unknown Method: deep learning

Enhancing Histopathological Image Classification via Integrated HOG and Deep Features with Robust Noise Performance

Ifeanyi Ezuma, Ugochukwu Ugwu
Published 2026-01-03 03:33

This study investigates the classification performance of machine learning and deep learning models on the LC25000 dataset, which consists of histopathological images. The fine-tuned InceptionResNet-v2 network was utilized for both classification and feature extraction, achieving a classification accuracy of 96.01% and an average AUC of 96.8%. The results indicate that models leveraging deep features significantly outperformed those using only the pre-trained network, with the Neural Network model achieving an AUC of 99.99%. Additionally, the study assessed model robustness under varying signal-to-noise ratio conditions.

The era of digital pathology has advanced histopathological examinations, making automated image analysis essential in clinical practice. This study evaluates the classification performance of machine learning and deep learning models on the LC25000 dataset, which includes five classes of histopathological images. We used the fine-tuned InceptionResNet-v2 network both as a classifier and for feature extraction. Our results show that the fine-tuned InceptionResNet-v2 achieved a classification accuracy of 96.01% and an average AUC of 96.8%. Models trained on deep features from InceptionResNet-v2 outperformed those using only the pre-trained network, with the Neural Network model achieving an AUC of 99.99% and accuracy of 99.84%. Evaluating model robustness under varying SNR conditions revealed that models using deep features exhibited greater resilience, particularly GBM and KNN. The combination of HOG and deep features showed enhanced performance, though less so in noisy environments.

ARXIV Cancer: acute lymphoblastic leukemia Method: attention-based convolutional neural network

Enhanced Leukemic Cell Classification Using Attention-Based CNN and Data Augmentation

Douglas Costa Braga, Daniel Oliveira Dantas
Published 2026-01-03 01:24

This study presents a deep learning pipeline for the classification of leukemic cells, specifically targeting acute lymphoblastic leukemia (ALL). The method utilizes an attention-based convolutional neural network, integrating EfficientNetV2-B3 with Squeeze-and-Excitation mechanisms, and employs data augmentation and focal loss to enhance performance. The system achieved a 97.89% F1-score and accuracy on the test set, outperforming existing approaches by up to 4.67% while using 89% fewer parameters than VGG16.

We present a reproducible deep learning pipeline for leukemic cell classification, focusing on system architecture, experimental robustness, and software design choices for medical image analysis. Acute lymphoblastic leukemia (ALL) is the most common childhood cancer, requiring expert microscopic diagnosis that suffers from inter-observer variability and time constraints. The proposed system integrates an attention-based convolutional neural network combining EfficientNetV2-B3 with Squeeze-and-Excitation mechanisms for automated ALL cell classification. Our approach employs comprehensive data augmentation, focal loss for class imbalance, and patient-wise data splitting to ensure robust and reproducible evaluation. On the C-NMC 2019 dataset (12,528 original images from 62 patients), the system achieves a 97.89% F1-score and 97.89% accuracy on the test set, with statistical validation through 100-iteration Monte Carlo experiments confirming significant improvements (p < 0.001) over baseline methods. The proposed pipeline outperforms existing approaches by up to 4.67% while using 89% fewer parameters than VGG16 (15.2M vs. 138M). The attention mechanism provides interpretable visualizations of diagnostically relevant cellular features, demonstrating that modern attention-based architectures can improve leukemic cell classification while maintaining computational efficiency suitable for clinical deployment.
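The abstract names focal loss as its remedy for class imbalance. As a minimal sketch, the binary form of focal loss can be written as below; the defaults `gamma=2.0` and `alpha=0.25` are commonly used values and are an assumption here, since the abstract does not state the settings used. The last lines check the reported parameter reduction from the two model sizes given in the abstract.

```python
import math


def binary_focal_loss(prob_positive, label, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    prob_positive: predicted probability of the positive class.
    label: 1 for the positive class, 0 for the negative class.
    gamma: focusing parameter; larger values down-weight easy examples.
    alpha: weight for the positive class (1 - alpha for the negative class).
    """
    p_t = prob_positive if label == 1 else 1.0 - prob_positive
    alpha_t = alpha if label == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction contributes far less than a wrong one.
easy_example = binary_focal_loss(0.9, 1)
hard_example = binary_focal_loss(0.1, 1)

# Parameter reduction implied by the abstract: 15.2M vs. 138M for VGG16.
parameter_reduction_percent = (1 - 15.2 / 138) * 100
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check for any implementation.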

ARXIV Cancer: unknown Method: agentic AI framework

An Explainable Agentic AI Framework for Uncertainty-Aware and Abstention-Enabled Acute Ischemic Stroke Imaging Decisions

Md Rashadul Islam
Published 2026-01-03 00:10

This paper presents an explainable agentic AI framework designed for uncertainty-aware and abstention-enabled decision support in acute ischemic stroke imaging. The framework includes a modular pipeline with agents for image analysis, uncertainty estimation, and decision-making, prioritizing clinical safety and transparency. Through qualitative and case-based analyses, the framework demonstrates effective handling of diagnostically ambiguous situations, integrating visual explanations to enhance trust in AI outputs.

Artificial intelligence models have shown strong potential in acute ischemic stroke imaging, particularly for lesion detection and segmentation using computed tomography and magnetic resonance imaging. However, most existing approaches operate as black-box predictors, producing deterministic outputs without explicit uncertainty awareness or structured mechanisms to abstain under ambiguous conditions. This limitation raises serious safety and trust concerns in high-risk emergency radiology settings. In this paper, we propose an explainable agentic AI framework for uncertainty-aware and abstention-enabled decision support in acute ischemic stroke imaging. The framework follows a modular agentic pipeline in which a perception agent performs lesion-aware image analysis, an uncertainty estimation agent computes slice-level predictive reliability, and a decision agent determines whether to issue a prediction or abstain based on predefined uncertainty thresholds. Unlike prior stroke imaging systems that primarily focus on improving segmentation or classification accuracy, the proposed framework explicitly prioritizes clinical safety, transparency, and clinician-aligned decision behavior. Qualitative and case-based analyses across representative stroke imaging scenarios demonstrate that uncertainty-driven abstention naturally emerges in diagnostically ambiguous regions and low-information slices. The framework further integrates visual explanation mechanisms to support both predictive and abstention decisions, addressing a key limitation of existing uncertainty-aware medical imaging systems. Rather than introducing a new performance benchmark, this work presents agentic control, uncertainty awareness, and selective abstention as essential design principles for developing safe and trustworthy medical imaging AI systems.
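The three-agent pipeline described above ends in a decision step that either predicts or abstains according to an uncertainty threshold. The sketch below illustrates that control flow only. Binary entropy as the uncertainty measure, the threshold of 0.8 bits, and every name in the code are assumptions for illustration, not the paper's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Decision:
    slice_index: int
    action: str  # "predict" or "abstain"
    lesion_probability: float
    uncertainty: float


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def decide(slice_probabilities, max_uncertainty=0.8):
    """Stand-in for the decision agent: abstain on slices above the threshold.

    slice_probabilities: per-slice lesion probabilities, assumed to come
    from a perception model that is not shown here.
    max_uncertainty: hypothetical threshold, in bits.
    """
    decisions = []
    for index, probability in enumerate(slice_probabilities):
        uncertainty = binary_entropy(probability)
        action = "abstain" if uncertainty > max_uncertainty else "predict"
        decisions.append(Decision(index, action, probability, uncertainty))
    return decisions


# Hypothetical per-slice outputs: confident negative, ambiguous, confident positive.
decisions = decide([0.02, 0.5, 0.97])
actions = [d.action for d in decisions]
```

Keeping the threshold as an explicit parameter mirrors the abstract's "predefined uncertainty thresholds" and makes the abstention rate easy to audit.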

ARXIV Cancer: pancreatic cancer Method: scale-aware adaptive supervised network

Scale-aware Adaptive Supervised Network with Limited Medical Annotations

Zihan Li, Dandan Shan, Yunxiang Li, Paul E. Kinahan, Qingqi Hong
Published 2026-01-02 23:55

This paper presents SASNet, a Scale-aware Adaptive Supervised Network designed to improve medical image segmentation in semi-supervised learning scenarios with limited annotations. The proposed dual-branch architecture integrates low-level and high-level feature representations through innovative mechanisms such as dynamic pixel-wise prediction weighting and 3D Fourier domain transformations. Evaluation on multiple datasets shows that SASNet outperforms existing semi-supervised methods and approaches the performance of fully supervised models.

Medical image segmentation faces critical challenges in semi-supervised learning scenarios due to severe annotation scarcity requiring expert radiological knowledge, significant inter-annotator variability across different viewpoints and expertise levels, and inadequate multi-scale feature integration for precise boundary delineation in complex anatomical structures. Existing semi-supervised methods demonstrate substantial performance degradation compared to fully supervised approaches, particularly in small target segmentation and boundary refinement tasks. To address these fundamental challenges, we propose SASNet (Scale-aware Adaptive Supervised Network), a dual-branch architecture that leverages both low-level and high-level feature representations through novel scale-aware adaptive reweight mechanisms. Our approach introduces three key methodological innovations, including the Scale-aware Adaptive Reweight strategy that dynamically weights pixel-wise predictions using temporal confidence accumulation, the View Variance Enhancement mechanism employing 3D Fourier domain transformations to simulate annotation variability, and segmentation-regression consistency learning through signed distance map algorithms for enhanced boundary precision. These innovations collectively address the core limitations of existing semi-supervised approaches by integrating spatial, temporal, and geometric consistency principles within a unified optimization framework. Comprehensive evaluation across LA, Pancreas-CT, and BraTS datasets demonstrates that SASNet achieves superior performance with limited labeled data, surpassing state-of-the-art semi-supervised methods while approaching fully supervised performance levels. The source code for SASNet is available at https://github.com/HUANGLIZI/SASNet.

ARXIV Cancer: skin cancer Method: deep learning

A Deep Learning Approach for Automated Skin Lesion Diagnosis with Explainable AI

Md. Maksudul Haque, Rahnuma Akter, A S M Ahsanul Sarkar Akib, Abdul Hasib
Published 2026-01-02 19:21

This paper presents a deep learning architecture for the automated classification of skin lesions using the HAM10000 dataset. The proposed system integrates advanced data balancing, augmentation techniques, and a hybrid EfficientNetV2-L framework with channel attention. The model achieves a total accuracy of 91.15% and demonstrates high performance across various lesion classes, particularly in identifying melanoma and melanocytic nevi. Additionally, the use of explainable AI techniques enhances the interpretability of the model's predictions.

Skin cancer is one of the most common and dangerous types of cancer in the world, and it requires timely and precise diagnosis. In this paper, we describe a deep learning architecture for multi-class skin lesion classification on the HAM10000 dataset. The proposed system combines data balancing methods, large-scale data augmentation, a hybrid EfficientNetV2-L framework with channel attention, and a three-stage progressive learning approach. We also use explainable AI (XAI) techniques such as Grad-CAM and saliency maps to produce interpretable visual representations of model predictions. Our approach achieves an overall accuracy of 91.15%, a macro F1 of 85.45%, and a micro-average AUC of 99.33%. The model performs well across all seven lesion classes, with particularly strong results on melanoma and melanocytic nevi. In addition to enhancing diagnostic transparency, XAI helps identify the visual characteristics that drive the classifications, which enhances clinical trustworthiness.

ARXIV Cancer: general cancer Method: vision-language model

Detecting Performance Degradation under Data Shift in Pathology Vision-Language Model

Hao Guan, Li Zhou
Published 2026-01-02 15:12

This study investigates the detection of performance degradation in Vision-Language Models (VLMs) used for pathology under data shift conditions. It introduces DomainSAT, a toolbox for analyzing input data shifts and proposes a label-free, confidence-based indicator for monitoring output performance. The findings indicate that combining input shift detection with output confidence indicators enhances the reliability of VLMs in tumor classification tasks.

Vision-Language Models have demonstrated strong potential in medical image analysis and disease diagnosis. However, after deployment, their performance may deteriorate when the input data distribution shifts from that observed during development. Detecting such performance degradation is essential for clinical reliability, yet remains challenging for large pre-trained VLMs operating without labeled data. In this study, we investigate performance degradation detection under data shift in a state-of-the-art pathology VLM. We examine both input-level data shift and output-level prediction behavior to understand their respective roles in monitoring model reliability. To facilitate systematic analysis of input data shift, we develop DomainSAT, a lightweight toolbox with a graphical interface that integrates representative shift detection algorithms and enables intuitive exploration of data shift. Our analysis shows that while input data shift detection is effective at identifying distributional changes and providing early diagnostic signals, it does not always correspond to actual performance degradation. Motivated by this observation, we further study output-based monitoring and introduce a label-free, confidence-based degradation indicator that directly captures changes in model prediction confidence. We find that this indicator exhibits a close relationship with performance degradation and serves as an effective complement to input shift detection. Experiments on a large-scale pathology dataset for tumor classification demonstrate that combining input data shift detection and output confidence-based indicators enables more reliable detection and interpretation of performance degradation in VLMs under data shift. These findings provide a practical and complementary framework for monitoring the reliability of foundation models in digital pathology.
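A label-free, confidence-based indicator of the kind described above can be illustrated by comparing the model's average top-class confidence on a reference batch with that on a deployment batch. This is a simplified sketch under stated assumptions: the mean-of-maximum-probability statistic, the function names, and the toy probabilities are all hypothetical, and the paper's actual indicator may be defined differently.

```python
def mean_max_confidence(probability_rows):
    """Average of the highest class probability over a batch of predictions."""
    return sum(max(row) for row in probability_rows) / len(probability_rows)


def confidence_drop(reference_rows, deployment_rows):
    """Positive values mean the model is less confident after deployment."""
    return mean_max_confidence(reference_rows) - mean_max_confidence(deployment_rows)


# Hypothetical class probabilities for a two-class tumor classifier.
reference_batch = [[0.9, 0.1], [0.8, 0.2]]     # development-time data
deployment_batch = [[0.6, 0.4], [0.55, 0.45]]  # data after a distribution shift

drop = confidence_drop(reference_batch, deployment_batch)
```

No labels are needed for either batch, which is what allows such a signal to be tracked continuously after deployment alongside input-level shift detection.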

ARXIV Cancer: melanoma Method: convolutional neural network

The Impact of Lesion Focus on the Performance of AI-Based Melanoma Classification

Tanay Donde
Published 2026-01-01 14:17

This study investigates the impact of lesion focus on the performance of AI-based melanoma classification. It employs convolutional neural networks and explores the relationship between lesion attention and diagnostic accuracy using various analytical methods. The results indicate that models with better alignment to lesion areas yield improved diagnostic performance, highlighting the importance of interpretable AI in medical diagnostics.

Melanoma is the most lethal subtype of skin cancer, and early and accurate detection of this disease can greatly improve patients' outcomes. Although machine learning models, especially convolutional neural networks (CNNs), have shown great potential in automating melanoma classification, their diagnostic reliability still suffers due to inconsistent focus on lesion areas. In this study, we analyze the relationship between lesion attention and diagnostic performance, involving masked images, bounding box detection, and transfer learning. We used multiple explainability and sensitivity analysis approaches to investigate how well models aligned their attention with lesion areas and how this alignment correlated with precision, recall, and F1-score. Results showed that models with a higher focus on lesion areas achieved better diagnostic performance, suggesting the potential of interpretable AI in medical diagnostics. This study provides a foundation for developing more accurate and trustworthy melanoma classification models in the future.
