Research Papers

ARXIV Cancer: unknown Method: CycleGAN

An Example for Domain Adaptation Using CycleGAN

Yanhua Zhao
Published 2026-01-13 18:08

This paper discusses the application of Cycle-Consistent Adversarial Network (CycleGAN) for domain adaptation in the medical field. It specifically focuses on the unpaired image-to-image translation from microscopy images to pseudo H&E stained histopathology images. The study illustrates the structure of the CycleGAN model and its potential in enhancing image analysis in medical diagnostics.

Cycle-Consistent Adversarial Networks (CycleGAN) are very promising for domain adaptation. This report explains an example from the medical domain. We present the structure of a CycleGAN model for unpaired image-to-image translation from microscopy images to pseudo H&E stained histopathology images.
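
For orientation, the cycle-consistency objective from the original CycleGAN formulation can be written as below. The abstract does not state which loss terms this report uses, so the notation is an assumption: $G$ maps microscopy images $x$ to pseudo H&E images, $F$ maps H&E images $y$ back to the microscopy domain, $D_X$ and $D_Y$ are the two discriminators, and $\lambda$ weights the cycle term.

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\text{cyc}}(G, F)
```

The cycle term is what allows training without paired images: each translated image must map back to its own source, which constrains the translation even though no ground-truth H&E counterpart exists.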

ARXIV Cancer: high-grade serous ovarian carcinoma Method: radiomics

Developing Predictive and Robust Radiomics Models for Chemotherapy Response in High-Grade Serous Ovarian Carcinoma

Sepideh Hatamikia, Geevarghese George, Florian Schwarzhans, Amirreza Mahbod, Marika AV Reinius, Ali Abbasian Ardakani, Mercedes Jimenez-Linan, Satish Viswanath, Mireia Crispin-Ortuzar, Lorena Escudero Sanchez, Evis Sala, James D Brenton, Ramona Woitek
Published 2026-01-13 11:29

This study focuses on improving the prediction of chemotherapy response in patients with high-grade serous ovarian carcinoma (HGSOC) using radiomics and machine learning. An automated randomization algorithm was employed to enhance feature selection, ensuring robustness and accuracy in predictions. The results indicated that the best prediction performance was achieved for volume reduction, with an AUC of 0.83, demonstrating the potential of radiomics in clinical applications for HGSOC patients.

Objectives: High-grade serous ovarian carcinoma (HGSOC) is typically diagnosed at an advanced stage with extensive peritoneal metastases, making treatment challenging. Neoadjuvant chemotherapy (NACT) is often used to reduce tumor burden before surgery, but about 40% of patients show limited response. Radiomics, combined with machine learning (ML), offers a promising non-invasive method for predicting NACT response by analyzing computed tomography (CT) imaging data. This study aimed to improve response prediction in HGSOC patients undergoing NACT by integrating different feature selection methods. Materials and methods: A framework for selecting robust radiomics features was introduced by employing an automated randomisation algorithm to mimic inter-observer variability, ensuring a balance between feature robustness and prediction accuracy. Four response metrics were used: chemotherapy response score (CRS), RECIST, volume reduction (VolR), and diameter reduction (DiaR). Lesions in different anatomical sites were studied. Pre- and post-NACT CT scans were used for feature extraction and model training on one cohort, and an independent cohort was used for external testing. Results: The best prediction performance was achieved using all lesions combined for VolR prediction, with an AUC of 0.83. Omental lesions provided the best results for CRS prediction (AUC 0.77), while pelvic lesions performed best for DiaR (AUC 0.76). Conclusion: The integration of robustness into the feature selection processes ensures the development of reliable models and thus facilitates the implementation of the radiomics models in clinical applications for HGSOC patients. Future work should explore further applications of radiomics in ovarian cancer, particularly in real-time clinical settings.

ARXIV Cancer: prostate cancer Method: multimodal learning

Tissue Classification and Whole-Slide Images Analysis via Modeling of the Tumor Microenvironment and Biological Pathways

Junzhuo Liu, Xuemei Du, Daniel Reisenbuchler, Ye Chen, Markus Eckstein, Christian Matek, Friedrich Feuerhake, Dorit Merhof
Published 2026-01-13 08:53

This study presents BioMorphNet, a multimodal network designed to integrate tissue morphological features and spatial gene expression for improved tissue classification and differential gene analysis. The model constructs a graph to represent relationships between tissue patches and incorporates clinical pathway features derived from spatial transcriptomic data. BioMorphNet demonstrates enhanced classification metrics across multiple cancer datasets, indicating its effectiveness in tumor localization and biomarker discovery.

Automatic integration of whole slide images (WSIs) and gene expression profiles has demonstrated substantial potential in precision clinical diagnosis and cancer progression studies. However, most existing studies focus on individual gene sequences and slide-level classification tasks, with limited attention to spatial transcriptomics and patch-level applications. To address this limitation, we propose a multimodal network, BioMorphNet, which automatically integrates tissue morphological features and spatial gene expression to support tissue classification and differential gene analysis. To account for morphological features, BioMorphNet constructs a graph to model the relationships between target patches and their neighbors, and adjusts the response strength based on morphological and molecular-level similarity, to better characterize the tumor microenvironment. In terms of multimodal interactions, BioMorphNet derives clinical pathway features from spatial transcriptomic data based on a predefined pathway database, serving as a bridge between tissue morphology and gene expression. In addition, a novel learnable pathway module is designed to automatically simulate the biological pathway formation process, providing a complementary representation to existing clinical pathways. Compared with the latest morphology-gene multimodal methods, BioMorphNet's average classification metrics improve by 2.67%, 5.48%, and 6.29% for prostate cancer, colorectal cancer, and breast cancer datasets, respectively. BioMorphNet not only classifies tissue categories within WSIs accurately to support tumor localization, but also analyzes differential gene expression between tissue categories based on prediction confidence, contributing to the discovery of potential tumor biomarkers.

ARXIV Cancer: general cancer Method: contrastive learning

Representation Learning with Semantic-aware Instance and Sparse Token Alignments

Phuoc-Nguyen Bui, Toan Duc Nguyen, Junghyun Bum, Duc-Tai Le, Hyunseung Choo
Published 2026-01-13 02:55

This paper presents a novel framework called SISTA, which enhances medical contrastive vision-language pre-training by incorporating semantic-aware instance and sparse token alignments. The method addresses the limitations of traditional contrastive learning by considering inter-report similarities to reduce false negatives and align image patches with relevant word tokens. Experimental results indicate that SISTA significantly improves performance on downstream tasks such as image classification, segmentation, and object detection, particularly in fine-grained tasks with limited labeled data.

Medical contrastive vision-language pre-training (VLP) has demonstrated significant potential in improving performance on downstream tasks. Traditional approaches typically employ contrastive learning, treating paired image-report samples as positives and unpaired ones as negatives. However, in medical datasets, there can be substantial similarities between images or reports from different patients. Rigidly treating all unpaired samples as negatives can disrupt the underlying semantic structure and negatively impact the quality of the learned representations. In this paper, we propose a multi-level alignment framework, Representation Learning with Semantic-aware Instance and Sparse Token Alignments (SISTA), which exploits the semantic correspondence between medical images and radiology reports at two levels, i.e., the image-report and patch-word levels. Specifically, we improve conventional contrastive learning by incorporating inter-report similarity to eliminate false negatives, and we introduce a method to effectively align image patches with relevant word tokens. Experimental results demonstrate the effectiveness of the proposed framework in improving transfer performance across different datasets on three downstream tasks: image classification, image segmentation, and object detection. Notably, our framework achieves significant improvements in fine-grained tasks even with limited labeled data. Codes and pre-trained models will be made available.
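
To make the false-negative issue concrete, here is a minimal sketch of an image-to-report InfoNCE loss that drops likely false negatives from the denominator. The abstract does not describe how SISTA uses inter-report similarity, so this is not the authors' method: the hard threshold, the function name `info_nce_with_masked_negatives`, and the toy numbers are all assumptions chosen for illustration.

```python
import math

def info_nce_with_masked_negatives(logits, report_sim, threshold=0.9):
    """Image-to-report InfoNCE loss that skips likely false negatives.

    logits[i][j]     : similarity of image i and report j (temperature-scaled)
    report_sim[i][j] : similarity of report i and report j, in [0, 1]
    A negative pair (i, j) is dropped from the denominator when report j is
    nearly identical to the true report i (report_sim at or above threshold).
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        kept = [
            logits[i][j]
            for j in range(n)
            if j == i or report_sim[i][j] < threshold
        ]
        # Numerically stable log-sum-exp over the retained candidates.
        m = max(kept)
        log_denominator = m + math.log(sum(math.exp(v - m) for v in kept))
        total += log_denominator - logits[i][i]
    return total / n

# Toy batch (hypothetical values): reports 0 and 1 describe nearly the same finding.
logits = [[5.0, 4.5, 0.1], [4.4, 5.0, 0.2], [0.3, 0.1, 5.0]]
report_sim = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.2], [0.1, 0.2, 1.0]]

masked = info_nce_with_masked_negatives(logits, report_sim, threshold=0.9)
# A threshold above 1 keeps every pair, which recovers plain InfoNCE.
plain = info_nce_with_masked_negatives(logits, report_sim, threshold=2.0)
print(f"masked: {masked:.4f}  plain: {plain:.4f}")
```

In the plain loss, image 0 is penalised for being close to report 1 even though the two reports are almost identical. Masking removes that penalty, so the masked loss is lower on this batch.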

ARXIV Cancer: breast cancer Method: diffusion model

PathoGen: Diffusion-Based Synthesis of Realistic Lesions in Histopathology Images

Mohamad Koohi-Moghadam, Mohammad-Ali Nikouei Mahani, Kyongtae Tyler Bae
Published 2026-01-13 01:45

The paper introduces PathoGen, a diffusion-based generative model designed to synthesize realistic lesions in histopathology images to address the scarcity of expert-annotated data. PathoGen enhances training datasets by generating high-fidelity lesions that maintain natural tissue boundaries and cellular structures. Validation across multiple datasets demonstrates its superiority over existing generative methods, leading to improved segmentation performance in data-scarce scenarios.

The development of robust artificial intelligence models for histopathology diagnosis is severely constrained by the scarcity of expert-annotated lesion data, particularly for rare pathologies and underrepresented disease subtypes. While data augmentation offers a potential solution, existing methods fail to generate sufficiently realistic lesion morphologies that preserve the complex spatial relationships and cellular architectures characteristic of histopathological tissues. Here we present PathoGen, a diffusion-based generative model that enables controllable, high-fidelity inpainting of lesions into benign histopathology images. Unlike conventional augmentation techniques, PathoGen leverages the iterative refinement process of diffusion models to synthesize lesions with natural tissue boundaries, preserved cellular structures, and authentic staining characteristics. We validate PathoGen across four diverse datasets representing distinct diagnostic challenges: kidney, skin, breast, and prostate pathology. Quantitative assessment confirms that PathoGen outperforms state-of-the-art generative baselines, including conditional GAN and Stable Diffusion, in image fidelity and distributional similarity. Crucially, we show that augmenting training sets with PathoGen-synthesized lesions enhances downstream segmentation performance compared to traditional geometric augmentations, particularly in data-scarce regimes. Besides, by simultaneously generating realistic morphology and pixel-level ground truth, PathoGen effectively overcomes the manual annotation bottleneck. This approach offers a scalable pathway for developing generalizable medical AI systems despite limited expert-labeled data.

ARXIV Cancer: colorectal cancer Method: foundation model

Robust Multicentre Detection and Classification of Colorectal Liver Metastases on CT: Application of Foundation Models

Shruti Atul Mali, Zohaib Salahuddin, Yumeng Zhang, Andre Aichert, Xian Zhong, Henry C. Woodruff, Maciej Bobowicz, Katrine Riklund, Juozas Kupčinskas, Lorenzo Faggioni, Roberto Francischello, Razvan L Miclea, Philippe Lambin
Published 2026-01-12 14:35

This study presents a foundation model-based AI pipeline designed for the detection and classification of colorectal liver metastases (CRLM) on contrast-enhanced CT scans. Utilizing data from the EuCanImage consortium and an external TCIA cohort, the model achieved a classification AUC of 0.90 and demonstrated improved performance with uncertainty quantification. The results indicate that this approach can enhance the reliability and interpretability of CRLM detection across diverse clinical settings.

Colorectal liver metastases (CRLM) are a major cause of cancer-related mortality, and reliable detection on CT remains challenging in multi-centre settings. We developed a foundation model-based AI pipeline for patient-level classification and lesion-level detection of CRLM on contrast-enhanced CT, integrating uncertainty quantification and explainability. CT data from the EuCanImage consortium (n=2437) and an external TCIA cohort (n=197) were used. Among several pretrained models, UMedPT achieved the best performance and was fine-tuned with an MLP head for classification and an FCOS-based head for lesion detection. The classification model achieved an AUC of 0.90 and a sensitivity of 0.82 on the combined test set, with a sensitivity of 0.85 on the external cohort. Excluding the most uncertain 20 percent of cases improved AUC to 0.91 and balanced accuracy to 0.86. Decision curve analysis showed clinical benefit for threshold probabilities between 0.30 and 0.40. The detection model identified 69.1 percent of lesions overall, increasing from 30 percent to 98 percent across lesion size quartiles. Grad-CAM highlighted lesion-corresponding regions in high-confidence cases. These results demonstrate that foundation model-based pipelines can support robust and interpretable CRLM detection and classification across heterogeneous CT data.
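
The step of excluding the most uncertain 20 percent of cases can be sketched as follows. The abstract does not say how uncertainty is quantified, so the use of binary predictive entropy, the helper names, and the example probabilities are assumptions, not the authors' pipeline.

```python
import math

def binary_entropy(p):
    """Predictive entropy (in bits) of a binary probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def keep_most_certain(probabilities, exclude_fraction=0.2):
    """Return sorted indices of cases kept after dropping the most uncertain ones."""
    n_exclude = int(len(probabilities) * exclude_fraction)
    # Order case indices from most certain (low entropy) to least certain.
    order = sorted(
        range(len(probabilities)),
        key=lambda i: binary_entropy(probabilities[i]),
    )
    kept = order[: len(probabilities) - n_exclude]
    return sorted(kept)

# Hypothetical patient-level probabilities of CRLM from a classifier.
probs = [0.97, 0.52, 0.10, 0.45, 0.88, 0.03, 0.70, 0.99, 0.20, 0.60]
kept = keep_most_certain(probs, exclude_fraction=0.2)
print(kept)  # indices 1 and 3 (p = 0.52 and p = 0.45) are excluded
```

Metrics such as AUC or balanced accuracy would then be recomputed on the retained cases only, and the excluded cases could be referred for manual review.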

ARXIV Cancer: general cancer Method: Comparison-based Reinforcement Policy Optimization

PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis

Jiao Xu, Junwei Liu, Jiangwei Lao, Qi Zhu, Yunpeng Zhao, Congyun Jin, Shinan Liu, Zhihong Lu, Lihe Zhang, Xin Chen, Jian Wang, Ping Wang
Published 2026-01-12 09:17

The paper presents PulseMind, a multi-modal diagnostic model designed to enhance real-world clinical diagnosis by integrating diverse inputs and ongoing contextual understanding. It introduces a comprehensive dataset, MediScope, consisting of 98,000 multi-turn consultations and 601,500 medical images across various clinical departments. The model employs a unique training framework called Comparison-based Reinforcement Policy Optimization (CRPO) to improve training stability and alignment with human preferences. Experimental results indicate that PulseMind performs competitively on diagnostic benchmarks.

Recent advances in medical multi-modal models focus on specialized image analysis such as dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world clinical diagnostics, which involve heterogeneous inputs and require ongoing contextual understanding during patient-physician interactions. To bridge this gap, we introduce PulseMind, a new family of multi-modal diagnostic models that integrates a systematically curated dataset, a comprehensive evaluation benchmark, and a tailored training framework. Specifically, we first construct a diagnostic dataset, MediScope, which comprises 98,000 real-world multi-turn consultations and 601,500 medical images, spanning over 10 major clinical departments and more than 200 sub-specialties. Then, to better reflect the requirements of real-world clinical diagnosis, we develop the PulseMind Benchmark, a multi-turn diagnostic consultation benchmark with a four-dimensional evaluation protocol comprising proactiveness, accuracy, usefulness, and language quality. Finally, we design a training framework tailored for multi-modal clinical diagnostics, centered around a core component named Comparison-based Reinforcement Policy Optimization (CRPO). Compared to absolute score rewards, CRPO uses relative preference signals from multi-dimensional comparisons to provide stable and human-aligned training guidance. Extensive experiments demonstrate that PulseMind achieves competitive performance on both the diagnostic consultation benchmark and public medical benchmarks.

ARXIV Cancer: general cancer Method: scene-appearance disentanglement

Learning Domain-Invariant Representations for Cross-Domain Image Registration via Scene-Appearance Disentanglement

Jiahao Qin, Yiwen Wang
Published 2026-01-12 07:14

This paper presents SAR-Net, a framework designed to tackle the challenge of image registration under domain shift in medical imaging. By employing scene-appearance disentanglement, the method decomposes images into domain-invariant representations and domain-specific appearance codes, facilitating registration through re-rendering. The empirical results demonstrate that SAR-Net outperforms existing methods on the ANHIR benchmark, achieving a median relative Target Registration Error of 0.25%.

Image registration under domain shift remains a fundamental challenge in computer vision and medical imaging: when source and target images exhibit systematic intensity differences, the brightness constancy assumption underlying conventional registration methods is violated, rendering correspondence estimation ill-posed. We propose SAR-Net, a unified framework that addresses this challenge through principled scene-appearance disentanglement. Our key insight is that observed images can be decomposed into domain-invariant scene representations and domain-specific appearance codes, enabling registration via re-rendering rather than direct intensity matching. We establish theoretical conditions under which this decomposition enables consistent cross-domain alignment (Proposition 1) and prove that our scene consistency loss provides a sufficient condition for geometric correspondence in the shared latent space (Proposition 2). Empirically, we validate SAR-Net on the ANHIR (Automatic Non-rigid Histological Image Registration) challenge benchmark, where multi-stain histopathology images exhibit coupled domain shift from different staining protocols and geometric distortion from tissue preparation. Our method achieves a median relative Target Registration Error (rTRE) of 0.25%, outperforming the state-of-the-art MEVIS method (0.27% rTRE) by 7.4%, with robustness of 99.1%. Code is available at https://github.com/D-ST-Sword/SAR-NET .
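
As a quick check on the reported numbers, the sketch below computes a relative Target Registration Error and the relative improvement quoted in the abstract. It assumes the usual ANHIR convention of normalising the landmark distance by the image diagonal. The landmark coordinates and image size are hypothetical, and the function names are illustrative.

```python
import math

def relative_tre(warped_landmarks, target_landmarks, image_width, image_height):
    """Per-landmark TRE divided by the image diagonal (rTRE)."""
    diagonal = math.hypot(image_width, image_height)
    return [
        math.hypot(wx - tx, wy - ty) / diagonal
        for (wx, wy), (tx, ty) in zip(warped_landmarks, target_landmarks)
    ]

def relative_improvement(baseline, ours):
    """Fractional reduction of an error metric relative to the baseline."""
    return (baseline - ours) / baseline

# Hypothetical landmarks on a 3000 x 4000 pixel image (diagonal = 5000 px).
warped = [(100.0, 100.0), (2000.0, 1500.0)]
target = [(103.0, 104.0), (2000.0, 1510.0)]
rtre = relative_tre(warped, target, 3000, 4000)
print([f"{100 * r:.2f}%" for r in rtre])

# Median rTRE values reported in the abstract: 0.27% (MEVIS) vs 0.25% (SAR-Net).
gain = relative_improvement(0.27, 0.25)
print(f"{100 * gain:.1f}%")
```

The 7.4% figure is therefore a relative reduction, (0.27 - 0.25) / 0.27, not a difference of 7.4 percentage points in rTRE.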

ARXIV Cancer: unknown Method: deep learning

Adversarial Attacks on Medical Hyperspectral Imaging Exploiting Spectral-Spatial Dependencies and Multiscale Features

Yunrui Gu, Zhenzhe Gao, Cong Kong, Zhaoxia Yin
Published 2026-01-11 20:28

This study investigates the vulnerabilities of medical hyperspectral imaging (HSI) in cancer diagnostics, particularly focusing on adversarial attacks that exploit spectral-spatial dependencies and multiscale features. The authors propose a targeted adversarial attack framework that includes a Local Pixel Dependency Attack and a Multiscale Information Attack. Experimental results indicate that these attacks significantly impair classification performance in tumor regions while being visually imperceptible. The findings highlight the necessity for robust defenses in clinical applications of medical HSI.

Medical hyperspectral imaging (HSI) enables accurate disease diagnosis by capturing rich spectral-spatial tissue information, but recent advances in deep learning have exposed its vulnerability to adversarial attacks. In this work, we identify two fundamental causes of this fragility: the reliance on local pixel dependencies for preserving tissue structure and the dependence on multiscale spectral-spatial representations for hierarchical feature encoding. Building on these insights, we propose a targeted adversarial attack framework for medical HSI, consisting of a Local Pixel Dependency Attack that exploits spatial correlations among neighboring pixels, and a Multiscale Information Attack that perturbs features across hierarchical spectral-spatial scales. Experiments on the Brain and MDC datasets demonstrate that our attacks significantly degrade classification performance, especially in tumor regions, while remaining visually imperceptible. Compared with existing methods, our approach reveals the unique vulnerabilities of medical HSI models and underscores the need for robust, structure-aware defenses in clinical applications.

ARXIV Cancer: glioblastoma Method: 3D convolutional neural network

Explainable Deep Radiogenomic Molecular Imaging for MGMT Methylation Prediction in Glioblastoma

Hasan M Jamil
Published 2026-01-11 19:16

This study presents a novel framework for the non-invasive prediction of MGMT promoter methylation in glioblastoma using radiogenomic molecular imaging. The approach integrates radiomics, deep learning, and explainable artificial intelligence to analyze MRI-derived features and correlate them with molecular labels. The framework utilizes a 3D convolutional neural network and applies XAI methods to enhance clinical interpretability, demonstrating its potential for precision oncology.

Glioblastoma (GBM) is a highly aggressive primary brain tumor with limited therapeutic options and poor prognosis. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) gene promoter is a critical molecular biomarker that influences patient response to temozolomide chemotherapy. Traditional methods for determining MGMT status rely on invasive biopsies and are limited by intratumoral heterogeneity and procedural risks. This study presents a radiogenomic molecular imaging analysis framework for the non-invasive prediction of MGMT promoter methylation using multi-parametric magnetic resonance imaging (mpMRI). Our approach integrates radiomics, deep learning, and explainable artificial intelligence (XAI) to analyze MRI-derived imaging phenotypes and correlate them with molecular labels. Radiomic features are extracted from FLAIR, T1-weighted, T1-contrast-enhanced, and T2-weighted MRI sequences, while a 3D convolutional neural network learns deep representations from the same modalities. These complementary features are fused using both early fusion and attention-based strategies and classified to predict MGMT methylation status. To enhance clinical interpretability, we apply XAI methods such as Grad-CAM and SHAP to visualize and explain model decisions. The proposed framework is trained on the RSNA-MICCAI Radiogenomic Classification dataset and externally validated on the BraTS 2021 dataset. This work advances the field of molecular imaging by demonstrating the potential of AI-driven radiogenomics for precision oncology, supporting non-invasive, accurate, and interpretable prediction of clinically actionable molecular biomarkers in GBM.
