Research Papers

ARXIV Cancer: general cancer Method: multimodal learning

MedGemma vs GPT-4: Open-Source and Proprietary Zero-shot Medical Disease Classification from Images

Md. Sazzadul Islam Prottasha, Nabil Walid Rafi
Published 2025-12-29 08:48

This study compares two AI architectures, MedGemma and GPT-4, for classifying six diseases from medical images, focusing on their diagnostic capabilities. The MedGemma model, fine-tuned with Low-Rank Adaptation (LoRA), achieved a mean test accuracy of 80.37%, outperforming the untuned GPT-4 (69.58%). The findings highlight the importance of domain-specific fine-tuning for diagnostic sensitivity, particularly in high-stakes clinical tasks such as cancer and pneumonia detection.

Multimodal Large Language Models (LLMs) introduce an emerging paradigm for medical imaging by interpreting scans through the lens of extensive clinical knowledge, offering a transformative approach to disease classification. This study presents a critical comparison between two fundamentally different AI architectures: the specialized open-source agent MedGemma and the proprietary large multimodal model GPT-4 for diagnosing six different diseases. The MedGemma-4b-it model, fine-tuned using Low-Rank Adaptation (LoRA), demonstrated superior diagnostic capability by achieving a mean test accuracy of 80.37% compared to 69.58% for the untuned GPT-4. Furthermore, MedGemma exhibited notably higher sensitivity in high-stakes clinical tasks, such as cancer and pneumonia detection. Quantitative analysis via confusion matrices and classification reports provides comprehensive insights into model performance across all categories. These results emphasize that domain-specific fine-tuning is essential for minimizing hallucinations in clinical implementation, positioning MedGemma as a sophisticated tool for complex, evidence-based medical reasoning.
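LoRA, the fine-tuning method named above, freezes a pretrained weight matrix and trains only a low-rank update. The sketch below is a minimal plain-Python illustration, assuming the standard LoRA formulation W' = W + (alpha / r) * B A with B initialized to zero. The helpers `matmul`, `lora_merge`, and `trainable_params`, along with the rank, scaling, and matrix sizes, are hypothetical and are not the paper's MedGemma configuration.

```python
import random

def matmul(x, y):
    """Multiply matrices stored as lists of rows: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def lora_merge(w, a, b, alpha, r):
    """Return the adapted weight W + (alpha / r) * B @ A."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

def trainable_params(d_out, d_in, r):
    """Trainable weights for one matrix: full fine-tuning vs. a rank-r LoRA adapter."""
    return d_out * d_in, r * (d_out + d_in)

rng = random.Random(0)
d_out, d_in, r, alpha = 8, 6, 2, 4                                   # toy sizes
w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen pretrained weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, r x d_in
b = [[0.0] * r for _ in range(d_out)]                                # trainable, zero-initialized

# Because B starts at zero, the adapted layer initially matches the pretrained one.
merged = lora_merge(w, a, b, alpha, r)
full, lora = trainable_params(4096, 4096, 16)
```

For a 4096 x 4096 projection, full fine-tuning updates about 16.8 million weights, while a rank-16 adapter trains 131,072, which is under 1% of that. This gap is why LoRA makes fine-tuning a multi-billion-parameter model practical.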

ARXIV Cancer: unknown Method: Deviation-Space Diffusion Model

PathoSyn: Imaging-Pathology MRI Synthesis via Disentangled Deviation Diffusion

Jian Wang, Sixing Rong, Jiarui Xing, Yuling Xu, Weide Liu
Published 2025-12-29 01:13

PathoSyn is a generative framework for MRI image synthesis that reformulates imaging-pathology as a disentangled additive deviation on a stable anatomical manifold. It addresses limitations of current generative models by decomposing the synthesis task into anatomical reconstruction and deviation modeling, utilizing a Deviation-Space Diffusion Model. The framework aims to generate high-fidelity patient-specific synthetic datasets, enhancing the development of diagnostic algorithms and supporting precision intervention planning. Evaluations indicate that PathoSyn outperforms existing methods in perceptual realism and anatomical fidelity.

We present PathoSyn, a unified generative framework for Magnetic Resonance Imaging (MRI) image synthesis that reformulates imaging-pathology as a disentangled additive deviation on a stable anatomical manifold. Current generative models typically operate in the global pixel domain or rely on binary masks; these paradigms often suffer from feature entanglement, leading to corrupted anatomical substrates or structural discontinuities. PathoSyn addresses these limitations by decomposing the synthesis task into deterministic anatomical reconstruction and stochastic deviation modeling. Central to our framework is a Deviation-Space Diffusion Model designed to learn the conditional distribution of pathological residuals, thereby capturing localized intensity variations while preserving global structural integrity by construction. To ensure spatial coherence, the diffusion process is coupled with a seam-aware fusion strategy and an inference-time stabilization module, which collectively suppress boundary artifacts and produce high-fidelity internal lesion heterogeneity. PathoSyn provides a mathematically principled pipeline for generating high-fidelity patient-specific synthetic datasets, facilitating the development of robust diagnostic algorithms in low-data regimes. By allowing interpretable counterfactual disease progression modeling, the framework supports precision intervention planning and provides a controlled environment for benchmarking clinical decision-support systems. Quantitative and qualitative evaluations on tumor imaging benchmarks demonstrate that PathoSyn significantly outperforms holistic diffusion and mask-conditioned baselines in both perceptual realism and anatomical fidelity. The source code of this work will be made publicly available.
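The "additive deviation" idea can be made concrete with an example. Suppose a pathological image is modeled as x = a + r, where a is the reconstructed anatomy and r is the pathological residual. A deviation-space diffusion model then noises and denoises only r. The sketch below assumes a standard DDPM-style forward process with a linear beta schedule, applied to a flattened toy residual. The schedule values, sizes, and function names are illustrative assumptions, not PathoSyn's published settings, and the learned reverse (denoising) model is not shown.

```python
import math
import random

def linear_alpha_bars(steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule (requires steps >= 2)."""
    alpha_bars, prod = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_residual(r0, alpha_bar, rng):
    """Forward step on the residual only: r_t = sqrt(ab) * r0 + sqrt(1 - ab) * eps."""
    return [math.sqrt(alpha_bar) * v + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for v in r0]

anatomy = [0.2, 0.4, 0.4, 0.2]   # toy stand-in for the reconstructed healthy anatomy a
lesion = [0.0, 0.3, 0.5, 0.0]    # toy stand-in for the pathological residual r0
alpha_bars = linear_alpha_bars(1000)
rng = random.Random(0)

r_t = noise_residual(lesion, alpha_bars[500], rng)
# The anatomy term is added back unchanged: only the deviation is diffused.
x_t = [a + r for a, r in zip(anatomy, r_t)]
```

Because the anatomy term never enters the noising process, any stochasticity in the sample stays confined to the residual. This is the sense in which the abstract says global structure is preserved "by construction".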

ARXIV Cancer: melanoma Method: deep learning

INTERACT-CMIL: Multi-Task Shared Learning and Inter-Task Consistency for Conjunctival Melanocytic Intraepithelial Lesion Grading

Mert Ikinci, Luna Toma, Karin U. Loeffler, Leticia Ussem, Daniela Süsskind, Julia M. Weller, Yousef Yeganeh, Martina C. Herwig-Carl, Shadi Albarqouni
Published 2025-12-27 17:37

The paper presents INTERACT-CMIL, a multi-head deep learning framework designed for grading Conjunctival Melanocytic Intraepithelial Lesions (CMIL). The framework jointly predicts five histopathological axes using Shared Feature Learning and an Inter-Dependence Loss that enforces consistency across tasks. Evaluated on a multi-center dataset of 486 expert-annotated conjunctival biopsy patches from three university hospitals, INTERACT-CMIL improves consistently over CNN and foundation-model baselines, with relative macro F1 gains of up to 55.1% (WHO4) and 25.0% (vertical spread). The results indicate its potential as a reproducible benchmark for CMIL diagnosis.

Accurate grading of Conjunctival Melanocytic Intraepithelial Lesions (CMIL) is essential for treatment and melanoma prediction but remains difficult due to subtle morphological cues and interrelated diagnostic criteria. We introduce INTERACT-CMIL, a multi-head deep learning framework that jointly predicts five histopathological axes (WHO4, WHO5, horizontal spread, vertical spread, and cytologic atypia) through Shared Feature Learning with Combinatorial Partial Supervision and an Inter-Dependence Loss enforcing cross-task consistency. Trained and evaluated on a newly curated, multi-center dataset of 486 expert-annotated conjunctival biopsy patches from three university hospitals, INTERACT-CMIL achieves consistent improvements over CNN and foundation-model (FM) baselines, with relative macro F1 gains up to 55.1% (WHO4) and 25.0% (vertical spread). The framework provides coherent, interpretable multi-criteria predictions aligned with expert grading, offering a reproducible computational benchmark for CMIL diagnosis and a step toward standardized digital ocular pathology.
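"Combinatorial Partial Supervision" suggests that not every patch carries labels for all five axes. One common way to train a multi-head model under missing labels is to average each head's loss only over the annotations that are present. The sketch below assumes per-head cross-entropy, with `None` marking a missing label. The head names follow the abstract, but the class counts, probabilities, and masking scheme are illustrative assumptions. The paper's Inter-Dependence Loss is not reproduced here.

```python
import math

HEADS = ["WHO4", "WHO5", "horizontal_spread", "vertical_spread", "cytologic_atypia"]

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class under a predicted distribution."""
    return -math.log(max(probs[label], 1e-12))

def partial_multitask_loss(predictions, labels):
    """Average cross-entropy over the heads that actually have an annotation.

    predictions: dict head -> list of class probabilities
    labels:      dict head -> class index, or None when that axis is unannotated
    """
    losses = [cross_entropy(predictions[h], labels[h])
              for h in HEADS if labels.get(h) is not None]
    return sum(losses) / len(losses) if losses else 0.0

preds = {h: [0.7, 0.2, 0.1] for h in HEADS}   # hypothetical 3-class outputs per head
labels = {"WHO4": 0, "WHO5": None, "horizontal_spread": 1,
          "vertical_spread": None, "cytologic_atypia": 0}
loss = partial_multitask_loss(preds, labels)
```

Averaging only over observed labels means a patch annotated for two axes still contributes gradient to those two heads, instead of being discarded from training.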

ARXIV Cancer: lung cancer Method: deep learning

Leveraging Machine Learning for Early Detection of Lung Diseases

Bahareh Rahmani, Harsha Reddy Bindela, Rama Kanth Reddy Gosula, Krishna Yedubati, Mohammad Amir Salari, Leslie Hinyard, Payam Norouzzadeh, Eli Snir, Martin Schoen
Published 2025-12-27 16:50

This study applies deep learning to the early detection of lung diseases, including COVID-19, lung cancer, and pneumonia, from chest X-rays. By combining traditional image processing with neural networks (CNNs, VGG16, InceptionV3, and EfficientNetB0), the research aims to provide rapid and accurate diagnostic solutions. The trained and validated models achieved high accuracy, precision, recall, and F1 scores, which the authors take as evidence of their reliability for real-world applications.

Combining traditional image processing methods with advanced neural networks supports a predictive and preventive healthcare paradigm. This study offers rapid, accurate, and non-invasive diagnostic solutions that can significantly impact patient outcomes, particularly in areas with limited access to radiologists and healthcare resources. In this project, deep learning methods are applied to enhance the diagnosis of respiratory diseases such as COVID-19, lung cancer, and pneumonia from chest X-rays. We trained and validated various neural network models, including CNNs, VGG16, InceptionV3, and EfficientNetB0, which achieved high accuracy, precision, recall, and F1 scores, highlighting the models' reliability and potential in real-world diagnostic applications.
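The abstract names accuracy, precision, recall, and F1 without reporting values. For reference, this sketch shows how those metrics are computed for a multi-class chest X-ray task. The class names and toy predictions are hypothetical. Macro averaging (an unweighted mean over classes) is an assumption, since the abstract does not say which average was used.

```python
def classification_report(y_true, y_pred, classes):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}

classes = ["covid19", "lung_cancer", "pneumonia"]
y_true = ["covid19", "covid19", "lung_cancer", "pneumonia", "pneumonia", "lung_cancer"]
y_pred = ["covid19", "pneumonia", "lung_cancer", "pneumonia", "pneumonia", "covid19"]
report = classification_report(y_true, y_pred, classes)
```

Macro averaging weights every disease equally. For a rarer class such as lung cancer, this can diverge noticeably from plain accuracy, so it is worth stating which average a paper reports.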

ARXIV Cancer: glioma Method: radiomics

ReFRM3D: A Radiomics-enhanced Fused Residual Multiparametric 3D Network with Multi-Scale Feature Fusion for Glioma Characterization

Md. Abdur Rahman, Mohaimenul Azam Khan Raiaan, Arefin Ittesafun Abian, Yan Zhang, Mirjam Jonkman, Sami Azam
Published 2025-12-27 12:12

This study presents ReFRM3D, a novel radiomics-enhanced fused residual multiparametric 3D network designed for the characterization of gliomas. The method uses multi-parametric MRI data to improve tumor segmentation and classification efficiency, and adds a classifier built on radiomic features extracted from the segmented regions. Experimental results on BraTS2019, BraTS2020, and BraTS2021 show Dice Similarity Coefficients above 90% for the whole tumor, enhancing tumor, and tumor core regions on all three datasets.

Gliomas are among the most aggressive cancers, characterized by high mortality rates and complex diagnostic processes. Existing studies on glioma diagnosis and classification often describe issues such as high variability in imaging data, inadequate optimization of computational resources, and inefficient segmentation and classification of gliomas. To address these challenges, we propose novel techniques utilizing multi-parametric MRI data to enhance tumor segmentation and classification efficiency. Our work introduces the first-ever radiomics-enhanced fused residual multiparametric 3D network (ReFRM3D) for brain tumor characterization, which is based on a 3D U-Net architecture and features multi-scale feature fusion, hybrid upsampling, and an extended residual skip mechanism. Additionally, we propose a multi-feature tumor marker-based classifier that leverages radiomic features extracted from the segmented regions. Experimental results demonstrate significant improvements in segmentation performance across the BraTS2019, BraTS2020, and BraTS2021 datasets, achieving high Dice Similarity Coefficients (DSC) of 94.04%, 92.68%, and 93.64% for whole tumor (WT), enhancing tumor (ET), and tumor core (TC) respectively in BraTS2019; 94.09%, 92.91%, and 93.84% in BraTS2020; and 93.70%, 90.36%, and 92.13% in BraTS2021.
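The Dice Similarity Coefficient reported above is DSC = 2|P ∩ G| / (|P| + |G|) for a predicted mask P and a ground-truth mask G. In BraTS, the three evaluated regions are nested unions of the annotated labels. The sketch assumes the BraTS 2019/2020 label convention (1 = necrotic/non-enhancing tumor core, 2 = peritumoral edema, 4 = enhancing tumor). Short flattened label lists stand in for 3D volumes, and the helper names are illustrative.

```python
REGIONS = {
    "WT": {1, 2, 4},   # whole tumor: every tumor label
    "TC": {1, 4},      # tumor core: necrotic/non-enhancing core plus enhancing tumor
    "ET": {4},         # enhancing tumor only
}

def dice(pred_mask, true_mask):
    """DSC = 2|P ∩ G| / (|P| + |G|); defined as 1.0 when both masks are empty."""
    inter = sum(p and g for p, g in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    return 1.0 if total == 0 else 2.0 * inter / total

def region_dice(pred_labels, true_labels):
    """Dice for each BraTS region, computed from voxel-wise label maps."""
    scores = {}
    for name, labels in REGIONS.items():
        p = [v in labels for v in pred_labels]
        g = [v in labels for v in true_labels]
        scores[name] = dice(p, g)
    return scores

truth = [0, 2, 2, 1, 4, 4, 0, 0]
pred = [0, 2, 1, 1, 4, 0, 0, 2]
scores = region_dice(pred, truth)
```

Because the regions are nested, a voxel mislabeled as edema instead of necrotic core hurts TC but not WT. This is one reason WT scores in BraTS papers, including the ones above, tend to be the highest.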

ARXIV Cancer: oncology Method: DistilBERT

ADMEDTAGGER: an annotation framework for distillation of expert knowledge for the Polish medical language

Franciszek Górski, Andrzej Czyżewski
Published 2025-12-27 10:00

This paper presents ADMEDTAGGER, an annotation framework that uses a multilingual LLM (Llama3.1) as a teacher model to distill the expert knowledge needed to tag Polish medical texts across five clinical categories. Because manual annotation resources were limited, the LLM labeled the corpus, and only a human-verified subset of those labels was kept as the test set. Three BERT-based classifiers (DistilBERT, BioBERT, and HerBERT) were trained on the LLM-labeled data. DistilBERT performed best, with F1 > 0.80 in every clinical category and F1 > 0.93 in three of them, making it a compact alternative to large language models.

In this work, we present an annotation framework that demonstrates how a multilingual LLM pretrained on a large corpus can be used as a teacher model to distill the expert knowledge needed for tagging medical texts in Polish. This work is part of a larger project called ADMEDVOICE, within which we collected an extensive corpus of medical texts representing five clinical categories - Radiology, Oncology, Cardiology, Hypertension, and Pathology. Using this data, we had to develop a multi-class classifier, but the fundamental problem turned out to be the lack of resources for annotating an adequate number of texts. Therefore, in our solution, we used the multilingual Llama3.1 model to annotate an extensive corpus of medical texts in Polish. Using our limited annotation resources, we verified only a portion of these labels, creating a test set from them. The data annotated in this way were then used for training and validation of 3 different types of classifiers based on the BERT architecture - the distilled DistilBERT model, BioBERT fine-tuned on medical data, and HerBERT fine-tuned on the Polish language corpus. Among the models we trained, the DistilBERT model achieved the best results, reaching an F1 score > 0.80 for each clinical category and an F1 score > 0.93 for 3 of them. In this way, we obtained a series of highly effective classifiers that represent an alternative to large language models, due to their nearly 500 times smaller size, 300 times lower GPU VRAM consumption, and several hundred times faster inference.
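The distillation workflow above separates teacher-labeled training data from a human-verified test set. This minimal sketch shows that data flow only. `build_splits` is a hypothetical helper, and `teacher` is any callable mapping text to a category; in the paper's setting it would wrap a prompt to Llama3.1, whose interface is not shown. Dropping off-schema teacher outputs is an assumption.

```python
from typing import Callable, Dict, List, Tuple

CATEGORIES = ("Radiology", "Oncology", "Cardiology", "Hypertension", "Pathology")

def build_splits(
    corpus: List[Tuple[int, str]],
    verified: Dict[int, str],
    teacher: Callable[[str], str],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Label the corpus with a teacher model and hold out expert-verified items.

    corpus:   (doc_id, text) pairs
    verified: doc_id -> category confirmed by a human expert (becomes the test set)
    teacher:  callable mapping a text to a category (hypothetical LLM wrapper)
    """
    train, test = [], []
    for doc_id, text in corpus:
        if doc_id in verified:
            test.append((text, verified[doc_id]))   # gold label from the expert
        else:
            label = teacher(text)                  # silver label from the teacher
            if label in CATEGORIES:                # discard off-schema outputs (assumption)
                train.append((text, label))
    return train, test

# Usage with a stand-in teacher; a real run would query the LLM instead.
corpus = [(1, "opis badania TK"), (2, "wynik EKG"), (3, "ciśnienie tętnicze 150/95")]
verified = {2: "Cardiology"}
train, test = build_splits(corpus, verified, teacher=lambda text: "Radiology")
```

The training split would then be further divided into training and validation sets before fine-tuning DistilBERT, BioBERT, or HerBERT. Keeping the verified items out of training ensures the reported F1 scores are measured against expert labels rather than the teacher's.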

ARXIV Cancer: breast cancer Method: vision transformer

Feature Learning with Multi-Stage Vision Transformers on Inter-Modality HER2 Status Scoring and Tumor Classification on Whole Slides

Olaide N. Oyelade, Oliver Hoxey, Yulia Humrye
Published 2025-12-26 17:45

This study presents a novel approach for HER2 status scoring and tumor classification using a multi-stage vision transformer pipeline on whole slide images (WSIs). The method combines patch-wise processing of hematoxylin and eosin (H&E) images for tumor localization with mapping to the corresponding immunohistochemistry (IHC) stained images for HER2 scoring. On privately curated data from 13 cases, the pipeline achieved a classification accuracy of 0.94 and a specificity of 0.933 for 4-way HER2 status prediction. Its applicability was also examined on WSI patches in comparison with human pathologists.

The widespread use of histopathology images, such as hematoxylin and eosin (H&E) stained slides, has proven useful for detecting tumors. However, moving such cancer cases forward for treatment requires accurate quantification of human epidermal growth factor receptor 2 (HER2) protein expression. Predicting both the lower and higher levels of HER2 can be challenging. Moreover, jointly analyzing H&E and immunohistochemistry (IHC) stained images for HER2 scoring is difficult. Although several deep learning methods have been investigated to address the challenge of HER2 scoring, they fall short of providing pixel-level localization of HER2 status. In this study, we propose a single end-to-end pipeline using a system of vision transformers for HER2 status scoring on whole slide images (WSIs). The method includes patch-wise processing of H&E WSIs for tumor localization. A novel mapping function is proposed to identify the IHC WSI regions that correspond to malignant regions on H&E. A clinically inspired HER2 scoring mechanism is embedded in the pipeline and allows for automatic pixel-level annotation of 4-way HER2 scoring (0, 1+, 2+, and 3+). The proposed method also accurately distinguishes HER2-negative from HER2-positive cases. Privately curated datasets were collaboratively extracted from 13 different cases of H&E and IHC WSIs. A thorough experiment was conducted on the proposed method. Results showed good classification accuracy during tumor localization. In addition, a classification accuracy of 0.94 and a specificity of 0.933 were obtained for the prediction of HER2 status in the 4-way scoring setting. The applicability of the proposed pipeline was investigated on WSI patches in comparison with human pathologists. Findings from the study showed the usability of jointly evaluated H&E and IHC images in end-to-end ViT-based models for HER2 scoring.
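The abstract reports both 4-way IHC scores (0, 1+, 2+, 3+) and a binary HER2-negative/positive call. The sketch below collapses the four scores using the ASCO/CAP convention: 0 and 1+ are negative, 3+ is positive, and 2+ is equivocal (resolved by in situ hybridization in clinical practice). It then computes specificity as TN / (TN + FP). The abstract does not state how the paper handles 2+ cases, so both the mapping and the choice to exclude equivocal cases from the binary metric are assumptions. The helper names are illustrative.

```python
def her2_status(score):
    """Map an IHC score to a status; 2+ is equivocal and would go to ISH in practice."""
    return {"0": "negative", "1+": "negative", "2+": "equivocal", "3+": "positive"}[score]

def specificity(true_scores, pred_scores):
    """TN / (TN + FP) over cases where both true and predicted status are binary."""
    tn = fp = 0
    for t, p in zip(true_scores, pred_scores):
        ts, ps = her2_status(t), her2_status(p)
        if "equivocal" in (ts, ps):
            continue  # excluded from the binary metric (an assumption)
        if ts == "negative":
            if ps == "negative":
                tn += 1
            else:
                fp += 1
    return tn / (tn + fp) if tn + fp else float("nan")

true_scores = ["0", "1+", "1+", "3+", "2+", "0"]
pred_scores = ["0", "1+", "3+", "3+", "2+", "1+"]
spec = specificity(true_scores, pred_scores)
```

With only 13 source cases, each excluded or misclassified case moves the reported specificity noticeably. How equivocal cases are treated is therefore worth checking when comparing figures such as 0.933 across studies.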

ARXIV Cancer: glioblastoma Method: variational autoencoder

The Multi-View Paradigm Shift in MRI Radiomics: Predicting MGMT Methylation in Glioblastoma

Mariya Miteva, Maria Nisheva-Pavlova
Published 2025-12-26 16:32

This study presents a multi-view latent representation learning framework utilizing variational autoencoders to predict MGMT promoter methylation in glioblastoma from MRI radiomic features. The method integrates data from post-contrast T1-weighted and Fluid-Attenuated Inversion Recovery MRI, addressing limitations of conventional unimodal approaches. The proposed framework aims to enhance the classification accuracy of MGMT methylation status, which is crucial for treatment decisions.

Non-invasive inference of molecular tumor characteristics from medical imaging is a central goal of radiogenomics, particularly in glioblastoma (GBM), where O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation carries important prognostic and therapeutic significance. Although radiomics-based machine learning methods have shown promise for this task, conventional unimodal and early-fusion approaches are often limited by high feature redundancy and an incomplete modeling of modality-specific information. In this work, we introduce a multi-view latent representation learning framework based on variational autoencoders (VAE) to integrate complementary radiomic features derived from post-contrast T1-weighted (T1Gd) and Fluid-Attenuated Inversion Recovery (FLAIR) magnetic resonance imaging (MRI). By encoding each modality through an independent probabilistic encoder and performing fusion in a compact latent space, the proposed approach preserves modality-specific structure while enabling effective multimodal integration. The resulting latent embeddings are subsequently used for MGMT promoter methylation classification.
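The abstract describes the core mechanics of the multi-view design. Each modality's radiomic vector passes through its own probabilistic encoder, which outputs a mean and log-variance. A latent sample is drawn with the reparameterization trick, and the per-view latents are fused. In the sketch below, the encoders are fixed toy linear maps, fusion is plain concatenation, and all sizes and weights are assumptions. The abstract does not specify the fusion operator, the architecture, or the downstream classifier.

```python
import math
import random

def linear(weights, x):
    """y = W x for W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def encode(x, w_mu, w_logvar):
    """Probabilistic encoder: returns the mean and log-variance of q(z | x)."""
    return linear(w_mu, x), linear(w_logvar, x)

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps, eps ~ N(0, 1); in an autodiff framework this keeps z
    differentiable with respect to mu and sigma."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

rng = random.Random(0)
t1gd_features = [0.5, -1.0, 2.0]    # hypothetical T1Gd radiomic vector
flair_features = [1.5, 0.0, -0.5]   # hypothetical FLAIR radiomic vector
zero_logvar = [[0.0] * 3, [0.0] * 3]

# Independent encoders per view (toy weights), each producing a 2-D latent.
mu_t1, lv_t1 = encode(t1gd_features, [[0.1, 0.2, 0.0], [0.0, -0.1, 0.3]], zero_logvar)
mu_fl, lv_fl = encode(flair_features, [[0.2, 0.0, 0.1], [-0.3, 0.1, 0.0]], zero_logvar)

# Fusion by concatenation in latent space; the fused vector would feed a classifier.
z_fused = reparameterize(mu_t1, lv_t1, rng) + reparameterize(mu_fl, lv_fl, rng)
kl = kl_to_standard_normal(mu_t1, lv_t1) + kl_to_standard_normal(mu_fl, lv_fl)
```

Keeping a separate encoder and KL term per view is what lets each modality retain its own latent structure before fusion. In an early-fusion model, by contrast, the two feature sets would be concatenated before any encoding.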

ARXIV Cancer: unknown Method: deep learning

AI for Mycetoma Diagnosis in Histopathological Images: The MICCAI 2024 Challenge

Hyam Omar Ali, Sahar Alhesseen, Lamis Elkhair, Adrian Galdran, Ming Feng, Zhixiang Xiong, Zengming Lin, Kele Xu, Liang Hu, Benjamin Keel, Oliver Mills, James Battye, Akshay Kumar, Asra Aslam, Prasad Dutande, Ujjwal Baid, Bhakti Baheti, Suhas Gajre, Aravind Shrenivas Murali, Eung-Joo Lee, Ahmed Fahal, Rachid Jennane
Published 2025-12-25 21:46

This paper reports on the Mycetoma MicroImage: Detect and Classify Challenge (mAIcetoma), which aimed to improve mycetoma diagnosis through AI solutions. The challenge focused on automated models for segmenting mycetoma grains and classifying mycetoma types from histopathological images, using the standardized Mycetoma database (MyData). Five finalist teams met the challenge objectives with a range of deep learning architectures. All of their models achieved high segmentation accuracy, and the top-performing models performed strongly at classifying mycetoma types.

Mycetoma is a neglected tropical disease caused by fungi or bacteria, leading to severe tissue damage and disabilities. It affects poor and rural communities and places medical and socioeconomic burdens on patients and healthcare systems in endemic regions worldwide. Diagnosis is a major challenge in mycetoma management, particularly in low-resource settings where expert pathologists are scarce. To address this challenge, this paper presents an overview of the Mycetoma MicroImage: Detect and Classify Challenge (mAIcetoma), which was organized to advance mycetoma diagnosis through AI solutions. mAIcetoma focused on developing automated models for segmenting mycetoma grains and classifying mycetoma types from histopathological images. The challenge attracted teams from around the world, and five finalist teams fulfilled the challenge objectives, proposing various deep learning architectures. The Mycetoma database (MyData) was provided to participants as a standardized dataset on which to run the proposed models, and the models were then evaluated using the challenge's evaluation metrics. Results showed that all the models achieved high segmentation accuracy, emphasizing the necessity of grain detection as a critical step in mycetoma diagnosis. In addition, the top-performing models showed strong performance in classifying mycetoma types.

ARXIV Cancer: liver cancer Method: Adaptive Quaternion Cross-Fusion Network

A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets

Arunkumar V, Firos V M, Senthilkumar S, Gangadharan G R
Published 2025-12-25 18:42

This paper presents the Adaptive Quaternion Cross-Fusion Network (A-QCF-Net), designed for multimodal liver tumor segmentation from unpaired CT and MRI datasets. The model uses Quaternion Neural Networks to build a shared feature space and an attention-based fusion block for bidirectional knowledge transfer between the two imaging modalities. A single model jointly trained on the unpaired LiTS (CT) and ATLAS (MRI) datasets reaches Tumor Dice scores of 76.7% on CT and 78.3% on MRI, exceeding a unimodal nnU-Net baseline by 5.4% and 4.7%, respectively. Grad-CAM analysis indicates that the model attends to relevant pathological structures.

Multimodal medical imaging provides complementary information that is crucial for accurate delineation of pathology, but the development of deep learning models is limited by the scarcity of large datasets in which different modalities are paired and spatially aligned. This paper addresses this fundamental limitation by proposing an Adaptive Quaternion Cross-Fusion Network (A-QCF-Net) that learns a single unified segmentation model from completely separate and unpaired CT and MRI cohorts. The architecture exploits the parameter efficiency and expressive power of Quaternion Neural Networks to construct a shared feature space. At its core is the Adaptive Quaternion Cross-Fusion (A-QCF) block, a data-driven attention module that enables bidirectional knowledge transfer between the two streams. By learning to modulate the flow of information dynamically, the A-QCF block allows the network to exchange abstract modality-specific expertise, such as the sharp anatomical boundary information available in CT and the subtle soft tissue contrast provided by MRI. This mutual exchange regularizes and enriches the feature representations of both streams. We validate the framework by jointly training a single model on the unpaired LiTS (CT) and ATLAS (MRI) datasets. The jointly trained model achieves Tumor Dice scores of 76.7% on CT and 78.3% on MRI, significantly exceeding the strong unimodal nnU-Net baseline by margins of 5.4% and 4.7% respectively. Furthermore, comprehensive explainability analysis using Grad-CAM and Grad-CAM++ confirms that the model correctly focuses on relevant pathological structures, ensuring the learned representations are clinically meaningful. This provides a robust and clinically viable paradigm for unlocking the large unpaired imaging archives that are common in healthcare.
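The parameter efficiency of quaternion layers, cited above, comes from the Hamilton product. A feature vector is split into four components (r, i, j, k), and each quaternion weight (four real numbers) mixes all four. As a result, a layer mapping 4N to 4M real features needs 4NM real weights instead of 16NM. The sketch below illustrates that building block and the parameter count. It is not the A-QCF block itself, the helper names are hypothetical, and biases are ignored in the counts.

```python
def hamilton(q, p):
    """Hamilton product of quaternions q = (a1, b1, c1, d1) and p = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def quaternion_linear(weights, inputs):
    """Output quaternion m is the sum over n of the Hamilton product W[m][n] * x[n]."""
    outputs = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, x in zip(row, inputs):
            acc = tuple(s + t for s, t in zip(acc, hamilton(w, x)))
        outputs.append(acc)
    return outputs

def param_counts(in_features, out_features):
    """Real weights (no bias) for a dense layer vs. a quaternion layer; sizes divisible by 4."""
    real = in_features * out_features
    quat = (in_features // 4) * (out_features // 4) * 4
    return real, quat
```

For example, `param_counts(256, 256)` gives 65,536 real weights for a dense layer versus 16,384 for the quaternion version. The same four weight components are reused across all four output components, which is the weight sharing that gives quaternion networks their 4x saving.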
