Research Papers

ARXIV Cancer: general cancer Method: Conditional Random Fields

Conditional Random Fields for Interactive Refinement of Histopathological Predictions

Tiffanie Godelaine, Maxime Zanella, Karim El Khoury, Saïd Mahmoudi, Benoît Macq, Christophe De Vleeschouwer
Published 2026-01-17 15:19

This study presents HistoCRF, a Conditional Random Fields (CRF)-based framework designed to enhance histopathological predictions. The method refines zero-shot predictions from Vision-Language Models without additional model training, optionally incorporating expert annotations and a human-in-the-loop approach. Across five patch-level classification datasets, it reports average accuracy gains over zero-shot predictions of 16.0% without annotations and 27.5% with 100 expert annotations, supporting cancer detection and staging.

Assisting pathologists in the analysis of histopathological images has high clinical value, as it supports cancer detection and staging. In this context, histology foundation models have recently emerged. Among them, Vision-Language Models (VLMs) provide strong yet imperfect zero-shot predictions. We propose to refine these predictions by adapting Conditional Random Fields (CRFs) to histopathological applications, requiring no additional model training. We present HistoCRF, a CRF-based framework, with a novel definition of the pairwise potential that promotes label diversity and leverages expert annotations. We consider three experiments: without annotations, with expert annotations, and with iterative human-in-the-loop annotations that progressively correct misclassified patches. Experiments on five patch-level classification datasets covering different organs and diseases demonstrate average accuracy gains of 16.0% without annotations and 27.5% with only 100 annotations, compared to zero-shot predictions. Moreover, integrating a human in the loop reaches a further gain of 32.6% with the same number of annotations. The code will be made available on https://github.com/tgodelaine/HistoCRF.
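The general idea of refining per-patch predictions with a CRF can be illustrated with a minimal mean-field sketch. This is not HistoCRF's pairwise potential (the abstract describes a novel one that also promotes label diversity, which is not modeled here); it is a generic Potts-style smoothing over a patch-similarity graph, and the names `refine`, `neighbors`, and `weight` are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def refine(unary, neighbors, annotations=None, weight=1.0, iters=10):
    """Generic mean-field refinement (illustrative assumption, not the paper's method).

    unary:       per-patch class probabilities, e.g. zero-shot VLM outputs
    neighbors:   dict mapping patch index -> list of (neighbor index, similarity)
    annotations: dict mapping patch index -> expert label; these patches stay fixed
    """
    annotations = annotations or {}
    n_classes = len(unary[0])

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(n_classes)]

    # Start from the unary predictions, clamping annotated patches.
    q = [one_hot(annotations[i]) if i in annotations else list(p)
         for i, p in enumerate(unary)]
    for _ in range(iters):
        new_q = []
        for i, p in enumerate(unary):
            if i in annotations:
                new_q.append(one_hot(annotations[i]))
                continue
            scores = []
            for k in range(n_classes):
                # Similar neighbors that believe in class k pull patch i toward k.
                msg = sum(s * q[j][k] for j, s in neighbors.get(i, []))
                scores.append(math.log(p[k] + 1e-12) + weight * msg)
            new_q.append(softmax(scores))
        q = new_q  # synchronous update of all patches
    return [max(range(n_classes), key=lambda k: qi[k]) for qi in q]

# Toy chain of three patches: the uncertain middle patch follows its neighbors.
unary = [[0.9, 0.1], [0.45, 0.55], [0.9, 0.1]]
neighbors = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0)]}
print(refine(unary, neighbors))
```

Passing `annotations={i: label}` clamps patch `i` to the expert label, so its neighbors are influenced by a corrected belief rather than the original zero-shot one.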

ARXIV Cancer: glioma Method: Karhunen-Loève Expansion

Karhunen-Loève Expansion-Based Residual Anomaly Map for Resource-Efficient Glioma MRI Segmentation

Anthony Hur
Published 2026-01-16 23:48

This study presents a novel approach for glioma MRI segmentation using a Karhunen-Loève Expansion (KLE) based residual anomaly map. The method aims to reduce computational costs and data requirements while maintaining high performance in segmentation tasks. The proposed model achieves competitive Dice scores and HD95 distances compared to state-of-the-art methods, demonstrating its effectiveness on limited resources.

Accurate segmentation of brain tumors is essential for clinical diagnosis and treatment planning. Deep learning is currently the state-of-the-art for brain tumor segmentation, yet it requires either large datasets or extensive computational resources that are inaccessible in most areas. This makes the problem increasingly difficult: state-of-the-art models use thousands of training cases and vast computational power, and performance drops sharply when either is limited. The top performer in the BraTS GLI 2023 competition relied on supercomputers, training on over 92,000 augmented MRI scans using an AMD EPYC 7402 CPU, six NVIDIA RTX 6000 GPUs (48GB VRAM each), and 1024GB of RAM over multiple weeks. To address this, the Karhunen-Loève Expansion (KLE) was implemented as a feature extraction step on downsampled, z-score normalized MRI volumes. Each 240$\times$240$\times$155 multi-modal scan is reduced to four $48^3$ channels and compressed into 32 KL coefficients. The resulting approximate reconstruction enables a residual-based anomaly map, which is upsampled and added as a fifth channel to a compact 3D U-Net. All experiments were run on a consumer workstation (AMD Ryzen 5 7600X CPU, RTX 4060Ti (8GB VRAM), and 64GB RAM) while using far fewer training cases. This model achieves post-processed Dice scores of 0.929 (WT), 0.856 (TC), and 0.821 (ET), with HD95 distances of 2.93, 6.78, and 10.35 voxels. These results are significantly better than the winning BraTS 2023 methodology for HD95 distances and WT Dice scores. This demonstrates that a KLE-based residual anomaly map can dramatically reduce computational cost and data requirements while retaining state-of-the-art performance.
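The compress-reconstruct-subtract idea behind a KLE residual map can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the basis is fitted by SVD on a set of flattened training volumes, the residual is taken as an absolute difference, and a toy $8^3$ grid stands in for the $48^3$ channels. The abstract does not specify these details, and `fit_kle` and `residual_map` are hypothetical names.

```python
import numpy as np

def fit_kle(train, n_coeffs=32):
    """Fit a truncated KL basis. train has shape (n_cases, n_voxels)."""
    mean = train.mean(axis=0)
    # Rows of vt are orthonormal eigenvectors of the sample covariance,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_coeffs]

def residual_map(volume, mean, basis):
    """Project onto the basis, reconstruct, and return the absolute residual."""
    coeffs = basis @ (volume - mean)   # the n_coeffs KL coefficients
    recon = mean + basis.T @ coeffs    # approximate reconstruction
    return np.abs(volume - recon)      # large where the basis cannot explain the data

# Toy data: 40 flattened 8x8x8 "volumes" (assumption; stands in for one 48^3 channel).
rng = np.random.default_rng(0)
train = rng.normal(size=(40, 8 ** 3))
mean, basis = fit_kle(train, n_coeffs=32)

volume = rng.normal(size=8 ** 3)
anomaly = residual_map(volume, mean, basis).reshape(8, 8, 8)
print(basis.shape, anomaly.shape)
```

In a pipeline like the one described, the residual volume would then be upsampled and stacked with the four input channels before being passed to the segmentation network.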

ARXIV Cancer: unknown Method: large language model

CTest-Metric: A Unified Framework to Assess Clinical Validity of Metrics for CT Report Generation

Vanshali Sharma, Andrea Mia Bejar, Gorkem Durak, Ulas Bagci
Published 2026-01-16 18:09

The paper presents CTest-Metric, a unified framework designed to assess the clinical validity of metrics used for CT report generation. It includes three modules that evaluate writing style generalizability, synthetic error injection, and correlation with expert clinician ratings. The study analyzes eight widely used metrics across seven large language models built on a CT-CLIP encoder, revealing how sensitive each metric is to stylistic variations and factual errors. The findings indicate that the GREEN Score aligns best with expert judgments (Spearman 0.70), while lexical NLG metrics are highly sensitive to stylistic variation.

In the generative AI era, where even critical medical tasks are increasingly automated, radiology report generation (RRG) continues to rely on suboptimal metrics for quality assessment. Developing domain-specific metrics has therefore been an active area of research, yet it remains challenging due to the lack of a unified, well-defined framework to assess their robustness and applicability in clinical contexts. To address this, we present CTest-Metric, a first unified metric assessment framework with three modules determining the clinical feasibility of metrics for CT RRG. The modules test: (i) Writing Style Generalizability (WSG) via LLM-based rephrasing; (ii) Synthetic Error Injection (SEI) at graded severities; and (iii) Metrics-vs-Expert correlation (MvE) using clinician ratings on 175 "disagreement" cases. Eight widely used metrics (BLEU, ROUGE, METEOR, BERTScore-F1, F1-RadGraph, RaTEScore, GREEN Score, CRG) are studied across seven LLMs built on a CT-CLIP encoder. Using our novel framework, we found that lexical NLG metrics are highly sensitive to stylistic variations; GREEN Score aligns best with expert judgments (Spearman 0.70), while CRG shows negative correlation; and BERTScore-F1 is least sensitive to factual error injection. We will release the framework, code, and the allowable portion of the anonymized evaluation data (rephrased/error-injected CT reports) to facilitate reproducible benchmarking and future metric development.
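The Metrics-vs-Expert comparison rests on Spearman rank correlation between a metric's scores and clinician ratings. A minimal pure-Python sketch is shown below; the function names and the toy numbers are hypothetical and are not data from the paper.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical metric scores and expert ratings for five reports.
metric_scores = [0.61, 0.72, 0.35, 0.90, 0.50]
expert_ratings = [3, 4, 1, 5, 2]
print(spearman(metric_scores, expert_ratings))
```

Because only ranks matter, a metric can score well here even if its absolute values are on a very different scale from the clinician ratings.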

ARXIV Cancer: glioblastoma Method: multiple instance learning

Explainable histomorphology-based survival prediction of glioblastoma, IDH-wildtype

Jan-Philipp Redlich, Friedrich Feuerhake, Stefan Nikolin, Nadine Sarah Schaadt, Sarah Teuber-Hanselmann, Joachim Weis, Sabine Luttmann, Andrea Eberle, Christoph Buck, Timm Intemann, Pascal Birnstill, Klaus Kraywinkel, Jonas Ort, Peter Boor, André Homeyer
Published 2026-01-16 15:35

This study presents an explainable AI framework aimed at predicting survival in patients with glioblastoma, IDH-wildtype (GBM-IDHwt) by analyzing histomorphological features. The framework integrates a multiple instance learning architecture with a sparse autoencoder to identify and interpret relevant image tiles. The model demonstrated some ability to discriminate between patients based on survival duration, with a reported AUC of 0.67. The findings suggest potential for prognostic biomarker discovery in GBM-IDHwt.

Glioblastoma, IDH-wildtype (GBM-IDHwt) is the most common malignant brain tumor. While histomorphology is a crucial component of GBM-IDHwt diagnosis, it is not further considered for prognosis. Here, we present an explainable artificial intelligence (AI) framework to identify and interpret histomorphological features associated with patient survival. The framework combines an explainable multiple instance learning (MIL) architecture that directly identifies prognostically relevant image tiles with a sparse autoencoder (SAE) that maps these tiles to interpretable visual patterns. The MIL model was trained and evaluated on a new real-world dataset of 720 GBM-IDHwt cases from three hospitals and four cancer registries across Germany. The SAE was trained on 1,878 whole-slide images from five independent public glioblastoma collections. Despite the many factors influencing survival time, our method showed some ability to discriminate between patients living less than 180 days or more than 360 days solely based on histomorphology (AUC: 0.67; 95% CI: 0.63-0.72). Cox proportional hazards regression confirmed a significant survival difference between predicted groups after adjustment for established prognostic factors (hazard ratio: 1.47; 95% CI: 1.26-1.72). Three neuropathologists categorized the identified visual patterns into seven distinct histomorphological groups, revealing both established prognostic features and unexpected associations, the latter being potentially attributable to surgery-related confounders. The presented explainable AI framework facilitates prognostic biomarker discovery in GBM-IDHwt and beyond, highlighting promising histomorphological features for further analysis and exposing potential confounders that would be hidden in black-box models.

ARXIV Cancer: non-small cell lung cancer Method: mechanistic learning

Mechanistic Learning for Survival Prediction in NSCLC Using Routine Blood Biomarkers and Tumor Kinetics

Ruben Taieb, René Bruno, Pascal Chanu, Jin Yan Jin, Sébastien Benzekry
Published 2026-01-16 10:07

This study aims to predict overall survival in non-small cell lung cancer (NSCLC) by developing a mechanistic model that integrates tumor burden and blood marker kinetics. The model, termed TALN-k, uses coupled differential equations and is extended with a machine learning framework (TALN-kML) for survival prediction. Results indicate that TALN-kML outperforms empirical, uncoupled models (C-index 0.74 vs 0.72) while remaining interpretable.

Background: Predicting overall survival (OS) in non-small cell lung cancer (NSCLC) is essential for clinical decision-making and drug development. While tumor and blood test marker kinetics are intrinsically linked, their joint dynamics and relationship to OS remain unknown. Methods: We developed a mechanistic model capturing the interplay between tumor (T) burden and the kinetics of three key blood markers: albumin (A), lactate dehydrogenase (L), and neutrophils (N), through coupled differential equations (termed TALN-k). This model was enhanced with a machine learning framework (TALN-kML) for OS prediction. The model was trained and validated on clinical trial data from NSCLC patients treated with atezolizumab in monotherapy (N = 862 patients) or combination therapy (N = 1,115). Model parameters were estimated using nonlinear mixed-effects modelling, and survival predictions were assessed using individual and trial level metrics. Results: TALN-k successfully described individual and population-level marker kinetics, revealing complex interactions between tumor and blood markers, and improving corrected BIC and log-likelihood metrics by a significant margin over previous empirical state-of-the-art models. Feature selection methods also highlighted valuable predictive parameters, indicative of good or poor prognosis. The TALN-kML model outperformed empirical, uncoupled models, achieving improved C-index (0.74 $\pm$ 0.02 vs 0.72 $\pm$ 0.03), 12-month AUC (0.83 $\pm$ 0.004 vs 0.79 $\pm$ 0.05), and accuracy (0.77 $\pm$ 0.03 vs 0.76 $\pm$ 0.05) in OS prediction. Conclusion: Our mechanistic learning approach allows for an interpretable model, which improves on longitudinal data description and on survival prediction in NSCLC by jointly integrating tumor and blood marker kinetics. This methodology offers a promising avenue for both personalized treatment strategies and drug development optimization.
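The C-index reported above measures how often a model ranks patients correctly by risk. The sketch below implements Harrell's concordance index, a common definition; the abstract does not say which variant was used, so this choice and the function name `concordance_index` are assumptions, and the toy values are not data from the paper.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    times:  observed follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risks:  predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy cohort: the third patient is censored at 15 months.
times = [5, 10, 15]
events = [1, 1, 0]
risks = [0.9, 0.5, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a move from 0.72 to 0.74 is a modest but real improvement in ranking quality.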

ARXIV Cancer: small cell carcinoma Method: latent diffusion models

Generation of Chest CT pulmonary Nodule Images by Latent Diffusion Models using the LIDC-IDRI Dataset

Kaito Urata, Maiko Nagao, Atsushi Teramoto, Kazuyoshi Imaizumi, Masashi Kondo, Hiroshi Fujita
Published 2026-01-16 08:36

This study addresses the challenge of data imbalance in computer-aided diagnosis systems for chest CT images, particularly for rare cases like small cell carcinoma. The authors propose a method using latent diffusion models (LDM) to automatically generate chest CT nodule images that reflect target features. The effectiveness of the method was verified using the LIDC-IDRI dataset, with results indicating that the generated images achieved quality comparable to real clinical images.

Recently, computer-aided diagnosis systems have been developed to support diagnosis, but their performance depends heavily on the quality and quantity of training data. In clinical practice, however, it is difficult to collect large numbers of CT images for specific cases, such as small cell carcinoma with low epidemiological incidence or benign tumors that are difficult to distinguish from malignant ones. This leads to the challenge of data imbalance. To address this issue, we proposed a method to automatically generate chest CT nodule images that capture target features using latent diffusion models (LDM) and verified its effectiveness. Using the LIDC-IDRI dataset, we created pairs of nodule images and finding-based text prompts based on physician evaluations. For the image generation models, we used Stable Diffusion version 1.5 (SDv1) and 2.0 (SDv2), which are types of LDM. Each model was fine-tuned using the created dataset. During the generation process, we adjusted the guidance scale (GS), which indicates the fidelity to the input text. Both quantitative and subjective evaluations showed that SDv2 (GS = 5) achieved the best performance in terms of image quality, diversity, and text consistency. In the subjective evaluation, no statistically significant differences were observed between the generated images and real images, confirming that the quality was equivalent to real clinical images. We proposed a method for generating chest CT nodule images based on input text using LDM. Evaluation results demonstrated that the proposed method could generate high-quality images that successfully capture specific medical features.

ARXIV Cancer: lung cancer Method: visual question answering

Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations

Maiko Nagao, Kaito Urata, Atsushi Teramoto, Kazuyoshi Imaizumi, Masashi Kondo, Hiroshi Fujita
Published 2026-01-16 08:21

This study focuses on the interpretation of pulmonary nodules in chest CT images through a visual question answering (VQA) approach. A dataset was constructed from structured annotations to enable interactive diagnostic support, allowing findings to be generated based on specific physician inquiries. The method demonstrated effectiveness, achieving high evaluation scores in generating relevant image findings.

Interpretation of imaging findings based on morphological characteristics is important for diagnosing pulmonary nodules on chest computed tomography (CT) images. In this study, we constructed a visual question answering (VQA) dataset from structured data in an open dataset and investigated an image-finding generation method for chest CT images, with the aim of enabling interactive diagnostic support that presents findings based on questions that reflect physicians' interests rather than fixed descriptions. We used chest CT images from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset. Regions of interest surrounding the pulmonary nodules were extracted from these images, and image findings and questions were defined based on morphological characteristics recorded in the database. A dataset comprising pairs of cropped images, corresponding questions, and image findings was constructed, and the VQA model was fine-tuned on it. Language evaluation metrics such as BLEU were used to evaluate the generated image findings. The VQA dataset constructed using the proposed method contained image findings with natural expressions as radiological descriptions. In addition, the generated image findings showed a high CIDEr score of 3.896, and a high agreement with the reference findings was obtained through evaluation based on morphological characteristics. We constructed a VQA dataset for chest CT images using structured information on the morphological characteristics from the LIDC-IDRI dataset. Methods for generating image findings in response to these questions have also been investigated. Based on the generated results and evaluation metric scores, the proposed method was effective as an interactive diagnostic support system that can present image findings according to physicians' interests.

ARXIV Cancer: general cancer Method: multi-scale attention

MATEX: Multi-scale Attention and Text-guided Explainability of Medical Vision-Language Models

Muhammad Imran, Chi Lee, Yugyung Lee
Published 2026-01-16 01:18

The paper presents MATEX, a framework designed to improve the interpretability of medical vision-language models by integrating anatomically informed spatial reasoning. It combines multi-layer attention rollout, text-guided spatial priors, and layer consistency analysis to generate accurate and clinically relevant gradient attribution maps. Evaluations on the MS-CXR dataset demonstrate that MATEX surpasses the existing M2IB method in spatial precision and alignment with expert annotations, indicating its potential to enhance trust in radiological AI applications.

We introduce MATEX (Multi-scale Attention and Text-guided Explainability), a novel framework that advances interpretability in medical vision-language models by incorporating anatomically informed spatial reasoning. MATEX synergistically combines multi-layer attention rollout, text-guided spatial priors, and layer consistency analysis to produce precise, stable, and clinically meaningful gradient attribution maps. By addressing key limitations of prior methods, such as spatial imprecision, lack of anatomical grounding, and limited attention granularity, MATEX enables more faithful and interpretable model explanations. Evaluated on the MS-CXR dataset, MATEX outperforms the state-of-the-art M2IB approach in both spatial precision and alignment with expert-annotated findings. These results highlight MATEX's potential to enhance trust and transparency in radiological AI applications.

ARXIV Cancer: breast cancer Method: latent diffusion model

Self-learned representation-guided latent diffusion model for breast cancer classification in deep ultraviolet whole surface images

Pouya Afshin, David Helminiak, Tianling Niu, Julie M. Jorns, Tina Yen, Bing Yu, Dong Hye Ye
Published 2026-01-16 00:22

This study presents a Self-Supervised Learning (SSL)-guided Latent Diffusion Model (LDM) aimed at improving breast cancer classification using Deep Ultraviolet Fluorescence Scanning Microscopy (DUV-FSM) images. The method generates high-quality synthetic training patches by incorporating semantic details from a fine-tuned DINO teacher. The approach combines real and synthetic data to fine-tune a Vision Transformer (ViT), achieving an accuracy of 96.47% in classification tasks.

Breast-Conserving Surgery (BCS) requires precise intraoperative margin assessment to preserve healthy tissue. Deep Ultraviolet Fluorescence Scanning Microscopy (DUV-FSM) offers rapid, high-resolution surface imaging for this purpose; however, the scarcity of annotated DUV data hinders the training of robust deep learning models. To address this, we propose a Self-Supervised Learning (SSL)-guided Latent Diffusion Model (LDM) to generate high-quality synthetic training patches. By guiding the LDM with embeddings from a fine-tuned DINO teacher, we inject rich semantic details of cellular structures into the synthetic data. We combine real and synthetic patches to fine-tune a Vision Transformer (ViT), utilizing patch prediction aggregation for WSI-level classification. Experiments using 5-fold cross-validation demonstrate that our method achieves 96.47% accuracy and reduces the FID score to 45.72, significantly outperforming class-conditioned baselines.

ARXIV Cancer: unknown Method: DenseNet121

Classification of Chest XRay Diseases through image processing and analysis techniques

Santiago Martínez Novoa, María Catalina Ibáñez, Lina Gómez Mesa, Jeremias Kramer
Published 2026-01-16 00:06

This study focuses on the classification of chest X-ray images to diagnose thoracic diseases using various image processing and analysis techniques. The authors specifically highlight the use of DenseNet121 as a method for this classification task. They also evaluate the performance of different methods and discuss their limitations, proposing future improvements.

Chest X-ray imaging is one of the most prevalent forms of radiological examination used for diagnosing thoracic diseases. In this study, we offer a concise overview of several methods for the multi-classification of chest X-ray images, including DenseNet121. In addition, we deploy an open-source web-based application. We run tests to compare the different methods, examine the weaknesses of the methods we propose, and suggest ideas for improving them in the future. Our code is available at: https://github.com/AML4206-MINE20242/Proyecto_AML
