
2024 | Book

Image Analysis and Processing - ICIAP 2023 Workshops

Udine, Italy, September 11–15, 2023, Proceedings, Part II


About this book

The two-volume set LNCS 14365 and 14366 constitutes the papers of workshops hosted by the 22nd International Conference on Image Analysis and Processing, ICIAP 2023, held in Udine, Italy, in September 2023.
In total, 72 workshop papers and 10 industrial poster session papers have been accepted for publication.

Part II of the set, volume 14366, contains 41 papers from the following workshops:
– Medical Imaging Hub:
• Artificial Intelligence and Radiomics in Computer-Aided Diagnosis (AIR-CAD)
• Multi-Modal Medical Imaging Processing (M3IP)
• Federated Learning in Medical Imaging and Vision (FedMed)
– Digital Humanities Hub:
• Artificial Intelligence for Digital Humanities (AI4DH)
• Fine Art Pattern Extraction and Recognition (FAPER)
• Pattern Recognition for Cultural Heritage (PatReCH)
• Visual Processing of Digital Manuscripts: Workflows, Pipelines, Best Practices (ViDiScript)

Table of Contents

Frontmatter
Correction to: Vision Transformers for Breast Cancer Histology Image Classification
Giulia L. Baroni, Laura Rasotto, Kevin Roitero, Ameer Hamza Siraj, Vincenzo Della Mea

Artificial Intelligence and Radiomics in Computer-Aided Diagnosis (AIRCAD)

Frontmatter
Leukocytes Classification Methods: Effectiveness and Robustness in a Real Application Scenario

Classification and differentiation of leukocyte sub-types are important steps in peripheral blood smear analysis. Fully automated systems for leukocyte analysis are grouped into segmentation- and detection-based methods, and classification accuracy depends on the accuracy of these segmentation and detection steps. Real-world applications often produce inaccurate ROIs due to image quality factors, e.g., colour and lighting conditions, absence of standards, or even cell density and the presence of overlapping cells. To investigate this scenario in depth, we generated ROIs that simulate the output of segmentation and detection methods and evaluated different image descriptors on two tasks: differentiation of leukocyte sub-types and leukaemia detection. The obtained results show that even simpler approaches can lead to accurate and robust results in both tasks when appropriate images are exploited for model training. Traditional handcrafted features are more effective when extracted from tight bounding boxes or masks, while deep features are more effective when extracted from large bounding boxes or masks.

Lorenzo Putzu, Andrea Loddo
Vision Transformers for Breast Cancer Histology Image Classification

We propose a self-attention Vision Transformer (ViT) model tailored for breast cancer histology image classification. The proposed architecture uses a stack of transformer layers, with each layer consisting of a multi-head self-attention mechanism and a position-wise feed-forward network, and it is trained with different strategies and configurations, including pretraining, resize dimension, data augmentation, patch overlap, and patch size, to investigate their impact on performance on the histology image classification task. Experimental results show that pretraining on ImageNet and using geometric and color data augmentation techniques significantly improve the model’s accuracy on the task. Additionally, a patch size of 16 × 16 and no patch overlap were found to be optimal for this task. These findings provide valuable insights for the design of future ViT-based models for similar image classification tasks.

Giulia L. Baroni, Laura Rasotto, Kevin Roitero, Ameer Hamza Siraj, Vincenzo Della Mea
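The abstract above reports that non-overlapping 16 × 16 patches worked best. As a rough illustration of the tokenization a ViT performs before its linear embedding (a generic sketch, not the authors' code; the 224 × 224 × 3 input size is an assumption), non-overlapping patch extraction can be written in NumPy:

```python
import numpy as np

def extract_patches(image, patch=16):
    """Split an H x W x C image into non-overlapping patch x patch tokens,
    each flattened to a vector, as done before a ViT's linear embedding."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    # reshape into a grid of patches, then flatten each patch to one token
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)          # (gh, gw, patch, patch, c)
    return grid.reshape(-1, patch * patch * c)    # (num_tokens, token_dim)

tokens = extract_patches(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768): 14 x 14 tokens, each of dimension 16*16*3
```

With patch overlap (which the authors found unhelpful here), a strided sliding window would replace the plain reshape.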
Editable Stain Transformation of Histological Images Using Unpaired GANs

Double staining in histopathology uses two different dyes to help identify tissue features and differentiate cell types between two tissue samples. In the case of metaplastic breast cancer, H&E and P63 are often used in conjunction for diagnosis. However, P63 tends to damage the tissue and is prohibitively expensive, motivating the development of virtual staining methods, i.e., methods that use artificial intelligence in computer vision for diagnostic stain transformation. In this work, we present the results of the new xAI-CycleGAN architecture’s capability to transform the H&E pathology stain into the P63 pathology stain on samples of breast tissue with metaplastic cancer present. The architecture is based on Mask CycleGAN and explainability-enhanced training, further enhanced by structure-preserving features and the ability to edit the output to bring generated samples closer to ground truth images. We demonstrate its ability to preserve structure well and produce images of superior quality, and show that output editing can be used to approach real images, opening the door to tuning frameworks that perfect the model through the editing approach. Additionally, we present the results of a survey conducted with histopathologists, evaluating the realism of the generated images through a pairwise comparison task: the approach produces high-quality images that are sometimes indistinguishable from ground truth, and overall our model’s outputs receive a high realism rating.

Tibor Sloboda, Lukáš Hudec, Wanda Benešová
Assessing the Robustness and Reproducibility of CT Radiomics Features in Non-small-cell Lung Carcinoma

The aim of this study was to investigate the robustness of radiomics features extracted from computed tomography (CT) images of patients affected by non-small-cell lung carcinoma (NSCLC). Specifically, the impact of manual segmentation on radiomics feature values and on their variability was assessed. To this end, 63 patients affected by squamous cell carcinoma (SCC) and adenocarcinoma (ADC) were retrospectively collected from a public dataset. Original segmentations (an automated plus manual refinement approach) were provided together with the CT images. Through the matRadiomics tool, manual segmentation of the volume of interest (VOI) was repeated by two training physicians and 107 features were extracted. Feature extraction was also performed using the original segmentations. Three datasets of extracted features were thus obtained and compared by computing the difference percentage coefficient (DP) and the intraclass correlation coefficient (ICC). Moreover, feature reduction and selection were performed on each dataset using a hybrid descriptive-inferential method, and the differences among the three feature subsets were evaluated. Subsequently, three classification models were obtained using the Linear Discriminant Analysis (LDA) classifier. Validation was performed through 10-times repeated 5-fold stratified cross-validation. As a result, even though 87% of features obtained an ICC > 0.8, showing robustness, an AVGDP (averaged DP) of 16.2% was observed between the datasets based on manual segmentation. Moreover, manual segmentation had an impact on the subsets of selected features, thus influencing study reproducibility and model explainability.

Giovanni Pasini
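For readers unfamiliar with the DP metric used above, a minimal sketch of a difference-percentage comparison between two extractions of the same features follows. The exact formula is defined in the paper; the symmetric percent-difference used here is an assumption, as are the toy feature values:

```python
import numpy as np

def difference_percentage(a, b):
    """Percentage difference between two extractions of the same features.
    One common definition (assumed here; the paper defines its own DP):
    |a - b| / mean(|a|, |b|) * 100."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 100.0 * np.abs(a - b) / ((np.abs(a) + np.abs(b)) / 2.0)

# the same features extracted from two manual segmentations of the same VOIs
seg1 = np.array([10.0, 5.0, 2.0])
seg2 = np.array([12.0, 5.0, 1.8])
dp = difference_percentage(seg1, seg2)
print(dp.mean())  # averaging across features gives an AVGDP-style summary
```

A feature with identical values in both extractions contributes a DP of zero; large DP values flag features that are sensitive to the segmentation.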
Prediction of High Pathological Grade in Prostate Cancer Patients Undergoing [18F]-PSMA PET/CT: A Preliminary Radiomics Study

The aim of this study was to evaluate the effectiveness of [18F]-prostate-specific membrane antigen (PSMA) positron emission tomography/computed tomography (PET/CT) imaging in discriminating high pathological grade (Gleason score > 7) from low pathological grade (Gleason score < 7) using machine learning techniques. The study involved 81 patients with diagnosed prostate cancer who underwent positive [18F]-PSMA PET/CT scans. The PET images were used to identify the primary lesions, and radiomics analyses were then performed using an Imaging Biomarker Standardization Initiative (IBSI) compliant software, namely matRadiomics. Machine learning approaches were employed to identify relevant radiomics features for predicting high-risk malignant disease. The performance of the models was validated using a 10-times repeated 5-fold cross-validation scheme. The results showed an area under the curve of 0.75 and an accuracy of 72% using the support vector machine (SVM). In conclusion, the study showcased the clinical potential of [18F]-PSMA PET/CT radiomics in differentiating high-risk and low-risk tumors without the need for biopsy sampling. In-vivo PET/CT imaging could therefore be considered a noninvasive tool for virtual biopsy, facilitating personalized treatment management.

Alessandro Stefano, Cristina Mantarro, Selene Richiusa, Giovanni Pasini, Maria Gabriella Sabini, Sebastiano Cosentino, Massimo Ippolito
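The study above reports model quality as area under the ROC curve. Independently of any particular classifier, AUC can be computed from scores and labels via the Mann–Whitney U statistic, since AUC equals the probability that a random positive case outscores a random negative one. A small illustrative sketch (not the study's pipeline; the scores below are made up):

```python
import numpy as np

def auc_from_scores(scores, labels):
    """ROC AUC as the probability that a random positive outscores a
    random negative (Mann-Whitney U / (n_pos * n_neg)); ties count half."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    # all positive-vs-negative pairwise comparisons at once
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

print(auc_from_scores([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))  # 1.0
print(auc_from_scores([0.9, 0.3, 0.8, 0.4], [1, 0, 0, 1]))  # 0.75
```

In a repeated cross-validation scheme like the one described, this quantity would be computed on each held-out fold and averaged.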
MTANet: Multi-Type Attention Ensemble for Malaria Parasite Detection

Malaria is a severe infectious disease caused by the Plasmodium parasite. Diagnosing and treating the disease is crucial to increase the chances of survival. However, detecting malaria parasites is still a manual process performed by experts examining blood smears, especially in less developed countries. This task is time-consuming and prone to errors. Fortunately, deep learning-based object detection methods have shown promising results in automating this task, allowing quick diagnosis and treatment. In this work, we proposed an object detection ensemble architecture, MTANet, that efficiently detects malaria parasite species using one tailored YOLOv5 version integrated with an attention-based approach. We compared its performance against several methods in the literature. The experimental results have shown that MTANet can efficiently and accurately address the detection of different species with a single model.

Luca Zedda, Andrea Loddo, Cecilia Di Ruberto
Breast Mass Detection and Classification Using Transfer Learning on OPTIMAM Dataset Through RadImageNet Weights

A significant number of women are diagnosed with breast cancer each year. Early detection of breast masses is crucial in improving patient prognosis and survival rates. In recent years, deep learning techniques, particularly object detection models, have shown remarkable success in medical imaging, providing promising tools for the early detection of breast masses. This paper uses transfer learning methodologies to present an end-to-end breast mass detection and classification pipeline. Our approach involves a two-step process: initial detection of breast masses using variants of the YOLO object detection models, followed by classification of the detected masses into benign or malignant categories. We used a subset of the OPTIMAM (OMI-DB) dataset for our study. We leveraged the weights of RadImageNet, a set of models specifically trained on medical images, to enhance our object detection models. Among the publicly available RadImageNet weights, DenseNet-121 coupled with the yolov5m model achieves a mean average precision (mAP) of 0.718 at a 0.5 IoU threshold and a True Positive Rate (TPR) of 0.97 at 0.85 False Positives Per Image (FPPI). For the classification task, we implement a transfer learning approach with fine-tuning, demonstrating the ability to effectively classify breast masses into benign and malignant categories. We used a combination of class weighting and weight decay methods to tackle the class imbalance problem for the classification task.

Ruth Kehali Kassahun, Mario Molinara, Alessandro Bria, Claudio Marrocco, Francesco Tortorella
Prostate Cancer Detection: Performance of Radiomics Analysis in Multiparametric MRI

The purpose of the study was to evaluate the performance of radiomics analysis of MR images for the detection of prostate cancer. The radiomics analysis was conducted using axial T2-weighted images from 49 prostate cancers. The study employs a sophisticated hybrid descriptive-inferential method for the meticulous selection and reduction of features, followed by discriminant analysis to construct a robust predictive model. Among 71 radiomics features, original_glrlm_ShortRunLowGrayLevelEmphasis demonstrated exemplary performance in differentiating between the whole prostate gland and prostate cancer, with an AUROC of 0.685 (95% CI 0.544–0.824; p = 0.022), sensitivity of 76.25%, specificity of 73.15%, and accuracy of 71.02%. Radiomics analysis of T2-weighted MR images was thus demonstrated to have clinical application in prostate cancer detection, paving the way for improved diagnostic procedures and tailor-made treatment plans for prostate cancer patients.

Muhammad Ali, Viviana Benfante, Giuseppe Cutaia, Leonardo Salvaggio, Sara Rubino, Marzia Portoghese, Marcella Ferraro, Rosario Corso, Giovanni Piraino, Tommaso Ingrassia, Gabriele Tulone, Nicola Pavan, Domenico Di Raimondo, Antonino Tuttolomondo, Alchiede Simonato, Giuseppe Salvaggio
Grading and Staging of Bladder Tumors Using Radiomics Analysis in Magnetic Resonance Imaging

The aim of this study was to evaluate the performance of MRI radiomics analysis in distinguishing low-grade (LG) from high-grade (HG) bladder lesions and non-muscle-invasive bladder cancer (NMIBC) from muscle-invasive bladder cancer (MIBC). We proposed a standardized and reproducible computational statistical analysis model, and identified predictive and prognostic models to facilitate the process of making medical decisions. Sixteen patients with bladder lesions and preoperative mpMRI were included, for a total of 35 bladder lesions. Lesions were manually segmented from T2-weighted sequences. The PyRadiomics software was used to extract radiomics features, and a total of 120 radiomics features were obtained from each lesion. An operator-independent statistical system was adopted for the selection and reduction of the features, while discriminant analysis was used for the construction of the predictive model. In the discrimination between LG and HG lesions, the model achieved an AUROC of 0.84 (95% C.I. between 0.71 and 0.98), a sensitivity of 65.6% and a specificity of 81.5%, with p < 0.001. In the discrimination between NMIBC and MIBC, it achieved an AUROC of 0.7 (95% C.I. between 0.11 and 1), a sensitivity of 100% and a specificity of 86.7%, with a p-value of 0.0031. Our results demonstrate the valuable contribution of radiomics analysis in improving the characterization and differentiation of bladder lesions, both in differentiating LG from HG lesions and in discriminating NMIBC from MIBC.

Viviana Benfante, Giuseppe Salvaggio, Muhammad Ali, Giuseppe Cutaia, Leonardo Salvaggio, Sergio Salerno, Gabriele Busè, Gabriele Tulone, Nicola Pavan, Domenico Di Raimondo, Antonino Tuttolomondo, Alchiede Simonato, Albert Comelli
Combined Data Augmentation for HEp-2 Cells Image Classification

The Antinuclear Antibody (ANA) test is a valuable diagnostic tool for autoimmune disorders that uses Indirect Immunofluorescence (IIF) microscopy with HEp-2 cells as the substrate to identify antibodies and their distinct staining patterns. Machine learning-based approaches have shown promise in automating this diagnosis process, with Data Augmentation (DA) techniques playing a crucial role in improving performance. Even though traditional DA methods have yielded positive results, generative techniques like Variational AutoEncoders (VAEs) have shown potential in exploring the input distribution and generating new images. To address the limitations of traditional DA and explore the potential of generative approaches, this paper focuses on applying Conditional Variational AutoEncoders (CVAEs) to HEp-2 cell image classification. A customized CVAE architecture is proposed, considering multiple labels during generation to enhance versatility. Extensive experiments were conducted with the largest publicly available dataset of HEp-2 cell images, the I3A dataset. The performance of traditional and generative data augmentation techniques were compared while investigating potential synergies between them. The findings highlight the benefits of combining these techniques, especially in scenarios with class imbalance. Thorough statistical analysis provides valuable insights from the experimental results.

Gennaro Percannella, Umberto Petruzzello, Francesco Tortorella, Mario Vento

Multi-modal Medical Imaging Processing (M3IP)

Frontmatter
Harnessing Multi-modality and Expert Knowledge for Adverse Events Prediction in Clinical Notes

Recent advancements in machine learning and deep learning techniques have revolutionized the field of adverse event prediction, which plays a vital role in healthcare by enabling early identification and intervention for high-risk patients. Traditionally, researchers have relied on structured data, including demographic information, vital signs, laboratory results, and medication records. However, the widespread adoption of electronic health records (EHRs) has introduced a substantial amount of unstructured information in the form of clinical notes, which has been largely underutilized. Natural Language Processing (NLP) techniques have emerged as a powerful tool for extracting valuable insights from these clinical notes and incorporating them into machine learning frameworks. Additionally, multimodal machine learning, which integrates structured and unstructured data, has gained considerable attention as a means to enhance the accuracy of adverse event prediction. This research focuses on the application of multimodal machine learning for predicting adverse events such as atrial fibrillation, heart failure, and ischemic myocardial infarction. The study aims to compare the performance a machine learning specialist without domain knowledge would obtain with that of an approach guided by physicians, which includes an information retrieval step using unstructured clinical notes. The analysis is carried out using a dataset provided by the Hospital of Naples Federico II. The results not only shed light on the importance of leveraging different aspects of a patient’s medical history and extracting information from unstructured notes but also highlight the added value of domain expertise.

Marco Postiglione, Giovanni Esposito, Raffaele Izzo, Valerio La Gatta, Vincenzo Moscato, Raffaele Piccolo
A Multimodal Deep Learning Based Approach for Alzheimer’s Disease Diagnosis

Alzheimer’s Disease is among the most common causes of death worldwide, and it is expected to have a greater impact in the years to come. Currently, there are no effective means to halt its progression, but researchers are actively exploring prevention, diagnosis, prognosis, and treatment options to find better solutions in each domain. Notably, extensive studies have shown that early detection plays a crucial role in developing more accurate prognoses and appropriate treatments. Presently, the primary diagnostic tests employed in this regard are images derived from Positron Emission Tomography (PET) and Magnetic Resonance Imaging (MRI). PET is mainly used for obtaining functional information from the produced image, while MRI reveals structural impairment. As artificial intelligence (AI) is increasingly used to support diagnostics through the development of ever better performing classifiers, in this paper we present a study of a Multimodal Deep Learning (MDL) approach that can guarantee better classification performance by integrating MRI structural information with PET functional information. The classifier we introduce is based on 3D Deep Convolutional Neural Networks (CNNs). We focus on Early Fusion (EF) and Late Fusion (LF) approaches on unbalanced and incomplete datasets exported from the Californian ADNI project.

Adriano De Simone, Carlo Sansone
A Systematic Review of Multimodal Deep Learning Approaches for COVID-19 Diagnosis

During and after the years of the COVID-19 pandemic, researchers and domain experts put all their effort into the discovery of accurate and reliable techniques for the detection and diagnosis of this disease in potentially sick patients. In the meantime, Deep Learning (DL) techniques have continuously improved and expanded, becoming more and more efficient and applicable in several fields of study and with different kinds of data. This huge but heterogeneous set of data cannot be fully exploited unless DL models are designed to be compatible with different sources of data at the same time; therefore, multimodal approaches were designed and adopted, yielding better prediction results than classic approaches. Given these premises, several multimodal solutions for COVID-19 diagnosis have been built in recent years, but it can be hard to get a complete overview of the current state of the art. For this reason, this paper aims to be a useful review of multimodal approaches and the datasets they adopt, and therefore a starting point for quickly identifying what to improve in order to develop more accurate solutions.

Salvatore Capuozzo, Carlo Sansone
A Multi-dimensional Joint ICA Model with Gaussian Copula

Different imaging modalities can provide complementary information, and fusing them can leverage their unique views into the brain. Independent component analysis (ICA) and its multimodal version, joint ICA (jICA), have been useful for brain imaging data mining. Conventionally, jICA assumes a common mixing matrix and independent latent joint components with independent and identical marginals. Thus, jICA maximizes a (melded) 1D distribution for each joint component, by either maximum likelihood or the infomax principle. In this study, we propose a joint ICA method that relaxes these assumptions by allowing samples from the same voxels (in this case, fMRI and sMRI) to originate from a non-factorial bivariate distribution. We then maximize the likelihood of this joint 2D distribution. The full 2D bivariate distribution is defined by two marginal distributions linked with a copula. Several ICA-based studies on neuroimaging data have successfully modeled independent sources with a logistic distribution, providing robust and replicable results across modalities. This is because neuroimaging data often consist of rapid fluctuations around a baseline, resulting in super-Gaussian distributions. For consistency with prior literature, we choose the logistic distribution to model the marginals, combined with a Gaussian copula to model linkage via simple correlation. However, it should be noted that the proposed algorithm can easily adapt to different types of copulas and alternative marginal distributions. We demonstrated the performance of the proposed method on a simulated dataset and applied it to analyze a structural and functional magnetic resonance imaging dataset from the Alzheimer’s Disease Neuroimaging Initiative (ADNI).

Oktay Agcaoglu, Rogers F. Silva, Deniz Alacam, Vince Calhoun
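To make the construction above concrete, the sampling direction of a Gaussian copula with logistic marginals can be sketched with only the standard library: draw correlated normals, map them to uniforms through the normal CDF, then through the inverse logistic CDF. The paper maximizes the corresponding 2D likelihood rather than sampling; the correlation and scale values below are illustrative assumptions:

```python
import math, random

def sample_copula_logistic(n, rho, s1=1.0, s2=1.0, seed=0):
    """Draw n pairs whose marginals are logistic (scales s1, s2) and whose
    dependence is a Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # normal CDF
    logistic_ppf = lambda u, s: s * math.log(u / (1.0 - u))      # inverse logistic CDF
    out = []
    for _ in range(n):
        g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # correlated standard normals define the copula's dependence
        z1, z2 = g1, rho * g1 + math.sqrt(1.0 - rho**2) * g2
        out.append((logistic_ppf(phi(z1), s1), logistic_ppf(phi(z2), s2)))
    return out

pairs = sample_copula_logistic(5000, rho=0.7)
```

Because the copula only links the uniforms, each marginal stays exactly logistic regardless of rho, which is what lets the method keep the super-Gaussian marginals favored in the ICA literature.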

Federated Learning in Medical Imaging and Vision (FEDMED)

Frontmatter
Federated Learning for Data and Model Heterogeneity in Medical Imaging

Federated Learning (FL) is an evolving machine learning method in which multiple clients participate in collaborative learning without sharing their data with each other or with the central server. In real-world applications such as hospitals and industries, FL must counter the challenges of data heterogeneity and model heterogeneity as an inevitable part of collaborative training. More specifically, different organizations, such as hospitals, have their own private data and customized models for local training. To the best of our knowledge, existing methods do not effectively address both model heterogeneity and data heterogeneity in FL. In this paper, we tackle data and model heterogeneity simultaneously and propose a method, MDH-FL (Exploiting Model and Data Heterogeneity in FL), to solve these problems and enhance the efficiency of the global model in FL. We use knowledge distillation and a symmetric loss to minimize heterogeneity and its impact on model performance: knowledge distillation addresses model heterogeneity, while the symmetric loss tackles data and label heterogeneity. We evaluate our method on medical datasets, to conform to the real-world scenario of hospitals, and compare it with existing methods. The experimental results demonstrate the superiority of the proposed approach over existing methods.

Hussain Ahmad Madni, Rao Muhammad Umer, Gian Luca Foresti
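The knowledge-distillation term mentioned above is typically the KL divergence between temperature-softened teacher and student distributions. A generic NumPy sketch of that standard Hinton-style loss follows (illustrative only; MDH-FL's exact objective and its symmetric loss are defined in the paper, and the logits here are made up):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(logits, float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=3.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as is conventional so gradients stay comparable
    across temperatures."""
    p = softmax(teacher_logits, T)   # soft teacher targets
    q = softmax(student_logits, T)   # soft student predictions
    return T * T * float(np.sum(p * (np.log(p) - np.log(q))))

loss = distillation_loss([2.0, 0.5, -1.0], [1.8, 0.7, -0.9])
```

The loss is zero when the student matches the teacher exactly and grows as the softened distributions diverge, which is what lets heterogeneous client models exchange knowledge without sharing weights directly.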
Experience Sharing and Human-in-the-Loop Optimization for Federated Robot Navigation Recommendation

Mobile robot navigation in unknown, dynamic, hostile and/or crowded environments is a challenging task, especially when it comes to multi-robot systems. Looking at the issue through the lens of human-robot interaction (HRI) and artificial general intelligence, the key is enabling (mobile) robots to deal with such problems as human beings do: by learning from their own or their counterparts’ experience, and by providing them with freedom of choice. Besides traditional solutions in the field, learning from historical knowledge, in the form of experience sharing and experience replay, has gained momentum in recent years. Extending these notions, this paper introduces the idea of taking advantage of the robot’s collected information and perception of the environment (and the obstacles within it); of previous (case-specific) decisions and their consequences; and of complementary information provided by human-in-the-loop optimization, such as human-generated suggestions and advice, to provide the robotic agent with context-aware navigation recommendations. More specifically, the conceptual architecture of a robot navigation recommender system (RoboRecSys) is proposed to provide the agent with several options (based on different criteria) for making more efficient decisions in finding the most appropriate path towards its goal. Moreover, in order to preserve the privacy of both the agents’ data and environmental perception information and of their decisions (and feedback) based on the received recommendations, a federated learning approach is employed.

Morteza Moradi, Mohammad Moradi, Dario Calogero Guastella
FeDETR: A Federated Approach for Stenosis Detection in Coronary Angiography

Assessing the severity of stenoses in coronary angiography is critical to the patient’s health, as coronary stenosis is an underlying factor in heart failure. Current practice for grading coronary lesions, i.e. fractional flow reserve (FFR) or instantaneous wave-free ratio (iFR), suffers from several drawbacks, including time, cost and invasiveness, alongside potential interobserver variability. In this context, some deep learning methods have emerged to assist cardiologists in automating the estimation of FFR/iFR values. Despite the effectiveness of these methods, their reliance on large datasets is challenging due to the distributed nature of sensitive medical data. Federated learning addresses this challenge by aggregating knowledge from multiple nodes to improve model generalization, while preserving data privacy. We propose the first federated detection transformer approach, FeDETR, to assess stenosis severity in angiography videos based on FFR/iFR values estimation. In our approach, each node trains a detection transformer (DETR) on its local dataset, with the central server federating the backbone part of the network. The proposed method is trained and evaluated on a dataset collected from five hospitals, consisting of 1001 angiographic examinations, and its performance is compared with state-of-the-art federated learning methods.

Raffaele Mineo, Amelia Sorrenti, Federica Proietto Salanitri
FeDZIO: Decentralized Federated Knowledge Distillation on Edge Devices

In recent years, the proliferation of edge devices and distributed sensors has fueled the need for training sophisticated deep learning models directly on resource-constrained nodes, in order to guarantee data locality and prevent the transmission of private information to centralized training infrastructures. However, executing large-scale models on edge devices poses significant challenges due to limited computational power, memory constraints and energy consumption limitations. Federated Learning (FL) has emerged as a promising approach to partially address these issues, enabling decentralized model training across multiple devices without the need to exchange local data. At the same time, Knowledge Distillation (KD) has demonstrated its efficacy in compressing complex models by transferring knowledge from a larger teacher model to a smaller student model. This paper presents a novel framework combining Federated Learning with Knowledge Distillation, specifically tailored for accelerating training on edge devices. The proposed approach leverages the collaborative learning capabilities of federated learning to perform knowledge distillation in a privacy-preserving and efficient manner. Instead of relying on a central server for aggregation, edge devices with localized data collaboratively exchange knowledge with each other, enabling transmission of minimal quantities of data without compromising data privacy or model performance. The distributed nature of this approach allows edge devices to leverage collective intelligence while avoiding the need to share raw data across the network. We conduct extensive experiments on diverse edge device scenarios using state-of-the-art deep learning architectures. The results demonstrate that our approach achieves substantial model compression while maintaining competitive performance compared to traditional knowledge distillation methods. Additionally, the federated nature of our approach ensures scalability and robustness, even in dynamic edge device environments.

Luca Palazzo, Matteo Pennisi, Giovanni Bellitto, Isaak Kavasidis
A Federated Learning Framework for Stenosis Detection

This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography (CA) images. Two heterogeneous datasets from two institutions were considered: Dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedali Riuniti of Ancona (Italy); Dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature. Stenosis detection was performed using a Faster R-CNN model. In our FL framework, only the weights of the model backbone were shared among the two client institutions, using Federated Averaging (FedAvg) for weight aggregation. We assessed stenosis detection performance using Precision (Prec), Recall (Rec), and F1 score (F1). Our results showed that the FL framework does not substantially affect client 2’s performance, which already achieved good results with local training; for client 1, instead, the FL framework improves performance over the local model by +3.76%, +17.21% and +10.80%, respectively, reaching Prec = 73.56, Rec = 67.01 and F1 = 70.13. With these results, we show that FL may enable multicentric studies relevant to automatic stenosis detection in CA by addressing data heterogeneity from various institutions, while preserving patient privacy.

Mariachiara Di Cosmo, Giovanna Migliorelli, Matteo Francioni, Andi Muçaj, Alessandro Maolo, Alessandro Aprile, Emanuele Frontoni, Maria Chiara Fiorentino, Sara Moccia
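The weight aggregation mentioned above, Federated Averaging (FedAvg), is a dataset-size-weighted mean of client parameters. A minimal sketch, assuming each client exposes its shared (e.g. backbone) parameters as NumPy arrays; the parameter name and values are illustrative:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated Averaging: per-parameter weighted mean of client models,
    with weights proportional to local dataset size. Only the parameters
    passed in are averaged, mirroring the backbone-only sharing above."""
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
            for k in keys}

# two hypothetical clients with dataset sizes matching the study (1219, 7492)
w1 = {"conv1": np.array([1.0, 2.0])}
w2 = {"conv1": np.array([3.0, 4.0])}
avg = fedavg([w1, w2], [1219, 7492])
```

With such unbalanced dataset sizes, the larger client dominates the average, which is one reason FedAvg can leave an already well-performing large client mostly unaffected while lifting the smaller one.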
Benchmarking Federated Learning Frameworks for Medical Imaging Tasks

This paper presents a comprehensive benchmarking study of various Federated Learning (FL) frameworks applied to the task of medical image classification. The research specifically addresses the often neglected and complex aspects of scalability and usability in off-the-shelf FL frameworks. Through experimental validation using real case deployments, we provide empirical evidence of the performance and practical relevance of open source FL frameworks. Our findings contribute valuable insights for anyone interested in deploying an FL system, with a particular focus on the healthcare domain, an increasingly attractive field for FL applications.

Samuele Fonio

Artificial Intelligence for Digital Humanities (AI4DH)

Frontmatter
Examining the Robustness of an Ensemble Learning Model for Credibility Based Fake News Detection

Ensemble learning is a technique of combining multiple base machine learning models and using the blended results as the final classification output. Such models provide a unique perspective on the classification results, as they produce a more comprehensive and encompassing output. As such, ensemble learning techniques are widely used for classification today. Hence, it is important that any ensemble learning model be robust and resilient to any type of data, not just applicable to one dataset. This research investigates and evaluates the robustness and resilience of the Legitimacy ensemble learning model, previously proposed for Credibility Based Fake News Detection. This research evaluates Legitimacy’s performance with a variety of datasets. In the first scenario, the Legitimacy ensemble learning model is evaluated with three different binary classification datasets used for both training and testing. In the second scenario, the Legitimacy model is assessed with one dataset used for training and another for testing. In the final scenario, the Legitimacy ensemble learning model is evaluated against a multiclass dataset for multiclass classification. The results of all the above tests are assimilated and evaluated. They suggest that the Legitimacy ensemble learning model performs well in all three scenarios, giving AUC values all equal to or greater than 0.500. As such, it can be concluded that the Legitimacy model is a robust and resilient ensemble learning technique and can be employed for the task of classification with any dataset.

Amit Neil Ramkissoon, Kris Manohar, Wayne Goodridge
Prompt Me a Dataset: An Investigation of Text-Image Prompting for Historical Image Dataset Creation Using Foundation Models

In this paper, we present a pipeline for image extraction from historical documents using foundation models, and evaluate text-image prompts and their effectiveness on humanities datasets of varying levels of complexity. The motivation for this approach stems from the high interest of historians in visual elements printed alongside historical texts on the one hand, and from the relative lack of well-annotated datasets within the humanities when compared to other domains. We propose a sequential approach that relies on GroundDINO and Meta’s Segment-Anything-Model (SAM) to retrieve a significant portion of visual data from historical documents that can then be used for downstream development tasks and dataset creation, as well as evaluate the effect of different linguistic prompts on the resulting detections.

Hassan El-Hajj, Matteo Valleriani
Artificial Intelligence in Art Generation: An Open Issue

This paper aims to contribute to one of the most discussed issues of recent times, in both the scientific and art communities: the use of Artificial Intelligence (AI) based tools for creating artworks. As the issue is strongly multidisciplinary, we structured the paper as a debate between experts in several fields (computer science, art history, philosophy) to hear their specific points of view on the topic. The first part of the paper focuses on the relationship between artists and the use of AI techniques. Furthermore, we organized an art exhibition with images created by AI-based tools, also to collect people’s feedback. We submitted a questionnaire to the viewers, and their answers are reported in the experimental section. Thus, the second part focuses more on the visitors’ perspective and their perception of the use of these tools.

Giuseppe Mazzola, Marco Carapezza, Antonio Chella, Diego Mantoan
A Deep Learning Approach for Painting Retrieval Based on Genre Similarity

As digitized paintings continue to grow in popularity and become more prevalent on online collection platforms, it becomes necessary to develop new image processing algorithms to effectively manage the paintings stored in databases. Image retrieval has historically been a challenging field within digital image processing, as it requires scanning large databases for images that are similar to a given query image. The notion of similarity itself varies according to the user’s perception. The performance of image retrieval is heavily influenced by the feature representations and similarity measures used. Recently, Deep Learning has made significant strides, and deep features derived from this technology have become widely used due to their demonstrated ability to generalize well. In this paper, a Convolutional Neural Network fine-tuned for artistic genre recognition is employed to extract deep, high-level features from paintings. These features are then used to measure the similarity between a given query image and the images stored in the database, using an Approximate Nearest Neighbours algorithm to obtain results in real time. Our experimental results indicate that this approach leads to a significant improvement in the performance of content-based image retrieval for the task of genre retrieval in paintings.

Tess Masclef, Mihaela Scuturici, Benjamin Bertin, Vincent Barrellon, Vasile-Marian Scuturici, Serge Miguet
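The retrieval step described above ranks database images by the similarity of their deep features to the query's. As a stand-in illustration (a real-time system would use an Approximate Nearest Neighbours index such as FAISS or Annoy rather than this brute-force cosine search, and the random vectors below merely stand in for CNN features):

```python
# Brute-force cosine retrieval over feature vectors; illustrative only.
import numpy as np

def cosine_topk(query, database, k=3):
    """Return indices of the k database vectors most similar to query."""
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = db @ q                    # cosine similarities to the query
    return np.argsort(-scores)[:k]     # indices of the k best matches

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 512))          # stand-in for CNN features
query = features[42] + 0.01 * rng.normal(size=512)  # near-duplicate of item 42
top = cosine_topk(query, features, k=3)          # item 42 should rank first
```

Swapping the brute-force scan for an ANN index changes only the search call, not the feature-extraction side of the pipeline, which is what makes the approach practical at database scale.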
GeomEthics: Ethical Considerations About Using Artificial Intelligence in Geomatics

Artificial intelligence (AI) has made significant advancements in the field of geomatics, revolutionizing the way geospatial data is processed, analyzed, and interpreted. While these advancements have brought numerous benefits, they also raise ethical risks that must be carefully considered. The improvement of AI in geomatics has introduced ethical considerations such as data privacy, algorithmic bias, transparency, accountability, and the responsible use of AI technology. As AI algorithms process and analyze vast amounts of geospatial data, concerns regarding data privacy and security become paramount. Geospatial data often contains sensitive information, and the use of AI requires robust measures to protect individual privacy and prevent unauthorized access or misuse of data. This paper examines the ethical implications of the use of AI in geomatics and proposes the concept of GeomEthics as a framework for analyzing these ethical considerations. It explores the technical aspects of AI in geomatics and highlights the ethical principles of fairness, privacy, bias, accountability, and transparency. By coining the term GeomEthics, the paper emphasizes the importance of addressing these ethical concerns. It proposes the development of ethical guidelines and best practices for the responsible integration of AI in geomatics and discusses future research directions in the field. This paper contributes to a comprehensive understanding of the ethical implications of AI in geomatics and provides insights for ensuring the responsible and beneficial use of AI technologies in the geospatial domain. By addressing these ethical challenges, the field of geomatics can harness the benefits of AI while mitigating its potential risks and ensuring that geospatial analysis and decision-making processes are conducted ethically and responsibly.

Ermanno Petrocchi, Simona Tiribelli, Marina Paolanti, Benedetta Giovanola, Emanuele Frontoni, Roberto Pierdicca

Fine Art Pattern Extraction and Recognition (FAPER)

Frontmatter
Enhancing Preservation and Restoration of Open Reel Audio Tapes Through Computer Vision

Analog audio documents inevitably face degradation over time, posing a challenge for preserving their audio content and ensuring the integrity of the recordings. Analog document preservation is one of the main research topics of interest of the Centro di Sonologia Computazionale (CSC) of the Department of Information Engineering of the University of Padua, which over the years developed and implemented a methodology for preservation that includes, among other things, the video recording of the digitization process of the open-reel tapes for documenting irregularities on their surface. Together with the corpus of digitized high-quality audio recordings, this led to the creation of an internal archive of video documents. This paper presents a software application that leverages computer vision techniques to automatically detect Irregularities on open-reel audio tapes, analyzing the video documents produced during the digitization interventions. The software employs a frame-by-frame analysis to automatically identify and highlight points of interest that may indicate tape damage, splices, and other Irregularities. The software uses the Generalized Hough Transform and SURF algorithms to locate regions of interest within the tape. The proposed software is also part of the MPAI/IEEE-CAE ARP standard developed by Audio Innova s.r.l., a spin-off of the CSC, and it may offer a robust and efficient solution for analyzing open-reel audio tapes, supporting archivists and musicologists in their activities.

Alessandro Russo, Matteo Spanio, Sergio Canazza
Exploring the Synergy Between Vision-Language Pretraining and ChatGPT for Artwork Captioning: A Preliminary Study

While AI techniques have enabled automated analysis and interpretation of visual content, generating meaningful captions for artworks presents unique challenges. These include understanding artistic intent, historical context, and complex visual elements. Despite recent developments in multi-modal techniques, there are still gaps in generating complete and accurate captions. This paper contributes by introducing a new dataset for artwork captioning generated using prompt engineering techniques and ChatGPT. We refined the captions with CLIPScore to filter out noise; then, we fine-tuned GIT-Base, resulting in visually accurate captions that surpass the ground truth. Enrichment of descriptions with predicted metadata improves their informativeness. Artwork captioning has implications for art appreciation, inclusivity, education, and cultural exchange, particularly for people with visual impairments or limited knowledge of art.

Giovanna Castellano, Nicola Fanelli, Raffaele Scaringi, Gennaro Vessio
Progressive Keypoint Localization and Refinement in Image Matching

Image matching is the core of many computer vision applications for cultural heritage. The standard image matching pipeline detects keypoints at the beginning and freezes them until bundle adjustment, by which keypoints are allowed to move in order to improve the overall scene estimation. Recent deep image matching approaches do not follow this scheme, historically imposed by computational limits, and progressively refine the localization of the matches in a coarse-to-fine manner. This paper investigates the use of traditional computer vision approaches based on template matching to update the keypoint position throughout the whole matching pipeline. In order to improve the accuracy of the template matching, the usage of the coarse-to-fine refinement is explored and a novel normalization strategy for the local keypoint patches is designed. Specifically, the proposed patch normalization assumes a local piece-wise planar approximation of the scene and warps the corresponding patches according to a “middle homography”, so that, after normalization, patch distortion is roughly equally distributed within the two original patches. The experimental comparison of the considered approaches, mainly focused on cultural heritage scenes but straightforwardly generalizable to other common scenarios, shows the strengths and limitations of each evaluated method. This analysis indicates promising and interesting results for the investigated approaches, which can effectively be deployed to design better image matching solutions.

Fabio Bellavia, Luca Morelli, Carlo Colombo, Fabio Remondino
Toward a System of Visual Classification, Analysis and Recognition of Performance-Based Moving Images in the Artistic Field

This paper proposes a research program focused on the design of a model for the recognition, analysis and classification of video art works and documentations based on their semiotic aspects and audiovisual content. Focusing on a corpus of art cinema, video art, and performance art, the theoretical framework involves bringing together semiotics, film studies, visual studies, and performance studies with the innovative technologies of computer vision and artificial intelligence. The aim is to analyze the performance aspect to interpret contextual references and cultural constructs recorded in artistic contexts, contributing to the classification and analysis of video art works with complex semiotic characteristics. Underlying the conceptual framework is the simultaneous use of a set of technologies, such as pose estimation, facial recognition, object recognition, motion analysis, audio analysis, and natural language processing, to improve recognition accuracy and create a large set of labeled audiovisual data. In addition, the authors propose a prototype application to explore the primary challenges of such a research project.

Michael Castronuovo, Alessandro Fiordelmondo, Cosetta Saba
CreatiChain: From Creation to Market

CreatiChain is a novel, integrated workflow management system that supports creating and monetizing AI-generated art. The system leverages grammars (Prompt Grammars) for semi-automated prompt generation, advanced AI algorithms for digital art creation, and blockchain technology for NFT minting and placement. The workflow begins with generating creative prompts using Prompt Grammars, offering creators a high level of customization. These prompts are fed into an AI-based art generation platform, producing unique digital art pieces. Once the art is created, the system automatically mints it into an NFT and places it on an NFT marketplace. CreatiChain streamlines access to AI art generation and NFT creation, offering a comprehensive solution for artists, designers, and digital creators to navigate the rapidly evolving digital art landscape.

Enrico Maria Aldorasi, Remo Pareschi, Francesco Salzano
Towards Using Natural Images of Wood to Retrieve Painterly Depictions of the Wood of Christ’s Cross

A painting, much like a written text, allows future viewers to draw conclusions about the time of its creation or its painter. Also, when looking at a whole corpus of images instead of a single instance, trends in painting can be analyzed. One particular trend, originating in the 14th century, is the transfer of the visual impression of real-world materials onto paintings. One object often depicted in paintings around that time as being made from wood is Christ’s cross. Little research has been done on automatically analyzing painterly depictions of the wooden cross of Christ. Hence, this study takes a step towards the automatic annotation of wooden crosses in paintings by evaluating three publicly available databases of natural images of wood for the applicability of their images as queries to retrieve painterly depictions of the wood of the cross. Experimental results underline the demand for further investigations.

Johannes Schuiki, Miriam Landkammer, Michael Linortner, Isabella Nicka, Andreas Uhl

Pattern Recognition for Cultural Heritage (PatReCH)

Frontmatter
Feature Relevance in Classification of 3D Stone from Ancient Wall Structures

The increasing availability of quantitative data in archaeological studies has prompted research into Machine Learning methods to support archaeologists in their analysis. This paper considers in particular the problem of automatic classification of 3D surface patches of “rubble stones” and “wedges” obtained from Prehistorical and Protohistorical walls in Crete. These data come from the W.A.L.(L) Project, aimed at querying 3D photogrammetric models of ancient architectural structures in order to extract archaeologically significant features. The principal aim of this paper is to address the issue of a clear semantic correspondence between data analysis concepts and archaeology. Classification of stone patches has been performed with several Machine Learning methods, and feature relevance has then been computed for all the classifiers. The results show a good correspondence between the most relevant features of the classification and the qualitative features that human experts typically adopt to classify the wall-facing stones.

Giovanni Gallo, Yaser Gholizade Atani, Roberto Leotta, Filippo Stanco, Francesca Buscemi, Marianna Figuera
Gamification in Cultural Heritage: When History Becomes SmART

The term “Gamification” refers to the use of techniques and features typically found in the gaming industry, applied to contexts that extend beyond traditional entertainment video games. It has been observed that these techniques are particularly effective for learning information, as users are actively engaged. The field of cultural heritage can also benefit from the use of gamification techniques. Often, the dissemination of tangible and intangible cultural heritage poses certain challenges, especially from a communication standpoint. In this context, the use of gamification-based approaches can significantly enhance the ability to disseminate culture by placing users at the center of the visiting experience and fostering a stronger connection with the cultural heritage. By employing such approaches, it is possible to create immersive and memorable experiences that enhance users’ learning capabilities. Innovative technologies like augmented reality and localization systems can provide valuable support in developing gamified visitor paths. A case study was proposed and developed within the Archaeological Park of Pompeii, and the experimental results were highly encouraging, particularly demonstrating greater benefits for younger age groups compared to a traditional visit.

Mario Casillo, Francesco Colace, Francesco Marongiu, Domenico Santaniello, Carmine Valentino
Classification of Turkish and Balkan House Architectures Using Transfer Learning and Deep Learning

Classifying architectural structures is an important and challenging task that requires expertise. Convolutional Neural Networks (CNN), a type of deep learning (DL) approach, have shown successful results in computer vision applications when combined with transfer learning. In this study, we utilized CNN-based models to classify regional houses from Anatolia and the Balkans based on their architectural styles, using various pretrained models with transfer learning. We prepared a dataset from various sources and employed data augmentation and mixup techniques to address the limited availability of data for certain regional houses and improve classification performance. Our study resulted in a classifier that successfully distinguishes 15 architectural classes from Anatolia and the Balkans. We explain our predictions using the Grad-CAM methodology.

Veli Mustafa Yönder, Emre İpek, Tarık Çetin, Hasan Burak Çavka, Mehmet Serkan Apaydın, Fehmi Doğan
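The chapter above lists mixup among its augmentation techniques. Mixup trains on convex combinations of input pairs and their labels; a minimal, generic sketch (illustrative, not the authors' training code; array shapes and the `alpha` value are assumptions):

```python
# Minimal mixup sketch: blend two samples and their one-hot labels with
# a Beta-distributed mixing coefficient lambda.
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Return a convex combination of two samples and their labels."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in (0, 1)
    x_mix = lam * x1 + (1 - lam) * x2
    y_mix = lam * y1 + (1 - lam) * y2
    return x_mix, y_mix

x_a, y_a = np.ones((32, 32, 3)), np.array([1.0, 0.0])   # class 0 image
x_b, y_b = np.zeros((32, 32, 3)), np.array([0.0, 1.0])  # class 1 image
x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, rng=np.random.default_rng(0))
```

Because the blended label remains a valid probability distribution, the mixed pair can be fed directly to a cross-entropy loss, which is what makes mixup a drop-in addition to a standard CNN training loop.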
Method for Ontology Learning from an RDB: Application to the Domain of Cultural Heritage

For decades, ontologies have defined valuable terminology for describing and representing a knowledge domain, capturing relationships between concepts, and improving knowledge management. Ontologies enable the exchange and sharing of information, extending syntactic and semantic interoperability: such advantages are also very useful in the cultural heritage (CH) field. Nowadays, ontologies are often made manually, although various attempts have been made in the literature for their automatic generation (Ontology Learning). This paper proposes a new way for the semi-automatic building of an ontology from a Relational Database (RDB). Following an accurate review of existing methods, we propose the implementation of a Python library capable of converting an RDB into an OWL ontology and importing its data inside the ontology as concepts and properties instances. We present a case study on actual data from the cultural heritage world coming from the REMIAM project of the High Technology District for Cultural Heritage (Distretto ad Alta Tecnologia per i Beni Culturali - DATABENC). Through interviews with experts in the field, a series of valid questions were identified for the experts’ research work and the interrogation of the knowledge base. The questions were then converted into SQL and SPARQL queries to assess the correctness of the method. The ability of the generated ontology to infer new knowledge on accurate data in the RDB will also be highlighted.

Fabio Clarizia, Massimo De Santo, Rosario Gaeta, Rosalba Mosca
A Novel Writer Identification Approach for Greek Papyri Images

Papyrology is the field of study dedicated to ancient texts written on papyri. One significant challenge faced by papyrologists and paleographers is the identification of writers, also referred to as scribes, who penned the texts preserved on papyri. Traditionally, paleographers relied on qualitative assessments to differentiate between writers. However, in recent years, these manual techniques have been complemented by computer-based tools that enable the automated measurement of various quantities such as letter height and width, character spacing, inclination angles, abbreviations, and more. Digital palaeography has emerged as a new approach combining advanced Machine Learning (ML) algorithms with high-quality digital images. This fusion allows for extracting distinctive features from the manuscripts, which can be utilized for writer classification using ML algorithms or Deep Learning (DL) systems. Integrating powerful computational methods and digital imagery has opened up new avenues in palaeography, enabling more accurate and efficient analysis of ancient manuscripts. After applying image processing and segmentation techniques, we exploited the power of Convolutional Neural Networks to characterize a scribe’s handwriting.

Nicole Dalia Cilia, Tiziana D’Alessandro, Claudio De Stefano, Francesco Fontanella, Isabelle Marthot-Santaniello, Mario Molinara, Alessandra Scotto Di Freca
Convolutional Generative Model for Pixel–Wise Colour Specification for Cultural Heritage

Colour specification can be carried out using different instruments or tools. The biggest limitation of these existing instruments is the region in which they can be applied. Indeed, they can only work locally in small regions on the surface of the object under examination. This implies a slow process, errors when repeating the procedure, and sometimes the impossibility of measuring the colour, depending on the object’s surface. We present a new way to perform colour specification in the CIELab colour space from RGB images by using a Convolutional Generative Model that performs the transformation needed to remove all shading effects from the image, producing an albedo image which is used to estimate the CIELab value for each pixel. In this work, we examine two different models, one based on an autoencoder and another based on GANs. In order to train and validate our models, we also present a dataset of synthetic images acquired using a Blender-based tool. The results obtained using our model on the generated dataset prove the performance of this method, which led to a low average colour error ( $$\Delta E_{00}$$ ) for both the validation and test sets. Finally, a real-scenario test is conducted on the head of the god Hades and a half-bust depicting the goddess Persephone, both from the archaeological Museum of Aidone (Italy).

Furnari Giuseppe, Anna Maria Gueli, Stanco Filippo, Dario Allegra
3D Modeling and Augmented Reality in Education: An Effective Application for the Museo dei Saperi e delle Mirabilia of the University of Catania

Augmented reality is the process of using technology to superimpose video, images, text or sound onto what a person can already see with their own eyes in the reality around them. It is enough to have a smartphone or tablet to change the reality in front of you. In fact, by means of an app, the user can display on his or her device content related to what is in front of him or her, accessing an ‘altered’ version of reality. Augmented Reality content can add value to any museum or art gallery and offer interactive solutions to entertain people and create rewarding engagement opportunities. This technology, applied to the field of Cultural Heritage and museum enjoyment, brings a significant enhancement to experiential feedback, attracting an ever-widening audience. In 2021, the National Gallery in London sought to take the collections of the National Gallery, the National Portrait Gallery and the Royal Academy of Arts beyond the museum walls with an Augmented Reality experience that the public could access via smartphones. Users used an app to activate artworks marked with QR codes and the initiative was very successful. The Museo dei Saperi e delle Mirabilia Siciliane at the University of Catania has set itself the goal of using this technology to display ‘digital versions’ of selected artefacts, bringing them to life in contexts outside the museum space. There are already many institutions using Augmented Reality and many more are being added all the time.

Germana Barone, Raissa Garozzo, Gloria Russo, Cettina Santagati, Diego Sinitò, Marilisa Yolanda Spironello, Filippo Stanco

Visual Processing of Digital Manuscripts: Workflows, Pipelines, Best Practices (ViDiScript)

Frontmatter
Writer Identification in Historical Handwritten Documents: A Latin Dataset and a Benchmark

Writer identification refers to the process of determining or attributing the authorship of a document to a specific individual through the analysis of various elements such as writing style, linguistic characteristics, and other textual features. This is a relevant task in heterogeneous fields such as cybersecurity, forensics, or linguistics and becomes particularly challenging when considering historical documents. In fact, the latter might present deterioration due to time, often lack signatures, and could be authored by multiple people. Complicating matters further, scribes were trained to mimic handwriting meticulously when copying manuscripts, making author identification of such documents even more difficult. In this context, this paper introduces a curated collection of Latin documents from the Genesis and Gospel of Matthew specifically gathered for the purpose of exploring the writer identification task. In particular, the dataset comprises over 400 pages, written by nine distinct persons. The primary objective is to explore the efficacy of state-of-the-art deep learning architectures in accurately ascribing historical texts to their rightful authors. To this end, this paper conducts extensive experiments, utilizing varying training set sizes and employing diverse pre-processing techniques to assess the performance and capabilities of these renowned models on the writer identification task while also providing the community with a baseline on the introduced collection.

Alessio Fagioli, Danilo Avola, Luigi Cinque, Emanuela Colombi, Gian Luca Foresti
Synthetic Lines from Historical Manuscripts: An Experiment Using GAN and Style Transfer

Given enough data of sufficient quality, HTR systems can achieve high accuracy, regardless of language, script or medium. Despite growing pooling of datasets, the question of the required quantity of training material still remains crucial for the transfer of models to out-of-domain documents, or the recognition of new scripts and under-resourced character classes. We propose a new data augmentation strategy, using generative adversarial networks (GAN). Inspired by synthetic lines generation for printed documents, our objective is to generate handwritten lines in order to massively produce data for a given style or under-resourced character class. Our approach, based on a variant of ScrabbleGAN, demonstrates the feasibility for various scripts, either in the presence of a high number and variety of abbreviations (Latin) and spellings or letter forms (Medieval French), in a situation of data scarcity (Armenian), or in the instance of a very cursive script (Arabic Maghribi). We then study the impact of synthetic line generation on HTR, by evaluating the gain for out-of-domain documents and under-resourced classes.

Chahan Vidal-Gorène, Jean-Baptiste Camps, Thibault Clérice
Is ImageNet Always the Best Option? An Overview on Transfer Learning Strategies for Document Layout Analysis

Semantic segmentation models have shown impressive performance in the context of historical document layout analysis, but their effectiveness is reliant on having access to a large number of high-quality annotated images for training. A popular approach to address the lack of training data in other domains is to rely on transfer learning to transfer the knowledge learned from a large-scale, general-purpose dataset (e.g. ImageNet) to a domain-specific task. However, this approach has been shown to lead to unsatisfactory results when the target task is completely unrelated to the data employed for the pre-training process, which is the case when working on document layout analysis. For this reason, in the present paper, we provide an overview of domain-specific transfer learning for document layout segmentation. In particular, we show how relying on document-related images for the pre-training process leads to consistently improved performance and faster convergence compared to training from scratch or even relying on a large, general purpose, dataset such as ImageNet.

Axel De Nardin, Silvia Zottin, Emanuela Colombi, Claudio Piciarelli, Gian Luca Foresti
Backmatter
Metadata
Title
Image Analysis and Processing - ICIAP 2023 Workshops
Editors
Gian Luca Foresti
Andrea Fusiello
Edwin Hancock
Copyright Year
2024
Electronic ISBN
978-3-031-51026-7
Print ISBN
978-3-031-51025-0
DOI
https://doi.org/10.1007/978-3-031-51026-7
