
2024 | Book

Advances in Information Retrieval

46th European Conference on Information Retrieval, ECIR 2024, Glasgow, UK, March 24–28, 2024, Proceedings, Part V

Edited by: Nazli Goharian, Nicola Tonellotto, Yulan He, Aldo Lipani, Graham McDonald, Craig Macdonald, Iadh Ounis

Publisher: Springer Nature Switzerland

Book series: Lecture Notes in Computer Science


About this book

The six-volume set LNCS 14608, 14609, 14610, 14611, 14612 and 14613 constitutes the refereed proceedings of the 46th European Conference on IR Research, ECIR 2024, held in Glasgow, UK, during March 24–28, 2024.

The 57 full papers, 18 findings papers, 36 short papers, 26 IR4Good papers, 18 demonstration papers, 9 reproducibility papers, 8 doctoral consortium papers, and 15 invited CLEF papers were carefully reviewed and selected from 578 submissions. The accepted papers cover the state of the art in information retrieval focusing on user aspects, system and foundational aspects, machine learning, applications, evaluation, new social and technical challenges, and other topics of direct or indirect relevance to search.

Table of Contents

Frontmatter

IR for Good Papers

Frontmatter
Measuring Bias in a Ranked List Using Term-Based Representations

In most recent studies, gender bias in document ranking is evaluated with the NFaiRR metric, which measures bias in a ranked list based on an aggregation over the unbiasedness scores of each ranked document. This perspective in measuring the bias of a ranked list has a key limitation: individual documents of a ranked list might be biased while the ranked list as a whole balances the groups’ representations. To address this issue, we propose a novel metric called TExFAIR (term exposure-based fairness), which is based on two new extensions to a generic fairness evaluation framework, attention-weighted ranking fairness (AWRF). TExFAIR assesses fairness based on the term-based representation of groups in a ranked list: (i) an explicit definition of associating documents to groups based on probabilistic term-level associations, and (ii) a rank-biased discounting factor (RBDF) for counting non-representative documents towards the measurement of the fairness of a ranked list. We assess TExFAIR on the task of measuring gender bias in passage ranking, and study the relationship between TExFAIR and NFaiRR. Our experiments show that there is no strong correlation between TExFAIR and NFaiRR, which indicates that TExFAIR measures a different dimension of fairness than NFaiRR. With TExFAIR, we extend the AWRF framework to allow for the evaluation of fairness in settings with term-based representations of groups in documents in a ranked list.

Amin Abolghasemi, Leif Azzopardi, Arian Askari, Maarten de Rijke, Suzan Verberne
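The abstract above combines two ingredients: probabilistic term-level group associations and a rank-biased discounting factor. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the group lexicons, the RBP-style discount, and the final fairness score are all simplifying assumptions:

```python
# Illustrative sketch of a term-based, rank-discounted group-exposure
# measure in the spirit of TExFAIR. The lexicons, the discount factor,
# and the scoring below are assumptions for illustration only.

GROUP_TERMS = {  # hypothetical term lexicons for two groups
    "f": {"she", "her", "woman"},
    "m": {"he", "his", "man"},
}

def group_association(doc_tokens, group):
    """P(group | doc) estimated from term counts (assumption)."""
    counts = {g: sum(t in terms for t in doc_tokens)
              for g, terms in GROUP_TERMS.items()}
    total = sum(counts.values())
    return counts[group] / total if total else 0.0

def group_exposure(ranked_docs, group, gamma=0.9):
    """Rank-discounted exposure of a group over a ranked list."""
    return sum((gamma ** rank) * group_association(doc, group)
               for rank, doc in enumerate(ranked_docs))

def fairness(ranked_docs, gamma=0.9):
    """1 minus the normalized exposure gap between the two groups."""
    e_f = group_exposure(ranked_docs, "f", gamma)
    e_m = group_exposure(ranked_docs, "m", gamma)
    denom = e_f + e_m
    return 1.0 - abs(e_f - e_m) / denom if denom else 1.0

ranking = [["she", "made", "her", "point"],    # "f"-associated
           ["he", "said", "that"],             # "m"-associated
           ["the", "report", "was", "clear"]]  # neutral
print(round(fairness(ranking), 3))
```

The neutral document contributes no exposure to either group, so only the first two documents (and their rank discounts) determine the score.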
Measuring Bias in Search Results Through Retrieval List Comparison

Many IR systems project harmful societal biases, including gender bias, in their retrieved contents. Uncovering and addressing such biases requires grounded bias measurement principles. However, defining reliable bias metrics for search results is challenging, particularly due to the difficulties in capturing gender-related tendencies in the retrieved documents. In this work, we propose a new framework for search result bias measurement. Within this framework, we first revisit the current metrics for representative search result bias (RepSRB) that are based on the occurrence of gender-specific language in the search results. Addressing their limitations, we additionally propose a metric for comparative search result bias (ComSRB) measurement and integrate it into our framework. ComSRB defines bias as the skew in the set of retrieved documents in response to a non-gendered query toward those for male/female-specific variations of the same query. We evaluate ComSRB against RepSRB on a recent collection of bias-sensitive topics and documents from the MS MARCO collection, using pre-trained bi-encoder and cross-encoder IR models. Our analyses show that, while existing metrics are highly sensitive to wording and linguistic formulation, the proposed ComSRB metric mitigates this issue by focusing on the deviations of a retrieval list from its explicitly biased variants, avoiding the need for sub-optimal content analysis processes.

Linda Ratz, Markus Schedl, Simone Kopeinik, Navid Rekabsaz
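The comparative idea in the abstract above, measuring how a neutral query's results skew toward the results of its gendered variants, can be sketched very simply. The overlap-based skew below is an illustrative assumption, not the paper's exact definition of ComSRB:

```python
# Minimal sketch of a comparative, list-based bias score: compare the
# result list of a neutral query against the result lists of its
# gendered query variants. Overlap@k as the comparison is an
# assumption for illustration.

def overlap_at_k(neutral, variant, k=10):
    """Fraction of the neutral top-k also retrieved for the variant."""
    return len(set(neutral[:k]) & set(variant[:k])) / k

def comparative_bias(neutral, male_variant, female_variant, k=10):
    """Positive => skew toward the male-variant list (assumption)."""
    return (overlap_at_k(neutral, male_variant, k)
            - overlap_at_k(neutral, female_variant, k))

neutral = ["d1", "d2", "d3", "d4"]
male    = ["d1", "d2", "d9", "d8"]   # shares d1, d2 with neutral
female  = ["d3", "d7", "d6", "d5"]   # shares only d3 with neutral
print(comparative_bias(neutral, male, female, k=4))
```

A score of zero indicates the neutral list sits equidistant from both gendered variants; the sign indicates the direction of the skew.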
SALSA: Salience-Based Switching Attack for Adversarial Perturbations in Fake News Detection Models

Despite advances in fake news detection algorithms, recent research reveals that machine learning-based fake news detection models are still vulnerable to carefully crafted adversarial attacks. In this landscape, traditional methods, often relying on text perturbations or heuristic-based approaches, have proven insufficient, revealing a critical need for more nuanced and context-aware strategies to enhance the robustness of fake news detection. Our research identifies and addresses three critical areas: creating subtle perturbations, preserving core information while modifying sentence structure, and incorporating inherent interpretability. We propose SALSA, an adversarial Salience-based Switching Attack strategy that harnesses salient words, using similarity-based switching to address the shortcomings of traditional adversarial attack methods. Using SALSA, we perform a two-way attack: misclassifying real news as fake and fake news as real. Due to the absence of standardized metrics to evaluate adversarial attacks in fake news detection, we further propose three new evaluation metrics to gauge the attack’s success. Finally, we validate the transferability of our proposed attack strategy across attacker and victim models, demonstrating our approach’s broad applicability and potency. Code and data are available here at https://github.com/iamshnoo/salsa .

Chahat Raj, Anjishnu Mukherjee, Hemant Purohit, Antonios Anastasopoulos, Ziwei Zhu
Federated Conversational Recommender Systems

Conversational Recommender Systems (CRSs) have become increasingly popular as a powerful tool for providing personalized recommendation experiences. By directly engaging with users in a conversational manner to learn their current and fine-grained preferences, a CRS can quickly derive recommendations that are relevant and justifiable. However, existing CRSs typically rely on a centralized training and deployment process, which involves collecting and storing explicitly-communicated user preferences in a centralized repository. These fine-grained user preferences are completely human-interpretable and can easily be used to infer sensitive information (e.g., financial status, political stance, and health information) about the user, if leaked or breached. To address the user privacy concerns in CRS, we first define a set of privacy protection guidelines for preserving user privacy, and then propose a novel federated CRS framework that effectively reduces the risk of exposing user privacy. Through extensive experiments, we show that the proposed framework not only satisfies these user privacy protection guidelines, but also achieves competitive recommendation performance compared to the state-of-the-art non-private conversational recommendation approach.

Allen Lin, Jianling Wang, Ziwei Zhu, James Caverlee
FakeClaim: A Multiple Platform-Driven Dataset for Identification of Fake News on 2023 Israel-Hamas War

We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data is collected from 60 fact-checking organizations in 30 languages and enriched with metadata from the fact-checking organizations curated by trained journalists specialized in fact-checking. Further, we classify fake videos within the subset of YouTube videos using textual information and user comments. We used a pre-trained model to classify each video with different feature combinations. Our best-performing fine-tuned language model, Universal Sentence Encoder (USE), achieves a Macro F1 of 87%, which shows that the trained model can be helpful for debunking fake videos using the comments from the user discussion.

Gautam Kishore Shahi, Amit Kumar Jaiswal, Thomas Mandl
Countering Mainstream Bias via End-to-End Adaptive Local Learning

Collaborative filtering (CF) based recommendations suffer from mainstream bias – where mainstream users are favored over niche users, leading to poor recommendation quality for many long-tail users. In this paper, we identify two root causes of this mainstream bias: (i) discrepancy modeling, whereby CF algorithms focus on modeling mainstream users while neglecting niche users with unique preferences; and (ii) unsynchronized learning, where niche users require more training epochs than mainstream users to reach peak performance. Targeting these causes, we propose a novel end-To-end Adaptive Local Learning (TALL) framework to provide high-quality recommendations to both mainstream and niche users. TALL uses a loss-driven Mixture-of-Experts module to adaptively ensemble experts to provide customized local models for different users. Further, it contains an adaptive weight module to synchronize the learning paces of different users by dynamically adjusting weights in the loss. Extensive experiments demonstrate the state-of-the-art performance of the proposed model. Code and data are provided at https://github.com/JP-25/end-To-end-Adaptive-Local-Leanring-TALL- .

Jinhao Pan, Ziwei Zhu, Jianling Wang, Allen Lin, James Caverlee
Towards Optimizing Ranking in Grid-Layout for Provider-Side Fairness

Information access systems, such as search engines and recommender systems, order and position results based on their estimated relevance. These results are then evaluated for a range of concerns, including provider-side fairness: whether exposure to users is fairly distributed among items and the people who created them. Several fairness-aware ranking and re-ranking techniques have been proposed to ensure fair exposure for providers, but such work focuses almost exclusively on linear layouts in which items are displayed in a single ranked list. Many widely-used systems use other layouts, such as the grid views common in streaming platforms, image search, and other applications. Providing fair exposure to providers in such layouts is not well-studied. We seek to fill this gap by providing a grid-aware re-ranking algorithm to optimize layouts for provider-side fairness by adapting existing re-ranking techniques to grid-aware browsing models, and an analysis of the effect of grid-specific factors such as device size on the resulting fairness optimization. Our work provides a starting point and identifies open gaps in ensuring provider-side fairness in grid-based layouts.

Amifa Raj, Michael D. Ekstrand
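The abstract above notes that exposure in a grid depends on grid-specific factors such as device size. A toy browsing model makes this concrete; the row-major layout and the decay parameters below are assumptions for illustration, not the paper's browsing model:

```python
# Illustrative grid-aware exposure model: items are laid out row-major
# in a grid whose width depends on the device, and the probability of
# examination decays by row (and mildly by column). The decay
# parameters are assumptions for illustration only.

def grid_exposure(n_items, cols, row_decay=0.7, col_decay=0.95):
    """Exposure weight for each item slot in a row-major grid."""
    weights = []
    for i in range(n_items):
        row, col = divmod(i, cols)
        weights.append((row_decay ** row) * (col_decay ** col))
    return weights

# The same 6 items receive very different exposure profiles on a
# narrow (2-column) vs. wide (3-column) device: on the wide device
# the third item stays in the highly-examined first row.
print([round(w, 3) for w in grid_exposure(6, cols=2)])
print([round(w, 3) for w in grid_exposure(6, cols=3)])
```

A fairness-aware re-ranker would feed weights like these, rather than a single-list position discount, into its exposure-allocation objective.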
MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries

In the healthcare domain, summarizing medical questions posed by patients is critical for improving doctor-patient interactions and medical decision-making. Although medical data has grown in complexity and quantity, the current body of research in this domain has primarily concentrated on text-based methods, overlooking the integration of visual cues. Moreover, prior work on medical question summarization has been limited to the English language. This work introduces the task of multimodal medical question summarization for codemixed input in a low-resource setting. To address this gap, we introduce the Multimodal Medical Codemixed Question Summarization (MMCQS) dataset, which combines Hindi-English codemixed medical queries with visual aids. This integration enriches the representation of a patient’s medical condition, providing a more comprehensive perspective. We also propose a framework named MedSumm that leverages the power of LLMs and VLMs for this task. By utilizing our MMCQS dataset, we demonstrate the value of integrating visual information from images to improve the creation of medically detailed summaries. This multimodal strategy not only improves healthcare decision-making but also promotes a deeper comprehension of patient queries, paving the way for future exploration in personalized and responsive medical care. Our dataset, code, and pre-trained models will be made publicly available. https://github.com/ArkadeepAcharya/MedSumm-ECIR2024

Akash Ghosh, Arkadeep Acharya, Prince Jha, Sriparna Saha, Aniket Gaudgaul, Rajdeep Majumdar, Aman Chadha, Raghav Jain, Setu Sinha, Shivani Agarwal
Improving Exposure Allocation in Rankings by Query Generation

Deploying methods that incorporate generated queries in their retrieval process, such as Doc2Query, has been shown to be effective for retrieving the most relevant documents for a user’s query. However, to the best of our knowledge, there has been no work yet on whether generated queries can also be used in the ranking process to achieve other objectives, such as ensuring a fair distribution of exposure in the ranking. Indeed, the amount of exposure that a document is likely to receive depends on the document’s position in the ranking, with lower-ranked documents having a lower probability of being examined by the user. While the utility to users remains the main objective of an Information Retrieval (IR) system, an unfair exposure allocation can lead to lost opportunities and unfair economic impacts for particular societal groups. Therefore, in this work, we conduct a first investigation into whether generating relevant queries can help to fairly distribute the exposure over groups of documents in a ranking. In our work, we build on the effective Doc2Query methods to selectively generate relevant queries for underrepresented groups of documents and use their predicted relevance to the original query in order to re-rank the underexposed documents. Our experiments on the TREC 2022 Fair Ranking Track collection show that using generated queries consistently leads to a fairer allocation of exposure compared to a standard ranking while still maintaining utility.

Thomas Jaenich, Graham McDonald, Iadh Ounis
The Open Web Index
Crawling and Indexing the Web for Public Use

Only a few search engines index the Web at scale. Third parties who want to develop downstream applications based on web search fully depend on the terms and conditions of the few vendors. The public availability of the large-scale Common Crawl does not alleviate the situation, as it is often cheaper to crawl and index only a smaller collection focused on a downstream application scenario than to build and maintain an index for a general collection the size of the Common Crawl. Our goal is to improve this situation by developing the Open Web Index. The Open Web Index is a publicly funded basic infrastructure from which downstream applications will be able to select and compile custom indexes in a simple and transparent way. Our goal is to establish the Open Web Index along with associated data products as a new open web information intermediary. In this paper, we present our first prototype for the Open Web Index and our plans for future developments. In addition to the conceptual and technical background, we discuss how the information retrieval community can benefit from and contribute to the Open Web Index—for example, by providing resources, by providing pre-processing components and pipelines, or by creating new kinds of vertical search engines and test collections.

Gijs Hendriksen, Michael Dinzinger, Sheikh Mastura Farzana, Noor Afshan Fathima, Maik Fröbe, Sebastian Schmidt, Saber Zerhoudi, Michael Granitzer, Matthias Hagen, Djoerd Hiemstra, Martin Potthast, Benno Stein
A Conversational Robot for Children’s Access to a Cultural Heritage Multimedia Archive

In this paper we introduce a conversational robot designed to assist children in searching a museum’s cultural heritage video archive. The robot employs a form of Spoken Conversational Search to facilitate the clarification of children’s interest (their information need) in specific videos from the archive. Children are typically insufficiently supported in this process by common search technologies such as search-bar and keyboard, or one-shot voice interfaces. We present our approach, which leverages a knowledge-graph representation of the museum’s video archive to facilitate conversational search interactions and suggest content based on the interaction, in order to study information-seeking conversations with children. We plan to use the robot test-bed to investigate the effectiveness of conversational designs over one-shot voice interactions for clarifying children’s information needs in a museum context.

Thomas Beelen, Roeland Ordelman, Khiet P. Truong, Vanessa Evers, Theo Huibers
Towards Robust Expert Finding in Community Question Answering Platforms

This paper introduces TUEF, a topic-oriented user-interaction model for fair Expert Finding in Community Question Answering (CQA) platforms. The Expert Finding task in CQA platforms involves identifying proficient users capable of providing accurate answers to questions from the community. To this aim, TUEF improves the robustness and credibility of the CQA platform through a more precise Expert Finding component. The key idea of TUEF is to exploit diverse types of information, specifically content and social information, to identify experts more precisely, thus improving the robustness of the task. We assess TUEF through reproducible experiments conducted on a large-scale dataset from StackOverflow. The results consistently demonstrate that TUEF outperforms state-of-the-art competitors while promoting transparent expert identification.

Maddalena Amendola, Andrea Passarella, Raffaele Perego

Demo Papers

Frontmatter
QuantPlorer: Exploration of Quantities in Text

Quantities play an important role in documents of various domains such as finance, business, and medicine. Despite the role of quantities, only a limited number of works focus on their extraction from text and even less on creating respective user-friendly document exploration frameworks. In this work, we introduce QuantPlorer, an online quantity extractor and explorer. Through an intuitive web interface, QuantPlorer extracts quantities from unstructured text, enables users to interactively investigate and visualize quantities in text, and supports filtering based on diverse features, i.e., value ranges, units, trends, and concepts. Furthermore, users can explore and visualize distributions of values for specific units and concepts. Our demonstration is available at https://quantplorer.ifi.uni-heidelberg.de/ .

Satya Almasian, Alexander Kosnac, Michael Gertz
Interactive Document Summarization

With the advent of modern chatbots, automatic summarization is becoming common practice to quicken access to information. However, the summaries they generate can be biased, unhelpful, or untruthful. Hence, in sensitive scenarios, extractive summarization remains a more reliable approach. In this paper we present an original extractive method combining a GNN-based encoder and an RNN-based decoder, coupled with a user-friendly interface that allows for interactive summarization.

Raoufdine Said, Adrien Guille
KnowFIRES: A Knowledge-Graph Framework for Interpreting Retrieved Entities from Search

Entity retrieval is essential in information access domains where people search for specific entities, such as individuals, organizations, and places. While entity retrieval is an active research topic in Information Retrieval, its explainability and interpretability need to be explored more extensively. KnowFIRES addresses this by offering a knowledge graph-based visual representation of entity retrieval results, focusing on contrasting different retrieval methods. KnowFIRES allows users to better understand these differences through the juxtaposition and superposition of retrieved sub-graphs. As part of our demo, we make the KnowFIRES web interface (http://knowfires.live) and its source code (https://github.com/kiarashgl/KnowFIRES) publicly available, along with a video demonstration of the tool (https://www.youtube.com/watch?v=9u-877ArNYE).

Negar Arabzadeh, Kiarash Golzadeh, Christopher Risi, Charles L. A. Clarke, Jian Zhao
Physio: An LLM-Based Physiotherapy Advisor

The capabilities of the most recent language models have increased the interest in integrating them into real-world applications. However, the fact that these models generate plausible, yet incorrect text poses a constraint when considering their use in several domains. Healthcare is a prime example of a domain where text-generative trustworthiness is a hard requirement to safeguard patient well-being. In this paper, we present Physio, a chat-based application for physical rehabilitation. Physio is capable of making an initial diagnosis while citing reliable health sources to support the information provided. Furthermore, drawing upon external knowledge databases, Physio can recommend rehabilitation exercises and over-the-counter medication for symptom relief. By combining these features, Physio can leverage the power of generative models for language processing while also conditioning its response on dependable and verifiable sources. A live demo of Physio is available at https://physio.inesctec.pt .

Rúben Almeida, Hugo Sousa, Luís F. Cunha, Nuno Guimarães, Ricardo Campos, Alípio Jorge
MathMex: Search Engine for Math Definitions

This paper introduces MathMex, an open-source search engine for math definitions. With MathMex, users can search for definitions of mathematical concepts extracted from a variety of data sources and types including text, images, and videos. Definitions are extracted using a fine-tuned SciBERT classifier, and the search is done with a fine-tuned Sentence-BERT model. The MathMex interface provides means of issuing text, formula, and combined queries, and includes logging features.

Shea Durgin, James Gore, Behrooz Mansouri
XSearchKG: A Platform for Explainable Keyword Search over Knowledge Graphs

One of the most user-friendly methods to search over knowledge graphs is the use of keyword queries. They offer a simple text input that requires no technical or domain knowledge. Most existing approaches for keyword search over graph-shaped data rely on graph traversal algorithms to find connections between keywords. They mostly concentrate on achieving efficiency and effectiveness (accurate ranking), but ignore usability, visualization, and interactive result presentation, all of which offer better support to non-experienced users. Moreover, it is not sufficient to just show a raw list of results; it is also important to explain why a specific result is proposed. This not only provides an abstract view of the capabilities and limitations of the search system, but also increases confidence and helps discover new interesting facts. We propose XSearchKG, a platform for explainable keyword search over knowledge graphs that extends our previously proposed graph traversal-based approach and complements it with an interactive user interface for results explanation and browsing.

Leila Feddoul, Martin Birke, Sirko Schindler
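The abstract above describes the common pattern of traversing the graph to find connections between keyword-matched nodes, where the connecting path itself can serve as the explanation. A minimal sketch of that pattern, with a toy graph and plain BFS as illustrative assumptions:

```python
# Sketch of graph-traversal keyword search over a knowledge graph:
# connect the nodes matched by the keywords with a shortest path, and
# show the path as the explanation of the result. The toy graph and
# plain BFS are assumptions for illustration.
from collections import deque

def shortest_path(graph, start, goal):
    """BFS shortest path between two nodes; None if disconnected."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Toy knowledge graph as adjacency lists over entity nodes.
graph = {
    "Turing": ["Computer Science", "Enigma"],
    "Computer Science": ["Turing", "Algorithms"],
    "Enigma": ["Turing", "WWII"],
    "Algorithms": ["Computer Science"],
    "WWII": ["Enigma"],
}
# For the keyword query "Algorithms WWII", the connecting path doubles
# as the explanation of why the result was returned:
print(shortest_path(graph, "Algorithms", "WWII"))
```

Showing such paths alongside ranked results is what distinguishes explainable keyword search from a bare result list.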
Result Assessment Tool: Software to Support Studies Based on Data from Search Engines

The Result Assessment Tool (RAT) is a software toolkit for conducting research with results from commercial search engines and other information retrieval (IR) systems. The software integrates modules for study design and management, automatic collection of search results via web scraping, and evaluation of search results in an assessment interface using different question types. RAT can be used for conducting a wide range of studies, including retrieval effectiveness studies, classification studies, and content analyses.

Sebastian Sünkler, Nurce Yagci, Sebastian Schultheiß, Sonja von Mach, Dirk Lewandowski
eval-rationales: An End-to-End Toolkit to Explain and Evaluate Transformers-Based Models

State-of-the-art (SOTA) transformer-based models in the domains of Natural Language Processing (NLP) and Information Retrieval (IR) are often characterized by their opacity in terms of decision-making processes. This limitation has given rise to various techniques for enhancing model interpretability and the emergence of evaluation benchmarks aimed at designing more transparent models. These techniques are primarily focused on developing interpretable models with the explicit aim of shedding light on the rationales behind their predictions. Concurrently, evaluation benchmarks seek to assess the quality of these rationales provided by the models. Despite the availability of numerous resources for using these techniques and benchmarks independently, their seamless integration remains a non-trivial task. In response to this challenge, this work introduces an end-to-end toolkit that integrates the most common techniques and evaluation approaches for interpretability. Our toolkit offers user-friendly resources facilitating fast and robust evaluations.

Khalil Maachou, Jesús Lovón-Melgarejo, Jose G. Moreno, Lynda Tamine
Selma: A Semantic Local Code Search Platform

Searching for the right code snippet is cumbersome and not a trivial task. Online platforms such as Github.com or searchcode.com provide tools to search, but they are limited to publicly available and internet-hosted code. However, during the development of research prototypes or confidential tools, it is preferable to store source code locally. Consequently, the use of external code search tools becomes impractical. Here, we present Selma (Code and Videos: https://anreu.github.io/selma ): a local code search platform that enables term-based and semantic retrieval of source code. Selma searches code and comments, annotates undocumented code to enable term-based search in natural language, and trains neural models for code retrieval.

Anja Reusch, Guilherme C. Lopes, Wilhelm Pertsch, Hannes Ueck, Julius Gonsior, Wolfgang Lehner
VADIS – A Variable Detection, Interlinking and Summarization System

The VADIS system addresses the demand of providing enhanced information access in the domain of the social sciences. This is achieved by allowing users to search and use survey variables in the context of their underlying research data and scholarly publications, which have been interlinked with each other.

Yavuz Selim Kartal, Muhammad Ahsan Shahid, Sotaro Takeshita, Tornike Tsereteli, Andrea Zielinski, Benjamin Zapilko, Philipp Mayr
ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction

The escalating volume of textual data necessitates adept and scalable Information Extraction (IE) systems in the field of Natural Language Processing (NLP) to analyse massive text collections in a detailed manner. While most deep learning systems are designed to handle textual information as it is, the interface between a document and the annotation of its parts remains poorly covered. Concurrently, one of the major limitations of most deep-learning models is a constrained input size caused by architectural and computational specifics. To address this, we introduce ARElight (https://github.com/nicolay-r/ARElight), a system designed to efficiently manage and extract information from sequences of large documents by dividing them into segments with mentioned object pairs. Through a pipeline comprising modules for text sampling, inference, optional graph operations, and visualisation, the proposed system transforms large volumes of text in a structured manner. Practical applications of ARElight are demonstrated across diverse use cases, including literature processing and social network analysis.

Nicolay Rusnachenko, Huizhi Liang, Maksim Kalameyets, Lei Shi
Translating Justice: A Cross-Lingual Information Retrieval System for Maltese Case Law Documents

In jurisdictions adhering to the Common Law system, previous court judgements inform future rulings based on the Stare Decisis principle. For enhanced accessibility and retrieval of such judgements, we introduced a cross-lingual Legal Information Retrieval system prototype focused on Malta’s small claims tribunal. This system utilises Neural Machine Translation (NMT) to automatically translate Maltese judgement documents into English, enabling dual-language querying. Additionally, it employs Rhetorical Role Labelling (RRL) on sentences within the judgements, allowing for targeted searches based on specific rhetorical roles. Developed without depending on high-end resources or commercial systems, this prototype showcases the potential of AI in advancing legal research tools and making legal documents more accessible, especially for non-native speakers.

Joel Azzopardi
A Conversational Search Framework for Multimedia Archives

Conversational search systems seek to support users in their search activities to improve the effectiveness and efficiency of search while reducing their cognitive load. The challenges of multimedia search mean that the search support provided by conversational methods has the potential to improve the user search experience, for example, by assisting users in constructing better queries and making more informed decisions in relevance feedback stages whilst searching. However, previous research on conversational search has been focused almost exclusively on text archives. This demonstration illustrates the potential for the application of conversational methods in multimedia search. We describe a framework to enable multimodal conversational search for use with multimedia archives. Our current prototype demonstrates the use of a conversational AI assistant during the multimedia information retrieval process for both image and video collections.

Anastasia Potyagalova, Gareth J. F. Jones
Building and Evaluating a WebApp for Effortless Deep Learning Model Deployment

In the field of deep learning, particularly Natural Language Processing (NLP), model deployment is a key process for public testing and analysis. However, developing a deployment pipeline is often difficult and time-consuming. To address this challenge, we developed SUD.DL, a web application to simplify the model deployment process for NLP researchers. Our application provides significant improvements in deployment efficiency, functionality discoverability, and deployment functionality, allowing NLP researchers to quickly deploy and test models on the web.

Ruikun Wu, Jiaxuan Han, Jerome Ramos, Aldo Lipani
indxr: A Python Library for Indexing File Lines

indxr is a Python utility for indexing file lines that allows users to dynamically access specific ones, avoiding loading the entire file in the computer’s main memory. indxr addresses two main issues related to working with textual data. First, users who do not have plenty of RAM at their disposal may struggle to work with large datasets. Since indxr allows accessing specific lines without loading entire files, users can work with datasets that do not fit into their computer’s main memory. For example, it enables users to perform complex tasks with limited RAM without noticeable slowdowns, such as pre-processing texts and training Neural models for Information Retrieval or other tasks. Second, indxr reduces the burden of working with datasets split among multiple files by allowing users to load specific data by providing the related line numbers or the identifiers of the information they describe, thus providing convenient access to such data. This paper overviews indxr’s main features. ( https://github.com/AmenRa/indxr ).

Elias Bassani, Nicola Tonellotto
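The abstract above explains the core mechanism: record where each line starts once, then jump straight to any line later instead of loading the whole file into memory. A minimal sketch of that technique follows; it illustrates the idea and is not indxr's actual API:

```python
# Minimal sketch of byte-offset line indexing, the core idea behind a
# utility like indxr: build an offset index once, then seek() directly
# to any line without reading the rest of the file. Illustration only,
# not indxr's actual interface.
import os
import tempfile

def build_line_index(path):
    """Return a list mapping line number -> byte offset of the line."""
    offsets = []
    with open(path, "rb") as f:
        while True:
            offsets.append(f.tell())
            if not f.readline():
                offsets.pop()  # drop the offset past the last line
                break
    return offsets

def read_line(path, offsets, n):
    """Fetch line n (0-based) without loading the whole file."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().decode("utf-8").rstrip("\n")

# Demo on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"first\nsecond\nthird\n")
    path = tmp.name
index = build_line_index(path)
print(read_line(path, index, 1))  # prints "second"
os.remove(path)
```

Because only the small offset list lives in RAM, the same pattern scales to files far larger than main memory, which is exactly the use case the abstract describes.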
SciSpace Literature Review: Harnessing AI for Effortless Scientific Discovery

In the rapidly evolving landscape of academia, the scientific research community barely copes with the challenges posed by a surging volume of scientific literature. Nevertheless, discovering research remains an important step in the research workflow, and one that has proven challenging to automate. We present SciSpace Literature Review, a sophisticated, multi-faceted tool that serves as a comprehensive solution to streamline the literature review process. By leveraging state-of-the-art methods in vector-based search, reranking, and large language models, the tool delivers features such as customizable search results, data extraction, integration with an AI assistant, multi-language support, top-paper insights, and customizable result columns to cater to a researcher’s requirements and accelerate literature exploration. Resources for simplified sharing and documentation further enhance the depth and breadth of research. We demonstrate the extensive use and popularity of the tool among researchers with various metrics, highlighting its value as a resource to elevate scientific literature review. This tool can be tried using this link: https://typeset.io/search .

Siddhant Jain, Asheesh Kumar, Trinita Roy, Kartik Shinde, Goutham Vignesh, Rohan Tondulkar
Displaying Evolving Events Via Hierarchical Information Threads for Sensitivity Review

Many government documents contain sensitive (e.g. personal or confidential) information that must be protected before the documents can be released to the public. However, reviewing documents to identify sensitive information is a complex task, which often requires analysing multiple related documents that mention a particular context of sensitivity. For example, coherent information about evolving events, such as legal proceedings, is often dispersed across documents produced at different times. In this paper, we present a novel system for sensitivity review, which automatically identifies hierarchical information threads to capture diverse aspects of an event. In particular, our system aims to assist sensitivity reviewers in making accurate sensitivity judgements efficiently by presenting hierarchical information threads that provide coherent and chronological information about an event’s evolution. Through a user study, we demonstrate our system’s effectiveness in improving the sensitivity reviewers’ reviewing speed and accuracy compared to the traditional document-by-document review process.

Hitarth Narvala, Graham McDonald, Iadh Ounis
FAR-AI: A Modular Platform for Investment Recommendation in the Financial Domain

Financial asset recommendation (FAR) is an emerging sub-domain of the wider recommendation field that is concerned with recommending suitable financial assets to customers, with the expectation that those customers will invest capital into a subset of those assets. FAR is a particularly interesting sub-domain to explore, as unlike traditional movie or product recommendation, FAR solutions need to analyse and learn from a combination of time-series pricing data, company fundamentals, social signals and world events, relating the patterns observed to multi-faceted customer representations comprising profiling information, expectations and past investments. In this demo we present a modular FAR platform, referred to as FAR-AI, with the goal of raising awareness and building a community around this emerging domain, as well as illustrating the challenges, design considerations and new research directions that FAR offers. The demo comprises two components: 1) a presentation of the architecture of FAR-AI, to enable attendees to understand the how’s and the why’s of developing a FAR system; and 2) a live demonstration of FAR-AI as a customer-facing product, highlighting the differences in functionality between FAR solutions and traditional recommendation scenarios. The demo is supplemented by online tutorial materials, to enable attendees new to this space to get practical experience with training FAR models. VIDEO URL .

Javier Sanz-Cruzado, Edward Richards, Richard McCreadie

Industry Papers

Frontmatter
Lottery4CVR: Neuron-Connection Level Sharing for Multi-task Learning in Video Conversion Rate Prediction

As a fundamental task of industrial ranking systems, conversion rate (CVR) prediction suffers from data sparsity problems. Most conventional CVR modeling leverages click-through rate (CTR) & CVR multitask learning, because CTR involves far more samples than CVR. However, typical coarse-grained layer-level sharing methods may introduce conflicts and lead to performance degradation, since not every neuron or neuron connection in one layer should be shared between the CVR and CTR tasks. This is because users may have different fine-grained content feature preferences between deep consumption and click behaviors, represented by CVR and CTR, respectively. To address this sharing & conflict problem, we propose neuron-connection level knowledge sharing. We start with an over-parameterized base network from which CVR and CTR extract their own subnetworks. The subnetworks have partially overlapped neuron connections, which correspond to the shared knowledge, while the task-specific neuron connections are utilized to alleviate the conflict problem. To the best of our knowledge, this is the first time neuron-connection level sharing has been proposed for CVR modeling. Experiments on the Tencent video platform demonstrate the superiority of the method, which has been deployed to serve major traffic. (The source code is available at https://github.com/xuanjixiao/onerec/tree/main/lt4rec ).

Xuanji Xiao, Jimmy Chen, Yuzhen Liu, Xing Yao, Pei Liu, Chaosheng Fan
Semantic Content Search on IKEA.com

In this paper, we present an approach to content search aimed at increasing customer engagement with content recommendations on IKEA.com . As an alternative to Boolean search, we introduce a method based on semantic textual similarity between content pages and search queries. Our approach improves the relevance of search results, yielding a 2.95% increase in click-through rate in an online A/B test.
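At its core, semantic search of this kind scores content pages by the similarity of their embeddings to a query embedding. The sketch below shows the standard cosine-similarity ranking step; it assumes pre-computed embedding vectors and is not the specific model or pipeline used on IKEA.com.

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Rank documents by cosine similarity to the query embedding.

    query_vec: 1-D embedding of the search query.
    doc_vecs: 2-D array, one row per content-page embedding.
    Returns the indices of the k most similar documents, best first.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q  # cosine similarity of each document to the query
    return np.argsort(-scores)[:k]
```

In practice the embeddings would come from a sentence-encoder model, and the top-k search would typically be served by an approximate nearest-neighbour index rather than a brute-force matrix product.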

Mateusz Slominski, Ezgi Yıldırım, Martin Tegner
Let’s Get It Started: Fostering the Discoverability of New Releases on Deezer

This paper presents our recent initiatives to foster the discoverability of new releases on the music streaming service Deezer. After introducing our search and recommendation features dedicated to new releases, we outline our shift from editorial to personalized release suggestions using cold start embeddings and contextual bandits. Backed by online experiments, we discuss the advantages of this shift in terms of recommendation quality and exposure of new releases on the service.

Léa Briand, Théo Bontempelli, Walid Bendada, Mathieu Morlon, François Rigaud, Benjamin Chapus, Thomas Bouabça, Guillaume Salha-Galvan
Variance Reduction in Ratio Metrics for Efficient Online Experiments

Online controlled experiments, such as A/B-tests, are commonly used by modern tech companies to enable continuous system improvements. Despite their paramount importance, A/B-tests are expensive: by their very definition, a percentage of traffic is assigned an inferior system variant. To ensure statistical significance on top-level metrics, online experiments typically run for several weeks. Even then, a considerable number of experiments will lead to inconclusive results (i.e. false negatives, or type-II errors). The main culprit for this inefficiency is the variance of the online metrics. Variance reduction techniques have been proposed in the literature, but their direct applicability to commonly used ratio metrics (e.g. click-through rate or user retention) is limited. In this work, we successfully apply variance reduction techniques to ratio metrics on a large-scale short-video platform: ShareChat. Our empirical results show that we can either improve A/B-test confidence in 77% of cases, or can retain the same level of confidence with 30% fewer data points. Importantly, we show that the common approach of including as many covariates as possible in regression is counter-productive, highlighting that control variates based on Gradient-Boosted Decision Tree predictors are most effective. We discuss the practicalities of implementing these methods at scale and showcase the cost reduction they beget.
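As background for the control-variate idea this paper builds on, the classic CUPED adjustment (Deng et al., 2013) reduces metric variance using a pre-experiment covariate, without changing the metric’s mean. The paper’s contribution replaces the simple linear covariate with GBDT predictors and extends the idea to ratio metrics; the sketch below shows only the basic linear form for a per-user metric.

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED-style variance reduction with a single control variate.

    y: in-experiment metric per user (1-D array).
    x: pre-experiment covariate per user, correlated with y (1-D array).
    Returns y_adj = y - theta * (x - mean(x)), where theta = cov(y, x) / var(x).
    The adjustment has zero mean, so E[y_adj] = E[y], but var(y_adj) shrinks
    by a factor of (1 - corr(y, x)^2).
    """
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())
```

With a strongly predictive covariate the variance drops substantially, which is what lets an experiment reach the same confidence with fewer data points.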

Shubham Baweja, Neeti Pokharna, Aleksei Ustimenko, Olivier Jeunen
Augmenting KG Hierarchies Using Neural Transformers

This work leverages neural transformers to generate hierarchies in an existing knowledge graph. For small (<10,000 node) domain-specific KGs, we find that a combination of few-shot prompting with one-shot generation works well, while larger KGs may require cyclical generation. Hierarchy coverage increased by 98% for intents and 95% for colors.

Sanat Sharma, Mayank Poddar, Jayant Kumar, Kosta Blank, Tracy King
Incorporating Query Recommendation for Improving In-Car Conversational Search

Retrieval-augmented generation has become an effective mechanism for conversational systems in domain-specific settings. Retrieving a wrong document, due to a lack of context in the user utterance, may lead to wrong answer generation. Such issues may reduce user engagement and thereby the system’s reliability. In this paper, we propose context-guided follow-up question recommendation to internally improve document retrieval in an iterative approach for developing an in-car conversational system. Specifically, a user utterance is first reformulated, given the context of the conversation, to facilitate improved understanding by the retriever. In cases where the documents retrieved are not relevant enough to answer the user utterance, we employ a large language model (LLM) to generate a question recommendation, which is then utilized to perform a refined retrieval. An empirical evaluation confirms the effectiveness of our proposed approaches in in-car conversations, achieving 48% and 22% improvements in the retrieval and system-generated responses, respectively, against baseline approaches.

Md. Rashad Al Hasan Rony, Soumya Ranjan Sahoo, Abbas Goher Khan, Ken E. Friedl, Viju Sudhi, Christian Süß

Doctoral Consortium Papers

Frontmatter
Semantic Search in Archive Collections Through Interpretable and Adaptable Relation Extraction About Person and Places

In recent years, libraries and archives have undertaken numerous campaigns to digitise their collections.

Nicolas Gutehrlé
Document Level Event Extraction from Narratives

One of the fundamental tasks in Information Extraction (IE) is Event Extraction (EE), an extensively studied and challenging task [13, 15], which aims to identify and classify events from the text.

Luís Filipe Cunha
Effective and Efficient Transformer Models for Sequential Recommendation

The focus of our work is sequential recommender systems. Sequential recommender systems use ordered sequences of user-item interactions to predict future interactions of the user.

Aleksandr V. Petrov
Reproduction and Simulation of Interactive Retrieval Experiments

The reproducibility crisis, spanning across various scientific fields, substantially affects information retrieval research [1].

Jana Isabelle Friese
Cascading Ranking Pipelines for Sensitivity-Aware Search

Search engines are designed to make information accessible. However, some information should not be accessible, such as documents concerning citizenship applications or personal information. This sensitive information is often found interspersed with other potentially useful non-sensitive information. As such, collections containing sensitive information cannot be made searchable due to the risk of revealing sensitive information. The development of search engines capable of safely searching collections containing sensitive information to provide relevant and non-sensitive information would allow previously hidden collections to be made available. This work aims to develop sensitivity-aware search engines via two-stage cascading retrieval pipelines.

Jack McKechnie
Analyzing Mathematical Content for Plagiarism and Recommendations

Defined as “the use of ideas, concepts, words, or structures without appropriately acknowledging the source to benefit in a setting where originality is expected” [6], plagiarism poses a severe concern amid the rapidly increasing number of scientific publications.

Ankit Satpute
Shuffling a Few Stalls in a Crowded Bazaar: Potential Impact of Document-Side Fairness on Unprivileged Info-Seekers

Information systems rely on algorithmic ranking to ascertain expected relevance. Concerns about this strategy have resulted in the emergence of a field of inquiry referred to as fair ranking. Within this field, the aim varies between one-sided and two-sided fairness across automatically generated rankings. But research has focused primarily on fairness among document providers as opposed to fairness among searchers. Concerns have already been raised about the present framing of fairness. In the following line of research, a novel framing concern is introduced, whereby researchers may fail to consider the broader context of search engine usage among protected groups of searchers.

Seán Healy
Knowledge Transfer from Resource-Rich to Resource-Scarce Environments

Resource-scarce environments have limited data, creating barriers and suboptimal experiences for users, while resource-rich environments are well-stocked with comprehensive information.

Negin Ghasemi

Tutorials

Frontmatter
PhD Candidacy: A Tutorial on Overcoming Challenges and Achieving Success

Undertaking a PhD is a demanding yet rewarding experience. PhD candidates develop a deep understanding of their research topic and acquire a wide range of skills, including (i) formulating research questions; (ii) conducting research ethically and rigorously; (iii) communicating research findings effectively to both academic and non-academic audiences alike; (iv) forging a profile as an independent researcher; and (v) developing a teaching portfolio. PhD candidates inevitably experience challenges during their candidature. These challenges can be overcome by applying various techniques to adapt and learn from these experiences. This tutorial introduces strategies to help them advance in the PhD process. It is presented by two early career researchers in information retrieval, who have the unique perspective of being close enough to their time as PhD candidates to remember the highs and lows of PhD life yet far enough removed from the process to reflect on their experiences and provide insights. The tutorial will empower attendees to share, review, and refine productivity methods for their PhD journey. It provides a non-judgemental platform for open discussions led by the presenters.

Johanne R. Trippas, David Maxwell
Explainable Recommender Systems with Knowledge Graphs and Language Models

In this tutorial, we delve into recent advances in explainable recommendation using Knowledge Graphs (KGs). The session begins by introducing the fundamental principles behind the increasing adoption of KGs in modern recommender systems. Then, the tutorial explores recent techniques that leverage KGs as an input for language models tailored to explainable recommendation, describing also data types, methods, and evaluation protocols and metrics. Conceptual elements are complemented with hands-on sessions, providing practical implementations using open-source tools and public datasets. Concluding with a comprehensive case study in the education domain as a recap, the tutorial analyses emerging issues and outlines prospective trajectories in this field. The tutorial website is available at https://explainablerecsys.github.io/ecir2024/ .

Giacomo Balloccu, Ludovico Boratto, Gianni Fenu, Francesca Maridina Malloci, Mirko Marras
Quantum Computing for Information Retrieval and Recommender Systems

Quantum Computing (QC) is a research field that has been in the limelight in recent years. In fact, many researchers and practitioners believe that it can provide benefits in terms of efficiency and effectiveness when employed to solve certain computationally intensive tasks. In Information Retrieval (IR) and Recommender Systems (RS) we are required to process very large and heterogeneous datasets by means of complex operations; it is therefore natural to wonder whether QC could also be applied to boost their performance. The goal of this tutorial is to show how QC works to an audience that is not familiar with the technology, as well as how to apply the QC paradigm of Quantum Annealing (QA) to solve practical problems that are currently faced by IR and RS systems. During the tutorial, participants will be provided with the fundamentals required to understand QC and to apply it in practice by using a real D-Wave quantum annealer through APIs.

Maurizio Ferrari Dacrema, Andrea Pasin, Paolo Cremonesi, Nicola Ferro
Recent Advances in Generative Information Retrieval

Generative retrieval (GR) has become a highly active area of information retrieval that has witnessed significant growth recently. Compared to the traditional “index-retrieve-then-rank” pipeline, the GR paradigm aims to consolidate all information within a corpus into a single model. Typically, a sequence-to-sequence model is trained to directly map a query to its relevant document identifiers (i.e., docids). This tutorial offers an introduction to the core concepts of the novel GR paradigm and a comprehensive overview of recent advances in its foundations and applications. We start by providing preliminary information covering foundational aspects and problem formulations of GR. Then, our focus shifts towards recent progress in docid design, training approaches, inference strategies, and applications of GR. We end by outlining remaining challenges and issuing a call for future GR research. This tutorial is intended to be beneficial to both researchers and industry practitioners interested in developing novel GR solutions or applying them in real-world scenarios.

Yubao Tang, Ruqing Zhang, Zhaochun Ren, Jiafeng Guo, Maarten de Rijke
Transformers for Sequential Recommendation

Sequential recommendation is a recommendation problem that aims to predict the next item in the sequence of user-item interactions. Sequential recommendation is similar to language modelling in terms of learning sequence structure; therefore, variants of the Transformer architecture, which has recently become mainstream in language modelling, have also achieved state-of-the-art performance in sequential recommendation. However, despite the similarities, training Transformers for recommendation models may be tricky: most recommendation datasets have their own unique item sets, and therefore the pre-training/fine-tuning approach, which is very successful for training language models, has limited applications for recommendation. Moreover, a typical recommender system has to work with millions of items, much larger than the vocabulary size of language models. In this tutorial, we cover adaptations of Transformers for sequential recommendation and techniques that help to mitigate the training challenges. The half-day (3 h + a break) tutorial consists of two sessions. The first session provides a background of the Transformer architecture and its adaptations to recommendation scenarios. It covers classic Transformer-based models, such as SASRec and BERT4Rec, their architectures, training tasks and loss functions. In this session, we also discuss the specifics of training these models with large datasets, covering negative sampling and the mitigation of the overconfidence problem it can cause. We also discuss the problem of the large item embedding tensor and approaches to mitigate it, allowing models to be trained even with very large item catalogues. In the second part of the tutorial, we focus specifically on modern generative transformer-based models for sequential recommendation. We discuss the specifics of generative models for sequential recommendation, such as item ID representation and recommendation list generation strategies.
We also cover modern adaptations of large language models (LLMs) to recommender systems and discuss concrete examples, such as the P5 model. We conclude the session with our vision for the future development of the recommender systems field in the era of Large Language Models.

Aleksandr V. Petrov, Craig Macdonald
Affective Computing for Social Good Applications: Current Advances, Gaps and Opportunities in Conversational Setting

Affective computing involves examining and advancing systems and devices capable of identifying, comprehending, processing, and emulating human emotions, sentiment, politeness and personality characteristics. This is an ever-expanding multidisciplinary domain that investigates how technology can contribute to the comprehension of human affect, how affect can influence interactions between humans and machines, how systems can be engineered to harness affect for enhanced capabilities, and how integrating affective strategies can revolutionize interactions between humans and machines. Recognizing the fact that affective computing encompasses disciplines such as computer science, psychology, and cognitive science, this tutorial aims to delve into the historical underpinnings and overarching objectives of affective computing, explore various approaches for affect detection and generation, its practical applications across diverse areas, including but not limited to social good (like persuasion, therapy and support, etc.), address ethical concerns, and outline potential future directions.

Priyanshu Priya, Mauajama Firdaus, Gopendra Vikram Singh, Asif Ekbal
Query Performance Prediction: From Fundamentals to Advanced Techniques

Query performance prediction (QPP) is a core task in information retrieval (IR) that aims at predicting the retrieval quality for a given query without relevance judgments. QPP has been investigated for decades and has witnessed a surge in research activity in recent years; it has been shown to benefit various aspects of IR, e.g., improving retrieval effectiveness by selecting the most effective ranking function per query [5, 7]. Despite its importance, there is no recent tutorial that provides a comprehensive overview of QPP techniques in the era of pre-trained/large language models or in the scenario of emerging conversational search (CS). In this tutorial, we have three main objectives. First, we aim to disseminate the latest advancements in QPP to the IR community. Second, we go beyond investigating QPP in ad-hoc search and cover QPP for CS. Third, the tutorial offers a unique opportunity to bridge the gap between theory and practice; we aim to equip participants with the essential skills and insights needed to navigate the evolving landscape of QPP, ultimately benefiting both researchers and practitioners in the field of IR and encouraging them to pursue future avenues of QPP research.

Negar Arabzadeh, Chuan Meng, Mohammad Aliannejadi, Ebrahim Bagheri

Workshops

Frontmatter
The 7th International Workshop on Narrative Extraction from Texts: Text2Story 2024

The Text2Story Workshop series, dedicated to Narrative Extraction from Texts, has been running successfully since 2018. Over the past six years, significant progress, largely propelled by Transformers and Large Language Models, has advanced our understanding of natural language text. Nevertheless, the representation, analysis, generation, and comprehensive identification of the different elements that compose a narrative structure remains a challenging objective. In its seventh edition, the workshop strives to consolidate a common platform and a multidisciplinary community for discussing and addressing various issues related to narrative extraction tasks. In particular, we aim to bring to the forefront the challenges involved in understanding narrative structures and integrating their representation into established frameworks, as well as in modern architectures (e.g., transformers) and AI-powered language models (e.g., ChatGPT), which are now common and form the backbone of almost every IR and NLP application. Text2Story encompasses sessions covering full research papers, work-in-progress, demos, resources, position and dissemination papers, along with keynote talks. Moreover, there is dedicated space for informal discussions on methods, challenges, and the future of research in this dynamic field.

Ricardo Campos, Alípio Jorge, Adam Jatowt, Sumit Bhatia, Marina Litvak
KEIR @ ECIR 2024: The First Workshop on Knowledge-Enhanced Information Retrieval

The infusion of external knowledge bases into IR models can provide enhanced ranking results and greater interpretability, offering substantial advancements in the field. The first workshop on Knowledge-Enhanced Information Retrieval (KEIR @ ECIR 2024) will serve as a platform to bring together researchers from academia and industry to explore and discuss various aspects of knowledge-enhanced information retrieval systems, such as models, techniques, data collection and evaluation. The workshop aims to not only deliberate upon the advantages and hurdles intrinsic to the development of knowledge-enhanced pretrained language models, IR models and recommendation models but also to facilitate in-depth discussions concerning the same.

Zaiqiao Meng, Shangsong Liang, Xin Xin, Gianluca Moro, Evangelos Kanoulas, Emine Yilmaz
ROMCIR 2024: Overview of the 4th Workshop on Reducing Online Misinformation Through Credible Information Retrieval

In the realm of the Social Web, we are continuously surrounded by information pollution, posing significant threats to both individuals and society as a whole. Instances of false news, for instance, wield the power to sway public opinion on matters of politics and finance. Deceptive reviews can either bolster or tarnish the reputation of businesses, while unverified medical advice may steer people toward harmful health practices. In light of this challenging landscape, it has become imperative to ensure that users have access to both topically relevant and truthful information that does not warp their perception of reality, and there has been a surge of interest in various strategies to combat disinformation across different contexts and multiple tasks. The purpose of the ROMCIR Workshop, for some years now, is precisely that of engaging the Information Retrieval community to explore potential solutions that extend beyond conventional misinformation detection approaches. Key objectives include integrating information truthfulness as a fundamental dimension of relevance within Information Retrieval Systems (IRSs) and ensuring that truthful search results are also explainable to IRS users. Moreover, it is essential to evaluate the role of generative models such as Large Language Models (LLMs) in inadvertently amplifying misinformation problems, and how they can be used to support IRSs.

Marinella Petrocchi, Marco Viviani
1 Workshop on Information Retrieval for Understudied Users (IR4U2)

Information Retrieval (IR) remains an active, fast-paced area of research. However, most advances in IR have predominantly benefited the so-called “classical” users, e.g., English-speaking adults. We envision IR4U2 as a forum to spotlight efforts that, while sparse, consider diverse, and often understudied, user groups when designing, developing, assessing, and deploying the IR technologies that directly impact them. The key objectives for IR4U2 are: (1) raise awareness about ongoing efforts focused on IR technologies designed for and used by often understudied user groups, (2) identify challenges and open issues impacting this area of research, (3) ignite discussions to identify common frameworks for future research, and (4) enable cross-fertilization and community-building by sharing lessons learned from research catering to different audiences by researchers and (industry) practitioners across various disciplines.

Maria Soledad Pera, Federica Cena, Theo Huibers, Monica Landoni, Noemi Mauro, Emiliana Murgia
First International Workshop on Graph-Based Approaches in Information Retrieval (IRonGraphs 2024)

In the dynamic field of information retrieval, the adoption of graph-based approaches has become a notable research trend. Fueled by the growing research on Knowledge Graphs and Graph Neural Networks, these approaches rooted in graph theory have shown significant promise in enhancing the effectiveness and relevance of information retrieval results. With this motivation in mind, this workshop serves as a platform, bringing together researchers and practitioners from diverse backgrounds, to delve into and discuss the integration of modern graph-based methodologies into information retrieval methods. The workshop website is available at https://irongraphs.github.io/ecir2024/ .

Ludovico Boratto, Daniele Malitesta, Mirko Marras, Giacomo Medda, Cataldo Musto, Erasmo Purificato
The Search Futures Workshop

The field and community of Information Retrieval (IR) are changing and evolving in response to the latest developments and advances in Artificial Intelligence (AI) and research culture. As the field and community re-orient and re-consider their positioning within computing and information sciences more generally, it is timely to gather and discuss more seriously our field’s vision for the future, the challenges and threats that the community and field face, and the bold new research questions and problems that are arising and emerging as we re-imagine search. This workshop aims to provide a forum for the IR community to voice and discuss their concerns and pitch proposals for building and strengthening the field and community.

Leif Azzopardi, Charles L. A. Clarke, Paul B. Kantor, Bhaskar Mitra, Johanne R. Trippas, Zhaochun Ren
The First International Workshop on Open Web Search (WOWS)

We organize the first international Workshop on Open Web Search (WOWS) at ECIR 2024 with two calls for contributions. The first call targets scientific contributions on cooperative search engine development. This includes cooperative crawling of the web and cooperative deployment and evaluation of search engines. We specifically highlight the potential of enabling public and commercial organizations to use an indexed web crawl as a resource to create innovative search engines tailored to specific user groups, instead of relying on one search engine provider. The second call aims at gaining practical experience with joint, cooperative evaluation of search engine prototypes and their components using the Information Retrieval Experiment Platform (TIREx).

Sheikh Mastura Farzana, Maik Fröbe, Michael Granitzer, Gijs Hendriksen, Djoerd Hiemstra, Martin Potthast, Saber Zerhoudi
Third Workshop on Augmented Intelligence in Technology-Assisted Review Systems (ALTARS)

In this third edition of the workshop on Augmented Intelligence in Technology-Assisted Review Systems (ALTARS), we focus on building test collections for the evaluation of high-recall Information Retrieval (IR) systems, which tackle challenging tasks that require finding (nearly) all the relevant documents in a collection. During the workshop, the organizers and the participants will discuss how to build and evaluate these types of systems, and will prepare a set of guidelines for the correct evaluation of such systems on current and future available datasets.

Giorgio Maria Di Nunzio, Evangelos Kanoulas, Prasenjit Majumder
2nd International Workshop on Geographic Information Extraction from Texts (GeoExT 2024)

A wealth of unstructured textual content contains valuable geographic insights. This geographic information holds significance across diverse domains, including geographic information retrieval, disaster management, and spatial humanities. Despite significant progress in the extraction of geographic information from texts, numerous unresolved challenges persist, ranging from methodologies, systems, data, and applications to privacy concerns. This workshop will serve as a platform for the discourse of recent breakthroughs, novel ideas, and conceptual innovations in this field.

Xuke Hu, Ross Purves, Ludovic Moncla, Jens Kersten, Kristin Stock
Bibliometric-Enhanced Information Retrieval: 14th International BIR Workshop (BIR 2024)

The 14th iteration of the Bibliometric-enhanced Information Retrieval (BIR) workshop series takes place at ECIR 2024 as a full-day workshop. BIR addresses research topics related to academic search and recommendation, at the intersection of Information Retrieval, Natural Language Processing, and Bibliometrics. As an interdisciplinary scientific event, BIR brings together researchers and practitioners from the Scientometrics/Bibliometrics community on the one hand, and the Information Retrieval and NLP communities on the other. BIR is an ever-growing topic investigated by both academia and industry.

Ingo Frommholz, Philipp Mayr, Guillaume Cabanac, Suzan Verberne

Conference and Labs of the Evaluation Forum (CLEF)

Frontmatter
The CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness

The first five editions of the CheckThat! lab focused on the main tasks of the information verification pipeline: check-worthiness, evidence retrieval and pairing, and verification. Since the 2023 edition, it has focused on new problems that can support research and decision making during the verification process. In this new edition, we again focus on new problems and, for the first time, propose six tasks in fifteen languages (Arabic, Bulgarian, English, Dutch, French, Georgian, German, Greek, Italian, Polish, Portuguese, Russian, Slovene, Spanish, and code-mixed Hindi-English): Task 1, estimation of check-worthiness (the only task present in all CheckThat! editions); Task 2, identification of subjectivity (a follow-up of the CheckThat! 2023 edition); Task 3, identification of persuasion (a follow-up of SemEval 2023); Task 4, detection of hero, villain, and victim in memes (a follow-up of CONSTRAINT 2022); Task 5, rumor verification using evidence from authorities (a first); and Task 6, robustness of credibility assessment with adversarial examples (a first). These tasks represent challenging classification and retrieval problems at the document and at the span level, including multilingual and multimodal settings.

Alberto Barrón-Cedeño, Firoj Alam, Tanmoy Chakraborty, Tamer Elsayed, Preslav Nakov, Piotr Przybyła, Julia Maria Struß, Fatima Haouari, Maram Hasanain, Federico Ruggeri, Xingyi Song, Reem Suwaileh
ELOQUENT CLEF Shared Tasks for Evaluation of Generative Language Model Quality

ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to bring together high-level quality criteria, grounded in experiences from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented to require minimal human assessment effort and in a multilingual setting. The selected tasks for this first year of ELOQUENT are (1) probing a language model for topical competence; (2) assessing the ability of models to generate and detect hallucinations; (3) assessing the robustness of a model's output under variation in the input prompts; and (4) establishing whether human-generated text can be distinguished from machine-generated text.

Jussi Karlgren, Luise Dürlich, Evangelia Gogoulou, Liane Guillou, Joakim Nivre, Magnus Sahlgren, Aarne Talman
Overview of Touché 2024: Argumentation Systems

Decision-making and opinion-forming are everyday tasks that involve weighing pro and con arguments. The goal of Touché is to foster the development of support technologies for decision-making and opinion-forming and to improve our understanding of these processes. This fifth edition of the lab features three shared tasks: (1) human value detection (ValueEval), where participants detect (implicit) references to human values and their attainment in text; (2) multilingual ideology and power identification in parliamentary debates (a new task), where participants identify from a speech the political leaning of the speaker's party and whether that party was governing at the time of the speech; and (3) image retrieval or generation to convey the premise of an argument visually. In this paper, we briefly describe the planned setup for the fifth lab edition at CLEF 2024 and summarize the results of the 2023 edition.

Johannes Kiesel, Çağrı Çöltekin, Maximilian Heinrich, Maik Fröbe, Milad Alshomary, Bertrand De Longueville, Tomaž Erjavec, Nicolas Handke, Matyáš Kopp, Nikola Ljubešić, Katja Meden, Nailia Mirzhakhmedova, Vaidas Morkevičius, Theresa Reitis-Münstermann, Mario Scharfbillig, Nicolas Stefanovitch, Henning Wachsmuth, Martin Potthast, Benno Stein
eRisk 2024: Depression, Anorexia, and Eating Disorder Challenges

In 2017, we launched eRisk as a CLEF Lab to encourage research on early risk detection on the Internet. Since then, thanks to the participants' work, we have developed detection models and datasets for depression, anorexia, pathological gambling, and self-harm. The 2024 edition will be the eighth of the lab; it will feature a revised task on ranking sentences for depression symptoms, along with the third editions of the tasks on early alert of anorexia and on eating disorder severity estimation. This paper outlines the work that we have done to date, discusses key lessons learned in previous editions, and presents our plans for eRisk 2024.

Javier Parapar, Patricia Martín-Rodilla, David E. Losada, Fabio Crestani
QuantumCLEF - Quantum Computing at CLEF

Over the last few years, Quantum Computing (QC) has captured the attention of researchers across many fields, as technological advancements have made QC resources more available and applicable to practical problems. In the current landscape, Information Retrieval (IR) and Recommender Systems (RS) need to perform computationally intensive operations on massive and heterogeneous datasets, so QC, and especially Quantum Annealing (QA), could be used to boost systems' performance in terms of both efficiency and effectiveness. The objective of this work is to present the first edition of the QuantumCLEF lab, which is composed of two tasks that aim at: evaluating QA approaches against their traditional counterparts; identifying new problem formulations to discover novel methods that leverage the capabilities of QA for improved solutions; and establishing collaborations among researchers from different fields to harness their knowledge and skills, solve the considered challenges, and promote the usage of QA. This lab will employ the QC resources provided by CINECA, one of the most important computing centers worldwide. We also describe the design of our infrastructure, which uses Docker and Kubernetes to ensure scalability, fault tolerance, and replicability.

Andrea Pasin, Maurizio Ferrari Dacrema, Paolo Cremonesi, Nicola Ferro
BioASQ at CLEF2024: The Twelfth Edition of the Large-Scale Biomedical Semantic Indexing and Question Answering Challenge

The large-scale biomedical semantic indexing and question-answering challenge (BioASQ) aims at the continuous advancement of methods and tools to meet the needs of biomedical researchers and practitioners for efficient and precise access to the ever-increasing resources of their domain. With this purpose, during the last eleven years, a series of annual challenges have been organized with specific shared tasks on large-scale biomedical semantic indexing and question answering. Benchmark datasets have been concomitantly provided in alignment with the real needs of biomedical experts, providing a unique common testbed where different teams around the world can investigate and compare new approaches for accessing biomedical knowledge. The twelfth version of the BioASQ Challenge will be held as an evaluation Lab within CLEF2024, providing four shared tasks: (i) Task b, on information retrieval for biomedical questions and the generation of comprehensible answers; (ii) Task Synergy, on information retrieval and answer generation for open biomedical questions on developing topics, in collaboration with the experts posing the questions; (iii) Task MultiCardioNER, on the automated annotation of clinical entities in medical documents in the field of cardiology, primarily in Spanish, English, Italian, and Dutch; and (iv) Task BioNNE, on the automated annotation of biomedical documents in Russian and English with nested named entity annotations. As BioASQ rewards the methods that outperform the state of the art in these shared tasks, it pushes the research frontier towards approaches that accelerate access to biomedical knowledge.

Anastasios Nentidis, Anastasia Krithara, Georgios Paliouras, Martin Krallinger, Luis Gasco Sanchez, Salvador Lima, Eulalia Farre, Natalia Loukachevitch, Vera Davydova, Elena Tutubalina
EXIST 2024: sEXism Identification in Social neTworks and Memes

The paper describes the EXIST 2024 lab on sexism identification in social networks, which is expected to take place at the CLEF 2024 conference and represents the fourth edition of the EXIST challenge. The lab comprises five tasks in two languages, English and Spanish, with the initial three tasks building upon those from EXIST 2023 (sexism identification in tweets, source intention detection in tweets, and sexism categorization in tweets). In this edition, two new tasks have been introduced: sexism detection in memes and sexism categorization in memes. Similar to the prior edition, this one will adopt the Learning With Disagreement paradigm. The dataset for the various tasks will provide all annotations from multiple annotators, enabling models to learn from a range of training data that may present contradictory opinions or labels. This approach facilitates the models' ability to handle and navigate diverse perspectives. Data bias will be handled both in the sampling and in the labeling processes: seed, topic, temporal, and user bias will be taken into account when gathering data; in the annotation process, bias will be reduced by involving annotators from different social and demographic backgrounds.

Laura Plaza, Jorge Carrillo-de-Albornoz, Enrique Amigó, Julio Gonzalo, Roser Morante, Paolo Rosso, Damiano Spina, Berta Chulvi, Alba Maeso, Víctor Ruiz
Backmatter
Metadata
Title
Advances in Information Retrieval
Edited by
Nazli Goharian
Nicola Tonellotto
Yulan He
Aldo Lipani
Graham McDonald
Craig Macdonald
Iadh Ounis
Copyright year
2024
Electronic ISBN
978-3-031-56069-9
Print ISBN
978-3-031-56068-2
DOI
https://doi.org/10.1007/978-3-031-56069-9
