
2024 | Original Paper | Book Chapter

ELOQUENT CLEF Shared Tasks for Evaluation of Generative Language Model Quality

Written by: Jussi Karlgren, Luise Dürlich, Evangelia Gogoulou, Liane Guillou, Joakim Nivre, Magnus Sahlgren, Aarne Talman

Published in: Advances in Information Retrieval

Publisher: Springer Nature Switzerland


Abstract

ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to bring together some high-level quality criteria, grounded in experiences from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented to require minimal human assessment effort and in a multilingual setting. The selected tasks for this first year of ELOQUENT are (1) probing a language model for topical competence; (2) assessing the ability of models to generate and detect hallucinations; (3) assessing the robustness of a model output given variation in the input prompts; and (4) establishing the possibility to distinguish human-generated text from machine-generated text.
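Task (3) above, robustness under prompt variation, can be illustrated with a minimal sketch. This is not the ELOQUENT evaluation protocol itself; the `model` stub, the prompt paraphrases, and the token-set Jaccard similarity metric are all illustrative assumptions, standing in for a real generative model and a real semantic similarity measure.

```python
# Hypothetical sketch of a prompt-robustness check: score how stable a
# model's output is across paraphrases of the same prompt. The `model`
# below is a toy stub, and token-set Jaccard is an illustrative metric only.

from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two texts."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def robustness_score(model, prompt_variants: list[str]) -> float:
    """Mean pairwise similarity of outputs over all prompt paraphrases."""
    outputs = [model(p) for p in prompt_variants]
    pairs = list(combinations(outputs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy stand-in for a generative model: always gives the same answer,
    # so the robustness score is maximal.
    def model(prompt: str) -> str:
        return "the capital of France is Paris"

    variants = [
        "What is the capital of France?",
        "Name the capital city of France.",
        "France's capital is which city?",
    ]
    print(robustness_score(model, variants))  # 1.0 for a fully stable stub
```

A real evaluation would replace the stub with calls to the model under test and the Jaccard metric with a stronger semantic similarity measure; the score then quantifies how much output drifts as prompts vary.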

Metadata
Title
ELOQUENT CLEF Shared Tasks for Evaluation of Generative Language Model Quality
Written by
Jussi Karlgren
Luise Dürlich
Evangelia Gogoulou
Liane Guillou
Joakim Nivre
Magnus Sahlgren
Aarne Talman
Copyright year
2024
DOI
https://doi.org/10.1007/978-3-031-56069-9_63
