
2024 | Original Paper | Book Chapter

DQAC: Detoxifying Query Auto-completion with Adapters

Authors: Aishwarya Maheswaran, Kaushal Kumar Maurya, Manish Gupta, Maunendra Sankar Desarkar

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

Recent Query Auto-completion (QAC) systems leverage natural language generation and pre-trained language models (PLMs) and demonstrate remarkable performance. However, these systems also produce biased and toxic completions. Prior work on language detoxification in PLMs uses controllable text generation (CTG) techniques, either training on non-toxic data or intervening at decoding time. Because QAC completions are usually short, these training- and decoding-based CTG methods do not transfer directly. To address these concerns, we propose the first public QAC detoxification model, Detoxifying Query Auto-Completion (DQAC), which uses adapters in a CTG framework. DQAC operates on latent representations with no additional overhead. It employs two adapters, one for toxic and one for non-toxic text. During inference, we fuse their representations in a controlled manner that guides query completions toward non-toxicity. We evaluate the toxicity of the generated completions on two real-world datasets using two classifiers: a publicly available one (Detoxify) and a search-query-specific classifier that we develop (QDetoxify). DQAC consistently outperforms all existing baselines and emerges as a state-of-the-art model that provides high quality and low toxicity. We make the code publicly available at https://shorturl.at/zJ024.
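The mechanism described above, two adapters whose latent representations are fused at inference time to steer completions away from toxicity, can be illustrated with a short sketch. The following PyTorch code is a minimal illustration under stated assumptions, not the authors' implementation: the bottleneck adapter shape (down-projection, nonlinearity, up-projection, residual) is the standard design, while the class names, the scalar fusion weight `alpha`, and the subtractive fusion rule are hypothetical, since the abstract does not specify the exact fusion function.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Standard bottleneck adapter: down-project, nonlinearity, up-project,
    with a residual connection around the block."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))


class FusedDetoxLayer(nn.Module):
    """Hypothetical fusion of a non-toxic and a toxic adapter in latent space.
    `alpha` controls how strongly the hidden states are pushed toward the
    non-toxic representation and away from the toxic one."""

    def __init__(self, hidden_size: int, alpha: float = 0.5):
        super().__init__()
        self.nontoxic = Adapter(hidden_size)  # trained on non-toxic queries
        self.toxic = Adapter(hidden_size)     # trained on toxic queries
        self.alpha = alpha

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        h_nt = self.nontoxic(h)
        h_tx = self.toxic(h)
        # Steer toward the non-toxic direction; one plausible fusion rule.
        return h + self.alpha * (h_nt - h_tx)


# Usage: apply the fused layer to a transformer block's hidden states.
layer = FusedDetoxLayer(hidden_size=768, alpha=0.5)
hidden = torch.randn(2, 10, 768)  # (batch, sequence length, hidden size)
steered = layer(hidden)
print(steered.shape)  # torch.Size([2, 10, 768])
```

Because the steering happens on hidden states rather than on output logits, a layer like this adds only two small linear maps per transformer block, which keeps the extra compute negligible compared with decoding-time approaches.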

DOI: https://doi.org/10.1007/978-981-97-2266-2_9