2024 | OriginalPaper | Book chapter

6. GenAI Model Security

Authors: Ken Huang, Ben Goertzel, Daniel Wu, Anita Xie

Published in: Generative AI Security

Publisher: Springer Nature Switzerland

Abstract

Safeguarding GenAI models against threats and aligning them with security requirements is imperative yet challenging. This chapter provides an overview of the security landscape for generative models. It begins by elucidating common vulnerabilities and attack vectors, including adversarial attacks, model inversion, backdoors, data extraction, and algorithmic bias. The practical implications of these threats are discussed, spanning domains like finance, healthcare, and content creation. The narrative then shifts to exploring mitigation strategies and innovative security paradigms. Differential privacy, blockchain-based provenance, quantum-resistant algorithms, and human-guided reinforcement learning are analyzed as potential techniques to harden generative models. Broader ethical concerns surrounding transparency, accountability, deepfakes, and model interpretability are also addressed. The chapter aims to establish a conceptual foundation encompassing both the technical and ethical dimensions of security for generative AI. It highlights open challenges and lays the groundwork for developing robust, trustworthy, and human-centric solutions. The multifaceted perspective spanning vulnerabilities, implications, and solutions is intended to further discourse on securing society’s growing reliance on generative models. Frontier model security is discussed using the approach proposed by Anthropic.



Title: GenAI Model Security
Authors: Ken Huang, Ben Goertzel, Daniel Wu, Anita Xie
Publisher: Springer Nature Switzerland
Book: Generative AI Security
Print ISBN: 978-3-031-54251-0
Electronic ISBN: 978-3-031-54252-7
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-54252-7_6
