2024 | OriginalPaper | Book chapter

Enhancing Code Security Through Open-Source Large Language Models: A Comparative Study

Authors: Norah Ridley, Enrico Branca, Jadyn Kimber, Natalia Stakhanova

Published in: Foundations and Practice of Security

Publisher: Springer Nature Switzerland

Abstract

Significant advances in language processing are opening up new capabilities, including the ability to analyze code for weaknesses. Code security analysis is typically performed by tools that rely on known vulnerable patterns, which may not adequately represent the intricacies of vulnerabilities in real-world projects. Such tools can fail to detect non-standard weaknesses in code samples, potentially leading to a loss of personal and financial information for end users of the code. Using language models to detect weaknesses that currently available analysis tools would otherwise miss is a promising new avenue for vulnerability detection. In this research, we employ 25 different models to evaluate the security of code samples. Using an existing dataset of insecure code, we prompt each model to detect weaknesses in the vulnerable code. Our findings indicate that most models are ill-equipped to deal with insecure code. Through our analysis, we identify strategies for improving weakness detection using language models.

Title: Enhancing Code Security Through Open-Source Large Language Models: A Comparative Study
Authors: Norah Ridley, Enrico Branca, Jadyn Kimber, Natalia Stakhanova
Publisher: Springer Nature Switzerland
Book: Foundations and Practice of Security
Print ISBN: 978-3-031-57536-5
Electronic ISBN: 978-3-031-57537-2
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-57537-2_15
