2024 | OriginalPaper | Book chapter

Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet

Authors: Gonzalo Martínez, Lauren Watson, Pedro Reviriego, José Alberto Hernández, Marc Juarez, Rik Sarkar

Published in: Epistemic Uncertainty in Artificial Intelligence

Publisher: Springer Nature Switzerland

Abstract

The rapid adoption of generative Artificial Intelligence (AI) tools that can generate realistic images or text, such as DALL-E, MidJourney, or ChatGPT, has put the societal impacts of these technologies at the center of public debate. These tools are possible due to the massive amount of data (text and images) that is publicly available through the Internet. At the same time, these generative AI tools have become content creators that already contribute to the data available for training future models. Therefore, future versions of generative AI tools will be trained on a mix of human-created and AI-generated content, causing a potential feedback loop between generative AI and public data repositories. This interaction raises many questions: How will future versions of generative AI tools behave when trained on a mixture of real and AI-generated data? Will they evolve and improve with the new data sets or, on the contrary, will they degrade? Will this evolution introduce biases or reduce diversity in subsequent generations of generative AI tools? What are the societal implications of the possible degradation of these models? Can we mitigate the effects of this feedback loop? In this work, we explore the effect of this interaction and report some initial results using simple diffusion models trained on various image datasets. Our results show that the quality and diversity of the generated images can degrade over time, suggesting that incorporating AI-created data can have undesired effects on future versions of generative models.
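The feedback loop described in the abstract can be illustrated with a toy sketch. It is not the authors' experiment: the chapter uses diffusion models on image datasets, whereas this sketch swaps in a one-dimensional Gaussian "model" that is refit at every generation on a mix of real samples and samples drawn from the previous fit. The function name `simulate_feedback_loop`, the parameter `synthetic_fraction`, and all numeric settings are assumptions made for illustration only. The fitted standard deviation serves as a crude stand-in for diversity.

```python
import random
import statistics


def simulate_feedback_loop(real_data, generations, synthetic_fraction, seed=0):
    """Toy feedback loop (illustrative assumption, not the chapter's method).

    Each generation fits a Gaussian to the current training set, then builds
    the next training set from samples of that fit plus a random subset of
    the real data. Returns the fitted standard deviation per generation.
    """
    rng = random.Random(seed)
    n = len(real_data)
    n_synth = round(synthetic_fraction * n)  # synthetic samples per generation
    train = list(real_data)                  # generation 0 sees only real data
    spreads = []
    for _ in range(generations + 1):
        # "Train" the model: estimate mean and spread of the training set.
        mu = statistics.fmean(train)
        sigma = statistics.pstdev(train)
        spreads.append(sigma)
        # "Publish" generated content and mix it with real content.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_synth)]
        real = rng.sample(real_data, n - n_synth)
        train = synthetic + real
    return spreads


# Hypothetical usage: compare different shares of generated data.
data_rng = random.Random(42)
real_samples = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
for fraction in (0.0, 0.5, 1.0):
    result = simulate_feedback_loop(real_samples, 200, fraction)
    print(f"synthetic fraction {fraction}: "
          f"initial spread {result[0]:.3f}, final spread {result[-1]:.3f}")
```

With `synthetic_fraction=0.0` every generation is trained on the real data alone, so the fitted spread stays constant. With larger fractions, the spread is free to drift away from that of the real data, which is the kind of effect the chapter examines with diffusion models and image quality and diversity metrics.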

https://github.com/gonz-mart/Towards-Understanding-the-Interplay-of-Generative-Artificial-Intelligence-and-the-Internet.

Title: Towards Understanding the Interplay of Generative Artificial Intelligence and the Internet
Authors: Gonzalo Martínez, Lauren Watson, Pedro Reviriego, José Alberto Hernández, Marc Juarez, Rik Sarkar
Publisher: Springer Nature Switzerland
Book: Epistemic Uncertainty in Artificial Intelligence
Print ISBN: 978-3-031-57962-2
Electronic ISBN: 978-3-031-57963-9
Copyright year: 2024
DOI: https://doi.org/10.1007/978-3-031-57963-9_5
