
2024 | Original Paper | Book Chapter

StyleAutoEncoder for Manipulating Image Attributes Using Pre-trained StyleGAN

Written by: Andrzej Bedychaj, Jacek Tabor, Marek Śmieja

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

Deep conditional generative models are excellent tools for creating high-quality images and editing their attributes. However, training modern generative models from scratch is very expensive and requires large computational resources. In this paper, we introduce StyleAutoEncoder (StyleAE), a lightweight AutoEncoder module, which works as a plugin for pre-trained generative models and allows for manipulating the requested attributes of images. The proposed method offers a cost-effective solution for training deep generative models with limited computational resources, making it a promising technique for a wide range of applications. We evaluate StyleAE by combining it with StyleGAN, which is currently one of the top generative models. Our experiments demonstrate that StyleAE is at least as effective in manipulating image attributes as the state-of-the-art algorithms based on invertible normalizing flows. However, it is simpler, faster, and gives more freedom in designing neural architecture.
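The abstract describes StyleAE as a lightweight autoencoder that sits on top of a frozen pre-trained generator and exposes image attributes as editable latent coordinates. The following is a minimal sketch of that data flow only, not the authors' implementation: the linear maps, random weights, and the split into 2 attribute plus 510 style coordinates are illustrative assumptions (512 matches StyleGAN's w-space dimensionality).

```python
import numpy as np

# Sketch of the StyleAE plugin idea: an autoencoder trained on latent codes w
# of a frozen StyleGAN. The encoder separates a few labeled attribute
# coordinates from the remaining "style" coordinates; editing an attribute
# coordinate and decoding yields a modified latent for the frozen generator.
# Weights are random here purely to illustrate shapes and data flow.

rng = np.random.default_rng(0)
W_DIM, ATTR_DIM, STYLE_DIM = 512, 2, 510   # assumed sizes, not from the paper

enc = rng.normal(scale=0.01, size=(ATTR_DIM + STYLE_DIM, W_DIM))
dec = rng.normal(scale=0.01, size=(W_DIM, ATTR_DIM + STYLE_DIM))

def encode(w):
    z = enc @ w
    return z[:ATTR_DIM], z[ATTR_DIM:]        # (attribute coords, style coords)

def decode(attrs, style):
    return dec @ np.concatenate([attrs, style])

w = rng.normal(size=W_DIM)                   # stand-in for a StyleGAN w code
attrs, style = encode(w)
attrs[0] = 3.0                               # push one attribute coordinate
w_edited = decode(attrs, style)              # would be fed to the frozen generator
print(w_edited.shape)
```

Because only the small encoder/decoder is trained while the generator stays frozen, the approach avoids the cost of training a full generative model from scratch, which is the efficiency argument the abstract makes.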


DOI
https://doi.org/10.1007/978-981-97-2253-2_10
