2024 | OriginalPaper | Book Chapter

Combating Quality Distortion in Federated Learning with Collaborative Data Selection

Authors: Duc Long Nguyen, Phi Le Nguyen, Thao Nguyen Truong

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore

Abstract

Federated Learning (FL), a paradigm facilitating collaborative model training across distributed devices, has attracted substantial attention due to its potential to address privacy concerns and data localization requirements. However, the inherent inaccessibility of data poses a critical challenge in ensuring data quality within FL systems. Consequently, FL systems grapple with a range of data-related issues, encompassing erroneous samples, imbalanced data distributions, and data skew, all of which have a significant impact on model performance. The judicious selection of appropriate data for training is therefore of paramount importance for ameliorating these challenges.
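For readers unfamiliar with the paradigm, the essence of FL is that clients train locally and a central server only aggregates their model updates, so raw data never leaves a device. The following minimal Python sketch shows the standard weighted-averaging aggregation step (in the style of FedAvg); the flat weight vectors and client sizes are illustrative values, not taken from the paper.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One server-side aggregation round: average the locally trained
    models, weighted by each client's sample count. Only model weights
    are exchanged; the raw training data stays on the devices."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients with differently sized local datasets (illustrative)
updates = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.1, 1.2])]
sizes = [100, 50, 150]
print(fedavg(updates, sizes))  # new global model, broadcast in the next round
```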
This research paper tackles a crucial but often overlooked concern: the presence of low-quality data samples. To address it, we introduce an innovative algorithm that strategically curates a subset of data for each training iteration, with the overarching objective of optimizing the model’s accuracy while simultaneously addressing privacy concerns and reducing communication costs. Our primary innovation lies in the global selection of data, in contrast to the conventional approach that relies on individualized, client-specific selection.
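The abstract does not spell out the selection mechanism, so the sketch below is only an illustration of what global, server-coordinated selection can look like, in contrast to each client keeping a fixed local fraction. Here each client reports scalar per-sample scores (per-sample loss is assumed as a quality proxy), the server fixes a single threshold over the pooled scores, and each client keeps only the samples under it; all names and the scoring rule are assumptions, not the authors' exact procedure. Note that only scalar scores cross the network, which is consistent with the stated privacy and communication goals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-sample losses under the current global model;
# here they are random draws purely for demonstration.
client_scores = [rng.exponential(1.0, size=n) for n in (120, 80, 200)]

def global_threshold(score_lists, keep_fraction=0.8):
    """Server side: pool the scalar scores from every client and pick ONE
    cut-off, so the training subset is chosen globally rather than
    per client."""
    pooled = np.concatenate(score_lists)
    return np.quantile(pooled, keep_fraction)  # drop the highest-loss tail

def client_selection(scores, threshold):
    """Client side: indices of the samples this client trains on this round."""
    return np.where(scores <= threshold)[0]

t = global_threshold(client_scores)
selected = [client_selection(s, t) for s in client_scores]
print([len(s) for s in selected])  # per-client counts vary, unlike a fixed local top-k
```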
Furthermore, we introduce a novel medical dataset tailored specifically for classification tasks. This dataset intentionally incorporates various attributes associated with low-quality data to effectively replicate real-world conditions. Through rigorous empirical evaluation, we show the effectiveness of our algorithm using this dataset. The results demonstrate a notable improvement of approximately 2–3% in model performance, particularly in scenarios characterized by imbalanced data distributions.

Footnotes
1
The configuration of this experiment is described in Sect. 4.1.
 
3
By using the ImageEnhance module of the PIL library.
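The footnote names the tool but not the settings. Below is a minimal sketch of this kind of quality degradation with assumed (illustrative) enhancement factors; in PIL's ImageEnhance, a factor of 1.0 leaves the image unchanged and values below 1.0 weaken the corresponding property:

```python
from PIL import Image, ImageEnhance

def degrade(img: Image.Image, brightness=0.5, contrast=0.6, sharpness=0.3):
    """Simulate a low-quality sample by reducing brightness, contrast,
    and sharpness with PIL's ImageEnhance enhancer classes."""
    img = ImageEnhance.Brightness(img).enhance(brightness)
    img = ImageEnhance.Contrast(img).enhance(contrast)
    img = ImageEnhance.Sharpness(img).enhance(sharpness)
    return img

# Usage (hypothetical file name): low_q = degrade(Image.open("sample.png"))
```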
 
Metadata
Title
Combating Quality Distortion in Federated Learning with Collaborative Data Selection
Authors
Duc Long Nguyen
Phi Le Nguyen
Thao Nguyen Truong
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2259-4_14
