2024 | OriginalPaper | Book Chapter

Combating Quality Distortion in Federated Learning with Collaborative Data Selection

Authors: Duc Long Nguyen, Phi Le Nguyen, Thao Nguyen Truong

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore

Abstract

Federated Learning (FL), a paradigm facilitating collaborative model training across distributed devices, has attracted substantial attention due to its potential to address privacy concerns and data localization requirements. However, the inherent inaccessibility of data poses a critical challenge in ensuring data quality within FL systems. Consequently, FL systems grapple with a range of data-related issues, encompassing erroneous samples, imbalanced data distributions, and data skew, all of which have a significant impact on model performance. The judicious selection of appropriate data for training is therefore of paramount importance for ameliorating these challenges.
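For readers unfamiliar with the paradigm, the essence of FL is that clients train locally and a central server only aggregates their model updates, so raw data never leaves a device. The following minimal Python sketch shows the standard weighted-averaging aggregation step (in the style of FedAvg); the flat weight vectors and client sizes are illustrative values, not taken from the paper.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One server-side aggregation round: average the locally trained
    models, weighted by each client's sample count. Only model weights
    are exchanged; the raw training data stays on the devices."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients with differently sized local datasets (illustrative)
updates = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.1, 1.2])]
sizes = [100, 50, 150]
print(fedavg(updates, sizes))  # new global model, broadcast in the next round
```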
This research paper tackles a crucial but often overlooked concern: the presence of low-quality data samples. To address it, we introduce an innovative algorithm that strategically curates a subset of data for each training iteration, with the overarching objective of optimizing the model’s accuracy while simultaneously addressing privacy concerns and reducing communication costs. Our primary innovation lies in the global selection of data, in contrast to the conventional approach that relies on individualized, client-specific selection.
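The abstract does not spell out the selection mechanism, so the sketch below is only an illustration of what global, server-coordinated selection can look like, in contrast to each client keeping a fixed local fraction. Here each client reports scalar per-sample scores (per-sample loss is assumed as a quality proxy), the server fixes a single threshold over the pooled scores, and each client keeps only the samples under it; all names and the scoring rule are assumptions, not the authors' exact procedure. Note that only scalar scores cross the network, which is consistent with the stated privacy and communication goals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-sample losses under the current global model;
# here they are random draws purely for demonstration.
client_scores = [rng.exponential(1.0, size=n) for n in (120, 80, 200)]

def global_threshold(score_lists, keep_fraction=0.8):
    """Server side: pool the scalar scores from every client and pick ONE
    cut-off, so the training subset is chosen globally rather than
    per client."""
    pooled = np.concatenate(score_lists)
    return np.quantile(pooled, keep_fraction)  # drop the highest-loss tail

def client_selection(scores, threshold):
    """Client side: indices of the samples this client trains on this round."""
    return np.where(scores <= threshold)[0]

t = global_threshold(client_scores)
selected = [client_selection(s, t) for s in client_scores]
print([len(s) for s in selected])  # per-client counts vary, unlike a fixed local top-k
```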
Furthermore, we introduce a novel medical dataset tailored specifically for classification tasks. This dataset intentionally incorporates various attributes associated with low-quality data to effectively replicate real-world conditions. Through rigorous empirical evaluation, we show the effectiveness of our algorithm using this dataset. The results demonstrate a notable improvement of approximately 2–3% in model performance, particularly in scenarios characterized by imbalanced data distributions.

Footnotes
1
The configuration of this experiment is described in Sect. 4.1.
 
3
By using the ImageEnhance module of the PIL library.
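The footnote names the tool but not the settings. Below is a minimal sketch of this kind of quality degradation with assumed (illustrative) enhancement factors; in PIL's ImageEnhance, a factor of 1.0 leaves the image unchanged and values below 1.0 weaken the corresponding property:

```python
from PIL import Image, ImageEnhance

def degrade(img: Image.Image, brightness=0.5, contrast=0.6, sharpness=0.3):
    """Simulate a low-quality sample by reducing brightness, contrast,
    and sharpness with PIL's ImageEnhance enhancer classes."""
    img = ImageEnhance.Brightness(img).enhance(brightness)
    img = ImageEnhance.Contrast(img).enhance(contrast)
    img = ImageEnhance.Sharpness(img).enhance(sharpness)
    return img

# Usage (hypothetical file name): low_q = degrade(Image.open("sample.png"))
```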
 
Metadata
Title
Combating Quality Distortion in Federated Learning with Collaborative Data Selection
Authors
Duc Long Nguyen
Phi Le Nguyen
Thao Nguyen Truong
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2259-4_14
