2024 | OriginalPaper | Book chapter

SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree

Authors: Tao Fan, Weijing Chen, Guoqiang Ma, Yan Kang, Lixin Fan, Qiang Yang

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore

Abstract

Gradient boosting decision tree (GBDT) is an ensemble machine learning algorithm that is widely used in industry. Because data is often isolated across owners and subject to privacy requirements, many works use vertical federated learning to train machine learning models collaboratively between different data owners. SecureBoost is one of the most popular vertical federated learning algorithms for GBDT. However, to preserve privacy, SecureBoost involves complex training procedures and time-consuming cryptographic operations. As a result, SecureBoost is slow to train and does not scale to large datasets. In this work, we propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework. SecureBoost+ is secure in the semi-honest model, the same security model as SecureBoost. It scales to tens of millions of data samples and trains faster than SecureBoost. SecureBoost+ achieves high performance through several novel optimizations of SecureBoost, including ciphertext operation optimization and the introduction of new training mechanisms. The experimental results show that SecureBoost+ is 6–35x faster than SecureBoost with the same accuracy, and that it scales to tens of millions of data samples and thousands of feature dimensions.

Metadata
Title
SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree
Authors
Tao Fan
Weijing Chen
Guoqiang Ma
Yan Kang
Lixin Fan
Qiang Yang
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2259-4_18