Skip to main content

2024 | OriginalPaper | Buchkapitel

Streaming Fair k-Center Clustering over Massive Dataset with Performance Guarantee

verfasst von : Zeyu Lin, Longkun Guo, Chaoqi Jia

Erschienen in: Advances in Knowledge Discovery and Data Mining

Verlag: Springer Nature Singapore

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Emerging applications are imposing challenges for incorporating fairness constraints into k-center clustering in the streaming setting. Different from the traditional k-center problem, the fairness constraints require that the input points be divided into disjoint groups and the number of centers from each group is constrained by a given upper bound. Moreover, observing the applications of fair k-center in massive datasets, we consider the problem in the streaming setting, where the data points arrive in a streaming manner that each point can be processed at its arrival. As the main contributions, we propose a two-pass streaming algorithm for the fair k-center problem with two groups, achieving an approximation ratio of \(3+\epsilon \) and consuming only \(O(k\log n)\) memory and O(k) update time, matching the state-of-art ratio for the offline setting. Then, we show that the algorithm can be easily improved to a one-pass streaming algorithm with an approximation ratio of \(7+\epsilon \) and the same memory complexity and update time. Moreover, we show that our algorithm can be simply tuned to solve the case with an arbitrary number of groups while achieving the same ratio and space complexity. Lastly, we carried out extensive experiments to evaluate the practical performance of our algorithm compared with the state-of-the-art algorithms.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Asuncion, A., Newman, D.: UCI machine learning repository (2007) Asuncion, A., Newman, D.: UCI machine learning repository (2007)
2.
Zurück zum Zitat Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: Massive data summarization on the fly. In: 20th International Conference on Knowledge Discovery and Data Mining(ACM SIGKDD), pp. 671–680 (2014) Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., Krause, A.: Streaming submodular maximization: Massive data summarization on the fly. In: 20th International Conference on Knowledge Discovery and Data Mining(ACM SIGKDD), pp. 671–680 (2014)
3.
Zurück zum Zitat Bandyapadhyay, S., Friggstad, Z., Mousavi, R.: Parameterized approximation algorithms for \(k\)-center clustering and variants. In: 36th AAAI Conference on Artificial Intelligence. vol. 36, pp. 3895–3903 (2022) Bandyapadhyay, S., Friggstad, Z., Mousavi, R.: Parameterized approximation algorithms for \(k\)-center clustering and variants. In: 36th AAAI Conference on Artificial Intelligence. vol. 36, pp. 3895–3903 (2022)
4.
Zurück zum Zitat Ceccarello, M., Pietracaprina, A., Pucci, G.: Solving k-center clustering (with outliers) in mapreduce and streaming, almost as accurately as sequentially. In: 45th International Conference on Very Large Databases(VLDB), pp. 766–778 (2019) Ceccarello, M., Pietracaprina, A., Pucci, G.: Solving k-center clustering (with outliers) in mapreduce and streaming, almost as accurately as sequentially. In: 45th International Conference on Very Large Databases(VLDB), pp. 766–778 (2019)
5.
Zurück zum Zitat Chen, D.Z., Li, J., Liang, H., Wang, H.: Matroid and knapsack center problems. Algorithmica 75, 27–52 (2016)MathSciNetCrossRef Chen, D.Z., Li, J., Liang, H., Wang, H.: Matroid and knapsack center problems. Algorithmica 75, 27–52 (2016)MathSciNetCrossRef
6.
Zurück zum Zitat Chiplunkar, A., Kale, S., Ramamoorthy, S.N.: How to solve fair k-center in massive data models. In: 37th International Conference on Machine Learning(ICML), pp. 1877–1886 (2020) Chiplunkar, A., Kale, S., Ramamoorthy, S.N.: How to solve fair k-center in massive data models. In: 37th International Conference on Machine Learning(ICML), pp. 1877–1886 (2020)
7.
Zurück zum Zitat Feldman, M., Friedler, S.A., Moeller, J., Scheidegger, C., Venkatasubramanian, S.: Certifying and removing disparate impact. In: 21th International Conference on Knowledge Discovery and Data Mining(ACM SIGKDD), pp. 259–268 (2015) Feldman, M., Friedler, S.A., Moeller, J., Scheidegger, C., Venkatasubramanian, S.: Certifying and removing disparate impact. In: 21th International Conference on Knowledge Discovery and Data Mining(ACM SIGKDD), pp. 259–268 (2015)
8.
Zurück zum Zitat Fritz, M., Behringer, M., Schwarz, H.: Log-means: efficiently estimating the number of clusters in large datasets. In: 46th International Conference on Very Large Databases (VLDB), pp. 2118–2131 (2020) Fritz, M., Behringer, M., Schwarz, H.: Log-means: efficiently estimating the number of clusters in large datasets. In: 46th International Conference on Very Large Databases (VLDB), pp. 2118–2131 (2020)
9.
Zurück zum Zitat Holstein, K., Wortman Vaughan, J., Daumé III, H., Dudik, M., Wallach, H.: Improving fairness in machine learning systems: What do industry practitioners need? In: CHI Conference on Human Factors in Computing Systems, pp. 1–16 (2019) Holstein, K., Wortman Vaughan, J., Daumé III, H., Dudik, M., Wallach, H.: Improving fairness in machine learning systems: What do industry practitioners need? In: CHI Conference on Human Factors in Computing Systems, pp. 1–16 (2019)
10.
Zurück zum Zitat Hotegni, S.S., Mahabadi, S., Vakilian, A.: Approximation algorithms for fair range clustering. In: 40th International Conference on Machine Learning (ICML), pp. 13270–13284 (2023) Hotegni, S.S., Mahabadi, S., Vakilian, A.: Approximation algorithms for fair range clustering. In: 40th International Conference on Machine Learning (ICML), pp. 13270–13284 (2023)
11.
Zurück zum Zitat Jones, M., Nguyen, H., Nguyen, T.: Fair \(k\)-centers via maximum matching. In: 37th International Conference on Machine Learning(ICML). pp. 4940–4949 (2020) Jones, M., Nguyen, H., Nguyen, T.: Fair \(k\)-centers via maximum matching. In: 37th International Conference on Machine Learning(ICML). pp. 4940–4949 (2020)
12.
Zurück zum Zitat Kale, S.: Small space stream summary for matroid center. Approximation, Randomization, and Combinatorial Optimization. Algorithms Tech. 145, 20 (2019) Kale, S.: Small space stream summary for matroid center. Approximation, Randomization, and Combinatorial Optimization. Algorithms Tech. 145, 20 (2019)
13.
Zurück zum Zitat Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair \(k\)-center clustering for data summarization. In: 36th International Conference on Machine Learning (ICML), pp. 3448–3457 (2019) Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair \(k\)-center clustering for data summarization. In: 36th International Conference on Machine Learning (ICML), pp. 3448–3457 (2019)
14.
Zurück zum Zitat Matthew McCutchen, R., Khuller, S.: Streaming algorithms for \(k\)-center clustering with outliers and with anonymity. Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pp. 165–178 (2008) Matthew McCutchen, R., Khuller, S.: Streaming algorithms for \(k\)-center clustering with outliers and with anonymity. Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pp. 165–178 (2008)
15.
Zurück zum Zitat Sweeney, L.: Discrimination in online ad delivery. Commun. ACM 56(5), 44–54 (2013)CrossRef Sweeney, L.: Discrimination in online ad delivery. Commun. ACM 56(5), 44–54 (2013)CrossRef
16.
Zurück zum Zitat Tambe, P., Cappelli, P., Yakubovich, V.: Artificial intelligence in human resources management: challenges and a path forward. Calif. Manage. Rev. 61(4), 15–42 (2019)CrossRef Tambe, P., Cappelli, P., Yakubovich, V.: Artificial intelligence in human resources management: challenges and a path forward. Calif. Manage. Rev. 61(4), 15–42 (2019)CrossRef
Metadaten
Titel
Streaming Fair k-Center Clustering over Massive Dataset with Performance Guarantee
verfasst von
Zeyu Lin
Longkun Guo
Chaoqi Jia
Copyright-Jahr
2024
Verlag
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2259-4_8

Premium Partner