
2024 | OriginalPaper | Book chapter

Distributed MCMC Inference for Bayesian Non-parametric Latent Block Model

Authors: Reda Khoufache, Anisse Belhadj, Hanene Azzag, Mustapha Lebbah

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore

Abstract

In this paper, we introduce a novel distributed Markov Chain Monte Carlo (MCMC) inference method for the Bayesian Non-Parametric Latent Block Model (DisNPLBM), built on a Master/Worker architecture. Our non-parametric co-clustering algorithm partitions both observations and features using latent multivariate Gaussian block distributions. The workload on rows is distributed evenly among the workers, which communicate exclusively with the master and never with one another. Experimental results demonstrate the impact of DisNPLBM on cluster labeling accuracy and execution time. Moreover, we present a real-world use case that applies our approach to co-clustering gene expression data. The source code is publicly available at https://github.com/redakhoufache/Distributed-NPLBM
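The Master/Worker pattern described in the abstract can be illustrated with a minimal sketch. This is not the authors' DisNPLBM implementation and performs no MCMC sampling: it only shows rows being split evenly among workers, each worker reducing its rows to a local summary, and the master alone combining those summaries. The function names (`split_rows`, `worker_summary`, `master_aggregate`) and the choice of per-column sums as the summary are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_workers):
    """Return contiguous, near-equal row index ranges, one per worker."""
    base, extra = divmod(n_rows, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def worker_summary(rows):
    """Worker step (hypothetical): reduce local rows to (count, column sums)."""
    n_cols = len(rows[0]) if rows else 0
    sums = [0.0] * n_cols
    for row in rows:
        for j, x in enumerate(row):
            sums[j] += x
    return len(rows), sums


def master_aggregate(summaries):
    """Master step (hypothetical): merge worker summaries into column means."""
    total = sum(n for n, _ in summaries)
    non_empty = [s for n, s in summaries if n > 0]
    col_sums = [sum(col) for col in zip(*non_empty)]
    return [s / total for s in col_sums]


# Toy data: 5 observations (rows) with 2 features (columns).
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
parts = split_rows(len(data), 2)
chunks = [[data[i] for i in r] for r in parts]

# Workers never exchange data with each other; only the master sees all summaries.
with ThreadPoolExecutor(max_workers=2) as pool:
    summaries = list(pool.map(worker_summary, chunks))

means = master_aggregate(summaries)
print(means)
```

Sending only small summaries to the master, rather than raw rows, is the usual reason this layout keeps communication cost low. Which quantities the workers actually exchange with the master in DisNPLBM is specified in the paper and the linked repository, not here.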

Metadata
Title
Distributed MCMC Inference for Bayesian Non-parametric Latent Block Model
Authors
Reda Khoufache
Anisse Belhadj
Hanene Azzag
Mustapha Lebbah
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-2242-6_22
