
2024 | OriginalPaper | Book chapter

Local Subsequence-Based Distribution for Time Series Clustering

Authors: Lei Gong, Hang Zhang, Zongyou Liu, Kai Ming Ting, Yang Cao, Ye Zhu

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

Analyzing the properties of subsequences within time series can reveal hidden patterns and improve the quality of time series clustering. However, most existing methods for subsequence analysis require point-to-point alignment, which makes them sensitive to shifts and noise. In this paper, we propose a clustering method named CTDS that treats each time series as a set of independent and identically distributed (iid) points in \(\mathbb {R}^d\), extracted by a sliding window in local regions. CTDS utilises a distributional measure called Isolation Distributional Kernel (IDK), which can capture subtle differences between the probability distributions of subsequences without alignment. It can cluster large, non-stationary and complex datasets. We evaluate CTDS on the UCR time series benchmark datasets and show that it outperforms other state-of-the-art clustering methods.
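The central idea of the abstract, representing a time series as a bag of sliding-window subsequences and comparing the resulting distributions with the Isolation Distributional Kernel, can be illustrated with the minimal NumPy sketch below. It assumes the Voronoi-partition form of Isolation Kernel (psi sampled points per partitioning, t partitionings) and a single global window width. The function names, the parameter values and the omission of the local-region segmentation step are illustrative assumptions; this is not the authors' implementation, which is linked in footnote 1.

```python
import numpy as np


def sliding_window_subsequences(series, width):
    """Stack every length-`width` subsequence of a 1-D series as one row (a point in R^width)."""
    series = np.asarray(series, dtype=float)
    n = len(series) - width + 1
    if n < 1:
        raise ValueError("window width exceeds series length")
    return np.stack([series[i:i + width] for i in range(n)])


def isolation_feature_map(points, partitions):
    """Concatenated one-hot Voronoi-cell indicators, one block per partitioning."""
    blocks = []
    for centres in partitions:
        # Squared Euclidean distance from every point to every sampled centre.
        d = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        one_hot = np.zeros((len(points), len(centres)))
        one_hot[np.arange(len(points)), d.argmin(axis=1)] = 1.0
        blocks.append(one_hot)
    return np.hstack(blocks)


def idk_matrix(subsequence_sets, psi=16, t=100, seed=0):
    """Pairwise IDK similarity between sets of subsequences, one set per series."""
    rng = np.random.default_rng(seed)
    pool = np.vstack(subsequence_sets)
    # t random partitionings, each defined by psi points sampled without replacement.
    partitions = [pool[rng.choice(len(pool), psi, replace=False)] for _ in range(t)]
    # Kernel mean embedding of each set: the average feature map of its points.
    means = np.stack([isolation_feature_map(s, partitions).mean(axis=0)
                      for s in subsequence_sets])
    # Dividing by t keeps every entry in [0, 1].
    return means @ means.T / t


# Usage: a sine wave, a phase-shifted copy, and level-shifted noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 4.0 * np.pi, 120)
series = [np.sin(x), np.sin(x + 1.0), 5.0 + rng.normal(size=120)]
sets = [sliding_window_subsequences(s, width=10) for s in series]
K = idk_matrix(sets, psi=16, t=100)
```

No alignment is performed: the phase-shifted sine is compared with the original only through where its subsequences fall in the random partitions, so it should score higher against the original sine than the noise series does.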


Footnotes
1
The source code is available at https://github.com/LeisureGong/CTDS.
 
2
The training and testing subsets of each dataset are merged for clustering evaluation.
 
3
For short time series with length less than 150, we directly generate all subsequences without splitting the data into segments.
 
4
We also evaluated their performance using RI and found similar results. Full experimental details are provided in the supplementary file.
 
5
For methods that solely produce representations or kernel matrices, we use K-means or Kernel K-means as the clustering algorithm.
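The footnotes above note that methods producing only a kernel matrix are paired with Kernel K-means. The NumPy sketch below shows one common Lloyd-style formulation, which computes feature-space distances to cluster means purely from kernel entries. The farthest-first initialisation, the function name and the Gaussian-kernel usage data are illustrative assumptions, not the evaluation code used in the paper; any symmetric kernel matrix, such as an IDK matrix, could be passed in instead.

```python
import numpy as np


def kernel_kmeans(K, k, n_iter=100, seed=0):
    """Lloyd-style kernel k-means on a precomputed, symmetric n x n kernel matrix."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    diag = np.diag(K)

    # Farthest-first initialisation in feature space:
    # ||phi(a) - phi(b)||^2 = K(a, a) - 2 K(a, b) + K(b, b).
    centres = [int(rng.integers(n))]
    for _ in range(1, k):
        d = diag[:, None] - 2.0 * K[:, centres] + diag[centres][None, :]
        centres.append(int(d.min(axis=1).argmax()))
    d = diag[:, None] - 2.0 * K[:, centres] + diag[centres][None, :]
    labels = d.argmin(axis=1)

    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)  # empty clusters keep distance inf
        for c in range(k):
            members = labels == c
            if not members.any():
                continue
            # ||phi(x) - mu_c||^2 = K(x,x) - 2 mean_j K(x,j) + mean_{j,l} K(j,l)
            dist[:, c] = (diag
                          - 2.0 * K[:, members].mean(axis=1)
                          + K[np.ix_(members, members)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


# Usage with a Gaussian kernel on two well-separated blobs (a stand-in for IDK).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(5, 2)),
               rng.normal(10.0, 0.1, size=(5, 2))])
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
labels = kernel_kmeans(np.exp(-sq / 2.0), k=2)
```

Because only kernel entries are used, the explicit feature map never has to be materialised, which is why a kernel method such as IDK can be clustered this way directly.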
 
Metadata
Copyright year: 2024
DOI: https://doi.org/10.1007/978-981-97-2242-6_21
