An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Authors

RIV Lab, Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran

Abstract

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative studies using quantitative and large scale evaluations.  In order to provide a comprehensive validation, an extensive evaluation of similarity measures for MTS clustering were conducted. The 14 well-known similarity measures with their variants and testing their effectiveness on 23 MTS datasets coming from a wide variety of application domains were re-implemented. In this paper, an overview of these different techniques is given and the empirical comparison regarding their effectiveness based on agglomerative clustering task is presented. Furthermore, the statistical significance tests were used to derive meaningful conclusions. It has been found that all of similarity measures are equivalent, in terms of clustering F-measure, and there is no significant difference between similarity measures based on our datasets. The results provide a comparative background between similarity measures to find the most proper method in terms of performance and computation time in this field.

Keywords


1.     Keogh, E., Xi, X., Wei, L. and Ratanamahatana, C. A., "The UCR time series classification/clustering homepage", URL= http://www. cs. ucr. edu/eamonn/time series data Vol., (2006)

2.     Jeong, Y.-S., Jeong, M. K. and Omitaomu, O. A., "Weighted dynamic time warping for time series classification", Pattern Recognition Vol. 9, (2011), 2231-2240.

3.     Kantz, H. and Schreiber, T., "Nonlinear time series analysis", Cambridge university press,  (2004).

4.     Fu, Z., Hu, W. and Tan, T., "Similarity based vehicle trajectory clustering and anomaly detection", Image Processing, 2005. ICIP 2005. IEEE International Conference on, IEEE, (2005); Liao, T. W., "Clustering of time series data—a survey", Pattern Recognition Vol. 11, (2005), 1857-1874.

5.     Doroudyan, M., Owlia, M., Sadeghi, H. and Amiri, A., "Monitoring Financial Processes with ARMA-GARCH Model Based on Shewhart Control Chart (Case Study: Tehran Stock Exchange)", International Journal of Engineering-Transactions B: Applications Vol. 2, (2017), 270-278.

6.     Shafiee-Chafi, M. and Gholizade-Narm, H., "A novel fuzzy based method for heart rate variability prediction", International Journal of Engineering-Transactions A: Basics Vol. 7, (2014), 1041-1049.

7.     Aghabozorgi, S., Shirkhorshidi, A. S. and Wah, T. Y., "Time-series clustering–A decade review", Information Systems Vol., (2015), 16-38.

8.     Kleist, C., "Time Series Data Mining Methods: A Review", Humboldt-Universität zu Berlin,  (2015).

9.     Darvishi, A. and Hassanpour, H., "A geometric view of similarity measures in data mining", International Journal of Engineering-Transactions C: Aspects Vol. 12, (2015), 1728-1734.

10.   Faloutsos, C., Ranganathan, M. and Manolopoulos, Y., "Fast subsequence matching in time-series databases", ACM,  (1994).

11.   Rabiner, L. R. and Juang, B.-H., "Fundamentals of speech recognition",  Vol., (1993)

12.   Berndt, D. J. and Clifford, J., "Using dynamic time warping to find patterns in time series", KDD workshop, Seattle, WA, (1994).

13.   Vlachos, M., Kollios, G. and Gunopulos, D., "Discovering similar multidimensional trajectories", Data Engineering, 2002. Proceedings. 18th International Conference on, IEEE, (2002).

14.   Chen, L. and Ng, R., "On the marriage of lp-norms and edit distance", Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, VLDB Endowment, (2004).

15.   Chen, L., Özsu, M. T. and Oria, V., "Robust and fast similarity search for moving object trajectories", Proceedings of the 2005 ACM SIGMOD international conference on Management of data, ACM, (2005).

16.   Morse, M. D. and Patel, J. M., "An efficient and accurate method for evaluating time series similarity", Proceedings of the 2007 ACM SIGMOD international conference on Management of data, ACM, (2007).

17.   Marteau, P. F., "Time warp edit distance with stiffness adjustment for time series matching", IEEE Trans Pattern Anal Mach Intell Vol. 2, (2009), 306-318.

18.   Stefan, A., Athitsos, V. and Das, G., "The move-split-merge metric for time series", IEEE transactions on Knowledge and Data Engineering Vol. 6, (2013), 1425-1438.

19.   Eiter, T. and Mannila, H., "Computing discrete Fréchet distance"  (1994).

20.   Alt, H. and Godau, M., "Computing the Fréchet distance between two polygonal curves", International Journal of Computational Geometry & Applications Vol. 01n02, (1995), 75-91.

21.   Besse, P. C., Guillouet, B., Loubes, J.-M. and Royer, F., "Review and Perspective for Distance-Based Clustering of Vehicle Trajectories", IEEE Transactions on Intelligent Transportation Systems Vol. 11, (2016), 3306-3317.

22.   Górecki, T. and Łuczak, M., "Using derivatives in time series classification", Data Mining and Knowledge Discovery Vol., (2013), 1-22.

23.   Batista, G. E., Keogh, E. J., Tataw, O. M. and De Souza, V. M., "CID: an efficient complexity-invariant distance for time series", Data Mining and Knowledge Discovery Vol. 3, (2014), 634-669.

24.   Ding, H., Trajcevski, G., Scheuermann, P., Wang, X. and Keogh, E., "Querying and mining of time series data: experimental comparison of representations and distance measures", Proceedings of the VLDB Endowment Vol. 2, (2008), 1542-1552.

25.   Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P. and Keogh, E., "Experimental comparison of representation methods and distance measures for time series data", Data Mining and Knowledge Discovery Vol., (2013), 1-35; Serra, J. and Arcos, J. L., "An empirical evaluation of similarity measures for time series classification", Knowledge-Based Systems Vol., (2014), 305-314; Lines, J. and Bagnall, A., "Time series classification with ensembles of elastic distance measures", Data Mining and Knowledge Discovery Vol. 3, (2015), 565-592; Bagnall, A., Bostrom, A., Large, J. and Lines, J., "The great time series classification bake off: an experimental evaluation of recently proposed algorithms", Extended Version. CoRR, abs/1602.01711 Vol., (2016)

26.   Deza, M. M. and Deza, E., "Encyclopedia of distances", Springer,  (2009).

27.   Wei, W. W.-S., "Time series analysis", Addison-Wesley publ Reading,  (1994).

28.   Levenshtein, V. I., "Binary codes capable of correcting deletions, insertions, and reversals", Soviet physics doklady, (1966).

29.   Sakoe, H. and Chiba, S., "Dynamic programming algorithm optimization for spoken word recognition", IEEE transactions on acoustics, speech, and signal processing Vol. 1, (1978), 43-49.

30.   Hausdorff, F., "Mengenlehre", Walter de Gruyter Berlin,  (1927).

31.   Mori, U., Mendiburu, A. and Lozano, J. A., "Similarity measure selection for clustering time series databases", IEEE Transactions on Knowledge and Data Engineering Vol. 1, (2016), 181-195; Yuan, G., Sun, P., Zhao, J., Li, D. and Wang, C., "A review of moving object trajectory clustering algorithms", Artificial Intelligence Review Vol. 1, (2017), 123-144.

32.   Wagner, S. and Wagner, D., "Comparing clusterings: an overview", Universität Karlsruhe, Fakultät für Informatik Karlsruhe,  (2007).

33.   Zhang, Z., Huang, K. and Tan, T., "Comparison of similarity measures for trajectory clustering in outdoor surveillance scenes", Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, IEEE, (2006).

34.   Kadous, W., "GRASP: Recognition of Australian sign language using Instrumented gloves",  Vol., (1995)

35.   Hu, W., Li, X., Tian, G., Maybank, S. and Zhang, Z., "An incremental DPMM-based method for trajectory clustering, modeling, and retrieval", IEEE Trans Pattern Anal Mach Intell Vol. 5, (2013), 1051-1065.

36.   Morris, B. and Trivedi, M., "Learning trajectory patterns by clustering: Experimental studies and comparative evaluation", Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, IEEE, (2009).

37.   Beyan, C. and Fisher, R. B., "Detection of Abnormal Fish Trajectories Using a Clustering Based Hierarchical Classifier", BMVC, (2013).

38.   Ramos-Garijo, R., Martín, S., Marzal, A., Prat, F., Vilar, J. M. and Llorens, D., "An input panel and recognition engine for on-line handwritten text recognition", Frontiers in Artificial Intelligence and Applications Vol., (2007), 223-230.

39.   Calderara, S., Prati, A. and Cucchiara, R., "Mixtures of von mises distributions for people trajectory shape analysis", IEEE Transactions on Circuits and Systems for Video Technology Vol. 4, (2011), 457-471.

40.   Yeung, D.-Y., Chang, H., Xiong, Y., George, S., Kashi, R., Matsumoto, T. and Rigoll, G., "SVC2004: First international signature verification competition", Springer,  (2004).

41.   Demšar, J., "Statistical comparisons of classifiers over multiple data sets", Journal of Machine Learning Research Vol. Jan, (2006), 1-30.

42.   García, S., Fernández, A., Luengo, J. and Herrera, F., "Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power", Information Sciences Vol. 10, (2010), 2044-2064.

43.   Friedman, M., "A comparison of alternative tests of significance for the problem of m rankings", The Annals of Mathematical Statistics Vol. 1, (1940), 86-92.

44.   Holland, B. S. and Copenhaver, M. D., "An improved sequentially rejective Bonferroni test procedure", Biometrics Vol., (1987), 417-423.