IJE TRANSACTIONS C: Aspects Vol. 28, No. 12 (December 2015) 1728-1737   

PDF URL: http://www.ije.ir/Vol28/No12/C/5-2140.pdf  

  A GEOMETRIC VIEW OF SIMILARITY MEASURES IN DATA MINING
 
A. Darvishi and H. Hassanpour
 
( Received: October 08, 2015 – Accepted: December 24, 2015 )
 
 

Abstract    The main objective of data mining is to extract information from a set of data, using a measure, for prospective applications. A major concern is that one often has to deal with large-scale data. Several dimensionality reduction techniques, such as various feature extraction methods, have been developed to address this issue. However, the geometric view of the applied measure, as an additional consideration, is generally neglected. Since each measure has its own perspective on the data, different interpretations of the same data may be obtained depending on the measure used. While efforts are often focused on adjusting feature extraction techniques for mining the data, choosing a measure suited to the nature or general characteristics of the data or application is more appropriate. Given a pair of sequences, one measure may consider them similar while another may quantify them as dissimilar. The goal of this research is twofold: to evince the role of feature extraction in data mining, and to reveal the significance of the geometric attributes of similarity measures in detecting relationships between data.
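The claim that one measure may judge a pair of sequences similar while another judges them dissimilar can be illustrated with a minimal Python sketch. It assumes two common measures chosen here only for illustration (Euclidean distance and Pearson correlation); they are not necessarily the measures studied in the paper, and the sine sequences are hypothetical toy data. Geometrically, Euclidean distance is the straight-line distance between two sequences viewed as points in n-dimensional space, whereas Pearson correlation is the cosine of the angle between the mean-centred sequences, so it ignores offset and positive scaling.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance: straight-line distance between the two
    sequences viewed as points in n-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation: cosine of the angle between the mean-centred
    sequences, so it ignores offset and (positive) scale."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    return dot / (math.sqrt(sum(a * a for a in cx)) * math.sqrt(sum(b * b for b in cy)))

# Hypothetical toy data for illustration, not the paper's data.
x = [math.sin(2 * math.pi * t / 20) for t in range(40)]  # reference shape
y = [3 * v + 5 for v in x]      # same shape, larger scale and offset
z = [-0.2 * v for v in x]       # small amplitude, inverted shape

print(f"x vs y: euclidean={euclidean(x, y):.2f}, pearson={pearson(x, y):.2f}")
print(f"x vs z: euclidean={euclidean(x, z):.2f}, pearson={pearson(x, z):.2f}")
```

Here Euclidean distance ranks z as much closer to x than y (about 5.4 versus 32.9), while Pearson correlation rates y as perfectly similar to x (correlation 1) and z as perfectly anti-correlated (correlation -1): the same pairs receive opposite verdicts depending on the geometry of the measure.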

 

Keywords    Data mining, Feature extraction, Similarity measures, Geometric view

 

Abstract (Persian, translated)    The main purpose of data mining is to extract information from a set of data, using a measure, for the intended applications. The major problem is dealing with large-scale data. Numerous dimensionality reduction techniques, such as various feature extraction methods, have been proposed to solve this problem. However, the geometric view of the applied measure, as an influential factor, has generally been neglected. Since each measure has its own perspective on the data, it may provide a different interpretation of the data. While researchers' efforts have mostly focused on better feature extraction for data mining, choosing a suitable measure based on the nature of the data or the characteristics of the application seems more appropriate. A particular measure may consider two time series similar, while another measure may regard the same two sequences as dissimilar. The goal of this research is twofold: to show the role of feature extraction in data mining, and to demonstrate the importance of the geometric properties of similarity measures in detecting relationships between data. In addition, the performance of various similarity measures in classifying three synthetic datasets and one real dataset of electrocardiogram (ECG) time series is examined.
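The Persian-language abstract additionally states that similarity measures are compared by classifying three synthetic datasets and one real ECG time-series dataset. The classification protocol is not given on this page, so the sketch below assumes a 1-nearest-neighbour (1-NN) classifier with a pluggable dissimilarity function, a common choice for time-series classification. The two classes produced by the hypothetical helper `make_series` (rising versus falling ramps with random scale, offset, and noise) are invented for illustration and are not the paper's datasets.

```python
import math
import random

def euclidean_dissimilarity(x, y):
    """Straight-line (L2) distance between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_dissimilarity(x, y):
    """1 - Pearson correlation, so smaller values mean 'more similar'."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def one_nn_accuracy(train, test, dissimilarity):
    """Label each test series with the label of its nearest training series
    under the given dissimilarity, and return the fraction labelled correctly."""
    correct = 0
    for series, label in test:
        nearest = min(train, key=lambda item: dissimilarity(series, item[0]))
        if nearest[1] == label:
            correct += 1
    return correct / len(test)

def make_series(kind, rng, n=50):
    """Hypothetical synthetic class: a rising ('up') or falling ramp with a
    random amplitude scale, random offset, and small Gaussian noise."""
    scale = rng.uniform(0.5, 5.0)
    offset = rng.uniform(-10.0, 10.0)
    slope = 1.0 if kind == "up" else -1.0
    return [offset + scale * slope * t / n + rng.gauss(0, 0.05) for t in range(n)]

rng = random.Random(0)  # fixed seed for reproducibility
train_set = [(make_series(k, rng), k) for k in ("up", "down") for _ in range(10)]
test_set = [(make_series(k, rng), k) for k in ("up", "down") for _ in range(10)]

for name, measure in [("Euclidean", euclidean_dissimilarity),
                      ("1 - Pearson", correlation_dissimilarity)]:
    print(name, one_nn_accuracy(train_set, test_set, measure))
```

Because the two classes differ only in shape while offsets and scales are random, the correlation-based dissimilarity separates them, whereas Euclidean distance is largely driven by the random offsets. Swapping the dissimilarity function is the only change between the two runs, which isolates the effect of the measure from that of the classifier.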

 





International Journal of Engineering
E-mail: office@ije.ir
Web Site: http://www.ije.ir