Document Type: Original Article
Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
Department of Computer Engineering, Babol Branch, Islamic Azad University, Babol, Iran
Data mining of Persian text has grown in importance with the popularity of social networks and the emergence of abbreviated words, foreign terms and emoticons in the language. Numerous studies have addressed identifying the type of each word. However, identifying the role that each word plays in a sentence is far more important than identifying its type. Moreover, because Persian is similar to Arabic in spelling and grammar, the method proposed in this paper can also be applied to Arabic. In this paper, we adopt the Hidden Markov Model (HMM) with Tri-gram tagging to identify the morphology of composition roles in Persian sentences. The proposed technique is then compared with the Hidden Markov Model with Uni-gram and Bi-gram tagging. The method identifies word roles in terms of "independent" and "dependent" roles and takes into account several factors that contribute to the role of a word in a sentence. The simulation results show that, for independent composition roles, the average success rate of HMM with Tri-gram tagging was 20.56% and 17.67% higher than that of the Uni-gram and Bi-gram methods, respectively. For dependent composition roles, the improvements were 24.67% and 32.62%, respectively.
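To illustrate the general idea of HMM tagging with a Tri-gram context, the following Python sketch trains a small tagger and decodes a sentence with the Viterbi algorithm. It is not the implementation evaluated in this paper. The role labels (SUBJ, OBJ, VERB), the transliterated toy corpus, the add-one smoothing and all function names are assumptions made only for this example.

```python
import math
from collections import defaultdict

START, STOP = "<s>", "</s>"  # hypothetical boundary symbols

def train(tagged_sentences):
    """Count role trigrams, their two-role contexts, and word emissions."""
    tri = defaultdict(int)        # (r1, r2, r3) -> count
    ctx = defaultdict(int)        # (r1, r2) -> count
    emit = defaultdict(int)       # (role, word) -> count
    tag_count = defaultdict(int)  # role -> count
    vocab = set()
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent] + [STOP]
        for i in range(2, len(tags)):
            tri[tuple(tags[i - 2:i + 1])] += 1
            ctx[tuple(tags[i - 2:i])] += 1
        for w, t in sent:
            emit[(t, w)] += 1
            tag_count[t] += 1
            vocab.add(w)
    return tri, ctx, emit, tag_count, vocab

def log_trans(model, t1, t2, t3):
    """Add-one smoothed log P(t3 | t1, t2); smoothing is an assumption."""
    tri, ctx, _, tag_count, _ = model
    k = len(tag_count) + 1  # possible next symbols: all roles plus STOP
    return math.log((tri.get((t1, t2, t3), 0) + 1) / (ctx.get((t1, t2), 0) + k))

def log_emit(model, tag, word):
    """Add-one smoothed log P(word | tag), with one slot for unseen words."""
    _, _, emit, tag_count, vocab = model
    v = len(vocab) + 1
    return math.log((emit.get((tag, word), 0) + 1) / (tag_count.get(tag, 0) + v))

def viterbi(model, words):
    """Return the most probable role sequence under the trigram HMM."""
    tags = sorted(model[3])
    # best[(u, v)] = (log-probability, roles so far) for paths ending in u, v
    best = {(START, START): (0.0, [])}
    for w in words:
        nxt = {}
        for (u, v), (score, path) in best.items():
            for t in tags:
                s = score + log_trans(model, u, v, t) + log_emit(model, t, w)
                if (v, t) not in nxt or s > nxt[(v, t)][0]:
                    nxt[(v, t)] = (s, path + [t])
        best = nxt
    # Close each surviving path with the transition into STOP.
    (u, v), (score, path) = max(
        best.items(),
        key=lambda kv: kv[1][0] + log_trans(model, kv[0][0], kv[0][1], STOP),
    )
    return path

# Toy corpus with hypothetical role labels and transliterated words.
corpus = [
    [("ali", "SUBJ"), ("ketab", "OBJ"), ("khand", "VERB")],
    [("sara", "SUBJ"), ("nameh", "OBJ"), ("nevesht", "VERB")],
    [("ali", "SUBJ"), ("nameh", "OBJ"), ("khand", "VERB")],
]
model = train(corpus)
print(viterbi(model, ["sara", "ketab", "nevesht"]))  # ['SUBJ', 'OBJ', 'VERB']
```

A Uni-gram or Bi-gram baseline would condition each role on zero or one preceding role instead of two. In this sketch, that would mean shortening the context key used in `train` and `log_trans`.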