A Hidden Markov Model for Morphology of Compound Roles in Persian Text Part of Tagging

Document Type: Original Article


1 Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran

2 Department of Computer Engineering, Babol Branch, Islamic Azad University, Babol, Iran


Data mining has grown in importance with the popularity of social networks and the emergence of abbreviated words, foreign terms and emoticons in the Persian language. Numerous studies have been conducted to identify the type of words. However, identifying the role of each word in a sentence is far more important than identifying its type. Moreover, because Persian is similar to Arabic in spelling and grammar, the method proposed in this paper can also be applied to Arabic. In this paper, we adopt the Hidden Markov Model (HMM) with Tri-gram tagging to identify the morphology of composition roles in Persian sentences. We then compare the proposed technique with the Hidden Markov Model using Uni-gram and Bi-gram tagging. The proposed method bases word role identification on "independent" and "dependent" roles and on several factors that contribute to the roles of words in sentences. The simulation results show that, for independent composition roles, the average success rate of HMM with Tri-gram tagging improved by 20.56% and 17.67% over the Uni-gram and Bi-gram methods, respectively. For dependent composition roles, the improvements were 24.67% and 32.62%, respectively.
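
The abstract does not give the model's equations, so the following is only a minimal sketch of the general technique it names: an HMM tagger whose transition probabilities condition on the two preceding tags (Tri-gram tagging), decoded with the Viterbi algorithm. The role labels (`SUBJ`, `OBJ`, `VERB`), the transliterated toy sentences, the add-one smoothing and all function names are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict
from math import log

START, STOP = "<s>", "</s>"

def train(tagged_sentences):
    """Count tag trigrams, their two-tag contexts, and (tag, word) emissions."""
    tri = defaultdict(int)        # (t[i-2], t[i-1], t[i]) -> count
    ctx = defaultdict(int)        # (t[i-2], t[i-1]) -> count
    emit = defaultdict(int)       # (tag, word) -> count
    tag_count = defaultdict(int)  # tag -> count
    vocab = set()
    for sent in tagged_sentences:
        tags = [START, START] + [t for _, t in sent] + [STOP]
        for i in range(2, len(tags)):
            tri[tuple(tags[i - 2:i + 1])] += 1
            ctx[tuple(tags[i - 2:i])] += 1
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
    return tri, ctx, emit, tag_count, vocab

def viterbi(words, model):
    """Return the most probable role sequence under the trigram HMM."""
    if not words:
        return []
    tri, ctx, emit, tag_count, vocab = model
    tags = sorted(tag_count)
    n_out = len(tags) + 1  # possible next symbols: every tag plus STOP

    def log_trans(u, v, w):  # add-one smoothed P(w | u, v); an assumption
        return log((tri[(u, v, w)] + 1) / (ctx[(u, v)] + n_out))

    def log_emit(t, word):   # add-one smoothed P(word | t); an assumption
        return log((emit[(t, word)] + 1) / (tag_count[t] + len(vocab) + 1))

    # best[(u, v)]: best log-probability of a prefix whose last two tags are u, v
    best = {(START, START): 0.0}
    back = []  # back[i][(u, v)] = tag two positions before word i
    for word in words:
        new_best, pointers = {}, {}
        for (u, v), score in best.items():
            for w in tags:
                s = score + log_trans(u, v, w) + log_emit(w, word)
                if (v, w) not in new_best or s > new_best[(v, w)]:
                    new_best[(v, w)] = s
                    pointers[(v, w)] = u
        best = new_best
        back.append(pointers)

    # Pick the final state that is best after the transition to STOP.
    u, v = max(best, key=lambda uv: best[uv] + log_trans(uv[0], uv[1], STOP))
    path = [v, u]  # built in reverse order
    for i in range(len(words) - 1, 1, -1):
        prev = back[i][(u, v)]
        path.append(prev)
        u, v = prev, u
    path.reverse()
    return path[-len(words):]  # drop a leading START for one-word input

# Toy, hand-labelled corpus (hypothetical data and role labels).
corpus = [
    [("ali", "SUBJ"), ("ketab", "OBJ"), ("kharid", "VERB")],
    [("sara", "SUBJ"), ("nameh", "OBJ"), ("nevesht", "VERB")],
    [("ali", "SUBJ"), ("nameh", "OBJ"), ("khand", "VERB")],
]
model = train(corpus)
roles = viterbi(["sara", "ketab", "kharid"], model)
print(roles)
```

A Uni-gram or Bi-gram baseline differs only in how much tag history the transition term uses: none, or one preceding tag instead of two. The trigram version keeps a pair of tags as its Viterbi state, which is why `best` is keyed by `(u, v)`.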

