Automatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach

Document Type: Original Article


1 Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran

2 Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran


In social networking and microblogging environments, #tags are often used to categorize messages and mark their key points. Also, since some social networks, such as Twitter, restrict the number of characters in a message, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive, content-based #tag recommendation system is introduced. The proposed system integrates structured knowledge into every core component. First, the relevant features, semantic structures and information content are extracted from messages. Since a message can often hold only a little information, a content enrichment module is introduced to identify information structures that can improve the representation of the message. The extracted features are represented as a semantic network. Then, a hybrid, multi-layered similarity module identifies the commonalities and differences between messages in terms of their features, semantics and information content. Finally, #tags are recommended to users based on the #tags found in contextually similar messages. The system is evaluated on the Tweets2011 dataset. The results suggest that the proposed method can recommend suitable #tags with negligible operational time, even when little content is available.
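The pipeline outlined in the abstract (feature extraction, content enrichment, hybrid similarity, and recommending #tags from contextually similar messages) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The small `CONCEPTS` dictionary is a hypothetical stand-in for the structured knowledge the system draws on, each similarity layer is plain Jaccard overlap rather than the paper's semantic-network measures, and the weights `w_lex` and `w_sem`, the neighbour count `k`, the stopword list and all helper names are illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical toy knowledge base mapping a word to the concepts it evokes.
# It only stands in for the structured knowledge used by the real system.
CONCEPTS = {
    "quake": {"earthquake", "disaster"},
    "earthquake": {"earthquake", "disaster"},
    "tremor": {"earthquake"},
    "goal": {"football", "sport"},
    "striker": {"football", "sport"},
}
STOPWORDS = {"a", "an", "the", "and", "of", "in", "by", "to", "what"}  # assumed list


def extract_features(text):
    """Split a message into lower-cased content words and the #tags it carries."""
    tokens = re.findall(r"#?\w+", text.lower())
    tags = {t[1:] for t in tokens if t.startswith("#")}
    words = {t for t in tokens if not t.startswith("#") and t not in STOPWORDS}
    return words, tags


def enrich(words):
    """Content enrichment: collect the concepts linked to the message's words."""
    concepts = set()
    for word in words:
        concepts |= CONCEPTS.get(word, set())
    return concepts


def build_message(text):
    words, tags = extract_features(text)
    return {"words": words, "concepts": enrich(words), "tags": tags}


def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def hybrid_similarity(m1, m2, w_lex=0.5, w_sem=0.5):
    """Weighted sum of a lexical layer and a concept layer (weights are assumptions)."""
    return (w_lex * jaccard(m1["words"], m2["words"])
            + w_sem * jaccard(m1["concepts"], m2["concepts"]))


def recommend(query_text, corpus, k=3, n=2):
    """Return up to n #tags, scored by summed similarity over the k nearest messages."""
    query = build_message(query_text)
    scored = [(hybrid_similarity(query, m), m) for m in map(build_message, corpus)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, message in scored[:k]:
        for tag in message["tags"]:
            votes[tag] += sim
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return ["#" + tag for tag, score in ranked if score > 0][:n]


corpus = [
    "Strong quake hits the coast #earthquake #breaking",
    "Tremor felt downtown #earthquake",
    "What a goal by the striker #football",
]
print(recommend("Another quake and tremor reported", corpus))
# → ['#earthquake', '#breaking']
```

Here the enrichment step is what links "quake" and "tremor" to the same concept, so the second corpus message contributes even though it shares only one word with the query. Summing similarities over the k nearest messages lets a #tag shared by several moderately similar messages accumulate weight; a fuller system would replace the toy concept map with a real knowledge base and the Jaccard layers with semantic-network similarity.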


