Multi-label Text Categorization using Error-correcting Output Coding with Weighted Probability

Document Type: Original Article

Authors

1 Department of ECE, Sathyabama Institute of Science and Technology, Oldmamallapuram Road, Chennai, India

2 Department of Electronics and Communication Engineering, SRM Institute of Technology, Ramapuram, Chennai, India

Abstract

In many real-world categorization problems, labeled data is hard to acquire even though unlabeled data is abundant. It is therefore important to devise approaches that select the most valuable instances for labeling and thereby produce a better classifier. Most existing techniques address binary categorization, and only a limited number of algorithms handle the multi-label case. Multi-label classification is more complex because a sample can belong to several labels from the set of available classes. Text data on the World Wide Web is an obvious example of such a task. This paper develops a novel technique for multi-label text categorization by modifying the Error-Correcting Output Coding (ECOC) approach. A cluster of binary complementary classifiers is employed to make ECOC more effective for multi-class problems. In addition, a weighted posterior probability is computed to improve multi-label text classification performance. The proposed ECOC with weighted probability is evaluated using precision, recall, and F-measure, and achieves a maximum precision of 0.897, a maximum recall of 0.896, and a maximum F-measure of 0.895.
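The abstract does not spell out the weighting scheme or the construction of the complementary classifiers, so the following is only a minimal sketch of how ECOC decoding with a weighted posterior probability might look. It assumes a binary code matrix with one row (codeword) per class, one posterior probability per binary classifier, and one non-negative weight per classifier (for example a validation accuracy). The function name `decode_ecoc_weighted`, the scoring rule, and the threshold are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence, Tuple


def decode_ecoc_weighted(
    code_matrix: Sequence[Sequence[int]],
    posteriors: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Score each class by weighted agreement with its codeword.

    code_matrix[c][j] is 1 if binary classifier j should fire for class c,
    else 0. posteriors[j] is classifier j's estimated probability of
    firing. weights[j] is a hypothetical per-classifier reliability weight.
    Returns (per-class scores, indices of classes with score >= threshold).
    """
    if len(posteriors) != len(weights):
        raise ValueError("posteriors and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    scores: List[float] = []
    for codeword in code_matrix:
        if len(codeword) != len(posteriors):
            raise ValueError("codeword length must match number of classifiers")
        agreement = 0.0
        for bit, p, w in zip(codeword, posteriors, weights):
            # A classifier agrees with the codeword with probability p when
            # the bit is 1 and with probability (1 - p) when the bit is 0.
            agreement += w * (p if bit == 1 else 1.0 - p)
        scores.append(agreement / total_weight)

    # Multi-label decision: keep every class whose score clears the threshold,
    # rather than only the single best-matching codeword.
    labels = [c for c, s in enumerate(scores) if s >= threshold]
    return scores, labels


# Toy example with 3 classes and 4 binary classifiers (made-up numbers).
code_matrix = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
posteriors = [0.9, 0.8, 0.7, 0.1]
weights = [1.0, 1.0, 1.0, 1.0]

scores, labels = decode_ecoc_weighted(code_matrix, posteriors, weights, threshold=0.6)
print(scores, labels)
```

Thresholding the per-class scores, instead of taking only the closest codeword, is what lets a single document receive several labels; the threshold value itself would need to be tuned on held-out data.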

