Enhancing Book and Document Digitization from Videos: A Feature Fusion-Based Approach

Document Type : Original Article

Authors

1 Computer Science and Engineering Department, Sardar Vallabhbhai National Institute of Technology, Surat, Gujarat, India

2 School of Computing and Data Sciences, FLAME University, Pune, Maharashtra, India

Abstract

Preserving the knowledge held in books and documents depends on digitization, yet traditional manual scanning is tedious and error-prone. It requires substantial human intervention, and the resulting digitization errors make downstream tasks such as optical character recognition more difficult. Techniques are therefore needed that reduce the human effort involved in digitization while delivering higher accuracy than recent state-of-the-art methods. We propose a computer vision-based algorithm that combines Gray-Level Co-occurrence Matrix (GLCM) features with Thepade's Sorted Block Truncation Coding (TSBTC) 10-ary texture features for video frame classification. This hybrid approach significantly improves frame selection accuracy, supports high-quality digitization, and accommodates multiple languages and document types. We also introduce a dataset of 54,000 diverse images, which we use to demonstrate the algorithm's effectiveness in real-world scenarios and to compare it with existing methods. The dataset can also be used for other document image analysis tasks.
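
The feature fusion named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 8-level quantization, the single horizontal GLCM offset, the choice of three GLCM statistics, and the restriction to grayscale frames are all assumptions made for illustration. TSBTC n-ary features are taken here in their usual form (sort the intensities, split them into n parts, take the mean of each part). The classifier applied to the fused vector is omitted because the abstract does not name one. Only the Python standard library is used; a practical pipeline would more likely rely on an array library.

```python
from typing import List

Gray = List[List[int]]  # 2-D grayscale frame, intensities in 0..255


def glcm_features(img: Gray, levels: int = 8) -> List[float]:
    """Contrast, energy and homogeneity from a symmetric GLCM.

    Assumption: one offset only (each pixel and its right-hand neighbour).
    """
    # Quantize intensities into `levels` bins to keep the matrix small.
    q = [[min(p * levels // 256, levels - 1) for p in row] for row in img]
    glcm = [[0.0] * levels for _ in range(levels)]
    for row in q:
        for a, b in zip(row, row[1:]):
            glcm[a][b] += 1.0
            glcm[b][a] += 1.0  # count both directions so the matrix is symmetric
    total = sum(map(sum, glcm)) or 1.0  # guard against single-column frames
    contrast = energy = homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            p = glcm[i][j] / total  # normalized co-occurrence probability
            contrast += p * (i - j) ** 2
            energy += p * p
            homogeneity += p / (1 + abs(i - j))
    return [contrast, energy, homogeneity]


def tsbtc_features(img: Gray, n: int = 10) -> List[float]:
    """Sort all intensities, split into n near-equal parts, return each mean."""
    values = sorted(p for row in img for p in row)
    size = len(values)
    feats = []
    for k in range(n):
        part = values[k * size // n:(k + 1) * size // n]
        feats.append(sum(part) / len(part) if part else 0.0)
    return feats


def fused_features(img: Gray, levels: int = 8, n: int = 10) -> List[float]:
    """Concatenate the GLCM statistics and the TSBTC n-ary means."""
    return glcm_features(img, levels) + tsbtc_features(img, n)


# Two synthetic 8x8 "frames": a flat gray patch and a checkerboard.
flat = [[128] * 8 for _ in range(8)]
checker = [[0 if (r + c) % 2 == 0 else 255 for c in range(8)] for r in range(8)]

flat_vec = fused_features(flat)
checker_vec = fused_features(checker)
print(len(flat_vec), flat_vec[:3])
print(len(checker_vec), checker_vec[:3])
```

With the defaults, each frame yields a 13-dimensional vector (3 GLCM statistics followed by 10 TSBTC means). The flat patch has zero contrast, while the checkerboard has high contrast and a TSBTC profile split between the two extreme intensities, which is the kind of separation a frame classifier could exploit.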
