Book Chapter

Transformer-Based Deep Learning Models for Intelligent Text Understanding

Deepika Borgaonkar
Research Scholar, Department of Computer Science and Engineering, School of Technology, GITAM Deemed to be University, Hyderabad, India.
Pages: 59-71
Keywords: Transformer, Deep Learning, Natural Language Processing, Text Understanding, Attention Mechanism.

Abstract

The proliferation of textual data in the digital era has created a pressing need for intelligent systems capable of understanding and processing human language with high accuracy. This chapter examines the transformative impact of Transformer-based deep learning models on the field of Natural Language Processing (NLP), with a specific focus on intelligent text understanding. We explore the foundational concepts of the Transformer architecture, including the self-attention mechanism, which overcomes the limitations of the sequential processing inherent in earlier recurrent and convolutional models. The chapter presents a comprehensive methodology for applying a Transformer-based model, specifically a fine-tuned BERT (Bidirectional Encoder Representations from Transformers), to multi-class text classification on the AG News dataset. We conduct a detailed analysis of the model's performance, presenting simulation results that cover training dynamics, accuracy metrics, and a comparative study against traditional machine learning and earlier deep learning baselines. The results demonstrate the superior capability of Transformer models in capturing complex linguistic patterns, achieving a test accuracy of 95.5%. The discussion extends to practical considerations such as inference time and the interpretability of the model's decisions through attention visualization. This chapter serves as a guide for researchers and practitioners, offering both theoretical insights and a practical framework for implementing state-of-the-art solutions for intelligent text understanding.
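As context for the self-attention mechanism the abstract highlights, the scaled dot-product attention of Vaswani et al. [4] can be written as follows, where Q, K, and V denote the query, key, and value matrices and d_k the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Scaling by 1/sqrt(d_k) keeps the dot products from growing with dimension and saturating the softmax; because every token attends to every other token in one parallel step, the architecture avoids the step-by-step bottleneck of recurrent models.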
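Since the chapter's experiments center on fine-tuning BERT for four-way classification on AG News, a minimal sketch of that pipeline may help readers reproduce the setup. It assumes the Hugging Face `transformers` and `datasets` libraries; the checkpoint name, sequence length, and hyperparameters below are illustrative defaults, not the chapter's reported configuration.

```python
# Illustrative sketch: fine-tuning BERT on AG News (4 classes:
# World, Sports, Business, Sci/Tech). Hyperparameters are common
# defaults, not the chapter's exact settings.
import numpy as np
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("ag_news")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Truncate/pad each news article to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4)

def accuracy(eval_pred):
    # eval_pred unpacks to (logits, label_ids).
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

args = TrainingArguments(
    output_dir="bert-agnews",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,              # a common BERT fine-tuning rate
    evaluation_strategy="epoch",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["test"],
                  compute_metrics=accuracy)
trainer.train()
print(trainer.evaluate())
```

With settings in this range, fine-tuned BERT commonly reaches test accuracy in the mid-90s on AG News, consistent with the 95.5% the chapter reports.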

References

  1. Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, 2000.
  2. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, Cambridge, England, 2008.
  3. Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory". In: Neural Computation 9.8 (1997), pp. 1735–1780.
  4. Ashish Vaswani et al. "Attention is all you need". In: Advances in Neural Information Processing Systems 30 (2017).
  5. Jacob Devlin et al. "BERT: Pre-training of deep bidirectional transformers for language understanding". In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019, pp. 4171–4186.
  6. Alec Radford et al. "Improving language understanding by generative pre-training". OpenAI technical report, 2018.
  7. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate". In: arXiv preprint arXiv:1409.0473 (2014).
  8. Xiang Zhang, Junbo Zhao, and Yann LeCun. "Character-level convolutional networks for text classification". In: Advances in Neural Information Processing Systems 28 (2015).
Deep Learning: Foundations, Advances, and Intelligent Applications