Book Chapter

Adversarial Robustness in Next-Generation AI: Defense Mechanisms for Image and Text Models

Dr. Pradeep Venuthurumilli
Associate Professor, School of Computer Science and Engineering, Malla Reddy Engineering College for Women, Maisammaguda, Secunderabad, Telangana, India.
pradeepvenuthuru@gmail.com
Pages: 154–171
Keywords: Adversarial Robustness; Defense Mechanisms; Adversarial Attacks; Certified Robustness; Deep Neural Networks

Abstract

This chapter provides a comprehensive exploration of adversarial robustness in next-generation artificial intelligence (AI) systems, with a specific focus on defense mechanisms for image and text models. As AI models, particularly deep neural networks, become increasingly integrated into critical applications, their vulnerability to adversarial attacks presents a significant security challenge. Adversarial examples, which are inputs intentionally perturbed to cause model misclassification, can have severe consequences in domains such as autonomous driving, medical diagnostics, and natural language understanding. This chapter systematically reviews the landscape of adversarial attacks, from foundational gradient-based methods to transfer- and query-based black-box attacks. We then analyze state-of-the-art defense strategies in detail, including adversarial training, defensive distillation, and certified robustness techniques. To provide a practical understanding of these concepts, we present a case study implementing and evaluating adversarial attacks and defenses on the CIFAR-10 image dataset. The results of our simulations demonstrate the effectiveness of adversarial training in enhancing model robustness against common attacks such as the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Finally, we discuss the open challenges and future research directions in the pursuit of building truly robust and trustworthy AI systems.
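As a concrete illustration of the attacks and the defense named above, the sketch below implements FGSM [1], PGD [3], and a single PGD-based adversarial training step in PyTorch. This is a minimal sketch rather than the chapter's exact experimental code: the generic model and optimizer, the budgets epsilon = 8/255, alpha = 2/255, and 10 PGD steps are illustrative assumptions for CIFAR-10-style inputs scaled to [0, 1].

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon=8 / 255):
        # FGSM [1]: one step of size epsilon along the sign of the
        # gradient of the loss with respect to the input.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        return (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

    def pgd_attack(model, x, y, epsilon=8 / 255, alpha=2 / 255, steps=10):
        # PGD [3]: iterated FGSM-style steps, each projected back into
        # the L-infinity ball of radius epsilon around the clean input.
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # projection step
            x_adv = x_adv.clamp(0, 1)                         # keep valid pixel range
        return x_adv.detach()

    def adversarial_training_step(model, optimizer, x, y):
        # One step of PGD-based adversarial training in the style of
        # Madry et al. [3]: train on worst-case perturbed inputs.
        model.eval()                  # craft attacks with frozen BN statistics
        x_adv = pgd_attack(model, x, y)
        model.train()
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()

Note that the attacks are crafted in eval() mode so batch-normalization statistics are not updated by adversarial inputs; certified defenses such as randomized smoothing [7] follow a different, probabilistic recipe and are not sketched here.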

References

  1. Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. “Explaining and harnessing adversarial examples”. In: arXiv preprint arXiv:1412.6572 (2014).
  2. Linyang Li et al. “BERT-ATTACK: Adversarial attack against BERT using BERT”. In: arXiv preprint arXiv:2004.09984 (2020).
  3. Aleksander Madry et al. “Towards deep learning models resistant to adversarial attacks”. In: arXiv preprint arXiv:1706.06083 (2017).
  4. Nicholas Carlini and David Wagner. “Towards evaluating the robustness of neural networks”. In: 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017, pp. 39–57.
  5. Pin-Yu Chen et al. “ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models”. In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security. 2017, pp. 15–26.
  6. Nicolas Papernot et al. “Distillation as a defense to adversarial perturbations against deep neural networks”. In: 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016, pp. 582–597.
  7. Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. “Certified adversarial robustness via randomized smoothing”. In: International Conference on Machine Learning. PMLR, 2019, pp. 1310–1320.
  8. Di Jin et al. “Is BERT really robust? A strong baseline for natural language attack on text classification and entailment”. In: Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 05. 2020, pp. 8018–8025.
  9. Anandbabu Gopatoti, Merajothu Chandra Naik, and Kiran Kumar Gopathoti. “Convolutional neural network based image denoising for better quality of images”. In: International Journal of Engineering and Technology (UAE) 7.3.27 (2018), pp. 356–361.
Next-Generation Artificial Intelligence: From Foundations to Intelligent Applications