BRIDGING THE BLACK-BOX GAP: EXPLAINABLE AI FOR LARGE LANGUAGE MODELS
Keywords:
Large Language Models, Explainable AI, Model Interpretability, Faithful Explanations, Transformer Models, Trustworthy AI, Bias Detection

Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks; however, their transformer-based architectures operate as complex black-box systems with limited transparency into internal reasoning processes. Although LLMs can generate coherent and seemingly logical explanations, such outputs are not guaranteed to faithfully represent the model's true decision pathways, raising critical concerns about trust, accountability, bias propagation, hallucination, and regulatory compliance in high-stakes applications. This paper addresses this interpretability gap by proposing a structured Explainable AI (XAI) framework designed to bridge the black-box nature of modern LLMs. The proposed approach integrates intrinsic interpretability mechanisms with post-hoc attribution techniques to produce explanations that are human-understandable, verifiable, and aligned with internal model behavior. A multi-dimensional evaluation strategy is introduced, incorporating faithfulness assessment, robustness testing, explanation consistency analysis, and bias sensitivity measurement. Experimental validation on benchmark natural language tasks demonstrates that the proposed framework improves explanation reliability without significantly degrading predictive performance. By advancing scalable and verifiable explainability mechanisms, this work contributes toward the development of trustworthy, transparent, and ethically responsible Large Language Models suitable for real-world deployment in safety-critical domains.
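The faithfulness assessment mentioned above is commonly operationalized as a perturbation test: remove the tokens an explanation ranks as most important and measure how much the model's confidence drops. The sketch below illustrates that idea only; the scoring function and the attribution scores are toy stand-ins, not the paper's actual model or attribution method.

```python
# Hedged sketch of a perturbation-based faithfulness (comprehensiveness) check.
# `model_confidence` is a toy stand-in for a real classifier, and the
# attribution scores below are invented for illustration.

def model_confidence(tokens):
    """Toy 'model': confidence grows with the count of sentiment-bearing tokens."""
    positive = {"great", "excellent", "good"}
    hits = sum(1 for t in tokens if t in positive)
    return hits / max(len(tokens), 1)

def comprehensiveness(tokens, attributions, k):
    """Drop in confidence after deleting the k highest-attributed tokens.
    A faithful explanation should produce a large drop."""
    ranked = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    keep = set(ranked[k:])
    reduced = [t for i, t in enumerate(tokens) if i in keep]
    return model_confidence(tokens) - model_confidence(reduced)

tokens = ["the", "film", "was", "great", "and", "excellent"]
attributions = [0.01, 0.05, 0.02, 0.9, 0.03, 0.8]  # hypothetical attribution scores
print(round(comprehensiveness(tokens, attributions, k=2), 3))  # prints 0.333
```

Here the explanation correctly attributes the prediction to "great" and "excellent", so deleting them erases the toy model's entire confidence; an unfaithful explanation would rank uninformative tokens highly and yield a drop near zero.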
License
Copyright (c) 2026 Journal of Science and Technology Excellence

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All articles published in the Journal of Engineering Excellence (JEE) are licensed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Under this license, authors retain full copyright of their work while granting permission for anyone to read, download, copy, distribute, print, search, or link to the full texts of the articles, or use them for any other lawful purpose, without asking prior permission from the publisher or author — provided that the original work is properly cited.
This open-access license ensures maximum dissemination and impact of the published research by allowing free and immediate access to scholarly work.
For more details, please refer to the official license page:
https://creativecommons.org/licenses/by/4.0/
