Faithful and Useful Explanations
by Large Language Models

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Computer Sciences
Subject
Explainable AI
Explanation
Faithfulness
Large Language Models
Utility
Copyright date
2024
Author
Lyu, Qing
Abstract

The advent of large-scale pretrained Language Models (LMs) has ushered in a new era in Natural Language Processing (NLP). Trained on massive text corpora, these models have demonstrated remarkable performance across a wide array of tasks, from machine translation and text summarization to question answering and code generation. However, their impressive capabilities are accompanied by a lack of transparency in their decision-making processes. This opacity raises concerns about their reliability, fairness, and potential biases, hindering their adoption in high-stakes applications. The effort to interpret LMs, often referred to as "explainable AI" or "interpretable NLP", has thus emerged as a critical research area. Existing work has proposed a wide range of model explanation methods; in particular, the idea of using Large Language Models (LLMs) to generate explanations of their own decision-making process appears promising.

This thesis contends that two key aspects of LLM-generated explanations have been under-emphasized: faithfulness and utility. Faithfulness is the degree to which an explanation accurately reflects the model's reasoning process behind its prediction, while utility is its practical value in real-world applications. Current LLM-generated explanations often appear convincing yet can misrepresent the underlying reasoning, leading to misplaced trust and limited usefulness in practice.

To enhance faithfulness, this thesis introduces the Faithful Chain-of-Thought (Faithful CoT) framework, which pairs stochastic translation, where the LLM translates a query into a symbolic reasoning chain, with deterministic symbolic problem-solving, where an external solver executes that chain to derive the answer. This integration not only ensures the faithfulness of the explanation, since the reasoning chain is what actually produces the answer, but also empirically improves performance on multiple reasoning tasks, including Math Word Problems, Planning, Multi-hop Question Answering, and Logical Inference.

In terms of utility, the thesis applies LLM-generated explanations at different stages of model development, such as in-context learning and fine-tuning. It demonstrates how explanations can be used to improve model calibration, strengthen self-correction, and increase robustness against spurious cues. Specifically, explanations serve as a valuable interface for post-hoc calibration and as training signals during fine-tuning that reduce reliance on irrelevant features in the data.

In summary, this thesis contributes to the ongoing quest for faithful and useful explanations generated by LLMs. We aim to bridge the gap between theoretical understanding and practical application of interpretability, ultimately fostering the development of more transparent, reliable, and beneficial AI systems.
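
To make the two-stage framework concrete, the sketch below (illustrative only, not the thesis's implementation) shows a Faithful CoT-style pipeline for a math word problem: an LLM translates the question into an executable Python program, and a deterministic interpreter then runs that program to produce the answer, so the generated reasoning chain is by construction the computation that yields the prediction. The call_llm helper and the prompt format are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any text-generation API call."""
    raise NotImplementedError("plug in an actual LLM client here")

TRANSLATE_PROMPT = """\
Translate the question into a Python program whose final line assigns
the result to a variable named `answer`. Return only code.

Question: {question}
Program:
"""

def faithful_cot(question: str):
    # Stage 1 (stochastic): the LLM writes the symbolic reasoning chain.
    program = call_llm(TRANSLATE_PROMPT.format(question=question))
    # Stage 2 (deterministic): a Python interpreter executes the chain,
    # so the explanation (the program) is exactly what produces the answer.
    scope: dict = {}
    exec(program, {}, scope)
    return scope["answer"]

# Example usage (requires a real call_llm implementation):
# faithful_cot("Ann buys 3 boxes of 12 pencils and gives away 7. How many remain?")

For other tasks in the thesis, the same pattern applies with different symbolic languages and deterministic solvers in place of Python and its interpreter (for example, a planning language paired with a planner for Planning tasks).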

Advisor
Callison-Burch, Chris
Apidianaki, Marianna
Date of degree
2024