Faithful and Useful Explanations
by Large Language Models

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Computer Sciences
Subject
Explainable AI
Explanation
Faithfulness
Large Language Models
Utility
Copyright date
2024
Author
Lyu, Qing
Abstract

The advent of large-scale pretrained Language Models (LMs) has ushered in a new era in Natural Language Processing (NLP). Trained on massive text corpora, these models have demonstrated remarkable performance across a wide array of tasks, from machine translation and text summarization to question answering and code generation. However, their impressive capabilities are accompanied by a lack of transparency in their decision-making processes. This opacity raises concerns about their reliability, fairness, and potential biases, hindering their adoption in high-stakes applications. The effort to interpret LMs, often referred to as "explainable AI" or "interpretable NLP", has thus emerged as a critical research area. Existing work has proposed a wide range of model explanation methods; in particular, the idea of using Large Language Models (LLMs) to generate explanations of their own decision-making process appears promising.

This thesis contends that two key aspects of LLM-generated explanations have been under-emphasized: faithfulness and utility. Faithfulness is the degree to which an explanation accurately reflects the model's reasoning process behind its prediction, while utility is its practical value in real-world applications. Current LLM-generated explanations often appear convincing yet can misrepresent the underlying reasoning, leading to misplaced trust and limited usefulness in practice.

To enhance faithfulness, this thesis introduces the Faithful Chain-of-Thought (Faithful CoT) framework, which pairs stochastic translation, where the LLM translates a query into a symbolic reasoning chain, with deterministic symbolic problem-solving, where an external solver executes that chain to derive the answer. This integration not only ensures the faithfulness of the explanation, since the reasoning chain is what actually produces the answer, but also empirically improves performance on multiple reasoning tasks, including Math Word Problems, Planning, Multi-hop Question Answering, and Logical Inference.

In terms of utility, the thesis applies LLM-generated explanations at different stages of model development, such as in-context learning and fine-tuning. It demonstrates how explanations can be used to improve model calibration, strengthen self-correction, and increase robustness against spurious cues. Specifically, explanations serve as a valuable interface for post-hoc calibration and as training signals during fine-tuning that reduce reliance on irrelevant features in the data.

In summary, this thesis contributes to the ongoing quest for faithful and useful explanations generated by LLMs. We aim to bridge the gap between theoretical understanding and practical application of interpretability, ultimately fostering the development of more transparent, reliable, and beneficial AI systems.
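
To make the two-stage framework concrete, the sketch below (illustrative only, not the thesis's implementation) shows a Faithful CoT-style pipeline for a math word problem: an LLM translates the question into an executable Python program, and a deterministic interpreter then runs that program to produce the answer, so the generated reasoning chain is by construction the computation that yields the prediction. The call_llm helper and the prompt format are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any text-generation API call."""
    raise NotImplementedError("plug in an actual LLM client here")

TRANSLATE_PROMPT = """\
Translate the question into a Python program whose final line assigns
the result to a variable named `answer`. Return only code.

Question: {question}
Program:
"""

def faithful_cot(question: str):
    # Stage 1 (stochastic): the LLM writes the symbolic reasoning chain.
    program = call_llm(TRANSLATE_PROMPT.format(question=question))
    # Stage 2 (deterministic): a Python interpreter executes the chain,
    # so the explanation (the program) is exactly what produces the answer.
    scope: dict = {}
    exec(program, {}, scope)
    return scope["answer"]

# Example usage (requires a real call_llm implementation):
# faithful_cot("Ann buys 3 boxes of 12 pencils and gives away 7. How many remain?")

For other tasks in the thesis, the same pattern applies with different symbolic languages and deterministic solvers in place of Python and its interpreter (for example, a planning language paired with a planner for Planning tasks).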

Advisor
Callison-Burch, Chris
Apidianaki, Marianna
Date of degree
2024