Trustworthy Machine Learning: Specification, Verification, and Explanation

Degree type
Doctor of Philosophy (PhD)
Graduate group
Computer and Information Science
Discipline
Computer Sciences
Subject
Explainability
Large Language Models
Machine Learning
Robustness
Safety
Verification
Copyright date
2025
Author
Xue, Anton
Abstract

In the rapidly changing landscape of machine learning (ML), advanced models and systems are increasingly deployed in safety-critical domains, yet ensuring their reliability remains a persistent challenge. The central difficulty lies in their often complex, opaque, and black-box nature, which hinders the formal specification, rigorous assurance, and clear interpretation of their behavior. We address these challenges by tackling three critical problem areas in trustworthy machine learning: the specification, verification, and explanation of complex ML systems. We begin with the fundamental challenge of behavioral specification, focusing on precisely characterizing when a large language model (LLM) correctly follows user-specified instructions. To this end, we propose a logic-based framework for formalizing LLM rule adherence, and we demonstrate that real instances of rule adherence and violation, e.g., jailbreak attacks, empirically mirror our theoretical predictions, yielding a descriptive framework for understanding these behaviors. Next, we develop scalable verification techniques to formally guarantee the safety of ML models. Specifically, we address computational bottlenecks in expressive semidefinite programming (SDP)-based frameworks for neural network verification; by identifying and exploiting chordal sparsity patterns in the large constraint matrices, we obtain speedups of several orders of magnitude, making SDP-based methods more practical for modern networks. Finally, we address the need for reliable insights into complex model behavior through robust explanations. We develop Multiplicative Smoothing (MuS) to provide certified guarantees on the robustness of feature attribution methods, and we extend this approach with the Stability Certification Algorithm (SCA), which offers more flexible and practical guarantees. Together, these methods provide non-trivial, practical, and model-agnostic guarantees for common explanation techniques, enhancing their reliability. In summary, this work advances the specification, verification, and explanation of complex machine learning systems, and these contributions are important steps toward designing and building ML systems that are reliable and trustworthy.
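
To make the smoothing-based explanation guarantees concrete, the following is a minimal, illustrative sketch of masking-based multiplicative smoothing: a model's output is averaged over random multiplicative feature masks, and attributions computed on such a smoothed model are the kind of objects the certified stability guarantees concern. The function and parameter names (smoothed_predict, keep_prob, n_samples) are illustrative assumptions, not the dissertation's actual implementation or the exact MuS construction.

```python
import numpy as np

def smoothed_predict(model, x, keep_prob=0.8, n_samples=1000, seed=0):
    # Average the model's output over random multiplicative (keep/drop) feature masks.
    # Illustrative sketch only: the exact MuS construction and its certificates
    # are defined in the dissertation, not here.
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        mask = (rng.random(x.shape) < keep_prob).astype(x.dtype)  # keep each feature w.p. keep_prob
        outputs.append(model(x * mask))                           # multiplicative masking of the input
    return np.mean(outputs, axis=0)                               # Monte Carlo estimate of the smoothed output

# Toy usage with a hypothetical linear "model" on a 5-dimensional input.
if __name__ == "__main__":
    w = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
    model = lambda z: z @ w
    x = np.ones(5)
    print(smoothed_predict(model, x))  # approximately keep_prob * (w @ x)
```

Under assumptions like these, the smoothed output changes gradually as individual features are dropped, which is the sort of Lipschitz-style property that certified robustness guarantees for feature attributions can build on.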

Advisor
Alur, Rajeev
Wong, Eric
Date of degree
2025