Trustworthy Machine Learning: Specification, Verification, and Explanation
Subject
Large Language Models
Machine Learning
Robustness
Safety
Verification
Abstract
In the rapidly changing landscape of machine learning (ML), advanced models and systems are increasingly deployed in safety-critical domains. Nonetheless, ensuring their reliability remains a persistent challenge. The central difficulty lies in their often complex, opaque, and black-box nature, which hinders the formal specification, rigorous assurance, and clear interpretation of their behavior. We directly address these challenges by tackling three critical problem areas in trustworthy machine learning: the specification, verification, and explanation of complex ML systems. We begin with the fundamental challenge of behavioral specification, focusing on precisely characterizing when a large language model (LLM) correctly follows user-specified instructions. To this end, we propose a logic-based framework for formalizing LLM rule adherence. We demonstrate that real instances of rule adherence and violation, e.g., jailbreak attacks, empirically mirror our theoretical predictions, yielding a descriptive framework for understanding these behaviors. Next, we develop scalable verification techniques to formally guarantee the safety of ML models. Specifically, we address computational bottlenecks in expressive semidefinite programming (SDP)-based frameworks for neural network verification. By identifying and exploiting chordal sparsity patterns in the large semidefinite constraints, we obtain speedups of several orders of magnitude, making SDP-based methods more practical for modern networks. Finally, we address the pressing need for reliable insights into complex model behavior through robust explanations. We develop Multiplicative Smoothing (MuS) to provide certified guarantees on the robustness of feature attribution methods. Building on this, we extend our approach with the Stability Certification Algorithm (SCA), which offers more flexible and practical guarantees. Together, these methods provide non-trivial, practical, and model-agnostic guarantees for common explanation techniques, enhancing their reliability. In summary, this work advances the specification, verification, and explanation of complex machine learning systems. These contributions are important steps towards designing and building ML systems that are reliable and trustworthy.
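To make the multiplicative-smoothing idea mentioned above concrete, the following is a minimal sketch, assuming a NumPy-style model that maps a feature vector to a prediction: it averages the model's output over random keep/drop feature masks, the core masking operation behind MuS-style smoothing. The function names, parameters, and toy model are illustrative assumptions, not the dissertation's implementation.

import numpy as np

def smoothed_prediction(model, x, keep_prob=0.75, n_samples=1000, seed=0):
    # Average model(x * mask) over random binary masks that keep each
    # feature independently with probability keep_prob (multiplicative
    # masking). Hypothetical sketch only, not the dissertation's code.
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        mask = (rng.random(x.shape) < keep_prob).astype(x.dtype)
        outputs.append(model(x * mask))
    return np.mean(outputs, axis=0)

if __name__ == "__main__":
    # Toy linear "model" on a 5-dimensional input; the smoothed output
    # should be close to keep_prob * (w @ x) = 0.75 * 3.0.
    w = np.array([0.5, -1.0, 2.0, 0.0, 1.5])
    model = lambda z: w @ z
    x = np.ones(5)
    print(smoothed_prediction(model, x))

Averaging over such masks dampens the effect of dropping any individual feature, which is the intuition behind the certified stability guarantees for feature attributions described in the abstract.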
Advisor
Wong, Eric