Park, Sangdon

Publications (4)
  • Publication
    Detecting OODs as datapoints with High Uncertainty
    (2021-07-23) Kaur, Ramneet; Park, Sangdon; Sokolsky, Oleg; Jha, Susmit; Lee, Insup; Roy, Anirban
    Deep neural networks (DNNs) are known to produce incorrect predictions with very high confidence on out-of-distribution inputs (OODs). This limitation is one of the key challenges in the adoption of DNNs in high-assurance systems such as autonomous driving, air traffic management, and medical diagnosis. This challenge has received significant attention recently, and several techniques have been developed to detect inputs where the model’s prediction cannot be trusted. These techniques detect OODs as datapoints with either high epistemic uncertainty or high aleatoric uncertainty. We demonstrate the difference in the detection ability of these techniques and propose an ensemble approach for detection of OODs as datapoints with high uncertainty (epistemic or aleatoric). We perform experiments on vision datasets with multiple DNN architectures, achieving state-of-the-art results in most cases.
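    The abstract above does not spell out which detectors the ensemble combines, so the following is only a minimal sketch of the general idea: flag an input when either uncertainty proxy is high. It assumes a PyTorch classifier `model` with dropout layers, uses MC-dropout variance as a stand-in for epistemic uncertainty and predictive entropy as a stand-in for aleatoric uncertainty, and the thresholds are hypothetical values that would be tuned on held-out in-distribution data.

    ```python
    import torch
    import torch.nn.functional as F

    def uncertainty_scores(model, x, n_samples=20):
        """Illustrative proxies, not the paper's exact detectors.

        Epistemic proxy: variance of softmax outputs under MC dropout.
        Aleatoric proxy: entropy of the mean predictive distribution.
        """
        model.train()  # keep dropout active so repeated passes differ
        with torch.no_grad():
            probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_samples)])
        mean_probs = probs.mean(dim=0)                      # (batch, classes)
        epistemic = probs.var(dim=0).sum(dim=-1)            # spread across MC samples
        aleatoric = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
        return epistemic, aleatoric

    def is_ood(model, x, eps_thresh=0.05, alea_thresh=1.5):
        # Ensemble rule: flag OOD if EITHER uncertainty is high
        # (thresholds are hypothetical and dataset-dependent).
        epistemic, aleatoric = uncertainty_scores(model, x)
        return (epistemic > eps_thresh) | (aleatoric > alea_thresh)
    ```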
  • Publication
    Calibrated Prediction with Covariate Shift via Unsupervised Domain Adaptation
    (2020-03-01) Park, Sangdon; Bastani, Osbert; Weimer, James; Lee, Insup
    Reliable uncertainty estimates are an important tool for helping autonomous agents or human decision makers understand and leverage predictive models. However, existing approaches to estimating uncertainty largely ignore the possibility of covariate shift—i.e., where the real-world data distribution may differ from the training distribution. As a consequence, existing algorithms can overestimate certainty, possibly yielding a false sense of confidence in the predictive model. We propose an algorithm for calibrating predictions that accounts for the possibility of covariate shift, given labeled examples from the training distribution and unlabeled examples from the real-world distribution. Our algorithm uses importance weighting to correct for the shift from the training to the real-world distribution. However, importance weighting relies on the training and real-world distributions to be sufficiently close. Building on ideas from domain adaptation, we additionally learn a feature map that tries to equalize these two distributions. In an empirical evaluation, we show that our proposed approach outperforms existing approaches to calibrated prediction when there is covariate shift.
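    As a minimal sketch of the importance-weighting step described above (not the paper's exact procedure), the code below estimates the density ratio w(x) = p_target(x) / p_source(x) with a source-vs-target domain classifier and then fits a single temperature by minimizing an importance-weighted negative log-likelihood on labeled source data. The function names, the logistic-regression ratio estimator, and the temperature-scaling parameterization are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from scipy.optimize import minimize_scalar

    def importance_weights(feats_src, feats_tgt):
        """Estimate w(x) = p_target(x) / p_source(x) with a domain classifier.

        A probabilistic classifier separating source (0) from target (1) gives
        w(x) ~ P(target | x) / P(source | x), rescaled by the sample-size ratio.
        """
        X = np.vstack([feats_src, feats_tgt])
        d = np.concatenate([np.zeros(len(feats_src)), np.ones(len(feats_tgt))])
        clf = LogisticRegression(max_iter=1000).fit(X, d)
        p = np.clip(clf.predict_proba(feats_src)[:, 1], 1e-6, 1 - 1e-6)
        return (p / (1 - p)) * (len(feats_src) / len(feats_tgt))

    def weighted_temperature(logits_src, labels_src, weights):
        """Fit one temperature T by minimizing importance-weighted NLL."""
        def nll(T):
            z = logits_src / T
            z = z - z.max(axis=1, keepdims=True)   # numerical stability
            logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
            return -np.mean(weights * logp[np.arange(len(labels_src)), labels_src])
        return minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded").x
    ```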
  • Publication
    PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction
    (2020-02-01) Park, Sangdon; Bastani, Osbert; Matni, Nikolai; Lee, Insup
    We propose an algorithm combining calibrated prediction and generalization bounds from learning theory to construct confidence sets for deep neural networks with PAC guarantees, i.e., the confidence set for a given input contains the true label with high probability. We demonstrate how our approach can be used to construct PAC confidence sets on ResNet for ImageNet, a visual object tracking model, and a dynamics model for the half-cheetah reinforcement learning problem.
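    The construction below is only a rough illustration of a common PAC-style recipe, not the paper's exact algorithm: choose a probability threshold on held-out calibration data using a binomial (Clopper–Pearson) bound so that, with confidence 1 - delta, the resulting confidence sets miss the true label with probability at most eps. All names and the threshold grid are assumptions of this sketch.

    ```python
    import numpy as np
    from scipy.stats import beta

    def cp_upper(k, n, delta):
        """Clopper–Pearson upper confidence bound on a binomial proportion."""
        return 1.0 if k >= n else beta.ppf(1 - delta, k + 1, n - k)

    def pac_threshold(cal_probs, cal_labels, eps=0.05, delta=0.01):
        """Largest tau whose miscoverage bound stays below eps on calibration data.

        Confidence set: C(x) = {y : p_y(x) >= tau}; an 'error' on a calibration
        point means its true label falls outside C(x).
        """
        n = len(cal_labels)
        true_label_probs = cal_probs[np.arange(n), cal_labels]
        best = 0.0
        for tau in np.linspace(0.0, 1.0, 201):
            errors = int((true_label_probs < tau).sum())
            if cp_upper(errors, n, delta) <= eps:
                best = tau  # larger tau -> smaller sets, bound still holds
        return best

    def confidence_set(probs, tau):
        return np.where(probs >= tau)[0]  # label indices kept in the set
    ```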
  • Publication
    Resilient Linear Classification: An Approach to Deal with Attacks on Training Data
    (2017-04-01) Park, Sangdon; Weimer, James; Lee, Insup
    Data-driven techniques are used in cyber-physical systems (CPS) for controlling autonomous vehicles, handling demand responses for energy management, and modeling human physiology for medical devices. These data-driven techniques extract models from training data, where their performance is often analyzed with respect to random errors in the training data. However, if the training data is maliciously altered by attackers, the effect of these attacks on the learning algorithms underpinning data-driven CPS has yet to be considered. In this paper, we analyze the resilience of classification algorithms to training data attacks. Specifically, a generic metric is proposed that is tailored to measure resilience of classification algorithms with respect to worst-case tampering of the training data. Using the metric, we show that traditional linear classification algorithms are resilient under restricted conditions. To overcome these limitations, we propose a linear classification algorithm with a majority constraint and prove that it is strictly more resilient than the traditional algorithms. Evaluations on both synthetic data and a real-world retrospective arrhythmia medical case study show that the traditional algorithms are vulnerable to tampered training data, whereas the proposed algorithm is more resilient (as measured by worst-case tampering).
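    The abstract does not specify the resilience metric or the majority-constrained algorithm in detail, so the sketch below is only a hypothetical probe of the underlying idea: flip the labels of an adversarially chosen fraction of training points (here via a simple nearest-to-boundary heuristic rather than true worst-case tampering) and measure how much a standard linear classifier's test accuracy degrades.

    ```python
    import numpy as np
    from sklearn.svm import LinearSVC

    def tampering_degradation(X_tr, y_tr, X_te, y_te, frac=0.1):
        """Illustrative resilience probe, not the paper's metric.

        Compares test accuracy before and after flipping the labels of the
        `frac` training points nearest the clean decision boundary (a
        heuristic stand-in for worst-case tampering). Assumes 0/1 labels.
        """
        clean = LinearSVC().fit(X_tr, y_tr)
        acc_clean = clean.score(X_te, y_te)

        # Flip the labels the clean model is least certain about.
        margins = np.abs(clean.decision_function(X_tr))
        k = max(1, int(frac * len(y_tr)))
        idx = np.argsort(margins)[:k]
        y_bad = y_tr.copy()
        y_bad[idx] = 1 - y_bad[idx]

        tampered = LinearSVC().fit(X_tr, y_bad)
        return acc_clean, tampered.score(X_te, y_te)
    ```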