Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Electrical & Systems Engineering

First Advisor

Alejandro Ribeiro


Data and learning have become core components of the information processing and autonomous systems on which we increasingly rely to select job applicants, analyze medical data, and drive cars. As these systems become ubiquitous, so does the need to curtail their behavior. Left untethered, they are susceptible to tampering (adversarial examples) and prone to prejudiced and unsafe actions. Currently, the response of these systems is tailored by leveraging domain expert knowledge to either construct models that embed the desired properties or tune the training objective so as to promote them. While effective, these solutions are often targeted to specific behaviors, contexts, and sometimes even problem instances and are typically not transferable across models and applications. What is more, the growing scale and complexity of modern information processing and autonomous systems render this manual behavior tuning infeasible. Already today, explainability, interpretability, and transparency combined with human judgment are no longer enough to design systems that perform according to specifications.

The present thesis addresses these issues by leveraging constrained statistical optimization. More specifically, it develops the theoretical underpinnings of constrained learning and constrained inference to provide tools that enable solving statistical problems under requirements. Starting with the task of learning under requirements, it develops a generalization theory of constrained learning akin to the existing unconstrained one. By formalizing the concept of probably approximately correct constrained (PACC) learning, it shows that constrained learning is as hard as unconstrained learning and establishes the constrained counterpart of empirical risk minimization (ERM) as a PACC learner. To overcome challenges involved in solving such non-convex constrained optimization problems, it derives a dual learning rule that enables constrained learning tasks to be tackled through unconstrained learning problems only. It therefore concludes that if we can deal with classical, unconstrained learning tasks, then we can deal with learning tasks with requirements.
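The idea of tackling a constrained problem through a sequence of unconstrained ones can be illustrated with a toy example. The sketch below is not the thesis's dual learning rule; it is a minimal dual ascent loop on a hypothetical scalar problem chosen for illustration (minimize (theta - 3)^2 subject to theta^2 <= 1). The function names, step size, and iteration count are assumptions. The toy problem is convex, so it sidesteps the non-convexity that the thesis addresses.

```python
def solve_unconstrained(lam):
    """Minimize the Lagrangian (theta - 3)^2 + lam * (theta^2 - 1) over theta.

    For this toy problem the unconstrained minimizer has a closed form:
    setting the derivative 2*(theta - 3) + 2*lam*theta to zero gives
    theta = 3 / (1 + lam).
    """
    return 3.0 / (1.0 + lam)


def dual_ascent(step=0.5, iters=200):
    """Alternate between an unconstrained solve and a dual variable update."""
    lam = 0.0  # dual variable, kept non-negative
    for _ in range(iters):
        theta = solve_unconstrained(lam)    # unconstrained problem only
        slack = theta ** 2 - 1.0            # positive when the constraint is violated
        lam = max(0.0, lam + step * slack)  # projected gradient ascent on the dual
    return solve_unconstrained(lam), lam


theta, lam = dual_ascent()
print(theta, lam)
```

In this toy setting the iterates approach the constrained optimum theta = 1 with dual variable lam = 2: the dual variable grows while the requirement is violated and shrinks once it is met with room to spare, so each step only ever requires an unconstrained minimization.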

The second part of this thesis addresses the issue of constrained inference, in particular, inference using sparse nonlinear function models, combinatorial constraints with quadratic objectives, and risk constraints. Such models arise in nonlinear line spectrum estimation, functional data analysis, sensor selection, actuator scheduling, experimental design, and risk-aware estimation. Although inference problems assume that models and distributions are known, each of these constraints poses serious challenges that hinder its use in practice. Sparse nonlinear functional models lead to infinite dimensional, non-convex optimization programs that cannot be discretized without leading to combinatorial, often NP-hard, problems. Rather than using surrogates and relaxations, this work relies on duality to show that despite their apparent complexity, these models can be fit efficiently, i.e., in polynomial time. While quadratic objectives are typically tractable (often even in closed form), they lead to non-submodular optimization problems when subject to cardinality or matroid constraints. While submodular functions are sometimes used as surrogates, this work instead shows that quadratic functions are close to submodular and can also be optimized near-optimally. The last chapter of this thesis is dedicated to problems involving risk constraints, in particular, bounded predictive mean square error variance estimation. Despite being non-convex, such problems are equivalent to a quadratically constrained quadratic program from which a closed-form estimator can be extracted.

These results are used throughout this thesis to tackle problems in signal processing, machine learning, and control, such as fair learning, robust learning, nonlinear line spectrum estimation, actuator scheduling, experimental design, and risk-aware estimation. Yet, they are applicable well beyond these illustrations, for instance, to safe reinforcement learning, sensor selection, multiresolution kernel estimation, and wireless resource allocation, to name a few.
