## Publicly Accessible Penn Dissertations

2018

Dissertation

#### Degree Name

Doctor of Philosophy (PhD)

Electrical & Systems Engineering

Alejandro Ribeiro

#### Abstract

The goal of this thesis is to develop a mathematical framework for autonomous behavior. We begin by describing a minimal notion of autonomy, understood as the ability of an agent operating in a complex space to satisfy, in the long run, a set of constraints imposed by an environment about which the agent has no a priori information. In particular, we are interested in endowing agents with greedy algorithms to solve problems of this form. Although autonomous behavior will require logical reasoning, the goal is to understand the most complex autonomous behavior that can be achieved through greedy algorithms. Extending the class of problems that these simple algorithms can solve frees the system's logic to focus on high-level reasoning and planning.

The second and third chapters of this thesis focus on the problem of designing gradient controllers that allow an agent to navigate towards the minimum of a convex potential in punctured spaces. This problem is related to that of satisfying constraints, since we can interpret each constraint as a separate potential that needs to be minimized. We first solve this problem in the case where the information about the potential and the obstacles is deterministic and complete; later, in Chapter \ref{chap_stochnf}, we consider the case where this information is only available from a stochastic model. In both cases, we derive sufficient conditions under which a Rimon-Koditschek artificial potential can be tuned into a navigation function and hence solves the problem. These conditions relate the geometry of the potential of interest to the geometry of the obstacles.
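As a point of reference, one common form of a Rimon-Koditschek artificial potential and its gradient controller can be written as below. The notation is an assumption made for illustration and is not taken from the abstract: $f_0$ denotes the convex potential (taken to be nonnegative with minimum value zero), each $\beta_i$ is positive in the free space and vanishes on the boundary of obstacle $i$ (with $\beta_0$ for the outer boundary of the workspace), and $k > 0$ is the tuning parameter.

$$
\varphi_k(x) = \frac{f_0(x)}{\left(f_0(x)^k + \beta(x)\right)^{1/k}}, \qquad \beta(x) = \prod_{i=0}^{m} \beta_i(x), \qquad \dot{x} = -\nabla \varphi_k(x).
$$

In this form, tuning the potential into a navigation function amounts to choosing $k$ large enough, which is where geometric conditions on $f_0$ and the obstacles would enter.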

Chapter \ref{chap_feasibility} considers the problem of satisfying a set of constraints when their temporal evolution is arbitrary. We show that an online version of a saddle point controller generates trajectories whose fit and regret are bounded by sublinear functions. These metrics are associated with online operation and are analogous to feasibility and optimality in classic deterministic optimization. The fact that these quantities are bounded by sublinear functions suggests that the trajectories approach the optimal solution. Saddle points have the advantage of providing intuition about the relative hardness of satisfying each constraint. The limit values of the multipliers measure this relative difficulty: the larger the multiplier, the larger the cost incurred when we try to tighten the corresponding constraint. In Chapter \ref{chap_counterfactuals} we exploit this property and modify the saddle point controller to deal with situations in which the problems of interest are not feasible. The modification of the algorithm allows us to identify which constraints are harder to satisfy. This information can later be used by a high-level logical reasoning layer to modify the problem of interest and make it feasible.
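The saddle point idea can be illustrated with a minimal Python sketch. It is not the controller analyzed in the thesis: it is a discrete-time, time-invariant toy with a scalar state and a single constraint, and the function name `saddle_point_controller`, the step size, and the example problem (minimize $(x-2)^2$ subject to $x - 1 \le 0$) are all assumptions chosen for illustration.

```python
def saddle_point_controller(grad_f, g, grad_g, x0=0.0, lam0=0.0,
                            step=0.05, iters=2000):
    """Primal-dual gradient iteration for: min f(x) subject to g(x) <= 0.

    Hypothetical toy version: scalar state, one constraint, fixed step.
    """
    x, lam = x0, lam0
    fit = 0.0  # accumulated constraint value, a discrete analogue of fit
    for _ in range(iters):
        gx = g(x)
        # Gradient in x of the Lagrangian L(x, lam) = f(x) + lam * g(x).
        grad_x = grad_f(x) + lam * grad_g(x)
        fit += step * gx
        # Descend in the primal variable, ascend in the multiplier,
        # and project the multiplier onto the nonnegative reals.
        x_next = x - step * grad_x
        lam = max(0.0, lam + step * gx)
        x = x_next
    return x, lam, fit


# Toy problem (an assumption): f(x) = (x - 2)^2, g(x) = x - 1.
x, lam, fit = saddle_point_controller(
    grad_f=lambda x: 2.0 * (x - 2.0),
    g=lambda x: x - 1.0,
    grad_g=lambda x: 1.0,
)
print(x, lam, fit)
```

For this toy problem the iterates settle near the constrained minimizer $x = 1$ with multiplier close to $2$, the sensitivity of the optimal cost to tightening the constraint. Because the projected multiplier update can only add to the accumulated constraint value, the accumulated fit never exceeds the multiplier, which gives a rough sense of why bounded multipliers go together with bounded fit.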

Before the concluding remarks and future work, we devote our attention to the problem of non-myopic agents. In Chapter \ref{chap_rl} we consider the setting of reinforcement learning, where the objective is to maximize the expected cumulative reward that the agent gathers, i.e., the $Q$-function. We model the policy of the agent as a function in a Reproducing Kernel Hilbert Space, since this class of functions is quite rich and allows us to compute policy gradients in a simple way. We present an unbiased estimator of the policy gradient that can be constructed in finite time, and we establish convergence of stochastic policy gradient ascent to a function that is a critical point of the $Q$-function.
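To see why a Reproducing Kernel Hilbert Space makes policy gradients simple, consider the following sketch. The Gaussian policy, the scalar action, the fixed variance $\sigma^2$, and the kernel symbol $\kappa$ are assumptions made for illustration; the abstract does not specify the policy class. If actions are drawn around a mean $h(s)$ with $h$ in the space, the reproducing property $h(s) = \langle h, \kappa(s, \cdot) \rangle$ gives the score function in closed form:

$$
\pi_h(a \mid s) = \mathcal{N}\!\left(h(s), \sigma^2\right), \qquad \nabla_h \log \pi_h(a \mid s) = \frac{a - h(s)}{\sigma^2}\, \kappa(s, \cdot).
$$

Under these assumptions, each stochastic gradient step adds a kernel centered at a visited state to the representation of $h$.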
