Leveraging Models to Improve Data Efficiency: Navigation, Reinforcement Learning, and Lie Group Convolutions
Abstract
Consider a system which takes data as an input, processes the data with a model, and outputs a decision for a particular objective. We refer to the amount of data required to complete the objective at a given level of performance as the data efficiency. Across many domains, it is advantageous to reduce the amount of data needed to achieve the same or a better level of performance. In this thesis, we exploit the model of the system in order to improve the data efficiency across three distinct domains of interest: robot navigation in ellipsoidal worlds, reinforcement learning, and Lie group convolutions.

First, consider the problem of navigating to the minimum of a convex potential when there are ellipsoidal obstacles along the way. In particular, we consider a system in which the agent estimates its location using local information about its environment, uses a model that describes the dynamics, and outputs a control decision. Artificial potentials strike a balance between local but sub-optimal bug algorithms and minimum-path algorithms that require global information. By combining repulsive obstacle potentials with the objective, they can be implemented locally by traversing the negative gradient. Navigation functions are a particular type of artificial potential with convergence guarantees; however, they require that the obstacles be sufficiently curved. In this thesis, the dynamics model is simplified using a second-order correction step. The proposed dynamics improve data efficiency by decreasing the amount of information that needs to be estimated, while extending the class of solvable environments to ellipsoidal obstacles of arbitrary eccentricity.

Next, consider the problem of reinforcement learning (RL), where an agent aims to maximize the reward it accumulates as it interacts with the environment through actions given by a policy. In this system, state-action-reward tuples are the input, and the reinforcement learning algorithm is the model, which outputs an optimal policy. Within the context of RL, two distinct algorithms are leveraged to establish and improve the finite sample complexity. The first finds the policy by decoupling policy search (actor) from dynamic programming (critic). Doing so allows the finite sample complexity of actor-critic with linear function approximators to be established. The decoupling also permits an understanding of whether the actor or the critic is the bottleneck, which is then leveraged to improve the sample complexity with different plug-in critic algorithms. The second considers the case of deterministic policies. By replacing the possibly non-compatible function approximators with zeroth-order gradient estimates, the sample complexity of deterministic policy gradient is established. The form of the model allows the iteration complexity to be improved through Monte Carlo variance reduction.

Finally, the problem of classification on datasets with inherent symmetries is considered. The inputs to the system are signals, such as images or point clouds, with a continuous symmetry that we want the model (a neural network) to exploit in order to make the correct prediction. Specifically, Lie group convolutions are recontextualized from an Algebraic Signal Processing (ASP) perspective. Doing so yields a generalization of conventional group convolutions that no longer requires both the filter and the signal to be defined as signals on the group. Stability theorems are established for the discrete, tractable version of the proposed convolution, and the results are corroborated on three datasets. Importantly, by leveraging the ASP perspective to develop a new filter, we are able to solve classification problems in higher dimensions that were previously intractable.
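To make the artificial-potential idea concrete, the sketch below descends the negative gradient of a Koditschek-Rimon-style potential around a single ellipsoidal obstacle using only local (finite-difference) evaluations. The goal, obstacle, tuning parameter k, and step rule are illustrative assumptions, not the second-order correction developed in the thesis.

```python
import numpy as np

# Illustrative goal and a single ellipsoidal obstacle (assumed values, not from the thesis).
goal = np.array([4.0, 0.0])
center = np.array([2.0, 0.3])
A = np.diag([1.0 / 0.5**2, 1.0 / 1.5**2])   # eccentric ellipse: (x - c)^T A (x - c) <= 1

def nav_potential(x, k=5.0):
    """Koditschek-Rimon-style potential: 0 at the goal, 1 on the obstacle boundary."""
    gamma = np.sum((x - goal) ** 2)                 # convex objective (squared distance)
    beta = (x - center) @ A @ (x - center) - 1.0    # positive in free space
    return gamma / (gamma**k + beta) ** (1.0 / k)

def grad(f, x, eps=1e-6):
    """Central-difference gradient: the agent only needs local potential evaluations."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([0.0, 0.0])
for _ in range(5000):
    g = grad(nav_potential, x)
    x = x - 0.02 * g / (np.linalg.norm(g) + 1e-9)   # unit-speed negative-gradient flow
    if np.linalg.norm(x - goal) < 0.05:
        break

# For highly eccentric obstacles a potential like this can stall at a spurious critical
# point; removing that curvature restriction is part of what the thesis addresses.
print("final position:", x)
```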
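The actor-critic decoupling can likewise be illustrated with a minimal sketch: a softmax actor updated by a policy-gradient step and a linear (one-hot feature) TD(0) critic supplying the error signal. The toy chain environment, step sizes, and feature choice are assumptions for illustration, not the specific algorithm or plug-in critics analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 5-state chain MDP (an illustrative assumption, not from the thesis):
# action 1 moves right, action 0 moves left; reaching the rightmost state
# pays reward 1 and resets the agent to the leftmost state.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.95

def step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    if s_next == N_STATES - 1:
        return 0, 1.0
    return s_next, 0.0

def features(s):
    """One-hot state features, so the critic is a linear function approximator."""
    phi = np.zeros(N_STATES)
    phi[s] = 1.0
    return phi

def policy(theta, s):
    """Softmax actor over per-state logits."""
    logits = theta[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

theta = np.zeros((N_STATES, N_ACTIONS))   # actor parameters
w = np.zeros(N_STATES)                    # critic weights, V(s) ~= w . phi(s)

s = 0
for _ in range(20000):
    probs = policy(theta, s)
    a = rng.choice(N_ACTIONS, p=probs)
    s_next, r = step(s, a)

    # Critic (dynamic programming side): TD(0) update of the linear value estimate.
    delta = r + GAMMA * w @ features(s_next) - w @ features(s)
    w += 0.05 * delta * features(s)

    # Actor (policy search side): policy-gradient step driven by the critic's TD error.
    grad_log = -probs
    grad_log[a] += 1.0
    theta[s] += 0.01 * delta * grad_log

    s = s_next

print("P(move right) per state:", [round(policy(theta, s)[1], 2) for s in range(N_STATES)])
```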
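For deterministic policies, the zeroth-order gradient estimates referred to above can be illustrated with a two-point random-perturbation estimator: perturb the policy parameters, difference the resulting returns, and ascend the estimate. The scalar linear-quadratic toy system, smoothing radius, and normalized step below are illustrative assumptions, not the estimator analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(theta, horizon=50):
    """Return of the deterministic linear policy u = theta * x on a toy scalar
    system x_{t+1} = x_t + u_t with quadratic cost (an assumed environment)."""
    x, total = 1.0, 0.0
    for _ in range(horizon):
        u = theta[0] * x
        total += -(x**2 + 0.1 * u**2)
        x = x + u
    return total

def zeroth_order_grad(theta, mu=0.05, num_dirs=8):
    """Two-point zeroth-order estimate of the gradient of the return."""
    g = np.zeros_like(theta)
    for _ in range(num_dirs):
        d = rng.standard_normal(theta.shape)
        g += (rollout_return(theta + mu * d) - rollout_return(theta - mu * d)) / (2 * mu) * d
    return g / num_dirs

theta = np.array([0.0])
for _ in range(300):
    g = zeroth_order_grad(theta)
    theta += 0.01 * g / (np.linalg.norm(g) + 1e-9)   # normalized ascent on the return

# For this toy problem the learned gain should settle near the LQR-optimal value (about -0.9).
print("learned gain:", theta)
```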
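For reference, the conventional group convolution that the ASP formulation generalizes takes both the filter ψ and the signal f to be functions on the group G itself (with Haar measure μ):

\[
(f \ast_G \psi)(g) \;=\; \int_G f(h)\,\psi\!\left(h^{-1}g\right)\,d\mu(h),
\qquad\text{and for a finite group,}\qquad
(f \ast_G \psi)(g) \;=\; \sum_{h \in G} f(h)\,\psi\!\left(h^{-1}g\right).
\]

Here both f and ψ must be group signals; the ASP-based construction in the thesis relaxes exactly this requirement.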