Machine Learning in Function Spaces
Operator learning is an emerging area of machine learning that aims to learn mappings (operators) between functions from data. Many physical systems can be formulated mathematically as relationships between functional data, so operator learning has the potential to be a transformative tool in applications such as fluid dynamics, solid mechanics, and climate science. While many classical and successful machine learning approaches to regression focus on data in finite-dimensional vector spaces, applying these methods directly to discretizations of functional data can be limiting. This motivates resolution-invariant methods that learn on the function spaces of the data themselves rather than on their discretizations. In this thesis we contribute to this line of work with a new operator learning method, LOCA (Learning Operators with Coupled Attention). Inspired by the success of the attention mechanism, LOCA learns operators between function spaces by averaging features of the input function with a probability distribution that depends on the location of the output function query. These distributions are then coupled to each other across the query domain through a kernel integral transformation, allowing the model to learn correlations in the output functions. As a further benefit, this coupling makes the model particularly data efficient with respect to the number of output function measurements available for training. The construction of LOCA is accompanied by proofs of universality, demonstrating that it is expressive enough to approximate any continuous operator, and by demonstrations of state-of-the-art performance on several operator learning benchmarks. Next, we discuss some fundamental limitations of operator architectures with linear decoders, such as LOCA, and connect them to known concepts in approximation theory. This leads to a simple lower bound on the approximation error of such architectures and sheds light on when they become less effective, for example when modelling advection-dominated phenomena. Finally, we propose nonlinear decoders in operator learning architectures as a necessary modification to avoid the performance limit imposed by this lower bound. This modification is shown to significantly improve the performance of operator learning architectures while requiring fewer model parameters. Taken together, these results make a notable contribution to the field of operator learning and point to several interesting directions for future research.
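To make the coupled attention idea above more concrete, here is a minimal NumPy sketch under several stated assumptions. It is a simplified illustration, not LOCA's actual construction: the input-function features and the per-query scores are random arrays standing in for the learned encoder and score networks, and the "kernel integral transformation" is approximated by a row-normalized Gaussian kernel summed over a discrete grid of query locations. The function name `coupled_attention` and the `length_scale` parameter are hypothetical. Each query location receives a probability distribution over the input features, and the output is the corresponding weighted average of those features.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax: subtract the row maximum before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def coupled_attention(features, scores, queries, length_scale=0.1):
    """Hypothetical, simplified coupled attention (not the thesis's exact model).

    features: (n, d) array of input-function features v_1..v_n (assumed given).
    scores:   (m, n) raw scores g(y_j) at each of the m query locations (assumed given).
    queries:  (m,) query locations y_j on a 1D grid.
    Returns the (m, d) outputs and the (m, n) attention probabilities.
    """
    # Gaussian kernel between every pair of query locations (an assumed kernel choice).
    diff = queries[:, None] - queries[None, :]
    k = np.exp(-0.5 * (diff / length_scale) ** 2)
    # Row-normalize so each coupled score is a weighted average of nearby scores;
    # this discrete sum stands in for the kernel integral over the query domain.
    k = k / k.sum(axis=1, keepdims=True)
    coupled_scores = k @ scores                  # (m, n)
    # Each row becomes a probability distribution over the n input features.
    probs = softmax(coupled_scores, axis=1)      # (m, n)
    # Output at each query is the probability-weighted average of the features.
    return probs @ features, probs               # (m, d), (m, n)


# Usage with random stand-ins for the learned networks (illustrative only).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 3))               # n = 8 features of dimension d = 3
queries = np.linspace(0.0, 1.0, 50)              # m = 50 query locations
scores = rng.normal(size=(50, 8))
out, probs = coupled_attention(features, scores, queries)
print(out.shape)                                 # (50, 3)
```

Because every output is a convex combination of the feature vectors, each coordinate of `out` stays within the range of the corresponding feature column. The kernel smoothing makes neighbouring queries share similar attention distributions, which is one simple way to see how coupling can encode correlations across the output domain.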
Preciado, Victor, M