Learning To Compositionally Reason Over Natural Language

Nitish Gupta, University of Pennsylvania


The human ability to understand the world in terms of reusable ``building blocks'' allows us to generalize in near-infinite ways. Developing language understanding systems that can compositionally reason in a similar manner is crucial to achieving human-like capabilities. Building such systems presents key challenges in both the architectural design of machine learning models and the learning paradigm used to train them. This dissertation addresses aspects of both challenges by exploring compositional structured models that can be trained using end-task supervision.

We believe that solving complex problems in a generalizable manner requires decomposition into sub-tasks, which in turn are solved using reasoning capabilities that can be reused in novel contexts. Motivated by this idea, we develop a neuro-symbolic model with a modular architecture for language understanding and focus on answering questions that require multi-step reasoning against natural language text. We design an inventory of freely-composable, learnable neural modules for performing various atomic language understanding and symbolic reasoning tasks in a differentiable manner. The question guides how these modules are dynamically composed to yield an end-to-end differentiable model that performs compositional reasoning and can be trained using end-task supervision. However, we show that under such supervision, a compositional model structure alone is not sufficient to induce the intended problem decomposition in terms of the modules; the lack of supervision for the sub-tasks leads to modules that do not freely compose in novel ways, hurting generalization. To address this, we develop a new training paradigm that leverages paired examples---instances that share sub-tasks---to provide a training signal beyond that provided by individual examples. We show that this paradigm induces the intended compositional reasoning and leads to improved in- and out-of-distribution generalization.
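To make the idea of question-guided module composition concrete, the following is a minimal, purely illustrative sketch: a hypothetical inventory of modules and a tiny executor that chains them according to a question-derived program. All module names, the toy passage, and the program are assumptions for illustration; the dissertation's actual modules are differentiable neural operations over text, not the discrete string operations shown here.

```python
# Hypothetical module inventory (illustration only; the real modules are
# learnable, differentiable neural operations, not string matching).

def find(passage, query):
    # Hypothetical "find" module: select spans mentioning the query.
    return [span for span in passage if query in span]

def filter_year(spans, year):
    # Hypothetical "filter" module: keep spans associated with a given year.
    return [span for span in spans if year in span]

def count(spans):
    # Hypothetical symbolic module: count the selected spans.
    return len(spans)

def execute(program, passage):
    """Compose modules dynamically according to a question-derived program."""
    result = passage
    for module, *args in program:
        result = module(result, *args) if args else module(result)
    return result

# Toy passage and a program for a question like
# "How many goals were scored in 1995?"
passage = ["goal:1995", "goal:1997", "penalty:1995"]
program = [(find, "goal"), (filter_year, "1995"), (count,)]
print(execute(program, passage))  # → 1
```

Because each module is a self-contained function, the same inventory can be recomposed for a novel question (e.g., counting penalties instead of goals) simply by changing the program, which is the compositional reuse the abstract describes.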