Deep Lifelong Learning with Factorized Knowledge Transfer
Subject: Lifelong Machine Learning, Machine Learning
Abstract
Human intelligence can capture abstract concepts from experience and apply that learned knowledge to adapt to new situations. Lifelong machine learning aims to reproduce these properties by designing algorithms that learn from a sequence of tasks, extract useful knowledge from previous tasks, and reuse that knowledge to learn future tasks. Research into lifelong learning has explored various methodologies, including techniques for sharing knowledge across tasks, for retaining previously acquired skills, and for actively selecting the next task to learn. This dissertation focuses on one theme of lifelong learning: transferring knowledge across tasks via tensor factorization, which decomposes the architecture of a neural network in a way that naturally encodes conceptual knowledge and can discover abstract yet generalizable knowledge from experience. The dissertation investigates methods to factorize the knowledge encoded in neural networks and share it across multiple tasks, as well as methods to improve the training of these factorized knowledge transfer mechanisms. It begins by developing a lifelong learning architecture that uses deconvolutional operations to preserve the multi-axis structure of data. This deconvolution-based factorized architecture empirically reduces harmful interference between tasks by sharing abstract knowledge through the factorization. The dissertation then studies how transferring knowledge at the proper level of the network is critical to the success of lifelong learning. To this end, an expectation-maximization-style algorithm is developed to discover the useful granularity of knowledge to share for each task, depending on the given data. The algorithm determines which layers to share while learning tasks in parallel, reducing the human intervention needed to select a knowledge transfer architecture, which is critical in realistic scenarios with complex task relationships. Moreover, it applies to diverse lifelong learning architectures, augmenting existing lifelong learning methods. Lastly, the dissertation investigates the use of data programming to extend existing lifelong learning algorithms to semi-supervised settings, addressing the data annotation challenge in lifelong learning. Owing to its modular design and theoretical guarantees on the quality of the generated labels, this framework can be applied to existing supervised lifelong learning algorithms.
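As a concrete, minimal sketch (not the dissertation's actual implementation), the factorized knowledge transfer idea can be illustrated as follows: each task's layer weights are reconstructed from a shared latent basis L and a task-specific coefficient vector, so tasks transfer knowledge by reusing the shared factors rather than copying whole networks. All names, shapes, and the linear-factorization choice here are illustrative assumptions; the dissertation's deconvolution-based architecture factorizes convolutional tensors instead.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 8, 4   # layer dimensions (assumed for illustration)
k = 3                # number of shared latent factors
num_tasks = 5

# Shared knowledge base: k basis "weight matrices", flattened into columns.
L = rng.normal(size=(d_in * d_out, k))

# Task-specific coefficients: how each task mixes the shared factors.
S = rng.normal(size=(k, num_tasks))

def task_weights(t):
    """Reconstruct task t's layer weights from the shared factorization."""
    return (L @ S[:, t]).reshape(d_in, d_out)

def forward(x, t):
    """Forward pass of one factorized layer for task t."""
    return np.tanh(x @ task_weights(t))

x = rng.normal(size=(2, d_in))   # a toy batch
print(forward(x, t=0).shape)     # (2, d_out)
```

Under a factorization of this kind, learning a new task primarily fits that task's own coefficients (with at most small refinements to the shared basis), which is what limits harmful interference between tasks while still allowing abstract knowledge to transfer.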