Sequential Learning and Variable Length Markov Chains
Abstract
Sequential Learning is a framework created for statistical learning problems in which the sequence of states $(Y_t)$ is dependent; more specifically, in which the dependence structure can be represented as a first-order Markov chain. It works by first taking nonsequential probability estimates $P(Y_t | X_t)$ and then modifying them with the sequential model to produce $P(Y_t | X_{1:T})$. However, not all sequential models on a discrete state space admit such a representation, at least not easily. Our first task is therefore to extend Variable Length Markov Chains (VLMCs), which, despite their name, are not Markovian, for use in the sequential learning framework. This extension greatly broadens the scope of sequential learning, since VLMCs permit sequential learning with far fewer assumptions about the underlying dependence among states. After developing the VLMC extension, we give an overview of sequential learning in general and investigate the probability estimates it produces, both theoretically and with a simulation study assessing model performance as a function of the complexity of the underlying sequential model and the quality of the initial probability estimates.

Next, we apply VLMC sequential learning to the dataset and problem that originally inspired sequential learning: scoring sleep in mice from video data. We find that VLMCs perform at the same level as the previous best sequential method, tying and sometimes beating it, even though that method required many assumptions about the sequence of sleep states and a much more rigid model of sequential dependence.

Finally, we turn to the problem of modifying predictors when marginal class probabilities are known. This is motivated by the fact that in sequential learning problems, unlike in i.i.d. problems, the marginal class distribution can vary substantially from sample to sample. We provide a general method of marginal probability reweighting, show that it is equivalent to several extant methods used on similar problems, and prove that our method improves probability estimates under log loss. We conclude with simulations assessing the method as a function of loss type and classifier used.
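To make the combination step concrete, here is one standard way such a modification can be carried out under the first-order Markov assumption (a sketch of the usual scaled-likelihood decomposition, not necessarily the exact form used in the thesis): divide each nonsequential estimate by the marginal $P(Y_t)$ and chain with the transition probabilities,

$$P(Y_{1:T} | X_{1:T}) \;\propto\; \prod_{t=1}^{T} \frac{P(Y_t | X_t)}{P(Y_t)} \, P(Y_t | Y_{t-1}),$$

with $P(Y_1 | Y_0)$ read as the initial distribution $P(Y_1)$; the smoothed marginals $P(Y_t | X_{1:T})$ then follow by forward-backward recursions.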
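The key object in a VLMC is a context tree: the next-state distribution is read off the longest suffix of the observed history that appears in the tree, so the memory length varies with the context rather than being fixed. A minimal sketch in Python (the dictionary-based `contexts` structure and the toy sleep states are illustrative assumptions, not the thesis implementation):

```python
def vlmc_next_state_probs(history, contexts, fallback):
    """Return P(next state | history) from a fitted VLMC.

    `contexts` maps a context (tuple of past states, most recent last)
    to a dict of next-state probabilities; `fallback` is the marginal
    distribution used when no suffix of the history is in the tree.
    """
    # Try progressively shorter suffixes of the history, longest first,
    # so the deepest matching context in the tree determines the prediction.
    for k in range(len(history), 0, -1):
        suffix = tuple(history[-k:])
        if suffix in contexts:
            return contexts[suffix]
    return fallback

# Toy usage: contexts of different lengths give variable memory.
contexts = {
    ("wake",): {"wake": 0.8, "nrem": 0.2, "rem": 0.0},
    ("nrem", "nrem"): {"wake": 0.1, "nrem": 0.7, "rem": 0.2},
}
fallback = {"wake": 0.5, "nrem": 0.4, "rem": 0.1}
# Falls through to the length-two context ("nrem", "nrem").
print(vlmc_next_state_probs(["wake", "nrem", "nrem"], contexts, fallback))
```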
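For the final part, a common instance of marginal probability reweighting (offered as a sketch: the thesis presents a general method shown to be equivalent to several extant ones, and this particular prior-shift correction is our assumption about the flavor involved) rescales each class probability by the ratio of the known marginal to the marginal the classifier was trained under, then renormalizes:

```python
import numpy as np

def reweight_probs(probs, train_marginal, target_marginal):
    """Rescale class probabilities by target/train marginal ratios.

    probs: (n_samples, n_classes) classifier probability estimates.
    train_marginal: class frequencies the classifier was trained under.
    target_marginal: known marginal class distribution for this sample.
    """
    ratio = np.asarray(target_marginal) / np.asarray(train_marginal)
    adjusted = probs * ratio  # elementwise prior correction
    return adjusted / adjusted.sum(axis=1, keepdims=True)  # renormalize rows

# Toy usage: a classifier trained on balanced classes, applied to a
# sample whose marginal class distribution is known to be 80/20.
p = np.array([[0.6, 0.4], [0.3, 0.7]])
print(reweight_probs(p, train_marginal=[0.5, 0.5], target_marginal=[0.8, 0.2]))
```

This correction follows from Bayes' rule when the class-conditional distributions $P(X | Y)$ are assumed unchanged between training and the new sample.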