Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)

Graduate Group

Epidemiology & Biostatistics

First Advisor

Andrea B. Troxel


Researchers increasingly conduct survey studies and design medical studies to better understand the relationships among patients, physicians, their use of the health care system, and their decision-making processes in disease prevention and management. Longitudinal data are widely used to capture trends over time. Each subject is observed repeatedly as the study progresses, but a common problem is that repeated measurements are not fully observed because of nonresponse or loss to follow-up. An individual can move in and out of the observed data during a study, giving rise to a large class of distinct "non-monotone" missingness patterns. In such medical studies, sample sizes are often limited by restrictions on disease type, study design, and the availability of medical information. Small samples with large proportions of missing information make it difficult for researchers to understand the experience of the total population. Estimators based only on the observed data may be biased if, for example, patients who do not respond have worse outcomes, or patients who answer "unknown" are those without access to medical or non-medical information or care. Models that ignore this missing information can therefore produce biased results.
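To make the distinction between monotone and non-monotone missingness concrete, here is a minimal Python sketch. It assumes each subject's repeated measurements are stored as a list with `None` marking a missed visit. The function name `is_monotone` is hypothetical and is not part of the dissertation.

```python
from typing import Optional, Sequence


def is_monotone(pattern: Sequence[Optional[int]]) -> bool:
    """Return True if the missingness pattern is monotone (dropout only).

    A pattern is monotone when, once a measurement is missing (None),
    every later measurement is also missing. Hypothetical helper for
    illustration only.
    """
    seen_missing = False
    for value in pattern:
        if value is None:
            seen_missing = True
        elif seen_missing:
            # An observed value after a gap: the subject re-entered the study.
            return False
    return True


# Hypothetical example subjects (None = missed visit).
dropout = [2, 1, None, None]        # lost to follow-up: monotone
intermittent = [2, None, 1, None]   # moves in and out: non-monotone
print(is_monotone(dropout), is_monotone(intermittent))  # True False
```

A subject who drops out and never returns yields a monotone pattern. A subject who misses a visit and later responds again yields a non-monotone pattern. The second kind is what makes the number of distinct patterns grow quickly with the number of visits.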

A first-order Markov dependence structure is a natural way to model changes in a subject's state over time. In my first project, we developed a Markov transition model fitted with a full-likelihood-based algorithm that provides robust estimation while accounting for "non-ignorable" missingness, and applied it to data from the Penn Center of Excellence in Cancer Communication Research. In my second project, we extended the method to a pseudo-likelihood-based approach that considers only pairs of adjacent observations, substantially reducing the computational complexity of the full-likelihood-based method proposed in the first project. In my third project, we proposed a two-stage pseudo hidden Markov model to analyze the association between quality-of-life measurements and cancer treatments, using data from a randomized phase III trial (RTOG 9402) in brain cancer patients. By incorporating selection models and shared parameter models into a hidden Markov model, this approach provides targeted identification of treatment effects.
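The idea of working only with pairs of adjacent observations can be illustrated with a deliberately simplified Python sketch. It assumes integer-coded states, a stationary first-order Markov chain, and ignorable missingness. It therefore does not include the selection-model or shared-parameter components used to handle non-ignorable missingness in the projects described above. All function names are hypothetical. Each subject contributes a factor P(y_t | y_{t-1}) for every adjacent pair in which both visits are observed. Maximizing this pairwise log-likelihood gives the normalized transition counts.

```python
import math
from collections import Counter
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]


def adjacent_pairs(seq: Sequence[Optional[int]]) -> Iterator[Pair]:
    """Yield (previous, current) states for adjacent visits that are both observed."""
    for prev, curr in zip(seq, seq[1:]):
        if prev is not None and curr is not None:
            yield prev, curr


def estimate_transitions(
    sequences: List[Sequence[Optional[int]]], states: Sequence[int]
) -> Dict[Pair, float]:
    """MLE of P(current = b | previous = a) from observed adjacent pairs.

    Assumes a stationary first-order Markov chain and ignorable missingness;
    pairs broken by a missing visit are simply skipped.
    """
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(adjacent_pairs(seq))
    probs: Dict[Pair, float] = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        for b in states:
            # NaN flags a state never observed as a "previous" state.
            probs[(a, b)] = counts[(a, b)] / row_total if row_total else math.nan
    return probs


def pairwise_log_likelihood(
    sequences: List[Sequence[Optional[int]]], probs: Dict[Pair, float]
) -> float:
    """Sum of log transition probabilities over all observed adjacent pairs."""
    return sum(
        math.log(probs[pair]) for seq in sequences for pair in adjacent_pairs(seq)
    )


# Hypothetical binary-state data for three subjects (None = missed visit).
data = [[0, 0, 1, None, 1], [1, None, 0, 0], [0, 1, 1, 1]]
p = estimate_transitions(data, states=[0, 1])
print(p[(0, 1)], p[(1, 1)])  # 0.5 1.0
```

Pairs separated by a missed visit are skipped rather than modeled through two-step transitions. That is a simplification chosen here for brevity, not a feature of the dissertation's methods.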

Included in

Biostatistics Commons