Statistical Methods for Non-Ignorable Missing Data With Applications to Quality-of-Life Data.

Liao, Kaijun

Files

Liao_upenngdas_0175C_10430.pdf (693.79 KB)

Degree type

Doctor of Philosophy (PhD)

Graduate group

Epidemiology & Biostatistics

Subject

Clinical trial / Cancer applications
Composite likelihood / pseudo-likelihood method
Conjoint analysis
Hidden Markov model
Longitudinal and multivariate data
Non-ignorable missing data
Biostatistics

Copyright date

2014-08-20T00:00:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/32423

Author

Liao, Kaijun

Abstract

Researchers increasingly rely on survey studies and design medical studies to better understand the relationships among patients, physicians, health care system utilization, and decision-making processes in disease prevention and management. Longitudinal data are widely used to capture trends over time. Each subject is observed as time progresses, but a common problem is that repeated measurements are not fully observed because of missing responses or loss to follow-up. An individual can move in and out of the observed data set during a study, giving rise to a large class of distinct "non-monotone" missingness patterns. In such medical studies, sample sizes are often limited by restrictions on disease type, study design, and the availability of medical information. Small samples with large proportions of missing information make it difficult for researchers to understand the experience of the total population. The collected data may produce biased estimators if, for example, the patients who do not respond have worse outcomes, or the patients who answer "unknown" are those without access to medical or non-medical information or care. Models that ignore this missing information may therefore yield biased results. A first-order Markov dependence structure is a natural way to model how responses tend to change over time. In my first project, we developed a Markov transition model that uses a full-likelihood-based algorithm to provide robust estimation while accounting for "non-ignorable" missingness, and applied it to data from the Penn Center of Excellence in Cancer Communication Research. In my second project, we extended the method to a pseudo-likelihood-based approach that considers only pairs of adjacent observations, which significantly eases the computational complexity of the full-likelihood-based method proposed in the first project.
In my third project, we proposed a two-stage pseudo hidden Markov model to analyze the association between quality-of-life measurements and cancer treatments in a randomized phase III trial (RTOG 9402) in brain cancer patients. By incorporating selection models and shared parameter models into a hidden Markov model, this approach provides targeted identification of treatment effects.
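
As a rough illustration of the adjacent-pair idea mentioned in the second project, the sketch below estimates first-order transition probabilities from longitudinal sequences that contain non-monotone gaps, using only pairs of consecutive visits where both responses are observed. It is not the dissertation's estimator: it ignores the missingness mechanism entirely, so it would only be appropriate if the missing responses were unrelated to the outcomes. The function name `pairwise_transition_estimate`, the integer state coding, and the use of `None` for a missing response are all assumptions made for this example.

```python
def pairwise_transition_estimate(sequences, n_states):
    """Estimate first-order Markov transition probabilities from adjacent pairs.

    sequences: list of lists; each entry is an integer state in
               range(n_states), or None when the response is missing.
    Returns an n_states x n_states list of row-normalised probabilities.
    A row with no observed transitions is returned as None.

    Assumption (hypothetical, for illustration only): missingness is
    ignored, which is NOT valid under non-ignorable missingness.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Walk over consecutive visits (t-1, t) within one subject.
        for prev, curr in zip(seq, seq[1:]):
            if prev is None or curr is None:
                continue  # skip pairs with a missing member
            counts[prev][curr] += 1

    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total for c in row] if total else None)
    return probs


# Three subjects, two states (0 and 1), with intermittent missing visits.
example = [
    [0, 0, None, 1],
    [1, None, 0, 0],
    [0, 1, 1, None],
]
estimate = pairwise_transition_estimate(example, n_states=2)
```

Because each pair contributes independently, the cost grows with the number of observed adjacent pairs rather than with the number of possible missingness patterns, which is the computational appeal of a pairwise approach. A method that accounts for non-ignorable missingness would additionally need a model for the probability of a response being missing given the underlying states.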

Advisor

Andrea B. Troxel

Date of degree

2012-01-01

Collection

Dissertations and Theses