Limit Theorems For Dependent Combinatorial Data, With Applications In Statistical Inference

Mukherjee, Somabha

Degree type

Doctor of Philosophy (PhD)

Graduate group

Statistics

Subject

Curie-Weiss Model
Estimation
Ising Model
Logistic Regression
Maximum Pseudo-Likelihood
Phase Transition
Mathematics
Statistics and Probability

Copyright date

2021-08-31T20:21:00-07:00

Permalink

https://repository.upenn.edu/handle/20.500.14332/31012

Abstract

The Ising model is a celebrated example of a Markov random field, introduced in statistical physics to model ferromagnetism. More recently, it has emerged as a useful model for understanding dependent binary data with an underlying network structure. It is a discrete exponential family with binary outcomes, where the sufficient statistic involves a quadratic term designed to capture correlations arising from pairwise interactions. However, in many situations the dependencies in a network arise not just from pairs, but from peer-group effects. A convenient mathematical framework for capturing higher-order dependencies is the p-tensor Ising model, a discrete exponential family in which the sufficient statistic is a multilinear polynomial of degree p. This thesis develops a framework for statistical inference on the natural parameters in p-tensor Ising models. We begin with the Curie-Weiss Ising model, in which every p-tuple of nodes interacts with equal strength. Here we uncover various non-standard phenomena in the asymptotics of the maximum-likelihood (ML) estimates of the parameters, such as the presence of a critical curve in the interior of the parameter space on which these estimates have a limiting mixture distribution, and a surprising superefficiency phenomenon at the boundary point(s) of this curve. However, ML estimation fails in more general p-tensor Ising models due to the presence of a computationally intractable normalizing constant. To overcome this issue, we use the popular maximum pseudo-likelihood (MPL) method, which is based on conditional distributions and therefore avoids computing the normalizing constant. We derive general conditions under which the MPL estimate is √N-consistent, where N is the size of the underlying network. Our conditions are robust enough to handle a variety of commonly used tensor Ising models, including spin glass models with random interactions and the hypergraph stochastic block model.
Finally, we consider a more general Ising model that incorporates high-dimensional covariates at the nodes of the network and can also be viewed as a logistic regression model with dependent observations. In this model, we show that the parameters can be estimated consistently under sparsity assumptions on the true covariate vector.
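
As an illustration of why the MPL method sidesteps the normalizing constant, here is a minimal sketch for a p-tensor Curie-Weiss model with a single parameter beta. It is not taken from the thesis. The Hamiltonian normalization H(x) = N * mean(x)**p, the function names, and the bisection bracket are all assumptions made for this example. For spins x_i in {-1, +1}, writing m_i = (H(x with x_i = +1) - H(x with x_i = -1)) / 2 gives the conditional probability P(x_i | rest) = exp(beta * x_i * m_i) / (2 * cosh(beta * m_i)), so the log pseudo-likelihood depends only on the local fields m_i.

```python
import math


def local_fields(x, p):
    """Return m_i = (H(x with x_i=+1) - H(x with x_i=-1)) / 2 for each node i.

    Assumed Hamiltonian (illustrative normalization): H(x) = N * mean(x)**p.
    """
    n = len(x)
    total = sum(x)
    fields = []
    for xi in x:
        rest = total - xi  # sum of all spins except node i
        h_plus = n * ((rest + 1) / n) ** p
        h_minus = n * ((rest - 1) / n) ** p
        fields.append((h_plus - h_minus) / 2)
    return fields


def pseudo_score(x, p, beta):
    """Derivative in beta of the log pseudo-likelihood
    sum_i [beta * x_i * m_i - log(2 * cosh(beta * m_i))]."""
    fields = local_fields(x, p)
    return sum(m * (xi - math.tanh(beta * m)) for xi, m in zip(x, fields))


def mpl_estimate(x, p, lo=-20.0, hi=20.0, iters=200):
    """Find a root of the pseudo-score by bisection on [lo, hi].

    The score is non-increasing in beta, so a sign change brackets the root.
    Returns None when there is no sign change on the (hypothetical) bracket,
    for example when all spins agree.
    """
    f_lo = pseudo_score(x, p, lo)
    f_hi = pseudo_score(x, p, hi)
    if f_lo * f_hi >= 0:
        return None
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = pseudo_score(x, p, mid)
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2


if __name__ == "__main__":
    spins = [1] * 8 + [-1] * 2  # one observed configuration, N = 10
    print(mpl_estimate(spins, p=3))
```

The estimate is computed from a single observed configuration and never evaluates the sum over all 2^N states, which is the point of the pseudo-likelihood approach. A general tensor Ising model would replace `local_fields` with the fields induced by its own interaction tensor.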

Advisor

Bhaswar Bhattacharya

Date of degree

2021-01-01

Collection

Dissertations and Theses