Limit Theorems For Dependent Combinatorial Data, With Applications In Statistical Inference

Somabha Mukherjee, University of Pennsylvania


The Ising model is a celebrated example of a Markov random field, introduced in statistical physics to model ferromagnetism. More recently, it has emerged as a useful model for understanding dependent binary data with an underlying network structure. It is a discrete exponential family with binary outcomes, in which the sufficient statistic involves a quadratic term designed to capture correlations arising from pairwise interactions. However, in many situations the dependencies in a network arise not just from pairs, but from peer-group effects. A convenient mathematical framework for capturing such higher-order dependencies is the p-tensor Ising model, a discrete exponential family in which the sufficient statistic is a multilinear polynomial of degree p. This thesis develops a framework for statistical inference on the natural parameters of p-tensor Ising models. We begin with the Curie-Weiss Ising model, in which every p-tuple of nodes interacts with equal strength. Here we unearth various non-standard phenomena in the asymptotics of the maximum-likelihood (ML) estimates of the parameters, such as the presence of a critical curve in the interior of the parameter space on which these estimates have a limiting mixture distribution, and a surprising superefficiency phenomenon at the boundary point(s) of this curve. However, ML estimation fails in more general p-tensor Ising models because of a computationally intractable normalizing constant. To overcome this issue, we use the popular maximum pseudo-likelihood (MPL) method, which is based on conditional distributions and therefore avoids computing the normalizing constant. We derive general conditions under which the MPL estimate is √N-consistent, where N is the size of the underlying network. Our conditions are robust enough to handle a variety of commonly used tensor Ising models, including spin glass models with random interactions and the hypergraph stochastic block model.
Finally, we consider a more general Ising model that incorporates high-dimensional covariates at the nodes of the network and can also be viewed as a logistic regression model with dependent observations. In this model, we show that the parameters can be estimated consistently under sparsity assumptions on the true covariate vector.
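
As an illustration of the maximum pseudo-likelihood idea described above, the following is a minimal sketch for the p-tensor Curie-Weiss model, not the thesis's own implementation. It assumes spins in {-1, +1} and the parametrization P(x) proportional to exp(beta * N * m^p + h * N * m), where m is the average spin; this normalization, the known field h, and the function names conditional_gaps, pseudo_score, mpl_beta and sample_curie_weiss are all assumptions made for the example. Under that parametrization the conditional log-odds of x_i = +1 given the other spins is beta * d_i + 2h, with d_i the change in N * m^p when x_i is flipped from -1 to +1, so the pseudo-likelihood score in beta is monotone and can be solved by bisection without ever touching the normalizing constant.

```python
import math
import random


def conditional_gaps(x, p):
    """d_i = N*m^p with x_i set to +1, minus N*m^p with x_i set to -1."""
    n = len(x)
    total = sum(x)
    gaps = []
    for xi in x:
        rest = total - xi  # sum of all spins except x_i
        m_plus = (rest + 1) / n
        m_minus = (rest - 1) / n
        gaps.append(n * (m_plus ** p - m_minus ** p))
    return gaps


def pseudo_score(beta, x, p, h=0.0):
    """Derivative in beta of the log pseudo-likelihood (assumed model)."""
    score = 0.0
    for xi, d in zip(x, conditional_gaps(x, p)):
        score += 0.5 * d * (xi - math.tanh(0.5 * beta * d + h))
    return score


def mpl_beta(x, p, h=0.0, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection on the score, which is non-increasing in beta.

    The bracket [lo, hi] is an assumption; the estimate is clipped to it.
    """
    if pseudo_score(lo, x, p, h) <= 0.0:
        return lo
    if pseudo_score(hi, x, p, h) >= 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pseudo_score(mid, x, p, h) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_curie_weiss(n, p, beta, h=0.0, rng=random):
    """Exact sample: draw the number of +1 spins, then shuffle positions."""
    counts = list(range(n + 1))
    log_w = []
    for k in counts:
        m = (2 * k - n) / n
        log_comb = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1))
        log_w.append(log_comb + n * (beta * m ** p + h * m))
    top = max(log_w)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in log_w]
    k = rng.choices(counts, weights=weights)[0]
    x = [1] * k + [-1] * (n - k)
    rng.shuffle(x)
    return x
```

For instance, one could draw x = sample_curie_weiss(2000, 3, beta=0.8, h=0.2) and compare mpl_beta(x, 3, h=0.2) with 0.8. Whether the estimate lands close to the truth depends on where (beta, h) sits in the parameter space, which is the kind of question the consistency conditions in the thesis address.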