Doctor of Philosophy (PhD)
Edward I. George
Shane T. Jensen
Spatial data often display high levels of smoothness but can simultaneously present abrupt discontinuities, especially in urban environments. In this dissertation we adopt a Bayesian perspective to account for these two contrasting features, using partitions of areal data, and we focus on three challenges that arise in this setting. First, we consider the applied problem of modeling crime trends over time in Philadelphia, measured at a local neighborhood level. We find that the spatially local shrinkage imposed by a conditional autoregressive (CAR) model substantially improves out-of-sample predictive accuracy for crime. We also detect spatial discontinuities between neighborhoods that represent barriers. We then extend our search for barriers by clustering areal data. We propose a model that induces smoothness within clusters but allows for discontinuities between them, by assuming a "CAR-within-clusters" structure.
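As a rough illustration of the "CAR-within-clusters" idea, the sketch below assumes a proper CAR prior of the Leroux form, with precision Q = rho (D - W) + (1 - rho) I, and simply drops adjacency links between areas assigned to different clusters. The Leroux parametrization, the function names and the value of `rho` are assumptions made for this example, not the exact specification used in the dissertation. Under this assumed prior the conditional prior mean of area i is rho * sum_j w_ij theta_j / (rho d_i + 1 - rho), where d_i counts only within-cluster neighbors, so an area borrows strength only from neighbors in its own cluster.

```python
import numpy as np

def within_cluster_precision(W, z, rho):
    """Precision matrix of a Leroux-type CAR prior (an assumed form) that keeps
    only adjacency links between areas assigned to the same cluster."""
    W = np.asarray(W, dtype=float)
    z = np.asarray(z)
    same_cluster = z[:, None] == z[None, :]
    W_in = W * same_cluster              # cut links across cluster borders
    D_in = np.diag(W_in.sum(axis=1))     # within-cluster neighbor counts
    n = W.shape[0]
    return rho * (D_in - W_in) + (1.0 - rho) * np.eye(n)

def conditional_mean(Q, theta, i):
    """Prior mean of theta[i] given the other areas: -sum_{j != i} Q_ij theta_j / Q_ii."""
    theta = np.asarray(theta, dtype=float)
    off_diag = np.delete(Q[i], i)
    others = np.delete(theta, i)
    return -off_diag @ others / Q[i, i]

# Four areas on a line: 0 - 1 - 2 - 3
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
theta = [1.0, 0.0, 5.0, 5.0]  # the value for area 1 is ignored below

one_cluster = within_cluster_precision(W, [0, 0, 0, 0], rho=0.9)
two_clusters = within_cluster_precision(W, [0, 0, 1, 1], rho=0.9)
print(conditional_mean(one_cluster, theta, 1))   # about 2.842: pulled toward area 2
print(conditional_mean(two_clusters, theta, 1))  # about 0.9: area 2 is across a border
```

With a single cluster, area 1 is shrunk toward both neighbors; once the link between areas 1 and 2 is cut, the large value in area 2 no longer influences area 1, which is how a border produces a discontinuity in this sketch.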
The first challenge introduced by spatial clustering is that the combinatorially vast space of partitions makes typical stochastic search techniques computationally prohibitive. We introduce an ensemble optimization procedure that summarizes the posterior by simultaneously targeting several high probability partitions. We show on simulated data that our method achieves good estimation and partition selection performance. On the Philadelphia data we find that many recovered borders coincide with natural or man-made barriers.
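To give a sense of how quickly the space of partitions grows, the number of ways to split n areas into clusters, ignoring spatial constraints, is the Bell number B_n. The sketch below computes Bell numbers with the Bell triangle; it only illustrates the scale of the search space and is not part of the ensemble optimization procedure. Restricting attention to spatially contiguous clusters shrinks the space, but it can still grow exponentially: for n areas arranged in a line there are 2^(n-1) contiguous partitions.

```python
def bell_numbers(n_max):
    """Return [B_0, B_1, ..., B_n_max] using the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbor.
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells

bells = bell_numbers(20)
for n in (5, 10, 20):
    print(n, bells[n])
# 5 52
# 10 115975
# 20 51724158235372
```

Even 20 areas admit more than 5 * 10^13 partitions, so exhaustive enumeration is out of the question for a city-wide map.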
The second challenge is choosing a distribution over partitions: standard distributions for exchangeable partitions are not appropriate for spatial data, because they are invariant to relabeling the areas and therefore ignore spatial adjacency. We review and compare the properties of distributions for partitions of areal data that have been proposed in the literature and introduce new ones that display favorable properties.
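To see why exchangeable partition distributions ignore spatial structure, the sketch below uses the Ewens distribution (the partition distribution induced by a Dirichlet process, i.e. the Chinese restaurant process) as a representative example. It illustrates the problem only; it is not one of the distributions reviewed or proposed in the dissertation, and the helper names are hypothetical. The Ewens probability depends only on the cluster sizes, so on four areas in a line a contiguous split and a scattered split receive exactly the same prior probability.

```python
import math
from collections import Counter

def ewens_log_prob(z, alpha):
    """Log-probability of the partition encoded by labels z under the
    Ewens (Chinese restaurant process) distribution with concentration alpha."""
    n = len(z)
    sizes = list(Counter(z).values())
    logp = len(sizes) * math.log(alpha) + math.lgamma(alpha) - math.lgamma(alpha + n)
    logp += sum(math.lgamma(s) for s in sizes)  # lgamma(s) = log((s - 1)!)
    return logp

def clusters_connected(z, adj):
    """True if every cluster induces a connected subgraph of the adjacency list adj."""
    for label in set(z):
        members = {i for i, c in enumerate(z) if c == label}
        start = next(iter(members))
        seen, stack = {start}, [start]
        while stack:  # depth-first search restricted to the cluster
            i = stack.pop()
            for j in adj[i]:
                if j in members and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != members:
            return False
    return True

adj = [[1], [0, 2], [1, 3], [2]]  # four areas on a line: 0 - 1 - 2 - 3
contiguous = [0, 0, 1, 1]
scattered = [0, 1, 0, 1]
print(clusters_connected(contiguous, adj), clusters_connected(scattered, adj))  # True False
print(math.isclose(ewens_log_prob(contiguous, 1.0), ewens_log_prob(scattered, 1.0)))  # True
```

A spatially aware partition distribution would instead need to use `adj` (or something like it) when assigning probabilities, for instance by favoring partitions whose clusters are connected.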
The third challenge relates to the problem of working with multiple granularities: fixing one resolution can be restrictive because different granularities can be appropriate for different parts of a city. We introduce a model that combines the Nested Dirichlet Process with the Hierarchical Dirichlet Process to allow for flexible partitions of multi-resolution data and sharing of information between the partitions at different resolutions. We demonstrate our method on synthetic data and on real data in West Philadelphia, where central and suburban areas seem to be better represented by higher and lower resolutions, respectively.
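The information-sharing ingredient that the Hierarchical Dirichlet Process contributes can be sketched with a standard finite truncation: global weights beta come from truncated stick-breaking with concentration gamma, and each group j (here loosely standing in for a resolution) draws its own weights pi_j ~ Dirichlet(alpha * beta) over the same L atoms. This is a hypothetical, simplified illustration: the nested Dirichlet Process component, the spatial structure and the dissertation's actual multi-resolution model are not represented, and the names `gamma`, `alpha`, `L` and `J` are chosen only for this example.

```python
import numpy as np

def truncated_stick_breaking(gamma, L, rng):
    """Global weights beta ~ GEM(gamma), truncated to L atoms."""
    v = rng.beta(1.0, gamma, size=L)
    v[-1] = 1.0  # the last stick absorbs the remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def hdp_group_weights(gamma, alpha, L, J, rng):
    """Truncated HDP: J groups reweight the same L shared atoms.
    Given beta, each group's weights are Dirichlet(alpha * beta)."""
    beta = truncated_stick_breaking(gamma, L, rng)
    pis = rng.dirichlet(alpha * beta, size=J)
    return beta, pis

rng = np.random.default_rng(0)
beta, pis = hdp_group_weights(gamma=1.0, alpha=5.0, L=15, J=3, rng=rng)
print(np.round(beta, 3))
print(np.round(pis, 3))  # one row per group, the same 15 atoms in every row
```

Because every row of `pis` reweights the same atoms, a cluster that is prominent in one group can reappear in another, which is the sense in which an HDP layer lets partitions at different levels share information.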
Balocchi, Cecilia, "Bayesian Nonparametric Analysis Of Spatial Variation With Discontinuities" (2020). Publicly Accessible Penn Dissertations. 3978.