Date of this Version
IMS Lecture Notes-Monograph Series
In the 1960s, John Tukey and his followers brought exploratory data analysis into statistics, partly as a revolt against what was then perceived as an overly rigid and brittle mathematical modelling philosophy. Some problems seemed to demand a purely data-driven approach, in which data mining methods, in the absence of mathematical modelling, are the driving methodology. One did not want to be biased by preconceived ideas about the origin of the data by formulating a model, but instead allowed the data to "speak for itself". Vardi liked mathematical modelling and was very good at it. He also promoted data mining, depending on the problem, and thus straddled both philosophies. He and I often debated these issues, and were often in friendly disagreement.
I will try to argue, with concrete examples from the work of Vardi and others in statistics, that the pendulum should swing back a bit towards encouraging more mathematical modelling. Doing so obtains maximal benefit from statistical procedures by allowing physics, biology, and other fields of science to enter the problem formulation via mathematical modelling of the specific statistical problem at hand. I would argue that the solution to a specific problem ought to somehow depend on the problem itself, which is not the case with neural nets and other data-driven approaches that live mostly or entirely within the data or training set of the problem.
Shepp, L. A. (2007). Statistical Thinking: From Tukey to Vardi and Beyond. IMS Lecture Notes-Monograph Series, 54, 268-273. http://dx.doi.org/10.1214/074921707000000210
Date Posted: 27 November 2017