The Neyman-Pearson theory of hypothesis testing, with the Type I error rate, α, as the significance level, is widely regarded as statistical testing orthodoxy. Fisher's model of significance testing, in which the evidential p value denotes the level of significance, nevertheless dominates statistical testing practice. This paradox has arisen because these two incompatible theories of classical statistical testing have been anonymously mixed together, creating the false impression of a single, coherent model of statistical inference. We show that this hybrid approach to testing, with its misleading p < α statistical significance criterion, is common in marketing research textbooks, as well as in a large random sample of papers from twelve marketing journals. That is, researchers attempt the impossible by simultaneously interpreting the p value as a Type I error rate and as a measure of evidence against the null hypothesis. The upshot is that many investigators do not know what our most cherished, and ubiquitous, research desideratum, "statistical significance," really means. This, in turn, signals an educational failure of the first order. We suggest that tests of statistical significance, whether p's or α's, be downplayed in statistics and marketing research courses. Classroom instruction should focus instead on teaching students to emphasize the use of confidence intervals around point estimates in individual studies, and the criterion of overlapping confidence intervals when one has estimates from similar studies.
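The overlapping-confidence-intervals criterion recommended in the abstract can be sketched in a few lines. This is an illustrative example only, not from the paper: the sample data, function names (`mean_ci`, `intervals_overlap`), and the normal-approximation 95% interval are all assumptions made for demonstration.

```python
# Hypothetical sketch of the abstract's recommendation: report interval
# estimates and compare them across similar studies, rather than relying
# on a p < alpha "statistical significance" verdict.
import math
import statistics

def mean_ci(sample, z=1.96):
    """Approximate 95% CI for the mean (normal approximation, illustrative)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (m - z * se, m + z * se)

def intervals_overlap(ci_a, ci_b):
    """True when two interval estimates overlap."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Two hypothetical studies estimating the same quantity (invented data).
study_a = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.1, 4.7]
study_b = [5.4, 5.2, 5.6, 5.3, 5.5, 5.1, 5.4, 5.2]

ci_a, ci_b = mean_ci(study_a), mean_ci(study_b)
print("Study A CI:", ci_a)
print("Study B CI:", ci_b)
print("Intervals overlap:", intervals_overlap(ci_a, ci_b))
```

The design point, per the abstract, is that the interval display keeps the point estimate and its uncertainty in view, whereas a lone p value collapses both into a binary significant/not-significant claim.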
Keywords: alpha levels, p values, p less than alpha criterion, Fisher, Neyman-Pearson, (overlapping) confidence intervals
Hubbard, R., & Armstrong, J. S. (2006). Why We Don’t Really Know What "Statistical Significance" Means: A Major Educational Failure. Retrieved from https://repository.upenn.edu/marketing_papers/43
Date Posted: 22 May 2007
This document has been peer reviewed.