The American Statistical Association (ASA) has warned that the p-value, a quantity used to assess the statistical significance of a given piece of evidence, has been subject to widespread misuse and misinterpretation. The ASA has also issued guiding principles on the use of the p-value; for instance, it cannot be used to determine the probability that a statistical hypothesis is true, or to justify the importance of research findings without sufficient additional supporting evidence.
In frequentist statistics, the p-value can be described as the smallest significance level at which the null hypothesis would be rejected. In more concrete terms, it is a quantity calculated from observed sample data that measures how extreme the observation is relative to what the null hypothesis predicts. Specifically, it is the probability of obtaining an observation at least as extreme as the one actually observed, under the assumption that the null hypothesis is true. The smaller the p-value, the more extreme the observation. To put this another way, the smaller the p-value, the less likely it is that values at least as extreme as those observed would occur by chance, assuming that the null hypothesis is true. The operative phrase is “under the assumption that the null hypothesis is true”. The p-value is a probability that must be interpreted under this assumption.
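To make the definition concrete, here is a minimal sketch that computes an exact one-sided p-value for a coin-flipping experiment. The scenario (60 or 70 heads in 100 flips, with a null hypothesis that the coin is fair) and the function name `binomial_p_value` are illustrative assumptions, not part of the ASA statement.

```python
from math import comb


def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided exact p-value: P(X >= heads) for X ~ Binomial(flips, p_null).

    This is the probability, computed under the null hypothesis, of seeing
    a result at least as extreme as the one observed.
    """
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical observations: the more extreme result gives the smaller p-value.
p_60 = binomial_p_value(60, 100)
p_70 = binomial_p_value(70, 100)
print(f"60 heads in 100 flips: p = {p_60:.4f}")
print(f"70 heads in 100 flips: p = {p_70:.6f}")
```

Both numbers are probabilities of the data (or something more extreme) given a fair coin. Neither is the probability that the coin is fair.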
The ASA has advised researchers to avoid drawing scientific conclusions or making policy decisions based on p-values alone. In my personal opinion, researchers must describe in detail the context in which their experiments were carried out, list all statistical assumptions made, such as assumptions about normality, heteroskedasticity, or equality of population variances, and provide an honest description of how common pitfalls like selection bias, response errors, and lack of randomization have or have not been addressed. Otherwise, statistical results may appear more robust than they really are.
The ASA statement emphasizes that:
- P-values can indicate how incompatible the data are with a specified statistical model.
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
- Proper inference requires full reporting and transparency.
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
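The point that a p-value does not measure the size of an effect can be illustrated with a short sketch. It assumes a two-sided, two-sample z-test with a known common standard deviation; the numbers (a mean difference of 0.01 with a standard deviation of 1) and the function name `two_sample_z_p_value` are hypothetical choices made for illustration only.

```python
from math import erfc, sqrt


def two_sample_z_p_value(mean_diff: float, sigma: float, n_per_group: int) -> float:
    """Two-sided p-value for a two-sample z-test with known, equal sigma.

    Assumes both groups have n_per_group observations.
    """
    standard_error = sigma * sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2.0))


# Same tiny effect in both cases; only the sample size changes.
p_small_sample = two_sample_z_p_value(mean_diff=0.01, sigma=1.0, n_per_group=100)
p_large_sample = two_sample_z_p_value(mean_diff=0.01, sigma=1.0, n_per_group=1_000_000)
print(f"n = 100 per group:       p = {p_small_sample:.3f}")
print(f"n = 1,000,000 per group: p = {p_large_sample:.2e}")
```

The effect is identical and practically negligible in both runs, yet the p-value moves from far above to far below any conventional threshold as the sample grows. This is one reason to report the effect size and its uncertainty alongside the p-value.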