BMJ 1995;311:485 (19 August)

Statistics notes

Absence of evidence is not evidence of absence

Douglas G Altman, head,a J Martin Bland, reader in medical statistics.b

a Medical Statistics Laboratory, Imperial Cancer Research Fund, London WC2A 3PX, b Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE

Correspondence to: Mr Altman.

The non-equivalence of statistical significance and clinical importance has long been recognised, but this error of interpretation remains common. Although a significant result in a large study may sometimes not be clinically important, a far greater problem arises from misinterpretation of non-significant findings. By convention a P value greater than 5% (P>0.05) is called "not significant." Randomised controlled clinical trials that do not show a significant difference between the treatments being compared are often called "negative." This term wrongly implies that the study has shown that there is no difference, whereas usually all that has been shown is an absence of evidence of a difference. These are quite different statements.

The sample size of controlled trials is generally inadequate, with a consequent lack of power to detect real, and clinically worthwhile, differences in treatment. Freiman et al1 found that only 30% of a sample of 71 trials published in the New England Journal of Medicine in 1978-9 with P>0.1 were large enough to have a 90% chance of detecting even a 50% difference in the effectiveness of the treatments being compared, and they found no improvement in a similar sample of trials published in 1988. To interpret all these "negative" trials as providing evidence of the ineffectiveness of new treatments is clearly wrong and foolhardy. The term "negative" should not be used in this context.2
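A rough illustration of why so many trials fall short can be obtained from the usual normal approximation for comparing two proportions. The sketch below is not taken from Freiman et al; the 40% control event rate and the Python formulation are illustrative assumptions only.

    # Minimal sketch: sample size per arm for 90% power to detect a 50%
    # relative reduction in event rate, two sided alpha = 0.05, using the
    # normal approximation for two proportions. The 40% control event rate
    # is an illustrative assumption, not a figure from the paper.
    from scipy.stats import norm

    def n_per_arm(p_control, relative_reduction, alpha=0.05, power=0.90):
        p_treat = p_control * (1 - relative_reduction)
        z_a = norm.ppf(1 - alpha / 2)          # critical value for the test
        z_b = norm.ppf(power)                  # value giving the desired power
        p_bar = (p_control + p_treat) / 2
        num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
               + z_b * (p_control * (1 - p_control)
                        + p_treat * (1 - p_treat)) ** 0.5) ** 2
        return num / (p_control - p_treat) ** 2

    print(round(n_per_arm(0.40, 0.50)))        # roughly 110 patients per arm

Even under these favourable assumptions more than 200 patients in total are needed; with rarer events or smaller treatment differences the requirement rises sharply, which is why so many published trials are underpowered.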

A recent example is given by a trial comparing octreotide and sclerotherapy in patients with variceal bleeding.3 The study was carried out on a sample of only 100 despite a reported calculation that suggested that 1800 patients were needed. This trial had only a 5% chance of getting a statistically significant result if the stated clinically worthwhile treatment difference truly existed. One consequence of such low statistical power was a wide confidence interval for the treatment difference. The authors concluded that the two treatments were equally effective despite a 95% confidence interval that included differences between the cure rates of the two treatments of up to 20 percentage points.
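The effect of such a small sample on the confidence interval can be seen directly. The figures below are illustrative assumptions (about 50 patients per arm and plausible cure rates), not the trial's actual data.

    # Minimal sketch: 95% confidence interval for a difference in cure rates
    # with about 50 patients per arm, using the normal approximation. The
    # counts are illustrative assumptions, not the trial's results.
    from math import sqrt

    def diff_ci(cured1, n1, cured2, n2, z=1.96):
        p1, p2 = cured1 / n1, cured2 / n2
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # SE of the difference
        d = p1 - p2
        return d - z * se, d + z * se

    low, high = diff_ci(42, 50, 40, 50)    # 84% v 80% cured, say
    print(f"difference in cure rates: 95% CI {low:+.2f} to {high:+.2f}")

With these assumed figures the interval runs from about -11 to +19 percentage points: far too wide a range to support any claim that the treatments are equivalent.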

Similar evidence of the dangers of misinterpreting non-significant results is found in numerous meta-analyses (overviews) of published trials in which few or none of the individual trials were large enough to be statistically conclusive on their own. A dramatic example is provided by the overview of clinical trials evaluating fibrinolytic treatment (mostly streptokinase) for preventing reinfarction after acute myocardial infarction. The overview of randomised controlled trials found a modest but clinically worthwhile (and highly significant) reduction in mortality of 22%,4 but only five of the 24 trials had shown a statistically significant effect with P<0.05. The lack of statistical significance of most of the individual trials led to a long delay before the true value of streptokinase was appreciated.
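The arithmetic behind this phenomenon is simple inverse-variance pooling. The sketch below uses simulated trials, not the streptokinase data, to show how several individually non-significant trials can still yield a clearly significant combined result.

    # Minimal sketch of fixed effect (inverse variance) pooling of log odds
    # ratios. The six identical small trials are simulated for illustration;
    # each on its own gives roughly z = 1, far from P < 0.05.
    from math import log, sqrt, exp, erf

    def pool(trials):
        """trials: list of (deaths_treat, n_treat, deaths_ctrl, n_ctrl)."""
        num = den = 0.0
        for a, n1, c, n2 in trials:
            b, d = n1 - a, n2 - c
            log_or = log((a * d) / (b * c))
            var = 1 / a + 1 / b + 1 / c + 1 / d      # variance of log odds ratio
            num += log_or / var                       # weight by inverse variance
            den += 1 / var
        pooled, se = num / den, sqrt(1 / den)
        z = pooled / se
        p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two sided P value
        return exp(pooled), p

    trials = [(12, 100, 17, 100)] * 6                 # simulated small trials
    odds_ratio, p = pool(trials)
    print(f"pooled odds ratio {odds_ratio:.2f}, P = {p:.3f}")

With these simulated figures the pooled odds ratio is about 0.67 with P near 0.01, even though none of the six trials reaches significance on its own.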

While it is usually reasonable not to accept a new treatment unless there is positive evidence in its favour, when issues of public health are concerned we must question whether the absence of evidence is a valid enough justification for inaction. A recent publicised example is the suggested link between some sudden infant deaths and antimony in cot mattresses. Statements about the absence of evidence are common--for example, in relation to the possible link between violent behaviour and exposure to violence on television and video, the possible harmful effects of pesticide residues in drinking water, the possible link between electromagnetic fields and leukaemia, and the possible transmission of bovine spongiform encephalopathy from cows. Can we be comfortable that the absence of clear evidence in such cases means that there is no risk or only a negligible one?

When we are told that "there is no evidence that A causes B" we should first ask whether absence of evidence means simply that there is no information at all. If there are data we should look for quantification of the association rather than just a P value. Where risks are small P values may well mislead: confidence intervals are likely to be wide, indicating considerable uncertainty. While we can never prove the absence of a relation, when necessary we should seek evidence against the link between A and B--for example, from case-control studies. The importance of carrying out such studies will relate to the seriousness of the postulated effect and how widespread is the exposure in the population.
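Quantification of that kind is straightforward once the data are laid out as a 2x2 table. The sketch below, with invented counts, shows how an odds ratio and its 95% confidence interval make the uncertainty visible in a way that "no significant association" does not.

    # Minimal sketch: odds ratio with 95% confidence interval from a 2x2
    # table (exposed/unexposed cases and controls). The counts are invented;
    # with few exposed cases the interval is extremely wide.
    from math import log, sqrt, exp

    def odds_ratio_ci(a, b, c, d, z=1.96):
        """a, b: exposed/unexposed cases; c, d: exposed/unexposed controls."""
        or_ = (a * d) / (b * c)
        se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log odds ratio
        return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)

    or_, low, high = odds_ratio_ci(4, 96, 2, 98)   # rare exposure, few cases
    print(f"odds ratio {or_:.1f} (95% CI {low:.1f} to {high:.1f})")

Here the odds ratio is about 2.0 but the interval runs from roughly 0.4 to 11: compatible both with no effect and with a tenfold increase in risk, which is very different from "no evidence of an association".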

References

1. Freiman JA, Chalmers TC, Smith H, Kuebler RR. The importance of beta, the type II error, and sample size in the design and interpretation of the randomized controlled trial: survey of two sets of "negative" trials. In: Bailar JC, Mosteller F, eds. Medical uses of statistics. 2nd ed. Boston, MA: NEJM Books, 1992:357-73.

2. Chalmers I. Proposal to outlaw the term "negative trial." BMJ 1985;290:1002.

3. Sung JJY, Chung SCS, Lai C-W, Chan FKL, Leung JWC, Yung M-L, Kassianides C, et al. Octreotide infusion or emergency sclerotherapy for variceal haemorrhage. Lancet 1993;342:637-41.

4. Yusuf S, Collins R, Peto R, Furberg C, Stampfer MJ, Goldhaber SZ, et al. Intravenous and intracoronary fibrinolytic therapy in acute myocardial infarction: overview of results on mortality, reinfarction and side-effects from 33 randomized controlled trials. Eur Heart J 1985;6:556-85.
