BMJ 1997;314:1874 (28 June)

General practice

Statistics Notes: Units of analysis

Douglas G Altman, head,^a J Martin Bland, professor of medical statistics ^b

^a ICRF Medical Statistics Group, Centre for Statistics in Medicine, Institute of Health Sciences, Oxford OX3 7LF, ^b Department of Public Health Sciences, St George's Hospital Medical School, London SW17 0RE

Correspondence to: Mr Altman

In clinical studies the focus of interest is almost always the patient. If we carry out a randomised trial to compare two treatments we are interested in comparing the outcomes of patients who received each of the treatments. In some conditions several measurements will be taken on the same patient, but the focus of interest remains the patient. Failure to recognise this fact results in multiple counting of individual patients and can seriously distort the results. We explain this error below. Its frequency in medical research is indicated by the whole chapter devoted to it in Andersen's classic compilation.¹

The simplest case is when researchers study a part of the human anatomy which is, so to speak, in duplicate: eyes, ears, arms, etc. At the other extreme very many measurements can be taken on a single patient. Such data arise frequently in dentistry, with measurements made on each tooth, or even each face of each tooth, and in rheumatology, in which pain or mobility may be assessed for each joint of each finger. In statistical terminology the patient is the sampling unit (or unit of investigation) and thus should be the unit of analysis.

There are two related consequences of ignoring the fact that the data include multiple observations on the same individuals. Firstly, this procedure violates the widespread assumption of statistical analyses that the separate data values should be independent. Secondly, the sample size is inflated, sometimes dramatically so, which may lead to spurious statistical significance.

Inflated samples

To take a simple case, we may wish to compare the blood pressures of two groups of 30 patients. If we measured blood pressure on each arm of each patient we could double the number of observations but not the amount of information, as the two pressures from each patient will be very similar. The use of the t test to compare the two sets of 60 observations is invalid. Andersen¹ presented data from a randomised double blind crossover trial of ketoprofen and aspirin in the treatment of rheumatoid arthritis. An impressive P value of 0.00000001 was obtained from an analysis of 3944 observations, but these were obtained from only 58 patients. Such errors are not rare. In a review of 196 randomised trials of non-steroidal anti-inflammatory agents Gøtzsche found that 63% of reports used the wrong units of analysis.²
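The blood pressure example can be sketched as a small simulation. This is only an illustration under assumed values: the between-patient and within-patient standard deviations (15 and 3 mm Hg), the mean of 120 mm Hg, and the function names are hypothetical and do not come from the article. Both groups are drawn from the same population, so any "significant" result is a false positive.

```python
import random
import statistics


def pooled_t(a, b):
    """Two-sample t statistic with a pooled (equal-variance) standard error."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate_group(rng, n_patients=30, between_sd=15.0, within_sd=3.0):
    """Return one (left arm, right arm) pair of pressures per patient.

    The standard deviations are assumed values chosen for illustration.
    """
    group = []
    for _ in range(n_patients):
        true_bp = rng.gauss(120.0, between_sd)  # the patient's own level
        left = true_bp + rng.gauss(0.0, within_sd)
        right = true_bp + rng.gauss(0.0, within_sd)
        group.append((left, right))
    return group


def rejection_rates(n_sims=2000, seed=1):
    """Share of simulated null trials declared significant at the 5% level."""
    rng = random.Random(seed)
    naive = correct = 0
    for _ in range(n_sims):
        g1, g2 = simulate_group(rng), simulate_group(rng)

        # Wrong: treat the 60 arm readings per group as independent.
        flat1 = [x for pair in g1 for x in pair]
        flat2 = [x for pair in g2 for x in pair]
        if abs(pooled_t(flat1, flat2)) > 1.980:  # approx. t critical value, 118 df
            naive += 1

        # Valid: one summary value (the mean of the two arms) per patient.
        means1 = [(l + r) / 2 for l, r in g1]
        means2 = [(l + r) / 2 for l, r in g2]
        if abs(pooled_t(means1, means2)) > 2.002:  # approx. t critical value, 58 df
            correct += 1
    return naive / n_sims, correct / n_sims


naive_rate, correct_rate = rejection_rates()
print(f"arm as unit: {naive_rate:.3f}   patient as unit: {correct_rate:.3f}")
```

Under these assumptions the analysis that uses the patient as the unit should reject close to the nominal 5% of the time, whereas the analysis of 60 readings per group rejects noticeably more often, because the standard error is computed as if there were twice as many independent values.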

We previously discussed a similar fallacy arising in the use of correlation coefficients, when multiple observations from each individual produced a spurious increase in the sample size and a corresponding spurious "significant" relationship.³ We suggested techniques to analyse such data when the focus was either the variation within subjects⁴ or between subjects.⁵

There is nothing wrong in collecting such data; indeed the use of multiple observations can often improve the statistical power of a study. But such studies need to be analysed correctly. The simplest approach is to collapse all the data for an individual into a summary measure.⁶ For example, we could validly analyse the mean of the two blood pressure values for each patient. Alternatively, we can use a statistical method which explicitly takes account of the multiplicity. With well designed studies we may be able to use analysis of variance. A more complex general approach is multilevel modelling,⁷ which is not available in standard statistical software and may be difficult to apply and interpret.
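The summary measure approach described above can be sketched as follows. The records, patient identifiers, and the helper name `summarise_by_patient` are hypothetical, and the mean is only one possible summary; the point is that each patient contributes exactly one value to the comparison, however many measurements were taken.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records: (patient id, treatment group, measurement).
records = [
    ("p1", "A", 142.0), ("p1", "A", 138.0),
    ("p2", "A", 120.0), ("p2", "A", 124.0), ("p2", "A", 122.0),
    ("p3", "B", 131.0),
    ("p4", "B", 150.0), ("p4", "B", 146.0),
]


def summarise_by_patient(rows):
    """Collapse repeated measurements to one (group, mean) pair per patient."""
    values = defaultdict(list)
    group_of = {}
    for patient_id, group, value in rows:
        values[patient_id].append(value)
        group_of[patient_id] = group
    return {pid: (group_of[pid], mean(vals)) for pid, vals in values.items()}


summaries = summarise_by_patient(records)

# The groups are then compared using the per-patient summaries only,
# so the sample size is the number of patients, not of measurements.
by_group = defaultdict(list)
for group, patient_mean in summaries.values():
    by_group[group].append(patient_mean)

print(dict(by_group))
```

Here eight measurements reduce to four values, two per group, and any subsequent test would be carried out on those four.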

Take account of multiplicity

The same objection applies to the use of multiple measurements made on different occasions. Here too the sampling unit is the patient, and thus the unit of analysis should also be the patient.² A further feature of this type of study is that in some situations the number of measurements made on a patient may itself carry prognostic information. For example, repeat measurements may be made only if there is some clinical concern, as with fetal ultrasound measurements in pregnancy. To treat all these measurements as independent is clearly wrong, but bias is introduced too when those with more data are systematically different from those with single observations. An extreme example of this phenomenon occurs when analysing multiple hospital admissions for a potentially fatal condition.¹ Those with more than one admission must have survived the first admission.
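The bias that arises when the number of measurements depends on clinical concern can be illustrated with a short simulation. Everything here is assumed for the sake of the sketch: a standardised size score with mean 0, a rule that a first scan below -1 triggers three further scans, and the function name `simulate`. None of these values is taken from the article.

```python
import random
from statistics import mean


def simulate(n_patients=1000, noise_sd=0.2, concern_below=-1.0,
             extra_scans=3, seed=7):
    """Compare the pooled mean of all scans with the mean of per-patient means.

    Each patient has a true standardised size score drawn from N(0, 1).
    Extra scans are made only when the first scan raises concern
    (an assumed rule, for illustration only).
    """
    rng = random.Random(seed)
    per_patient = []
    for _ in range(n_patients):
        true_size = rng.gauss(0.0, 1.0)
        scans = [true_size + rng.gauss(0.0, noise_sd)]
        if scans[0] < concern_below:
            scans += [true_size + rng.gauss(0.0, noise_sd)
                      for _ in range(extra_scans)]
        per_patient.append(scans)

    # Measurement as unit: small fetuses are counted several times over.
    pooled_mean = mean([x for scans in per_patient for x in scans])
    # Patient as unit: every patient contributes one summary value.
    patient_level_mean = mean([mean(scans) for scans in per_patient])
    return pooled_mean, patient_level_mean


pooled, patient_level = simulate()
print(f"pooled over scans: {pooled:.2f}   per patient: {patient_level:.2f}")
```

Under these assumptions the pooled mean is pulled well below zero because the patients who prompted concern are over-represented, while the mean of the per-patient summaries stays close to the population value of zero.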

Failure to carry out the correct analysis can lead to problems of interpretation too. Commenting on one trial, Andersen observed, "This trial resulted in the apparent conclusion that after 1 year 22% of the patients, but only 16% of the legs, have expired."¹

Similar problems arise when we cannot sample individual patients directly but choose a sample of hospitals, wards, or general practices and then obtain data for all or a subsample of the patients within these groups. Here analysis of data for individual patients leads to the errors described above. We consider this type of study in forthcoming Statistics Notes.

References

1. Andersen B. Methodological errors in medical research. Oxford: Blackwell, 1990.

2. Gøtzsche PC. Methodology and overt and hidden bias in reports of 196 double-blind trials of nonsteroidal antiinflammatory drugs in rheumatoid arthritis. Controlled Clin Trials 1989;10:31-56.

3. Bland JM, Altman DG. Correlation, regression, and repeated data. BMJ 1994;308:896.

4. Bland JM, Altman DG. Calculating correlation coefficients with repeated observations. Part 1: correlation within subjects. BMJ 1995;310:446.

5. Bland JM, Altman DG. Calculating correlation coefficients with repeated observations. Part 2: correlation between subjects. BMJ 1995;310:633. [Correction BMJ 1996;312:572]

6. Matthews JNS, Altman DG, Campbell MJ, Royston JP. Analysis of serial measurements in medical research. BMJ 1990;300:230-235.

7. Goldstein H. Multi-level statistical models. 2nd ed. London: Edward Arnold, 1995.
