Vocabulary of Research Design

W.D. McCall, Jr., Ph.D.

OS 512, Research Design, 2000

Most students begin this course insisting they are going into private dental practice and will not be doing experiments. In part I agree. But any new technique brought into a practice constitutes an experiment comparing the old technique with the new technique. The usual process is for the clinician to use the new technique for a while and then make a subjective decision. An alternative is to seek new information. This new information can come from one of three sources: (a) Your own data. This source is time-consuming, difficult, and expensive. (b) An authority. This source could be wrong, and could have an agenda that does not include your best interests and your patients' best interests. (c) Published literature. Published literature is not infallible, but it has several advantages over verbal reports of authorities. First, you can read it at your own pace instead of at the pace of the lecturer. So you have the opportunity to be more critical. Second, the act of writing clarifies thinking more than the act of speaking, so the author is more likely to get it right. And third, most journals use some level of refereeing so that the worst manuscripts are filtered out.

Thus, you will need to read the published literature, and you will need to read it critically. This course has the goal of teaching you to read the published literature critically. This handout has the goal of teaching you the vocabulary -- the ideas -- needed. These ideas include the statistical or experimental unit, independence of data (independent vs. correlated), types of variables (independent, dependent), operational definitions of variables, types of research design (pre-experimental, quasi-experimental, true experimental), and the timing of data collection (one-shot, pretest only, posttest only, both pretest and posttest, time series).

Reliability and validity. Reliability means repeatability; validity means that a result is true. Reliability depends on the methods being reliable, on the sample being both large enough to accommodate the usual biological variability and drawn from appropriate populations, and, in the long run, on a number of studies from different laboratories or clinics. Validity has both internal and external components: internal validity pertains to the published study itself, which may have very tight controls on the type of patient and their compliance with taking the medication, whereas external validity deals with the problems of the real world, such as doctors missing diagnoses and patients not taking the medication.

Operational definitions of variables. This phrase is more intimidating than it needs to be. An operational definition tells you how to make the measurement. To overdo an example, the operational definition of a subject's height might be as follows. Use a book held against a wall and the top of the subject's head to make a pencil mark on the wall; measure from the floor to the pencil mark with a Handy Dandy Tape Measure.

A more realistic example involves palpating the lateral pterygoid muscle for tenderness. Where should you push? With which finger? How hard? What response from the subject is positive? Any of these operations could alter the outcome so all need to be clearly defined.

Number of subjects. This is both the most concrete, since the number is given, and the most vague, since the number needed for a reliable result depends on the subject matter and the rarity of the problem. Many neurological disorders are reported as single cases, whereas a behavioral study may use 100 or more subjects for a pilot study. As an initial rule, assume the sample size was large enough if the authors claim their results were "statistically significant" or if they claim their results could have occurred by chance less than 5% of the time, usually written "p < 0.05."
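To make the phrase "could have occurred by chance less than 5% of the time" concrete, here is a minimal permutation-test sketch in Python. All data, group names, and group sizes are hypothetical, invented for illustration:

```python
import random

random.seed(0)

# Hypothetical cavity counts for two groups of 10 patients each
# (invented numbers, not from any real study).
brushers = [0, 1, 0, 2, 1, 0, 1, 0, 0, 1]
non_brushers = [2, 3, 1, 2, 4, 2, 3, 1, 2, 3]

observed = sum(non_brushers) / 10 - sum(brushers) / 10  # 2.3 - 0.6 = 1.7

# Permutation test: if group labels were irrelevant, how often would
# a difference at least this large arise by chance?
pooled = brushers + non_brushers
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    if sum(pooled[10:]) / 10 - sum(pooled[:10]) / 10 >= observed:
        count += 1

p = count / trials
print(f"observed difference = {observed:.1f}, p = {p:.4f}")
# A p below 0.05 is what authors report as "statistically significant".
```

The shuffling step mimics "chance": it asks how large the group difference would be if group membership carried no information.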

The statistical unit and independent data. Suppose you wished to know something about the distribution of hair colors in a class of 100 dental students. You could take one hair from the head of each dental student, analyze the color, and prepare a table of the number of hairs of each color. The statistical unit would be the dental student and each hair sample would be independent of the next. On the other hand you could take 100 hairs from the head of one dental student, analyze the colors, and prepare a table of the number of hairs of each color. Now the statistical unit is each individual hair, and the hair samples are not independent because most hairs on the one head are likely to be the same color.
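The two sampling schemes can be sketched with a quick tally; the colors and counts below are hypothetical:

```python
from collections import Counter

# One hair from each of 100 students: the statistical unit is the
# student, and the 100 units are independent (hypothetical counts).
student_hairs = ["brown"] * 60 + ["black"] * 25 + ["blond"] * 10 + ["red"] * 5
print(Counter(student_hairs))

# 100 hairs from one student's head: the statistical unit is the hair,
# but the units are not independent -- knowing a few hairs, you can
# predict the rest.
one_head = ["brown"] * 100
print(Counter(one_head))
```

Both tables have 100 entries, but only the first reflects 100 independent statistical units.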

Experiments are conducted and statistical tests are computed because variability in the properties of the statistical units, leading to variability in the results, is expected. The purpose of a large sample is to include that variability in the data.

Consider another example. Suppose you are interested in caries, so you examine the teeth of 30 patients in your practice and find 15 teeth with caries. You might divide 15 by 30 to obtain an average of 0.5 cavities per patient. But this information, as given in the previous sentence, is ambiguous. Did you find 15 patients each with one cavity? Or 29 patients with zero cavities and one patient with 15? If your statistical unit is the tooth, then the 20 to 30 teeth in each patient are not likely to be independent. On the other hand, suppose your statistical unit is the patient. Then one patient with caries out of 30 is 3.3%, not 50%.
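The arithmetic in the extreme case (29 patients with no cavities, one patient with 15) can be checked directly; the data below are hypothetical:

```python
# Hypothetical extreme case: 30 patients, one of whom has 15 carious
# teeth while the other 29 have none.
cavities_per_patient = [15] + [0] * 29

# Dividing carious teeth by patients gives the misleading average:
mean_cavities = sum(cavities_per_patient) / len(cavities_per_patient)
print(mean_cavities)  # 0.5 cavities "per patient"

# With the patient as the statistical unit, the picture changes:
patients_with_caries = sum(1 for c in cavities_per_patient if c > 0)
prevalence = patients_with_caries / len(cavities_per_patient)
print(f"{prevalence:.1%} of patients have caries")  # 3.3%
```

The same raw count (15 carious teeth) supports two very different summaries, depending on the choice of statistical unit.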

The statistical unit is the individual item being counted or measured. Statistical units are independent if, knowing the results of several units, you are unable to predict the next unit. Statistical units are not independent if, knowing the results of several units, you can predict the next result. If you can predict the next result, the data are correlated.

Note that the statistical unit is singular (the student, not the students; the patient, not the 30 patients in the study). Do not confuse the experimental unit with the sample size. Also, the phrases "experimental unit" or "sampling unit" are synonyms for statistical unit. The key idea to be clear about is what is being counted.

Inclusion criteria and exclusion criteria. The inclusion criteria are the rules that get the statistical units into the study. For example, an advertisement in the dental school lobby asked for "post-menopausal women." An unwritten criterion was also "willing to participate in the study." The exclusion criteria are the rules that keep statistical units out of the study. Exclusion criteria may be things like "no history of cancer or heart disease." The source of the subjects for a study and the criteria for inclusion and exclusion can be a source of bias.

Independent and dependent variables. For another example, suppose you wish to compare the hair color of the dental students above with the hair color of dental alumni who graduated 30 years ago. Now you have two groups: students and alumni. You formed these two groups by using each individual's date of graduation from dental school.

The variable you use to form the groups is the independent variable. You have some control over it in the sense that you select the date, 30 years ago rather than 5 years ago for example. In many cases the independent variable is the cause. One independent variable can lead to several groups. The independent variable, "graduation date," led to two groups in the example above. Similarly, one independent variable called "drug" might lead to three groups taking aspirin, ibuprofen, and acetaminophen.

The variable you measure to make the comparison is the dependent variable. In this example, hair color is the dependent variable. In many cases the dependent variable is the effect. The dependent variable is usually the result of the experiment.

Notice that the adjective "independent" has been used with "data" (independent data are not correlated) and "independent" has also been used with "variable" (the independent variable causes changes in the dependent variable).

If you wanted to expand your caries research, you might find 30 patients who do not brush their teeth and compare them with the 30 patients who do brush their teeth. The independent variable is tooth brushing and it has two levels: brush and don't brush, which leads to the two groups. The dependent variable might be number of cavities in each patient.

Types of research design. There are three types of research designs: pre-experimental, quasi-experimental, and true experimental. These can be distinguished by the number of groups and how the groups were formed. A pre-experimental design has one group. A case report, or a report of several cases, is pre-experimental. In this design the inclusion criteria define the group and also serve as the independent variable. This type of research is the least strong for hypothesis testing but is still important because a disease entity must first be recognized before it can be investigated. What we now call Temporomandibular Disorders is still called Costen's Syndrome in the medical literature after a report of several cases by Costen. Parkinson's Disease gets its name from the author of a report, not because Dr. Parkinson had the disease.

The initial hair color experiment and the initial caries experiment are pre-experimental because each had only one group.

A correlational design has one group, in which two or more dependent variables are measured. The correlational design is a type of pre-experimental design.

A quasi-experimental design has more than one group but the investigator does not control entry of the subjects into the group membership. The hair color of students and alumni experiment above is quasi-experimental because group membership is determined by dental school graduation. The second caries experiment is quasi-experimental because the group membership is controlled by the subject's tooth brushing behavior.

A true experimental design has more than one group and group membership is controlled by the investigator using random assignment. Since factors unknown to the investigator may influence the results, random assignment of individuals to groups should equalize these factors among the groups. If the investigator does not know of any factors that might influence the results, a simple randomization is appropriate. If the investigator does know of a factor that might influence the results, then a stratified random assignment might be used.

Be clear about the distinction between random assignment of individuals to groups (where group membership usually implies the experimental condition) and random assignment of intact groups to experimental conditions. Intact groups can contain systematic biases that random assignment of individuals would have distributed equally among the groups.

When a factor other than the main factor of interest is identified, there are two strategies. One is to use stratified random assignment to equalize the confounding factor among the groups. A second is to form separate groups and measure the effect of the confounding factor.
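The two assignment strategies can be sketched in a few lines; the subject pool, the factor (sex), and the two-group split below are all hypothetical:

```python
import random

random.seed(1)

# Hypothetical pool of 20 subjects as (id, sex) pairs; sex is a factor
# suspected of influencing the results.
subjects = [(i, "F" if i % 3 else "M") for i in range(20)]

def simple_random_assignment(pool):
    """Shuffle everyone, then split the pool in half."""
    pool = pool[:]
    random.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]

def stratified_random_assignment(pool, stratum):
    """Randomize within each stratum so the factor is balanced."""
    groups = ([], [])
    strata = {}
    for subject in pool:
        strata.setdefault(stratum(subject), []).append(subject)
    for members in strata.values():
        random.shuffle(members)
        for i, subject in enumerate(members):
            groups[i % 2].append(subject)
    return groups

a_simple, b_simple = simple_random_assignment(subjects)
a_strat, b_strat = stratified_random_assignment(subjects, lambda s: s[1])

# Stratification guarantees each sex is split as evenly as possible
# between the two groups; simple randomization only does so on average.
print(sum(1 for s in a_strat if s[1] == "M"),
      sum(1 for s in b_strat if s[1] == "M"))
```

Simple randomization balances unknown factors only on average; stratified assignment forces the balance for the one factor the investigator knows about.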

Some books (Spector, 1981) do not separate the group-forming aspect of research design from the data-timing aspect of research design.

Timing of data collection. Data might be collected before any intervention, or pretest only; after an intervention, or posttest only; both before and after the intervention, or pretest-posttest; or, finally, at several times, called a time series. In a correlational design, there may not be an intervention, so the data collection might be done only once or "one-shot."

To summarize to this point, a critical analysis of the research design asks a series of questions: (1) What is the statistical or experimental unit? (2) What are the inclusion and exclusion criteria? (3) What is the independent variable? (4) What is the dependent variable? (5) Do these variables have operational definitions? (6) What is the type of design? (7) What is the timing of the data collection? The answers to these questions will affect our opinion of the validity of the conclusions of the research.

Notation. We need a notation to be able to show the features of the design. The tradition is as follows.

1. Use one line for each group.

2. Use an O to indicate an observation on all members of the group.



So the student hair color experiment would be written:

O1



And the student versus alumni hair color experiment would be written

O1

O1

The two lines indicate that a control group was used in addition to the original group.



3. Observations at multiple times (a time series) are shown by multiple O's.

O1 O2 O3

4. An intervention (e.g., a treatment) is shown by an X.

Suppose you gave a toothpaste to one group of patients and after six months observed the number of caries in each patient. The notation would be

XO

And if you used two types of toothpaste with two groups of patients the notation would be

X1O

X2O

This notation has some ambiguity. You cannot tell from looking at the X's and O's whether the treatment occurred once, e.g., give a pill, or was ongoing, e.g., ask the patients to brush with a particular toothpaste for some months. We will use the X to indicate the onset of treatment relative to the data collection.

5. If randomization was used to assign individuals to the group, indicate the randomization with an R for each group.

R X1O

R X2O

6. If matching (but not randomization) was used to make the groups more nearly homogeneous, use an M. For example, if age and gender were used to match a patient group with a non-patient group and then a diagnostic observation was made, the notation would be

M O

M O



As another example, you might collect data on a large number of patients who had success or failure with a treatment, and then match (by discarding subjects) the two groups afterwards on some relevant variable. The notation would be

X1O M

X1O M

Further Reading. Several books are partially or wholly devoted to research design. There are chapters in Shott (1990) and in Weintraub (1985), and whole books by Spector (1981), Darby and Bowen (1980), and Feinstein (1977). The books are listed in order of increasing complexity. My favorite is the book by Feinstein. He writes with clarity and humor, and he refuses to let the statistical tail wag the clinical dog.

References Cited.

Darby, M.L., & Bowen, D.M. (1980). Research methods for oral health professionals. St. Louis: C.V. Mosby Co.

Feinstein, A.R. (1977). Clinical biostatistics. St. Louis: C.V. Mosby Co.

Shott, S. (1990). Experimental design. In Statistics for health professionals (pp. 1-10). Philadelphia: Saunders.

Spector, J.L. (1981). Research designs. Newbury Park: Sage Publications.

Weintraub, J.A., Douglas, C.A., & Gillings, D.B. (1985). BIOSTATS (2nd ed., rev.). Research Triangle Park, NC: CAVCO Inc.