Chapter 4

Consistency Within and Across Situations:
Trait Versus Environment


History
The Controversy of Mischel and Peterson
Reinforcing the trait argument
The rejection of traits: Behavioral assessment
Person-environment interactions
Aptitude-by-treatment interactions
Environmental assessment
Moderators of cross-situational consistency
States and traits
Process research
Summary

History

Traditional psychological measurement and behavioral assessment represent one of many schisms found in psychology (Staats, 1983). Proponents of these two approaches tend to be interested in traits or behavior (Fiske, 1979). In one world spins traditional measurement, with its emphasis on traits, "real, relatively stable differences between individuals in behavior, interests, preferences, perceptions, and beliefs" (Murphy & Davidshofer, 1988, p. 17). In another is behavioral assessment, with its emphasis on psychological states as influenced by the environment. Rarely do the literatures of these worlds mix.

The tradition started by Watson focused on behavior. Behaviorists maintained that a psychological phenomenon was not real unless it could be directly observed. Given such assumptions, there was little need for measurement concepts: whatever could be operationally defined (in the laboratory) or observed (in the clinic), existed. On the other hand, the psychological phenomena of interest to the mental testers were almost always assumed to be unobservable: traits were latent. No single behavior could constitute intelligence, for example, but intelligence formed an abstract, useful construct that could explain clusters of behavior. Intelligence itself was not directly observable, and to this day the most advanced theories of mental testing discuss intelligence in terms of latent traits.

Some initial integration of the two positions has occurred (e.g., Silva, 1993), but the process remains incomplete. Behavioral assessors, for example, have more recently observed that fear is a construct. That is, fear itself cannot be directly observed, but the construct may be useful for certain purposes. In the clinic, many effective behavioral techniques have been developed to alleviate patients' fears. In the course of applying those techniques, however, assessors have observed a desynchrony between different indices of fear. At the conclusion of treatment, it is common for a snake phobic to be able to approach a snake and verbally report no fear, but still possess an accelerated heart rate. Unless we define each phenomenon by its measurement mode, we must resort to the use of a construct to make sense of this situation. As described in Chapter 2, Rachman and Hodgson (1974) cited Lang's (1968) definition of fear as a construct composed of "loosely coupled components" (Rachman & Hodgson, 1974, p. 311) that may co-vary or vary independently. Because of instances like these, behavioral assessors have become interested in applying traditional measurement concepts in their domain (e.g., Hartman, 1984).

As previously defined, traits are enduring, stable psychological characteristics of individuals. In the context of this chapter, traits are assumed to be stable across situations. Thus, persons described as honest are expected to display honest behavior regardless of the situations they find themselves in. Figure 14 displays examples of cross-situational consistency and inconsistency. For example, individuals who score low on a test of honesty may behave dishonestly in classrooms and stores, while more honest individuals behave honestly in those settings. In religious situations, however, both high and low honesty individuals may behave honestly. Honest behavior in this case is situation specific.

----------------------------------------
Insert Figure 14 About Here
----------------------------------------

Use of the term trait implies that enough cross-situational stability occurs so that "useful statements about individual behavior can be made without having to specify the eliciting situations" (Epstein, 1979, p. 1122). Similarly, Campbell and Fiske (1959) stated that "any conceptual formulation of trait will usually include implicitly the proposition that this trait is a response tendency which can be observed under more than one experimental condition" (p. 100).

The assumption that psychological phenomena are traits has long guided psychological measurement, and the continued faith in trait measurement has been maintained at least partially by its success. For example, most measurement in vocational psychology is guided by trait-and-factor theory. Typically the tests employed by vocational psychologists measure interests, also assumed to be stable. Test-takers' interest scores are compared with the scores of successful workers in all occupations to determine the best fit. The Theory of Work Adjustment (Dawis & Lofquist, 1984; Lofquist & Dawis, 1969) proposes that to maximize worker satisfaction and production, individuals' abilities and needs must be matched with job requirements and reinforcers. Abilities, needs, job requirements, and reinforcers are assumed to be relatively stable. Yet vocational counselors have also been influenced by a developmental concept, that of vocational self-concept crystallization (Barrett & Tinsley, 1977). Developmental theorists have suggested that vocational identities--beliefs about one's abilities and needs--tend to grow and shift until they crystallize, typically around age 18. That is, one's vocational identity becomes a trait, presumably stable for the remainder of one's life. A similar idea, the differentiation hypothesis (Anastasi, 1985), has been proposed to account for the emergence in individuals of group factors of intelligence.

As noted in Chapter 2, traits were first employed with the concept of intelligence, a phenomenon assumed to be transmitted through heredity and immune to environmental influences. Although measures of intelligence predicted school performance better than physiological measures, these mental tasks still fell considerably short of the mark of total prediction. If intelligence tasks fail, where is the next logical place to look for variables to assist in prediction? Personality and temperament were psychologists' answer to that question.

Developers of personality and temperament tests closely copied the assumptions and procedures of intelligence tests (Danziger, 1990). With both intelligence and personality tests, scaling involved aggregation, that is, a total score was obtained by adding the total number of correctly performed tasks or endorsed items. Most importantly, personality, like intelligence, was assumed to be consistent across persons and independent of environments.

The Controversy of Mischel and Peterson

But trait-based tests are not entirely consistent. Loevinger (1957) wrote that "Circumstances contrive to keep behavior largely unpredictable, however constant its propensities" (pp. 688-689). The importance of environment was emphasized in 1968 with the publication of Mischel's Personality and Assessment and Peterson's Clinical Study of Social Behavior. These books ignited a controversy among measurement and personality theorists that continues to smolder.

Mischel (1968) contended that personality constructs were unstable, that is, the influence of traits was relatively small compared to the influence of situations or environments. He reviewed findings of measurement studies and proceeded to criticize personality psychologists for failing to account for environmental factors when measuring traits: "What people do in all situations and on all tests can be affected, often quite readily, by many stimulus conditions and can be modified substantially by numerous environmental manipulations" (p. 10). Mischel favored measuring behavior in specific situations as opposed to measuring signs of underlying mental processes that could presumably predict future behavior. Mischel believed that new theories of measurement needed to be developed that could account for human adaptability, perception, cognition, self-regulation and self-modification.

Peterson (1968, p. 23) sounded a similar theme when he stated that research had "suggested very strongly that traditional conceptions of personality as internal behavior dispositions were inadequate and insufficient" because of the influence of situations. Peterson reviewed studies supporting this position in a number of areas, including research documenting the effects of examiners on the behavior of individuals taking projective devices. Whether Rorschach examiners were friendly or distant, Peterson noted, influenced the number and types of responses produced by examinees. Discussing personality assessment in the context of clinical applications, Peterson concluded that because "strong positive evidence for validity and utility is nowhere to be seen...it looks as if entirely new approaches to the clinical study of behavior will have to be developed" (p. 3).

Mischel's and Peterson's publications prompted many psychologists to re-examine trait-based measurement approaches. The violation of the expectation of personality consistency produced three major responses. Traditional theorists sought out more evidence for the consistency of traits. In contrast, some psychologists came to reject intrapsychic traits entirely. Most mainstream psychologists did not follow this extreme direction, however, preferring to search for explanations in the interaction between psychological traits and environments.

Reinforcing the trait argument. Some contemporary psychologists consider the attack to be repulsed and the battle won (e.g., Block, 1977; Epstein, 1990; Goldberg, 1993). For example, Anastasi (1985) stated that "the long-standing controversy between situational specificity and personality traits has been largely resolved" (p. 134). Anastasi's solution was to redefine traits as repositories of behavioral consistencies. Traits so defined are not causes, but simply convenient descriptions of psychological regularities that occur and may be influenced by environmental contexts.

Based on studies employing a variety of research methodologies and samples, personality researchers have become increasingly confident that long-term stability of personality traits exists. West and Graziano (1989) concluded that research studies have demonstrated substantial long-term stability of personality in children and adults. They also noted, however, that stability declines across longer measurement intervals, is lower in children, and depends on the particular traits measured. Moreover, predictions of personality from one time point to another typically account for only about 25% of the variance, leaving considerable room for environmental and person-environment influences. Examining the stability of vocational interests, Swanson and Hansen (1988; see also Campbell, 1971) found similar results: although individual variability and environmental influences existed, trait stability could be demonstrated over time. Similarly, Staw and Ross (1985) studied 5,000 middle-aged men and found that job satisfaction remained stable even when employees changed jobs and occupations. In a laboratory study with 140 undergraduates, Funder and Colvin (1991) found behavioral consistency across laboratory and real-life settings, although consistency varied by type of behavior.

Epstein (1979, 1980) proposed that trait inconsistency results from insufficient aggregation of measurement observations. For example, one can aggregate 30 test items into a total score, and this total score is likely to predict criteria better than any one of the individual items. Similarly, one can aggregate scores across different measurement occasions. While acknowledging evidence that behavior changes as a result of situational variables, Epstein (1979) reviewed research that found that aggregating psychological measurements results in a substantial increase in validity coefficients. In terms of classical test theory, aggregation works because behavioral consistencies accumulate over multiple measurements while random errors do not (Rushton, Jackson, & Paunonen, 1981). Epstein (1979) also conducted a series of studies that found that through aggregation, intercorrelations of measures of behavior, self-reports, and ratings by others could exceed the typical .30 ceiling.
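The classical test theory account of aggregation can be illustrated with a small simulation. The sketch below is not Epstein's procedure or data; it assumes a hypothetical population in which every observation equals a stable true score plus independent random error, and it compares how well a single item and a 30-item total predict a criterion built the same way. The function names, sample size, and error variance are illustrative assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_aggregation(n_persons=2000, n_items=30, error_sd=2.0, seed=1):
    """Return (r for one item, r for the aggregated total) against a criterion.

    Assumption: each item and the criterion equal the same true score plus
    independent normal error, as in classical test theory.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, 1) for _ in range(n_persons)]
    criterion = [t + rng.gauss(0, error_sd) for t in true_scores]
    items = [[t + rng.gauss(0, error_sd) for _ in range(n_items)]
             for t in true_scores]
    single_item = [row[0] for row in items]
    total_score = [sum(row) for row in items]  # errors partly cancel here
    return pearson(single_item, criterion), pearson(total_score, criterion)


r_single, r_total = simulate_aggregation()
print(f"single item r = {r_single:.2f}; 30-item total r = {r_total:.2f}")
```

Under these assumptions the total score should correlate more strongly with the criterion than any single item does, because the shared true score accumulates across items while the independent errors tend to average out. The size of the gain depends entirely on the assumed error variance and number of items.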

Most contemporary psychologists view the scores produced on intelligence tests as stable and as indicative of latent traits that operate across environments. For example, Schmidt and Hunter's (1977) work on validity generalization indicated that for general classes of occupational groups, tests of cognitive abilities may have validity across a wide variety of situations. Their research demonstrated that a significant portion of the variability among validity coefficients reported in the literature results from methodological problems such as small sample size, criterion unreliability and scale range restrictions. When these sources of error are removed, cognitive tests have relatively stable validity within occupational groups. This work complements the position of Mischel (1968), who found that cross-situational consistency existed for behavioral correlates of cognitive abilities. However, validity generalization proponents' claim of negligible variation over sites has not been universally accepted (Tenopyr, 1992; also see Cronbach's [1991a] analysis of Hedges' [1987] data), and the question of situational interactions with cognitive abilities is probably not as closed as many psychologists consider it to be.
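The logic of attributing variability among validity coefficients to small samples can be sketched with a "bare-bones" calculation that considers sampling error only. This is a simplified illustration, not a reproduction of Schmidt and Hunter's (1977) full procedure, which also addresses criterion unreliability and range restriction. The study values below are invented for illustration, and the function name is an assumption.

```python
def bare_bones_summary(studies):
    """Compare observed variance in validity coefficients with the variance
    expected from sampling error alone.

    studies: list of (validity coefficient r, sample size n) pairs.
    """
    total_n = sum(n for _, n in studies)
    # Sample-size-weighted mean validity
    mean_r = sum(r * n for r, n in studies) / total_n
    # Sample-size-weighted observed variance of the coefficients
    observed_var = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    # Expected sampling-error variance, using the average sample size
    mean_n = total_n / len(studies)
    sampling_var = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    # Variance left over after sampling error is removed (floored at zero)
    residual_var = max(observed_var - sampling_var, 0.0)
    return {
        "mean_r": mean_r,
        "observed_var": observed_var,
        "sampling_var": sampling_var,
        "residual_var": residual_var,
    }


# Hypothetical validity studies: (r, n). These are NOT real data.
hypothetical_studies = [(0.10, 40), (0.35, 60), (0.22, 50), (0.41, 45), (0.18, 55)]
summary = bare_bones_summary(hypothetical_studies)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

When the residual variance is near zero, the apparent differences among sites are no larger than small samples alone would be expected to produce. A sizable residual would instead leave room for genuine situational effects, which is the point at issue in the criticisms cited above.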

The rejection of traits: Behavioral assessment. Behaviorists emphasized the dominance of environmental reinforcement in shaping individuals' behavior, be it motor or verbal. In contrast with traditional psychological measurement, behavioral assessors are interested in measuring individuals' past learning histories and current environmental influences (Nelson & Hayes, 1986). Behavioral assessors observe behavior in natural or contrived settings and attend to stimuli, behavioral responses, and the consequences of those responses.

The processes, assumptions and procedures of behavioral assessment differ from traditional measurement. Hartman (1984) emphasized that behavioral assessment is direct, repeated and idiographic. Assessment is direct in that the psychologist measures observable behavior. Any observed behavior is considered to be a sample of potential behavior, as opposed to a sign of an underlying, unobservable trait (cf. Goodenough, 1950, cited in Cronbach & Meehl, 1955). Behavior is measured repeatedly for the purpose of demonstrating relative stability before intervention and change after intervention, thus demonstrating that the intervention is the cause of the behavioral change. Assessment may consist of continuous recording of behavior (when only a few behaviors occur) or some type of time sampling. With the exception of areas driven by accountability concerns (e.g., psychiatric inpatients), non-behavioral psychologists typically do little or no formal measurement during the intervention process.

Behavioral psychologists assess idiographic variables, that is, those unique to the individual in question, such as a B-A-C mode sequence. Cone (1988) argued that nomothetic, trait-based measurements produce data remote from single cases. He suggested that idiographic instruments will be more sensitive to individual behavior change. In this context, idiographic measures are criterion-referenced (i.e., scores are compared to some absolute measure of behavior), while nomothetic measures are norm-referenced (i.e., scores are compared among individuals). Norm-referenced tests are constructed to maximize variability among individuals (Swezey, 1981). However, items that measure behaviors infrequently performed by the population are unlikely to be included in norm-referenced tests. Jackson (1970), for example, suggested that items checked by less than 20% of the test development sample be dropped because they will not contribute to total score variability. Yet those infrequent, idiographically relevant items may be the very ones of interest to change agents and to theorists.

Even the label behaviorists employ differs from traditional measurement. Behavioral assessment, rather than behavioral measurement, is the term employed to describe these measurement approaches because (a) psychologists (and to some extent, clients) typically perform the measurement in a clinical setting, in conjunction with behavioral interventions, and (b) psychologists gather different types of measurement data and integrate them in an assessment.

In contrast with traditional psychological measurement, where anyone can be a self- or other-observer if enough measurements are gathered to decrease measurement error, behavioral assessment involves training observers. Training consists of learning an observation manual (containing definitions of relevant behavior and scoring procedures), conducting analogue observations, on-site practice, retraining and debriefing (Hartman, 1984; Nay, 1979; Paul, 1986). Hartman (1984) noted that research has indicated that more accurate observational skills have been associated with older persons, women, and greater levels of social skills, intelligence, motivation, and attention to detail.

Behavioral assessors often express ambivalence about the utility of traditional psychometric analyses. For example, behavioral assessors have begun to attend to validity estimates of what are called higher-order variables, such as anxiety and fear, that are likely to have more than one factor influencing their scores. Whether measures of eye contact, voice volume, and facial expression all relate to a client's complaint about shyness (Kazdin, 1985), for example, can be framed as a question of construct validity. Yet Cone (1988) stated that:

Construct validity will be of no concern to behavioral assessors, in one sense since constructs are not the subject of interest, behavior is; in another sense, behavior can be seen as a construct itself, in which case the instrument will have construct validity to the extent that it 'makes sense' in terms of the behavior as the client and the assessor understand it. (p. 59)

Cone (1988) also questioned the importance of discriminant validity, saying that it "is not relevant to an assessment enterprise that is built on the accuracy of its instruments. By definition, an accurate instrument taps the behavior of interest and not something else" (p. 61).

Cone (1988) and Pervin (1984) indicated that additional theoretical and psychometric criteria need to be established for behavioral assessment. For example, Cone (1988) proposed that a behavioral measure, to be considered accurate, must be able to (a) detect the occurrence of a behavior; (b) detect a behavior's repeated occurrence; (c) detect its occurrence in more than one setting; and (d) have parallel forms that allow detection of co-variation to demonstrate that the behavior can be detected independent of any particular method. Cone also observed that no guidelines currently exist for selecting dimensions relevant to particular clients or for developing instruments to assess these dimensions; interestingly, both of these criteria are strengths of nomothetic approaches (cf. Buss & Craik, 1985). Cone proposed that such guidelines include: (a) determining the environmental context of the problem; (b) determining how other people (or models) cope with the problem in that environment; and (c) constructing a template of those effective behaviors to match against the clients' current repertoire. Such a template could be used to guide therapy and as a gauge of therapy's effectiveness.

Person-environment interactions. Although the importance of person-environment interactions has been recognized for some time (Kantor, 1924; Lewin, 1935; Murray, 1938, cited in McFall & McDonel, 1986), interest has surged in the past two decades. Here behavior and environment are viewed as a feedback loop in which each factor influences the other (Magnusson & Endler, 1977). Instead of focusing on persons or situations, behavior must be measured in context, as a process that occurs in a steady stream. Bowers (1973) noted that from an interactionist perspective, individuals influence their environments as much as their environments influence them. To a significant extent, people create their own environments to inhabit (Wachtel, 1973; Bandura, 1986).

Bowers (1973) approached interaction from the perspective of Piaget's concepts of assimilation and accommodation. Individuals assimilate observations from the environment into pre-existing cognitive schemas. At the same time, those schemas are modified to accommodate new information in the environment. Bowers (1973) stated that "the situation is a function of the observer in the sense that the observer's cognitive schemas filter and organize the environment in a fashion that makes it impossible ever to completely separate the environment from the person observing it" (p. 328).

Most interactionists assume that cognition mediates the perception of the environment. This is important because it means that behavior that appears inconsistent may actually be indicative of a single construct. For example, Magnusson and Endler (1977) observed that high anxiety may motivate a person to speak excessively in one situation and withdraw in another. The behaviors differ, but the causal construct (anxiety) is the same across situations. As shown in Figure 15, the relation between behavior and construct may be nonlinear. Thus, anxiety may motivate an individual to increase the amount of talking until it reaches a threshold where the individual begins to decrease speech and finally withdraws.

----------------------------------------
Insert Figure 15 About Here
----------------------------------------
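A nonlinear relation of this kind has a practical measurement consequence: a linear index such as the correlation coefficient can miss it entirely. The sketch below assumes a purely hypothetical inverted-U function linking anxiety to amount of talking; the function, its peak, and its scale are invented for illustration and are not taken from Magnusson and Endler (1977) or from Figure 15.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def talking_amount(anxiety, peak=5.0):
    """Hypothetical inverted-U: talking rises with anxiety up to `peak`,
    then falls until the person withdraws (zero talking)."""
    return max(0.0, 25.0 - (anxiety - peak) ** 2)


# Anxiety levels from 0 to 10 in steps of 0.5 (arbitrary units)
anxiety_levels = [i * 0.5 for i in range(21)]
talk = [talking_amount(a) for a in anxiety_levels]

r_linear = pearson(anxiety_levels, talk)
print(f"linear correlation between anxiety and talking: {r_linear:.3f}")
```

In this constructed case talking is completely determined by anxiety, yet the linear correlation is essentially zero because the rising and falling halves of the curve cancel. An observer relying on the correlation alone would conclude, wrongly, that the construct and the behavior are unrelated.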

Magnusson and Endler (1977) discussed this type of consistency using the term coherence. They suggested that coherent behavior can be understood in terms of the interaction between an individual's perception of a situation and the individual's disposition to react in a consistent manner to such perceived situations. The factors that influence this interaction, such as intelligence, skills, learning history, interests, attitudes, needs and values, may be quite stable within individuals. As shown in Figure 16, individuals C and D, who score highly on a test of honesty, may show more honest behavior across two situations than individuals A and B who obtain low scores. However, C and D may also display differences between themselves in honest behavior across situations--perhaps because of slight differences in their perceptions of those situations--even though their mean behavior score is the same across situations. From the perspective of the individual, the behavior appears coherent. From the perspective of the observer who looks only at group differences, the behavior appears consistent. From the perspective of the observer who looks at individuals across situations, the behavior appears inconsistent.

----------------------------------------
Insert Figure 16 About Here
----------------------------------------

Appropriate techniques for measuring and analyzing the processes suggested by interactionist theory remain in dispute (Golding, 1975; McFall & McDonel, 1986; Walsh & Betz, 1985). For example, McFall and McDonel (1986) stated that (a) ANOVA procedures which examine statistical interactions fail to investigate the theoretically central question of how person-situation variables interact over time; (b) investigators can easily manipulate experiments to show the relative importance of person, situation, or interaction factors; and (c) problems of scale remain, that is, no framework exists for how to determine the meaning of different units or chunks of the person-situation process. Bowers (1973) maintained that a rigid adherence to research methodologies has obscured the interactionist perspective. Experimental methods help investigators primarily understand the influence of situations, and correlational methods assist in the understanding of person differences.

Aptitude-by-treatment interactions (ATIs). Treatments can be conceptualized as types of situations (Cronbach, 1975; Cronbach & Snow, 1977). In a study where an experimental group is contrasted with a control group, both groups are experiencing different types of situations. Persons can also be conceptualized as having aptitudes, that is, individual characteristics that affect response to treatments (Cronbach, 1975). As shown in Figures 17 and 18, in an ATI study researchers attempt to identify important individual differences that would facilitate or hinder the usefulness of various treatments (Snow, 1991). From a common sense perspective, ATIs should be plentiful in the real world. From the perspective of selection, intervention, and theoretical research, finding ATIs would seem to be of the utmost importance.

----------------------------------------
Insert Figures 17 and 18 About Here
----------------------------------------

ATIs were Cronbach's answer to the problem of unifying correlational and experimental psychology (Snow & Wiley, 1991). Cronbach (1957; 1975) noted that the battle over the relative dominance of traits versus environment was maintained by ignoring the possibilities of interactions. For example, Cronbach (1975) reported a study by Domino (1971) which investigated the effects of an interaction between learning environment and student learning style on course performance. Domino hypothesized that students who learn best by setting their own assignments and tasks (independent learners) might show the best outcomes in a class when paired with teachers who provided considerable independence. Similarly, students who learn best when provided with assignments by the teacher (achievement through conformity) might perform better when paired with instructors who pressed for conformity (e.g., teachers stressed their own requirements). Domino did find empirical support for this interaction.
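The pattern Domino hypothesized is a crossover interaction in a 2 x 2 design, and its defining feature can be expressed as a difference of differences among cell means. The sketch below uses invented course scores, not Domino's (1971) data, and the labels and function name are illustrative assumptions.

```python
def cell_mean(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


def interaction_contrast(cells, aptitudes, treatments):
    """Difference of differences for a 2 x 2 aptitude-by-treatment design.

    A value near zero means the treatment effect is about the same for both
    aptitude groups; a large value signals an interaction.
    """
    a1, a2 = aptitudes
    t1, t2 = treatments
    effect_in_a1 = cell_mean(cells[(a1, t1)]) - cell_mean(cells[(a1, t2)])
    effect_in_a2 = cell_mean(cells[(a2, t1)]) - cell_mean(cells[(a2, t2)])
    return effect_in_a1 - effect_in_a2


# Hypothetical course scores, keyed by (learning style, teaching style).
# These numbers are invented to show a crossover pattern.
hypothetical_cells = {
    ("independent", "independence"): [82, 85, 88],
    ("independent", "conformity"): [74, 76, 75],
    ("conforming", "independence"): [70, 73, 72],
    ("conforming", "conformity"): [80, 84, 82],
}

contrast = interaction_contrast(
    hypothetical_cells,
    aptitudes=("independent", "conforming"),
    treatments=("independence", "conformity"),
)
print(f"interaction contrast = {contrast:.2f}")
```

In these invented data each learning style does better under the matching teaching style, so the treatment effect reverses sign across aptitude groups and the contrast is large. A formal test would also need the within-cell variability, which this sketch omits.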

But Cronbach (1975) and others (cf. Fiske, 1979; McFall & McDonel, 1986; Scriven, 1969) have largely abandoned the search for general laws via ATIs in favor of local, descriptive observations. Cronbach (1975) noted that results supporting ATIs are inconsistent, often disappearing when attempts to replicate occur. He saw time and history as the major culprits: many psychological phenomena change over time, frustrating attempts to fix them in terms of general laws. Cronbach (1975) indicated that trait conceptions do not hold to the extent necessary to demonstrate consistent ATIs. This may also be interpreted as support for the position that situations have stronger effects than traits or trait-situation interactions.

I conducted unpublished research that demonstrates the difficulty of finding ATIs. I investigated whether college students' comfort with computer use interacted with one of two interventions designed for alcohol education. The interventions were a computer-assisted instruction (CAI) program for alcohol education and a set of written materials on which the CAI program was based. The interaction hypothesis suggested that students most comfortable with computers and who completed the CAI program would demonstrate the greatest improvement. Students were pretested and posttested on measures of alcohol attitudes, knowledge about alcohol, and alcohol consumption. Given the resources necessary to run the interventions, data were collected over a period of several years, with preliminary analyses occasionally conducted. Analyses at various stages of data collection found the expected interactions, but when the n per cell reached about 15, only an alcohol attitudes scale showed a significant interaction. Given these results and a change in my access to a major source of research subjects, the project was suspended.

Environmental assessment. Instead of searching for stability in individuals, another group of theorists and researchers sought to find consistency in environments and situations. Attempts to categorize and measure environments form the essence of this measurement approach (Conyne & Clack, 1981; Walsh, 1973).

One of the major tasks of environmental assessment is an analysis of environment types, and many classification systems have been proposed to accomplish this task (cf. Goodstein, 1978; Huebner, 1979; Steele, 1973). Conyne and Clack (1981) proposed that an environment consists of physical, social, institutional and ecological-climate components that shape and are shaped by people. Moos (1973) classified human environments into ecology, behavior setting, organizational structure, inhabitants' behavior and characteristics, psychosocial climate, and functional reinforcements (i.e., environmental stimuli).

Many vocational psychologists hold a similar assumption about the relative stability of work environments. An occupational setting may attract certain types of individuals on the basis of the setting's fit with the needs and abilities of the worker. One of the best-known and most thoroughly researched occupational classifications has been proposed by Holland (1959, 1985). He suggested that work environments may be classified as involving one or more of the following dimensions:

(a) Realistic environments, where work entails mechanical skill, physical strength, motor coordination, and concrete problems;

(b) Investigative environments, with an emphasis on research activities, scientific accomplishments, mathematics ability, and abstract problems;

(c) Artistic environments, involving artistic activities and competencies, and an emphasis on expressive, original, and independent behavior;

(d) Social environments, where work involves social interactions, liking others, cooperation, and help-giving;

(e) Enterprising environments, involving selling and leading activities, self-confidence, aggressiveness, and status; and

(f) Conventional environments, involving recording and organizing records and data, conformity, and dependability.

Holland (1985) believed, however, that individuals' characteristics may change the climate of the work setting. The most important variable in this regard is the extent to which an individual's needs and abilities are congruent with the work environment in which that individual works. Very incongruent individuals leave environments, while moderately incongruent individuals will change, moving toward the dominant persons in the environment.

As shown in Table 7, most person-work environment fit theories suggest that the degree to which individuals fit their work environment determines their level of productivity and job satisfaction. Thus, Realistic individuals working in Realistic occupations will be most productive and satisfied, Investigative individuals in Investigative occupations, and so on. Holland's theory provides for similarity of occupational types (e.g., Investigative and Artistic occupations are more similar than Investigative and Conventional) so that different fits may be rank ordered in terms of their degree of expected productivity and satisfaction.

-------------------------------------------
Insert Table 7 About Here
-------------------------------------------

A crucial question in environmental assessment is whether to classify environments or perceptions of environments. The person many consider the founder of person-environment interaction, Kurt Lewin, continued a Gestalt perspective in which behavior was believed to occur in the context of an individual's total perceptual field of an environment (Lewin, 1951). That is, people are surrounded by a self-generated psychological environment and a non-psychological environment. As in other person-environment theories, cognition has been proposed as a significant mediator of how environmental events are perceived, understood and transformed by individuals (Conyne & Clack, 1981; Bandura, 1986). From this perspective, an understanding of how an individual thinks about a situation is necessary for a person-environment analysis. For example, McCall (1991) observed that after completing a Marine Corps confidence course, recruits may view the course as a confidence builder or as intimidating. Some theorists had hoped that cognition might prove to be stable across situations, but research results have not been supportive. For example, attributional style--characteristic ways individuals explain and interpret life events (Fiske & Taylor, 1984)--appears to possess little consistency across situations (Bagby, Atkinson, Dickens, & Gavin, 1990; Cutrona, Russell, & Jones, 1984).

What kinds of measurements are undertaken in environmental assessment? Conyne and Clack (1981) provided several examples. Cognitive maps are spatial representations of individuals' psychological environments. Conyne and Clack described a researcher who instructed students to plot where in their neighborhoods they felt high and low stress. The resulting map helped to explain truancy by showing that a city school bus route stopped at many high stress areas where students were afraid of being physically attacked. Geographic maps can be used to locate individuals with psychological characteristics and events (e.g., academic achievement, depression) to examine potential relationships between environment and person. However, many environmental assessment procedures consist only of self-report questionnaires that ask respondents to rate environments along different theoretical dimensions. In Moos' (1979a, 1979b) social climate scales, for example, individuals rate such environments as their college residence hall, classroom, family, and work along such dimensions as relationships, personal growth, and system maintenance and change. Similarly, vocational psychologists typically measure environments by assessing (via self-reports) the interests and abilities of persons successful in specific occupations.

Moderators of cross-situational consistency. Moderator variables are those that change the nature of the relation between two other variables. For example, one may propose that investigative individuals would be most productive and satisfied in an academic or scientific environment as compared to other occupational situations. However, one might find that other variables, such as ethnicity or gender, affect that relation. Female and male academics might produce equal numbers of publications, but females might also experience less job satisfaction.
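
One common way to formalize a moderator, offered here as an illustrative sketch and not as the model used in the studies cited, is a regression with a product term. The symbols are assumptions introduced for illustration: S is job satisfaction, F is the degree of person-environment fit, G is a dichotomous moderator such as gender (coded 0 or 1), and e is error.

```latex
S = b_0 + b_1 F + b_2 G + b_3 (F \times G) + e
```

Under this sketch, the slope relating fit to satisfaction is b_1 when G = 0 and b_1 + b_3 when G = 1. A nonzero b_3 therefore indicates that the fit-satisfaction relation itself depends on the moderator, whereas b_2 alone reflects only a difference in average satisfaction.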

A variety of potential moderating variables have been proposed and investigated. Research has suggested differences in consistency by response mode and by levels of aggregated measures (Diener & Larsen, 1984; Epstein, 1983; Mischel & Peake, 1982; Rushton, Brainerd, & Pressley, 1983). For example, Violato and Travis (1988) found that male elementary school students demonstrated more cross-situational consistency on the variable of behavioral persistence. Similarly, Connell and Thompson (1986) found infants' emotional reactions were more consistent across time than their social behavior. Variables such as age (Stattin, 1984), gender (Forzi, 1984), socioeconomic status and cognitive abilities (Violato & Travis, 1988) have also been found to moderate cross-situational consistency.

Bem and Allen (1974; also see Diener & Larsen, 1984; Lanning, 1988; Zuckerman et al., 1988) suggested that individuals themselves are moderators of cross-situational consistency. That is, some persons may act consistently across situations, and others may not; cross-situational consistency could be considered an individual difference variable. Thus, in person A, the trait of honesty is manifested across all situations; with person B, honesty occurs only at church; and person C exhibits little honesty in any situation. Investigators who conduct studies averaging these individuals would find no support for cross-situational consistency, but a disaggregation of the data might demonstrate such consistency for pairings of similar individuals such as A and C. Bem and Allen (1974) found that students' ratings of their cross-situational consistencies often did match their behaviors in different situations. For example, students who said they were friendly across situations did show more consistency. However, more recent research has provided mixed support for this position. Chaplin and Goldberg (1984) failed to replicate Bem and Allen's results, while Burke, Kraut and Dworkin (1984) found that subjects' ratings of personal traits, cross-situational consistency, and the traits' importance to their self-schemas were highly correlated. Greaner and Penner (1982) found a low reliability estimate for the 1-item consistency measure previously employed by Bem and Allen.
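
The logic of disaggregating persons A, B, and C can be sketched in a few lines of Python. Everything in the sketch is hypothetical: the ratings are invented, the situation labels and the cutoff are arbitrary, and the spread of a person's scores across situations is only one simple consistency index, not the measure Bem and Allen themselves used.

```python
from statistics import pstdev

# Hypothetical honesty ratings (0-10) in four situations. The values are
# invented for illustration and do not come from any study cited here.
SITUATIONS = ["church", "work", "home", "school"]
ratings = {
    "A": [9, 9, 8, 9],  # honest in every situation
    "B": [9, 2, 3, 2],  # honest only at church
    "C": [1, 2, 1, 1],  # little honesty in any situation
}


def consistency_index(scores):
    """Return the spread of one person's scores across situations.

    A smaller population standard deviation means more cross-situational
    consistency. This is an assumed, simplified index.
    """
    return pstdev(scores)


def split_by_consistency(data, cutoff):
    """Partition persons into consistent and inconsistent groups."""
    consistent = sorted(p for p, s in data.items() if consistency_index(s) <= cutoff)
    inconsistent = sorted(p for p, s in data.items() if consistency_index(s) > cutoff)
    return consistent, inconsistent


# The cutoff of 1.0 is arbitrary and chosen only for this toy data set.
consistent, inconsistent = split_by_consistency(ratings, cutoff=1.0)
print("consistent:", consistent)
print("inconsistent:", inconsistent)
```

In this toy data set, persons A and C fall in the consistent group even though their levels of honesty differ sharply, while person B, whose honesty depends on the situation, does not. Averaging all three together would hide that distinction.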

States and traits. Psychologists have also suggested that psychological phenomena may be manifested through traits and states. States are transitory psychological phenomena that change because of psychological, developmental, or situational causes. Spielberger (1991) credited Cattell and Scheier (1961) with introducing the state-trait distinction. Even theorists interested in measuring traits acknowledge the presence of state effects in psychological testing. Lord and Novick (1968), for example, observed that in mental testing "we can perhaps repeat a measurement once or twice, but if we attempt further repetitions, the examinee's response changes substantially because of fatigue or practice effects" (p. 13).

State constructs have occasionally been invoked to explain the inconsistencies found in psychological measures. For example, Matarazzo suggested that the MMPI measures states, not traits, and that it "reflects how you're feeling today, or how you want to present yourself that day" (Bales, 1990, p. 7). Similarly, Dahlstrom (1969) indicated that MMPI scales' reliability should not be assessed through test-retest methods since "there is scarcely any scale on the MMPI for which this general assumption [of temporal stability] is tenable for any period of time longer than a day or two" (p. 27). Even intelligence may possess state properties: IQ test scores have been shown to be affected by amount of schooling (Ceci, 1991; see also Bandura, 1991, and Frederiksen, 1986). Retests of intelligence at one-year intervals show high stability, but stability decreases substantially when the retest interval lengthens beyond that period (Humphreys, 1992).

Although trait conceptions have historically dominated measurement, psychologists have always recognized states and struggled to integrate them into measurement theory and practice. Loevinger (1957), for example, reviewed Fiske and Rice's (1955) theory for explaining intraindividual response variation. Fiske and Rice suggested that individuals will change their response to an item because:

(a) something changes within the individual (e.g., the individual matures),

(b) the order of item presentation changes, or

(c) the stimulus situation changes.

Fatigue, for example, is a physiological variable that might produce inconsistent responding on psychological tests. Fiske and Rice (1955) suggested that such variability is lawful, although Loevinger (1957) noted that this belief is at variance with the classical test theory assumption that these errors are random. Loevinger also reviewed studies which demonstrated, over retests, improvements in personality functioning and intelligence. Such improvements, Loevinger suggested, are a function of practice effects and learning by the organism. In other words, psychological phenomena treated by traditional measurement approaches as traits may also demonstrate state effects.
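
The classical test theory assumption at issue can be stated compactly. The notation below is the conventional one and is supplied here only for illustration; it is not drawn from Fiske and Rice (1955) or Loevinger (1957). X is an observed score, T the true score, and E the error component.

```latex
X = T + E, \qquad \mathbb{E}(E) = 0, \qquad \operatorname{Cov}(T, E) = 0
```

If fatigue or practice shifts scores in a systematic direction over retests, that shift is not the kind of zero-mean, uncorrelated error the model describes. This is the sense in which lawful variability conflicts with the assumption of random error.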

Perhaps the most well-known state-trait measure is the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970). The STAI consists of two 20-item Likert scales to measure state anxiety (i.e., situation-specific, temporary feelings of worry and tension) and trait anxiety (i.e., a more permanent and generalized feeling). Both scales contain items with similar and overlapping content: state scale items include "I am tense," "I feel upset," and "I feel content," while trait scale items include "I feel nervous and restless," "I feel secure," and "I am content." However, the state scale asks test-takers to rate the items according to how they feel "at this moment" while the trait scale requests the ratings to reflect how the test-takers "generally" feel. The instructions do seem to produce the desired difference: test-retest reliabilities for the state scale, for example, are considerably lower than for the trait. Spielberger (1991) also developed the State-Trait Anger Expression Inventory. Spielberger described state anger as a condition that varies over time as a result of such factors as perceived injustice and frustration. He distinguished state anger from trait anger, the latter being a disposition to perceive many situations as frustrating or annoying and to respond in those situations with state anger. Thus, the two constructs are related: persons high in trait anger will experience more frequent state anger.

Process research. Psychotherapy researchers often distinguish between outcome research, designed to test the efficacy of various interventions, and process research, designed to detect variables that change during the intervention. Process research typically consists of single-subject, within-subject, or between-subject designs that assess such variables as counselor verbal behavior and aspects of the client-counselor relationship (Heppner et al., 1992).

Process research focuses on changes in the counselor, client, or counselor-client relationship within (e.g., treatment sessions) and across (e.g., different types of treatments) situations. The hope is that these changes will be related to intervention outcome, although it has been difficult to demonstrate a strong process-outcome link (Elliott et al., 1987; Heppner et al., 1992; Hill et al., 1988). The failure to show such a relation may partially result from researchers' emphasis on studying counselor variables to the exclusion of client variables (Heppner et al., 1992; Hill, 1982). Heppner et al. (1992) observed that process research typically assumes the client to be a passive agent instead of an active information processor.

Measurement of process variables is performed by trained raters who assess segments or all of the counseling sessions. How much to measure has been one of the central questions of process research, and it appears that the purpose of the research can suggest answers (Friedlander et al., 1988; Heppner et al., 1992). For example, if the researcher is interested in process variables that apply across groups of counselors and clients, small segments of sessions (totalling as little as 10% of the session) have been found to be representative of the process in groups. If the researcher is interested in a single case, however, it appears necessary to sample entire sessions.

Summary and Implications

Situational influences on human behavior have been the most noticed sources of trait inconsistency. In response, psychologists have proposed:

(a) reinforcing arguments that traits exist, as in work on aggregation;

(b) rejecting traits, as in behavioral assessment;

(c) the existence of person-environment interactions, where traits and environments influence each other in a continuous system;

(d) aptitude-by-treatment interactions, where aptitudes interact with environments, the latter conceptualized in terms of treatments;

(e) environmental assessment, involving classification of environmental types;

(f) moderators of cross-situational consistency, variables which facilitate consistency of traits across situations;

(g) psychological states, constructs defined as variables that change over time;

and (h) process research, where researchers attempt to isolate and study variables that change during psychological interventions.

Many psychological phenomena demonstrate trait and state characteristics. In other words, these phenomena are likely to demonstrate some stability (that is, to be a reflection of psychological traits) and change (that is, to be influenced by environmental and developmental factors). Goldberg's (1993) question of "Do traits exist?" may be rephrased as Cone's (1991) "What levels of aggregation of tests and criteria are needed to demonstrate trait properties?" To demonstrate trait consistency, it appears necessary to aggregate items, persons, and occasions of measurement. In addition, some people and dimensions appear more stable than others (Bem & Allen, 1974; Martin, 1988).

Idiographic approaches appear to have been more successful than nomothetic procedures for displaying cross-situational consistency. For example, Lord (1982) found that idiographic measures of conscientious behavior were consistent across situations, while nomothetic methods were not. Walsh and Betz (1985) maintained that behavior is "reasonably predictable, given knowledge of an individual's perception of the situation and of the individual's disposition to respond in that situation" (p. 13). The problem may center primarily on handling error: so many unknown factors operate in the interaction between person and situations that only repeated idiographic assessment may provide some sense of how any particular individual will behave across situations. Under these conditions, Magnusson and Endler (1977) suggested, "individual behavior across different situations provides a consistent, idiographically predictable pattern" (p. 11). Clinical assessment, for example, provides a unique perspective on individual and situational factors. However, clinical assessors still cannot, for example, make predictions of person-environment interactions that would result in suicidal or homicidal behavior by a specific individual at a particular time.

Despite their initial promise, approaches such as ATIs and person-environment interaction theories have had relatively little impact on psychological measurement. The problem, Walsh and Betz (1985) believed, "is one of measuring and describing the multidirectional transactions. Currently this is a measurement task that has been very difficult to operationalize and make real" (p. 13). Measurement error is a plausible explanation for the failure to find person-environment interactions (Cronbach, 1991). Imprecise measurement may obscure interactions, while measures not guided by theory may be inappropriate or insensitive (McCall, 1991). Partially because of cost and the fact that investigators have yet to settle upon a taxonomy for situations (McFall & McDonel, 1986), person-environment approaches have yet to command the attention given to traditional measurement.

-------------------------------------------
Insert Figure 19 About Here
-------------------------------------------

McFall and McDonel (1986) saw the person-situation question as "inherently unresolvable" (p. 238) and suggested that the debate should be dropped. Instead, psychologists should continue elsewhere their search for the "meaningful units with which to describe, predict, and explain behavior" (p. 238). Tryon (1991) reached a similar conclusion. He maintained that situational specificity simply involves different mean levels of behavior in one situation compared to another. As shown in Figure 19, individuals may demonstrate, in general, more anxiety in two testing situations (situations 2 and 4) than in two lectures (situations 1 and 3). Arguments for traits, Tryon suggested, involve persons maintaining their rank on a construct within the distribution. If Persons A, B, C, and D rank first, second, third, and fourth on the amount of anxiety they display in a lecture, and then maintain that ranking in other settings, trait arguments will be upheld even if overall levels of anxiety change. Tryon (1991) believed that it is possible to hold both the situational specificity and trait positions since "activity is both very different across situations yet predictable from situation to situation" (p. 14). He concluded:

Situational differences are so large that they stand out immediately. Person consistency is more subtle and requires aggregation to reach substantial effect size. The Spearman-Brown prophecy formula indicates that either effect size can be made arbitrarily large depending upon the level of aggregation chosen. The implication for research and clinical practice is that one should choose the level of aggregation that provides the necessary effect size to achieve the stated purpose of the empirical inquiry at hand. (p. 14)
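
Tryon's two points, that mean levels can differ across situations while rank order is preserved and that aggregation raises reliability, can be illustrated with a minimal Python sketch. The anxiety scores are invented and are not read off Figure 19, and the starting reliability of .20 and the aggregation factor of 10 are arbitrary assumptions. The function `spearman_brown` implements the standard Spearman-Brown prophecy formula, r_kk = k r / (1 + (k - 1) r).

```python
from statistics import mean

# Hypothetical anxiety scores for four persons in four situations
# (situations 1 and 3 are lectures, 2 and 4 are tests). The values are
# invented for illustration only.
scores = {
    "A": [4, 9, 5, 10],
    "B": [3, 8, 4, 8],
    "C": [2, 7, 2, 7],
    "D": [1, 6, 1, 5],
}


def situation_means(data):
    """Mean score in each situation, averaged over persons."""
    n_situations = len(next(iter(data.values())))
    return [mean(person[i] for person in data.values()) for i in range(n_situations)]


def rank_order(data, situation):
    """Persons ordered from most to least anxious in one situation."""
    return sorted(data, key=lambda p: data[p][situation], reverse=True)


def spearman_brown(r, k):
    """Projected reliability when a measure with reliability r is aggregated k-fold."""
    return k * r / (1 + (k - 1) * r)


means = situation_means(scores)
orders = [rank_order(scores, i) for i in range(4)]
print("situation means:", means)
print("same rank order in every situation:", all(o == orders[0] for o in orders))
print("reliability after 10-fold aggregation:", round(spearman_brown(0.20, 10), 3))
```

In this toy example the test situations produce much higher average anxiety than the lectures, yet the four persons keep the same ordering throughout, so both the situational specificity and the trait descriptions hold. The last line shows how a modest single-occasion reliability grows as more occasions are aggregated.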