PERSONALITY ASSESSMENT

Every year, the American Psychological Association (APA) holds a convention. It’s quite an event. Thousands of psychologists take over most of the downtown hotels in a major city such as San Francisco, Boston, or Washington, DC, for a week of meetings, symposia, and cocktail parties. The biggest attraction is always the exhibit hall, where dozens of high-tech, artistically designed booths fill a room that seems to go on for acres. These booths are set up, at great expense, by several kinds of companies. One group is textbook publishers; all the tools of advertising are applied to the task of convincing college professors like me to get their students to read (and buy) books such as the one you are reading right now. Another group is manufacturers of videos and various, sometimes peculiar, gadgets for therapy and research. Yet another group is psychological testers. Their booths distribute free samples that include not only personality and ability tests, but also shopping bags, notebooks, and even beach umbrellas. These freebies prominently display the logo of their corporate sponsor: the Psychological Corporation, Consulting Psychologists Press, the Institute for Personality and Ability Testing, and so on.

You don’t have to go to the APA convention to get a free “personality test.” On North Michigan Avenue in Chicago, on the Boston Common, at Fisherman’s Wharf in San Francisco, and at Covent Garden in London, I have been given brightly colored brochures that ask, in huge letters, “Are you curious about yourself? Free personality test enclosed.” Inside is something that looks like a conventional personality test, with 200 questions to be answered either True or False. (One item reads, “Having settled an argument out do you continue to feel disgruntled for a while?”) But, as it turns out, the test is really a recruitment pitch. If you take it and go for your “free evaluation”—which I do not recommend—you will be told two things. First, you are all messed up. Second, the people who gave you the test have the cure: You need to join a certain “church” that can provide the techniques (and even the strange electrical equipment) needed to pinpoint and fix your problems.

The personality testers at the APA convention and those who hand out free so-called personality tests on North Michigan Avenue have a surprising amount in common. Both seek new customers, and both use all the techniques of advertising, including free samples, to acquire them. The tests they distribute look superficially alike. And both groups exploit a nearly universal desire to know more about personality. The brochure labeled “Are you curious about yourself?” asks a pretty irresistible question. The more staid tests distributed at the APA convention likewise offer an intriguing promise of finding out something about your own or somebody else’s personality that might be interesting, important, or useful.

Below the surface, however, they are not the same. The tests peddled at the APA convention are, for the most part, well-validated instruments useful for many purposes. The ones being pushed at tourist destinations around the world are frauds and potentially dangerous. But you cannot tell which is which just by looking at them. You need to know something about how personality tests and assessments are constructed, how they work, and how they can fail. So, let’s take a closer look.

S-Data and B-Data Personality Tests

Most personality tests provide S-data. They ask you what you are like, so the score you receive amounts to a summary of how you describe yourself. The “self-monitoring” scale asks how closely you watch other people for cues as to how to behave. The “attributional complexity” scale asks about the level of complexity in your thinking about the causes of behavior. The widely used Stanford Shyness Survey asks simply “Do you consider yourself to be a shy person?” The possible responses are yes or no (Zimbardo, 1977). You can probably guess how this item is scored.

Other personality tests yield B-data. One of the most widely used tests in the world, the Minnesota Multiphasic Personality Inventory (MMPI),[10] is a good example. It presents items—such as “I prefer a shower to a bath”—not because the tester is interested in the literal answer, but because answers to this item are informative about some aspect of personality, in this case, empathy. Preferring a shower is the empathic response, for some reason (Hogan, 1969).

Is intelligence a personality trait? Psychologists have differing opinions (what’s yours?). Either way, tests of intelligence, or IQ tests, also yield B-data. Imagine trying to assess intelligence using an S-data test, asking questions such as, “Are you an intelligent person?” and “Are you good at math?” Researchers have actually tried this, but simply asking people whether they are smart turns out to be a poor way to measure intelligence (Furnham, 2001). So, instead, IQ tests ask people questions of varying difficulty, such as reasoning or math problems, that have specific correct answers. These right or wrong answers comprise B-data. The more right answers, the higher the IQ score.

Some experts in assessment have proposed that tests based on (what I call) B-data, including the MMPI, be labeled “performance-based” instruments (McGrath, 2008). This category also includes instruments that traditionally have been called “projective” tests.

Projective Tests

Projective tests were originally based on a theory called the projective hypothesis (Frank, 1939). The theory is this: If you are asked to interpret a meaningless or ambiguous stimulus—such as an inkblot—your answer cannot come from the stimulus itself, because the stimulus actually does not look like, or mean, anything. The answer must instead come from (i.e., be a “projection” of) your needs, feelings, experiences, thought processes, and other hidden aspects of the mind (H. A. Murray, 1943). The answer might even reveal something you don’t know about yourself. (Notice that this could never happen with S-data.)

This is the theory behind the famous Rorschach inkblot test (Exner, 1993; Rorschach, 1921). Swiss psychiatrist Hermann Rorschach dropped blots of India ink onto note cards, folded the cards in half, and then unfolded them. The result was a set of symmetric, complex blots.[11] Over the years, uncounted psychiatrists and clinical psychologists have shown these blots to their clients and asked them what they saw.

Of course, the only literally correct answer is “an inkblot,” but that is not considered a cooperative response. Instead, the examiner is interested in whether the client will report seeing a cloud, a devil, a parent, or whatever. I once heard a clinical psychologist describe a client who reported seeing a “crying St. Bernard.” The woman who gave this response was grieving over a boating accident in which she accidentally killed her husband. The psychologist interpreting her response noted that dogs don’t cry, but people do, and the traditional role of a St. Bernard is as a rescuer. This interpretation illustrates how whatever the client sees, precisely because it is not actually on the card, may reveal something about the contents of the mind. It also illustrates that the thoughts revealed by the inkblot response might not necessarily be deep, hidden, or mysterious. While it was interesting and probably useful for this therapist to know that the client was still upset about the accident, it wasn’t exactly surprising.

Interpretation is sometimes subtler. Consider these two responses to Card I of the Rorschach (which I am not supposed to show you, but you will be able to guess what it looks like). One client said: “This is a butterfly. Its wings are ripped and tattered, and it doesn’t have very long to live.” Another client responded to the same card by saying: “This is a butterfly. I don’t know what to make of these white spaces; I don’t know any kind of butterfly with white spots on its wings quite like that. They really shouldn’t be there, but I guess its wings are ripped” (McGrath, 2008, p. 471). Psychologist Robert McGrath (2008) noted that the first response seems to reveal some morbid preoccupations, due to its reference to death and redundant use of the words “ripped” and “tattered.” The second response seems to reveal a tendency to obsess or overanalyze.

The same projective hypothesis has led to the development of numerous other tests. The Draw-A-Person test requires the client to draw (you guessed it) a person, and the drawing is interpreted according to what kind of person is drawn (e.g., a man or a woman), which body parts are exaggerated or omitted, and so forth (Machover, 1949). Large eyes might be taken to indicate suspiciousness or paranoia; heavy shading might mean aggressive impulses; and numerous erasures could be a sign of anxiety. The classic Thematic Apperception Test (TAT) asks clients to tell stories about a set of drawings of people and ambiguous events (Morgan & Murray, 1935; H. A. Murray, 1943). A more recent version uses pictures that include “a boy in a checked shirt . . . a woman and a man on a trapeze, two men in a workshop, and a young woman working on the [balance] beam” (Brunstein & Maier, 2005, p. 208). The themes of these stories are used to assess the client’s motivational state (McClelland, 1975; C. P. Smith, 1992). If a person looks at an ambiguous drawing of two people and thinks they are fighting, for example, this might reveal a need to be aggressive; if the two people are described as in love, this might reflect a need for intimacy; if one is seen as giving orders to the other, this might reflect a need for power (Figure 2.6 and Try for Yourself 2.3).

A Rorschach inkblot shows an amorphous blob of black ink, mirrored across the center. It branches out on either side into various irregular shapes.

(a)

A black and white picture shows a man and woman at a table set with tea in front of a fireplace. The woman is halfway out of her chair and the man is leaning forward in his chair with his hand around the woman’s arm.

(b)

Figure 2.6 Two Projective Tests (a) Rorschach inkblot. This picture resembles but is not one of the inkblots in Rorschach’s famous test. The real blots traditionally are not published so that someone taking the test will be seeing them for the first time. (b) Thematic Apperception Test (TAT). The task is to make up stories about pictures like this one. Themes that emerge in the stories are interpreted as indicating “implicit motives” of which the person might not be consciously aware. The test and its pictures date from the 1930s and the artistic style reflects that era. As far as I know, they have not been significantly updated. Do you think they should be?

TRY FOR YOURSELF 2.3

Two Projective Tests

Instructions: Look at the inkblot in Figure 2.6a. On a sheet of paper, write down what it looks like to you—no more than a sentence or two. Then look at the drawing in Figure 2.6b. Try to imagine what it depicts, and then write down:

1. Who are the people in the picture?

2. What are they doing right now?

3. What were they doing before this moment?

4. What will happen next?

After you are finished, show the pictures to a friend and, without revealing your own responses, ask your friend to do the same thing. Then compare your responses. Are they different? Do you think the differences mean anything? Do they reveal anything surprising?

Please note that these are not actual personality tests (the blot is not actually part of the Rorschach, and the picture is not part of the TAT). However, the exercise will give you a general idea of how these tests work.

To again use the terminology introduced earlier in this chapter, all projective tests provide B-data. They are specific, directly observed responses to particular stimuli, whether inkblots, pictures, or instructions to draw somebody. All the disadvantages of B-data therefore apply. For one thing, they are expensive. It takes around 45 minutes to administer a single Rorschach and another 1.5 to 2 hours to score it (Ball et al., 1994). Compare this to the time needed to hand out a pile of questionnaires and run them through a machine. This issue is serious because to warrant their additional time and cost, projective tests need to be more than somewhat accurate—they should provide useful information over and above what can be gathered by other, more efficient methods (Lilienfeld et al., 2000).

The even more fundamental difficulty with projective tests is that, perhaps even more than other kinds of B-data, a psychologist cannot be sure what they mean. Two different interpreters of the same response might come to different conclusions unless a standard scoring system is used (Sundberg, 1977). But of the projective tests, only the TAT is consistently scored according to a well-developed system (McAdams, 1984). While scoring systems have been developed for the Rorschach (Exner, 1993; Klopfer & Davidson, 1962), not everybody uses them, and even then, the training most practitioners get is less than ideal (Guarnaccia et al., 2001).

The survival of so many projective tests into the 21st century is something of a mystery. Literature reviews that claim projective tests are somewhat accurate generally conclude that other, less expensive techniques work as well or even better (Lilienfeld et al., 2000; Searight, 2020). Even more disturbing, projective tests that are highly dubious, such as ones that ask clients to draw human figures, are sometimes used as evidence in court cases (Lally, 2001).[12]

A cartoon shows two coaches watching several wrestlers training and fighting in the ring. The caption reads, “He looks very promising, but let’s see how he does on the written test.”

“He looks very promising—but let’s see how he does on the written test.”

Perhaps projective tests endure because some clinical psychologists have fooled themselves. One writer suggested that these clinicians may lack “a skill that does not come naturally to any of us: disregarding the vivid and compelling data of subjective experience in favor of the often dry and impersonal results of objective research” (Lilienfeld, 1999, p. 38). Perhaps the problem is not that the tests are worthless, but that they have been used inappropriately (J. M. Wood et al., 2003). Or, as others have suggested, perhaps the accuracy of these tests is beside the point. They simply help to break the ice between client and therapist by giving them something to do during the first visit. Or, just possibly, these instruments reveal something when used by certain skilled clinicians that cannot be duplicated by other techniques and that has not been fully captured by controlled research.

Objective Tests

The tests that psychologists call “objective” can be detected at a glance. If a test consists of a list of questions to be answered Yes or No, or True or False, or on a numeric scale, and especially if the test uses a computer-scored answer sheet, then it is an objective test. The term comes from the idea that the questions making up the test seem more objective and less open to interpretation than the pictures and blots used in projective tests.

The term objective is probably not really justified, because different people will interpret and respond to test items (such as “I prefer a shower to a bath”) in different ways (Bornstein, 1999a). But maybe that’s not such a bad thing. If everybody read and interpreted an item in exactly the same way, the item would not be very useful for the assessment of individual differences, would it?

Harrison Gough, inventor of the California Psychological Inventory (CPI), included a scale called commonality, which consists of items that are answered in the same way by at least 95 percent of all people. He included it to detect people who are pretending they know how to read or who are trying to sabotage the test. The average score on this scale is about 95 percent, but a person who can’t read, answering at random, will score about 50 percent (since it is a true-false scale) and therefore will be immediately identifiable, as will someone who (like one of my former students) answered the CPI by flipping a coin—heads True, tails False.
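The arithmetic behind the commonality scale can be checked with a small simulation. The sketch below is only an illustration, not Gough's scoring procedure: the 40-item key, the function names, and the assumption that a typical respondent gives the majority answer to each item 95 percent of the time are all made up for the example.

```python
import random

def commonality_score(answers, key):
    """Return the fraction of items answered in the keyed (majority) direction."""
    return sum(a == k for a, k in zip(answers, key)) / len(key)

def simulate_typical(key, rng, agree_rate=0.95):
    """Hypothetical respondent who gives the majority answer 95% of the time."""
    return [k if rng.random() < agree_rate else (not k) for k in key]

def simulate_coin_flipper(key, rng):
    """Respondent who answers every item at random: heads True, tails False."""
    return [rng.random() < 0.5 for _ in key]

rng = random.Random(0)                          # fixed seed so runs are repeatable
key = [rng.random() < 0.5 for _ in range(40)]   # hypothetical 40-item answer key

n = 2000  # simulated respondents of each kind
typical_mean = sum(
    commonality_score(simulate_typical(key, rng), key) for _ in range(n)
) / n
random_mean = sum(
    commonality_score(simulate_coin_flipper(key, rng), key) for _ in range(n)
) / n

print(f"typical respondents: {typical_mean:.2f}")
print(f"coin flippers:       {random_mean:.2f}")
```

The two averages should land near 0.95 and 0.50, which is why a random responder stands out so clearly on a scale like this.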

These are clever uses for a commonality scale, but its properties are mentioned here to make a different point. Gough reported that when individuals encounter a commonality item—one being “I would fight if someone tried to take my rights away” (keyed as True)—they do not say to themselves, “What a dumb, obvious item. I bet everybody answers it the same way.” Instead, they say, “At last! An unambiguous item I really understand!” (J. A. Johnson, 2006).[13] Unfortunately, commonality items are not very useful for personality measurement, precisely because almost everybody responds to them the same way. A certain amount of ambiguity may indeed be necessary (Gough, 1968; J. A. Johnson, 1981).

Test Construction

Three basic methods are commonly used for constructing objective personality tests: the rational method, the factor analytic method, and the empirical method. Often these methods are used together, but let’s begin by considering the pure application of each.

THE RATIONAL METHOD

Calling one method of test construction “rational” does not mean the others are irrational. It simply means that the strategy of this approach is to come up with items that seem directly, obviously, and rationally related to what the test developer wishes to measure. An early example is a test used during World War I. The U.S. Army discovered that certain problems arose when individuals who were mentally ill were inducted as soldiers, housed in crowded barracks, and issued weapons. Seeking to avoid these problems, the army developed a list of questions for a psychiatrist to ask each potential recruit. As the number of inductees increased, this slow process became impractical. There were not enough psychiatrists to go around, nor was there enough time to interview everybody.

To get around these limitations, psychologist R. S. Woodworth (1917) proposed that the questions could be printed on a sheet, and the recruits could check off their answers. His list, which became known as the Woodworth Personality Data Sheet (or, inevitably, the WPDS), had 116 questions including “Do you wet your bed?” “Have you ever had fits of dizziness?” and “Are you troubled with dreams about your work?” A recruit who responded Yes to more than a few was referred for a more personal examination. Recruits who answered No to all the questions were inducted forthwith into the army.

Woodworth’s idea of listing psychiatric symptoms on a questionnaire was not unreasonable, yet his technique raises a variety of problems that can be identified rather easily. For the WPDS to be a valid indicator of psychiatric disturbance—for any rationally constructed, S-data personality test to work—four conditions must hold (Wiggins, 1973).

First, each item must mean the same thing to the person who takes the test as it did to the psychologist who wrote it. For example, what is “dizziness” exactly? Second, the person who completes the form must be able to make an accurate self-assessment. He (only men were being recruited at the time the WPDS was administered) must have a good understanding of each item, as well as the ability to observe it in himself. Third, the person who completes the test must be willing to report his self-assessment accurately. He must not try to deny his symptoms (in order to get into the army) or to exaggerate them (perhaps in order to stay out of the army). Fourth and finally, all of the items on the test must be valid indicators of what the tester is trying to measure—in this case, mental disturbance. Does dizziness really indicate mental illness? What about dreams about work?

For a rationally constructed test to measure an attribute of personality accurately, all four of these conditions must be met. In the case of the WPDS, probably none of them was.[14] In fact, most rationally constructed personality tests fail one or more of these criteria. One might conclude, therefore, that they would hardly ever be used anymore.

Wrong. Up to and including the present day, self-report questionnaires that are little different, in principle, from the WPDS remain the most common form of psychological measurement. Self-tests on popular websites are almost always constructed by the rational method—somebody just thinks up some questions that seem relevant—and they typically fail at least two or three of the four crucial criteria.

But rationally constructed personality tests appear in psychological journals, too, which present a steady stream of new testing instruments, nearly all of which are developed by the simple technique of thinking up a list of questions. These questions might include measures of health status (“How healthy are you?”), self-esteem (“How good do you feel about yourself?”), or goals (“What do you want in life?”). Try for Yourself 2.4 provides an example of a widely used test that assesses optimism and pessimism by asking similarly straightforward questions (Norem & Cantor, 1986). Tests like this are said to have face validity—that is, they seek to measure exactly what they seem to be measuring, on their “face”—but showing that they actually work requires further evidence, such as I-, L-, and B-data.

TRY FOR YOURSELF 2.4

THE FACTOR ANALYTIC METHOD

The factor analytic method of test construction is based on a statistical technique. Factor analysis identifies groups of things—which can be anything from songs to test items—that seem to have something in common. The property that ties these things together is called a factor (Cattell, 1952).

For example, a factor analytic study of music preference asked people to identify pieces that they did and didn’t enjoy. The researchers found that such preferences can be organized into five properties that they labeled “mellow,” “unpretentious,” “sophisticated,” “intense,” and “contemporary”[15] (Rentfrow et al., 2011). If you like Farrenc’s “Piano Quintet no. 1 in A Minor,” you will probably also enjoy “The Way You Look Tonight” by Oscar Peterson, because both get high scores on the mellow factor. But you probably won’t like “Texas Tornado” by Tracy Lawrence, because it has a negative score on that factor; instead, it gets a high score, or “loads,” on the second, unpretentious factor.

To use factor analysis to construct a personality test, researchers begin with a list of objective items of the sort discussed earlier. The next step is to administer these items to a large number of participants. Then you and your computer sit down together and do the factor analysis. The items that go together are assembled into groups. For example, someone who answers True to “I trust strangers” is also likely to answer True to “I am careful to turn up when someone expects me” and answer False to “I could stand being a hermit.” Such a pattern of co-occurrence means that these three items are correlated. The next steps are to consider what the items have in common, and then name the factor.

The three correlated items just listed, according to Cattell (1965), are related to the dimension “cool versus warm,” with a true-true-false pattern of responses indicating a “warm” personality (Figure 2.7). (Cattell decided on this label simply by considering the content of the items, as you just did.) The factor represented by these items, therefore, is “warm-cool” or, if you prefer to name it by just one end of the scale, “warmth.” These three items now can be said to form part of a “warmth” scale.

A diagram shows three different statements and true or false answers to them, all connected to the word, “warmth.” The first statement reads, “I trust strangers,” and is followed by true. The second statement reads, “I am careful to turn up when someone expects me,” and is followed by true. The third statement reads, “I could stand being a hermit,” and is followed by false.

Figure 2.7 Three Questionnaire Items That Measure the Same Factor If these three items are correlated with each other—that is, people who answer True to the first item tend to answer True to the second one and False to the third—they might all “load on,” or measure, a common psychological factor.
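The pattern of co-occurrence that factor analysis looks for can be illustrated with simulated answers. The sketch below is not a full factor analysis; it only computes the pairwise correlations that such an analysis starts from. The response model (a hidden “warmth” level plus random noise for each item), the sample size, and all function names are assumptions made for the example.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(1)  # fixed seed so runs are repeatable

def simulate_respondent():
    """Hypothetical person whose answers (1 = True, 0 = False) depend on warmth."""
    warmth = rng.gauss(0, 1)
    trusts = int(warmth + rng.gauss(0, 1) > 0)     # "I trust strangers"
    turns_up = int(warmth + rng.gauss(0, 1) > 0)   # "I am careful to turn up..."
    hermit = int(-warmth + rng.gauss(0, 1) > 0)    # "I could stand being a hermit"
    return trusts, turns_up, hermit

data = [simulate_respondent() for _ in range(2000)]
trusts, turns_up, hermit = (list(col) for col in zip(*data))

r_trust_turnsup = pearson(trusts, turns_up)   # expected to be positive
r_trust_hermit = pearson(trusts, hermit)      # expected to be negative
r_turnsup_hermit = pearson(turns_up, hermit)  # expected to be negative

def warmth_score(trusts, turns_up, hermit):
    """Score the three-item scale; the hermit item is reverse-keyed (False = warm)."""
    return trusts + turns_up + (1 - hermit)

print(r_trust_turnsup, r_trust_hermit, r_turnsup_hermit)
print(warmth_score(1, 1, 0))  # the True-True-False pattern earns the top score
```

The signs of the three correlations are what matter here: the first two items go together, and both run opposite to the third, which is why the third is scored in reverse.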

THE EMPIRICAL METHOD

The empirical strategy of test construction attempts to allow reality to speak for itself. In its pure form, the approach has sometimes been called “dust bowl empiricism.” The term refers to the origin of the technique at Midwestern universities (notably Minnesota and Iowa) during the Depression, or dust bowl, years of the 1930s.[16] Intentionally or not, the term also serves as a reminder of how dry this approach is, since it is based strictly on data, not any kind of deeper psychological theory.

Like the factor analytic approach described earlier, the first step of the empirical approach is to gather lots of items. The second step, however, is quite different. For this step, you need to have a sample of participants who have already independently been divided into the groups you care about. Occupational groups and diagnostic categories are often used for this purpose. For example, if you wish to measure the aspect of people that makes them good and happy religious ministers, then you need at least two groups of participants—happy, successful ministers and a comparison group. (Ideally, the comparison group would be miserable, incompetent ministers, but typically the researcher will settle for people who are not ministers at all.) Or you might want a test to detect different kinds of psychopathology. For this purpose, you would need groups of people who have been diagnosed as suffering from schizophrenia, depression, anxiety, and so forth. A group of people who have not been diagnosed with these disorders would also be useful for comparison purposes. Then you are ready for the third step: administering your test to your participants.

The fourth step is to compare the answers given by the different groups. If people diagnosed with depression answer a certain group of questions differently from everybody else, those items might form a “depression” scale. Thereafter, new participants who answer questions the same way as people diagnosed with depression did would score high on this scale, and you might suspect that they, too, are depressed. The MMPI, which is the prototypical example, was built using this strategy. For instance, one item on the depression scale is “I sometimes tease animals,” keyed False. This does not mean people who deny teasing animals are depressed! But this answer, on this test, does elevate one’s depression score. The percentage of men who reported that they liked “making a speech” was higher for ministers than other groups such as farmers and factory workers, so the item went on the “minister” scale of the Strong Vocational Interest Blank (SVIB), which, like the MMPI, was developed using the empirical method.

After the items are selected based on the responses of people in the initial groups, the next step is to cross-validate that scale by using it to predict behavior, diagnosis, or category membership in new samples of participants. If the cross-validation succeeds, the scale is deemed ready for use.
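The steps of the empirical method, from comparing groups to cross-validation, can be sketched on simulated data. Everything specific in this example is hypothetical: the 30-item pool, the five items that truly separate the groups, the group sizes, and the 0.15 cutoff for keeping an item. It is a simplified illustration of the logic, not the procedure used to build the MMPI or the SVIB.

```python
import random

rng = random.Random(2)  # fixed seed so runs are repeatable
N_ITEMS = 30
# Hypothetical: probability of answering True in the criterion group for the
# few items that truly differ. Everyone else answers True half the time.
TRUE_DIFFERENCES = {3: 0.8, 7: 0.2, 12: 0.8, 18: 0.2, 25: 0.8}

def simulate_person(in_criterion_group):
    """Return one person's True/False answers to every item in the pool."""
    answers = []
    for item in range(N_ITEMS):
        p_true = TRUE_DIFFERENCES.get(item, 0.5) if in_criterion_group else 0.5
        answers.append(rng.random() < p_true)
    return answers

def endorsement_rate(group, item):
    """Fraction of a group answering True to one item."""
    return sum(person[item] for person in group) / len(group)

def build_scale(criterion, comparison, min_gap=0.15):
    """Keep items the groups answer differently, keyed in the criterion group's direction."""
    scale = {}
    for item in range(N_ITEMS):
        gap = endorsement_rate(criterion, item) - endorsement_rate(comparison, item)
        if abs(gap) >= min_gap:
            scale[item] = gap > 0  # keyed True if the criterion group says True more often
    return scale

def score(person, scale):
    """Count the scale items answered in the keyed direction."""
    return sum(person[item] == keyed for item, keyed in scale.items())

def mean_score(group, scale):
    return sum(score(person, scale) for person in group) / len(group)

# Select items using the original criterion and comparison groups.
criterion = [simulate_person(True) for _ in range(300)]
comparison = [simulate_person(False) for _ in range(300)]
scale = build_scale(criterion, comparison)

# Cross-validate: does the scale still separate the groups in new samples?
new_criterion = [simulate_person(True) for _ in range(300)]
new_comparison = [simulate_person(False) for _ in range(300)]
cv_gap = mean_score(new_criterion, scale) - mean_score(new_comparison, scale)

print("items kept and their keyed answers:", scale)
print("cross-validation gap in mean scores:", round(cv_gap, 2))
```

Notice that the content of the items never enters the procedure. An item is kept only because the two groups answered it differently, and the scale is trusted only if the difference shows up again in people who played no part in selecting the items.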

COMBINATION OF METHODS

Modern test developers usually use all three approaches. The best way to select items for a personality scale is not haphazardly, but with the intent to sample a particular domain of interest (the rational approach). Factor analysis then confirms that items that seem similar to each other actually elicit similar responses from real participants (Briggs & Cheek, 1986). Finally, any personality measure is only as good as the other things with which it correlates or that it can predict (the empirical approach). To be worth its salt, any personality scale must show that it can predict what people do, how they are seen by others, and how they fare in life.

Glossary

projective test
A personality test that asks the client to interpret a meaningless or ambiguous stimulus.
projective hypothesis
The idea that if a person is asked to interpret an ambiguous stimulus, the answer will indicate the person’s needs, feelings, thought processes, or other hidden aspects of the mind.
objective test
A personality test that consists of a list of questions to be answered by the subject as True or False, Yes or No, or along a numeric scale (e.g., 1 to 7).
face validity
The degree to which an assessment instrument, such as a questionnaire, on its face appears to measure what it is intended to measure. For example, a face-valid measure of sociability might ask about attendance at parties.
factor analysis
A statistical technique for finding clusters of related traits, tests, or items.
empirical method
A method of personality test construction based on comparing answers given by members of different criterion groups.
factor analytic method
A method of personality test construction in which items are grouped together on the basis of factor analysis.
rational method
A method of personality test construction in which items are written based on their apparent or “face” relationship to the trait being measured.

Endnotes

[10] By a tradition of mysterious origin, personality tests are usually referred to by their initials, all capital letters, no periods.
[11] According to legend, Rorschach made many blots in this way but kept only the “best” ones. I wonder how he decided.
[12] This use of projective tests has produced a backlash by people who feel they have been victimized by them. Test stimuli such as inkblots and TAT pictures, which in the past were closely held secrets, are now available on websites that also offer advice on the best responses.
[13] Another item from the scale reads, “Education is more important than most people think.” Almost everybody answers True.
[14] In fairness, given how inexpensive it was to administer the WPDS and the costs of adding even one mentally ill person to a combat unit, the test might have been cost-effective after all.
[15] If you want to remember these factors, notice that their initials spell MUSIC.
[16] A severe drought and resulting “dust bowl” afflicted several Midwestern states during that period. However, Minnesota and Iowa were not among them.