WRAPPING IT UP
SUMMARY
Quality of Data: Psychometrics
- The essence of science is that conclusions should be based on data. Data can vary widely in quality; in personality psychology, the important dimensions of data quality are reliability, validity, and generalizability.
- Reliability refers to the stability or repeatability of measurements. Validity refers to the degree to which a measurement actually measures what it is trying to measure. Generalizability is a broader concept that subsumes both reliability and validity and refers to the kinds of other measurements to which a given measurement is related.
Research Design
- The plan for gathering psychological data is the research design. The three main types of design are case, experimental, and correlational.
- Case studies examine particular phenomena or individuals in detail and can be an important source of new ideas. To test these ideas, correlational and experimental studies are necessary.
- Each of the three methods has advantages and disadvantages. Case methods describe a phenomenon in detail but have uncertain generalizability. Experimental methods can establish causality, but only for the particular manipulation used in a particular context. Correlational studies are more realistic but cannot directly establish causality.
- The key difference is that experiments demonstrate what can happen, whereas correlational studies demonstrate what does happen. A complete research program should include both methods.
Evaluating the Strength of a Finding
- The statistical significance of a result represents the probability that the data would have been obtained if the “null hypothesis” were true, but it is typically misinterpreted as indicating the probability that the substantive (non-null) hypothesis is true. Null-hypothesis significance testing (NHST) has many problems that are increasingly acknowledged. In particular, statistical significance is not the same as the strength or importance of the result.
- A better way to evaluate research than statistical significance is in terms of effect size, which describes numerically the degree to which one variable is related to another. One good measure of effect size is the correlation coefficient.
- The dependability of a research finding can only be evaluated, ultimately, through replication. This issue came to a head in recent years when some prominent findings were found not to be as well established as psychologists had assumed. No single study can establish the truth of any result, which is why researchers need to repeat and extend findings and always be open to the implications of new data.
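The distinction drawn above between statistical significance and effect size can be sketched in a few lines of Python. This is an illustrative sketch, not material from the chapter: the function names are our own, and 1.96 is the conventional two-tailed .05 criterion for large samples.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient: a number between -1 and +1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t for testing a correlation r against the null hypothesis r = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# The same small effect (r = .05) fails to reach "significance" with
# 100 participants (t ~ 0.50) but is highly "significant" with 10,000
# (t ~ 5.01) -- yet the effect size itself never changed.
for n in (100, 10_000):
    t = t_statistic(0.05, n)
    print(n, round(t, 2), "significant" if abs(t) > 1.96 else "not significant")
```

The point of the sketch: statistical significance depends heavily on sample size, while the correlation coefficient, as a measure of effect size, does not.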
Ethical Issues
- Some people are uncomfortable with the practice of personality assessment because they see it as undignified or unfair. However, because people inevitably judge each other’s personalities, the real issue is whether personality assessment should be based on informal intuitions or formalized techniques.
- Research must be careful to do nothing to harm participants. Potentials for harm include subjecting people to traumatic experiences, deceiving them, or violating their privacy. The potential for the violation of individuals’ privacy is a particularly important issue to be aware of for the future.
- Psychology, like any other science, can be used for benefit or for harm, so the methods and uses of research deserve careful thought.
- It is now widely acknowledged that the participants studied in psychological research are far from representative of the wider population, but the discrepancy may be even greater among researchers themselves. Increased diversity among researchers will, over time, lead to a wider range of topics studied and findings obtained, but this goal will not be reached quickly or easily.
- Norms of “open science” encourage scientists to fully report all of their research methods and findings, including studies that fail to find the expected or hoped-for result, and to share their data with other scientists.
Conclusion
- As citizens, we should keep a close watch on the activities and methods of schools, police departments, doctors, businesses, governments, and scientists.
KEY TERMS
correlation coefficient
THINK ABOUT IT
- If you have taken a statistics course, what does a significance level tell you? What does it not tell you? If we were to stop using significance levels to evaluate research findings, what could we use instead?
- Let’s say we find that you score 4 points higher on a “conscientiousness” test than another person. Alternatively, imagine that women score 4 points higher on the same test, on average, than men do. In either case, is this difference important? What else would we have to know to be able to answer this question?
- Is deception in psychological research justified? Does it depend on the research question? Does it depend on the specific kind of deception? Does it depend on the kind of informed consent offered by the research participant? Who, if anybody, is harmed by the use of deception in research?
- Some psychologists research differences in intelligence between races. Let’s say members of one race really do have higher IQ scores than members of another race. Consider: Is this the kind of research psychologists should be doing, or is the issue better left alone? Once the research is done, how will the results be used?
- Repeat the preceding question substituting gender for race.
- If you found out that the person you had just been talking to was a participant in a research study and that your own speech and actions had been recorded, would that bother you? Do you think your permission should have been required first?
- Scientist A manufactures fake data that support the scientist’s pet theory and publishes them in a major journal. Scientist B does three studies; two fail to support B’s theory and one confirms it. Scientist B decides only to publish the confirming study. Whose actions harm science more, Scientist A or Scientist B, or are they the same?
- A scientist works hard to complete a study that includes a lot of difficult-to-obtain data. After the findings are published, another scientist says, “I think you analyzed your data incorrectly. Please show me your data.” The first scientist replies, “The data are mine and I worked hard for them. Get your own data.” Does the first scientist have a point? Can you think of any circumstances in which scientists should not be required to share their data publicly?
- Scientists often do things that nonscientists do not really understand. How can society make sure that science is used for good rather than evil purposes?
SUGGESTED RESOURCES
Online
Center for Open Science
The Center for Open Science provides many resources to make it easier to do good science. It is where a researcher can “pre-register” a study (state the predictions and planned analyses for a study before it begins), share and access data from other researchers, and share articles not yet published. The website is cos.io.
Society for the Improvement of Psychological Science
This society, founded in 2016, is already growing to be a major force in psychology. Its purpose is to develop and advocate for improved methods and practices. It holds annual meetings, and its website is improvingpsych.org.
Philosophy of Psychology Lectures
The late Paul Meehl, a longtime professor at the University of Minnesota, is probably the most respected methodologist in the history of personality psychology. His ideas about how to connect data with theory provide keen insights into modern controversies, such as issues of replicability and open science, discussed in this chapter. Lectures he gave in one of his graduate-level courses are available online. Although the course is called “Philosophy of Psychology,” as he points out in the first lecture, the content is really the philosophy of how to do research in psychology (and other fields). You can watch and listen for free at meehl.umn.edu/recordings/philosophical-psychology-1989.
Books
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Taylor & Francis.
The most important of a new generation of statistics textbooks that go beyond conventional null-hypothesis significance testing to teach alternative methods for estimating effect sizes and confidence intervals, and accumulating research results over many studies. A lot of recent, interesting information on the “new statistics” is available at Cumming’s website: thenewstatistics.com/itns.
Glossary
- psychometrics
- The technology of psychological measurement.
- reliability
- In measurement, the tendency of an instrument to provide the same comparative information on repeated occasions.
- measurement error
- The variation of a number around its true mean due to uncontrolled, essentially random influences; also called error variance.
- state
- A temporary psychological event, such as an emotion, thought, or perception.
- trait
- A relatively stable and long-lasting attribute of personality.
- aggregation
- The combining together of different measurements, such as by averaging them.
- Spearman-Brown formula
- In psychometrics, a mathematical formula that predicts the degree to which the reliability of a test can be improved by adding more items.
- validity
- The degree to which a measurement actually reflects what it is intended to measure.
- construct
- An idea about a psychological attribute that goes beyond what might be assessed through any particular method of assessment.
- construct validation
- The strategy of establishing the validity of a measure by comparing it with a wide range of other measures.
- generalizability
- The degree to which a measurement can be found under diverse circumstances, such as time, context, participant population, and so on. In modern psychometrics, this term includes both reliability and validity.
- case method
- Studying a particular phenomenon or individual in depth both to understand the particular case and to discover general lessons or scientific laws.
- experimental method
- A research technique that establishes the causal relationship between an independent variable (x) and dependent variable (y) by randomly assigning participants to experimental groups characterized by differing levels of x, and measuring the average behavior (y) that results in each group.
- correlational method
- A research technique that establishes the relationship (not necessarily causal) between two variables, traditionally denoted x and y, by measuring both variables in a sample of participants.
- independent variable
- The variable that is investigated as the cause, or possible cause, of the phenomenon being investigated. In experimental studies, this variable is manipulated across conditions; in correlational studies, it is measured as it naturally occurs and is traditionally designated x.
- dependent variable
- The variable that reflects the phenomenon being investigated, typically assumed to be or studied as the result of the independent variable. In correlational studies, the assumed dependent variable is traditionally designated y.
- scatter plot
- A diagram that shows the relationship between two variables by displaying points on a two-dimensional plot. Usually the two variables are denoted x and y, each point represents a pair of scores, and the x variable is plotted on the horizontal axis while the y variable is plotted on the vertical axis.
- correlation coefficient
- A number between –1 and +1 that reflects the degree to which one variable, traditionally called y, is a linear function of another, traditionally called x. A negative correlation means that as x goes up, y goes down; a positive correlation means that as x goes up, so does y; a zero correlation means that x and y are unrelated.
- Type I error
- In research, the mistake of thinking that one variable has an effect on, or relationship with, another variable, when really it does not.
- Type II error
- In research, the mistake of thinking that one variable does not have an effect on or relationship with another, when really it does.
- effect size
- A number that reflects the degree to which one variable affects, or is related to, another variable.
- confidence interval
- An estimate of the range within which the true value of a statistic probably lies.
- replication
- Doing a study again to see if the results hold up. Replications are especially persuasive when done by different researchers in different labs than the original study.
- publication bias
- The tendency of scientific journals preferentially to publish studies with strong results.
- questionable research practices (QRPs)
- Research practices that, while not exactly deceptive, can increase the chances of obtaining the result the researcher desires. Such practices include deleting unusual responses, adjusting results to remove the influence of seemingly extraneous factors, and neglecting to report variables or experimental conditions that fail to yield expected results. Such practices are not always wrong, but they should always be questioned.
- open science
- A set of emerging principles intended to improve the transparency of scientific research, encouraging researchers to fully report all methods and variables used in a study, to report studies that failed as well as those that succeeded, and to share data with other scientists.
- p-hacking
- Analyzing data in various ways until one finds the desired result.
- p-level
- In null hypothesis statistical testing, the calculated probability that an effect of the size (or larger) obtained by a study would have been found if the actual effect in the population were zero.
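The Spearman-Brown formula defined in the glossary above can be made concrete with a minimal Python sketch. The function name is our own; the formula is the standard prediction formula, where k is the factor by which the test is lengthened.

```python
def spearman_brown(reliability, k):
    """Predicted reliability of a test made k times as long.

    Standard Spearman-Brown prediction formula:
        r_new = k * r / (1 + (k - 1) * r)
    """
    return k * reliability / (1 + (k - 1) * reliability)

# Doubling a test whose reliability is .60 raises the predicted
# reliability to .75 -- an illustration of how aggregation (adding
# more items) improves reliability.
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```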