3.5 Skills in Action: Evaluating Research on Sex/Gender Differences/Similarities

Self-Reflection: How Theoretical Perspectives Affect Research

As we’ve said, science is influenced by researchers’ life experiences, assumptions, and biases. Research on similarities and differences is no exception. Historically, people with assumptions about women’s inferiority took it for granted that sex/gender differences justified that inferiority. For example, in the 19th century, one theory blamed women’s alleged inferiority on their reproductive capacities, arguing that menstruation consumed biological resources that could otherwise have “promoted further brain development” (Geddes & Thomson, 1890, as cited in Shields, 2007, p. 96). As discussed at the beginning of the chapter, even today research results can be circulated by the popular press in ways that confirm stereotypes of gender essentialism. What people believe can affect the way they conduct, report, and interpret research, particularly on gender similarities and differences.

Many feminists reject gender essentialism. In particular, liberal feminists tend to take a similarities perspective. For example, in 1903 psychologist Helen Thompson-Woolley tested motor skills among 25 female and 25 male white undergraduate students (Thompson, 1903). She then graphed the distributions of data from the female and male students, demonstrating tremendous overlap, and argued that the small difference found resulted from socialization practices rather than from heredity. Therefore, she argued, if women were given the same educational opportunities as men, those differences would likely disappear. Thompson-Woolley’s work is an example of how focusing on a similarities perspective provided evidence to refute sexist ideas that girls and women shouldn’t advance in academic and professional settings.

In contrast, other feminist psychologists take a differences perspective. They view women and men as distinct, having more differences than similarities. These psychologists think people should appreciate and value women’s unique experiences and attributes (Hare-Mustin & Marecek, 1988). This is a core belief associated with radical feminism and cultural feminism. The idea that some feminists take a perspective associated with gender essentialism may seem problematic. After all, in Chapter 1 we questioned whether cultural feminism is empowering or oppressing, and a similar question applies here. However, not all cultural feminists, or those who advocate for a differences perspective, believe in gender essentialism. Instead, the differences perspective can be a strategic choice to help support social causes that uniquely affect large numbers of women. This practice is known as strategic essentialism (Spivak, 1990). The 2017 Women’s March, discussed in Chapter 2, is an example of strategic essentialism because it was framed around how women are systemically disadvantaged compared with men.

A photo shows a large group of women breastfeeding on a lawn in a public outdoor space. — Figure 3.8 Strategic Essentialism

Although public breastfeeding is legal throughout the United States, women are often asked, or even told, not to do it or to make sure they are covered while nursing. Protests such as this one, organized by lactivists, are an example of using strategic essentialism to create change.

According to the differences perspective, ignoring differences among people can be harmful, especially if knowledge gained from studying one group is applied to another group whose circumstances are different. In clinical research, for example, if the symptoms of a heart attack typically differ in women and men but only men have been studied, then doctors may not recognize symptoms of a heart attack in women (Eagly & Wood, 2011). The differences perspective has also had some influence in modern science. In 2014, the U.S. National Institutes of Health (NIH) changed its policy for cell and nonhuman animal research to require that all funded studies have a balanced number of female and male subjects (Clayton & Collins, 2014). Previously, most research using nonhuman animals had been done on males so that the female reproductive cycle wouldn’t complicate the data. Even though there can be solid reasons to take a differences approach when it promotes feminist advocacy and advances social justice goals (Figure 3.8), focusing on the unique experiences of women can give the impression that all women are alike. Those who use strategic essentialism to advocate for social causes may ignore how intersectionality acknowledges each woman’s unique perspective and experiences, something that contributed to tensions among groups of women with different social identities during the organization of the Women’s March.

Both similarities and differences perspectives can be useful, and both can be problematic. However, it’s important to realize how one’s theoretical perspective on this issue can influence each stage of the research process. The fact is that when we look for differences, we choose research methods that will help us find them. Nevertheless, a careful examination of the research on sex/gender differences indicates that the context of the study and who is being studied profoundly matter and can affect the results. Furthermore, because people are complicated and every person is unique, simplistic statements reflecting gender essentialism, such as those in the beginning of the chapter, simply don’t hold up.

(Lack of) Evidence for Gender Essentialism: The Big Picture

Those who endorse gender essentialism often say that they’re using research to back up their claims, but the research simply doesn’t support this assertion. And it’s not as though psychologists haven’t tried to find differences. In this section, we’ll explore the main findings from the thousands of studies that have been designed with the goal of finding gender differences. As a reminder, these studies are grounded in the assumption of a sex/gender binary, which is a source of bias and erases the experiences of trans and/or gender nonbinary people. Participant gender is generally operationalized through self-report with the option to choose only female or male. This is important to keep in mind as we review the evidence (or lack thereof) for gender essentialism.

When a great deal of research has been done on one topic, psychologists have to use additional statistical analyses in order to make sense of the body of findings. For example, in August 2022 a search for human sex difference in PsycINFO returned more than 120,000 articles and 3,600 articles that discussed sex differences in aggression. To combine results from multiple studies, psychologists use meta-analysis, which is essentially a study of studies in which findings from existing studies serve as the data used in the new, summary study. A related technique is meta-synthesis, which statistically combines the results of many meta-analyses. A meta-analysis has the potential to combine data from thousands of studies that look at millions of people. When data from many studies are combined in a meta-analysis, the results are generally presented in terms of an effect size, or a d statistic (Cohen, 1988). This number indicates how large or small a difference is.
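To make the idea of a "study of studies" concrete, the following Python sketch computes a d statistic for each of several studies and then pools them into one summary value. It is a minimal illustration, not the procedure used in any review cited in this chapter. The pooled-standard-deviation formula for d and the inverse-variance (fixed-effect) weighting are common textbook choices assumed here, and the function names and all study numbers are hypothetical.

```python
import math

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference: (mean_a - mean_b) divided by the pooled SD."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def d_variance(d, n_a, n_b):
    """Widely used large-sample approximation to the sampling variance of d."""
    return (n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b))

def pooled_effect(studies):
    """Fixed-effect summary: average the d values, weighting each by 1 / variance."""
    weighted_sum = 0.0
    total_weight = 0.0
    for study in studies:
        d = cohens_d(**study)
        weight = 1.0 / d_variance(d, study["n_a"], study["n_b"])
        weighted_sum += weight * d
        total_weight += weight
    return weighted_sum / total_weight

# Hypothetical studies; every number below is invented for illustration.
studies = [
    dict(mean_a=50.0, mean_b=49.0, sd_a=10.0, sd_b=10.0, n_a=200, n_b=200),
    dict(mean_a=52.0, mean_b=50.5, sd_a=9.0, sd_b=11.0, n_a=80, n_b=90),
    dict(mean_a=48.0, mean_b=48.5, sd_a=10.0, sd_b=10.0, n_a=500, n_b=480),
]

for study in studies:
    print(f"study d = {cohens_d(**study):+.2f}")
print(f"pooled d = {pooled_effect(studies):+.2f}")
```

In this sketch, larger studies receive more weight because their estimates of d are less noisy. Published meta-analyses often use random-effects models and further corrections, so the weighting shown here should be read as one simple option rather than a standard.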

As Table 3.2 shows, an effect size with an absolute value of 0.1 (or less) is considered negligible, meaning no difference is assumed to exist. An effect size of 0.2 is considered small, 0.5 is considered medium, and 0.8 is considered a large difference between groups. It’s important to remember that even a large effect doesn’t mean that the groups being examined are completely distinct. To think about this in terms of sex/gender differences, consider that even with a large effect there would be a 69% overlap between women and men, as Table 3.2 shows. The overlap is even greater with smaller effects, and small effect size differences are the most common in sex/gender difference research. If the effect size is small, 92% of women and men would have similar scores—indicating a great deal of overlap. You can see what different effect sizes look like in terms of overlap between groups in Table 3.2.

TABLE 3.2 Common Effect Size Standards
Effect Size: d	Effect Size: Label	Percentage overlap between groups	What does it look like?
0.1	Negligible	96.01%
0.2	Small	92.03%
0.5	Medium	80.26%
0.8	Large	68.92%
Note: The dashed lines on the graphs indicate the mean score for each distribution. Closer mean scores represent smaller group differences.
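The overlap percentages in Table 3.2 can be reproduced with a short calculation. The Python sketch below assumes that scores in both groups follow normal distributions with equal standard deviations, and that "overlap" means the area shared by the two curves, which works out to 2Φ(-|d|/2), where Φ is the standard normal cumulative distribution function. That definition is an assumption inferred from the table values rather than something stated in the text, and the function name overlap_percent is hypothetical.

```python
from statistics import NormalDist

def overlap_percent(d):
    """Shared area (in percent) of two equal-SD normal curves whose means differ by d SDs."""
    return 100 * 2 * NormalDist().cdf(-abs(d) / 2)

# Effect size labels taken from Table 3.2.
for d, label in [(0.1, "Negligible"), (0.2, "Small"), (0.5, "Medium"), (0.8, "Large")]:
    print(f"d = {d}: {label}, {overlap_percent(d):.2f}% overlap")
```

Under these assumptions the loop prints 96.01%, 92.03%, 80.26%, and 68.92%, matching the table. Because the function uses the absolute value of d, the direction of a difference (which group scores higher) does not change the amount of overlap.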

Researchers have reviewed data from many meta-analyses as a tool to gain an overall picture of the extent of sex/gender differences. In 2005, psychologist Janet Hyde reviewed 46 meta-analyses. She found that 30% of the effect sizes were negligible (d < 0.1) and 48% were small (d = 0.11 to 0.35)—a total of 78%. In 2015, researchers conducted an updated review using meta-synthesis of 106 meta-analyses with 386 different reported results (Zell et al., 2015). The total number of participants across all the studies reviewed was more than 12 million. The meta-synthesis found an effect size for the overall difference between women and men of 0.21—a small effect. More than 90% of women and men overlapped, but there was a small overall difference. When the authors of the meta-synthesis looked specifically at the 386 meta-analytic results, they found, quite similar to Hyde (2005), that 85.5% of the effect sizes indicated either negligible or small differences.

Furthermore, it’s important to note that a meta-analysis often relies on published studies. Because of the file-drawer problem, studies that don’t show a difference might not be included because they’re less likely to be published. Remember, part of the feminist approach to research is to ask what’s being excluded. This question applies not just to participants but to research studies themselves.

Of course, one’s theoretical perspective will influence how these results are interpreted. Gender essentialists will look at small differences and claim they show evidence that women are one way and men are another. But looking at the overlap between people when small, or even medium, effects are found indicates that such strong claims about difference are not justified. In addition, a closer examination of the research on cognitive and personality variables indicates that context and representation matter. In other words, when studies are examined more closely, one can see that the results often change depending on how the questions are asked and who is included.

Cognitive Variables: Representation and Context Matter

What are the main findings on similarities and differences in cognitive variables, and how do context and representation affect the findings?

Much research on cognitive factors has not shown overall differences between women and men. Nevertheless, there is a common stereotype that women excel in verbal skills and men excel in math skills, and many claim that research justifies this stereotype. However, the picture is considerably more complicated. In a meta-analysis of math achievement scores of almost a half million students across 69 nations (Else-Quest et al., 2010), researchers found, on average, a very small male advantage in various domains of math skills (d < 0.15 for each skill). Note that this small effect indicates a great deal of overlap between female and male students. Also, there were extensive variations from nation to nation, and girls outperformed boys in some countries. Furthermore, in nations with greater gender equality (e.g., more women enrolled in schools, represented in the legislature, and holding high-level science and math positions), girls performed better on math tests, and the sex/gender difference often disappeared. These findings support the gender stratification hypothesis, the idea that differences found between women and men (especially on cognitive skills) are correlated with the level of gender equality in a country. So sex/gender differences vary depending on who is being studied and where research is being done.

Other research points to the importance of using an intersectional lens and considering a range of social identities and how they interact with power and privilege. Gender differences can disappear when other social identity characteristics are taken into consideration. In a study based on hundreds of thousands of state assessments mandated by the No Child Left Behind Act, researchers found that the overrepresentation of boys at the highest levels of math achievement only occurred for white students (Hyde et al., 2008). For Asian American students, the reverse was true. For example, at the 99th percentile of math achievement, white boys outnumbered white girls by a ratio of around 2:1. However, for Asian American students at this level of achievement, there were slightly more girls than boys, with 0.9 Asian American boys scoring at this level for every 1 Asian American girl (Hyde et al., 2008).

A study of spatial skills also showed the importance of using an intersectional lens. This study found, among middle- and upper-middle-class participants, that boys did better than girls (Levine et al., 2005). Among participants of lower socioeconomic status, however, no sex/gender differences were found. In this case, if only one aspect of identity had been examined (e.g., focusing only on sex/gender), the results would have presented an oversimplified picture.

Research on cognitive variables also points to the importance of context—specifically, what participants think the study is about and how the questions are asked. One study found that boys did better on a mental rotation task but girls did better on a mental paper-folding task (Miller & Halpern, 2014) (Figure 3.9). Other studies have found different results when they described cognitive tasks differently. For example, in a study of spatial memory, girls did better than boys when told it was a test of drawing skills, but boys did better than girls when told it was a test of geometry (Huguet & Régner, 2009).

A composite of two illustrations depicts a mental rotation task and a mental folding task. — Figure 3.9 Mental Rotation and Mental Folding Tasks

For the mental rotation task shown on top, participants are asked to imagine rotating the drawing on the left and to pick two of the four drawings on the right that match (Answers: B and C). For the mental folding task shown on the bottom, participants are asked to imagine folding the drawing on the left along the dotted lines and to choose the folded shape on the right that would match (Answer: A). Performance on the mental rotation task shows a large sex/gender difference, while performance on the mental folding task shows a very small one. The reason why there are large sex/gender differences for one but not the other remains a mystery. (After Miller & Halpern, 2014)

In another study, researchers gave women and men a spatial perspective-taking task that involved viewing a diagram of a city from above and navigating from one location to another by writing “right” or “left” at every turn (Tarampi et al., 2016). In one condition, participants were given the start and stop points and were told the task was a measure of spatial ability. In this case, men did better than women. In another condition, the researchers showed the same map but drew a human figure that needed to be directed through the city (Figure 3.10). Participants were told that it was a task of empathy rather than of spatial ability. Voilà: the sex/gender differences disappeared! The context in which participants encountered the task mattered. These findings suggest that differences in some spatial skills may have more to do with expectations about what women and men are supposed to be good at than with actual cognitive differences.

A map with three rectangular blocks in three rows shows a navigation path between the rectangular blocks. — Figure 3.10 Context Matters

In the Tarampi et al. (2016) study, the map on the left was used for participants who were told that the study was exploring spatial ability. The map on the right, identical except for the inclusion of a human figure at each turn, was used for participants who were told that the study was exploring empathy. A sex/gender difference was found when participants thought it was a task of spatial ability but not when they thought the task was about empathy.

A map with three rectangular blocks in three rows shows a navigation path with human figures between the rectangular blocks. — Figure 3.10 Context Matters


Personality Variables: Representation and Context Matter

How similar or different are women and men on personality variables, and what role does cultural context play in these patterns?

Many of the stereotyped sex/gender differences expressed in the books mentioned at the beginning of this chapter have to do with personality and behavior. Women supposedly want relationships, desire intimacy, and seek connection. Men supposedly want sex, are poor communicators, and are protective of their romantic partners. But, in reality, how different or similar are women’s and men’s personalities? To understand research on personality and sex/gender, it’s important to attend to both context and social identity.

Researchers generally view personality as varying on five major dimensions, known as the Big Five: openness to new experiences, conscientiousness, extraversion, agreeableness, and neuroticism (e.g., the tendency toward anxiety and depression; Costa et al., 2001). Each dimension has many subdimensions. In a meta-analysis, researchers found sex/gender differences on several of the dimensions as well as on particular subdimensions (Feingold, 1994). The largest effects were for men outscoring women on measures of assertiveness (a subdimension of extraversion; d = 0.50), although some studies indicated that this difference was found more on paper-and-pencil personality assessments than in behavioral observations. There was also a large effect for women outscoring men on tender-mindedness (a subdimension of agreeableness; d = –0.97) and a small effect for anxiety (a subdimension of neuroticism; d = –0.25 to –0.28). However, these differences still imply a great deal of similarity. Even the largest effect mentioned represents an overlap of approximately 62%. Also, these differences are consistent with sex/gender stereotypes: Being tender-minded is part of the feminine stereotype, and being assertive is part of the masculine stereotype, so personality differences can’t be understood without understanding how social power dynamics and gender socialization affect people.

Research on personality consistently points to the importance of using an intersectional lens. Specifically, studies show that sex/gender differences can disappear when race and/or ethnic background is taken into account. In one meta-analysis of almost 700 studies, sex/gender differences in experiencing guilt were only found for white participants and not for Asian American, Black, or Latine participants (Else-Quest et al., 2012).

Research on personality also points to the importance of context. In other words, the findings can change when variables are studied in different situations. For example, meta-analyses have shown that in studies of helping behavior, when the behavior was openly observed by others, men tended to help more often than women (Eagly & Crowley, 1986). However, when helping took place without anyone watching, there were no sex/gender differences. Men were also more likely to help women than to help men. These findings suggest that men may help (and help women) because that’s how they think they’re supposed to act, especially in front of others.

Research on personality also shows that people are complicated—each person has a unique constellation of personality traits, and very few people are consistently typed as “female” or “male.” For example, researchers studying traits that are highly gendered (e.g., communication with peers, problem behavior) found that between 59% and 70% of people had a mix of traditionally feminine and masculine traits. Less than 2% of people had all “female” or “male” traits (Joel et al., 2015).

Another example of how context matters is found in research on sexuality. For example, research asking participants about their number of sexual partners generally finds that men report more sexual partners than women. However, when you tell participants that they’re being monitored with a lie detector (although they aren’t), women and men report identical numbers of sexual partners (Alexander & Fisher, 2003). As a second example, women tend to have fewer orgasms than men, but context matters here too. These differences largely disappear when studying women in committed relationships and women who experience adequate foreplay and clitoral stimulation (Conley et al., 2011)—topics we’ll return to in Chapter 7.

The Brain: An Ever-Changing Mosaic

Most people who endorse gender essentialism use biology to justify their beliefs. The term neurosexism was coined to describe a bias in neuroscience that justified or reinforced gender stereotypes (Fine, 2010; Fine et al., 2013). Indeed, quite a bit of research has been dedicated to comparing the brains of women and men. But what do the data actually say?

Overall, studies tend to show that any brain differences found between women and men are small at best and that there is considerable overlap between groups (Wallentin, 2009). Even on variables with some of the largest differences—for example, a part of the hypothalamus is, on average, twice as large in men—about 30% of men have brains that look more similar to a typical female brain (Garcia-Falgueras et al., 2011). Furthermore, context matters even in brain research—most of which is done with nonhuman animal subjects. In one part of the hippocampus, evidence of greater activity, on average, was found in males when an animal didn’t experience stress, but evidence of greater activity was found in females when the animal did experience stress (Reich et al., 2009).

An important finding that has recently come to light is that individual women and men do not have brains that are consistently “female” or “male” typed. Indeed, every individual is complex—consistent with the idea of intersectionality. In one study, researchers examined MRI scans of more than 1,400 adults and identified the brain areas that showed the largest average differences between women and men (Joel et al., 2015). Researchers then looked at each brain, one at a time, to see if women and men consistently had gender-typed brains. Instead of “female” or “male” brains, the researchers found a brain “mosaic”: Most participants had parts of the brain that were “female” typed, other parts “male” typed, and still other parts somewhere in between (Figure 3.11).

Two illustrations show the brain mosaics of females and males. — Figure 3.11 Human Brain Mosaic

This illustration (from Joel et al., 2015) shows the human brain mosaic. Each horizontal line represents the brain of one participant (women on the left and men on the right); each column represents one brain region, and darker colors represent greater gray-matter volume. Each brain is a unique mosaic of features. While there are some sex/gender differences on average, brains are extremely variable and each one is unique.

Finally, even if there are brain differences between women and men, one can’t assume that differences cause different abilities or behaviors. Indeed, people who support gender essentialism will often jump from the presence of a small difference to a statement that sex/gender causes those differences. In fact, research suggests that the reverse may be true. Brains demonstrate plasticity; that is, they have the ability to change in response to aspects of the environment and learning experiences. For example, taxi drivers develop larger-than-usual brain structures devoted to visual memory due to their experience with driving (Maguire et al., 2000), and musicians develop a larger-than-usual auditory cortex due to their greater need to process sound (Jäncke et al., 2001). So even if women and men do show brain differences, those differences may be the result of different experiences. In other words, biological explanations don’t rule out social explanations. They can go hand in hand.

Research also shows that one’s social environment may affect brain development and ultimately one’s health and mental health. Specifically, experiences of poverty, adverse experiences in childhood, and environmental factors such as air and noise pollution have all been linked to changes in the brain (Ferschmann et al., 2022). It has also been hypothesized that other stressors, such as the experience of racism, may affect brain development (Ferschmann et al., 2022).

Hormones: Context Matters

How does research on gender differences and similarities in hormones challenge the sex/gender binary?

Another area of focus is the role of hormones in shaping sex/gender differences. It’s generally assumed that there are female hormones (e.g., estrogen and progesterone) and male hormones (e.g., testosterone). In fact, assumptions about a sex/gender binary have limited how researchers have asked questions about hormones. Until recently, researchers have mainly studied “female” hormones like estrogen and progesterone in women and “male” hormones like testosterone in men (Hyde et al., 2019; van Anders, 2013). This was another case of personal perspective and assumptions affecting the research question, the participants, and the methodology.

Once researchers started to look at “female” and “male” hormones in all people, they discovered something striking. Not only do people of all sex/genders produce estrogen, progesterone, and testosterone, but the data show that the levels of these hormones across sex/genders are much more similar than researchers had realized. On average, all people have similar levels of estrogen and progesterone; in fact, non-pregnant women have levels more similar to men than to pregnant women (Liening et al., 2010; van Anders, 2010). It’s been suggested that a more accurate binary based on estrogen and progesterone levels would place pregnant people in one category and all other people in another (Hyde et al., 2019). This would be quite different from the standard sex/gender binary!

In contrast, men do have higher levels of testosterone on average; however, the link between testosterone and some central, essential aspect of masculinity is less clear (van Anders, 2013). Instead of being linked to masculinity per se, high testosterone appears to be linked to behaviors such as competitiveness, and low testosterone appears to be linked to nurturance. Moreover, these connections occur in all people, not just men. Research on testosterone also points clearly to the importance of context. People’s hormones fluctuate as a result of their activity level, diet, and body fat, and they change when people engage in either nurturing or aggressive activities (DuBois & Shattuck-Heidorn, 2021). For example, a longitudinal study found that testosterone levels decreased in fathers and that men who did more childcare had the largest decreases (Gettler et al., 2011). The effects were strongest for fathers of newborns, but testosterone remained lower in fathers compared to non-fathers over time. In another study comparing fathers from two communities in Tanzania, those who were involved in daily childcare had lower levels of testosterone than those who weren’t (Muller et al., 2009). Another study found that when young men were in a room with a fake baby that cried but couldn’t be comforted, their testosterone levels went up (van Anders et al., 2012). When they were able to comfort the baby, their testosterone levels went down.

Testosterone levels also tend to rise during displays of power and aggression. When women were asked to role-play firing someone (an act that demonstrates power), their testosterone levels went up (van Anders et al., 2015). As a result of this finding, researchers hypothesize that even though men do have higher average levels of testosterone than women, the fact that women are socialized not to display aggression may be one reason for their lower testosterone levels (van Anders et al., 2015).

Your Turn

In this chapter, you’ve familiarized yourself with questions to help you think critically about bias in research, and you’ve applied them to assess research exploring gender differences. Next, find a research article on a topic that interests you. Use the questions to interrogate how bias may have played a role at different phases in the research process. Now do the same for another article. If the first one you chose was a quantitative study, make sure the second study is qualitative (or vice versa). Do you see differences in the ways bias may have influenced design, analysis, interpretation, and so forth? Do you see a difference in the way the researchers talk about bias?