The Google Manifesto 3: A Tangled Knot Of Bad Statistical Reasoning
♪ Who wants to linearly regress forever? ♪♪
This article is Part 3 of a series about the Google Manifesto. [Part 1] [Part 2]
Sorry about the delay on this article. With all that was happening in Charlottesville and elsewhere, I’ve been a bit distracted.
Let me be up-front with one thing: I am not an expert. I don’t have a bachelor’s degree, never mind a master’s degree, in any field. I do have 9+ years of experience in the technology industry as a Site Reliability Engineer, plus a fair bit of software development on the side, but almost all of my knowledge of science beyond the high school level is self-taught.
In contrast, manifesto guy James Damore has attained significantly higher formal education than I have: a BS from UIUC is nothing to sneeze at, and an MS from Harvard is quite prestigious. His expertise is Systems Biology, which is a varied field but typically involves using computers to simulate the internal chemistry of individual cells. [1] Nevertheless, a person with this educational background in the sciences should be quite familiar with the process of reading papers, interpreting results, and analyzing their methodologies for flaws, as well as familiar with the fundamentals of science such as statistics and logic.
Which makes it all the more glaring that even I, an amateur, can spot serious flaws in the manifesto’s writing.
In this article, I will discuss some of the ways in which the manifesto exhibits flawed reasoning. My goal here isn’t to debate the manifesto’s presented “facts” — I’m going to save that for a future article. [Edit on 2017–08–22: Or check out “The truth has got its boots on” by Erin Giglio. She does an incredible job of tearing apart the Manifesto’s citations, explaining the missing context, and citing real research to back up her criticism.] Instead, I want to discuss some problems that exist in the manifesto itself, independent of the evidence it cites.
Many of these differences are small and there’s significant overlap between men and women, so you can’t say anything about an individual given these population level distributions.
— James Damore
The Google Manifesto is… not particularly well thought out.
The most egregious errors made by the manifesto author — errors that permeate the work — are problems with statistical reasoning. To cite a specific example, the author assures us that:
- There is a population-level difference between the average man and the average woman, but this tells us little to nothing about individuals.
- Tech is an occupation that only individuals with certain abilities and temperaments can thrive in.
What’s the problem? Well, if tech is truly an elite occupation, then any talk about average members of the general population is meaningless. Averages would matter if the people being fed into Google’s recruiting pipeline were randomly drawn from the population at large. But this isn’t how Google hires people: Google tries to identify people who already have a demonstrated interest and skill in tech, so the ability/temperament distributions of people in the recruiting pipeline will not match those of the general population.
The manifesto author places a great deal of emphasis on these averages… without recognizing that, each time he makes a point which must be qualified by the word “average”, that point definitely does not apply to the women in (or beyond!) Google’s recruiting pipeline. Because we do not know whether the recruiting selection process is linear, we do not know whether the shape of the probability distribution has changed as a result of recruitment, and therefore we cannot assume that male and female Google engineering candidates are simply the long tail of a Gaussian distribution over the general population. One would have to conduct a study comparing the women-in-tech population against the men-in-tech population, controlling for culture and biases, to draw any meaningful conclusions about biological differences at companies like Google.
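To make this concrete, here is a tiny simulation of what a threshold-style (nonlinear) filter does to a distribution. Everything in it is an assumption made for illustration: a one-dimensional, normally distributed “aptitude” score and a hard cutoff at the 95th percentile, neither of which is a claim about how Google actually recruits.

```python
# A toy illustration only: assume an "aptitude" score that is one-dimensional
# and normally distributed, and a recruiting filter that is a hard cutoff at
# the 95th percentile. Both are assumptions made up for this sketch.
import numpy as np

rng = np.random.default_rng(42)

population = rng.normal(loc=0.0, scale=1.0, size=1_000_000)  # general population
cutoff = np.quantile(population, 0.95)                        # threshold-style (nonlinear) filter
pipeline = population[population >= cutoff]                   # the "recruiting pipeline"

print(f"population: mean={population.mean():+.2f}, sd={population.std():.2f}")
print(f"pipeline:   mean={pipeline.mean():+.2f}, sd={pipeline.std():.2f}")
# The pipeline is a truncated normal: its mean, spread, and shape all differ
# from the population's, so population-level averages tell us little about it.
```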
The statistical problems don’t stop there. The manifesto commits another major error in reasoning, namely: if the population-level differences between the mean man and the mean woman are small, and therefore there is significant overlap between the normal distributions of men and women — propositions which the author readily accepts, as quoted above — then this difference would be easily overwhelmed by any external factor with a larger effect size.
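For a rough sense of scale, here is one way to put a number on that overlap. The effect sizes below are illustrative assumptions: d = 0.13 standing in for a “small” population-level difference, and d = 0.5 standing in for a hypothetical external bias with a larger effect.

```python
# Illustrative numbers only: d = 0.13 stands in for a "small" population-level
# sex difference, and d = 0.5 stands in for a hypothetical environmental bias
# with a larger effect size. Neither value is taken from a study.
from scipy.stats import norm

def overlap(d):
    """Overlapping coefficient of two unit-variance normals whose means differ by d."""
    return 2 * norm.cdf(-abs(d) / 2)

print(f"overlap at d = 0.13: {overlap(0.13):.1%}")  # ~94.8%: the distributions nearly coincide
print(f"overlap at d = 0.50: {overlap(0.50):.1%}")  # ~80.3%: a larger effect easily swamps the smaller one
```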
Do such potentially-confounding external factors exist? Absolutely! For instance, women who submit code for review are more likely to have their changes approved, but only if their gender is concealed from the reviewers; if gender is revealed, then women are less likely to have their changes approved. [link] [link] Likewise, there’s a well-known bias against taking women’s suggestions with the seriousness they deserve, and of shaming a woman for showing the same assertiveness that would be praised in a man. These factors could absolutely sabotage the progress of a woman’s career, making it less likely that she would be recruited by Google regardless of her actual ability.
In short, there are well-known, well-documented reasons — independent of biology — why women are positioned to fail in tech, and those reasons have nothing to do with the women themselves and everything to do with the working environment.
To my knowledge, there have been no studies comparing the effect size of these anti-woman biases vs the effect size of biological differences between men in tech and women in tech, which is the only way one could tease out what fraction of the Google gender disparity is due to each cause. However, the biological differences reported in the literature are tiny: there are a few that seem robust but are tied to adult testosterone level, one that’s a small-but-probably-real bias with regards to spatial vs linguistic reasoning, and a number of other minuscule biases that may just be statistical noise.
(Keep in mind that, due to problems such as publication bias, we expect the literature to contain a fair number of studies that claim statistical significance for a small effect — even for effects that are just statistical noise. Let me repeat that: there are papers that claim statistical significance for an effect that doesn’t actually exist, and the scientists who wrote the paper did nothing wrong. One paper is not proof; this is why reproducibility is critical to science.)
In the absence of studies directly comparing women in tech and men in tech, we can only rely on the studies over the general population. And given that those studies claimed small effects, any objective reader would expect the anti-woman biases to dominate reality, not the biological effects.
Let’s do some statistics. I’m really rusty and mostly self-taught, so I may get some details wrong, but the basic reasoning should be sound.
First, suppose that the recruiting selection process is a linear function. Second, assume that the reason men are more likely to be software engineers is entirely due to biological differences between men and women. Damore claims that this effect is “small”, so let’s make the generous assumption that the mean man is as much as 5 percentile points (0.13σ) away from the mean woman.
The US population is 2.54% software engineers and 50.8% women, and 17% of Google’s software engineers are women, a ratio we will assume is representative of the tech industry. This means that 4.28% of US men are software engineers, as are 0.850% of US women.
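If you’d like to check that arithmetic yourself, here is the calculation in Python; the three input percentages are the ones quoted above, and everything else follows from them.

```python
# The three input figures are the ones quoted in the text; the conditional
# probabilities follow directly from them.
p_swe     = 0.0254   # fraction of the US population who are software engineers
p_women   = 0.508    # fraction of the US population who are women
p_swe_fem = 0.17     # fraction of software engineers who are women (Google's ratio)

women_in_swe = p_swe * p_swe_fem / p_women              # P(software engineer | woman)
men_in_swe   = p_swe * (1 - p_swe_fem) / (1 - p_women)  # P(software engineer | man)

print(f"{men_in_swe:.2%} of US men are software engineers")      # ~4.28%
print(f"{women_in_swe:.3%} of US women are software engineers")  # ~0.850%
```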
If 100% of people who are qualified for tech actually go into tech — a bold assumption — then a man in tech must be 1.72σ away from the mean man (95.7th percentile) to be qualified, while a woman in tech must be 2.39σ away from the mean woman (99.2nd percentile) to be qualified. If the disparity between men and women is entirely due to biological differences, then the biological difference between the mean man and the mean woman is 0.67σ (i.e. a 50th percentile man would be in the 74.9th percentile if judged against women). That’s not a small difference: it’s an effect size over 5 times stronger than what we were expecting from the literature.
If there are people with tech aptitude who don’t go into tech, then the story gets worse for Damore. Suppose that, of people with tech aptitude, only 1 in 10 go into tech. Now these assumptions imply that 42.8% of men (0.18σ from mean) are qualified for tech, but only 8.50% of women (1.37σ) are qualified for tech, for a difference between means of 1.19σ (a 50th percentile man would be judged as an 88.3rd percentile woman). That’s huge: it’s more than 9 times stronger than the effect size we were expecting.
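Here is the same percentile arithmetic written out so that both scenarios can be reproduced. It assumes the normal “aptitude” model and threshold-style selection described above, and uses SciPy’s normal distribution functions.

```python
# Assumes the one-dimensional normal "aptitude" model and threshold selection
# described above. The base rates come from the previous calculation; the 1x
# and 10x multipliers are the two scenarios in the text.
from scipy.stats import norm

men_in_swe, women_in_swe = 0.0428, 0.0085

for multiplier in (1, 10):  # 100% of qualified people enter tech, then 1 in 10
    q_men   = men_in_swe * multiplier     # fraction of men qualified
    q_women = women_in_swe * multiplier   # fraction of women qualified
    z_men   = norm.ppf(1 - q_men)         # qualification cutoff, in sigmas, for men
    z_women = norm.ppf(1 - q_women)       # qualification cutoff, in sigmas, for women
    gap     = z_women - z_men             # implied gap between the male and female means
    print(f"{multiplier}x: men cutoff {z_men:.2f}σ, women cutoff {z_women:.2f}σ, "
          f"implied gap {gap:.2f}σ "
          f"(a 50th percentile man ranks at the {norm.cdf(gap):.1%} mark among women)")
```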
In short, we’ve contradicted our assumptions, so at least one of the assumptions must be wrong: either the biological differences between men and women are not small like we assumed, or the recruiter filtering function is strongly nonlinear (and thus the men:women ratio of the software industry has nothing to do with population differences between the average man versus the average woman), or lastly there is a confounding factor (such as bias against women) which explains why men are 5 times more likely to work as software engineers at Google when compared to women. As the literature does not support the large effect size, Damore’s argument (that biological differences cause the Google gender ratio disparity) falls apart.
These are just some of the criticisms that can be laid at the feet of the Google Manifesto. It’s a good thing for Damore that the manifesto is not scholarship, because it would be awful scholarship — the sort of thing that would be torn apart in peer review at any journal worth its salt.
Footnote 1
Note that Systems Biology has about as much to do with Neuroscience (brains are made of cells) as Psychology has to do with Economics (economies are made of people). To use a metaphor, Star Wars and Star Trek are both science fiction, but being an expert on Star Wars doesn’t make you an expert on Star Trek.
In short, when Damore cites Neuroscience studies, he is acting in the capacity of an interested layperson, not of an MS degree holder, because he does not have the depth of knowledge about the state of the Neuroscience field and its literature. Studies are written for fellow scientists within the same field—not for laypeople, nor even for scientists outside the field—and this means that those without the field’s context can easily misinterpret a study’s implications or significance. A study with a surprising result requires replication and/or a very high statistical significance to be treated seriously, and only someone within the field will understand what is “surprising”.