Review of 'Look Who's Talking : Gender Differences in Academic Job Talks'

Reviewer: Dennis Leung

Inviter role(s): AUTHOR

Publication date of review: 2023-08-27




This paper applies permutation methodology to study gender disparity in academic job talks.

Average rating:	    Rated 3.5 of 5.
Level of importance:	    Rated 3 of 5.
Level of validity:	    Rated 4 of 5.
Level of completeness:	    Rated 4 of 5.
Level of comprehensibility:	    Rated 3 of 5.
Competing interests:	None

Reviewed article


Is Open Access

Look Who's Talking : Gender Differences in Academic Job Talks

Amanda Glazer, Hubert Luo, Shivin Devgon … (2023)

The "job talk" is a standard element of faculty recruiting. How audiences treat candidates for faculty positions during job talks could have disparate impact on protected groups, including women. We annotated 156 job talks from five engineering and science departments for 13 categories of questions and comments. All departments were ranked in the top 10 by US News & World Report. We find that differences in the number, nature, and total duration of audience questions and comments are neither material nor statistically significant. For instance, the median difference (by gender) in the duration of questioning ranges from zero to less than two minutes in the five departments. Moreover, in some departments, candidates who were interrupted more often were more likely to be offered a position, challenging the premise that interruptions are necessarily prejudicial. These results are specific to the departments and years covered by the data, but they are broadly consistent with previous research, which found differences comparable in magnitude. However, those studies concluded that the (small) differences were statistically significant. We present evidence that the nominal statistical significance is an artifact of using inappropriate hypothesis tests. We show that it is possible to calibrate those tests to obtain a proper P-value using randomization.


Preprint version 2


Review information

DOI: 10.14293/S2199-1006.1.SOR-STAT.AFUNVI.v1.RIITBL

License:

This work has been published open access under Creative Commons Attribution License CC BY 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Conditions, terms of use and publishing policy can be found at www.scienceopen.com.

ScienceOpen disciplines: Applications, Statistics

Keywords: randomization tests, type III error, job talk, nonparametric, permutation, gender, academia

Review text

Using new data obtained from UC Berkeley and permutation methods, this article takes a rigorous nonparametric approach to testing whether female candidates are asked more questions (or different types of questions) during academic job talks. In contrast to a related paper by Blair-Loy et al., which studies the same problem with a parametric and possibly unjustified zero-inflated negative binomial (ZINB) model, the present paper finds no strong evidence that women candidates are asked more questions than their male counterparts. In particular, even when the authors apply a randomization-calibrated test based on the ZINB model, they still find no strong evidence of a difference.
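The randomization calibration mentioned above can be sketched roughly as follows. The idea is to treat the nominal p-value of a parametric test as an ordinary test statistic and compare it with its distribution over random relabellings of the groups, so the calibrated p-value does not depend on the parametric model being correct. This is a minimal sketch under stated assumptions, not the authors' code: to keep it dependency-free, a Welch-type z test with a normal approximation stands in for the ZINB model, and the names `nominal_p` and `calibrated_p` (as well as the example counts) are hypothetical.

```python
import math
import random
from statistics import mean, variance


def nominal_p(x, y):
    """Two-sided p-value of a Welch-type z statistic (normal approximation).

    Stand-in (assumption) for a parametric model such as ZINB; its nominal
    level need not be correct for skewed, zero-heavy count data.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return 1.0  # no variability at all: no evidence of a difference
    z = (mean(x) - mean(y)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def calibrated_p(x, y, n_perm=5000, seed=0):
    """Randomization-calibrated p-value.

    The nominal p-value is used as the test statistic; its null distribution
    is obtained by randomly reassigning the group labels.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = nominal_p(x, y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if nominal_p(pooled[:n_x], pooled[n_x:]) <= observed:
            hits += 1
    # Counting the observed labelling itself keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical per-talk question counts, for illustration only.
    women = [3, 0, 5, 2, 7, 1, 4, 0, 6, 2]
    men = [2, 1, 4, 0, 3, 5, 1, 2, 0, 3]
    print("nominal:", nominal_p(women, men))
    print("calibrated:", calibrated_p(women, men))
```

With real data, the p-value from an actual ZINB fit would replace `nominal_p`; the calibration loop itself is the same whichever parametric test is being calibrated.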

I find this paper quite stimulating. It touches on permutation tests, an area that, in my humble opinion, many mainstream statisticians are not very familiar with these days; as such, I picked up the book by Pesarin and Salmaso (on which the authors' method is based) for a quick read. While I appreciate the model-free approach taken by the authors, I would like to play devil's advocate and point out that the randomization test employed in this paper may simply not have enough power to detect a difference in medians between women and men. For instance, combining functions other than Fisher's omnibus function could be chosen, and the optimal permutation test is generally hard to pin down (p. 107 in Pesarin and Salmaso). Another limitation of the current study, as the authors themselves point out, is that the dataset is not large enough to stratify by year for a more fine-grained analysis.
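To make the point about combining functions concrete, here is a simplified sketch of nonparametric combination (NPC) in the spirit of Pesarin and Salmaso: one partial permutation test per outcome, partial p-values computed from a shared permutation distribution, and a combining function whose null distribution is obtained from those same permutations. The median-difference partial statistic, the one-sided alternative, the Tippett option, and all names (`median_diff`, `npc_test`, `COMBINERS`) are illustrative assumptions, not the authors' implementation. Switching `combine="fisher"` to `combine="tippett"` changes which departures from the null the test is most sensitive to, which is why the choice can affect power.

```python
import math
import random
from bisect import bisect_left
from statistics import median


def median_diff(values, labels):
    """Partial statistic: median of group 1 minus median of group 0.

    Large values favour the one-sided alternative "group 1 is higher".
    """
    g1 = [v for v, g in zip(values, labels) if g == 1]
    g0 = [v for v, g in zip(values, labels) if g == 0]
    return median(g1) - median(g0)


# Combining functions: each maps K partial p-values to a statistic that is
# large when the evidence against the global null is strong.
COMBINERS = {
    # Fisher omnibus: rewards several moderately small partial p-values.
    "fisher": lambda ps: -2.0 * sum(math.log(p) for p in ps),
    # Tippett: driven only by the single smallest partial p-value.
    "tippett": lambda ps: max(1.0 - p for p in ps),
}


def npc_test(variables, labels, combine="fisher", n_perm=2000, seed=0):
    """Global permutation p-value via nonparametric combination (NPC).

    variables: K lists, each holding one outcome per candidate
    labels:    0/1 group labels, one per candidate
    """
    rng = random.Random(seed)
    perm = list(labels)
    # Row 0 holds the observed partial statistics, rows 1..n_perm permuted ones.
    # The same relabelling is applied to all K outcomes, preserving their dependence.
    stats = [[median_diff(v, labels) for v in variables]]
    for _ in range(n_perm):
        rng.shuffle(perm)
        stats.append([median_diff(v, perm) for v in variables])
    n_rows = len(stats)
    columns = [sorted(col) for col in zip(*stats)]

    def partial_p(row):
        # Per outcome: share of rows whose statistic is >= this row's value.
        return [(n_rows - bisect_left(col, s)) / n_rows for col, s in zip(columns, row)]

    combiner = COMBINERS[combine]
    combined = [combiner(partial_p(row)) for row in stats]
    # Global p-value: share of rows whose combined statistic reaches the observed one.
    return sum(t >= combined[0] for t in combined) / n_rows


if __name__ == "__main__":
    # Hypothetical data: two outcomes (e.g. question count, minutes of questioning).
    labels = [1] * 8 + [0] * 8
    counts = [5, 3, 6, 2, 4, 7, 3, 5, 4, 2, 3, 5, 1, 4, 2, 3]
    minutes = [9, 6, 12, 4, 8, 15, 5, 10, 7, 5, 6, 9, 2, 8, 4, 6]
    for name in ("fisher", "tippett"):
        print(name, npc_test([counts, minutes], labels, combine=name))
```

Because every partial p-value counts the row itself, no p-value is zero and the Fisher logarithm is always defined. A stratified version (e.g. by year) would shuffle labels only within each stratum, which is one way the fine-grained analysis mentioned above could be set up if more data were available.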

Minor comments:

1. Terminology could be used more consistently to improve readability, e.g. "overall talks" vs. "entire talks" (pp. 8 and 9).

2. Table captions could be extended to describe the tables' contents better. For example, in Table 1 it was not apparent to me until Section 5.2 that "median events" refers to the median number of audience utterances.