Female Code Breakers

Here’s a fascinating story about the women who helped break codes during WWII.  The article appeared as part of ACM TechNews, and is excerpted from the book Code Girls: The Untold Story of the American Women Code Breakers of World War II by Liza Mundy.

via The Secret History of the Female Code Breakers Who Helped Defeat the Nazis – POLITICO Magazine

(Thanks to Barbara Ryder, emeritus professor and former chair of Computer Science at Virginia Tech, for the pointer)


Gender & Racial Disparities in Big Cancer Data

As a researcher who works with large publicly available biological datasets, I was reminded of the potential biases in big data when I came across this blog post from the University of Michigan Health Lab:

How Genomic Sequencing May Be Widening Racial Disparities in Cancer Care.  Nicole Fawcett, Aug 17, 2016.

Cancer is a notoriously heterogeneous disease, meaning that different patients with the same cancer type may harbor different sets of mutations.  Further, many genes associated with cancer tend to be mutated at very low frequencies in tumors [1].   In order to gain enough statistical power to confidently identify these rare “driver” mutations, we need data from hundreds to thousands of tumor samples.  Obtaining such a large number of samples often requires collecting tissues whenever possible.

The Cancer Genome Atlas (TCGA) is a massive data repository for dozens of cancers, containing data from hundreds to thousands of individuals for most cancer types.  The post above describes a recent study that determined the racial breakdown of tumor samples in 10 of the 31 tumor types from TCGA.  The study found that while the samples were racially diverse (even, in some cases, matching the U.S. population), the numbers of African-American, Asian, and Hispanic samples were too small to identify group-specific mutations with 10% frequency for any tumor type except breast cancer in African-Americans.  On the other hand, there were enough Caucasian samples in every tumor type to identify mutations with 10% frequency in the population (and 5% frequency for 8 of the 10 tumor types assessed).  Consequently, we identify more “rare” mutations that pertain to Caucasians simply because we have more data to support the findings.  Further, only 3% of the total samples were Hispanic, while Hispanics comprise 16% of the U.S. population.
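To get a feel for why sample size matters here, the following is a minimal sketch, not the power analysis used in the study described above.  It assumes each tumor sample independently carries a given mutation with probability equal to that mutation's frequency in the group, and asks how many samples are needed to observe the mutation in at least `k` samples with a chosen probability.  The function names, the threshold `k`, and the 90% target are illustrative assumptions; a real analysis would also have to account for background mutation rates.

```python
from math import comb


def prob_at_least(n, freq, k):
    """Probability that at least k of n samples carry the mutation,
    assuming independent samples and a fixed frequency (binomial model)."""
    below = sum(comb(n, i) * freq**i * (1 - freq) ** (n - i) for i in range(k))
    return 1.0 - below


def min_samples(freq, k=1, target=0.9, limit=100_000):
    """Smallest n for which prob_at_least(n, freq, k) reaches the target.
    Returns None if no n up to `limit` is large enough."""
    n = k
    while n <= limit:
        if prob_at_least(n, freq, k) >= target:
            return n
        n += 1
    return None


if __name__ == "__main__":
    # Compare a 10% mutation with a 5% mutation under the same assumptions.
    for freq in (0.10, 0.05):
        print(freq, min_samples(freq, k=1), min_samples(freq, k=5))
```

Under this simplified model, halving the mutation frequency roughly doubles the number of samples needed, which is why a group with few samples in the repository can only support findings about relatively common mutations.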

This disparity is not limited to race.  Gender representation in big cancer data has also been in the press.  The under-representation of women in sex-nonspecific cancer research over the past 15 years has been reviewed by Hoyt and Rubin (Cancer 2012), who noted that this gap may be widening.

Want to see the discrepancies for yourself?  The data is easy enough to obtain, but Enpicom has a fantastic interactive visualization of the entire TCGA data repository by patient gender, race, and age.


Consider glioma, for example – while the incidence rate of brain tumors is higher in women than in men [2], women comprised only 41.4% of the over 1,100 samples.


Even more alarmingly, over 88% of the samples are Caucasian.

There is evidence of higher incidence rates of brain cancer in Caucasians compared to African-Americans and Hispanics, but surely this doesn’t justify the over-representation in this dataset.

So, what should we do?

On one hand, we need to carefully design data collection efforts to ensure that different racial/ethnic groups are adequately represented – not simply to reflect the proportion in the U.S. population but to gain enough statistical power to confidently identify rare mutations.  On the other hand, “convenience sampling” methods, which obtain tumors from the most convenient places even if the population is homogeneous, have enabled consortia to collect enough data in the first place.  In fact, we better understand the “rare mutation” concept because of the mostly-white patient data collected by TCGA and others.

The only clear answer is that we need more data.


[1] This is often called the “long tail” distribution of cancer gene mutations.  For more information, see, for example,  Lessons from the Cancer Genome. Garraway and Lander.  Cell 2013.

[2] All primary malignant and non-malignant brain and CNS tumors.  In fact, the incidence rate of malignant brain tumors is slightly higher in men.  Cancer statistics from the Central Brain Tumor Registry of the United States.

 

 

Networks in Biology (it’s not what you think)

I am currently designing the upper-level undergraduate class I will teach next fall.  The proposed course description* begins with:

Computational Systems Biology

A survey of network models used to gain a systems-level understanding of biological processes.  Topics include computational models of gene regulation, signal transduction pathways, protein-protein interactions, and metabolic pathways…

As a result, I’ve been keeping my eye out for networks (or, mathematically-speaking, graphs) in biology.  I found a fascinating network in this recently-published paper:

Males Under-Estimate Academic Performance of Their Female Peers in Undergraduate Biology Classrooms
Grunspan DZ, Eddy SL, Brownell SE, Wiggins BL, Crowe AJ, et al. (2016) Males Under-Estimate Academic Performance of Their Female Peers in Undergraduate Biology Classrooms. PLoS ONE 11(2): e0148405. doi: 10.1371/journal.pone.0148405

I often see reports on gender bias in computer science, but I somehow thought that biology would be the least gender biased of the STEM disciplines.  I was surprised that this type of bias has been uncovered in biology, and in classes with more female students than male students.  The paper has already been highlighted on sources such as Science Daily, The Atlantic, and the Huffington Post, among others.  The wealth of information in the paper — from the experimental design to the study setting to the final results — warrants an important, broad discussion.

In this post, however, I’ll focus on the networks.

The authors conducted multiple surveys where students nominated the “best performers” in their introductory biology courses at a large American university.  These surveys were given at different parts of the course, and they were conducted across three different iterations of the same undergraduate biology class.  Figure 1 of the paper shows two networks displaying two surveys from the same class, six weeks apart.

 

 


Figure 1. Unequal distribution of peer perception of mastery of content among genders grows over the term.  Grunspan et al., PLOS ONE 2016.

These networks show the students (represented as nodes in the graph) in a particular class, and “votes” as directed edges from nominators to nominees.  Male students are shown in green, and female students are shown in orange.  The size of each node indicates the number of nominations that student received.  The structure of these networks is striking.  There are many students who do not nominate anyone and are not nominated by anyone, resulting in “singleton” nodes.  In both networks, there is a general cohort of students that receive nominations; however, the distribution of these nominations is much more skewed in the second survey.
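For readers who think in code, here is a minimal sketch of how such a nomination network could be represented.  The student labels and votes below are made up for illustration and are not data from the paper; the function names are also hypothetical.  Each vote is a directed edge (nominator, nominee), the number of nominations received is the node's in-degree, and a singleton is a student who appears in no edge at all.

```python
from collections import Counter

# Hypothetical class roster and nominations (nominator, nominee).
students = ["A", "B", "C", "D", "E"]
votes = [("A", "B"), ("C", "B"), ("D", "B"), ("A", "C")]


def nominations_received(students, votes):
    """In-degree of every student: how many nominations each one received."""
    counts = Counter(nominee for _, nominee in votes)
    return {s: counts.get(s, 0) for s in students}


def singletons(students, votes):
    """Students who neither nominate anyone nor are nominated by anyone."""
    involved = {s for edge in votes for s in edge}
    return [s for s in students if s not in involved]


if __name__ == "__main__":
    print(nominations_received(students, votes))
    print(singletons(students, votes))
```

In a drawing like Figure 1, the values from `nominations_received` would set the node sizes, and the students returned by `singletons` would appear as isolated dots.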

The intuitive trend that we see in these graphs is that “the green nodes tend to get bigger,” corresponding to a larger proportion of nominations going to male students.  However, we see that female students also receive more nominations in the second survey than in the first.  The authors quantify these aspects using exponential-family random graph models (ERGMs) to assign coefficients to model statistics relating to gender, outspokenness, and grade.  They found a specific gender bias: male students tend to nominate other male students, after controlling for grade and outspokenness.  Female students, on the other hand, do not exhibit a gender bias toward nominating males (or females, for that matter) after controlling for these factors.

There are many, many other factors that may contribute to these observations, and some are noted in the paper.  The courses were taught (and in some cases co-taught) by four male instructors and only one female instructor, the classes ranged in size from 196 to 760 students, and one class employed “random call” lists rather than calling on raised hands.  Besides outspokenness, interactions in lab sections and outside class would undoubtedly affect students’ perceptions.  This paper opens a tremendously important conversation about implicit gender bias in the classroom, even in majors with more female students than male students.  As the paper concludes,

This gender biased pattern in celebrity was experienced by over 1,500 students in our analyses.  This number is striking, but less worrisome than the millions of students who attend college STEM classes that may perpetuate the same biases described here.

Grunspan et al., PLOS ONE 2016.

* Pending approval of various college committees – it may change

Quantifying the gender bias in federally-funded STEM research

We all know that there is a gender disparity in STEM fields.  Is it harder for women in these fields to obtain federal funding compared to their male colleagues?  In 2013, Helen Shen published an article in Nature summarizing women’s continuing challenges in science.  The infographic below from the article describes the gap in NIH-funded research grants.

from Inequality quantified: Mind the gender gap by Helen Shen, Nature Vol 495 Issue 7439 2013.

At first glance, the funding gap looks appalling – only 30% of the NIH’s grants are going to women!  However, there’s a missing ingredient here:  the fraction of NIH grant proposals submitted by women.  To get this information, let’s go back to 2008 for a minute.  Jennifer Pohlhaus and others at the NIH assessed the gender differences in application rates and success rates for 77% of the awards submitted in 2008, including training grants, midcareer grants, independent research grants (e.g., R01), and senior grants.  They found that the acceptance rates reflected the application rates for most NIH grants.  However, men had a higher success rate once they had received their first NIH grant and become NIH investigators.  So the funding gap in the infographic may not be tied to women having lower success rates in funding, but rather to fewer women submitting grant applications.  A visualization of the data from the NIH is available on their webpage.
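The arithmetic behind this argument can be sketched in a few lines.  The numbers below are entirely hypothetical (they are not NIH figures), and the function name is made up for illustration; the point is only that a group's share of awards depends on both how many applications it submits and how often those applications succeed.

```python
def award_share(applications, success_rate):
    """Share of all awards going to each group, given the number of
    applications per group and the per-group success rate."""
    awards = {g: applications[g] * success_rate[g] for g in applications}
    total = sum(awards.values())
    return {g: awards[g] / total for g in awards}


if __name__ == "__main__":
    # Hypothetical scenario 1: equal success rates, unequal application counts.
    applications = {"women": 300, "men": 700}
    equal_rates = {"women": 0.20, "men": 0.20}
    print(award_share(applications, equal_rates))

    # Hypothetical scenario 2: same applications, but a lower success rate
    # for one group shrinks its share of awards below its share of applications.
    unequal_rates = {"women": 0.15, "men": 0.20}
    print(award_share(applications, unequal_rates))
```

With equal success rates, the share of awards simply mirrors the share of applications, so a 30% award share on its own cannot distinguish between "fewer applications" and "lower success."  Both quantities are needed to tell the two explanations apart.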

The Nature article (and many, many other articles) points to the fact that women tend to leave science early in their education and careers.  In the 2008 NIH grant applications, there were more female applicants than male applicants for three of the early career / training awards (F31, K01, K23), and two other early career awards (F30 and F32) showed no statistical difference between the number of male and female applicants.  However, male applicants significantly outnumbered female applicants in all midcareer, independent research, and senior career programs.

An evaluation of gender bias is currently underway for six other federal agencies: NSF, DOD, DOE, USDA, HHS, and NASA.  The audit, conducted by the Government Accountability Office (GAO), will first release a report that investigates whether the agencies evaluate proposals based on potentially biased measures.  The GAO will then release a second report identifying potential factors that lead to the disparity in funding between men and women.  Once out, it will be an interesting read…

Slaughter Announces GAO Audit on Gender Discrimination in Federal STEM Research Funding | Congresswoman Louise Slaughter.