Sabbatical Part 7

Preprints and Pasta

I have finally joined the preprint bandwagon (embarrassingly late, given how strongly I support open science).  Here are four recent preprints that are now freely available to the public.  Congratulations and thanks go to the four other PIs, one postdoc, one grad student, and six undergraduates who contributed to these papers.

Integrating Protein Localization with Automated Signaling Pathway Reconstruction
Ibrahim Youssef, Jeffrey Law, Anna Ritz
Full version of BIBM 2018 conference paper, under review
Big Question: When we hunt for protein interactions that are potentially involved in cellular signaling responses, can we use information about where the proteins are localized in the cell?
Short Answer: Yep.  We modify an existing algorithm to find paths within large protein-protein interaction networks that respect where proteins are expected to be localized in the cell.
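As a toy sketch of the idea (not the paper’s actual algorithm), a path search can be restricted to interaction edges whose endpoint proteins share an annotated compartment.  The protein names and localizations below are entirely hypothetical:

```python
from collections import deque

# Hypothetical toy network: each protein maps to its annotated compartments.
localization = {
    "RecA": {"membrane"},
    "SigA": {"membrane", "cytosol"},
    "KinB": {"cytosol"},
    "TfC":  {"cytosol", "nucleus"},
    "GeneD": {"nucleus"},
}
edges = [("RecA", "SigA"), ("SigA", "KinB"), ("KinB", "TfC"),
         ("TfC", "GeneD"), ("RecA", "GeneD")]  # last edge skips compartments

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def localized_path(src, dst):
    """Breadth-first search that only follows edges whose two endpoint
    proteins share at least one cellular compartment."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj[path[-1]]:
            if nxt not in seen and localization[path[-1]] & localization[nxt]:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no compartment-respecting path exists
```

Here the direct RecA–GeneD interaction is skipped because the two proteins share no compartment, so the search returns the longer membrane-to-nucleus route instead.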

Distance Measures for Tumor Evolutionary Trees
Zach DiNardo, Kiran Tomlinson, Anna Ritz, and Layla Oesper
RECOMB-CCB 2019 conference paper
Big Question: Suppose we have two “options” for how an individual’s tumor has evolved, in terms of the order of acquired mutations.  How should we compare these options?
Short Answer: We need to compare both the grouping (labeling) and the relative order of acquired mutations in evolutionary time – our distances account for these features.
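As a simplified illustration of the kind of comparison involved (not the distance measures defined in the paper), two trees can be compared by the sets of ancestor-descendant mutation pairs they imply:

```python
def ancestor_pairs(parent):
    """All ordered (ancestor, descendant) mutation pairs implied by a tree,
    where `parent` maps each mutation to its parent (the root maps to None)."""
    pairs = set()
    for m in parent:
        a = parent[m]
        while a is not None:
            pairs.add((a, m))
            a = parent[a]
    return pairs

def order_distance(tree1, tree2):
    """Jaccard distance between the two trees' ancestry-pair sets."""
    p1, p2 = ancestor_pairs(tree1), ancestor_pairs(tree2)
    union = p1 | p2
    return len(p1 ^ p2) / len(union) if union else 0.0

# Two hypothetical evolutionary histories for mutations A, B, C:
linear = {"A": None, "B": "A", "C": "B"}    # A -> B -> C
branched = {"A": None, "B": "A", "C": "A"}  # B and C both descend from A
```

The two trees agree that A precedes both B and C but disagree on whether B precedes C, so the distance comes out to 1/3.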

Connectivity Measures for Signaling Pathway Topologies
Nicholas Franzese, Adam Groce, T. M. Murali, and Anna Ritz
GLBio 2019 conference paper
Big Question: Signaling pathways are inherently more complicated than their graph representations, and many other representations have been proposed to capture the complexity of signaling pathway topologies (e.g. compound graphs, bipartite graphs, and hypergraphs).  In these representations, what does it mean for two molecules to be “connected,” and is connectivity a useful measure?
Short Answer: We show that “connectivity” highly depends on the data representation, and we propose a parameterized measure that switches from connectivity in graphs to connectivity in hypergraphs.  This measure can capture the “influence” of one signaling pathway on another better than other connectivity measures.
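The underlying distinction can be sketched with a toy reachability computation (a generic illustration, not the paper’s parameterized measure): in a hypergraph, a reaction fires only when its entire input set has been reached, whereas a graph-style relaxation lets any single input suffice.  The hyperedges below are hypothetical:

```python
# Hypothetical toy pathway: hyperedges are (tail-set, head-set) reactions,
# e.g. a complex of A and B is required to produce C.
hyperedges = [
    ({"A", "B"}, {"C"}),
    ({"C"}, {"D"}),
]

def graph_reachable(sources):
    """Graph relaxation: any single tail member suffices to reach the heads."""
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if tail & reached and not head <= reached:
                reached |= head
                changed = True
    return reached

def hypergraph_reachable(sources):
    """B-connectivity style: the whole tail set must be reached to fire."""
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if tail <= reached and not head <= reached:
                reached |= head
                changed = True
    return reached
```

Starting from A alone, the graph relaxation reaches C and D, while the hypergraph notion reaches nothing new until B is also a source, which is exactly why the two representations disagree about what is “connected.”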

Improved Differentially Private Analysis of Variance
Marika Swanberg, Ira Globus-Harris, Iris Griffith, Anna Ritz, Adam Groce, Andrew Bray
PETS 2019 conference paper
Big Question: The Analysis of Variance (ANOVA) statistic is heavily used in contexts where user privacy is imperative (biomedicine, sociology, and business).  We previously developed an algorithm with differential privacy guarantees on datasets of a certain size – can we improve the method so we get this guarantee with fewer data points?
Short Answer: Yep.  In fact, we define variants of the ANOVA statistic with good differential privacy guarantees.
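The general mechanism behind such guarantees can be sketched with the standard Laplace mechanism, which adds noise scaled to a statistic’s sensitivity divided by the privacy parameter epsilon.  (This is a generic illustration, not the paper’s algorithm; the sensitivity value in the test line is a placeholder.)

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise by inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_release(value, sensitivity, epsilon):
    """Laplace mechanism: release a statistic with epsilon-differential
    privacy, where `sensitivity` bounds how much any one record can
    change the statistic's value."""
    return value + laplace_noise(sensitivity / epsilon)
```

A variant of a statistic with lower sensitivity needs less noise at the same epsilon, which is what makes a privacy guarantee achievable with fewer data points.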

…and the pasta. In non-work-related news, I just returned from a family vacation in Italy! We saw great architecture and art (in addition to the pasta, wine, and gelato), walking about 10 miles a day on cobblestones.

The History of Science Museum (Museo Galileo) in Florence was fantastic, displaying scientific instruments big and small.

 

Finally, good luck Reed thesis students as they wrap up their senior theses!  Only a few short weeks until Renn Fayre….


Time-to-parity for women publishing in STEM fields

A recent paper by Holman et al. in PLOS Biology presents a new look at the gender gap in publications for millions of authors from over one hundred countries in over six thousand journals.  You can interact with the data through their web app.

The gender gap in science: How long until women are equally represented?
Luke Holman, Devi Stuart-Fox, Cindy E. Hauser, PLOS Biology 2018.

The authors present the current author gender ratio, its rate of change per year, and the estimated number of years until the gender ratio comes within 5% of parity.  A few notes below the image…

Here are the first things I noticed:

  1. The estimated percentage of women authors “maxes out” at 50% (there’s a Figure 2 that includes fields with a higher percentage of women).
  2. arXiv.org – the preprint server that began as a mathematics and physics venue – has a particularly low percentage of women authors.
  3. First-author percentages tend to be “ahead of the curve” for each discipline, while last-author percentages lag behind the numbers for all authors.  In many fields, the first author denotes who did the most work, and the last author denotes who funded the work.  My hunch is that a higher proportion of women publish papers as graduate students and postdocs, whereas fewer women make it to senior-level faculty positions as heads of a lab.
  4. On a positive note, more women are publishing in these fields than before (the rate of change is mostly positive).

The paper’s supplementary figure S3 shows data for Computer Science (from arXiv).  Based on current trajectories, only two sub-categories (Information Theory and Robotics) can expect to see gender parity within the next 50-100 years.  We still have a long way to go.

Pre-prints as a speedup to scientific communication

Tomorrow, I’ll sit on a panel about Open Data and Open Science as part of Reed’s Digital Scholarship Week.  I am somewhat familiar with these topics in computer science, but I decided to read up on the progress with Open Access in Biology.

As a junior professor trying to get a foothold in a research program, I’ll admit that I haven’t spent a lot of time thinking about Open Science.  In fact, the first thing I did was look up what it meant:

Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society.                       – Foster Project Website

Ok, this seems obvious, especially since so much research is funded by taxpayer dollars.  Surprisingly, Open Science is not yet a reality.  In this post, I’ll focus on the speed of dissemination – the idea that once you have a scientific finding, you want to communicate it to the community in a timely manner.

Biology findings are often shared in the form of peer-reviewed journal publications, where experts in the field comment on drafts before they are deemed acceptable for publication.  Peer review may be controversial and even compromised (just read a few Retraction Watch posts), but in theory it’s a good idea for others to rigorously “check” your work.  However, the peer-review process can be slow.  Painfully slow.  Findings are often published months or even years after the work is done.

In computer science, my “home” research discipline, it’s a different story.  Computer science research is communicated largely through conferences, which often include paper deadlines, quick peer-review turnaround times, and a chance to explain your research to colleagues.  Manuscripts that haven’t yet undergone peer review may be posted to arXiv.org, a server hosting over one million papers in physics, mathematics, and other quantitative fields.  Manuscripts submitted to arXiv are freely available to anyone with an internet connection, reaching “all levels of an inquiring society.”

A biology version of the site, bioRxiv.org, was created in 2013, more than 20 years after arXiv was established, and it contains only about three thousand manuscripts so far.  Why the discrepancy?  Why is the field reluctant to change?

Last February, a meeting was held at the Howard Hughes Medical Institute (HHMI) Headquarters to discuss the state of publishing in the biological sciences.  The meeting, Accelerating Science and Publication in Biology (appropriately shortened to ASAPbio), considered how “pre-prints” may accelerate and improve research.  Pre-prints are manuscript drafts that have not yet been peer-reviewed but are freely available to the scientific community.  ASAPbio posted a great video overview about pre-prints for those unfamiliar with the idea.  While the general consensus was that publishing needs to change, there are still some major factors that make biologists reluctant to post pre-prints (see the infographic below).

This is an excellent time to talk about open science in biology.  It has become a hot topic in the last few months (though some in the field have been pushing for open science for years).  The New York Times recently wrote about the Nobel Laureates who are posting pre-prints, and The Economist picked up a story about Zika virus experiment results that were released in real time in an effort to help stop the Zika epidemic.

Open Science has the potential to lead to more scientific impact than any journal or conference publication.  The remaining obstacles lie in determining what pre-prints mean for an academic’s career: how they count as publications, how they establish priority of discovery (meaning “I found this first”), and how they factor into obtaining grants.  I rely on freely available data and findings in my own research, yet I’ve never published a pre-print.  After writing this post, I think I may start doing so.

Additional Sources:

Mick Watson’s 2/22/2016 post about generational change on his blog Opiniomics.

Michael Eisen’s  2/18/2016 post about pre-print posting on his blog it is NOT junk.

Handful of Biologists Went Rogue and Published Directly to Internet, New York Times, 3/15/2016.

Taking the online medicine, The Economist, 3/19/2016.

Responsible research, even when you’re wrong

I follow Retraction Watch, a blog dedicated to reporting academic misconduct in scientific publishing.  I’ve learned that there are two types of retractions: the malicious, intentional acts such as destroying data and plagiarism, and the unintentional mistakes.  This post is about the unintentional mistakes.

As I glance through the new posts, I am always a bit apprehensive.  Will I come across any familiar names?  Will a retraction directly related to my sub-field invalidate my own findings?  Will I find my own name there due to some code bug or mathematical error?

I am a computer scientist who has striven to publish datasets and code along with publications.  I rarely work with a dataset that is not publicly available.  In some sense, this makes my work easier to justify – I can provide everything needed to reproduce my results.  But still, we’re all human, and unintentional mistakes may happen.

Today, my apprehension was transformed into respect after reading this blog post about authors who retracted their own paper in light of additional experiments they conducted post-publication.

“[T]hese things can happen in every lab:” Mutant plant paper uprooted after authors correct their own findings | Retraction Watch

The authors submitted a retraction notice that included the experimental data showing that their results described a different mutation than the one they intended.  Not only did they take action to retract the paper, but the last author, Dr. Hidetoshi Iida, notified researchers using the plant seeds that reportedly contained the mutant.

In a publish or perish world, it takes a lot of guts to do the right thing.  This retraction, in fact, contributed to the progress of scientific knowledge.  I applaud these authors, and I hope that other honest mistakes are corrected in similar ways.

How many authors is too many?

Nature recently published a quick blurb about a paper on fruit fly genetics that has set social media abuzz.  Why?  Because the paper, published in G3: Genes Genomes Genetics, lists over 1,000 authors.  Further, more than 900 of these authors are undergraduates and members of the Genomics Education Partnership, an organization that has posted a record of the commentary on the author count.  The author list, which spans the first three pages of the PDF, is shown below.


The paper has sparked a larger debate about the role of training and education in research, particularly when it comes to undergraduate involvement.  Alongside the paper, the authors also released a blog post about undergraduate-empowered research in the Genetics Society of America’s Genes to Genomes blog.  This is the first paper I’ve seen that lists a blog post as supporting information.

I can see arguments on both sides.  On one hand, crowd-sourcing allows us to accomplish tasks impossible for a single person to execute.  The computer scientist in me loves this aspect of the story.  Here, “the crowd” is the sea of undergraduates that edited and annotated a DNA sequence (the Muller F element, or the “dot” chromosome) in fruit flies by analyzing and integrating different types of data.  Unlike papers that use Mechanical Turk to collect data, where the crowd is typically non-experts, this particular crowd learned a set of specialized skills that facilitated the research.  Undergrads dirtied their hands with real data and gained valuable insights about how to conduct research.  The educator in me finds the endeavor incredibly impactful for the scientists-in-training.

On the other hand, being buried in the author list makes one’s contributions look meaningless.  What does it mean to be a co-author on such a paper?  If the Genomics Education Partnership consisted of only a few dozen undergraduates, would it be better?  Some of these questions are discussed in the Neuro DoJo blog post.  The “I-want-to-get-tenure” academic in me cringes at the thought that, were I in the middle of such an author list, good research might be down-weighted because my contribution was unclear.

I think that the undergraduates from the Genomics Education Partnership did conduct research that contributed to the paper, and they should be credited in some way.  It seems that in the age of crowdsourcing, there may need to be an intermediate category between authorship and acknowledgement that indicates a collective contribution from a group of people (e.g. students in a class or members of a consortium).

Responses to the sexist review by PLOS One

There has recently been a lot of attention on the journal PLOS One and their handling of a heavily gender-biased review received by an evolutionary biologist on a manuscript about gender differences in the Ph.D. to Postdoc academic transition.  PLOS One has taken a few actions, including asking the academic editor who handled the manuscript to step down from the editorial board and removing the offending reviewer from their database.  Dr. Michael Eisen, one of the founders of PLOS, has provided interesting commentary on the subject.

What’s just as troubling is that the reviewer clearly used his personal assessment of not only the authors’ gender but also their “junior” academic status in his criticism.  A blog that focuses on manuscript retractions has another summary of the issue.  Dr. Fiona Ingleby, one of the authors of the manuscript, writes,

Megan and Fiona are pretty unambiguous names when it comes to guessing gender.  But in fact, the reviewer acknowledged that they had looked up our websites prior to reading the MS (they said so in their review). They used the personal assessment they made from this throughout their review – not just gender, but also patronising comments throughout that suggested the reviewer considered us rather junior.  – Fiona Ingleby

This has raised some major issues about the peer-review process, including whether a reviewer’s identity should ever be revealed in a single-blind or double-blind review.  Dr. Eisen addresses this in the blog post linked above.  Dr. Zuleyka Zevallos wrote about removing publishing bias in science, here in the form of sexism.  In response to Dr. Eisen’s post, she explicitly addresses the accountability measures that institutions and publishers need to have in place.

It will be interesting to see how PLOS One and other publishers address this now-viral issue, especially through changes to the peer-review process.  Some have noted that the PLOS One editor failed in his/her duty by returning this gender-biased review to the authors instead of disregarding it, and that the problem was not necessarily with the procedures themselves.  But more and more journals are moving to different review styles, including Nature’s experiment with double-blind peer review.  If the PLOS One review had been double-blind, the reviewer might have been able to guess the gender of the authors but could not have verified it, let alone their current academic positions.