Pre-prints as a speedup to scientific communication

Tomorrow, I’ll sit on a panel about Open Data and Open Science as part of Reed’s Digital Scholarship Week.  I am somewhat familiar with these topics in computer science, but I decided to read up on the progress with Open Access in Biology.

As a junior professor trying to get a foothold in a research program, I’ll admit that I haven’t spent a lot of time thinking about Open Science.  In fact, the first thing I did was look up what it meant:

Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society.  – FOSTER Project website

OK, this seems obvious, especially since so much research is funded by taxpayer dollars.  Surprisingly, Open Science is not yet a reality.  In this post, I’ll focus on the speed of dissemination – the idea that once you have a scientific finding, you want to communicate it to the community in a timely manner.

Biology findings are often shared in the form of peer-reviewed journal publications, where experts in the field comment on drafts before they are deemed acceptable for publication.  Peer review may be controversial and even compromised (just read a few Retraction Watch posts), but in theory it’s a good idea for others to rigorously “check” your work.  However, the peer-review process can be slow.  Painfully slow.  Findings are often published months or even years after the fact.

In computer science, my “home” research discipline, it’s a different story.  Computer science research is communicated largely through conferences, which offer firm paper deadlines, quick peer-review turnaround times, and a chance to explain your research to colleagues in person.  Manuscripts that haven’t yet undergone peer review may be posted to arXiv.org, a repository hosting over one million papers in physics, mathematics, and other quantitative fields.  Manuscripts submitted to arXiv are freely available to anyone with an internet connection, reaching “all levels of an inquiring society.”

A biology version of the site, bioRxiv.org, was created in 2013, more than 20 years after arXiv was established.  It contains only about three thousand manuscripts so far.  What explains this discrepancy?  Why is the field reluctant to change?

Last February, a meeting was held at the Howard Hughes Medical Institute (HHMI) headquarters to discuss the state of publishing in the biological sciences.  The meeting, Accelerating Science and Publication in Biology (appropriately shortened to ASAPbio), considered how “pre-prints” may accelerate and improve research.  Pre-prints are manuscript drafts that have not yet been peer-reviewed but are freely available to the scientific community.  ASAPbio posted a great video overview about pre-prints, for those unfamiliar with the idea.  While the general consensus was that publishing needs to change, there are still some major factors that make biologists reluctant to post pre-prints (see the infographic below).

This is an excellent time to talk open science in Biology.  It has become a hot topic in the last few months (though some in the field have been pushing for open science for years). The New York Times recently wrote about the Nobel Laureates who are posting pre-prints, and The Economist picked up a story about Zika virus experiment results that were released in real time in an effort to help stop the Zika epidemic.

Open Science has the potential to create more scientific impact than any journal or conference publication.  The remaining obstacles lie in determining what pre-prints mean for an academic’s career: how they count as publications, how they establish priority of discovery (meaning “I found this first”), and how they factor into obtaining grants.  I rely on freely available data and findings in my own research, yet I’ve never published a pre-print.  After writing this post, I think I may start doing so.

[Infographic: biologists’ opinions on posting pre-prints]

Additional Sources:

Mick Watson’s 2/22/2016 post about generational change on his blog Opiniomics.

Michael Eisen’s  2/18/2016 post about pre-print posting on his blog it is NOT junk.

Handful of Biologists Went Rogue and Published Directly to Internet, New York Times, 3/15/2016.

Taking the online medicine, The Economist, 3/19/2016.


Grants keep coming to Reed Biologists

As a new computational biologist at Reed College, I was excited about the prospect of continuing to do research while teaching innovative courses.  I’ve written about the research opportunities at Reed, and faculty across campus have received over two million dollars of grant funding in 2014/2015.

The Biology Department just secured two more research grants from the M.J. Murdock Charitable Trust to investigate neurogenesis in zebrafish (Dr. Kara Cerveny) and discover candidate driver genes in cancer (me!).

Small schools also have an opportunity to play a large role in undergraduate education programs.  Another NSF grant was recently awarded to Dr. Suzy Renn to organize a STEM workshop on undergraduate involvement in the NSF’s Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative.

All in all, 2016 seems like it will be another great research year.

Tenure

A couple of articles related to tenure came across my news feed in the past few months.  While they are not directly related to each other, I thought I’d mention them in the same post.

First, how many faculty in the United States are actually tenured?  I was surprised to find that, according to the American Association of University Professors (AAUP), over 50% of faculty hold adjunct (non-tenure-track) positions.  AAUP calls these positions “contingent faculty” because, regardless of their full-time or part-time status, their school makes little to no long-term commitment to their job security.  Institutions’ increasing reliance on adjunct faculty affects not only the faculty but also the students and the research at the institution.  An article from The Atlantic summarizes many of these points:

The Cost of an Adjunct | The Atlantic

Now, tenure itself may be a controversial topic – some say that the system encourages faculty to slack off after getting tenure, or to keep teaching outdated material long after they should have retired.  The tenure process is incredibly stressful, sometimes unclear, and notoriously unfair – and this is just scratching the surface.  But once tenure is obtained, faculty may end up doing more out-of-the-box, high-risk research and teaching that they wouldn’t have attempted otherwise.

Trying to Kill Tenure | Inside Higher Ed

Quantifying the gender bias in federally-funded STEM research

We all know that there is a gender disparity in STEM fields.  Is it harder for women in these fields to obtain federal funding compared to their male colleagues?  In 2013, Helen Shen published an article in Nature summarizing the continuing challenges women face in science.  The infographic below from the paper describes the gap in NIH-funded research grants.

from Inequality quantified: Mind the gender gap by Helen Shen, Nature Vol 495 Issue 7439, 2013.

At first glance, the funding gap looks appalling – only 30% of the NIH’s grants go to women!  However, there’s a missing ingredient here: the fraction of NIH grant proposals submitted by women.  To get this information, let’s go back to 2008 for a minute.  Jennifer Pohlhaus and others at the NIH assessed gender differences in application rates and success rates for 77% of the awards submitted in 2008, including training grants, midcareer grants, independent research grants (e.g., R01), and senior grants.  They found that success rates mirrored application rates for most NIH grants.  However, men had a higher success rate once they had received their first NIH grant and become NIH investigators.  So the funding gap in the infographic may not be tied to women having lower success rates, but rather to fewer women submitting grants.  A visualization of the data is available on the NIH’s webpage.
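To make the application-rate point concrete, here is a small sketch with made-up numbers (not NIH data): a 30% award share for women can arise purely from a 30% application share, even when men’s and women’s success rates are identical.

```python
def award_share(apps_women, apps_men, rate_women, rate_men):
    """Fraction of all funded awards that go to women.

    apps_*: number of applications submitted by each group.
    rate_*: fraction of that group's applications that get funded.
    """
    won_women = apps_women * rate_women
    won_men = apps_men * rate_men
    return won_women / (won_women + won_men)

# Hypothetical numbers: identical 20% success rates, but women
# submit only 30% of the applications (300 of 1,000).
share = award_share(apps_women=300, apps_men=700,
                    rate_women=0.20, rate_men=0.20)
print(round(share, 2))  # 0.3 -- the award "gap" mirrors the application share
```

The gap in the infographic, in other words, can’t be interpreted without the application denominator.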

The Nature article (and many others) points to the fact that women tend to leave science early in their education and careers.  In the 2008 NIH grant applications, there were more female applicants than male applicants for three of the early career / training awards (F31, K01, K23), and two other early career awards (F30 and F32) showed no statistical difference between the number of male and female applicants.  However, male applicants significantly outnumbered female applicants in all midcareer, independent research, and senior career programs.

An evaluation of gender bias is currently underway for six other federal agencies: NSF, DOD, DOE, USDA, HHS, and NASA.  The audit, conducted by the Government Accountability Office (GAO), will first release a report that investigates whether the agencies evaluate proposals based on potentially biased measures.  The GAO will then release a second report identifying potential factors that lead to the disparity in funding between men and women.  Once out, it will be an interesting read…

Slaughter Announces GAO Audit on Gender Discrimination in Federal STEM Research Funding | Congresswoman Louise Slaughter.

Funding models for Ph.D. programs

UC-Irvine is adopting a new funding model for Ph.D. programs in some departments.  The idea is to provide increased funding for five years, and then offer a two-year postdoctoral teaching position.

This Slate article focuses on Ph.D. programs in the humanities, where it is generally much harder to secure a job than in the computational sciences.  Still, it’s an interesting perspective.

UC–Irvine’s 5+2 program: A good idea, but the worst job title in academia.

Responsible research, even when you’re wrong

I follow Retraction Watch, a blog dedicated to reporting academic misconduct in scientific publishing.  I’ve learned that there are two types of retractions: the malicious, intentional acts such as destroying data and plagiarism, and the unintentional mistakes.  This post is about the unintentional mistakes.

As I glance through the new posts, I am always a bit apprehensive.  Will I come across any familiar names?  Will they find something directly related to my sub-field that invalidates my own findings?  Will I find my name there due to some code bug or mathematical error?

I am a computer scientist who has striven to publish datasets and code along with publications.  It is rare that I work with a dataset that is not publicly available.  In some sense, this makes my work easier to justify – I can provide everything needed to reproduce my results.  But still, we’re all human, and unintentional mistakes may happen.

Today, my apprehension was transformed into respect after reading this blog post about authors who retracted their own paper in light of additional experiments they conducted post-publication.

“[T]hese things can happen in every lab:” Mutant plant paper uprooted after authors correct their own findings | Retraction Watch

The authors submitted a retraction notice that included the experimental data showing that their results described a different mutation than the one they intended.  Not only did they take action to retract the paper, but the last author, Dr. Hidetoshi Iida, notified researchers using the plant seeds that reportedly contained the mutant.

In a publish or perish world, it takes a lot of guts to do the right thing.  This retraction, in fact, contributed to the progress of scientific knowledge.  I applaud these authors, and I hope that other honest mistakes are corrected in similar ways.

How many authors is too many?

Nature recently published a quick blurb about a paper on fruit fly genetics that has set social media abuzz.  Why?  Because the paper, published in G3: Genes Genomes Genetics, lists over 1,000 authors.  Further, more than 900 of these authors are undergraduates and members of the Genomics Education Partnership, an organization that has compiled a record of the commentary about the number of authors.  The author list, which spans the first three pages of the PDF, is shown below.

[Images: the three-page author list from the PDF]

The paper has sparked a larger debate about the role of training and education in research, particularly when it comes to undergraduate involvement.  Alongside the paper, the authors also released a blog post about undergraduate-empowered research in the Genetics Society of America’s Genes to Genomes blog.  This is the first paper I’ve seen that lists a blog post as supporting information.

I can see arguments on both sides.  On one hand, crowd-sourcing allows us to accomplish tasks impossible for a single person to execute.  The computer scientist in me loves this aspect of the story.  Here, “the crowd” is the sea of undergraduates who edited and annotated a DNA sequence (the Muller F element, or the “dot” chromosome) in fruit flies by analyzing and integrating different types of data.  Unlike papers that use Mechanical Turk to collect data, where the crowd typically consists of non-experts, this particular crowd learned a set of specialized skills that facilitated the research.  Undergrads dirtied their hands with real data and gained valuable insights about how to conduct research.  The educator in me finds the endeavor incredibly impactful for these scientists-in-training.

On the other hand, being buried in the author list makes one’s contributions look meaningless.  What does it mean to be a co-author on such a paper?  If the Genomics Education Partnership consisted of only a few dozen undergraduates, would it be better?  Some of these questions are discussed in the NeuroDojo blog post.  The “I-want-to-get-tenure” academic in me cringes at the thought that, were I in the middle of such an author list, good research might be down-weighted because my contribution was unclear.

I think that the undergraduates from the Genomics Education Partnership did conduct research that contributed to the paper, and they should be credited in some way.  It seems that in the age of crowdsourcing, there may need to be an intermediate category between authorship and acknowledgement – one that indicates a collective contribution from a group of people (e.g., students in a class or members of a consortium).