Pre-prints as a speedup to scientific communication

Tomorrow, I’ll sit on a panel about Open Data and Open Science as part of Reed’s Digital Scholarship Week.  I am somewhat familiar with these topics in computer science, but I decided to read up on the progress with Open Access in Biology.

As a junior professor trying to get a foothold in a research program, I’ll admit that I haven’t spent a lot of time thinking about Open Science.  In fact, the first thing I did was look up what it meant:

Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society.                       – Foster Project Website

Ok, this seems obvious, especially since so much research is funded by taxpayer dollars.  Surprisingly, Open Science is not yet a reality.  In this post, I’ll focus on the speed of dissemination – the idea that once you have a scientific finding, you want to communicate it to the community in a timely manner.

Biology findings are often shared in the form of peer-reviewed journal publications, where experts in the field comment on drafts before they are deemed acceptable for publication.  Peer-review may be controversial and even compromised (just read a few RetractionWatch posts), but in theory it’s a good idea for others to rigorously “check” your work.  However, the peer-review process can be slow. Painfully slow.  Findings are often published months to even years after the fact.

In computer science, my “home” research discipline, it’s a different story.  Computer science research is communicated largely through conferences, which often include paper deadlines, quick peer-review turnaround times, and a chance to explain your research to colleagues.  Manuscripts that haven’t yet undergone peer review may be posted to arXiv.org, a server hosting over one million papers in physics, mathematics, and other quantitative fields.  Manuscripts submitted to arXiv are freely available to anyone with an internet connection, targeting “all levels of an inquiring society.”

A biology version of the site, bioRxiv.org, was created in 2013 — more than 20 years after arXiv was established.  It contains only about three thousand manuscripts.  Why the discrepancy?  Why is the field reluctant to change?

Last February, a meeting was held at the Howard Hughes Medical Institute (HHMI) Headquarters to discuss the state of publishing in the biological sciences. The meeting, Accelerating Science and Publication in Biology (appropriately shortened to ASAPbio), considered how “pre-prints” may accelerate and improve research.  Pre-prints are manuscript drafts that have not yet been peer-reviewed but are freely available to the scientific community.  ASAPbio posted a great video overview about pre-prints, for those unfamiliar with the idea.  While the general consensus was that publishing needs to change, there are still some major factors that make biologists reluctant to post pre-prints (see the infographic below).

This is an excellent time to talk open science in Biology.  It has become a hot topic in the last few months (though some in the field have been pushing for open science for years). The New York Times recently wrote about the Nobel Laureates who are posting pre-prints, and The Economist picked up a story about Zika virus experiment results that were released in real time in an effort to help stop the Zika epidemic.

Open Science has the potential to lead to more scientific impact than any journal or conference publication.  The remaining obstacles are in determining what pre-prints mean for an academic’s career – in publishing the manuscripts, in establishing priority of discovery (meaning “I found this first”), and in obtaining grants.  I rely on freely-available data and findings in my own research, yet I’ve never published a pre-print.  After writing this post, I think I may start doing so.

[Infographic: opinions about posting pre-prints]

Additional Sources:

Mick Watson’s 2/22/2016 post about generational change on his blog Opiniomics.

Michael Eisen’s 2/18/2016 post about pre-print posting on his blog it is NOT junk.

Handful of Biologists Went Rogue and Published Directly to Internet, New York Times, 3/15/2016.

Taking the online medicine, The Economist, 3/19/2016.

Everyone’s a math person

I just read this article in WIRED in response to teaching math to students majoring in elementary school education.

Quit Saying ‘I’m Just Not a Math Person’ | WIRED.

People end up doing math all the time, whether they realize it or not.  The author of the WIRED post, Rhett Allain, ends by saying that not only should math be taught in all grade levels, but programming should be taught at (nearly) all grade levels as well.  As we see a growing market for coding tutorials aimed at kids, I bet we’ll see programming become a meaningful part of elementary education.

CAVEs

Reading: Two Selections by Brenda Laurel, available from the New Media Reader.

  1. “The Six Elements and the Causal Relations Among Them.” Computers as Theatre, 49-65. 2nd ed., 1993.
  2. “Star Raiders: Dramatic Interaction in a Small World,” Ph.D. Thesis, Ohio State University, pp. 81-86, 1986.

I am going to tackle the task of identifying a form of human-computer interaction (HCI) that has some/most/all of Aristotle’s six qualitative elements of drama:

  • Enactment: All that is seen
  • Melody (Pattern): All that is heard
  • Language: Selection/arrangement of words
  • Thought: Inferred processes leading to choice
  • Character: Groups of traits, inferred from agents’ patterns of choice
  • Action: The whole action being represented.

This is a tall order, in part because we must keep in mind that “the whole action must have a beginning, a middle, and an end” for it to be a satisfying plot.  Video games, TV, and movies all have this notion, as sovink77 has written about in her post.

Here’s one technology that, if it becomes less pricey, may bring a new dimension to human-computer entertainment.  The Cave Automatic Virtual Environment (or CAVE) looks like a very boring room – a “box” with white walls.  However, when you add a bunch of projectors along with head and hand tracking capabilities, the CAVE becomes a 3D interactive world.  In grad school my friends modeled bat flight, wrote 3D dynamic poetry, and developed virtual painting techniques using the CAVE.  Researchers at UC Davis have also pioneered work in virtual reality, for example with their augmented reality sandbox.

However, none of these examples of the CAVE follow the notion of a storyline.  If we can interact with a virtual world in this way, we’re getting closer to an interactive video game.

Once we have this environment, I believe we will have all six elements Laurel described in human-computer activity.  It is still way too expensive to build your own CAVE in your living room, but don’t bother anyway: Microsoft has already filed a patent for it.

Ephemeralization

This WIRED article resonated with the New Media Seminar I’m taking at Virginia Tech.

Big Data: One Thing to Think About When Buying Your Apple Watch | WIRED.

I hadn’t heard of Buckminster Fuller’s term ephemeralization before: it is the promise of technology to do “more and more with less and less until eventually you can do everything with nothing.” Fuller cites Ford’s assembly line as one example of ephemeralization.  Ali Rebaie, the author of the WIRED article, writes that the Big Data movement is another form of it.  Our ability to analyze huge datasets has led to designing more efficient technology.  All in all, Fuller seems to fit right in with the others we have been reading in the seminar.

The vision of machine learning, from 1950

Reading: “Computing Machinery and Intelligence” by Alan Turing. Mind: A Quarterly Review of Psychology and Philosophy 59(236):433-460. October 1950. (One reprint is easy to find with a quick Google search.)

Computer science majors will learn about the famous Turing Machine in any introductory Theory of Computation class.  They might get only a cursory mention of the “Imitation Game,” the subject of this article (with the recent movie, this may change).  I am intrigued by so many aspects of this article, but I will limit my observations to two items.

Part I: Could this article be published today?

The notion of the “Imitation Game” and an exploration of its feasibility is incredibly forward-thinking for Turing’s time  — so much so that he admits to his audience that he doesn’t have much in the way of proof.

The reader will have anticipated that I have no very convincing arguments of a positive nature to support my views.  If I had I should not have taken such pains to point out the fallacies in contrary views.

The article was published in a philosophy journal, so Turing was able to allow his arguments to take idealistic positions which were not practical at the time (though many of his arguments are closer to reality today).  Yet he does not focus on the arguments that establish the feasibility of such a computer (or the program), but lays out a framework for “teaching” machines to play the Imitation Game.  Through his descriptions I can easily see the foundations of fundamental computer science sub-disciplines such as artificial intelligence and machine learning.  He truly was an innovative thinker for his time.  I wonder if a similar forward-thinking article would be published today, with little evidence for idealistic scenarios.  Perhaps there is a Turing of 2015 trying to convince the scientific community of a potential technological capacity that will only be confirmed fifty years from now.

Part II: Scale

There are many numbers in Turing’s article relating to the amount of storage capacity required for a computer to successfully participate in the Imitation Game.  He didn’t seem to be too worried about storage requirements:

I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent. chance of making the right identification after five minutes of questioning.

I was interested in seeing how accurate his estimates were.  Keep in mind that 10×10^2 = 10^3; that is, each time the exponent increases by one we are multiplying the quantity by 10. For example, consider the capacity (in bits) of the Encyclopaedia Britannica:

  • 2×10^9: capacity of the Encyclopaedia Britannica, 11th Ed. (Turing, 1950)
  • 8×10^9: capacity of the Encyclopaedia Britannica, 2010 Ed. (last one to be printed)

We see that the size of the encyclopedia has quadrupled in the past 60 years.  Now, let’s look at Turing’s estimates of the capacities of both a future computer and the human brain.

  • 10^9: capacity of a computer by 2000 (Turing, 1950)
  • 10^10–10^15: estimated capacity of the human brain (Turing, 1950)
  • 3×10^10: standard memory of a MacBook Pro, 2015 (4 GB memory)
  • 4×10^12: standard storage of a MacBook Pro, 2015 (500 GB storage)
  • 8×10^12–8×10^13: estimated capacity of the human brain (Thanks Slate, 2012)
  • 2×10^13: pretty cheap external hard drive (3 TB)
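
These conversions are easy to sanity-check.  Here is a quick back-of-the-envelope sketch in Python – a minimal example of mine, not from any of the sources above.  It assumes decimal units (1 GB = 10^9 bytes) and 8 bits per byte, takes the hardware figures from the list above, and the helper name gb_to_bits is just for illustration:

    # Back-of-the-envelope check of the capacities above, in bits.
    # Assumes decimal units (1 GB = 10**9 bytes) and 8 bits per byte.
    TURING_ESTIMATE = 10**9  # bits Turing thought a year-2000 machine would need

    def gb_to_bits(gigabytes):
        """Convert a capacity in gigabytes to bits."""
        return int(gigabytes * 10**9 * 8)

    capacities = {
        "MacBook Pro memory (4 GB)":    gb_to_bits(4),     # ~3.2e10 bits
        "MacBook Pro storage (500 GB)": gb_to_bits(500),   # ~4.0e12 bits
        "External hard drive (3 TB)":   gb_to_bits(3000),  # ~2.4e13 bits
    }

    for name, bits in capacities.items():
        ratio = bits / TURING_ESTIMATE
        print(f"{name}: {bits:.1e} bits ({ratio:,.0f}x Turing's 10^9 estimate)")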

Our current laptops can hold more bits in memory than Turing believed a computer would be able to store by the year 2000!  Pretty amazing.  Consider the speed (in FLOPS = floating point operations per second) of two of the world’s supercomputers:

  • 80×10^12: IBM’s Watson, designed to answer questions on Jeopardy (80 TeraFLOPS)
  • 33.86×10^15: Tianhe-2, the world’s fastest supercomputer according to TOP500 (33.86 PetaFLOPS)

In 2011, USC researchers estimated that the world could store about 295 exabytes of information, which translates to roughly 2.4×10^21 bits.  That’s a number even I cannot comprehend.
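
For the curious, the conversion itself is simple – again a tiny sketch of mine, assuming decimal units (1 exabyte = 10^18 bytes) and 8 bits per byte:

    # How many bits is 295 exabytes?
    # Assumes decimal units: 1 exabyte = 10**18 bytes, 8 bits per byte.
    exabytes = 295
    bits = exabytes * 10**18 * 8
    print(f"{bits:.2e} bits")  # 2.36e+21, i.e. roughly 2.4 x 10^21 bits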