This is the first post for a New Media Seminar that I am participating in at Virginia Tech. The main site for this seminar includes comments on weekly readings. Seminar Twitter Hashtag: #vtnmss15.
Reading: “As We May Think” by Vannevar Bush, Atlantic Monthly, 176(1):101-108. (July 1945).
Many of the passages in Vannevar Bush’s article “As We May Think” resonate with the current state of scientific research. As I was reading the first part of the article, I could see the desire to accumulate and aggregate massive amounts of information, making data collection a more efficient process. In the back of my mind I kept thinking, “But what next? What happens after you have obtained all this information?” In biological discovery, we are exactly at this point; high-throughput technologies are producing enormous numbers of observations about biological systems. As examples, 1,000 human genomes have been sequenced to better understand genetic variation in human populations, and thousands of cancer genomes have been measured to identify underlying mechanisms of cancer. How are we going to use these datasets to further scientific discovery? Then, I came across this paragraph:
So much for the manipulation of ideas and their insertion into the record. Thus far we seem to be worse off than before—for we can enormously extend the record; yet even in its present bulk we can hardly consult it. This is a much larger matter than merely the extraction of data for the purposes of scientific research; it involves the entire process by which man profits by his inheritance of acquired knowledge. The prime action of use is selection, and here we are halting indeed. There may be millions of fine thoughts, and the account of the experience on which they are based, all encased within stone walls of acceptable architectural form; but if the scholar can get at only one a week by diligent search, his syntheses are not likely to keep up with the current scene.
The passage above begins Bush’s response to the pivotal moment when we have collected, annotated, and recorded so much data that it becomes difficult to manage. In computer science, the sub-discipline of Big Data has emerged exactly to address Bush’s comment that “we can enormously extend the record; yet even in its present bulk we can hardly consult it.” How do we manage this data to aid scientific discovery by placing the right information in front of the right experts? But first, we must know what we’re looking for.
Bush writes, “There may be millions of fine thoughts…all encased within stone walls of acceptable architectural form.” However, one may need to find only a handful of these thoughts to advance scientific understanding. We have a needle-in-a-haystack problem, except that the individuals who manage the haystack may not have the expertise to identify the needle once they’ve found it. Bush’s notions of selection and indexing in subsequent passages begin to capture the need for clear data organization, and the memex is his ultimate idea for organizing one’s knowledge.
Biological research is becoming more and more collaborative, as enormous datasets require sophisticated technologies and algorithms for analysis. Building upon Bush’s memex, we now need tools and paradigms to describe one individual’s knowledge to others. Perhaps we will develop a framework that allows researchers to benefit from the “inheritance of acquired knowledge” in an efficient and descriptive manner. Wikipedia may in fact be a CliffsNotes for everything in life (its statistics page boasts about 800 new articles a day). Providing a means of passing on inherited knowledge may be a necessity to, as Bush puts it, “keep up with the current scene” of scientific discovery.