Number of Degrees vs. Job Openings

From the blog Mike the Mad Biologist comes a summary of a NY Times article comparing the number of degrees obtained to the number of job openings for different STEM fields.

Yes, We Have A STEM Glut | Mike the Mad Biologist

The takeaway message: if you’re interested in a science field, learn some computing skills along the way.

Breakdown of CS faculty hires in 2017

Craig E. Wills of WPI recently wrote a report that describes the outcomes of advertised CS faculty positions across institutions in 2017. It was a follow-up to a previous report on the CS positions advertised in 2017.  The report contains a wealth of information about the number of faculty positions filled at different institutions.

In summary, 244 of the 323 advertised positions were filled, giving an aggregate 75% success rate.  Not surprisingly, this success varied by institution type: 90% of the positions advertised by the top 100 graduate schools according to U.S. News Rankings were filled, whereas other PhD-granting institutions, Masters-granting institutions, and Bachelors-granting institutions had 67%, 66%, and 69% success rates, respectively.
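The aggregate rate is easy to check from the two counts quoted above. Here is a minimal Python sketch; the function name fill_rate is my own and does not come from the report.

```python
def fill_rate(filled: int, advertised: int) -> float:
    """Return the percentage of advertised positions that were filled."""
    if advertised <= 0:
        raise ValueError("advertised must be positive")
    return 100.0 * filled / advertised


# Counts quoted in this post: 244 positions filled out of 323 advertised.
aggregate = fill_rate(244, 323)
print(f"{aggregate:.1f}% of advertised positions were filled")
```

The same function could be applied to each institution type, given the per-type counts from the report (not reproduced here).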

Wills also looked at the faculty positions by research area.  I’ll focus on three:

  • AI/DM/ML: artificial intelligence, computational linguistics, data mining, machine learning, natural language processing, text analytics
  • CompSci: computational biology, computational life science, computational medicine, computational neuroscience (you get the picture…)
  • Security: cryptography, forensics, information assurance, privacy, security

The figure below shows the percent of faculty positions sought for each field on the x-axis and the percent of faculty positions filled for each field on the y-axis:

faculty-positions-fig3

Points that lie on the red x=y line indicate that the percent of faculty positions filled exactly matched the percent of faculty positions sought.  Let’s look at the three largest outliers:

  • AI/DM/ML was sought for 11% of the positions by area, but ended up filling 21% of the positions.
  • DataSci was sought for 16% of the positions by area, but ended up filling only 7% of the positions.
  • Security was sought for 23% of the positions, but ended up filling only 12% of the positions.

Wills cited many factors associated with these discrepancies, including the fact that nearly a quarter of the positions did not specify an area of interest in their ad.  Additionally, institutions simply did not end up hiring in the areas of interest, either because they could not find candidates in that area or they found better candidates in other areas.  Areas could also be satisfied with multiple fields (for example AI/DM/ML or DataSci accounted for 27% of the positions sought and ended up filling 28% of the positions when combined).
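To make the size of these gaps concrete, here is a small Python sketch that computes the difference between the share filled and the share sought for the three outliers, and then pools AI/DM/ML with DataSci. The percentages are the ones quoted in this post; the variable names are my own.

```python
# Percent of positions sought vs. filled, by area, as quoted in this post.
sought = {"AI/DM/ML": 11, "DataSci": 16, "Security": 23}
filled = {"AI/DM/ML": 21, "DataSci": 7, "Security": 12}

# A positive gap means the area filled a larger share than was advertised.
gaps = {area: filled[area] - sought[area] for area in sought}
for area, gap in gaps.items():
    print(f"{area}: {gap:+d} percentage points")

# Treat AI/DM/ML and DataSci as a single pool of related areas.
pool = ("AI/DM/ML", "DataSci")
pool_sought = sum(sought[area] for area in pool)
pool_filled = sum(filled[area] for area in pool)
print(f"Combined: {pool_sought}% sought, {pool_filled}% filled")
```

Pooling the two areas nearly closes the gap, which fits the idea that an ad for one area was often satisfied by a hire in the other.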

Another factor that Wills considered was the number of Ph.D.s produced by area (based on Taulbee Survey results):

faculty-positions-fig5

It’s good to be in Security, since only 6% of the Ph.D.s produced are in this area compared to the demand of 23% of the positions sought in this area.  It’s also good to be in AI/DM/ML because over 20% of the faculty positions were filled in this area, even if the job ads didn’t specify it.

Overall, the report was an interesting read – I’m looking forward to seeing these trends over time.

Female Code Breakers

Here’s a fascinating story about the women who helped break codes during WWII.  The article appeared as part of ACM TechNews, and is excerpted from the book Code Girls: The Untold Story of the American Women Code Breakers of World War II by Liza Mundy.

via The Secret History of the Female Code Breakers Who Helped Defeat the Nazis – POLITICO Magazine

(Thanks to Barbara Ryder, emeritus professor and former chair of Computer Science at Virginia Tech, for the pointer)

Field Trip! Computational Biology on the Road

A few weeks ago I took my students to the Association for Computing Machinery Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB) in Seattle, WA.  It was a fantastic experience for everyone involved – the organizers did an excellent job running the conference.  I asked my students to reflect on the conference, and I figured I should do the same.

With such a large cohort of undergraduates at a scientific conference, my role shifted to encompass that of an educator as well as a researcher.  I homed in on the accessibility of the material in talks, feeling a bit of pride when the speakers showed an image or mentioned a topic I have taught in class.  I also had some moments of “wow, should have taught them that” when a speaker presented a fundamental concept we have not yet covered.  Many of my students came out of sessions excited about what they had just learned – they talked with the speakers, asked for their papers, and are now delving into this new material.  Graduate student attendees became mentors, fielding questions about why they went to graduate school and how they picked their research topic.

ACM-BCB was an ideal size – the conference had compelling talks and tutorials while being small enough to chat with the keynote speakers and conference organizers.  I caught up with existing colleagues and met some potential collaborators in the Pacific Northwest.  I also found myself in discussions with graduate students about my position in a liberal arts environment.  Reed had a research presence, since three Reed students submitted posters to the poster session.  My students had garnered enough research experience (through their thesis, summer research, or independent projects in class) to have engaging conversations with other attendees.

Finally, the trip to ACM-BCB as a class taught everyone (including me) the importance of logistics.  Some gems:

  1. Make sure the taxi to the train station can fit the entire group.
  2. Remember who you gave the posters to in your mad dash to find parking before your train departs (see #1).
  3. Make sure your PCard credit limit is set so it’s not declined at the hotel.
  4. Tell your students the correct time of the first keynote.

And the question of the day: is a (very detailed) receipt for a can of soda written on a napkin by a bartender reimbursable?

Pre-prints as a speedup to scientific communication

Tomorrow, I’ll sit on a panel about Open Data and Open Science as part of Reed’s Digital Scholarship Week.  I am somewhat familiar with these topics in computer science, but I decided to read up on the progress with Open Access in Biology.

As a junior professor trying to get a foothold in a research program, I’ll admit that I haven’t spent a lot of time thinking about Open Science.  In fact, the first thing I did was look up what it meant:

Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society. – Foster Project Website

Ok, this seems obvious, especially since so much research is funded by taxpayer dollars.  Surprisingly, Open Science is not yet a reality.  In this post, I’ll focus on the speed of dissemination – the idea that once you have a scientific finding, you want to communicate it to the community in a timely manner.

Biology findings are often shared in the form of peer-reviewed journal publications, where experts in the field comment on drafts before they are deemed acceptable for publication.  Peer-review may be controversial and even compromised (just read a few RetractionWatch posts), but in theory it’s a good idea for others to rigorously “check” your work.  However, the peer-review process can be slow. Painfully slow.  Findings are often published months to even years after the fact.

In computer science, my “home” research discipline, it’s a different story.  Computer science research is communicated largely through conferences, which often include paper deadlines, quick peer-review turnaround times, and a chance to explain your research to colleagues.  Manuscripts that haven’t undergone peer review yet may be posted to arXiv.org, a server hosting over one million papers in physics, mathematics, and other quantitative fields.  Manuscripts submitted to arXiv are freely available to anyone with an internet connection, targeting “all levels of an inquiring society.”

A biology version of the site, BioRxiv.org, was created in 2013, more than 20 years after arXiv was established.  It only contains about three thousand manuscripts.  Why the discrepancy?  Why is the field reluctant to change?

Last February, a meeting was held at the Howard Hughes Medical Institute (HHMI) Headquarters to discuss the state of publishing in the biological sciences. The meeting, Accelerating Science and Publication in Biology (appropriately shortened to ASAPbio), considered how “pre-prints” may accelerate and improve research.  Pre-prints are manuscript drafts that have not yet been peer-reviewed but are freely available to the scientific community.  ASAPbio posted a great video overview about pre-prints, for those unfamiliar with the idea.  While the general consensus was that publishing needs to change, there are still some major factors that make biologists reluctant to post pre-prints (see the infographic below).

This is an excellent time to talk open science in Biology.  It has become a hot topic in the last few months (though some in the field have been pushing for open science for years). The New York Times recently wrote about the Nobel Laureates who are posting pre-prints, and The Economist picked up a story about Zika virus experiment results that were released in real time in an effort to help stop the Zika epidemic.

Open Science has the potential to lead to more scientific impact than any journal or conference publication.  The remaining obstacle is determining what pre-prints mean for an academic’s career: publishing manuscripts, establishing priority of discovery (meaning “I found this first”), and obtaining grants.  I rely on freely available data and findings in my own research, yet I’ve never published a pre-print.  After writing this post, I think I may start doing so.

preprint-opinions-graphic

Additional Sources:

Mick Watson’s 2/22/2016 post about generational change on his blog Opiniomics.

Michael Eisen’s  2/18/2016 post about pre-print posting on his blog it is NOT junk.

Handful of Biologists Went Rogue and Published Directly to Internet, New York Times, 3/15/2016.

Taking the online medicine, The Economist, 3/19/2016.