Scientists threatened by demands to share data
The open data movement is polarizing the scientific community
When biologist and TechKnow host Philip Torres discovered a potentially new spider species, he shared his findings on Twitter. It was a decision that may have cost him the opportunity to publish his research in a top journal. (Photo: Phil Torres)
When Christopher Lortie was earning his Ph.D. in ecology at the University of British Columbia in the late 1990s, he joined a small consortium of international ecologists who pooled their resources to study the potential effects of climate change on alpine-plant communities around the world. He spent several months trudging up and down mountains in Kluane National Park in the Yukon documenting the health of plants. In 2002, the ecologists combined their research to produce a paper for the scientific journal Nature. It was widely read and cited. They then published the data supporting their findings. It was an experience that set a precedent for Lortie: “It feels good to share.”
Over the past 15 years, Lortie has shared his data and research papers and collaborated with other investigators in ways that until recently were deemed counterproductive, or irrelevant, to personal success in the sciences. He is one of a growing number of scientists who encourage their colleagues to make research more transparent and accessible, on the theory that sharing information will speed scientific discovery. “There will be fantastic discoveries, and that’s all that really matters,” says Lortie.
In May 2012, some 65,000 people, including researchers, librarians and advocates of information sharing, signed a petition urging the Obama administration to adopt open access policies that would make the results of taxpayer-funded scientific research freely available to the public. In response, the White House issued a memorandum in February 2013 instructing almost two dozen federal funding agencies to draw up individual plans ensuring that research papers become publicly available within roughly 12 months of publication. The memo also directed agencies to ensure that the data behind those papers are “stored and publicly accessible to search, retrieve, and analyze.”
The policy marks a turning point for the open access movement, which has fought formidable odds for more than a decade. The movement arguably began in 2002, when a small group of organizers released a statement of principles called the Budapest Open Access Initiative. Its ideas challenged the business model of a lucrative scholarly publishing industry that relied on libraries paying exorbitant journal subscription fees, as high as $40,000 a year in some cases. In the early 2000s, open access proponents also launched the Public Library of Science (PLOS), a nonprofit publisher of online, peer-reviewed journals that make all their papers available for free. Rather than relying on subscriptions, the PLOS business model requires authors, and by extension their funding agencies, to pay a publication fee.
As of 2012 there were 6,713 open access journals (some online only, others also in print) making papers immediately available. Growth is now driven largely by traditional publishers, which have begun adopting open access models for some of their publications. One study projects that journals making their papers immediately available will publish 50 percent of scholarly articles sometime between 2017 and 2021, and 90 percent sometime between 2020 and 2025. “Open access has become an entrenched player,” John Wilbanks, who helped organize the White House petition, wrote in an email.
But as access to the published results of scholarly research spreads, questions about making the underlying data widely available are coming to the fore, and the White House memorandum has made that a certainty. Sharing the results of scientific research is a bit like unveiling a newly built house: scientists generally want it widely viewed, so the growth in open access publishing is a boon for most of them. Sharing data, on the other hand, is comparable to handing over the architectural plans and building materials used to construct the house. Others can scrutinize the quality of the work and reuse the basic components to build their own house. That raises fears that others will find errors or take ideas for future research.
“It is threatening to scientists to think that their data will be that available,” says Heather Piwowar, who describes herself as a “scientist who studies scientists” and is co-founder of ImpactStory, a nonprofit that attempts to ameliorate some of the issues raised by open data. The organization recently received $500,000 in funding from the Alfred P. Sloan Foundation, which specializes in grants that further the sciences.
Multiple studies show that scientists often do not share their data when peers request it, or even when a journal requires it. (Some journals, including Nature and Science, have begun to require authors of research papers to make available the data and methodology described in the article.) When Piwowar analyzed more than 11,500 journal articles reporting gene expression studies in 2011, she found that just 45 percent of the studies had shared their data, even though sharing gene expression data is widely encouraged within genomics. She also found that researchers studying cancer or human subjects were the least likely to share.
“I think the public thinks that we’re all learning from everyone else’s work. That’s not true, and furthermore, it’s not true in ways that are even worse than you might think,” says Piwowar. Other studies have found that young scientists and the most productive scientists are likeliest to be rejected when requesting data from others.
Piwowar is working on a study, to be released next year, that examines the experience of authors who share data. Results are preliminary, but she has found that some scientists’ fears have been realized. “Sure enough, some people have had their future research ideas published by someone else before they got to them,” she says. “[And] some people have had errors found.”
Within the sciences, it’s clear that not all data is equal. Some, like genomic information, is abundant and relatively easy to reproduce; other material can take years of painstaking work to collect. For scientists producing data in the latter group, getting “scooped” after sharing is a significant concern because it is so hard to prevent. Even Peter Galison, a Harvard University historian of science and a proponent of open data, urges caution: “We do want to give some time for people to work through — especially young investigators — and to garner the fruits of their labor. It’s easy to say everything should be published as quickly as possible, but it could lead to crushing careers before they ever get off the ground.”
Since scooping can be difficult to stop, many people are trying to increase the rewards for sharing. Success and funding in science can depend in large part on publishing scholarly articles in prestigious journals and doing so often; the more frequently others cite your paper, the better. Recently, there have been efforts to credit those who produce data that becomes the basis for other researchers’ papers. Scientists who deposit their data in repositories such as the Dryad Digital Repository or Figshare receive a Digital Object Identifier (DOI) for the data set, which allows others to cite it in their papers.
Sharing data can also be laborious and time-consuming, and to a harried scientist that’s unrewarded time that could be spent producing papers. “People are busy,” says Jonathan Eisen, a genetics professor at the University of California, Davis. “Everyone is overwhelmed with life and email and, in academia, trying to get funding and write papers. Whether something is open or not open is not highest on the priority list. There’s still need for making people aware of open science issues and making it easy for them to participate if they want to.”
Numerous services are popping up to lessen the burden of archiving and cleaning up data for wide consumption. Open Context, an organization that specializes in publishing archaeological data (which is notoriously hard for researchers to obtain), is trying to elevate the prestige of data sharing so it can be presented as a piece of scholarship. Open Context peer-reviews, cleans up and structures the data, making it compatible with other databases, so researchers can list it as a publication on their CV along with the ever-important journal-article citations.
Some scientists believe that sharing data can benefit both their academic careers and their productivity. “My general attitude about open science is that I’d much rather be relevant. In science, that’s harder than anything else,” says Titus Brown, an assistant professor at Michigan State University who runs a genomics, evolution and development lab and practices open science. “If I make my work available, I have a higher chance of being relevant.” By sharing his grant proposals and software source code, and by posting preprints of his papers before they appear in peer-reviewed journals, Brown says he has attracted funders and students. He is confident that openness will pay off, or at least not work against him, when he comes up for tenure review next year.
The spider discovered by Torres is capable of producing elaborate fake spider decoys. (Photo: Phil Torres)
Last year, one of Wired magazine’s top stories covered the discovery of a potentially new spider species in Peru that builds a decoy spider in its web. Philip Torres, who started his Ph.D. in ecology last month, discovered the spider and almost immediately posted the finding on Twitter to learn more about it from other scientists and the public. He also videotaped the spider and posted the video online. The spider’s remarkable behavior stirred up a media storm that may have cost him the opportunity to publish his research in a top journal, but he’s OK with that. “I think it’s great to have a story that can impact a lot of people,” says Torres, now a contributor to Al Jazeera’s TechKnow. “The video had over 400,000 views. That’s a lot of people who have been able to see what science is actually like at times.” It also helps that many arachnologists have now heard of the spider.
An often-cited and wildly successful example of open data is the Human Genome Project, an international scientific effort that concluded in 2003 after sequencing and mapping the human genome. The data was made public, and a recent report estimates that the federal government’s $3.8 billion investment in the project generated roughly $796 billion in economic activity through 2010.
“It has transformed the way we do science across biological scales, from the molecular all the way up to studying whole ecosystems,” says Carl Boettiger, a postdoctoral researcher at UC Santa Cruz. “The value is in enabling science to progress faster.”