Read Columbia’s response to the White House Office of Science and Technology Policy’s (OSTP) request for information (RFI) on public access to scientific publications resulting from federally funded research. The OSTP issued this RFI—and a related RFI on public access to data—to fulfill requirements of the America COMPETES Act.
You can also read the Columbia Libraries response to the data RFI.
1. Are there steps that agencies could take to grow existing and new markets related to the access and analysis of peer-reviewed publications that result from federally funded scientific research? How can policies for archiving publications and making them publicly accessible be used to grow the economy and improve the productivity of the scientific enterprise? What are the relative costs and benefits of such policies? What type of access to these publications is required to maximize U.S. economic growth and improve the productivity of the American scientific enterprise?
The United States and its businesses compete in a world in which substantive investments abroad have been made to make research publicly available. Europe in particular is investing heavily not only in governmentally funded research repositories but also in the development of shared infrastructures throughout the European Union through the establishment of the European Research Area, the focus of which is on publicly shared research as a primary means to eliminate duplication of research efforts and lower the R&D costs to innovation. The European Commission report “Europe 2020 Flagship Initiative Innovation Union” (found at http://ec.europa.eu/research/innovation-union/pdf/innovation-union-communication_en.pdf) makes the explicit point several times that part of the European strategy to achieve market success is free access to publicly funded research and data.
An indication of the wealth of services and industries based on access to a large publishing corpus can be best seen through the innovative tools, platforms, and services that large companies — for example, technology giants such as IBM and Microsoft, media companies such as the Tribune Company and Thomson Reuters, and publishers such as Reed Elsevier — have been able to develop from their large (but mostly proprietary) corpora. Opening up published research to broader public access and use will allow start-ups and entrepreneurial small businesses to likewise have access to this wealth of information and to more easily create collaborative partnerships that could result in more jobs, less costly R&D, and more innovation. Just one example of a company built on freely available information is Google, whose remarkable growth and financial success are based entirely on access to freely available information that Google could then monetize.
A tip of the iceberg in innovation can also be seen in the numerous “apps” that have been created by individuals and businesses using publicly available information and data, products such as Twitter mash-ups, weather apps, transportation apps, and apps that utilize Flickr or Wikipedia content that are then often funded by advertisements or individual purchases. Programmer challenges such as the Sunlight Labs’ Apps for America 2’s Data.gov Challenge (http://sunlightlabs.com/contests/appsforamerica2/) provide a taste of what innovations can be encouraged when text, images, multimedia, and data can be mined and repurposed at will, thus lowering the barrier to entry for an individual or small group to create products that only a few years ago were solely the purview of large companies with considerable capital.
Likewise, tax dollars can be better utilized by providing machine-readable research that can be sorted and analyzed through tools being developed or that could be developed, such as BioText from the University of California at Berkeley (http://biosearch.berkeley.edu/) or the suite of tools being created by the National Centre for Text Mining in the United Kingdom (http://www.nactem.ac.uk/software.php), initiatives only possible because of the increasingly large corpora available through publicly funded and governmentally encouraged repositories in various countries. The explosion of biomedical analysis tools already being built on top of data, particularly using PubMed and the full text within PubMed Central and its offshoots in the UK and Canada, likewise shows the promise of open information to spur innovation.
Open availability to research outputs and data also allows for development of projects that encourage participation from citizens and students alike, from “citizen science” projects (such as Galaxy Zoo [http://www.galaxyzoo.org/] and NASA’s Ice Hunters [http://blogs.discovery.com/friendly-citizen-science/2011/06/be-an-ice-hunter-for-nasa.html]) to cultural heritage efforts (such as the Library of Congress’ Flickr project [http://www.loc.gov/rr/print/flickr_pilot.html], which in turn inspired the creation of Flickr’s Commons [http://www.flickr.com/commons]). The outcome of such projects is a more informed and more engaged citizenry that contributes directly to advancing knowledge and building new services (both private and public).
Many forms of access to published works will open opportunities for economic growth and expanded productivity. Access should not be technologically limited, but should meet the demand for machine-readability and human interface. Further, open access to scholarly publications is already fostering new business models. Some aggregators of databases of full-text journals continue to include journals that are also available to the public online without restriction, and libraries and other parties continue to purchase those databases. For example, EBSCO has long been a leader in the development of databases of scholarly journals that it markets to libraries and other purchasers. Many of the journals in the EBSCO database are freely available to the public in full directly from publishers through their web sites or other means. Nevertheless, EBSCO continues to find value in including such journals in its commercially marketed databases and continues to build its business, earning revenues and fostering economic activity by selling enhanced and federated access to these freely available journals. Under EBSCO’s standard terms, it does not pay a royalty (or pays a reduced royalty) for inclusion of open-access materials in the database, and we know from our own experience with academic journals based at Columbia University that many journal editors are willing to forgo the royalties in exchange for the added readership that can come through access by way of EBSCO and other aggregators. The open accessibility of the journals, therefore, does not hinder business prospects, but instead promotes opportunities for new business models and for additional public access to the scholarly publications.
Increasing the availability of open access to research also has important consequences for the economic and social implications of higher education. Under conventional publishing systems, many of the studies that result from federal funding are available only through the purchase of subscriptions or databases available from commercial suppliers. The escalating cost of those publications is well documented. A recent study affirms lower costs to most libraries if journals are made open access (http://www.bioone.org/doi/pdf/10.1641/B570709). The growing expense of purchasing journals is exacerbating the so-called “digital divide” that is separating the persons in this country who have access to information from those who do not. Only the largest universities and libraries typically have the budget to acquire the vast range of publications that are necessary to support the information needs of researchers and students and to foster the next generation of knowledge. One can compare university budgets in any city, state, or community to find obvious unevenness in the ability of some colleges and universities to acquire the scholarly literature, leaving it available only to researchers who have the advantage of being at universities with a strong funding base. This disparity in access to information in turn leads to a disparity in educational opportunity and differences in career opportunities for graduates of many of our institutions of higher learning. Perhaps most pointedly, Columbia and other research libraries must now offer grants to subsidize the ability of researchers to come to the library and use collections that are not available online (http://library.columbia.edu/indiv/spcol/research_awards.html). The fact that materials are not online not only constrains access, but also compounds the expense of doing research.
Further, the increasing cost of many of the subscriptions and databases from some publishers inevitably leads to one of two results: The institution may reduce expenditures for other materials or cancel that particular purchase, leading to further loss of information access; or the increased costs of information access are passed along to students in the form of increased fees or escalating tuition. We are well aware of the challenges associated with increases in university tuition and the implications for the ability of students to enjoy the opportunities and economic prosperity that can come from a university education. Assuring open access to the scholarly literature can help to level the accessibility of information resources by students at all colleges and universities throughout the country. Open access can also help ameliorate the rising cost of education and help keep these costs in check.
2. What specific steps can be taken to protect the intellectual property interests of publishers, scientists, Federal agencies, and other stakeholders involved with the publication and dissemination of peer-reviewed scholarly publications resulting from federally funded scientific research? Conversely, are there policies that should not be adopted with respect to public access to peer-reviewed scholarly publications so as not to undermine any intellectual property rights of publishers, scientists, Federal agencies, and other stakeholders?
Policies related to open access of scientific research may have implications for three different areas of intellectual property: copyright, patent, and trademark. This response will address those implications separately.
Under the structure of current copyright law, most journal articles and other publications that result from research are protectable under copyright law. Copyright protection is, however, applicable only to the expression in the work and not to the facts. Thus, the facts, data, findings, and conclusions that result from scientific research are not themselves protectable by copyright. Any new policy related to open access of literature that includes such elements should not affect that fundamental premise of copyright law. However, the narrative text and other expression in the publication are ordinarily protectable by copyright law. Some publishers have conventionally required that the author transfer or assign the copyright in full to the publisher. The movement toward open access of such literature has led to a reconsideration of such policies by many authors and publishers. Many publishers today voluntarily add some flexibility to their standard practices, permitting authors to make at least a pre-publication version of their article available on a web site, in a repository, or by some other means openly available to the public. On the other hand, some publishers have added flexibility to their terms of publication only to the limited extent necessary to comply with legal requirements for public access, such as if the work is funded through the National Institutes of Health (NIH).
Despite these developments, nothing in the movement toward open access of research literature has in any way altered the fact that the work is protectable under copyright law. Nothing in any of these developments has affected the terms under which the public or universities may use the work under fair use or other copyright exception. Nothing about the progression toward open access undercuts the ability of a publisher to hold the rights it needs to build an effective business model for scholarly publishing. The most significant, recent change has been a heightened awareness among authors that in fact copyright applies to their work and that they have choices with respect to the terms on which they will agree with a publisher for the publication and other future uses of the work. Similarly, publishers also have gained a greater appreciation for the fact that they do not need all rights associated with the copyright in order to meet their business objectives. Indeed, some publishers are offering an open-access alternative for journal articles (see, for example, http://authorservices.wiley.com/bauthor/onlineopen.asp). The result is actually a revitalized understanding of copyright as a means for the sharing of rights and furthering simultaneously the interests of publishers and authors alike. While some of these changes have been spearheaded by the legal requirement with respect to research funded by the NIH, almost all of the change that has occurred with respect to copyright has been made through the voluntary actions of authors and publishers as they continue to explore the best terms on which to make new work available to readers and researchers.
Many articles include descriptions of scientific studies and findings that may themselves be patentable. With that fact in mind, many standards and expectations about open-access policies include either a required or an optional embargo period. A pre-publication embargo is an opportunity for the researcher to identify the patentable elements of the work and to begin the process of pursuing a patent application before the work is published. The act of publishing the patentable findings (whether the publication is made by the journal publisher or by the author through open access) has important implications for the patent process.
First, once the findings are made public, they may then be treated as “prior art” and be used to determine whether another invention that is the subject of another person’s application is in fact novel under patent law. The difficulty of locating and identifying prior art is well documented as a burden on innovation and as an impediment to the patent examination process (http://www.peertopatent.org/wp-content/uploads/sites/2/2013/11/CPI_P2P_YearTwo_lo.pdf).
Second, a publication can serve to bar the patentability of the author’s own invention. For example, if the inventor is responsible for the act of publishing the findings, as may be the case when a scientific paper becomes open access, then that act of publication has the potential to bar the ability of that inventor to secure a patent on the invention. To alleviate this harsh consequence, U.S. law does not make a complete bar on patentability effective until twelve months after the date of publication. If the objective is to further the interests of inventors and encourage patents, then an embargo before formal publication may be warranted. Any further embargo following publication is not relevant to the bar on patentability. If the objective is to clarify the record of invention and to facilitate more accurate review of patent applications and prior art by the U.S. Patent and Trademark Office, then early publication is most desirable. Any embargo would interfere with the scientific record of invention. Indeed, some inventors prefer to make their works open access immediately in order to take advantage of establishing the findings as “prior art” and thereby blocking someone else from claiming to have made the same invention and to secure prior rights under the patent system. Not all inventors desire patent protection or take steps to assure that their inventions are legally secured. For example, some inventors of medical devices or treatments prefer open availability of their works in order to serve the public interest. A policy on open access should not prohibit that approach to patents.
Many publishers have a strong interest in protecting their trademarks, and many of them may find that their trademarks are in fact one of their most important and valuable assets. Many authors strive to place their articles with certain journals specifically for the reputation that goes with the brand. The ability for an author to cite his or her work as having been published in a well-known leading scientific journal is often an important mark of prestige and an important representation of the quality of the research. The name associated with the journal is sometimes one of the more valuable assets in the scientific communication system. This point is affirmed in at least one large-scale study of academic authors and readers in the scientific disciplines (http://www.peerproject.eu/fileadmin/media/reports/PEER_D4_final_report_29SEPT11.pdf). A similar dynamic also occurs with the publication of many monographs and conference proceedings. The ability for a researcher to specify that his or her work has been published by a prestigious university press or has been published in the proceedings of a major conference is again often a mark of distinction and speaks to the quality of the scientific research. Those virtues associated with the work are identified by the trademark name of the journal or of the press.
Most discussions about open access of scientific literature have included an expectation or requirement that the open-access version of the work will include a full citation and often an Internet link to the work in its final form. That citation embodies the name of the journal or publisher and hence carries over and affirms the trademark strength of those names. Therefore, an open-access system that assures that the open-access version of the work has all of those intrinsic qualities is a system that will best strengthen the trademark interests of the publisher of the final work. We are concerned that a system that leads to multiple different versions of the publication can actually have negative implications for the value of the publisher’s brand. In fact, allowing open access of journal articles has been shown to increase citations (for example, see the studies at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4755635 and http://www.istl.org/10-winter/article2.html), thus potentially strengthening the prestige and trademark value of the journal’s name. Therefore, in order to assure the full strength and to protect the interests of the trademark owner, we believe that the open-access version of the publication should be as close as possible to the final published version. Ideally, the published version and the open-access version should be identical. Identical versions, we contend, would be in the best interests of the trademark owner, the publisher, and the author, as well as in the best interests of readers, researchers, and students who will use this material in the future.
3. What are the pros and cons of centralized and decentralized approaches to managing public access to peer-reviewed scholarly publications that result from federally funded research in terms of interoperability, search, development of analytic tools, and other scientific and commercial opportunities? Are there reasons why a Federal agency (or agencies) should maintain custody of all published content, and are there ways that the government can ensure long-term stewardship if content is distributed across multiple private sources?
We are certain that many interested parties will submit comments for consideration related to the prospect of the federal government being responsible for maintaining a centralized repository of research publications or at least assuring access to those publications wherever they may be held. Within this discussion, we do not want to overlook the legal options that the federal government currently has available to it under existing law in order to encourage and facilitate the creation and provision of access to such a repository. Under Section 407 of the U.S. Copyright Act, the owner of the copyright or of the exclusive right of publication in a work published in the United States shall deposit copies of such work with the U.S. Copyright Office. That duty of making the deposit would in almost all instances lie with the author or the publisher of the scholarly work. If the title to the copyright — or if the exclusive right of publication — has been transferred to the publisher, then the publisher has a duty to make the deposit. Otherwise, the duty to make the deposit lies with the author.
This duty of “legal deposit” is not a condition to copyright protection, but it is a long-standing requirement of the law in the United States and in most countries of the world in order to build a resource of the cultural, literary, and scientific works of the national heritage. Legal deposit also serves the purpose of documenting the exact publication in the event of future disputes. The primary purpose of legal deposit, however, is to build a national library or other collection of published works that reflect the cultural and scientific heritage of the country. Strengthening compliance with legal deposit consistent with Section 407 could serve many of the objectives of the proposed federal repository of scientific publications. The Copyright Office could receive the copies and arrange with the Library of Congress or other governmental agency to keep and retain the materials. The publications could also be made openly accessible either in accordance with legislative action by Congress or in accordance with the conditions of the research grant provided by the federal agency. Even without public accessibility of the content of the materials, maintaining the repository alone would serve two important objectives: It would assure the existence of the materials and increase the likelihood that they would be preserved to meet future needs; and the repository could offer a searchable database of the metadata of the publications, allowing researchers to discover their existence and then search for the particular items elsewhere. Naturally, facilitating public accessibility of the full content of the materials is strongly preferred in order to meet the public interest in assuring the availability of research output resulting from federal grants.
Enhanced use of the legal deposit system to develop the database of federally funded work would also have important benefits for the copyright owners of each work. Published works may be deposited with the U.S. Copyright Office apart from a formal registration of the copyright. However, the practical reality is that deposit compliance and copyright registration are often paired together. When a copyright owner registers a claim of copyright in the work, the process requires deposit of the copies; those copies satisfy the legal deposit requirement. By enhancing the use of the legal deposit system in order to meet the goals of the public access repository, the system would in turn be encouraging registrations of the claims to copyright in the works. Registration of copyright has important legal benefits for the copyright owner. Registration creates a presumption of ownership, and it allows for a copyright owner to seek statutory damages and recovery of attorneys’ fees in the event of litigation. To the extent that this method of developing the repository leads to more copyright registrations, it will in turn strengthen the legal position of the copyright owner and counterbalance some of the concerns that copyright owners may have about making their works publicly available.
The power of a centrally supported repository and database to rapidly advance science has long been evidenced by the public availability of the database now known as PubMed even before the creation of the full-text PubMed Central repository. Multiple repositories maintained by publishers, institutions, societies, and other third parties could play a similar role, but would need to meet conditions that allow for indexing, public access, reuse, interoperability, and preservation. Such repositories would need to be certified as “trusted repositories” that fulfill all designated criteria, including the uniform adoption of standards such as the National Library of Medicine (NLM)’s widely used Journal Publishing DTD and the proposed Open Text Mining Interface and a requirement that publishers follow the standards currently in place for PubMed Central and make available for access and use not only the PDF of the article but also the XML that they almost all already generate.
No matter what the repository decision, dark archiving solutions (such as LOCKSS/CLOCKSS [http://www.lockss.org/lockss/Home] or Portico [http://www.portico.org/digital-preservation/]) are not adequate, even if 100% participation were mandated. (Currently such participation is optional, and few publishers contribute content. A recent study conducted by Columbia and Cornell Universities showed that only 15% of their journals were being archived by LOCKSS and Portico combined [see ]). Public access to materials ensures the demand for investment in migration and ongoing preservation. Conversely, materials in dark archives may one day be discovered to be unusable and unrecoverable and therefore useless to future generations of researchers.
4. Are there models or new ideas for public-private partnerships that take advantage of existing publisher archives and encourage innovation in accessibility and interoperability, while ensuring long-term stewardship of the results of federally funded research?
Existing publisher archives often do not permit the levels and types of access that public–private partnerships could leverage to full advantage, but glimmers of the possibilities of such access are available in some collaborations that have involved researchers and publishers opening up their results to use and reuse. One example is the coordination in 2003 of the World Health Organization’s Multicentre Collaborative Network for Severe Acute Respiratory Syndrome that involved thirteen labs in ten countries sharing their research with each other in a way that spurred the highly efficient discovery of the cause of SARS and informed governmental solutions to contain the epidemic (see http://www.sarsreference.com/sarsref/virol.htm). Similarly, providing accessibility and interoperability to long-term archives of scientific literature might be a role for collaborative efforts by scholarly and professional societies, universities, and federal agencies acting in concert.
The authors of the article “Discovery Is Never by Chance: Designing for (Un)Serendipity” (http://dx.doi.org/10.1145/1640233.1640279), written by a team of academic and commercial collaborators, make the point that some of the most exciting and innovative discoveries could be made by allowing computers to assist with the process of discovery and scientific serendipity. Broad and deep human- and machine-readable access to research outputs will allow continued and rapid development of businesses focused on serendipitous discovery across disciplines and the creation of a whole range of services built on semantic technology (so-called Web 3.0) — whether, as the recent Semantic Technology Conference highlighted (http://semtech2011.semanticweb.com/), those developments are in healthcare, finance, publishing, marketing and advertising, emergency response, life sciences, consumer applications, the emerging field of sentiment analysis, or other areas.
5. What steps can be taken by Federal agencies, publishers, and/or scholarly and professional societies to encourage interoperable search, discovery, and analysis capacity across disciplines and archives? What are the minimum core metadata for scholarly publications that must be made available to the public to allow such capabilities? How should Federal agencies make certain that such minimum core metadata associated with peer-reviewed publications resulting from federally funded scientific research are publicly available to ensure that these publications can be easily found and linked to Federal science funding?
Interoperability of search, discovery, and analysis across repositories requires consistent metadata that is machine-readable and machine-interpretable, especially concerning object-specific rights for downloading, use, and reuse of the research. Within the context of the goal of interoperability among discipline-specific archives and repositories, metadata should be seen as the means for enabling the specific objectives outlined by the Office of Science and Technology Policy’s Request for Information rather than merely a description of the specific research article. Alongside the descriptive metadata (e.g., title, abstract, author, keywords) necessary for discovery and identification, administrative metadata must be included that outlines the proper management of the resource, such as when and how the object was created, the file type and other technical information, who can access the file and what can be done with it and other rights information, and the preservation information needed to archive and preserve the file.
Any baseline standard for metadata should begin with Dublin Core (http://dublincore.org/), but to fulfill the requirements for true interoperability, the elements of Dublin Core would need to be expanded in strategic ways (e.g., Qualified Dublin Core, Dublin Core Application Profiles), particularly to enable greater specificity for expressing intellectual property rights information and to supply both machine- and human-understandable context for each published resource. Important elements of any metadata model should include controlled vocabulary that makes explicit statements about reuse, attribution for funding organizations and grant identification, and descriptions of the resources that enable relationships to be determined semantically, such as the Resource Description Framework (RDF) and Web Ontology Language (OWL).
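To make the preceding point concrete, the following is a minimal sketch of a single publication record that carries descriptive, rights, and funding information in simple Dublin Core elements. It is an illustration under stated assumptions, not a proposed schema: the record wrapper, the helper name build_record, every identifier and value, and the choice to express the funder through dc:contributor are all hypothetical. Simple Dublin Core has no dedicated funder or grant element, which is one reason the expansions mentioned above would be needed.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a flat XML record from (element name, value) pairs.

    The <record> wrapper element is a hypothetical container used only
    for this illustration.
    """
    root = ET.Element("record")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# All values are hypothetical placeholders, not real identifiers or grants.
fields = [
    ("title", "Example article title"),
    ("creator", "Researcher, Example"),
    ("subject", "text mining"),
    ("identifier", "doi:10.9999/example.article"),
    # Rights statement that both people and software can act on.
    ("rights", "Reuse permitted with attribution"),
    # Assumption: funder and grant recorded as a contributor string.
    ("contributor", "Funder: Example Federal Agency, grant EX-0000"),
    # Link from the publication to a related data set.
    ("relation", "doi:10.9999/example.dataset"),
]

record = build_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A production profile would more likely place the funder, grant number, and license in dedicated, controlled fields so that software does not have to parse free text.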
Existing metadata standards can be leveraged to inform a broader metadata specification for robust search, discovery, and analysis of research published through funding by federal agencies. In particular, the standards established by Dublin Core, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) (http://www.openarchives.org/pmh/), the DataCite Metadata Schema (http://schema.datacite.org/), and the Europeana Semantic Elements (http://www.europeana.eu/schemas/ese/) could form the basis for a schema developed by agencies already dedicated to improving metadata interoperability, such as the National Information Standards Organization (NISO) and the Library of Congress. Other elements of such a standard might be controlled identifiers, such as the researcher identification system being developed by the Open Researcher and Contributor ID (ORCID) project and the institutional identifier system known as I2, currently under discussion by NISO. Also of importance in any such standard would be metadata that provides for usage tracking and analytics across various repositories, such as the Standardized Usage Statistics Harvesting Initiative (SUSHI) schema (http://www.niso.org/schemas/sushi). The last of these elements would prove especially useful for federal agencies and the researchers they fund to understand the impact and reach of their work.
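As an illustration of how a harvesting protocol supports cross-repository discovery, the sketch below constructs an OAI-PMH ListRecords request that asks a repository for its records in the oai_dc (simple Dublin Core) format. The verb and the metadataPrefix, from, and set arguments are part of the OAI-PMH specification; the repository address and the helper name list_records_url are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, set_spec=None):
    """Return the URL of an OAI-PMH ListRecords request.

    from_date and set_spec are optional selective-harvesting arguments.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
url = list_records_url("https://repository.example.org/oai",
                       from_date="2011-01-01")
print(url)
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2011-01-01
```

Because every compliant repository answers the same request in the same way, an aggregator can harvest many independent archives with one client, which is the kind of interoperability a shared metadata specification is meant to secure.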
As the metadata standards for use in interoperable repositories are being developed, it is important that the metadata standard describing data not be developed separately from those describing publications. The publication standard will of necessity need to support analysis of published texts as data objects, and data that are considered integral to the publication will need to be associated with the publication in a clear manner. Critical to the success of any interoperable repository system will be the possibility of building bridges between related publications and the underlying data that support them in meaningful and machine-navigable ways (e.g., via vocabularies for semantic relationships and unique identifiers).
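A minimal sketch of such machine-navigable bridges follows. It stores each link between a publication and a data set as a (subject, relation, object) statement keyed by unique identifiers and then looks up the data that supports a given article. The identifiers are hypothetical placeholders, the relation names are illustrative and only modeled on the relation types used in schemas such as DataCite, and the helper related is an assumption introduced for this example.

```python
# Each statement links two uniquely identified resources with a named relation.
# All identifiers are hypothetical placeholders.
links = [
    ("doi:10.9999/example.article-1", "isSupplementedBy", "doi:10.9999/example.dataset-a"),
    ("doi:10.9999/example.article-1", "isSupplementedBy", "doi:10.9999/example.dataset-b"),
    ("doi:10.9999/example.article-2", "cites", "doi:10.9999/example.article-1"),
]


def related(statements, subject, relation):
    """Return every object linked to `subject` by `relation`, in input order."""
    return [obj for subj, rel, obj in statements
            if subj == subject and rel == relation]


datasets = related(links, "doi:10.9999/example.article-1", "isSupplementedBy")
print(datasets)
# ['doi:10.9999/example.dataset-a', 'doi:10.9999/example.dataset-b']
```

The same statements could be expressed in RDF so that repositories holding the article and repositories holding the data can each publish their half of the link and still be joined by software.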
6. How can Federal agencies that fund science maximize the benefit of public access policies to U.S. taxpayers, and their investment in the peer-reviewed literature, while minimizing burden and costs for stakeholders, including awardee institutions, scientists, publishers, Federal agencies, and libraries?
If an agency requires public access to publications that result from government funding, the specifics of the policy could, without clear guidance and coordination among the agencies, vary greatly; those variations could affect a repository’s ability to meet its public service mission. We suggest that a repository, whether hosted by the federal government or by an institution or publisher (as long as that repository has been certified as “trusted”), could maximize its benefits for the public through partnerships with the many different publishers of the content. We anticipate that the technologies used to create and search such repositories will continue to change for the indefinite future.
The most effective policy would be to maximize access to the content by setting standards and requirements for deposit of the content, while at the same time maintaining flexibility to accommodate new technologies. For example, different publishers currently make their content available on different technological platforms that use diverse mark-up and search interfaces. An effective federal policy should not dictate the technology or the platform. Instead, the policy should stipulate that publications maintained by governmental agencies, publishers, universities, or other organizations that may be part of the effort of allowing access to federally funded research should use platforms that allow for interoperability (including protocols for easy deposit of publications into a variety of repositories), uniform indexing practices, and federated search results. Also, the policy should set clear objectives related to preservation of the content, metadata standards for the individual publications (including clear identification of the grant and agency that funded the research), and other such requirements. The publishers, universities, agencies, and other participants in the effort should then have the flexibility to use the latest available technological means for meeting those objectives. Overall, good policy that serves the objective of maximizing the benefit of public access to scientific publications should be relatively specific about the research and technological standards, while at the same time not being confined to specific technological tools that may become obsolete in the future.
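A policy that specifies metadata requirements without dictating a platform can be checked in the same way on any system. The following minimal sketch tests a deposit record against a short list of required elements, including the funding agency and grant. The field names and the sample record are hypothetical; an actual list of required elements would come from the agencies' policy.

```python
# Hypothetical required elements; an actual policy would define its own list.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "identifier",
    "funding_agency",
    "grant_number",
)


def _is_blank(value):
    """Treat None, empty or whitespace-only strings, and empty lists as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_fields(record):
    """Return the required fields that are absent or blank, in policy order."""
    return [name for name in REQUIRED_FIELDS if _is_blank(record.get(name))]


if __name__ == "__main__":
    deposit = {
        "title": "An Example Article",
        "creators": ["Doe, Jane"],
        "identifier": "doi:10.0000/example",
        "funding_agency": "Example Agency",
        # grant_number intentionally omitted to show a failed check
    }
    print(missing_fields(deposit))
```

A check of this kind depends only on the content of the record, not on the software that stores it, which is the sense in which a policy can be specific about standards while remaining neutral about tools.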
7. Besides scholarly journal articles, should other types of peer-reviewed publications resulting from federally funded research, such as book chapters and conference proceedings, be covered by these public access policies?
Much of the discussion about open-access policies for scientific publications has centered on peer-reviewed journal articles. We would encourage policy makers to apply their policies beyond simply those works in a way that respects discipline-specific practice. Not all research publications that result from federal grants are in fact peer reviewed through the same methodologies, but these publications should nevertheless be included as valid research output. For example, while pre-publication peer review may be standard practice in the sciences, peer review in a conventional structured manner is not customarily undertaken with regard to publications in law journals. Further, many publications other than journal articles that result from federal research funding could and should be subject to open-access availability in the same way journal articles are, and for the same reason: the public has the right to access publications of whatever genre that arise from public funding of that research. For example, for many researchers in the fields of engineering and computer science, the most important results of their research often appear only in conference proceedings, rather than in journals. Technical reports, research reports, monographs, and contributions to edited volumes all count as primary literature and should be subject to the same requirement as journal articles, with the caveat, outlined below, that a mechanism must be in place to ensure that only the version of record is subject to the deposit requirement.
Regardless of the type of publication, a policy calling for open access to the scientific literature should, at a minimum, include the standard that open access is mandatory only for the final version of the work as determined by the author. For example, for some researchers, conference proceedings are widely distributed and many of them are made publicly available online. An author of a contribution to such a proceeding expects that the work will be published and made available to all researchers in his or her field and to the general public. By contrast, in other disciplines, the paper that appears in a conference proceeding sometimes has only limited circulation, often only to attendees at the conference. The author may have delivered that paper at a conference specifically with the objective of gaining the benefit of comments from colleagues in order to further revise the paper for wider publication in some other venue. A paper appearing under those circumstances may not have the benefit of a clear indication from the author that the paper is ready for public accessibility. That paper, therefore, should not be made subject to a requirement of public access through the collective repository. Clear guidance as to depositing and reporting the results of research in its final forms, whatever the genre, should come from the federal agency requiring the deposit.
8. What is the appropriate embargo period after publication before the public is granted free access to the full content of peer-reviewed scholarly publications resulting from federally funded research? Please describe the empirical basis for the recommended embargo period. Analyses that weigh public and private benefits and account for external market factors, such as competition, price changes, library budgets, and other factors, will be particularly useful. Are there evidence-based arguments that can be made that the delay period should be different for specific disciplines or types of publications?
Publishers have long argued that a lengthy embargo is necessary to support their business, as subscribers would no longer pay for material that is freely available after a short period of time. This argument, however, is based on speculation rather than on fact. Subscribers, in particular academic libraries, do not as a matter of course drop journals because the content is freely available. Instead, they continue to subscribe to the journal as the publication of record. As an example, the Cornell-hosted repository arXiv (http://arxiv.org/), founded in 1991, has become the discipline repository of choice for those working in the field of physics and now increasingly in mathematics, computer science, quantitative biology, quantitative finance, and statistics. Despite arXiv’s popularity, physics journals remain alive and well; in fact, the number of physics journals has actually grown, rather than declined, over the past two decades that arXiv has been in existence, from 380 journals in 1991 to 597 in 2011 (see http://academic.research.microsoft.com/Journal/15655/physics).
Further, publishers willing to try even a very short embargo period (six months or less) have shown that it need not affect subscriptions. For example, the American Society for Cell Biology’s Molecular Biology of the Cell has deposited content with PubMed Central since the repository’s inception in 2001 with an embargo of only two months, to no financial ill effect (see http://ascb.org/index.cfm?navid=10&id=1968&tcode=nws3). For the past decade, Rockefeller University Press has likewise released content from its journals no later than six months after publication and has nevertheless enjoyed robust subscription numbers; in fact, the executive director of the press, Michael Rossner, opined in 2010 that “[c]harging for information in only the first six months after publication is a clear-cut way to know how valuable it is” and suggested that those who needed to lock up their content longer than that are perhaps selling products no one wants or needs (http://jcb.rupress.org/content/early/2010/04/07/jcb.201003068; see also http://www.rupress.org/site/misc/philosophy.xhtml).
Embargoes of any length apply, however, only to those who wish to retain a business model that relies on subscriptions for revenue. Increasingly, immediate open access is seen as a valid business model. In addition to an explosion of open-access publishers (foremost among them the Public Library of Science [PLoS]) that have proven profitable, the same model has been adopted over the past two years by a number of commercial publishers as part of their business portfolios, among them SpringerOpen (June 2010), SAGE Open (April 2011), Wiley Open Access (2011), and several Elsevier journals launched in 2010-2011, such as International Journal of Surgery Case Reports (http://www.casereports.com/) and Results in Physics (http://www.journals.elsevier.com/results-in-physics/#description).
Business model innovation that accommodates public access is happening already; there is no reason to back away from federal public access policies merely because some publishers are unwilling to explore new models and revenue streams and cling instead to the hope of an unchanged status quo. Innovative publishers will be successful publishers.