Comments offered by CDRS staff at public meetings on the White House OSTP public access directive

Publications

Comments from Center for Digital Research and Scholarship Director Rebecca Kennison at the "Public Access to Federally-Supported Research and Development Data and Publications: Publications" meeting hosted by the National Academies, May 14, 2013.

My name is Rebecca Kennison, and I am the Director of the Center for Digital Research and Scholarship, which is part of the Columbia University Libraries/Information Services. CDRS’ mission is to increase the utility and impact of research produced by faculty and students at Columbia by creating, adapting, implementing, supporting, and sustaining innovative digital tools and publishing platforms for content delivery, discovery, analysis, data curation, and preservation. In our work we support Columbia University, an institution dedicated to advancing knowledge and learning at the highest level and to conveying the products of its efforts to the world. Within this context, we support making the results of federally funded research available to and useful for the public, industry, and the research community, which are the objectives of the White House Memorandum on “Increasing Access to the Results of Federally Funded Scientific Research.”

We appreciate in particular the Memorandum’s call for agency plans to be developed in consultation with all stakeholders, which include universities and their libraries. We believe that the development of consistent federal agency policies to ensure access to federally funded publications will accelerate discovery, improve education, and empower entrepreneurs to translate research into commercial ventures and jobs. To realize this potential, we strongly encourage agencies to be as consistent as possible in their policies and requirements to minimize the cost and complexity of compliance for both principal investigators and research administration. If policies are consistent, then a range of possibilities exists for any given agency to fulfill the requirements of the Memorandum.

Universities such as Columbia have already made significant investments to support the development of institutional repositories that can (and already do) play a role in sharing the new knowledge produced by our researchers. Likewise, multiple repositories, whether maintained by federal agencies, publishers, societies, commercial entities, or some combination of these, could play a similar role. Any repository selected to provide access to federally funded publications would, however, need to be certified as able to fulfill certain criteria agreed upon by all agencies. A suitable repository should be defined as one that meets all requirements for ensuring full public accessibility, productive reuse (including downloading, text mining, machine analysis, and computation), interoperability with other repositories housing federally funded publications, metadata based on open standards, and a commitment to long-term stewardship and preservation. It is our hope that providing accessibility to both machines and humans and ensuring interoperability with long-term archives of publications, however defined, might be a role for collaborative efforts by scholarly and professional societies, universities, and federal agencies acting in concert.

In support of the researchers whom all of us within the scholarly communication system must serve, we urge publishers to adopt, uniformly, publication standards such as the National Library of Medicine’s widely used Journal Publishing DTD and the proposed Open Text-Mining Interface, and we implore publishers to make available for access and use not only the PDF of each publication but also the XML that almost all of them already generate. Optimally, these machine- and human-readable outputs would be provided by publishers to the agencies’ designated repository or repositories without additional charge to authors, to their institutions, or to the public.

I have much more I could say, but I’m sure others will make those points more eloquently.

Thank you.

___

Comments from Center for Digital Research and Scholarship Research Data Manager Amy Nurnberger at the "Public Access to Federally-Supported Research and Development Data and Publications: Publications" meeting hosted by the National Academies, May 14, 2013.

Hi, my name is Amy Nurnberger, and I am the Research Data Manager at Columbia University. Thank you for this opportunity to offer comment on “Increasing Access to the Results of Federally Funded Scientific Research.” Columbia University is dedicated to advancing knowledge and learning at the highest level and to conveying the products of its efforts to the world, and we support the Memorandum’s objectives within this context.

We particularly appreciate the efforts of the agencies to consult with the various parties that will be affected by the resulting policies, and applaud the goal of developing policies consistent in their compliance requirements. Beyond consistency, effective policies for public access to publications should:

  • set clear objectives related to preservation of the content,
  • describe funding provisions for access and preservation for both the short- and long-term archiving of scientific literature,
  • require human- and machine-readable access to research outputs,
  • facilitate working partnerships among existing repositories maintained by publishers, institutions, societies, and other third parties that meet conditions allowing for indexing, public access, reuse, interoperability, and preservation, and that can be certified as “trusted repositories,”
  • maintain flexibility to accommodate new technologies by allowing freedom to choose the most appropriate platform,
  • mandate that federally funded research publications be made available on interoperable platforms that are accessible through federated search and that provide usage tracking and analytics across various repositories,
  • maximize access to the content by setting standards and requirements for deposit of the content, particularly with regard to metadata that should use controlled vocabularies, provide attribution for funding organizations and grant identification, describe resources in a way that enables relationships to be determined semantically, and use controlled identifiers,
  • implement processes specifically designed to achieve the policy’s stated goals that facilitate easy compliance and reduce administrative overhead, rather than adopt systems created for other users and other purposes.

Good policy that maximizes the benefit of public access to scientific publications should be specific about the research and technological standards, while allowing freedom of choice in the specific technological tools used to achieve these standards.

We urge the federal agencies when developing their policies to consider the power of consistency in encouraging adherence and minimizing the cost and complexity of compliance. We believe these goals are in keeping with those previously stated by this administration and certainly with the goals of principal investigators and university administration.

Thank you again for this opportunity.

___

Data

Comments from Center for Digital Research and Scholarship Director Rebecca Kennison at the "Public Access to Federally-Supported Research and Development Data and Publications: Data" meeting hosted by the National Academies, May 16, 2013.

My name is Rebecca Kennison, and I am the Director of the Center for Digital Research and Scholarship, which is part of Columbia University Libraries/Information Services. CDRS serves the digital research and scholarly communication needs of the faculty, students, and staff of Columbia University. Within our portfolio, we manage Columbia’s research repository, and through our Scholarly Communication Program we provide a wealth of information on a number of topics, including the opportunities and challenges of data management, data sharing, and data visualization. Along with our Research Data Manager, Amy Nurnberger, from whom you’ve already heard, we work closely with our colleagues, both in the Libraries and in the Office of Research Administration, to provide valuable training and education about data management and the research data life cycle. It is within this context that we welcome the opportunity to respond to the White House Memorandum on “Increasing Access to the Results of Federally Funded Scientific Research.”

We appreciate in particular that the Memorandum calls for agency plans to be developed in consultation with all stakeholders, which include universities and their libraries, who share common interests with the federal government in promoting broad public access to and the productive reuse of research data.

I would like to present here four recommendations to begin to achieve this goal.

First, agencies should permit principal investigators to request funding to cover their research data management costs as part of the data management plan requirements that already exist. Only by integrating the costs of storage, data curation, and long-term stewardship of data into the granting process will it be possible to gather enough information to allow for a proper consideration of the genuine cost to share and preserve various data types, information that will be crucial for an evaluation of the full benefits of these data to other researchers and to the public. We urge adoption of this requirement by all agencies.

Second, given the variability of agency funding, we believe the wisest policy is to encourage the growth of existing repositories and the development of new ones that will be managed by academic institutions, consortia, or scholarly societies (or a combination of these) in partnership with government agencies, rather than any individual agency trying to go it alone. Every researcher funded by a federal agency should be required to submit their datasets to a suitable repository upon completion of the grant in order to help ensure consistency in compliance. Allowing researchers to deposit data in the repository of their institution or a collaboratively run university-sponsored or other federal agency-approved data repository and providing a persistent link to that data in reports to the federal agency providing funding would, we believe, permit maximum compliance with minimal confusion.

Third, no matter where the data resides, final peer-reviewed scholarly publications should be linked openly and persistently to their source data to allow for reuse and replication of results and, as much as possible, underlying datasets should likewise be linked to the publications that arise from them. Agencies should require the use of persistent, unique identifiers for datasets in order to facilitate discovery and reuse of data, development of new services, and demonstration of the impact of sharing data in ways that align with existing discipline norms.

Fourth, but perhaps most significantly, we encourage – in fact, urge – the involvement of the scholarly and professional societies in the identification and development of domain-specific digital data standards and of the data repositories themselves. As both liaisons among and representatives for their constituencies, societies are equipped to deal with the inevitable idiosyncrasies of the data in their domain and should be vital partners in the development of any agency policy on research data.

Thank you.

___

Comments from Center for Digital Research and Scholarship Research Data Manager Amy Nurnberger at the "Public Access to Federally-Supported Research and Development Data and Publications: Data" meeting hosted by the National Academies, May 16, 2013.

Hi, my name is Amy Nurnberger, and I am the Research Data Manager at Columbia University. Thank you for this opportunity to offer comment on “Increasing Access to the Results of Federally Funded Scientific Research.” Columbia University is dedicated to advancing knowledge and learning at the highest level and to conveying the products of its efforts to the world. We support the Memorandum’s objectives within this context.

We particularly appreciate the efforts of the agencies to consult with the various parties that will be affected by the resulting policies, and applaud the goal of developing policies consistent in their compliance requirements. Beyond consistency, the keys to encouraging preservation of digital data and providing public access to them are:

  • A definition of research data that serves the objective of making the results of federally funded research useful for the public, industry, and the scientific community
  • A clear framework for communicating the usage rights for those data
  • The provision of open data repositories that adhere to common standards for defining, identifying, describing, and storing data, achieved both by facilitating alliances of existing repositories and by creating and funding interoperable repositories managed in partnership with the government
  • A funding system that supports the deposit, maintenance, and preservation of federally funded research data and provides for the unanticipated costs of data stewardship
  • Investment in technological infrastructure that makes data management compliance practicable, painless, and intellectually profitable

To attain these goals, we encourage agencies to work closely with discipline-specific groups such as professional and scholarly societies, information technology specialists, librarians, and research administrators, creating alliances to fulfill some of the roles that may best be taken on at an agency level, such as:

  • Creating data-aggregating portals that provide a unified point of access to disparately archived data
  • Promoting and incentivizing best-practice solutions for data archiving and preservation
  • Reshaping existing grant management workflows to accommodate mechanisms for making research data available in such a way that important stakeholders are minimally burdened by these new requirements, thereby reducing both administrative costs and obstacles to compliance. This reshaping may take the form of automation based on clearly communicated standards that integrates compliance into existing workflows for granting, research, and publication/distribution
  • Providing a centralized index of identification and description standards to facilitate the discovery, reuse, and impact tracking of data. Such a resource could foster adherence to community practice and reduce barriers to interoperability.

Addressing questions of governance and of the adoption or development of standards and conventions among disciplinary communities is another area where we feel discipline-specific groups such as professional and scholarly societies, information technology specialists, librarians, and research administrators can assist agencies. These questions include, at a minimum, issues of:

  • Establishing baseline metadata requirements for interoperability and discovery
  • Requiring that labeling be done in human- and machine-readable formats
  • Ensuring the clear labeling of data so that all stakeholders are aware of the use conditions of a given dataset
  • Encouraging the assignment of use conditions at all steps of the data lifecycle. Raw data may go through many transformations before they find their way into publications and other end-uses, but the ability to trace those data end-to-end is an essential part of the verification process
  • Assuring that data are clearly associated with the publications that cite them and the code used to process them for purposes of validation and reproducibility

At this point in time you, the funding agencies, are presented with a variety of possibilities in terms of your potential roles and actions with respect to provisioning public access to data. We encourage you to continue to approach these opportunities in company with members of your scientific and discipline-specific communities to develop policies that enable consistency, that provide useful definitions of research data, that allow practical funding and compliance practices, and that enable standards facilitating data discovery and reuse. Working together we can develop open paths that achieve the mandates’ intent for making federally funded research data publicly accessible.

Thank you again for this opportunity.