Sharing

Why share data?

Increase research visibility:

Making your data available to other researchers through widely searched repositories can increase your prominence and demonstrate the continued use of your data and the relevance of your research.

Facilitate discovery:

Enabling other researchers to use your data reinforces open scientific inquiry and can lead to new and unanticipated discoveries. It also prevents duplication of effort, since others can reuse your data rather than gathering the same data themselves.

Satisfy funder & journal requirements:

Many funding agencies, and some journals, now require researchers to deposit the data they collect as part of a research project in an archive.

Establish priority:

Data posted online can be timestamped to establish the date they were produced, blocking “scooping” tactics.

Speed research:

Data sharing can accelerate discovery rates, as researchers into Alzheimer’s disease have found.

Adapted from: MIT & UW


Data sharing considerations:

Requirements

Funders

The requirement affecting a large number of Columbia researchers is the NSF data sharing policy. As part of this policy, the NSF requires that all grant applications include a two-page data management plan. For more information, see our information on the NSF data management policy requirements.

Information on other funders with data sharing requirements, such as the National Institutes of Health (for projects with $500,000 or more of direct costs in any one year), the Howard Hughes Medical Institute, and the Wellcome Trust, can be found with our Data Management Plan templates.

Journals

Some journals, such as Nature, Science, PNAS, PLoS ONE, and those associated with the ESA, require that data underlying articles they publish be made available. Check individual journal policies to be sure.

Privacy & confidentiality

Some data may require special protections because they are highly sensitive or highly regulated. These Sensitive Data may require encryption, anonymization, or other security measures. Release of Sensitive Data can damage:

  • Your participants: identity theft, financial loss, privacy violations, etc.
  • You: loss of reputation, loss of position, ethics violations
  • Your institution: financial liability

Here is an overview of some resources and guidance collated by JISC on de-identification and anonymization: link

Here are some tools that may assist you in de-identifying or anonymizing data: comparison table
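As a rough illustration of what a basic de-identification step can look like, the sketch below drops direct identifiers from a record and replaces the participant ID with a keyed hash. The column names (name, email, phone, participant_id) and the function name are hypothetical, chosen only for this example. Note that this is pseudonymization, not full anonymization: remaining fields such as age or location can still identify a person in combination, so treat it as a starting point rather than a sufficient protection.

```python
import hashlib
import hmac

# Hypothetical column names that directly identify a participant.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(record: dict, secret_key: bytes,
                 id_field: str = "participant_id") -> dict:
    """Return a copy of `record` without direct identifiers and with the
    participant ID replaced by a keyed hash (HMAC-SHA256)."""
    # Drop the columns listed as direct identifiers.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # A keyed hash gives the same token for the same ID, so records can
    # still be linked, but the token cannot be recomputed without the key.
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"),
                      hashlib.sha256).hexdigest()
    cleaned[id_field] = digest[:16]
    return cleaned


if __name__ == "__main__":
    row = {"participant_id": 17, "name": "Jane Doe",
           "email": "jane@example.org", "age": 34}
    print(pseudonymize(row, secret_key=b"keep-this-key-private"))
```

The secret key must be stored separately from the shared data; anyone who holds it can re-link tokens to the original IDs.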

Intellectual property & licensing

Copyright & data

Copyright and ownership questions around data can be complex. In the U.S., facts cannot be copyrighted, but an original compilation of facts, such as a database, may be copyrightable. Expressive data (data involving a certain amount of judgment or creativity) and expressive representations of data, such as graphs, are copyrightable. See “Facts and Non-Creative Works” on this page of the Copyright Advisory Office website.

What rights do I have to data resulting from a research project?
  • PI: Sponsors grant research funds to the Trustees of Columbia University, and the PI acts as steward of the research data and makes decisions on its use and distribution within the parameters of sponsor and Columbia guidelines. See the Intellectual Property section under “Obligations and Responsibilities of Officers of Instruction and Research” in the Faculty Handbook for more information on Columbia policies. Make sure you are aware of your obligations under these policies and those of your research sponsor.
  • Postdoc: This depends on the specific circumstances of your research. Talk to your work supervisor or contact the Office of Postdoctoral Affairs.
  • If you are leaving Columbia, do not assume you can take data with you. Whether or not you can do so depends on many factors, including the status of the project and the policies of the research sponsor and Columbia. Contact Sponsored Projects Administration for more information.

Licensing

Licensing is an important part of data publication because it tells people what they can do with the published data and what restrictions, if any, apply to its reuse. Some researchers choose to apply a license to their data, often one from the Creative Commons family (CC0 is frequently recommended).

Data Use Agreements

Data use agreements allow you to share research data in a way that provides more protection for participants, which may be appropriate for some types of data, such as clinical research data. These agreements may also reduce the risk of sharing data by ensuring the research value of secondary analyses or requiring certain agreements from those who wish to access the data. Please find templates and examples linked from the listed organizations:

Data citation

Cite data to give the data producer proper credit and to enable readers of your work to access the data, whether for their own use or to replicate your results. Whether you are citing data or expecting others to cite yours, make sure that the following elements are provided:

  • Creators
  • Date: year of publication rather than collection or coverage
  • Title
  • Version or edition
  • Publisher, data center or repository
  • Identifier and/or permanent URL
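The elements above can be assembled into a citation string programmatically. The following is a minimal sketch, not a required style: the class name, the field names, and the ordering (creators, year, title, version, publisher, identifier) are assumptions made for illustration, and the dataset in the usage example is invented. Check the style your repository or journal expects before adopting it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Holds the citation elements listed above (hypothetical structure)."""
    creators: List[str]
    year: int                      # year of publication, not of collection
    title: str
    publisher: str                 # publisher, data center, or repository
    identifier: str                # DOI or other permanent URL
    version: Optional[str] = None  # version or edition, if any

    def format(self) -> str:
        # Assumed ordering: Creators (Year). Title. Version. Publisher. Identifier
        parts = [f"{'; '.join(self.creators)} ({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        return ". ".join(parts) + ". " + self.identifier


if __name__ == "__main__":
    # Invented example values, for illustration only.
    citation = DataCitation(
        creators=["Doe, J.", "Roe, R."],
        year=2020,
        title="Example Survey Data",
        publisher="Example Repository",
        identifier="https://doi.org/10.0000/example",
        version="2",
    )
    print(citation.format())
```

Keeping the elements as separate fields, rather than as one free-text string, makes it easy to spot a missing element (for example, no identifier) before the citation is published.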

Also see: Citing data from Academic Commons, DataPub-CDL: Basic Data Citation, and the Data Citation Principles.

Adapted from: MIT