Why share data?
Increase research visibility:
Making your data available to other researchers through widely-searched repositories can increase your prominence and demonstrate continued use of the data and relevance of your research.
Enabling other researchers to use your data reinforces open scientific inquiry and can lead to new and unanticipated discoveries. It also prevents duplication of effort, because others can reuse your data rather than trying to gather the data themselves.
Satisfy funder & journal requirements:
Many funding agencies, and some journals, now require that researchers deposit the data they collect as part of a research project in an archive.
Data posted online can be timestamped to establish the date they were produced, blocking “scooping” tactics.
Data sharing can accelerate discovery rates, as researchers into Alzheimer’s disease have found.
Data sharing resources
Columbia data security policies
Data sharing considerations:
One requirement affecting a large number of Columbia researchers is the NSF data sharing policy. As part of this policy, the NSF requires that all grant applications include a two-page data management plan. For details, see our guidance on the NSF data management policy requirements.
Information on other funders with data sharing requirements, such as the National Institutes of Health (for projects with $500,000 or more of direct costs in any one year), the Howard Hughes Medical Institute, and the Wellcome Trust, can be found with our Data Management Plan templates.
Some journals, such as Nature, Science, PNAS, PLoS ONE, and those associated with the ESA, require that data underlying articles they publish be made available. Check individual journal policies to be sure.
Privacy & confidentiality
Some data may require special protections because they are highly sensitive or highly regulated. These Sensitive Data may require encryption, anonymization, or other security measures. Release of Sensitive Data can damage:
- Your participants: identity theft, financial loss, privacy violations, etc.
- You: loss of reputation, loss of position, ethics violations
- Your institution: financial liability
Here is an overview of some resources and guidance collated by JISC on de-identification and anonymization: link
Here are some tools that may assist you in de-identifying or anonymizing data: comparison table
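As a rough illustration of what de-identification can involve, the sketch below drops direct identifiers, replaces a participant ID with a keyed hash, and generalizes exact ages into bands. The field names (participant_id, name, email, age), the key handling, and the band width are all assumptions made up for this example. It is not a complete anonymization procedure, and it does not by itself satisfy any regulatory or IRB requirement.

```python
import hashlib
import hmac

# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize_id(participant_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex characters."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with direct identifiers dropped,
    the participant ID pseudonymized, and the age generalized."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonymize_id(str(record["participant_id"]), secret_key)
    cleaned["age"] = age_band(int(record["age"]))
    return cleaned


if __name__ == "__main__":
    # Assumption: the key is stored securely and never shared with the data.
    key = b"replace-with-a-secret-kept-outside-the-dataset"
    row = {
        "participant_id": "P-0042",
        "name": "Jane Doe",
        "email": "jane@example.org",
        "age": 34,
        "score": 7.5,
    }
    print(deidentify(row, key))
```

A keyed hash is used here rather than a plain hash so that someone without the key cannot rebuild the mapping by hashing guessed IDs. Indirect identifiers, such as rare combinations of attributes, can still allow re-identification and need separate review.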
Intellectual property & licensing
Copyright & data
Copyright and ownership questions around data can be complex. In the U.S., facts cannot be copyrighted, but an original compilation of facts, such as in a database, may be copyrightable. Expressive data (data involving a certain amount of judgement or creativity), or an expressive representation of data such as a graph, is copyrightable. See “Facts and Non-Creative Works” on this page of the Copyright Advisory Office website.
What rights do I have to data resulting from a research project?
- PI: Sponsors grant research funds to the Trustees of Columbia University, and the PI acts as steward of the research data and makes decisions on its use and distribution within the parameters of sponsor and Columbia guidelines. See the Intellectual Property section under “Obligations and Responsibilities of Officers of Instruction and Research” in the Faculty Handbook for more information on Columbia policies. Make sure you are aware of your obligations under these policies and those of your research sponsor.
- Postdoc: This depends on the specific circumstances of your research. Talk to your work supervisor or contact the Office of Postdoctoral Affairs.
- If you are leaving Columbia, do not assume you can take data with you. Whether or not you can do so depends on many factors, including the status of the project and the policies of the research sponsor and Columbia. Contact Sponsored Projects Administration for more information.
Licensing is an important part of data publication, because it lets people know what can be done with the published data and what restrictions, if any, apply to the data’s reuse. Some researchers choose to apply a license to their data, often choosing one from the Creative Commons family (CC0 is frequently recommended).
- How to License Research Data
- Federal Open Licensing Playbook
- Publishers Guide to Open Data Licensing-ODI (some UK specific details, but a nice overview)
- Guides to Licensing
- Creative Commons 4.0 and Open Data
Data Use Agreements
Data use agreements allow you to share research data in a way that provides more protection for participants, which may be appropriate for some types of data, such as clinical research data. These agreements may also reduce the risk of sharing data by ensuring the research value of secondary analyses or requiring certain agreements from those who wish to access the data. Templates and examples are linked from the following organizations:
- Clinical Study Data Request (CSDR)
- Yale Open Data Access (YODA) Project
- Multi-regional Clinical Trials Center at Harvard University
Cite data to give the data producer proper credit and to enable readers of your work to access the data, whether for their own use or to replicate your results. Whether citing data or expecting others to cite your data, make sure that the following elements are provided:
- Date: year of publication rather than collection or coverage
- Version or edition
- Publisher, data center or repository
- Identifier and/or permanent URL
Adapted from: MIT
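The citation elements listed above can be assembled into a single string with a small helper. The sketch below is illustrative only: the DataCitation class, its field names, and the output layout are assumptions rather than a prescribed citation style, and the author and title fields are added here for completeness even though they are not in the list above. Check the style required by your journal or repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the elements of a data citation."""
    author: str                    # assumption: not in the list above
    title: str                     # assumption: not in the list above
    year: int                      # year of publication, not collection or coverage
    publisher: str                 # publisher, data center, or repository
    identifier: str                # identifier and/or permanent URL
    version: Optional[str] = None  # version or edition, if any

    def format(self) -> str:
        """Join the elements into one citation string."""
        parts = [f"{self.author} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.publisher}.")
        parts.append(self.identifier)
        return " ".join(parts)


if __name__ == "__main__":
    # All values below are made-up placeholders.
    citation = DataCitation(
        author="Doe, J.",
        title="Example Survey Data",
        year=2020,
        publisher="Example Repository",
        identifier="https://doi.org/10.0000/example",
        version="2",
    )
    print(citation.format())
```

Keeping the elements as separate fields, rather than storing one pre-formatted string, makes it easier to notice a missing element and to re-render the citation in a different style later.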