Finalizing Data

Data Appraisal & Selection for Storage

Not all data should be archived, and not all archived data should be kept for the same length of time or in the same way. To decide what should be kept, where, and for how long, appraise your data against the following principles¹:

  • Relevance to research mission
  • Historical or scientific value
  • Uniqueness
  • Reliability / Integrity / Usability of data
  • Replicability, or lack thereof
  • Cost of management and preservation
  • Adequate available documentation
  • Satisfaction of requirements
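Recording the appraisal for each dataset can make it easier to repeat and to explain later. The sketch below is a hypothetical checklist in Python, not the Digital Curation Centre's method. The `Appraisal` class and its method names are illustrative. For each principle above, it stores whether that principle supports keeping the data, along with a note. It then lists the principles that still need attention and leaves the actual decision to the person doing the appraisal.

```python
from dataclasses import dataclass, field

# The appraisal principles listed above, in the same order.
CRITERIA = (
    "Relevance to research mission",
    "Historical or scientific value",
    "Uniqueness",
    "Reliability / Integrity / Usability of data",
    "Replicability, or lack thereof",
    "Cost of management and preservation",
    "Adequate available documentation",
    "Satisfaction of requirements",
)


@dataclass
class Appraisal:
    """Hypothetical record of one dataset's appraisal."""
    dataset: str
    # criterion -> (supports_retention, free-text note)
    answers: dict = field(default_factory=dict)

    def record(self, criterion, supports_retention, note=""):
        """Store the judgement for one criterion; reject unknown names."""
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion!r}")
        self.answers[criterion] = (supports_retention, note)

    def open_questions(self):
        """Criteria not yet assessed, or assessed as arguing against retention."""
        return [c for c in CRITERIA
                if c not in self.answers or not self.answers[c][0]]
```

For example, after `a = Appraisal("field-survey")` and `a.record("Uniqueness", True, "no other copy is known")`, `a.open_questions()` still lists the other seven principles.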

Data Publication

The term data publishing covers a broad range of conceptions of what the word "publication" implies. At a minimum, any data publication should permit data citation. Data publications include the following forms:

Also consider issues of human participant or respondent confidentiality and disclosure risks. ICPSR’s Guide to Data Preparation provides a comprehensive overview of these considerations.

Avoid the seven deadly sins of data publication:

  • Data released only as PDFs
  • Data available only through web interfaces
  • Malformed tables
  • No metadata
  • Inconsistency within datasets
  • Inconsistency across datasets
  • Bad licensing
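An automated structural check before publication can partly catch two of these sins: malformed tables and inconsistency within a dataset. The sketch below assumes the data is CSV text. `table_problems` is a hypothetical helper built on Python's standard `csv` module. It checks structure only: blank or duplicate column names, and records whose field count differs from the header's. It does not check whether the values are correct or consistent across datasets.

```python
import csv
import io


def table_problems(text):
    """Return a list of basic structural problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    # Every column needs a unique, non-blank name so reusers can refer to it.
    if any(not name.strip() for name in header):
        problems.append("header has a blank column name")
    if len(set(header)) != len(header):
        problems.append("header has duplicate column names")
    # Record numbers count the header as record 1; they match physical line
    # numbers only when no field contains an embedded newline.
    for record_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"record {record_no}: expected {len(header)} fields, found {len(row)}"
            )
    return problems
```

For instance, `table_problems("id,value\n1,2\n3\n")` reports that record 3 has one field where two were expected. Returning a list rather than stopping at the first error lets every problem be fixed in one pass.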

Data Licensing for Publication

Licensing is an important part of data publication because it tells people what they may do with the published data and what restrictions, if any, apply to its reuse. In the United States, copyright does not protect data, facts, or ideas. It may protect an original, creative selection, coordination, or arrangement of them, for example in a database, but that protection does not extend to the facts themselves. In Feist Publications, Inc. v. Rural Telephone Service Co. (1991), the U.S. Supreme Court held that the alphabetical listings of a white-pages telephone directory lacked the originality required for copyright protection. This is where licensing comes in: an explicit license tells reusers what terms apply. (Questions about copyright? See the Copyright Advisory Office or contact …)

Data Repositories

There are a variety of places where your data may be archived. A reputable data repository provides long-term storage of and access to data, validation of data integrity (e.g., checksums), and a persistent identifier (e.g., a DOI or PURL) that makes each dataset unique, persistently locatable, and citable.
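Checksum validation works by recording a digest of each file when it is deposited and recomputing that digest later. Any change to the file's bytes produces a different digest. The sketch below makes several assumptions: it uses SHA-256, and it writes a plain-text manifest in the `digest  filename` layout that GNU `sha256sum` prints. The function names are hypothetical and do not reflect any particular repository's interface.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths, manifest_path):
    """Write one 'digest  filename' line per file (two spaces, as sha256sum does)."""
    lines = [f"{sha256_of(p)}  {Path(p).name}" for p in paths]
    Path(manifest_path).write_text("\n".join(lines) + "\n")


def verify_manifest(manifest_path, data_dir):
    """Recompute each listed file's digest; return names that no longer match."""
    mismatches = []
    for line in Path(manifest_path).read_text().splitlines():
        expected, name = line.split("  ", 1)
        if sha256_of(Path(data_dir) / name) != expected:
            mismatches.append(name)
    return mismatches
```

An empty list from `verify_manifest` means every listed file is unchanged since the manifest was written. Because the manifest stores bare file names, running `sha256sum -c` on it from the data directory should also work as an independent cross-check.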


Depositing Data in Academic Commons

You can deposit your research results in Academic Commons, Columbia’s institutional repository, a service of the Libraries’ Center for Digital Research and Scholarship (CDRS).

Benefits of Depositing in Academic Commons

When you deposit data (and other materials) in Academic Commons, you receive:

  • a permanent URL
  • secure replicated storage (multiple copies of the data, including onsite and offsite storage)
  • accurate metadata
  • a globally accessible repository
  • the option for contextual linking between data and published research results



1 Whyte, A. & Wilson, A. (2010). “How to Appraise and Select Research Data for Curation”. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: