Data Repositories

The following is a list of some of the data repositories available to Columbia researchers. A reputable data repository provides long-term storage of and access to data, validates data integrity (for example, with checksums), and assigns a permanent resource locator (e.g., DOI, PURL) so that its data are persistent, unique, and citable.
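As an illustration of the integrity validation mentioned above, the following is a minimal sketch of how a researcher might check a downloaded file against a checksum published by a repository. The function names `file_checksum` and `verify_checksum` are hypothetical, and SHA-256 is an assumption; use whichever algorithm the repository actually publishes.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with the value published alongside the data."""
    # Published checksums are sometimes upper-case or padded with whitespace.
    return file_checksum(path, algorithm) == expected_hex.strip().lower()
```

A mismatch indicates that the local copy differs from the deposited file, for example because of an incomplete download.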

Data Repositories at Columbia

Academic Commons  Columbia’s research repository accepts data sets on all subjects from Columbia faculty, students, and staff.

Deep-Sea Sample Repository  Based at Lamont-Doherty Earth Observatory (LDEO), this repository archives ocean floor samples and digital data about them.

Integrated Earth Data Applications (IEDA)  Based at LDEO, IEDA is a community-based data facility that provides data services for observational solid earth data from the ocean, earth, and polar sciences.

IRI/LDEO Climate Data Library  The IRI Data Library is a freely accessible online data repository and analysis tool that allows a user to view, analyze, and download hundreds of terabytes of climate-related data through a standard web browser.

Repository Catalogs

re3data  Registry of Research Data Repositories: a global registry of research data repositories

DataBib  A tool for helping people identify and locate online repositories of research data

ODiSEA  International Registry on Research Data: highlighting the combined efforts of six Spanish universities to make open access data available

BioSharing  Registers “well-constituted efforts developing standards for describing and sharing biosciences experiments, ensuring these resources are informative and discoverable”

Nature  A list of recommended repositories for items published in Nature


Subject-specific Data Repositories

Data repository catalog: re3data



National Space Science Data Center  The permanent archive for NASA space science mission data and space physics mission data.

SIMBAD  Basic data, cross-identifications, bibliography and measurements for astronomical objects outside the solar system.

Biological Sciences


Dryad  This repository is for research data underlying published, peer-reviewed journal articles in basic and applied biosciences. It is run by the National Evolutionary Synthesis Center and the University of North Carolina Metadata Research Center in partnership with a consortium of journal publishers and societies.

RCSB Protein Data Bank  A worldwide repository of information about the 3D structures of large biological molecules, including proteins and nucleic acids.



PubChem  National Center for Biotechnology Information (NCBI)’s repository of data on the biological activities of small molecules.

Climate and Earth Sciences


National Center for Atmospheric Research’s Research Data Archive  Atmospheric and geosciences research data.

National Climatic Data Center  Repository run by the National Oceanic and Atmospheric Administration (NOAA).

National Geophysical Data Center  NOAA’s repository for geophysical data describing the solid earth, marine, and solar-terrestrial environments, as well as earth observations from space.

National Snow and Ice Data Center  This repository, part of the Cooperative Institute for Research in Environmental Sciences at the University of Colorado at Boulder, manages polar and cryospheric data.

UNAVCO  GPS data repository run by a nonprofit consortium.

Marine Data


National Oceanographic Data Center  NOAA’s marine data repository.

Social Sciences


Inter-university Consortium for Political and Social Research (ICPSR)  Repository for social science data run by an international consortium of academic and research institutions.

Open Context  Data repository for archeology and related disciplines run by a nonprofit consortium.
