Archive and share data

Where to share data

There are many ways that researchers can share their data. These include:

Criteria in selecting a data repository

Source: University of Iowa

  1. FAIR Principles: FAIR means that data publishing platforms should enable data to be Findable, Accessible, Interoperable, and Re-usable. The FORCE11 FAIR Principles (simplified here) are:
    1. To be Findable, any Data Object should be uniquely and persistently identifiable (have an identifier, such as a DOI)
    2. Data is Accessible in that it can always be obtained by machines and humans, upon authorization, through a well-defined protocol
    3. Data Objects are Interoperable (i.e. interpretable by a computer, so that they can be automatically combined with other data) if metadata and data use community agreed formats, language, vocabularies, and standards.
    4. Data Objects are Re-Usable if the above are met, if the data can be automatically linked or integrated with other data sources, with proper citation of the source, and have a clear machine-readable licence.
  2. Cost: Is there a cost to depositing data? Is it ongoing? Are these costs budgeted for?
  3. Discoverability: Are there adequate metadata fields to describe your data? Is the repository indexed by Google?
  4. Persistent identifiers: Does the repository register your data to create a persistent identifier (e.g. a DOI)? These are necessary for citing your data.
  5. Policies and licenses: Are data use agreements and/or licensing (Creative Commons) clearly presented, to allow depositors to state explicitly up front what uses they would be willing to allow?
  6. Scholarly impact: Does it track data citation or download?
  7. Certification: Repositories can obtain certification (e.g. CoreTrustSeal), which indicates how well they preserve digital content. Although certification is good to have, note that very few repositories have achieved it.
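
As an illustration of the persistent identifier criterion above, the following minimal Python sketch turns a DOI written in several common ways into a single citable link. The function names (normalize_doi, doi_url) and the sample DOI are hypothetical, and the pattern is only a rough check of the usual DOI shape, not the full DOI specification. It does not confirm that a DOI is actually registered.

```python
import re

# Heuristic for the usual DOI shape: "10." + registrant code + "/" + suffix.
# Assumption: this is a rough check only, not the full DOI specification.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that people commonly put in front of a DOI.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(value):
    """Return the bare DOI, or None if the text does not look like a DOI."""
    doi = value.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi if DOI_PATTERN.match(doi) else None


def doi_url(value):
    """Return a resolvable https://doi.org/ link, or None for non-DOI text."""
    doi = normalize_doi(value)
    return f"https://doi.org/{doi}" if doi else None


if __name__ == "__main__":
    # "10.1234/example.5678" is a made-up DOI used only for illustration.
    print(doi_url("doi:10.1234/example.5678"))
```

Using one consistent link form in a data citation makes it easier for readers and indexing services to reach the dataset.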

Discipline-specific data repositories

Discipline-specific or domain repositories accept datasets related to either a specific discipline (e.g. genomics) or a broad subject area (e.g. social sciences). Some repositories allow for self-archival and will provide limited or no curation service; others, like ICPSR, will provide in-depth curation services to subscribing institutions (Concordia is an ICPSR member) provided that the data fits within their collection development policy.


General-purpose repositories

If a discipline-specific repository is not available, general-purpose repositories are the next best option. They typically accept a wide range of data types, and are suitable for cross-disciplinary data. Below are some examples:

Canadian general-purpose repositories


Concordia University Dataverse (from Borealis)

Why should you use this repository?

  • Support from Concordia Library (see "What we offer" below)
  • Available for free to all Concordia researchers
  • Useful for small to medium sized datasets:
    • files ≤ 5GB
    • datasets ≤ 10GB
  • Data hosted on Canadian servers
  • Embargo options available
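
Before depositing, it can help to check a dataset folder against the size limits listed above. The following is a minimal Python sketch under stated assumptions: the function names (folder_sizes, check_sizes) are hypothetical, and 1GB is taken to mean 1,000,000,000 bytes, which may differ from how the repository measures size. The repository's own checks remain authoritative.

```python
from pathlib import Path

# Assumption: 1GB = 10**9 bytes; the repository may count sizes differently.
GB = 1000 ** 3
MAX_FILE_BYTES = 5 * GB      # per-file limit listed above
MAX_DATASET_BYTES = 10 * GB  # per-dataset limit listed above


def folder_sizes(folder):
    """Map each file under `folder` (relative path) to its size in bytes."""
    root = Path(folder)
    return {
        str(path.relative_to(root)): path.stat().st_size
        for path in root.rglob("*")
        if path.is_file()
    }


def check_sizes(sizes, max_file=MAX_FILE_BYTES, max_dataset=MAX_DATASET_BYTES):
    """Return (oversized file names, total bytes, whether the total fits)."""
    oversized = sorted(name for name, size in sizes.items() if size > max_file)
    total = sum(sizes.values())
    return oversized, total, total <= max_dataset


if __name__ == "__main__":
    # Checks the current working directory as an example.
    too_big, total, fits = check_sizes(folder_sizes("."))
    print(f"Total size: {total / GB:.2f} GB; within dataset limit: {fits}")
    for name in too_big:
        print(f"Over the per-file limit: {name}")
```

If either limit is exceeded, a repository intended for larger datasets may be a better fit.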

Ready to deposit? Consult the following:

Need help? Contact lib-research.data@concordia.ca

What we offer:
  • Provide guidance on the deposit process and the preparation of the dataset for deposit
  • Create subDataverses
  • Assign permissions to different members of a team within a subDataverse
  • Review deposited datasets
    • Review all metadata for completeness
    • Review adequacy of file naming convention
    • Review preservation friendliness of file formats
    • Check to see if files can be opened
  • Publish deposited datasets
  • Provide workshops on preparing and depositing data in Dataverse
What we do not offer:
  • Create metadata (readme files, data dictionaries, code books, catalogue metadata)
  • Review data for scientific quality
  • Data cleaning
  • File conversions
  • Data anonymization or de-identification
  • Data deposit
Federated Research Data Repository (FRDR):
  • Free to all Canadian researchers
  • Useful for very large datasets (files greater than 5GB)
  • Long-term preservation of data is assured upon ingestion with the help of the Archivematica preservation software
  • Data hosted on Canadian servers

 

Other commonly used general-purpose repositories

Dryad: Non-profit repository allowing a total storage space of 50GB for US$120.
Figshare: Commercial repository allowing a total storage space of 20GB for free.
Open ICPSR: Accepts social and behavioural science research data. Different levels of curation services (from none to complete) are offered at varying prices.
OSF: The Open Science Framework (OSF) is a free and open source project management repository that supports researchers across their entire project lifecycle.
Zenodo: A multidisciplinary platform hosted by CERN. Accepts all research outputs from all fields of science.

See also: Generalist repository comparison chart


Institutional or recommended repository

Institutional repositories

Depositing in discipline-specific or general-purpose repositories is encouraged, as they are generally better suited for data curation and dissemination. However, if there is no suitable discipline-specific repository for your dataset, and you do not wish to deposit in the Concordia University Dataverse repository, consider using Spectrum, Concordia University's institutional repository.

Recommended repositories

Some journals require that researchers make the data associated with their papers publicly available to facilitate verification and replication of results. These publishers may recommend a data repository or, in some cases, require that authors deposit their data in a specific repository. Note that if there is a cost to depositing data, it may be covered either by the submitter or by the publisher.

Below are examples of publisher-recommended data repositories:


Data papers

Data papers describe datasets, and do not typically include any interpretation or discussion. Data papers are published either in a journal’s “Data Papers” section, or in a journal that exclusively publishes data papers (for example, see Nature’s Scientific Data).

According to Oregon State University:

"The purpose of a data journal is to provide quick access to high-quality datasets that are of broad interest to the scientific community. They are intended to facilitate reuse of the dataset, which increases its original value and impact, and speeds the pace of research by avoiding unintentional duplication of effort."
 