Documenting your data

What do I document and describe?

It is important to begin documenting your data at the start of your research and to continue doing so throughout the project. If you create the documentation only at the end of the project, important details may be lost or forgotten.

There are three types of documentation for a research project: study-level metadata, variable-level metadata, and catalogue metadata.

Study-level metadata

Study-level metadata provides context for understanding why the data were collected and how they were used. It could include:

  • Rationale and context for data collection
  • Data collection methods (protocols, sampling design, instruments or software used, etc.)
  • Structure and organization of data files
  • Secondary data sources used
  • Data validation and quality assurance (checking, proofing, cleaning, calibration, etc.)
  • Transformations of data from the raw data through analysis
  • Information on confidentiality, access and use conditions

Variable-level metadata

Variable-level metadata is more granular: it explains the individual variables and values in the dataset in detail. It could include:

  • Variable names, descriptions, units
  • Data type (integer, Boolean, character, etc.)
  • Explanation of codes and classification schemes used
  • Data processing methods, software used, scripts, code
  • Data formats (.csv, .mat, .tiff, .txt, etc.) and software (including version) used

This information can be embedded in a data file. For example, variable, value and code labels can be added in an SPSS file. Interview transcripts can embed metadata in a header.
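When the file format cannot hold labels itself (a plain .csv, for instance), the same variable-level information can be kept in a separate data dictionary. The sketch below is one possible way to do that in Python. The variable names, the codes and the -99 missing-value convention are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical variables for illustration only; not taken from a real study.
DATA_DICTIONARY = [
    {"name": "participant_id", "description": "Anonymised participant identifier",
     "type": "integer", "unit": "", "codes": ""},
    {"name": "height_cm", "description": "Standing height",
     "type": "float", "unit": "cm", "codes": "-99 = missing"},
    {"name": "smoker", "description": "Current smoking status",
     "type": "integer", "unit": "", "codes": "0 = no; 1 = yes; -99 = missing"},
]

FIELDS = ["name", "description", "type", "unit", "codes"]


def dictionary_to_csv(entries):
    """Serialise the data dictionary to CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(data_header, entries):
    """Return the columns of a data file that have no dictionary entry."""
    documented = {entry["name"] for entry in entries}
    return [column for column in data_header if column not in documented]


codebook_csv = dictionary_to_csv(DATA_DICTIONARY)

# Check a (hypothetical) data file header against the dictionary.
missing = undocumented_columns(
    ["participant_id", "height_cm", "weight_kg"], DATA_DICTIONARY
)
print(codebook_csv)
print("Undocumented columns:", missing)
```

Running a check like undocumented_columns whenever a new column is added is one way to keep the documentation in step with the data throughout the project, rather than reconstructing it at the end.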

Find out more (documentation for quantitative, qualitative and secondary data)

Catalogue metadata

When sharing data in a repository, the information added during data upload typically describes the content, context and provenance of the dataset(s) in a standardized and structured manner. This helps users find data and judge whether it is suitable for their research, and it provides a bibliographic record for citing the data.

The metadata in these data records often follow international standards or schemes, consisting of mandatory and optional elements. Example schemes include Dublin Core and the Data Documentation Initiative (DDI).

Example catalogue metadata could include:

  1. Name of the project
  2. Dataset title
  3. Project description
  4. Dataset abstract
  5. Principal investigator and collaborators
  6. Contact information
  7. Dataset handle (DOI or URL)
  8. Dataset citation
  9. Data publication date
  10. Geographic description
  11. Time period of data collection
  12. Subject/keywords
  13. Project sponsor
  14. Dataset usage rights
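
As a rough illustration, several of the items above can be expressed with the fifteen Dublin Core element names. The Python sketch below builds one such record and checks it before upload. The mapping from list items to elements, the MANDATORY list and all values are assumptions made for this example; each repository defines its own required fields.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical repository policy: elements that must be filled in.
MANDATORY = ["title", "creator", "description", "date", "rights"]

# Placeholder values only; this does not describe a real dataset.
record = {
    "title": "Example dataset title",
    "creator": ["Principal Investigator Name"],
    "contributor": ["Collaborator Name"],
    "description": "Short abstract describing the dataset.",
    "subject": ["keyword one", "keyword two"],
    "date": "2024-01-31",  # data publication date
    "coverage": ["Geographic area studied", "2022-01/2023-06"],
    "rights": "Licence or usage conditions go here",
}


def check_record(rec):
    """Return (unknown element names, mandatory elements left empty)."""
    unknown = sorted(key for key in rec if key not in DUBLIN_CORE_ELEMENTS)
    missing = [key for key in MANDATORY if not rec.get(key)]
    return unknown, missing


unknown, missing = check_record(record)
record_json = json.dumps(record, indent=2, sort_keys=True)
print(record_json)
print("Unknown elements:", unknown)
print("Missing mandatory elements:", missing)
```

Items with no obvious Dublin Core element, such as the project sponsor, would need either a richer scheme such as DDI or a repository-specific field.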

 