Document and describe · Research data management guide

Why document my data?

Documenting and describing your data makes it easier for you and others to reuse the data at a later date. Imagine that you were taking over a project in the middle of a grant, but could not contact the principal researcher. What information would you need to continue the project? Here are some examples:

File handling (naming convention, folder structure)
Processing steps (how to get from point A to B)
Protocols (what decisions were made and why)
Field abbreviations/name glossary (what does ABC3130 stand for?)

This information is called metadata: "data about data", or the who, what, when, where, why, and how of your research.

What do I document and describe?

It is important to begin documenting your data at the start of your research and to continue doing so throughout the project. If you create the documentation only at the end of the project, important details may be lost or forgotten.

There are three types of documentation for a research project: study-level metadata, variable-level metadata, and catalogue metadata.

Study-level metadata

Provides context for understanding why the data were collected and how they were used. It could include:

Rationale and context for data collection
Data collection methods (protocols, sampling design, instruments or software used, etc.)
Structure and organization of data files
Secondary data sources used
Data validation and quality assurance (checking, proofing, cleaning, calibration, etc.)
Transformations of data from the raw data through analysis
Information on confidentiality, access and use conditions

Variable-level metadata

Provides more granular information: it explains the data and the dataset in detail. It could include:

Variable names, descriptions, units
Data type (integer, Boolean, character, etc.)
Explanation of codes and classification schemes used
Data processing methods, software used, scripts, codes
Data formats (.csv, .mat, .tiff, .txt, etc.) and software (including version) used

This information can be embedded in a data file. For example, variable, value and code labels can be added in an SPSS file. Interview transcripts can embed metadata in a header.

Find out more (documentation for quantitative, qualitative, and secondary data)

Catalogue metadata

When sharing data in a repository, the information added during data upload typically describes the content, context and provenance of the dataset(s) in a standardized and structured manner. This helps users find data and judge whether it is suitable for their research, and it provides a bibliographic record for citing data.

The metadata in these data records often use international standards or schemes, consisting of mandatory and optional elements. Example schemes include Dublin Core (example) and the Data Documentation Initiative (DDI) (example).

Example catalogue metadata could include:

Name of the project
Dataset title
Project description
Dataset abstract
Principal investigator and collaborators
Contact information
Dataset handle (DOI or URL)
Dataset citation
Data publication date
Geographic description
Time period of data collection
Subject/keywords
Project sponsor
Dataset usage rights
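
As a rough illustration, some of the catalogue fields above can be expressed as a structured record using Dublin Core element names. The sketch below uses Python's standard xml.etree.ElementTree module. The Dublin Core element names and namespace are real, but the wrapper element `metadata`, the function name `build_dc_record`, and every field value are placeholders invented for this example. A repository will normally generate or validate such a record for you, and its required elements may differ.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return an XML element with one Dublin Core child per (name, value) pair."""
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# All values below are invented placeholders, not a real dataset.
example_fields = [
    ("title", "Example stream temperature dataset"),
    ("creator", "Doe, Jane"),
    ("description", "Hourly water temperature readings (placeholder abstract)."),
    ("subject", "hydrology"),
    ("date", "2024-01-31"),
    ("coverage", "Example River, 2022 to 2023"),
    ("rights", "CC BY 4.0"),
]

record = build_dc_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Repeating a field (for example, several `subject` or `creator` entries) only requires adding more pairs to the list, which is why the sketch takes a list of pairs rather than a dictionary.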

How do I document my data?

Documentation can take many forms. It can be written as free text, such as a readme file, or the metadata can be captured in a structured, machine-readable file encoded in a format such as XML.

Structured, discipline-specific metadata is preferable, but if no standard exists, writing readme-style files is the simplest way of recording metadata.

Readme files

A readme file provides information about a data file. It allows you and others to understand and reuse the data at a later date.

Best practices:

Follow the Cornell guide to writing "readme" files.

Start writing the readme files at the beginning of the research project.
Record the information in a text file (.txt).
Use a template to help guide you, but tailor it to the needs of the project and kind of data that is being documented. Template examples:
- Cornell
- Oregon State
Update the file as the research progresses.
When the research is complete and ready to be shared, deposit the readme file alongside the data in a repository.
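
The practices above can be sketched as a small script that fills in a plain-text readme from a template, so the file is easy to regenerate as the project changes. This is only a minimal sketch: the section headings, the field names, the function `render_readme`, and all example values are assumptions made for illustration and do not reproduce the Cornell or Oregon State templates.

```python
from datetime import date

# Hypothetical template layout; adapt the sections to your project.
README_TEMPLATE = """\
GENERAL INFORMATION
Dataset title: {title}
Principal investigator: {investigator}
Contact: {contact}
Readme last updated: {updated}

FILE OVERVIEW
{file_list}

METHODS
Collection methods: {methods}
Processing steps: {processing}

ACCESS
Usage rights: {rights}
"""


def render_readme(info, data_files):
    """Fill the template from a dict of project details and a dict of file descriptions."""
    file_list = "\n".join(f"- {name}: {desc}" for name, desc in data_files.items())
    return README_TEMPLATE.format(
        file_list=file_list,
        updated=date.today().isoformat(),
        **info,
    )


# Placeholder project details, invented for this example.
project_info = {
    "title": "Example stream temperature dataset",
    "investigator": "Jane Doe",
    "contact": "jane.doe@example.org",
    "methods": "Temperature loggers read hourly at three sites.",
    "processing": "Raw logger exports merged, then out-of-range values flagged.",
    "rights": "CC BY 4.0",
}
project_files = {
    "temps_2023.csv": "cleaned hourly temperatures",
    "sites.csv": "site identifiers and coordinates",
}

readme_text = render_readme(project_info, project_files)
print(readme_text)
```

To save the result next to the data, the text could be written out with `pathlib.Path("README.txt").write_text(readme_text, encoding="utf-8")`.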

Data dictionaries & codebooks

Data dictionaries and codebooks provide variable-level metadata. These two types of documents may provide overlapping information.

Data dictionaries: describe the names, definitions, and attributes of the data elements in a file. Find out more:
- How to make a data dictionary (OSF)
- Describing your data with data dictionaries (Smithsonian Libraries)
- USGS Data Management guide on data dictionaries
Codebooks: used by survey researchers to provide information about the data from a survey instrument. Find out more.
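
A first draft of a data dictionary can be generated from the data file itself and then completed by hand. The sketch below, which uses only Python's standard csv module, lists each variable in a CSV file with a guessed data type and a count of missing values, and leaves the description and units blank for the researcher to fill in. The function names, the column layout of the dictionary, and the sample data are all assumptions for illustration. The type guess is deliberately coarse and should be checked manually.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type (integer, float, or character) from string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for caster, label in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return label
        except ValueError:
            continue
    return "character"


def draft_data_dictionary(csv_text):
    """Return one dictionary entry per column of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        column = [row[name] or "" for row in rows]
        entries.append({
            "variable": name,
            "type": infer_type(column),
            "missing": sum(1 for v in column if v == ""),
            "description": "",  # to be written by the researcher
            "units": "",        # to be written by the researcher
        })
    return entries


# Invented sample data, standing in for a real data file.
sample_csv = "site_id,temp_c,notes\nA1,12.5,clear\nA2,,cloudy\nB1,13,\n"

dictionary = draft_data_dictionary(sample_csv)

# Write the draft dictionary out as CSV text so it can be edited in a spreadsheet.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(dictionary[0].keys()))
writer.writeheader()
writer.writerows(dictionary)
print(out.getvalue())
```

For a file on disk, the same function could be called with the file's contents, for example `draft_data_dictionary(open("data.csv", newline="", encoding="utf-8").read())`, where `data.csv` is a placeholder name.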

Lab notebooks

Lab notebooks (print or online) are also a great way to document your research. They include methodology, results, calculations, etc. They are helpful for publishing, sharing, or reproducing your research.

Information on lab notebook best practices
Information on choosing an electronic lab notebook:
- Harvard
- Cambridge

Metadata standards

Find out if your discipline uses a metadata standard to describe data. Some disciplinary data repositories require a formal standard. These metadata files are often saved in a machine-readable format, such as XML. There are tools that can help with the creation of these metadata files. See the Tools section for more information.

To find an appropriate metadata standard for your discipline, consult the following resources:

Disciplinary metadata guide (Digital Curation Centre)
Open directory of metadata standards (Research Data Alliance)
Metadata standards catalog (Research Data Alliance)

Tools to document my data

Creating standardized metadata can be difficult and time-consuming. There are tools that can help. Some help you select controlled vocabularies to include in your documentation. Others help you complete the metadata schema.

Use this comparison chart, created by Stanford, to select the tool that is right for you.
