File formats and organization

File formats

Before starting a project, think about file formats, because the formats you choose affect how long the data will remain usable. Follow these best practices to reduce the risk of data loss caused by software or file format obsolescence.
Best practices

File formats when collecting data

  • Open, non-proprietary formats are preferable
  • If data must be in a proprietary format, ensure that it can easily be converted to an open, non-proprietary format
  • Select formats commonly used by your research community

File formats when sharing / preserving data

  • Formats should be open, non-proprietary, and machine-readable
  • If the format typically used by your research community is proprietary, share the data in multiple formats (eg. MonaLisa_v1.psd and MonaLisa_v1.tiff)
  • For proprietary files, use a readme file to indicate the software/hardware needed to open them
  • If compression is necessary, use a lossless format
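As a minimal sketch of the lossless-compression point, assuming Python 3 and its standard gzip module (gzip uses DEFLATE, which is lossless), the function below writes a compressed copy of a file and then checks that decompressing it restores every byte, by comparing SHA-256 checksums. The function names are hypothetical.

```python
import gzip
import hashlib
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 checksum of some bytes."""
    return hashlib.sha256(data).hexdigest()


def compress_losslessly(src: Path) -> Path:
    """Write a gzip copy of src (src.gz) and verify it restores the original exactly."""
    original = src.read_bytes()
    dest = src.with_name(src.name + ".gz")
    dest.write_bytes(gzip.compress(original))
    # Round trip: decompress the new file and compare checksums with the original.
    restored = gzip.decompress(dest.read_bytes())
    if sha256_of(restored) != sha256_of(original):
        raise ValueError(f"Compressed copy of {src} does not match the original")
    return dest
```

For example, compress_losslessly(Path("HearingScreenResults_v05.csv")) would write HearingScreenResults_v05.csv.gz next to the original. The same checksum comparison can be used to test any compression tool you are considering.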


Recommended formats for sharing, reuse and preservation

Sources: UK Data Service, Oregon State University

Type of data | Recommended formats | Acceptable formats
Tabular data with extensive metadata (variable labels, code labels, defined missing values) | .por (SPSS portable format) | .sav, .dta, .mdb, .accdb
Tabular data with minimal metadata (column headings, variable names) | .csv, .tab | .txt, .xls, .xlsx, .mdb, .accdb, .dbf, .ods
Geospatial data (vector & raster data) | .shp, .shx, .dbf, .prj, .sbx, .sbn, .tif, .tfw, .dwg, .gml | .mdb, .mif, .kml, .dxf, .svg
Textual data | .rtf, .txt, .xml | .html, .doc, .docx
Image data | .tif (TIFF 6.0) | .jpeg, .jpg, .jp2, .gif, .tif, .tiff, .raw, .psd, .bmp, .png, .pdf
Audio data | .flac | .mp3, .aif, .wav
Video data | .mp4, .ogv, .ogg, .mj2 | .avchd
Documentation and scripts | .rtf, .pdf, .xhtml, .htm, .odt | .txt, .doc, .docx, .xls, .xlsx, .xml
Chemistry data (spectroscopy) | .jdx (JCAMP) | (none listed)

Consult the annually updated Library of Congress Recommended Formats Statement for more information on recommended file formats.


File naming, organization, versioning

Before starting a project, it is important to plan file management strategies. This will help you save time later on. When developing file organization conventions, be consistent, document them, and share them with anyone who may access the data.

Directory structure

Consider creating a readme.txt file in your project's main folder that explains the directory structure and describes the contents of the major folders. See MIT's README file & folder schema example.
Best practices
  • Give the main folder an informative name, for example one that includes the project title, a unique identifier, and the year.
  • Group subfolders by a common theme, for example:
    • research activity (interviews, surveys, experiment)
    • parameter assessed
    • data type (images, text, databases)
    • kind of material (publications, deliverables, documentation)
  • Consider limiting folder nesting to three or four levels and keeping no more than ten items in each folder.
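The size limits in the last point can be checked automatically. The sketch below, assuming Python 3.9+, creates a main folder with one subfolder per theme plus a readme.txt stub, then reports folders nested more than four levels below the main folder or holding more than ten items. The function names, the way depth is counted (the main folder is level 0), and the example folder names are assumptions for illustration.

```python
from pathlib import Path

MAX_DEPTH = 4   # "three or four deep": levels below the main folder (assumption)
MAX_ITEMS = 10  # "no more than ten items per folder"


def create_skeleton(root: Path, subfolders: list[str]) -> None:
    """Create the main folder, one subfolder per theme, and a readme.txt stub."""
    for name in subfolders:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "readme.txt"
    if not readme.exists():  # never overwrite an existing readme
        lines = ["Directory structure:"] + [f"  {name}/" for name in subfolders]
        readme.write_text("\n".join(lines) + "\n")


def audit_tree(root: Path, max_depth: int = MAX_DEPTH, max_items: int = MAX_ITEMS) -> list[str]:
    """Return a warning for each folder that is too deep or holds too many items."""
    warnings = []
    folders = [root, *(p for p in root.rglob("*") if p.is_dir())]
    for folder in folders:
        depth = len(folder.relative_to(root).parts)  # the main folder itself is 0
        n_items = sum(1 for _ in folder.iterdir())
        if depth > max_depth:
            warnings.append(f"{folder}: {depth} levels below the main folder")
        if n_items > max_items:
            warnings.append(f"{folder}: {n_items} items")
    return warnings
```

For example, create_skeleton(Path("HearingScreening_2017"), ["interviews", "surveys", "documentation"]) sets up a themed structure, and audit_tree can be rerun as the project grows.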

Directory structure examples:

Psychology example [figure: directory structure, psychology example]
Source: Berenson, K.R. 2018. Managing your research data and documentation. American Psychological Association.

Marketing example [figure: directory structure, marketing example]
Source: UK Data Service


File naming

Consider creating a readme.txt file in your project's main folder that explains your file naming convention and any abbreviations or codes it uses.
Best practices

Common elements in folder or file names:

  • Project or experiment name or acronym
  • File creator's name/initials
  • Date
  • Version number
  • Data characteristics. For example:
    • Location/spatial coordinates
    • Type of data (eg. survey)
    • Conditions (eg. lab instrument, solvent, temperature)

Rules of thumb for file names:

  • Keep file names as short as possible while including all necessary information.
  • Do not use spaces, full stops (.) other than the one before the file extension, or special characters (eg. & * % # ; ( ) ! @ $ ^ ~ ' { } [ ] ? < >)
  • Use hyphens (-), underscores (_), or camel case (FileName) to separate elements in a file name
  • Format dates consistently (eg. YYYYMMDD); putting the year first also makes files sort chronologically by name
  • Give version numbers leading zeros to allow for multi-digit versions (eg. v_05, v_023), so that versions sort in the correct order
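The naming rules above can also be applied in code. This minimal sketch, assuming Python 3, joins name elements with underscores, formats the date as YYYYMMDD, and pads the version number with leading zeros, then flags names that break the rules on spaces, special characters, and full stops. The helper names and the element order (project, initials, date, version) are hypothetical choices, not a standard.

```python
import re
from datetime import date

# Spaces plus the special characters listed in the rules above.
FORBIDDEN = re.compile(r"[\s&*%#;()!@$^~'{}\[\]?<>]")


def build_name(project: str, initials: str, when: date, version: int,
               ext: str, width: int = 2) -> str:
    """Join elements with underscores, e.g. PROJECT_INITIALS_YYYYMMDD_vNN.ext."""
    parts = [project, initials, when.strftime("%Y%m%d"), f"v{version:0{width}d}"]
    return "_".join(parts) + "." + ext


def check_name(filename: str) -> list[str]:
    """List the rules a file name breaks; an empty list means it passes."""
    stem, dot, _ext = filename.rpartition(".")
    if not dot:  # no extension: check the whole name
        stem = filename
    problems = []
    if FORBIDDEN.search(stem):
        problems.append("contains a space or special character")
    if "." in stem:
        problems.append("contains a full stop before the extension")
    return problems
```

For example, build_name("HearingScreen", "KS", date(2017, 7, 4), 5, "csv") gives HearingScreen_KS_20170704_v05.csv. Zero-padding matters because many scripts and some tools sort names as plain text: v09 sorts before v10, but v9 sorts after v10.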


Examples of useful file names:
  • FG1_CONS_20100212.rtf: interview transcript from the first focus group with consumers, held on 12 February 2010
  • Int024_AP_20080605.doc: interview with participant 024, conducted by Anne Parsons on 5 June 2008
  • BDHSurveyProcedures_v04.pdf: version 4 of the survey procedures for the British Dental Health Survey

Examples of poor file names:
  • SrvMthdDraft.doc, SrvMthdFinal.doc, SrvMthdLastOne.doc, SrvMthdFridaynight.doc (no dates or version numbers, so the latest version is unclear)
  • Focus group consumers 12 Feb?.doc (spaces, a special character, and no year)
  • Health&Safety Procedures1 (a special character and a space)

Source: UK Data Service

File renaming

Software is available for renaming many files at once in an automated batch. Examples include Renamer (Mac) and Bulk Rename Utility (Windows).
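If you prefer a scripted approach, the sketch below (assuming Python 3.9+; the function names are hypothetical, and it does not reproduce how Renamer or Bulk Rename Utility work) replaces runs of spaces and special characters in each file name with an underscore. It only reports the planned changes unless dry_run=False is passed, skips hidden files, and refuses to overwrite an existing file.

```python
import re
from pathlib import Path


def safe_name(name: str) -> str:
    """Replace runs of spaces/special characters in the stem with '_', keeping the extension."""
    path = Path(name)
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", path.stem).strip("_")
    return (stem or "renamed") + path.suffix  # "renamed" is a hypothetical fallback


def batch_rename(folder: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan (and optionally apply) safe names for the visible files in folder."""
    plan: list[tuple[str, str]] = []
    taken: set[str] = set()
    files = sorted(p for p in folder.iterdir() if p.is_file() and not p.name.startswith("."))
    for f in files:
        new = safe_name(f.name)
        if new == f.name:
            continue  # already follows the rules
        if (folder / new).exists() or new in taken:
            raise FileExistsError(f"{new} already exists or is planned; rename {f.name} by hand")
        taken.add(new)
        plan.append((f.name, new))
        if not dry_run:
            f.rename(folder / new)
    return plan
```

Run it once with the default dry_run=True, review the returned list of (old name, new name) pairs, and only then run it with dry_run=False.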


File versioning

Version control strategies depend on whether files are being accessed by multiple users and in multiple locations. Consider using these best practices to keep track of file versions.
Best practices
  • Keep a copy of the original ('master') data and never edit it.
  • Include version information in your file naming convention (eg. creation or modification date, or a version number)
  • Use tools or software to help track file versioning. This could include:
    • Tools that automatically assign version numbers (eg. Electronic Lab Notebooks)
    • File sharing services (eg. Dropbox, Google Docs)
    • Version control software (eg. Subversion, Git)
    • Version control tables (see below)
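Two of these practices can be sketched in a few lines of Python 3.8+, assuming versioned files are named like HearingScreenResults_v05.csv (base name, "_v", zero-padded number, extension): finding the next free version number, and removing write permission from the master copy so it is not edited by accident. The function names and the naming pattern are assumptions for illustration.

```python
import re
import stat
from pathlib import Path


def next_version_path(folder: Path, base: str, ext: str, width: int = 2) -> Path:
    """Return the path for the version after the highest <base>_vNN.<ext> in folder."""
    pattern = re.compile(re.escape(base) + r"_v(\d+)")  # assumed naming pattern
    versions = [int(m.group(1)) for f in folder.glob(f"*.{ext}")
                if (m := pattern.fullmatch(f.stem))]
    next_number = max(versions, default=0) + 1
    return folder / f"{base}_v{next_number:0{width}d}.{ext}"


def protect_master(path: Path) -> None:
    """Make the master copy read-only (read permission for owner, group and others)."""
    path.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
```

If a folder already holds versions 01, 02 and 05, next_version_path returns the path ending in _v06.csv. On Windows, Path.chmod can only set or clear the read-only flag, which is all protect_master needs.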

Version control table example

Source: UK Data Service

Title: Hearing screening tests in Montreal daycares
File name: HearingScreenResults_v05.csv
Description: Results of 120 hearing screening tests carried out in 7 daycares in Montreal during June 2017
Created by: Kate Smith
Maintained by: Mandy Watson
Created: 04/07/2017
Last modified: 25/11/2017

Version | Responsible | Notes | Last amended (DD/MM/YYYY)
05 | Mandy Watson | Versions 03 and 04 compared and merged by MW | 25/11/2017
04 | Alex Thakor | Entries checked by AT, independent from SK | 17/10/2017
03 | Steve Knight | Entries checked by SK | 29/07/2017
02 | Karen Miller | Test results 81-120 entered | 05/07/2017
01 | Mandy Watson | Test results 1-80 entered | 04/07/2017
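A version control table can itself be kept in an open, machine-readable format. The sketch below, assuming Python 3.9+, stores rows like those in the example table as records and writes them out as CSV, newest version first. The VersionEntry class and table_to_csv function are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class VersionEntry:
    version: int
    responsible: str
    notes: str
    last_amended: str  # kept as DD/MM/YYYY text, as in the example table


def table_to_csv(entries: list[VersionEntry], width: int = 2) -> str:
    """Return the version table as CSV text, newest version first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Version", "Responsible", "Notes", "Last amended"])
    for entry in sorted(entries, key=lambda e: e.version, reverse=True):
        writer.writerow([f"{entry.version:0{width}d}", entry.responsible,
                         entry.notes, entry.last_amended])
    return buffer.getvalue()
```

Notes that contain commas, such as "Entries checked by AT, independent from SK", are quoted automatically by the csv module, so the file stays readable by spreadsheet and statistics software.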