File formats and organization

File formats

Before starting a project, think about which file formats you will use, because the choice affects how long your data remain usable. Following these best practices reduces the risk of losing data when software or file formats become obsolete.

Best practices

File formats when collecting data

  • Open, non-proprietary formats are preferable
  • If data must be in a proprietary format, ensure that it can easily be converted to an open, non-proprietary format
  • Select formats commonly used by your research community
  • More best practices...

File formats when sharing / preserving data

  • Formats should be open, non-proprietary, and machine-readable
  • Share multiple formats if the format used by your research community is typically proprietary (e.g., MonaLisa_v1.psd AND MonaLisa_v1.tiff)
  • For proprietary files, use a readme file to indicate the software and hardware needed to open them
  • If compression is necessary, use a lossless format
  • More best practices...
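
One way to confirm that a compression step is lossless is to compress a file, decompress it, and compare checksums of the original and restored bytes. The following is a minimal sketch using gzip from the Python standard library; the function names and the sample content are hypothetical and chosen only for illustration.

```python
import gzip
import hashlib


def sha256(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def is_lossless_round_trip(original: bytes) -> bool:
    """Compress with gzip, decompress, and compare checksums."""
    compressed = gzip.compress(original)
    restored = gzip.decompress(compressed)
    return sha256(restored) == sha256(original)


# Hypothetical sample content standing in for a real data file.
sample = b"site,temperature_c\nA,12.5\nB,13.1\n" * 100
print(is_lossless_round_trip(sample))
```

The same check can be run on real files by reading them in binary mode. A lossy format, such as a JPEG re-export of a TIFF image, would not pass a byte-for-byte comparison like this.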


Recommended formats for sharing, reuse and preservation

Sources: UK Data Service, Oregon State University

Type of data | Recommended formats | Acceptable formats
Tabular data with extensive metadata (variable labels, code labels, defined missing values) | .por (SPSS portable format) | .sav, .dta, .mdb, .accdb
Tabular data with minimal metadata (column headings, variable names) | .csv, .tab | .txt, .xls, .xlsx, .mdb, .accdb, .dbf, .ods
Geospatial data (vector & raster data) | .shp, .shx, .dbf, .prj, .sbx, .sbn, .tif, .tfw, .dwg, .gml | .mdb, .mif, .kml, .dxf, .svg
Textual data | .rtf, .txt, .xml | .html, .doc, .docx
Image data | .tif (TIFF 6.0) | .jpeg, .jpg, .jp2, .gif, .tif, .tiff, .raw, .psd, .bmp, .png, .pdf
Audio data | .flac | .mp3, .aif, .wav
Video data | .mp4, .ogv, .ogg, .mj2 | .avchd
Documentation and scripts | .rtf, .pdf, .xhtml, .htm, .odt | .txt, .doc, .docx, .xls, .xlsx, .xml
Chemistry data | .jdx (JCAMP) | (none listed)
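
As an illustration of how the table might be applied in practice, the sketch below looks up a file's extension for a given type of data and reports whether it is recommended, acceptable, or not listed. It transcribes only three rows of the table, with the recommended/acceptable split as read here; the dictionary layout and the function name are assumptions made for this example. The lookup is keyed by data type because some extensions (such as .dbf or .tif) appear in more than one row.

```python
from pathlib import Path

# Partial, hand-copied subset of the table above (assumption: three rows only).
FORMATS = {
    "tabular (minimal metadata)": {
        "recommended": {".csv", ".tab"},
        "acceptable": {".txt", ".xls", ".xlsx", ".mdb", ".accdb", ".dbf", ".ods"},
    },
    "audio": {
        "recommended": {".flac"},
        "acceptable": {".mp3", ".aif", ".wav"},
    },
    "video": {
        "recommended": {".mp4", ".ogv", ".ogg", ".mj2"},
        "acceptable": {".avchd"},
    },
}


def format_status(filename: str, data_type: str) -> str:
    """Return 'recommended', 'acceptable', or 'not listed' for a file."""
    ext = Path(filename).suffix.lower()  # compare extensions case-insensitively
    row = FORMATS[data_type]
    for status in ("recommended", "acceptable"):
        if ext in row[status]:
            return status
    return "not listed"


print(format_status("survey_2020.CSV", "tabular (minimal metadata)"))
print(format_status("interview.mp3", "audio"))
```

A check like this only inspects the file name, so it cannot confirm that the contents actually match the extension.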

Consult the annually updated Library of Congress Recommended Formats Statement for more information on recommended file formats.

