File formats and organization
File formats
Before starting a project, think about which file formats you will use, because the choice affects how long the data remains usable. Follow these best practices to reduce the chance of data loss from software or format obsolescence.
Best practices
File formats when collecting data
- Open, non-proprietary data formats are preferable
- If data must be in a proprietary format, ensure that it can easily be converted to an open, non-proprietary format
- Select formats commonly used by the research community
- More best practices...
File formats when sharing / preserving data
- Formats should be open, non-proprietary, and machine-readable
- Share multiple formats if the format used by the research community is typically proprietary (e.g., MonaLisa_v1.psd AND MonaLisa_v1.tiff)
- For proprietary files, indicate (using a readme file) the software/hardware needed to open the files
- If compression is necessary, use a lossless format
- More best practices...
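The advice on lossless compression can be checked in practice. The following is a minimal sketch, assuming Python's standard `gzip` and `hashlib` modules; the function names and the sample content are hypothetical and only illustrate that a lossless format restores the original bytes exactly.

```python
import gzip
import hashlib


def sha256(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def compress_losslessly(data: bytes) -> bytes:
    """Compress bytes with gzip, which is a lossless algorithm."""
    return gzip.compress(data)


def is_lossless_round_trip(original: bytes, compressed: bytes) -> bool:
    """Check that decompressing restores the original bytes exactly."""
    return sha256(gzip.decompress(compressed)) == sha256(original)


# Hypothetical sample content standing in for a real data file.
sample = b"id,value\n1,3.14\n2,2.72\n" * 100
packed = compress_losslessly(sample)
print(len(packed) < len(sample), is_lossless_round_trip(sample, packed))
```

Recording the checksum of the uncompressed file in the readme lets anyone who later unpacks the data confirm that nothing was lost.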
Recommended formats for sharing, reuse and preservation
Sources: UK Data Service, Oregon State University
Type of data | Recommended formats | Acceptable formats |
---|---|---|
Tabular data (with extensive metadata): variable labels, code labels, defined missing values | .por (SPSS portable format) | .sav, .dta, .mdb, .accdb |
Tabular data (with minimal metadata): column headings, variable names | .csv, .tab | .txt, .xls, .xlsx, .mdb, .accdb, .dbf, .ods |
Geospatial data: vector & raster data | .shp, .shx, .dbf, .prj, .sbx, .sbn, .tif, .tfw, .dwg, .gml | .mdb, .mif, .kml, .dxf, .svg |
Textual data | .rtf, .txt, .xml | .html, .doc, .docx |
Image data | .tif (TIFF 6.0) | .jpeg, .jpg, .jp2, .gif, .tif, .tiff, .raw, .psd, .bmp, .png, .pdf |
Audio data | .flac | .mp3, .aif, .wav |
Video data | .mp4, .ogv, .ogg, .mj2 | .avchd |
Documentation and scripts | .rtf, .pdf, .xhtml, .htm, .odt | .txt, .doc, .docx, .xls, .xlsx, .xml |
Chemistry data: spectroscopy | .jdx (JCAMP) | |
Consult the annually updated Library of Congress Recommended Formats Statement for more information on recommended file formats.
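The format table above can also be applied programmatically, for example to flag files before deposit. The following is a minimal sketch covering only three rows of the table; the `FORMATS` dictionary, the `classify` function, and the file names are hypothetical and are not part of any official tool.

```python
from pathlib import Path

# Subset of the table above: data type -> (recommended, acceptable) extensions.
FORMATS = {
    "Tabular data (with minimal metadata)": (
        {".csv", ".tab"},
        {".txt", ".xls", ".xlsx", ".mdb", ".accdb", ".dbf", ".ods"},
    ),
    "Audio data": ({".flac"}, {".mp3", ".aif", ".wav"}),
    "Video data": ({".mp4", ".ogv", ".ogg", ".mj2"}, {".avchd"}),
}


def classify(filename: str, data_type: str) -> str:
    """Return 'recommended', 'acceptable', or 'not listed' for a file."""
    ext = Path(filename).suffix.lower()  # compare extensions case-insensitively
    recommended, acceptable = FORMATS[data_type]
    if ext in recommended:
        return "recommended"
    if ext in acceptable:
        return "acceptable"
    return "not listed"


# Hypothetical file names used only for illustration.
for name, kind in [
    ("survey.CSV", "Tabular data (with minimal metadata)"),
    ("interview.mp3", "Audio data"),
    ("clip.mov", "Video data"),
]:
    print(name, "->", classify(name, kind))
```

The data type is passed explicitly because some extensions, such as .mdb and .dbf, appear under more than one type in the table.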