Consistent file naming conventions help you avoid errors or duplication in your research, make your files both machine- and human-readable, and make file sorting and organization easier.
In the example below, the same sample is given two different names by two different lab members, leading to confusion and duplication of work.
Being specific and consistent in your file naming makes it easier to quickly read and identify files in a list, and makes it clear what type of information is contained within.
In the example below, each file name contains the date, the type of experiment (gene expression), and the replicate number, and the .csv extension shows that it is a comma-separated values (CSV) spreadsheet file.
2016_05_10_gene_expression_rep01.csv
2016_06_01_gene_expression_rep02.csv
2016_06_20_gene_expression_rep03.csv
2016_07_11_gene_expression_rep04.csv
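If you generate many files like these, a small helper can keep the pattern consistent for everyone on the team. The sketch below is a minimal Python example, assuming the date_experiment_repNN.csv pattern shown above; the function name `make_filename` and its parameters are hypothetical.

```python
from datetime import date


def make_filename(run_date: date, experiment: str, replicate: int, ext: str = "csv") -> str:
    """Build a name such as 2016_05_10_gene_expression_rep01.csv.

    Hypothetical helper: the date is written year_month_day, spaces in the
    experiment name become underscores, and the replicate number is padded
    to two digits so that rep02 sorts before rep10.
    """
    date_part = run_date.strftime("%Y_%m_%d")
    experiment_part = experiment.strip().lower().replace(" ", "_")
    return f"{date_part}_{experiment_part}_rep{replicate:02d}.{ext}"


print(make_filename(date(2016, 5, 10), "gene expression", 1))
# 2016_05_10_gene_expression_rep01.csv
```

Generating names in code rather than typing them by hand removes one common source of inconsistency, such as a missing leading zero or a stray space.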
Before you begin your research, decide on a naming convention for your files. Document the naming convention you choose, and make sure that you and your collaborators follow it. It will save you time and will help others who may use your files in the future. Best practices include:
Bisondata_1.0 = original document
Bisondata_1.1 = original document with minor revisions
More considerations for naming files can be found at these websites:
Versioning refers to saving new copies of your files when you make changes so that you can go back and retrieve specific versions of your files later. Saving multiple versions makes it possible to decide at a later time that you prefer an earlier version. You can then immediately revert to that version instead of having to retrace your steps to recreate it.
In its most basic form, versioning relies on a sequential numbering system. Within a given version number category (major, minor), these numbers are generally assigned in increasing order and correspond to changes in the data. The US Geological Survey recommends the following structure:
The ETDplus project, led by the Educopia Institute, offers additional guidance for version control. Versioning should be taken into account when developing the folder and file naming structure. The following guidance is taken from the ETDplus brief on version control, available on the project site:
At the beginning of a research project, it is important to create a stable folder structure in which you can organize materials. The specific folders will depend on your own research process. File organization could be based on how you plan to gather materials, which experiment or process generated them, when they were created, or other strategies. The key is to use folders that make sense to you and allow you to find your materials easily.
A simple method to designate a revision is to note it at the end of the file name. This way, files can be grouped by name and sorted by version number. For example:
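Using the major.minor scheme from the Bisondata example (Bisondata_1.0, Bisondata_1.1), the version can be kept as a suffix at the end of the name and incremented by a script. This is a minimal Python sketch; `bump_version`, the underscore separator, and the rule that a major bump resets the minor number to 0 are assumptions for illustration, not part of any published standard.

```python
import re

# Matches a base name ending in _<major>.<minor>, e.g. "Bisondata_1.1".
VERSION_RE = re.compile(r"^(?P<base>.+)_(?P<major>\d+)\.(?P<minor>\d+)$")


def bump_version(name: str, major: bool = False) -> str:
    """Return the next version name; a major bump resets the minor number."""
    match = VERSION_RE.match(name)
    if match is None:
        raise ValueError(f"no _major.minor suffix in {name!r}")
    base = match["base"]
    major_n, minor_n = int(match["major"]), int(match["minor"])
    if major:
        return f"{base}_{major_n + 1}.0"
    return f"{base}_{major_n}.{minor_n + 1}"


print(bump_version("Bisondata_1.0"))              # Bisondata_1.1
print(bump_version("Bisondata_1.1", major=True))  # Bisondata_2.0
```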
If you use version numbers, one issue that can arise is that computers often sort file names character by character rather than by numeric value, so version 10 can appear before version 2. This can lead to strange, unhelpful results. For example:
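The effect is easy to reproduce. In this minimal Python sketch (the file names are hypothetical), plain string sorting puts version 10 before version 2, while a "natural" sort key that compares runs of digits as numbers restores the intended order. Because not every program sorts this way, naming files so that plain sorting already works is the more robust fix.

```python
import re

files = ["report_v1.docx", "report_v10.docx", "report_v2.docx", "report_v3.docx"]

# Plain string sorting compares character by character, so "v10" lands
# before "v2" because the character "1" sorts before "2".
print(sorted(files))
# ['report_v1.docx', 'report_v10.docx', 'report_v2.docx', 'report_v3.docx']


def natural_key(name: str):
    """Split out digit runs so they compare as numbers: 'v10' -> ['v', 10, '']."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


print(sorted(files, key=natural_key))
# ['report_v1.docx', 'report_v2.docx', 'report_v3.docx', 'report_v10.docx']
```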
A good practice that can help you to avoid these problems is to use dates to designate version numbers. If you choose this strategy, format dates as year-month-day (20150930). Using this order will help avoid confusion when collaborating with other researchers or systems that use day-month-year or month-day-year order, and it will help your computer sort versions in chronological order. For example:
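A quick check in Python, with hypothetical dates, shows why the year-month-day order matters: sorting the date strings alphabetically gives chronological order only when the year comes first.

```python
from datetime import date

versions = [date(2015, 9, 30), date(2014, 12, 1), date(2015, 1, 15)]

# Year-month-day strings sort in the same order as the dates themselves.
ymd = sorted(d.strftime("%Y%m%d") for d in versions)
print(ymd)  # ['20141201', '20150115', '20150930']

# Month-day-year strings do not: December 2014 ends up last.
mdy = sorted(d.strftime("%m%d%Y") for d in versions)
print(mdy)  # ['01152015', '09302015', '12012014']
```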
If the files you are using are created or edited collaboratively, you may want to incorporate names or initials into your file naming conventions so that you know which versions contain updates by each individual on your team. For example:
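When each contributor adds their initials, the names can also be grouped by person. The Python sketch below assumes a hypothetical pattern of name_YYYYMMDD_INITIALS.ext (for example, proposal_20150930_JS.docx); adjust the regular expression to whatever convention your team documents.

```python
import re
from collections import defaultdict

# Hypothetical convention: <name>_<YYYYMMDD>_<INITIALS>.<ext>
PATTERN = re.compile(r"^(?P<name>.+)_(?P<date>\d{8})_(?P<initials>[A-Z]{2,3})\.(?P<ext>\w+)$")


def group_by_contributor(filenames):
    """Map each contributor's initials to their files, oldest date first."""
    groups = defaultdict(list)
    for filename in filenames:
        match = PATTERN.match(filename)
        if match:  # names that do not follow the convention are skipped
            groups[match["initials"]].append((match["date"], filename))
    # YYYYMMDD strings sort chronologically, so sorting the tuples orders by date
    return {initials: [f for _, f in sorted(items)] for initials, items in groups.items()}


files = [
    "proposal_20150930_JS.docx",
    "proposal_20150915_AB.docx",
    "proposal_20150901_JS.docx",
    "proposal_final.docx",
]
print(group_by_contributor(files))
```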
Date formats can vary between countries. The most common confusion is between the United States and European formats:
US: April 8, 2021, or 04/08/2021; European: 8 April 2021, or 08/04/2021.
Choosing a standard format for dates, and using a numerical notation, will help avoid confusion and errors.
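To see the problem concretely, Python's standard `datetime` module can parse the same string, taken from the example above, under both conventions and get two different dates. (Slashes are unusable in file names anyway, since / separates folders in a path.)

```python
from datetime import datetime

stamp = "04/08/2021"

# The same string parses to two different dates depending on the convention.
us = datetime.strptime(stamp, "%m/%d/%Y").date()        # month/day/year
european = datetime.strptime(stamp, "%d/%m/%Y").date()  # day/month/year

print(us.isoformat())        # 2021-04-08
print(european.isoformat())  # 2021-08-04
```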
ISO 8601 is the best standard for date formats:
YYYY-MM-DD = 2021-04-08, or YYYYMMDD = 20210408
You can also break this down further with time notation if needed:
YYYY-MM-DDTHH:MM:SS = 2021-04-08T15:21:09, or, without separators, YYYYMMDDTHHMMSS = 20210408T152109
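Python's standard `datetime` module can produce these representations directly; the timestamp below is the one from the example above. The colon-free form is the safer choice inside file names, because Windows does not allow colons in file names.

```python
from datetime import datetime

moment = datetime(2021, 4, 8, 15, 21, 9)

print(moment.strftime("%Y-%m-%d"))       # 2021-04-08 (date with hyphens)
print(moment.strftime("%Y%m%d"))         # 20210408 (date without separators)
print(moment.isoformat())                # 2021-04-08T15:21:09 (with separators)
print(moment.strftime("%Y%m%dT%H%M%S"))  # 20210408T152109 (no colons)
```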
As you saw in the gene expression example above (rep01, rep02, and so on), it's helpful to use extensible file names, padding numbers with leading zeros, so that files with numerical content sort in the right order. Most of us have opened a folder on our computer, found numbered files out of order, and had to hunt for the one we needed. The answer to that is extensibility!
When creating your file naming structure, think about whether you will be producing image outputs or other ordered content, and plan for that. If you know you will have hundreds or even thousands of files, building in enough leading zeros from the start (for example, three digits for up to 999 files) will allow you to easily order and find your files.
Good | Bad! |
---|---|
AtherRat_ex001_lipitor.tif | AtherRat_ex1_lipitor.tif |
AtherRat_ex002_lipitor.tif | AtherRat_ex10_lipitor.tif |
AtherRat_ex003_lipitor.tif | AtherRat_ex2_lipitor.tif |
AtherRat_ex004_lipitor.tif | AtherRat_ex3_lipitor.tif |
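If you already have files named like the "Bad!" column, the numbers can be padded after the fact. This minimal Python sketch rewrites the number that follows a given prefix; the function `pad_numbers` and its `prefix` and `width` parameters are hypothetical, and you should choose a width large enough for the most files you expect.

```python
import re


def pad_numbers(name: str, prefix: str = "ex", width: int = 3) -> str:
    """Zero-pad the number that follows `prefix`, e.g. ex1 -> ex001."""
    return re.sub(
        rf"{re.escape(prefix)}(\d+)",
        lambda m: f"{prefix}{int(m.group(1)):0{width}d}",
        name,
    )


bad = ["AtherRat_ex1_lipitor.tif", "AtherRat_ex10_lipitor.tif",
       "AtherRat_ex2_lipitor.tif", "AtherRat_ex3_lipitor.tif"]
good = sorted(pad_numbers(name) for name in bad)
print(good)
# ['AtherRat_ex001_lipitor.tif', 'AtherRat_ex002_lipitor.tif',
#  'AtherRat_ex003_lipitor.tif', 'AtherRat_ex010_lipitor.tif']
```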
The format of the electronic data files you work with during your research may be determined by the research equipment and computer hardware and software that you have access to. However, for long-term preservation and ease of sharing, best practices may dictate that the files be converted to a different format after your project has ended. Give some thought to this eventuality at the outset. Considerations include:
Stanford University Libraries - Data Management Services provides a useful overview of preferred file formats. From the Stanford resource:
Containers: TAR, GZIP, ZIP
Databases: XML, CSV
Geospatial: SHP, DBF, GeoTIFF, NetCDF
Moving images: MOV, MPEG, AVI, MXF
Sounds: WAVE, AIFF, MP3, MXF
Statistics: ASCII, DTA, POR, SAS, SAV
Still images: TIFF, JPEG 2000, PDF, PNG, GIF, BMP
Tabular data: CSV
Text: XML, PDF/A, HTML, ASCII, UTF-8
Web archive: WARC
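To audit an existing folder against a list like this, you can flag files whose extension is not on it. The Stanford list names formats rather than file extensions, so the extension table in this Python sketch is an assumption (for example, .nc for NetCDF, .jp2 for JPEG 2000, and .txt for plain ASCII or UTF-8 text); the name `not_preferred` is hypothetical, and you should check and extend the table for your own data.

```python
from pathlib import Path

# Assumed mapping from formats in the list above to common file extensions.
PREFERRED_EXTENSIONS = {
    ".tar", ".gz", ".zip",                                     # containers
    ".xml", ".csv",                                            # databases, tabular data
    ".shp", ".dbf", ".nc",                                     # geospatial
    ".mov", ".mpeg", ".mpg", ".avi", ".mxf",                   # moving images
    ".wav", ".aiff", ".mp3",                                   # sounds
    ".dta", ".por", ".sav",                                    # statistics
    ".tif", ".tiff", ".jp2", ".pdf", ".png", ".gif", ".bmp",   # still images
    ".html", ".txt",                                           # text
    ".warc",                                                   # web archive
}


def not_preferred(filenames):
    """Return the names whose extension is not in PREFERRED_EXTENSIONS."""
    return [name for name in filenames
            if Path(name).suffix.lower() not in PREFERRED_EXTENSIONS]


print(not_preferred(["2018_09_20_mouse_weight.csv", "results.xlsx", "scan.TIF"]))
# ['results.xlsx']
```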
Additional helpful guidelines for selecting file formats can be found at these websites:
Best practice for preservation is to save your data in preservation formats. These four formats are the gold standard for making sure your data will be available over the long term, as they can be opened and viewed on virtually any operating system with widely available software. They are:
Avoid using special characters in your file names:
~ ! @ # $ % ^ & * ( ) ' " ; < > ? { } [ ]
Some operating systems do not allow certain of these characters in file names at all (Windows, for example, rejects < > ? * and "), and even where they are allowed, they can confuse coding or scripting languages or cause errors, so avoid them regardless.
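A small script can clean these characters out of existing names. This minimal Python sketch replaces each run of unsafe characters in the name (but not the extension) with a single underscore; the name `safe_filename` is hypothetical, and treating spaces and the Windows-reserved characters / \ : | as unsafe is an added assumption beyond the list above.

```python
import os
import re

# Characters from the list above (straight and curly quotes), plus whitespace
# and the Windows-reserved / \ : | characters.
UNSAFE = re.compile(r"""[~!@#$%^&*()'"’“;<>?{}\[\]/\\:|\s]+""")


def safe_filename(name: str) -> str:
    """Replace runs of unsafe characters in the stem with one underscore."""
    stem, ext = os.path.splitext(name)
    return UNSAFE.sub("_", stem).strip("_") + ext


print(safe_filename("mouse weight (rep #2)?.csv"))  # mouse_weight_rep_2.csv
```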
Avoid using abbreviations in your file names. This leads to confusion and makes the file name difficult to read. You will probably forget the abbreviation you created! Make file names clear and human-readable.
BAD: msewt.csv
GOOD: 2018_09_20_mouse_weight.csv
Starting fresh with a new project and developing a file naming scheme is the best way to save time and aggravation. But if you need to clean up an existing file structure, there are tools that can help and make the job less time-consuming. No endorsement of any specific tool below is implied; this is simply a list of available options, and there may be others as well.
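If you are comfortable with a little scripting, you can also preview a bulk rename before committing to it. This minimal Python sketch builds a rename plan for the files in one folder from a renaming rule you supply, checks that no two files would end up with the same name, and only renames when you apply the plan; the functions `plan_renames` and `apply_renames` are hypothetical, and you should back up your files before running anything like this.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def plan_renames(folder: Path, rule: Callable[[str], str]) -> dict[Path, Path]:
    """Preview renames for files directly inside `folder`; nothing is changed."""
    plan = {}
    for path in sorted(folder.iterdir()):
        if path.is_file():
            new_name = rule(path.name)
            if new_name != path.name:
                plan[path] = path.with_name(new_name)
    targets = list(plan.values())
    if len(set(targets)) != len(targets):
        raise ValueError("two files would receive the same new name")
    return plan


def apply_renames(plan: dict[Path, Path]) -> None:
    """Carry out a reviewed plan, refusing to overwrite any existing file."""
    for old, new in plan.items():
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)
```

For example, `plan_renames(Path("data"), lambda name: name.replace(" ", "_"))` would list every file in a hypothetical data folder whose name contains spaces, so you can review the changes before passing the plan to `apply_renames`.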