Appropriate file types for Git

Non-code files should go on the server, not in the Git repo.

What are appropriate file types to put in a Git repo?

Git works best with plain text files, including most code:

  • Matlab scripts (.m)
  • R scripts (.R)
  • Text files in simple formats (.txt and .csv, or .ost and .pcf for Audapter)

What are NOT appropriate file types to put in a Git repo?

Almost anything else is a "binary" file type (explained below) and should not be added to a Git repo:

  • Matlab data files (.mat)
  • Figures and pictures (.fig, .jpg, .png)
  • Documents and slides (.docx, .pdf, .pptx)
  • Audio files (.wav)
  • Matlab Live Scripts (.mlx) -- these are actually binary files. If you want to keep track of a Live Script, save it as a regular .m file and add that to the repo.
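
As a quick way to check a folder before committing, the two lists above can be turned into a small lookup. This is only a minimal sketch: the function name classify and the three labels are made up for illustration, and the extension sets contain just the types listed on this page, so anything else comes back as "unlisted" and needs a judgment call.

```python
from pathlib import Path

# Extensions copied from the two lists on this page (compared in lowercase).
TEXT_EXTENSIONS = {".m", ".r", ".txt", ".csv", ".ost", ".pcf"}
BINARY_EXTENSIONS = {".mat", ".fig", ".jpg", ".png", ".docx",
                     ".pdf", ".pptx", ".wav", ".mlx"}


def classify(filename):
    """Return 'ok for git', 'keep out of git', or 'unlisted' (hypothetical labels)."""
    suffix = Path(filename).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return "ok for git"
    if suffix in BINARY_EXTENSIONS:
        return "keep out of git"
    return "unlisted"


if __name__ == "__main__":
    # Example file names are invented for illustration.
    for name in ["run_expt.m", "analysis.R", "data.mat", "poster.pdf", "notes.md"]:
        print(name, "->", classify(name))
```

The check is by extension only, so it will not notice a binary file that has been saved with a text-looking name.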

Where should those binary files go instead?

If they are files related to an experiment, a good option is the Waisman smng/ server. For example, /smng/experiments/[expt name]/figures, or /smng/experiments/[expt name]/test_audio. If you need to have these files locally on your computer for some reason, you can copy them from the server to the local drive: C:/Users/Public/Documents/experiments/[expt name]/test_audio

What's a binary file? Why does this matter?

So why is code OK but PDFs aren't? For one, code files like .m are plain text files, which makes them teeny tiny (around 10 KB) in comparison to a typical PDF (around 2000 KB). Secondly, Git is optimized to work with text files, whose bytes represent readable characters arranged in lines: when you make a change to a text file, Git can work out which lines changed and (basically) only needs to save the difference from the last commit. For binary files, whose bytes are not readable text, Git can't produce a meaningful diff from one commit to the next; it can only tell whether A.) the file is exactly the same as last time, or B.) something changed. If something changed, it (basically) saves the whole file over again. This is doubly inefficient: Git is adding the updated file to the commit history as a "new file" instead of just the differences, and a binary file is potentially a big file.
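
To make the text-versus-binary distinction concrete, here is a minimal Python sketch. It is an illustration, not Git's actual code: looks_binary imitates the commonly described heuristic of checking the start of a file for a NUL byte (the 8000-byte window is an assumption), and changed_lines uses Python's difflib to show how small the difference between two versions of a text file can be. Both function names are made up for this example.

```python
import difflib
from itertools import islice


def looks_binary(data, sniff_bytes=8000):
    """Guess 'binary' if a NUL byte appears near the start of the data.

    The window size is an assumption for this sketch.
    """
    return b"\x00" in data[:sniff_bytes]


def changed_lines(old_text, new_text):
    """Return only the removed (-) and added (+) lines between two text versions."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
    )
    # The first two lines of a unified diff are the '---' / '+++' file headers.
    body = islice(diff, 2, None)
    return [line for line in body if line.startswith(("+", "-"))]


if __name__ == "__main__":
    old = "a = 1;\nb = 2;\nc = 3;\n"
    new = "a = 1;\nb = 5;\nc = 3;\n"
    print(changed_lines(old, new))                       # one removed, one added line
    print(looks_binary(old.encode("utf-8")))             # plain text, no NUL bytes
    print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00\x00"))  # PNG-like header with NUL bytes
```

For the text file, the whole change boils down to two short lines. For the binary data there are no lines to compare, so the only useful answer is "same" or "different".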

Another reason to leave out binary files is that they take up space in the repository for all time. A Git commit is a snapshot of all the files in the repository, and a repository contains the data from all past commits. This means that commits including large files continue to take up space on all lab computers, even if the files themselves have since been deleted from the repository! This is also why it's important to never include participant data in a public repository -- all of a repo's commit history is visible to anyone with access to that repo, so a file stays retrievable even after it's been deleted in a later commit. (It can still be removed from the history, but only in special, annoying-to-perform ways.)
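
The "for all time" point can be sketched with a toy model of a commit history. This is a deliberate simplification that ignores Git's compression and packing; the function name history_size_kb, the file names, and the version labels are all hypothetical, and the sizes reuse the rough 10 KB and 2000 KB figures from above.

```python
def history_size_kb(commits):
    """Sum the size of every distinct file version that appears in any commit.

    Each commit is a dict mapping file name -> (version label, size in KB).
    """
    stored = {}
    for snapshot in commits:
        for name, (version, size_kb) in snapshot.items():
            # An unchanged file is stored once; a changed file is stored again.
            stored[(name, version)] = size_kb
    return sum(stored.values())


# Toy history: a script, then a PDF is added, edited once, and finally deleted.
commits = [
    {"run_expt.m": ("v1", 10)},
    {"run_expt.m": ("v1", 10), "poster.pdf": ("v1", 2000)},
    {"run_expt.m": ("v1", 10), "poster.pdf": ("v2", 2000)},
    {"run_expt.m": ("v1", 10)},  # poster.pdf deleted in this commit
]

if __name__ == "__main__":
    print(history_size_kb(commits[:1]))  # history with only the script
    print(history_size_kb(commits))      # full history, after the PDF was deleted
```

In this model the last commit contains only the 10 KB script, but the history still holds both versions of the PDF, so the total is 4010 KB.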

Don't do it



Keywords:
git, binary 
Doc ID:
117306
Owned by:
Chris N. in SMNG Lab Manual
Created:
2022-03-11
Updated:
2026-03-25
Sites:
Speech Motor Neuroscience Group