Box 2: What not to version control

You can version control any file that you put in a Git repository, whether it is text-based, an image, or a giant data file. However, just because you can version control something does not mean you should. Git works best for plain-text documents, such as your scripts or your manuscript if it is written in LaTeX or Markdown. This is because Git stores successive versions of a text file very compactly: in effect, each commit adds little more than the lines you changed. This takes up very little space, and Git can show you exactly what differs between versions (using git diff). You can commit a non-text file, but Git cannot show a meaningful comparison between its versions, and because such files often change throughout when edited, each commit that modifies one can add close to a full copy of the file. Over time, you may find the size of your repository growing very quickly. A good rule of thumb is to version control anything that is plain text, such as your scripts and your manuscripts. Things not to version control are large data files that never change, binary files (including Word and Excel documents), and the output of your code.
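The difference between text and non-text files can be seen directly with git diff. The following is a minimal sketch run in a throwaway repository; the file names notes.txt and data.bin, the user identity, and the commit message are hypothetical and chosen only for illustration.

```shell
# Work in a throwaway directory so that no real repository is affected.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
# Set an identity for this repository only, so the commit works on a fresh machine.
git config user.name "Example User"
git config user.email "user@example.com"

# Commit a small text file and a small binary file (hypothetical names).
printf 'line one\nline two\n' > notes.txt
printf '\000\001\002\003' > data.bin
git add notes.txt data.bin
git commit --quiet -m "Add a text file and a binary file"

# Change both files.
printf 'line one\nline 2\n' > notes.txt
printf '\000\001\002\004' > data.bin

# For the text file, Git reports which lines were removed and added.
git diff notes.txt

# For the binary file, Git can only report that the two versions differ.
git diff data.bin
```

The first git diff lists the changed line with - and + markers, whereas the second prints only a one-line notice that the binary files differ.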

In addition to the type of file, you need to consider the content of the file. If you plan on sharing your commits publicly using GitHub, ensure you are not committing any files that contain sensitive information, such as human subject data or passwords.

To prevent accidentally committing files you do not wish to track, and to remove them from the output of git status, you can create a file called .gitignore. In this file, you can list subdirectories and/or file patterns that Git should ignore. For example, if your code produced log files with the file extension .log, you could instruct Git to ignore these files by adding *.log to .gitignore. In order for these settings to be applied to all instances of the repository, e.g. if you clone it onto another computer, you need to add and commit this file.
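The steps above can be sketched as the following shell commands, again run in a throwaway repository. The file names script.sh and analysis.log, the user identity, and the commit message are hypothetical; only the *.log pattern comes from the example in the text.

```shell
# Work in a throwaway directory so that no real repository is affected.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
# Set an identity for this repository only, so the commit works on a fresh machine.
git config user.name "Example User"
git config user.email "user@example.com"

# A script worth tracking and a log file that is not (hypothetical names).
echo 'echo "running analysis"' > script.sh
echo "run started" > analysis.log

# Tell Git to ignore every file whose name ends in .log.
echo "*.log" > .gitignore

# analysis.log is no longer listed among the untracked files.
git status --short

# Commit .gitignore so that the rule travels with the repository when it is cloned.
git add .gitignore script.sh
git commit --quiet -m "Add script and ignore log files"

# List the files Git is tracking; the log file is not among them.
git ls-files
```

To check why a particular file is being skipped, git check-ignore -v analysis.log reports the .gitignore line and pattern that matched it.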