git: forever remove files or folders from history
I recently needed to rewrite a git repository’s history. This isn’t generally a very good idea, though it is useful if your repository contains files it should not (such as unneeded large binary files or copyrighted material). I was also using it because I had a branch where I only wanted to merge a subset of files back into master (though there are probably better ways of doing this).
Anyway, it is not very hard to rewrite history thanks to the excellent git-filter-branch tool which comes with git. However, if your goal is to reduce a large repository’s size then git-filter-branch does not quite finish the job, since it keeps temporary backups of the filtered-out files. To remove those, you need to do a little more work.
To make it easier to permanently remove files, I wrapped it in a little bash script, git-remove-history (also shown below). Simply go to the root of your repository and run the script with the list of files you want to delete and it will do the rest. There is an interesting thread about doing this here on KernelTrap.
#!/bin/bash
set -o errexit

# Author: David Underhill
# Script to permanently delete files/folders from your git repository. To use
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2

if [ $# -eq 0 ]; then
    exit 0
fi

# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi

# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD

# remove the temporary history git-filter-branch otherwise leaves behind for a long time
rm -rf .git/refs/original/ && git reflog expire --all && git gc --aggressive --prune
This is awesome, thanks so much for posting.
One thing I noticed was that while the file is removed, the individual commits associated with the file seem to still exist when I run gitk. I tried to run
git rebase -i HEAD~10
to squash a few commits but this had unexpected results. Am I doing something wrong?
@Frank
I noticed this too. When removing files it rewrites commits but doesn’t remove them altogether (even if the file you removed was the only one affected by a commit). Try using a commit filter (instead of a file filter) to have it completely remove commit(s).
I still get the old files somewhere stuck in my pack file after this. Is this still the way to do it?
antony-stubbss-macbook-pro-2:test antonystubbs$ git reflog show
77833c3 HEAD@{0}: filter-branch: rewrite
75ea505 HEAD@{1}: commit: Remove big binary file
0e61e76 HEAD@{2}: commit: Add big binary file
2bf4f40 HEAD@{3}: commit (initial): Small repo
antony-stubbss-macbook-pro-2:test antonystubbs$ rm -rf .git/refs/original/ && git reflog expire --all --expire=0 && git gc --aggressive --prune
Counting objects: 5, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (5/5), done.
Total 5 (delta 2), reused 0 (delta 0)
antony-stubbss-macbook-pro-2:test antonystubbs$ git reflog show
antony-stubbss-macbook-pro-2:test antonystubbs$ du -ksh .
9.7M .
@David Underhill
see:
--prune-empty
Some kind of filters will generate empty commits, that left the tree untouched. This switch allow git-filter-branch to ignore such commits.
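As a rough sketch of how the --prune-empty switch quoted above could be combined with the index filter from the script: the throwaway repository, the file names, and the demo identity below are all made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is only there to silence the warning that newer git versions print.

```shell
#!/bin/bash
set -o errexit

# Hypothetical demo repo in a temp directory, so nothing real is touched.
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git warns and pauses otherwise
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"          # made-up identity for the demo commits
git config user.email "demo@example.com"

echo "small" > notes.txt
git add notes.txt && git commit -q -m "Small repo"
echo "pretend this is huge" > big.bin
git add big.bin && git commit -q -m "Add big binary file"
echo "more" >> notes.txt
git add notes.txt && git commit -q -m "Update notes"

# Same index filter as the script, plus --prune-empty so that a commit which
# only touched big.bin is dropped instead of staying behind as an empty commit.
git filter-branch --prune-empty \
    --index-filter "git rm -rf -q --cached --ignore-unmatch big.bin" HEAD

commits_left=$(git rev-list --count HEAD)
echo "commits left: $commits_left"
```

Without --prune-empty the same run would keep all three commits, with "Add big binary file" left behind as a commit that changes nothing.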
i need a bit of help. i get:
$ git-delete-history test1
zsh: command not found: git-delete-history
i’m running mac osx, git version 1.5.x, the file called git-remove-history.sh is in the current folder.
i also tried typing git-remove-history test1, but that didn’t work either.
many thanks for any tips….
@jovo
It looks like you need to specify the path to the script too. Since it is in the folder you are currently in, try
./git-delete-history test1
(or alternatively, you could do
bash git-delete-history test1
).
no dice
any other ideas?
maybe i could save this file in some folder that’s already in my path (i don’t know which folders are in my path)?
thanks again
@jovo
You need to run it with the actual name of the file — if you save it as ‘git-delete-history.sh’ then you need to run it, as David says, with
./git-delete-history.sh test1
Note the addition of ‘.sh’.
David: Thanks so much for this script. I committed a 255MB file and almost immediately realised what a bad move that was, but it’s now fixed. Yay!
I tried this script under msysgit (Windows).
It runs and removes files from history, BUT ALSO THE ORIGINAL FILES from my working directory!
Is this intended behavior?
Or maybe a problem with msysgit?
Add some escaped quotes to deal with paths and filenames with spaces in them:
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch \"${files}\"" HEAD
@Ernest
Yes, it removes the files from all of your repository’s history — including the head of the repository.
Awesome, thanks!
@Phil Lawrence
As sort of a follow up in case it helps anyone else. I recently started using git again, and basically needed to remove a bunch of binary files that I’d added via ‘git add . / git commit’
The key for me (having read the ‘git magic’ tutorial also) is to remember that you can remove files from the head of the repository but still keep them in your working directory (via ‘git rm --cached … / git commit’).
Once they’re gone from the repository, you can then invoke ‘git-remove-history …’ with the files (which still exist) and they will be cleanly removed from your .git repo…
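A minimal sketch of the "untrack but keep on disk" step described above, run in a throwaway repository. The file names (main.c, app.bin) and the demo identity are hypothetical.

```shell
#!/bin/bash
set -o errexit

# Hypothetical demo repo in a temp directory.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"          # made-up identity for the demo commits
git config user.email "demo@example.com"

echo "source" > main.c
echo "binary blob" > app.bin
git add . && git commit -q -m "Add everything via git add ."

# Remove app.bin from the index only; the copy on disk is left alone.
git rm -q --cached app.bin
git commit -q -m "Stop tracking app.bin"

tracked=$(git ls-files app.bin)
echo "tracked entries for app.bin: ${tracked:-none}"
ls app.bin
```

At this point app.bin is gone from the head of the repository but still sits in the working directory, which is the state the comment above recommends reaching before running the history-removal script on it.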
At any rate, thanks a lot for this script! Pushes me over the threshold to adopting git again….
I wanted to delete a binary file to reduce the size of the repository, your script did delete the file of all the revisions but it isn’t reducing the space usage, how can I do that?
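One possible reason the space is not reclaimed is that the old objects are still reachable from the reflog or are too recent for git gc to prune by default. The sketch below, in a throwaway repository with made-up file names and identity, expires the reflog immediately and prunes immediately, then checks whether the blob for the removed file still exists. It is an illustration of that idea under those assumptions, not a guaranteed fix for every repository.

```shell
#!/bin/bash
set -o errexit
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git warns and pauses otherwise

# Hypothetical demo repo in a temp directory.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"          # made-up identity for the demo commits
git config user.email "demo@example.com"

echo "small" > notes.txt
git add notes.txt && git commit -q -m "Small repo"
head -c 200000 /dev/urandom > big.bin     # stand-in for a large binary
git add big.bin && git commit -q -m "Add big binary file"
blob=$(git rev-parse HEAD:big.bin)        # remember the blob id to check later

git filter-branch --index-filter \
    "git rm -rf -q --cached --ignore-unmatch big.bin" HEAD

# Drop filter-branch's backup refs, expire every reflog entry right away,
# and prune unreachable objects right away instead of after the grace period.
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --quiet --aggressive --prune=now

if git cat-file -e "$blob" 2>/dev/null; then
    blob_state="still present"
else
    blob_state="gone"
fi
echo "big.bin blob is $blob_state"
git count-objects -v                      # size-pack shows the packed size in KiB
```

If the blob is still present after this, something else is probably holding on to it, such as another branch, a tag, or a stash that was not rewritten.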
T-H-A-N-K Y-O-U!
Thanks, great help. Worked excellent. I had some log files in the history that I didn’t want there, poof, gone.
FREAKING AWESOME! You deserve a trophy.
@jovo
Replacing the
line with something like:
will allow it to delete paths that include spaces.
Note, my bash skills aren’t awesome, that exact line will fail in a lot of cases
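The replacement line from the comment above did not survive, so here is one hedged way to get the same effect: shell-quote each path before it goes into the filter string, because git filter-branch evaluates that string a second time. The helper name build_index_filter and the example paths are hypothetical, and printf '%q' is a bash feature, so this assumes the script keeps running under bash.

```shell
#!/bin/bash
set -o errexit

# Build the index-filter command from a list of paths, quoting each one so
# that spaces and parentheses survive the extra round of shell evaluation.
# build_index_filter is a made-up helper name, not part of the original script.
build_index_filter() {
    local cmd="git rm -rf -q --cached --ignore-unmatch --"
    local path
    for path in "$@"; do
        cmd+=" $(printf '%q' "$path")"
    done
    printf '%s\n' "$cmd"
}

filter=$(build_index_filter "my docs/big file.bin" "report (final).pdf")
echo "$filter"
```

In the script, the filter-branch line could then become `git filter-branch --index-filter "$(build_index_filter "$@")" HEAD`, which replaces the unquoted `files=$@` expansion. The paths still need to be quoted on the command line when the script is invoked.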
I removed the ‘ensure we are in the root of the .git repo’ check, and changed the line
rm -rf .git/refs/..
to
rm -rf refs/..
And was able to run it in a bare git repository: it seemed to be successful.
However, when synchronising different repositories with pull and push, the files seem to get created again. Any ideas on the best way to use this script across multiple repositories?
Donate Beer button?
Thanks!!
Thanks! This saved my butt.
Thanks so much! One little thing that I realized: git takes ages going over all commits in history. http://stackoverflow.com/questions/872565/how-do-i-remove-sensitive-files-from-gits-history page indicates how you can limit the rewrite to a certain number of commits.
Adding -q to the git rm call improves the speed of the script by suppressing its output.
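A small sketch combining the two speed-ups mentioned above: a revision range so that only recent commits are rewritten, and -q on git rm. The repository, file names, and identity are made up, and the range HEAD~2..HEAD is just an example; it only helps when the unwanted file was added inside that range.

```shell
#!/bin/bash
set -o errexit
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git warns and pauses otherwise

# Hypothetical demo repo in a temp directory.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"          # made-up identity for the demo commits
git config user.email "demo@example.com"

echo "small" > notes.txt
git add notes.txt && git commit -q -m "Small repo"
first=$(git rev-parse HEAD)               # this commit should not be rewritten
echo "oops" > big.bin
git add big.bin && git commit -q -m "Add big binary file"
echo "more" >> notes.txt
git commit -q -a -m "Update notes"

# Only the last two commits are walked, and -q keeps git rm from printing
# a line for every file it removes in every commit.
git filter-branch --index-filter \
    "git rm -rf -q --cached --ignore-unmatch big.bin" HEAD~2..HEAD

root_after=$(git rev-list --max-parents=0 HEAD)
touching=$(git rev-list HEAD -- big.bin)
echo "root commit unchanged: $([ "$root_after" = "$first" ] && echo yes || echo no)"
```

Commits before the range keep their original ids, so the rewrite touches less history and finishes sooner on a long-lived repository.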
Was checking in to see if there was any sort of updated version of this thing as I still use it quite regularly – I think you have a typo after your exit 0 – you have “exit 0are still” I think you may have been typing in the wrong location.
@Jesse G. Donat Oops, not sure how that typo snuck in there – fixed! No updated version though; I don’t use it too frequently and this has served me sufficiently well on the occasions that I’ve needed it (so far :p).
I’m still getting the sensitive info showing up when I do this:
git grep sensitive $(git rev-list --all)
So clearly it’s not actually fully gone. Any suggestions?
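One likely explanation, offered as an assumption rather than a diagnosis: the script passes HEAD to git filter-branch, so only the current branch is rewritten, while git rev-list --all also walks other branches, tags, and the refs/original backup. The sketch below rewrites every ref by passing `-- --all` and then checks that no reachable commit still touches the file. The repository, the file names, and the identity are invented for the demo.

```shell
#!/bin/bash
set -o errexit
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git warns and pauses otherwise

# Hypothetical demo repo in a temp directory.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo User"          # made-up identity for the demo commits
git config user.email "demo@example.com"

echo "password=hunter2" > secrets.txt
echo "alias ll='ls -l'" > aliases.sh
git add . && git commit -q -m "Initial dotfiles"
git branch experiment                     # second branch that also holds secrets.txt
echo "alias g=git" >> aliases.sh
git commit -q -a -m "More aliases"

# "-- --all" hands --all to git rev-list, so every ref is rewritten, not just HEAD.
git filter-branch --index-filter \
    "git rm -rf -q --cached --ignore-unmatch secrets.txt" -- --all

# Remove the backup refs and old reflog entries, which would otherwise keep
# the unfiltered commits reachable.
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --quiet --prune=now

remaining=$(git rev-list --all -- secrets.txt)
echo "commits still touching secrets.txt: ${remaining:-none}"
```

If the check still lists commits in a real repository, a stash or a remote-tracking ref may be keeping the old history alive.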
I can just zap my history, since I’m just sanitizing my dotfiles repo so I can share it and don’t care much about the history, but I’d rather not.
Thanks a lot, this worked great for me!
Thanks so much for writing this, David. It helped us immensely reduce our repository size.
We had been checking a bunch of RubyGems into vendor/cache, which really added up over time. (Note for others: we were using some private gems; what we decided to do instead was include them in vendor/gems as git submodules.)
Thanks so much! It helped a lot.
This is amazing, thanks so much.
I am having problems with filenames with ” ( ” and ” ) ” in them…
-bash: syntax error near unexpected token `(‘
Any help?