
git: forever remove files or folders from history

April 1st, 2009

I recently needed to rewrite a git repository’s history. This isn’t generally a very good idea, though it is useful if your repository contains files it should not (such as unneeded large binary files or copyrighted material). I am also using it because I had a branch where I only wanted to merge a subset of files back into master (though there are probably better ways of doing this).

Anyway, it is not very hard to rewrite history thanks to the excellent git-filter-branch tool which comes with git. However, if your goal is to reduce a large repository’s size, then git-filter-branch does not quite finish the job: it keeps backup references to the original history (under .git/refs/original/), so the filtered-out files are still stored in the repository. To remove those, you need to do a little more work. To make it easier to permanently remove files, I wrapped it all in a little bash script, git-remove-history (also shown below): simply go to the root of your repository and run the script with the list of files you want to delete and it will do the rest. There is an interesting thread about doing this here on KernelTrap.

#!/bin/bash
set -o errexit
 
# Author: David Underhill
# Script to permanently delete files/folders from your git repository.  To use 
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2
 
if [ $# -eq 0 ]; then
    exit 0
fi
 
# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi
 
# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD
 
# remove the temporary history git-filter-branch otherwise leaves behind for a long time
rm -rf .git/refs/original/ && git reflog expire --expire=now --all && git gc --aggressive --prune=now
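
Several comments below report trouble with paths that contain spaces or parentheses. The following is a minimal sketch of one way to handle that, not the author's script: it wraps each path in single quotes before handing it to the index filter, and it expires the reflog and prunes immediately. The function names `quote_path`, `build_index_filter` and `remove_paths_from_history` are hypothetical helpers introduced only for this illustration.

```shell
#!/bin/sh
# Sketch only: quote_path, build_index_filter and remove_paths_from_history
# are hypothetical helper names, not part of git or of the original script.

# Wrap one path in single quotes, escaping any embedded single quote, so the
# shell that evaluates the index filter sees the path as a single word.
quote_path() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Build the index-filter command line for all paths given as arguments.
build_index_filter() {
    filter="git rm -rf -q --cached --ignore-unmatch --"
    for path in "$@"; do
        filter="$filter $(quote_path "$path")"
    done
    printf '%s\n' "$filter"
}

# Remove the given paths from the history of HEAD, then drop the leftovers.
remove_paths_from_history() {
    [ "$#" -gt 0 ] || { echo "usage: remove_paths_from_history path..." >&2; return 1; }
    [ -d .git ] || { echo "Error: run from the root of a git repository" >&2; return 1; }
    git filter-branch --index-filter "$(build_index_filter "$@")" HEAD || return 1
    # Delete the backup refs and expire the reflog right away so that gc is
    # allowed to discard the old objects.
    rm -rf .git/refs/original/
    git reflog expire --expire=now --all && git gc --aggressive --prune=now
}
```

After sourcing the file, a call such as `remove_paths_from_history 'my file (1).bin' docs/old` would be the equivalent of running the script with those two paths. The quotes on the command line are still needed so that your interactive shell passes each path through unchanged.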

David Underhill bash, Coding, version control

  1. June 25th, 2009 at 08:56 | #1

    This is awesome, thanks so much for posting.

  2. June 26th, 2009 at 11:02 | #2

    One thing I noticed was while the file is removed the individual commits associated with the file seem to still exist when I run gitk. I tried to run git rebase -i HEAD~10 to squash a few commits but this had unexpected results. Am I doing something wrong?

  3. June 29th, 2009 at 08:07 | #3

    @Frank
    I noticed this too. When removing files it rewrites commits but doesn’t remove them altogether (even if the file you removed was the only one affected by a commit). Try using a commit filter (instead of a file filter) to have it completely remove commit(s).

  4. July 6th, 2009 at 04:48 | #4

    I still get the old files somewhere stuck in my pack file after this. Is this still the way to do it?
    antony-stubbss-macbook-pro-2:test antonystubbs$ git reflog show
    77833c3 HEAD@{0}: filter-branch: rewrite
    75ea505 HEAD@{1}: commit: Remove big binary file
    0e61e76 HEAD@{2}: commit: Add big binary file
    2bf4f40 HEAD@{3}: commit (initial): Small repo
    antony-stubbss-macbook-pro-2:test antonystubbs$ rm -rf .git/refs/original/ && git reflog expire --all --expire=0 && git gc --aggressive --prune
    Counting objects: 5, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (5/5), done.
    Total 5 (delta 2), reused 0 (delta 0)
    antony-stubbss-macbook-pro-2:test antonystubbs$ git reflog show
    antony-stubbss-macbook-pro-2:test antonystubbs$ du -ksh .
    9.7M .
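
One possible explanation for the transcript above, offered as a guess rather than a diagnosis: as far as I understand, `git gc --prune` without a date only discards unreachable objects older than a grace period (two weeks by default), and backup refs that have been packed into `.git/packed-refs` are not touched by `rm -rf .git/refs/original/`. The sketch below deletes the backup refs through git itself and prunes immediately. `run`, `list_backup_refs` and `purge_old_history` are hypothetical helper names, and setting `DRY_RUN` only prints the commands.

```shell
#!/bin/sh
# Sketch only: run, list_backup_refs and purge_old_history are hypothetical names.

# Execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# List the backup refs that git filter-branch created.
list_backup_refs() {
    git for-each-ref --format='%(refname)' refs/original/
}

purge_old_history() {
    # Delete every backup ref, whether it is a loose file or packed.
    list_backup_refs | while read -r ref; do
        run git update-ref -d "$ref"
    done
    # Expire all reflog entries now, then prune without a grace period.
    run git reflog expire --expire=now --all
    run git gc --aggressive --prune=now
    # Report the object store size so the effect can be checked.
    run git count-objects -vH
}
```

Running `DRY_RUN=1 purge_old_history` first shows what would be executed without changing the repository.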

  5. July 6th, 2009 at 04:49 | #5

    @David Underhill
    see:
    --prune-empty
    Some kinds of filters will generate empty commits that leave the tree untouched. This switch allows git-filter-branch to ignore such commits.
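
As a rough illustration of the switch mentioned above, the sketch below adds `--prune-empty` to the same kind of index filter the script uses, so commits that become empty once the file is gone are dropped. `run` and `remove_path_dropping_empty_commits` are hypothetical helper names. The function handles a single path and, to keep the quoting simple, refuses paths that contain a single quote.

```shell
#!/bin/sh
# Sketch only: run and remove_path_dropping_empty_commits are hypothetical names.

# Execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Remove one path from the history of HEAD and drop commits left empty.
remove_path_dropping_empty_commits() {
    case $1 in
        *\'*) echo "path must not contain a single quote" >&2; return 1 ;;
    esac
    run git filter-branch --prune-empty \
        --index-filter "git rm -rf -q --cached --ignore-unmatch -- '$1'" HEAD
}
```

For example, `remove_path_dropping_empty_commits big.bin` would rewrite HEAD without `big.bin` and without the commits that only touched it.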

  6. jovo
    July 23rd, 2009 at 10:55 | #6

    i need a bit of help. i get:

    $ git-delete-history test1
    zsh: command not found: git-delete-history

    i’m running mac osx, git version 1.5.x, the file called git-remove-history.sh is in the current folder.
    i also tried typing git-remove-history test1, but that didn’t work either.

    many thanks for any tips….

  7. July 24th, 2009 at 12:07 | #7

    @jovo
    It looks like you need to specify the path to the script too. Since it is in the folder you are currently in, try

    ./git-delete-history test1

    (or alternatively, you could do

    bash git-delete-history test1

    ).

  8. jovo
    July 25th, 2009 at 07:00 | #8

    no dice :(

    any other ideas?

    maybe i could save this file in some folder that’s already in my path (i don’t know which folders are in my path)?

    thanks again

  9. October 6th, 2009 at 08:35 | #9

    @jovo
    You need to run it with the actual name of the file — if you save it as ‘git-delete-history.sh’ then you need to run it, as David says, with

    ./git-delete-history.sh test1

    Note the addition of ‘.sh’.

    David: Thanks so much for this script. I committed a 255MB file and almost immediately realised what a bad move that was, but it’s now fixed. Yay!
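
The exchange above comes down to two points: the shell only finds a script by bare name if its directory is on the PATH, and the name typed must match the file name exactly. The sketch below shows one way to inspect the PATH and to install the script into a directory of your choice. `list_path_dirs` and `install_script` are hypothetical helper names, and `$HOME/bin` in the usage note is only an example directory.

```shell
#!/bin/sh
# Sketch only: list_path_dirs and install_script are hypothetical names.

# Print each directory of a PATH-style string on its own line.
list_path_dirs() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Copy a script into a directory (created if missing) and mark it executable.
install_script() {
    mkdir -p "$2" && cp "$1" "$2/" && chmod +x "$2/$(basename "$1")"
}
```

For instance, `list_path_dirs "$PATH"` shows where the shell looks for commands, and `install_script git-remove-history.sh "$HOME/bin"` places the script in `$HOME/bin`. If that directory appears in the list, `git-remove-history.sh test1` then works from any repository root.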

  10. Ernest
    October 30th, 2009 at 10:34 | #10

    I tried this script under msysgit (Windows).

    It runs and removes files from history, BUT ALSO THE ORIGINAL FILES from my working directory!

    Is this intended behavior?

    Or maybe a problem with msysgit?

  11. Phil Lawrence
    January 3rd, 2010 at 00:39 | #11

    Add some escaped quotes to deal with paths and filenames with spaces in them:

    git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch \"${files}\"" HEAD

  12. January 3rd, 2010 at 09:32 | #12

    @Ernest
    Yes, it removes the files from all of your repository’s history — including the head of the repository.

  13. January 4th, 2010 at 18:30 | #13

    Awesome, thanks!

  14. January 23rd, 2010 at 01:28 | #14

    @Phil Lawrence

    As sort of a follow up in case it helps anyone else. I recently started using git again, and basically needed to remove a bunch of binary files that I’d added via ‘git add . / git commit’

    The key for me (having read the ‘git magic’ tutorial also) is to remember that you can remove files from off the head of the repository but still keep them in your working directory (via ‘git rm --cached … / git commit’).

    Once they’re gone from the repository, you can then invoke ‘git-remove-history …’ with the files (which still exist) and they will be cleanly removed from your .git repo…

    At any rate, thanks a lot for this script! Pushes me over the threshold to adopting git again….
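
The workflow described in the comment above can be sketched as follows: stop tracking the files at the head of the repository while keeping the working copies, commit that change, and only then purge them from history. `run` and `untrack_but_keep` are hypothetical helper names and the commit message is just an example. Copying the files somewhere outside the repository first is a cheap extra safeguard.

```shell
#!/bin/sh
# Sketch only: run and untrack_but_keep are hypothetical names.

# Execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Remove the given paths from the index only (working copies stay on disk)
# and record that in a new commit.
untrack_but_keep() {
    run git rm -r -q --cached -- "$@" &&
        run git commit -q -m "Stop tracking files that will be purged from history"
}
```

A typical sequence would be `untrack_but_keep build/output.bin` followed by the history-removal script with the same path.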

  15. April 1st, 2010 at 10:08 | #15

    I wanted to delete a binary file to reduce the size of the repository, your script did delete the file of all the revisions but it isn’t reducing the space usage, how can I do that?
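
When the repository does not shrink, it can help to check which large files are still reachable from some branch or tag. The sketch below lists the biggest blobs in the whole history. `list_history_objects` and `rank_blobs` are hypothetical helper names, and the `%(rest)` format field assumes a reasonably recent git.

```shell
#!/bin/sh
# Sketch only: list_history_objects and rank_blobs are hypothetical names.

# Print "type size path" for every object reachable from any ref.
list_history_objects() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)'
}

# Keep only blobs and show the N largest (default 10), biggest first.
rank_blobs() {
    awk '$1 == "blob"' | sort -k2,2nr | head -n "${1:-10}"
}
```

Running `list_history_objects | rank_blobs 5` inside the repository prints the five largest files still stored in history. If the file you removed still shows up, some ref other than the rewritten branch probably still points at the old commits.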

  16. zbot1
    April 14th, 2010 at 06:30 | #16

    T-H-A-N-K Y-O-U!

  17. April 15th, 2010 at 00:34 | #17

    Thanks, great help. Worked excellent. I had some log files in the history that I didn’t want there, poof, gone.

  18. June 4th, 2010 at 11:06 | #18

    FREAKING AWESOME! You deserve a trophy.

  19. James Andres
    September 13th, 2010 at 11:02 | #19

    @jovo

    Replacing the

    files=$@

    line with something like:

    files=$(for i in "$@"; do echo "'$i' "; done)

    will allow it to delete paths that include spaces.

    Note, my bash skills aren’t awesome, that exact line will fail in a lot of cases :-(

  20. Eoghan Murray
    October 8th, 2010 at 09:02 | #20

    I removed the ‘ensure we are in the root of the .git repo’ check, and changed the line
    rm -rf .git/refs/..
    to
    rm -rf refs/..

    And was able to run it in a bare git repository: it seemed to be successful.

    However, when synchronising different repositories with pull and push, the files seem to get created again. Any ideas on the best way to use this script across multiple repositories?
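
A likely reason the files come back is that every other clone still holds the old history, so a normal pull or push merges it in again. One common approach, sketched below under that assumption, is to force-push the rewritten history and then reset each other clone to it instead of merging. `run`, `publish_rewritten_history` and `resync_clone` are hypothetical helper names, and the defaults `origin` and `master` are assumptions about your setup.

```shell
#!/bin/sh
# Sketch only: run, publish_rewritten_history and resync_clone are hypothetical names.

# Execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# In the rewritten repository: overwrite all branches and tags on the remote.
publish_rewritten_history() {
    remote=${1:-origin}
    run git push --force --all "$remote"
    run git push --force --tags "$remote"
}

# In every other clone: fetch and move the branch to the rewritten history.
# This discards local commits and uncommitted changes on that branch.
resync_clone() {
    remote=${1:-origin}
    branch=${2:-master}
    run git fetch "$remote"
    run git reset --hard "$remote/$branch"
}
```

Re-cloning from the cleaned remote achieves the same result as `resync_clone` and is often simpler when nobody has unpublished work.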

  21. Arnold
    October 26th, 2010 at 02:36 | #21

    Donate Beer button?

  22. Misael
    March 22nd, 2011 at 11:32 | #22

    Thanks!!

  23. Bobby
    March 27th, 2011 at 18:17 | #23

    Thanks! This saved my butt.

  24. Raphael
    May 18th, 2011 at 05:40 | #24

    Thanks so much! One little thing I realized: git takes ages going over all commits in history. The page http://stackoverflow.com/questions/872565/how-do-i-remove-sensitive-files-from-gits-history shows how you can limit the rewrite to a certain number of commits.
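
Limiting the rewrite can be sketched by passing a revision range to git filter-branch instead of a bare `HEAD`. This only removes the file completely if it was first added after the starting commit. `run` and `rewrite_since` are hypothetical helper names. The function takes a starting commit and a single path, and refuses paths that contain a single quote to keep the quoting simple.

```shell
#!/bin/sh
# Sketch only: run and rewrite_since are hypothetical names.

# Execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Remove one path from the commits after <since> up to HEAD.
rewrite_since() {
    since=$1
    path=$2
    case $path in
        *\'*) echo "path must not contain a single quote" >&2; return 1 ;;
    esac
    run git filter-branch \
        --index-filter "git rm -rf -q --cached --ignore-unmatch -- '$path'" \
        -- "$since..HEAD"
}
```

For example, `rewrite_since HEAD~20 big.bin` only walks the last twenty commits. The `-q` flag on `git rm` also cuts down the output printed for every commit.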

  25. June 10th, 2011 at 12:10 | #25

    Adding -q to the git rm call improves the speed of the script by suppressing its output.

  26. August 2nd, 2011 at 18:39 | #26

    Was checking in to see if there was any sort of updated version of this thing as I still use it quite regularly – I think you have a typo after your exit 0 – you have “exit 0are still” I think you may have been typing in the wrong location.

  27. August 2nd, 2011 at 18:43 | #27

    @Jesse G. Donat Oops, not sure how that typo snuck in there – fixed! No updated version though; I don’t use it too frequently and this has served me sufficiently well on the occasions that I’ve needed it (so far :p).

  28. Michael Leuchtenburg
    November 8th, 2011 at 20:26 | #28

    I’m still getting the sensitive info showing up when I do this:
    git grep sensitive $(git rev-list --all)

    So clearly it’s not actually fully gone. Any suggestions?
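
One plausible cause, stated as an assumption: `git rev-list --all` walks every ref, including other branches, tags and any remaining `refs/original` backups, while the script only rewrites `HEAD`. The sketch below rewrites all refs and carries tags over to the rewritten commits. `run` and `purge_path_from_all_refs` are hypothetical helper names. The function takes a single path and refuses paths that contain a single quote.

```shell
#!/bin/sh
# Sketch only: run and purge_path_from_all_refs are hypothetical names.

# Execute a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Remove one path from every branch and tag, keeping tag names unchanged.
purge_path_from_all_refs() {
    case $1 in
        *\'*) echo "path must not contain a single quote" >&2; return 1 ;;
    esac
    run git filter-branch --tag-name-filter cat \
        --index-filter "git rm -rf -q --cached --ignore-unmatch -- '$1'" \
        -- --all
}
```

Note that this removes whole files by path. If the sensitive text also appears inside files you want to keep, the `git grep` check will continue to report it until those files are edited in history as well.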

    I can just zap my history, since I’m just sanitizing my dotfiles repo so I can share it and don’t care much about the history, but I’d rather not.

  29. November 21st, 2011 at 20:49 | #29

    Thanks a lot, this worked great for me!

  30. December 2nd, 2011 at 17:07 | #30

    Thanks so much for writing this, David. It helped us reduce our repository size immensely.

    We had been checking a bunch of RubyGems into vendor/cache, which really added up over time. (Note for others: we were using some private gems; what we decided to do instead was include them in vendor/gems as git submodules.)

  31. Carlos Gant
    January 13th, 2012 at 05:40 | #31

    Thanks so much! It helped a lot.

  32. karl
    March 14th, 2012 at 13:01 | #32

    This is amazing, thanks so much.

    I am having problems with filenames with "(" and ")" in them…

    -bash: syntax error near unexpected token `('

    Any help?

  33. Daniel
    September 25th, 2012 at 15:11 | #33

    You sir, are a scholar and a gentleman. This saved me. Much obliged!
