
git: forever remove files or folders from history

April 1st, 2009

I recently needed to rewrite a git repository’s history. This generally isn’t a good idea, but it is useful if your repository contains files it should not (such as unneeded large binary files or copyrighted material). I also used it because I had a branch from which I only wanted to merge a subset of files back into master (though there are probably better ways of doing this).

Rewriting history is not very hard thanks to the excellent git-filter-branch tool, which comes with git. However, if your goal is to reduce a large repository’s size, git-filter-branch does not quite finish the job: it keeps backup refs to the original commits (under .git/refs/original/), and the reflog still points at them, so the filtered-out files stay in the repository. Removing those takes a little more work.

To make it easier to permanently remove files, I wrapped the whole process in a little bash script, git-remove-history (also shown below). Simply go to the root of your repository, run the script with the list of files you want to delete, and it will do the rest. There is also an interesting thread about doing this on KernelTrap.

#!/bin/bash
set -o errexit
 
# Author: David Underhill
# Script to permanently delete files/folders from your git repository.  To use 
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2
 
if [ $# -eq 0 ]; then
    exit 0
fi
 
# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi
 
# remove all paths passed as arguments from the history of the repo
files=$@
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" HEAD
 
# remove the temporary history git-filter-branch otherwise leaves behind for a long time
rm -rf .git/refs/original/ && git reflog expire --all &&  git gc --aggressive --prune
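
The following is a rough sketch of how the same steps could be made a bit more defensive, not a replacement for the script above. It assumes bash and a git version whose filter-branch supports --prune-empty. The function names quote_paths and remove_paths_from_history are made up for illustration. It quotes each path separately (so names with spaces survive), drops commits that become empty, and expires the reflog immediately instead of relying on the default expiry window.

```bash
#!/bin/bash
# Sketch only. quote_paths and remove_paths_from_history are hypothetical
# helper names, not part of git or of the original script.

# Print every argument shell-quoted and space-separated, so the result can be
# embedded in the command string that git filter-branch evaluates later.
# Caveat: bash's printf %q may emit $'...' for control characters, which a
# plain POSIX sh might not understand; spaces and quotes are handled fine.
quote_paths() {
    local quoted=""
    local path
    for path in "$@"; do
        quoted+="$(printf '%q' "$path") "
    done
    printf '%s' "${quoted% }"
}

# Remove the given paths from the history reachable from HEAD, then drop the
# backup refs and reflog entries so the old objects can actually be pruned.
remove_paths_from_history() {
    if [ $# -eq 0 ]; then
        echo "usage: remove_paths_from_history path..." >&2
        return 1
    fi
    local quoted
    quoted="$(quote_paths "$@")"

    # --prune-empty drops commits that become empty once the paths are gone
    git filter-branch --prune-empty \
        --index-filter "git rm -rf --cached --ignore-unmatch -- $quoted" HEAD || return 1

    # filter-branch keeps backups under refs/original/; delete those refs
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do git update-ref -d "$ref"; done

    # expire every reflog entry immediately, then repack and prune right away
    git reflog expire --expire=now --all &&
        git gc --aggressive --prune=now
}
```

To try it, source the file from the root of a throwaway clone and run something like: remove_paths_from_history "big file.bin" old/dir. As with the original script, the paths are also removed from the current HEAD, so copy anything you still need out of the working directory first.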

David Underhill Coding, bash, version control

  1. June 25th, 2009 at 08:56 | #1

    This is awesome, thanks so much for posting.

  2. June 26th, 2009 at 11:02 | #2

    One thing I noticed: while the file is removed, the individual commits associated with it still seem to exist when I run gitk. I tried running git rebase -i HEAD~10 to squash a few commits, but this had unexpected results. Am I doing something wrong?

  3. June 29th, 2009 at 08:07 | #3

    @Frank
    I noticed this too. When removing files it rewrites commits but doesn’t remove them altogether (even if the file you removed was the only one affected by a commit). Try using a commit filter (instead of a file filter) to have it completely remove commit(s).

  4. July 6th, 2009 at 04:48 | #4

    I still get the old files somewhere stuck in my pack file after this. Is this still the way to do it?
    antony-stubbss-macbook-pro-2:test antonystubbs$ git reflog show
    77833c3 HEAD@{0}: filter-branch: rewrite
    75ea505 HEAD@{1}: commit: Remove big binary file
    0e61e76 HEAD@{2}: commit: Add big binary file
    2bf4f40 HEAD@{3}: commit (initial): Small repo
    antony-stubbss-macbook-pro-2:test antonystubbs$ rm -rf .git/refs/original/ && git reflog expire --all --expire=0 && git gc --aggressive --prune
    Counting objects: 5, done.
    Delta compression using up to 2 threads.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (5/5), done.
    Total 5 (delta 2), reused 0 (delta 0)
    antony-stubbss-macbook-pro-2:test antonystubbs$ git reflog show
    antony-stubbss-macbook-pro-2:test antonystubbs$ du -ksh .
    9.7M .

  5. July 6th, 2009 at 04:49 | #5

    @David Underhill
    see:
    --prune-empty
    Some kind of filters will generate empty commits, that left the tree untouched. This switch allow git-filter-branch to ignore such commits.

  6. jovo
    July 23rd, 2009 at 10:55 | #6

    i need a bit of help. i get:

    $ git-delete-history test1
    zsh: command not found: git-delete-history

    i’m running mac osx, git version 1.5.x, the file called git-remove-history.sh is in the current folder.
    i also tried typing git-remove-history test1, but that didn’t work either.

    many thanks for any tips….

  7. July 24th, 2009 at 12:07 | #7

    @jovo
    It looks like you need to specify the path to the script too. Since it is in the folder you are currently in, try

    ./git-delete-history test1

    (or alternatively, you could do

    bash git-delete-history test1

    ).

  8. jovo
    July 25th, 2009 at 07:00 | #8

    no dice :(

    any other ideas?

    maybe i could save this file in some folder that’s already in my path (i don’t know which folders are in my path)?

    thanks again

  9. October 6th, 2009 at 08:35 | #9

    @jovo
    You need to run it with the actual name of the file — if you save it as ‘git-delete-history.sh’ then you need to run it, as David says, with

    ./git-delete-history.sh test1

    Note the addition of ‘.sh’.

    David: Thanks so much for this script. I committed a 255MB file and almost immediately realised what a bad move that was, but it’s now fixed. Yay!

  10. Ernest
    October 30th, 2009 at 10:34 | #10

    I tried this script under msysgit (Windows).

    It runs and removes files from history, BUT ALSO THE ORIGINAL FILES from my working directory!

    Is this intended behavior?

    Or maybe a problem with msysgit?

  11. Phil Lawrence
    January 3rd, 2010 at 00:39 | #11

    Add some escaped quotes to deal with paths and filenames with spaces in them:

    git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch \"${files}\"" HEAD

  12. January 3rd, 2010 at 09:32 | #12

    @Ernest
    Yes, it removes the files from all of your repository’s history — including the head of the repository.

  13. January 4th, 2010 at 18:30 | #13

    Awesome, thanks!

  14. January 23rd, 2010 at 01:28 | #14

    @Phil Lawrence

    As sort of a follow up in case it helps anyone else. I recently started using git again, and basically needed to remove a bunch of binary files that I’d added via ‘git add . / git commit’

    The key for me (having read the ‘git magic’ tutorial also) is to remember that you can remove files from the head of the repository but still keep them in your working directory (via ‘git rm --cached … / git commit’).

    Once they’re gone from the repository, you can then invoke ‘git-remove-history …’ with the files (which still exist) and they will be cleanly removed from your .git repo…

    At any rate, thanks a lot for this script! Pushes me over the threshold to adopting git again….
