Ever received a tarball dump of a Git repo with changes (e.g. a vendor source tarball) and wanted to graft it back into the history at the most likely branch-off point? This is what I ended up doing:
1. Clone the repo into ./repo (or adjust as desired).
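2. export GIT_DIR="$PWD/repo/.git" # use an absolute path: a relative GIT_DIR is resolved against the current directory, which changes in the steps below
3. Extract the tarball to ./tarball (or adjust as desired).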
4. cd tarball
5. git log --reverse --format="%H %ai" --after=2016/03/01 --before=2016/06/30 | while read -r HASH DATE; do printf '%s %s ' "$HASH" "$DATE"; git diff --name-only --diff-filter=M "$HASH" | wc -l; done | tee /tmp/diff_files # Adjust the date range as needed. This can take a LONG time!
6. sort -k5n /tmp/diff_files | head
to get the commits with the smallest number of modified files. The count is field 5 because %ai expands to three fields (date, time, timezone) after the hash.
7. cd ../repo
8. git checkout $hash # where $hash is the best candidate from step 6
9. cd ../tarball
10. git add .
11. git status # to check if there are any other untracked files that need to be updated in the index
12. git commit -m "Add vendor tarball"
13. git tag some_tag # so it doesn't get lost
done
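The hand-off from step 6 to step 8 (picking $hash out of the sorted list) can be scripted. This is a minimal sketch, assuming /tmp/diff_files has the "hash date time timezone count" layout produced in step 5; the function name best_candidate is made up for illustration.

```shell
# Print the hash of the candidate with the fewest modified files.
# Assumes each input line looks like: <hash> <date> <time> <tz> <count>
# (best_candidate is a hypothetical helper, not a git command.)
best_candidate() {
    # Sort numerically on field 5 (the count), keep the first line,
    # and print only its first field (the commit hash).
    sort -k5,5n "$1" | head -n 1 | cut -d' ' -f1
}

# Possible usage, continuing from step 6:
#   hash=$(best_candidate /tmp/diff_files)
```

It is still worth eyeballing the top few lines of the sorted list first, since several commits can tie on the same count.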
This strategy minimizes the number of modified files and ignores any new or deleted files.
If there are too many candidates based only on the number of changed files, save the list of candidates, then compare them by the number of changed lines in the modified files (i.e. remove --name-only, so that wc -l counts lines of diff output instead of file names).
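For the tie-breaker, counting lines of raw diff output also counts headers and context lines. A possible alternative, not what I originally ran, is to sum the added and deleted columns of git diff --numstat. The helper name count_changed_lines and the file /tmp/candidates (one hash per line) are assumptions for illustration.

```shell
# Sum added + deleted lines from `git diff --numstat` output on stdin.
# Each numstat line is: <added> <deleted> <path>; binary files show "-"
# in both columns and are skipped here.
# (count_changed_lines is a hypothetical helper name.)
count_changed_lines() {
    awk '$1 != "-" { total += $1 + $2 } END { print total + 0 }'
}

# Possible usage, run from the tarball directory with GIT_DIR set,
# where /tmp/candidates (an assumed file) holds one hash per line:
#   while read -r HASH; do
#       echo "$HASH $(git diff --numstat --diff-filter=M "$HASH" | count_changed_lines)"
#   done < /tmp/candidates | sort -k2,2n | head
```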