Git tricks
June 26, 2015 — April 19, 2024
My own git notes, not intended as a tutorial; there are better learning resources than this. Some are noted here, in fact. See also the more universally acclaimed classic git tips.
1 Learning git
See the fastai masterclass for many more helpful tips/links/scripts/recommendations. Learn Git Branching explains the mechanics in a friendly fashion. Steve Bennett’s 10 things I hate about Git is also useful.
Mark Dominus’ master class:
2 Handy git config
2.1 git editor
I often edit in VS Code, so it is convenient to set it as the git editor.
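A minimal sketch of that setting, assuming the `code` command-line launcher is installed and on the PATH:

```shell
# Make VS Code the editor for commit messages, interactive rebases and so on.
# --wait keeps the `code` command running until the editor tab is closed,
# which is how git learns that the message is finished.
git config --global core.editor "code --wait"

# Print the setting back to confirm it took.
git config --global --get core.editor
```

Without `--wait` the launcher returns immediately and git sees an empty message.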
2.2 .gitignore
Basic level: Ignore macOS .DS_Store files
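A sketch of the basic version in a throwaway repository (the temporary directory is only there so the example is safe to paste):

```shell
# Scratch repository, so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Ignore Finder metadata files anywhere in this repository.
echo ".DS_Store" >> .gitignore

# Simulate the file macOS would create, then ask git whether it is ignored.
touch .DS_Store
ignored=$(git check-ignore .DS_Store)
echo "ignored: $ignored"
```

To ignore the file in every repository, the same pattern can go in the file named by the `core.excludesFile` setting.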
Galaxy brain version:
gitignore.io: Create Useful .gitignore Files For Your Project
I am fond of the macos,visualstudiocode combo.
3 Handy git commands
3.1 Merging ours/theirs
During a merge, git checkout --theirs filename (or --ours) will check out their (or our) version of the file. The following sweet hack will resolve all files accordingly:
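Here is the hack run end to end in a scratch repository. The branch and file names are invented for the demonstration; the pipeline itself is the same one used in the fish aliases in this section.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/main

# Build two branches that edit the same line, so the merge conflicts.
echo "base" > notes.txt
git add notes.txt
git commit --quiet -m "base"
git checkout --quiet -b feature
echo "theirs" > notes.txt
git commit --quiet -am "feature edit"
git checkout --quiet main
echo "ours" > notes.txt
git commit --quiet -am "main edit"
git merge feature >/dev/null 2>&1 || true   # expected to stop with a conflict

# The hack: list unmerged (U) paths and check out their side of each one.
git diff --name-only --diff-filter=U | xargs git checkout --theirs --
git add notes.txt
resolved=$(cat notes.txt)
echo "resolved content: $resolved"
```

File names containing whitespace would break the plain `xargs` call; `git diff -z` together with `xargs -0` is the usual workaround.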
I do this a lot and will never remember the details, so here are some aliases for fish which I can use to make this easier:
echo "alias git_theirs 'git diff --name-only --diff-filter=U | xargs git checkout --theirs --'" > ~/.config/fish/conf.d/git_theirs.fish
echo "alias git_ours 'git diff --name-only --diff-filter=U | xargs git checkout --ours --'" >> ~/.config/fish/conf.d/git_theirs.fish
chmod a+x ~/.config/fish/conf.d/git_theirs.fish
source ~/.config/fish/conf.d/git_theirs.fish
3.2 Searching
3.2.1 …for a matching file
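One way to do this, as a sketch: `git ls-files` accepts glob patterns and only looks at tracked files. The file names below are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
mkdir -p docs/notes
touch docs/notes/git_tricks.md README.md
git add .

# List tracked files whose path matches the glob.
# The quotes stop the shell from expanding the pattern before git sees it.
matches=$(git ls-files '*tricks*')
echo "$matches"
```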
3.2.2 …for a matching commit
Easy, except for the abstruse naming: it is called “pickaxe” and spelled -S.
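A small demonstration of the pickaxe in a scratch repository; the search string and commit messages are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"

echo "alpha" > f.txt
git add f.txt
git commit --quiet -m "add alpha"
echo "needle" >> f.txt
git commit --quiet -am "introduce the search term"
echo "omega" >> f.txt
git commit --quiet -am "unrelated change"

# -S lists commits that change how many times the string occurs,
# i.e. the commits that added or removed it.
found=$(git log -S"needle" --format=%s)
echo "$found"
```

Only the commit that introduced the string is reported; later commits that merely leave it in place are not.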
3.3 track the history of a file including renames
Kashyap Kondamudi advises: Use --follow option in git log to view a file’s history.
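A sketch of that advice, with invented file names:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"

echo "some content that stays the same" > old_name.txt
git add old_name.txt
git commit --quiet -m "create file"
git mv old_name.txt new_name.txt
git commit --quiet -m "rename file"

# With --follow, the log continues past the rename to the file's earlier name.
history=$(git log --follow --format=%s -- new_name.txt)
echo "$history"
```

Dropping `--follow` from the last command leaves only the rename commit in the output.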
3.4 Clone a single branch
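A sketch using a local scratch repository in place of a real URL; the branch name `experiment` is made up.

```shell
# Stand-in for the remote: a repository with two branches.
origin=$(mktemp -d)
git -C "$origin" init --quiet
git -C "$origin" config user.name "Example"
git -C "$origin" config user.email "example@example.com"
git -C "$origin" symbolic-ref HEAD refs/heads/main
git -C "$origin" commit --quiet --allow-empty -m "main work"
git -C "$origin" branch experiment

# Fetch only the named branch; other branches are neither downloaded nor tracked.
clone=$(mktemp -d)
git clone --quiet --single-branch --branch experiment "$origin" "$clone"
git -C "$clone" branch -r
```

Adding `--depth 1` to the clone also truncates the history, which is often what is wanted alongside this.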
3.5 Remove file from versioning without deleting my copy
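A sketch with a hypothetical settings file that was committed by mistake:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"

echo "secret" > local_settings.ini
git add local_settings.ini
git commit --quiet -m "oops, committed my settings"

# Drop the file from the index only; the working copy stays on disk.
git rm --quiet --cached local_settings.ini
git commit --quiet -m "stop tracking local settings"

tracked=$(git ls-files local_settings.ini)
echo "tracked: '$tracked'"
```

The file is still present in earlier commits, and will show up as untracked unless it is also added to .gitignore.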
3.6 delete remote branch
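A sketch against a scratch bare repository standing in for the remote; the branch name `doomed` is invented.

```shell
remote=$(mktemp -d)
git init --quiet --bare "$remote"

repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
git commit --quiet --allow-empty -m "first"
git remote add origin "$remote"
git push --quiet origin HEAD:refs/heads/doomed

# Delete the branch on the remote; local branches are untouched.
git push --quiet origin --delete doomed

remaining=$(git ls-remote --heads origin doomed)
echo "remaining: '$remaining'"
```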
3.7 Push to a non-obvious branch
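A sketch against a scratch bare repository, assuming the remote is called `origin`:

```shell
remote=$(mktemp -d)
git init --quiet --bare "$remote"

repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
git commit --quiet --allow-empty -m "first"
git remote add origin "$remote"

# Push whatever is checked out locally to a differently named branch on the remote.
# Left of the colon is the local commit, right of it the ref to update remotely.
git push --quiet origin HEAD:refs/heads/SOMETHING

pushed=$(git ls-remote --heads origin SOMETHING | cut -f2)
echo "$pushed"
```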
This is almost obvious except the git naming of things seems… arbitrary? Why refs/heads/SOMETHING?
Read on.
4 What git calls things
By which I mean that which is formally denoted as git references. git references is the canonical description of the mechanics. tl;dr the most common names are refs/heads/SOMETHING for branch SOMETHING, refs/tags/SOMETHING for tag SOMETHING, and refs/remotes/SOMEREMOTE/SOMETHING for (last known state of) a remote branch.
As alexwlchan explains, these references are friendly names for commits, and should be thought of as pointers to commits.
And yet there is something a little magical going on. How come if I pull a branch, I get the latest version of that branch, not the earliest to use that name? Other stuff is happening.
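Part of that magic can be seen in a scratch repository: a branch reference is rewritten every time a commit is made on it, whereas a tag stays put. The tag name here is arbitrary.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
git commit --quiet --allow-empty -m "first"
git tag v0.1
branch=$(git symbolic-ref --short HEAD)

# Each reference is just a name attached to a commit id.
git show-ref

# Right now the branch and the tag resolve to the same commit.
branch_commit=$(git rev-parse "refs/heads/$branch")
tag_commit=$(git rev-parse refs/tags/v0.1)

# Committing moves the branch reference forward; the tag does not follow.
git commit --quiet --allow-empty -m "second"
moved_commit=$(git rev-parse "refs/heads/$branch")
echo "tag: $tag_commit branch: $moved_commit"
```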
The uses are (at least partly) convention and other references can be used too. For example, gerrit uses refs/for/ for code review purposes.
5 Filters
Commands applied to your files on the way in and out of the repository. Keywords: smudge, clean, .gitattributes.
These are a long story, but not so complicated in practice. A useful one is stripping crap from jupyter notebooks.
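A toy sketch of the mechanics, using a filter that strips trailing whitespace rather than notebook output. The filter name `stripws` is made up; a notebook-cleaning command would take the place of `sed`.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# "clean" runs when content is staged; "smudge" runs when it is checked out.
# Here smudge is `cat`, which passes the content through unchanged.
git config filter.stripws.clean "sed 's/[[:space:]]*\$//'"
git config filter.stripws.smudge cat

# Route every .txt file through the filter.
echo "*.txt filter=stripws" > .gitattributes

printf 'trailing spaces   \n' > note.txt
git add note.txt

# The staged blob has been cleaned; the working copy has not.
staged=$(git cat-file -p :note.txt)
echo "staged: '$staged'"
```

The filter definition lives in git config, which is not shared with the repository, so each clone has to set it up again; only .gitattributes travels with the code.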
6 Commit hooks
For doing stuff before you put it in cold storage, e.g. asking DID YOU REALLY WANT TO INCLUDE THAT GIANT FILE?
Here is a commit hook that does exactly that. I made a slightly modernized version:
UPDATE: I decided this was a waste of time and removed it.
After that installation you can retrofit the hook to an existing repository thusly
There are various frameworks for managing hooks, if you have lots. For example, pre-commit is a mini-system for managing git hooks, based on python. Husky is a node.js-based one.
I am not sure whether a hook management system actually saves time overall for a solo developer, since the kind of person who remembers to install a pre-commit hook is also the kind of person who is relatively less likely to need one. Also, it is remarkably labour-intensive to install the dependencies for all these systems, so if you are using heterogeneous systems this becomes tedious.
To skip the pre-commit hook, pass --no-verify (or -n) to git commit.
7 Subtrees/submodules/subprojects/subdirs/subterranean mole people
Sub-projects inside other projects? External projects? The simplest way of integrating external projects seems to be as subtrees. Once this is set up we can mostly ignore them and things work mostly as expected. Alternatively there are submodules, which have various complications. More recently, there is the subtrac system, which I have not yet used.
7.1 Submodule
Including external projects as separate repositories within a repository is possible, but I won’t document it, since it’s well documented elsewhere and I use it less often. It is fiddly, in that some discipline is required to make it go; you need to remember to git submodule init etc.
7.2 Subtrac
Have not yet tried.
subtrac is a helper tool that makes it easier to keep track of your git submodule contents. It collects the entire contents of the entire history of all your submodules (recursively) into a separate git branch, which can be pushed, pulled, forked, and merged however you want.
7.3 Subtree
Subtree subsumes one git tree into another in a usually-transparent way (no separate checkout as with submodules.) It can be used for temporary merging or for splicing and dicing projects.
7.3.1 Splicing a subtree onto a project
Creatin’:
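A sketch of the creation step, using a scratch repository as the external project and the same placeholder names (`remote`, `subdir`) as the update commands in this section. It assumes the `git subtree` command is installed, which some distributions package separately.

```shell
# Stand-in for the external project (normally a URL).
upstream=$(mktemp -d)
git -C "$upstream" init --quiet
git -C "$upstream" config user.name "Example"
git -C "$upstream" config user.email "example@example.com"
echo "library code" > "$upstream/lib.txt"
git -C "$upstream" add lib.txt
git -C "$upstream" commit --quiet -m "library"
branch=$(git -C "$upstream" symbolic-ref --short HEAD)

# The superproject that will absorb it.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
git commit --quiet --allow-empty -m "superproject"

# Register the external project as a remote, then graft it in under subdir/.
# --squash records the imported history as a single commit.
git remote add remote "$upstream"
git subtree add --prefix=subdir remote "$branch" --squash >/dev/null 2>&1
ls subdir
```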
Updatin’:
git fetch remote branch
git subtree pull --prefix=subdir remote branch --squash
# --squash belongs to add, merge and pull; a plain push does not take it
git subtree push --prefix=subdir remote branch
Con: Rebasin’ with a subtree in your repo is slow and involved.
7.3.2 Taking a cutting to make a sub-project
Use subtree split to prise out one chunk. It has various wrinkles but is fast and easy.
pushd superproject
git subtree split -P project_subdir -b project_branch
popd
mkdir project
pushd project
git init
git pull ../superproject project_branch
Alternatively, to comprehensively rewrite history to exclude everything outside a subdir:
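One built-in way to do this is `git filter-branch --subdirectory-filter`, sketched below in a scratch repository. Git's own documentation now steers people towards the separate git-filter-repo tool for this kind of rewrite, so treat this as one option rather than the recommended one.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"

mkdir project_subdir other
echo "keep" > project_subdir/keep.txt
echo "drop" > other/drop.txt
git add .
git commit --quiet -m "mixed history"

# Rewrite every branch so that project_subdir becomes the repository root;
# commits that touch nothing inside it disappear.
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice and its delay
git filter-branch --subdirectory-filter project_subdir -- --all >/dev/null 2>&1

files=$(git ls-tree -r --name-only HEAD)
echo "$files"
```

This rewrites commit ids, so it is best done on a fresh clone that nobody else has pulled from.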
7.4 Download a sub-directory from a git tree
This worked for github at least, until GitHub retired its Subversion bridge. I think it works for anything running git-svn?
- replace tree/master => trunk
- svn co the new url
9 Conventions
Many possible workflows are feasible with git. Large teams often have elaborate conventions for what to name branches and who gets to merge what with whom. Here are some I have seen in the wild:
- Conventional Commits
- Trunk-based Development
- GitHub flow (which is not the same as gitflow)
Some of these systems have associated helpers, see next.
10 Helpers
Git has various layers of abstraction, from a very basic infrastructure of plumbing, through higher-level porcelain commands which the authors of git suppose to be user-friendly but which are universally regarded as “a passable first attempt at best”, through to various helper commands that ease various workflows and pain points.
10.1 git-worktree
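git worktree checks out a second branch into its own directory while sharing the same repository, which saves stashing or cloning again. A minimal sketch, with an invented branch name:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
git commit --quiet --allow-empty -m "first"
git branch hotfix

# Check out the hotfix branch in a second directory, leaving this checkout alone.
scratch=$(mktemp -d)
git worktree add "$scratch/hotfix" hotfix >/dev/null 2>&1

# Both working directories are listed, each with its own branch.
git worktree list
checked_out=$(git -C "$scratch/hotfix" symbolic-ref --short HEAD)
```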
10.2 git-branchless
arxanas/git-branchless: Branchless workflow for Git
The branchless workflow is designed for use in a repository with a single main branch that all commits are rebased onto. It improves developer velocity by encouraging fast and frequent commits, and helps developers operate on these commits fearlessly.
In the branchless workflow, the commits you’re working on are inferred based on your activity, so you no longer need branches to keep track of them. Nonetheless, branches are sometimes convenient, and git-branchless fully supports them. If you prefer, you can continue to use your normal workflow and benefit from features like git sl or git undo without going entirely branchless.
10.3 git-undo
Also: git undo: We can do better
How is it so easy to “lose” your data in a system that’s supposed to never lose your data?
Well, it’s not that it’s too easy to lose your data — but rather, that it’s too difficult to recover it. For each operation you want to recover from, there’s a different “magic” incantation to undo it. All the data is still there in principle, but it’s not accessible to many in practice.
…To address this problem, I offer git undo
10.4 gerrit
Gerrit is a code review system for git.
10.5 legit
legit simplifies feature branch workflows.
10.6 rerere
Not repeating yourself during merges/rebases? git rerere automates this:
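Switching it on is a single setting, shown here globally:

```shell
# Record how each conflict gets resolved, and replay that resolution
# automatically if the same conflict turns up again.
git config --global rerere.enabled true

# Print the setting back to confirm it took.
git config --global --get rerere.enabled
```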
11 Importing some files across a branch
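A sketch of one way to do it: `git checkout BRANCH -- PATH` copies a file from another branch without merging anything else. Branch and file names are invented.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
git symbolic-ref HEAD refs/heads/main
git commit --quiet --allow-empty -m "start"

git checkout --quiet -b experiment
echo "useful" > helper.txt
git add helper.txt
git commit --quiet -m "add helper on experiment"
git checkout --quiet main

# Copy one file from the other branch into the working tree and the index.
git checkout experiment -- helper.txt
imported=$(cat helper.txt)
echo "$imported"
```

The file arrives already staged, ready to commit on the current branch.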
12 Garbage collecting
In brief, this will purge a lot of stuff from a constipated repo in emergencies:
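A sketch of the usual incantation, demonstrated on a scratch repository with one deliberately orphaned object:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
git commit --quiet --allow-empty -m "first"

# Store an object that nothing refers to, to stand in for accumulated junk.
lost=$(echo "junk" | git hash-object -w --stdin)

# Forget reflog entries immediately, so old commits stop being protected...
git reflog expire --expire=now --all
# ...then repack and delete every unreachable object right away.
git gc --quiet --prune=now --aggressive

if git cat-file -e "$lost" 2>/dev/null; then
    state="still there"
else
    state="purged"
fi
echo "$state"
```

This throws away git's safety net: anything that could only be recovered through the reflog is gone afterwards.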
13 Editing history
13.1 Cleaning out all big files
bfg does that:
13.2 Deleting specific things
I think bfg also does this. There is also native support:
13.3 Exporting a minimal history from some repository
i.e. Exporting a branch for a client/collaborator, which they should still operate on in git, but which does not contain all the potentially proprietary stuff in the main repo. Ideally they should see one commit with no history.
If the merge history is clean, there is no need to be fancy; if I have a branch which has never merged in any secret information then I can just push it to a new repository and it won’t bring along any of the secret stuff.
OTOH, research code is often unhygienic and chaotic, so we might need to be more careful.
Option 0: Export a tarball and then forget about git:
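A sketch using `git archive`; the output location and the `export/` prefix are arbitrary choices.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
echo "hello" > README.txt
git add README.txt
git commit --quiet -m "first"

# Pack the files of the current commit, and none of the history, into a tarball.
out=$(mktemp -d)
git archive --format=tar.gz --prefix=export/ -o "$out/export.tar.gz" HEAD

listing=$(tar -tzf "$out/export.tar.gz")
echo "$listing"
```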
13.3.1 Option 1: squash the whole thing onto a single commit.
I don’t know a faster way of doing this than the classic:
Addendum: possibly this bash hack is superior?
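I am not certain which hack was meant, but one candidate is to build a parentless commit from the current tree with `git commit-tree` and move the branch onto it. A sketch in a scratch repository:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
for n in 1 2 3; do
    echo "version $n" > file.txt
    git add file.txt
    git commit --quiet -m "commit $n"
done

# Create a commit with no parents that holds exactly the current tree,
# then point the current branch at it. The files themselves are unchanged.
squashed=$(git commit-tree 'HEAD^{tree}' -m "Squashed export")
git reset --quiet "$squashed"

count=$(git rev-list --count HEAD)
echo "commits on branch: $count"
```

The old commits remain reachable through the reflog until it expires, so doing this in a fresh clone is safer when the point is to hide history.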
13.3.2 Option 2: create an orphan branch and copy the files over
git checkout --orphan temp_branch
git add -A
git commit -am "Initial commit"
git branch -D master
git branch -m master
git push -f origin master
Merging with branches created this way can be tedious, however.
13.3.3 Option 3: Serendipitous orphan
Create an orphaned commit that exactly matches an existing commit:
TREE=`git cat-file -p master |sed '1,/^$/s/^tree //p;d;'`
COMMIT=`echo Truncated tree | git commit-tree $TREE`
git branch truncated-master $COMMIT
git branch backup-master-just-in-case-i-regret-it-later master
git push -f origin truncated-master:master
I think this possibly allows us to easily cherry-pick commits against the new tree and return to the original.
14 Making git work with a broken-permission FS
e.g. you are editing a git repo on NTFS via Linux and things are silly.
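The usual fix, sketched here on a scratch repository, is to tell git to ignore the executable bit:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Stop git from treating executable-bit flips as modifications in this repository.
git config core.fileMode false

git config --get core.fileMode
```

The setting is per repository here; it affects only mode changes, not edits to file content.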
15 Detecting if there are changes to commit
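A sketch based on `git status --porcelain`, the same check the emergency commit script uses:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.com"
echo "one" > f.txt
git add f.txt
git commit --quiet -m "first"

# Make an uncommitted edit.
echo "edit" >> f.txt

# --porcelain prints one stable, script-friendly line per changed path;
# empty output means there is nothing to commit.
if [ -n "$(git status --porcelain)" ]; then
    state="dirty"
else
    state="clean"
fi
echo "$state"
```

Untracked files count as changes here; `git status --porcelain --untracked-files=no` leaves them out.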
16 Emergency commit
Oh crap I’m leaving the office in a hurry and I just need to get my work into git ASAP for continuing on another computer. I don’t care about sensible commit messages because I am on my own private branch and no-one else will see them when I squash the pull request.
I put this little script in a file called gitbang to automate this case.
#!/usr/bin/env bash
# I’m leaving the office. Capture all changes in my private branch and push to server.
if output=$(git status --porcelain) && [ -z "$output" ]; then
    echo "nothing to commit"
else
    git add --all && git commit -m bang
fi
git pull && git submodule update --init --recursive && git push
Or: this could be a fish function why not
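function gitbang --argument-names remoteref --description "commit everything and push to remote"
    # If a remote branch name was given, push HEAD there (assumes the remote is called origin).
    set -l pushrel
    test -n "$remoteref"; and set pushrel origin HEAD:$remoteref
    set -l gitstatus (git status --porcelain)
    if test -z "$gitstatus"
        echo "nothing to commit"
    else
        git add --all; and git commit -m bang
    end
    # With no argument, $pushrel is empty and this is a plain git push.
    git pull; and git submodule update --init --recursive; and git push $pushrel
end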
Pro tip: if you use VS code there is a feature called Cloud Changes that synchronises your changes to the cloud, so you can pick up where you left off on another computer without arsing about with git.
17 Git hosting
- Codeberg Documentation is a gitea host
- Gitlab
- github
Or maybe avoid the need for hosting by using…
18 git email workflows
19 Content-specific diffing
Tools such as git-latexdiff provide custom diffing for, in this case, LaTeX code. These need to be found on a case-by-case basis.
20 SSH credentials
Managing SSH credentials in git is non-obvious. See SSH.
21 Jupyter
For sanity in git+jupyter, see jupyter.
22 Decent GUIs
See Git GUIs.
23 Which repo am I in?
For fish and bash shell, see bash-git-prompt.
24 Data versioning
See data versioning.