Hi,
I wanted to start a new thread about beman.infra; the existing discussion is here. In this
thread, I want to give a brief summary of the problem, talk about some work I’ve done to
address it, and focus the discussion specifically on the git workflow that we want to use.
The background is that the beman.exemplar repository is simultaneously:
- A template that we want users to base new projects on
- A collection of tooling and best practices that we are continually updating as we evolve
our recommendations for Beman libraries
The problem is that there’s no good way to propagate updates to our template into projects
that have already been generated from it.
To address this, we want to try moving some parts of exemplar into a separate repository
that other projects can pull in as a dependency.
So far, we’ve built up a consensus that we don’t want to use git for dependency management
in Beman. However, in the specific case of beman.infra, I think that there is an important
technical reason to make an exception. The reason is that beman.infra is the only
dependency that we need to already be available in order to even launch a CMake configure,
because we want beman.infra to contain the CMake toolchain files. Every other dependency
can be configured by CMake itself, but with beman.infra, there’s a chicken-and-egg problem
that makes git dependency management the only solution.
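To make the chicken-and-egg problem concrete, here is a minimal shell sketch of a configure wrapper. The toolchain path infra/cmake/example-toolchain.cmake and the configure function are hypothetical placeholders, not beman.infra’s actual layout; the point is only that, in this workflow, the toolchain file has to be on disk before cmake runs.

```shell
#!/bin/sh
# Hypothetical path: beman.infra's real layout may differ.
TOOLCHAIN_FILE=infra/cmake/example-toolchain.cmake

# Illustrative wrapper: refuse to configure until beman.infra is checked out.
configure() {
  # In this workflow the toolchain file is given to CMake on the command
  # line, so it has to exist before cmake is invoked.
  if [ ! -f "$TOOLCHAIN_FILE" ]; then
    echo "error: $TOOLCHAIN_FILE not found; run 'git submodule update --init'" >&2
    return 1
  fi
  cmake -B build -S . -DCMAKE_TOOLCHAIN_FILE="$TOOLCHAIN_FILE"
}
```

In a checkout where infra/ is still empty, configure prints the hint and returns a non-zero status instead of starting CMake.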
I’ve put up two pull requests representing potential approaches for adding beman.infra to
beman.exemplar via git:
- PR 157 adds beman.infra to beman.exemplar as a git subtree
- PR 158 adds beman.infra to beman.exemplar as a git submodule
Here is my brief summary of the conceptual differences between git submodule and git
subtree.
Adding a git submodule to a repository is essentially equivalent to storing a tuple of
{URL, commit hash, subdirectory path} inside the repository. Then, various git submodule
commands are available which can reify the tuple by checking out the repository into the
specified path at the specified commit, or can update the tuple’s properties to reflect
changes to the URL, commit hash, or subdirectory path.
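One way to see the tuple is to build throwaway repositories and inspect what git submodule add records. This is a minimal sketch assuming a POSIX shell and a recent git; local paths stand in for real URLs, which is the only reason it passes protocol.file.allow=always (recent Git releases block local-path submodule clones by default).

```shell
#!/bin/sh
set -eu
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Local paths stand in for real URLs; recent Git blocks local-path
# submodules unless protocol.file.allow=always is set.
g() { git -c protocol.file.allow=always "$@"; }

WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for the dependency (beman.infra in our case).
git init -q dependency
echo "toolchain files" > dependency/README
git -C dependency add README
git -C dependency commit -q -m "Initial commit"

# The parent records the dependency as a submodule.
git init -q parent
cd parent
echo "parent" > README
git add README
git commit -q -m "Initial commit"
g submodule add "$WORK/dependency" dependency
git commit -q -m "Add 'dependency' as submodule"

# URL and path are stored in .gitmodules ...
URL=$(git config -f .gitmodules submodule.dependency.url)
SUBPATH=$(git config -f .gitmodules submodule.dependency.path)
# ... and the commit hash is a gitlink entry (mode 160000) in the tree.
PINNED=$(git rev-parse HEAD:dependency)
echo "url=$URL path=$SUBPATH commit=$PINNED"
git ls-tree HEAD dependency
```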
The principal objection to submodules is that the submodule’s subdirectory is not
automatically kept in sync with the tuple; keeping it in sync requires manually running
git submodule commands. If the user forgets to run git submodule update --init after a
git clone, the submodule’s subdirectory will be empty; similarly, without running
git submodule update after a git pull, the submodule’s subdirectory can be outdated.
On the other hand, git subtrees take advantage of git’s support for merging together
unrelated commit histories in order to incorporate the git commits of the dependency into
the commit graph of the parent. For example, imagine we have a git repo called parent
with the following history:
o Frobnicate widgets
|
o Reticulate splines
|
o Initial commit
And a repo called dependency with the history:
o Reinitialize enigmas
|
o Calibrate flux capacitors
|
o Initial commit
When dependency is added as a subtree, parent’s history will look like:
o Add 'dependency/' from commit '12345abcdef'
|\
| \
| o Reinitialize enigmas
| |
| o Calibrate flux capacitors
| |
| o Initial commit
|
o Frobnicate widgets
|
o Reticulate splines
|
o Initial commit
When a further change is made to dependency that needs to be brought into parent, it’s
incorporated as a merge commit:
o Merge commit '54321fedcba'
|\
| \
| o Ameliorate checksums
| |
o | Add 'dependency/' from commit '12345abcdef'
|\ |
| \|
| o Reinitialize enigmas
| |
| o Calibrate flux capacitors
| |
| o Initial commit
|
o Frobnicate widgets
|
o Reticulate splines
|
o Initial commit
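The subtree history above can be reproduced with the git subtree command. This is a minimal sketch assuming git-subtree is installed (it ships with most Git distributions, though some package it separately); the repositories are throwaway placeholders and the commit messages are borrowed from the example.

```shell
#!/bin/sh
# Assumes the git-subtree command is installed.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no  # accept default merge messages without an editor

WORK=$(mktemp -d)
cd "$WORK"

git init -q dependency
git -C dependency symbolic-ref HEAD refs/heads/main
echo "v1" > dependency/README
git -C dependency add README
git -C dependency commit -q -m "Reinitialize enigmas"

git init -q parent
echo "parent" > parent/README
git -C parent add README
git -C parent commit -q -m "Frobnicate widgets"
cd parent

# Merges dependency's history in: "Add 'dependency/' from commit '<hash>'".
git subtree add --prefix=dependency "$WORK/dependency" main

# The dependency moves on upstream ...
echo "v2" > "$WORK/dependency/README"
git -C "$WORK/dependency" commit -q -am "Ameliorate checksums"

# ... and pulling it in creates another merge: "Merge commit '<hash>'".
git subtree pull --prefix=dependency "$WORK/dependency" main

git --no-pager log --graph --oneline
```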
By contrast, the corresponding git history of parent with submodules would look like:
o Update 'dependency' submodule pointer to '54321fedcba'
|
o Add 'dependency' as submodule
|
o Frobnicate widgets
|
o Reticulate splines
|
o Initial commit
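For comparison, here is a sketch of the submodule pointer bump under the same assumptions (throwaway local repositories, protocol.file.allow=always only because of the local paths). The new commit in parent is an ordinary single-parent commit, so its history stays linear.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Needed only because this demo uses local paths as submodule URLs.
g() { git -c protocol.file.allow=always "$@"; }

WORK=$(mktemp -d)
cd "$WORK"

git init -q dependency
echo "v1" > dependency/README
git -C dependency add README
git -C dependency commit -q -m "Initial commit"
git init -q parent
echo "parent" > parent/README
git -C parent add README
git -C parent commit -q -m "Initial commit"
g -C parent submodule add "$WORK/dependency" dependency
git -C parent commit -q -m "Add 'dependency' as submodule"

# The dependency moves on upstream.
echo "v2" > dependency/README
git -C dependency commit -q -am "Ameliorate checksums"
NEW=$(git -C dependency rev-parse HEAD)

cd parent
# Move the submodule checkout (parent/dependency) to the new commit ...
g -C dependency fetch -q origin
git -C dependency checkout -q "$NEW"
# ... then record the new pointer as an ordinary single-parent commit.
git add dependency
git commit -q -m "Update 'dependency' submodule pointer to '$NEW'"

git --no-pager log --oneline
```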
Although either option is an improvement on the status quo, I would prefer that we add
beman.infra to beman.exemplar as a git submodule rather than a git subtree, for the
following reasons.
Git submodules are a more natural way of maintaining a single source of truth for the
contents of beman.infra. The structure of git submodules forces every change to
beman.infra’s contents to actually live inside of the beman.infra repository, because the
submodule is always fetched from its upstream. These changes can either be on the main
branch or on a feature branch, but they must be encapsulated inside of beman.infra. By
contrast, a git subtree does not force encapsulation of the dependency, which makes it
easier for developers to make local changes to files in their own copy of beman.infra
without contributing them back upstream, potentially leading to messy git merges every
time they update beman.infra.
Git submodules are more flexible with respect to the commit workflows they support. There
is a longstanding holy war between users of merge-based workflows and users of
rebase-based workflows; I’m on the rebase workflow side, and beman.utf_view enforces a ban
on merge commits by unticking GitHub’s “Allow merge commits” checkbox in its settings. I
personally think that merge commits make the git history much more difficult to
interpret. Using a git subtree would require me to add merge commits to my git history,
which breaks my preferred commit workflow, whereas using git submodules wouldn’t break the
commit workflow of developers who prefer git merges.
The main objection to git submodules is the need to keep the submodule in sync with
git submodule update commands. I mainly see this as an education problem: it just means
that users who want to consume Beman’s git repositories need to learn how git submodules
work. I think that’s a reasonable requirement, since git is the de facto standard for
version control, with a 94% adoption rate, and almost all git users will eventually
encounter git submodules. Most users are already familiar with them, and any user who
isn’t is probably going to need to learn them at some point anyway.
We would mainly just need to convey:
Run git submodule update --init after you clone, pull, or check out.
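Here is a sketch of the "pull" case of that rule, again with throwaway local repositories and protocol.file.allow=always only because of the local paths. After a plain git pull, git submodule status marks the submodule with a leading + because its checkout no longer matches the recorded commit; git submodule update --init brings it back in sync.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Needed only because this demo uses local paths as submodule URLs.
g() { git -c protocol.file.allow=always "$@"; }

WORK=$(mktemp -d)
cd "$WORK"

git init -q dependency
echo "v1" > dependency/README
git -C dependency add README
git -C dependency commit -q -m "Initial commit"
git init -q parent
echo "parent" > parent/README
git -C parent add README
git -C parent commit -q -m "Initial commit"
g -C parent submodule add "$WORK/dependency" dependency
git -C parent commit -q -m "Add 'dependency' as submodule"
g clone -q --recurse-submodules parent clone

# Upstream: the dependency moves on and the parent bumps its pointer.
echo "v2" > dependency/README
git -C dependency commit -q -am "Ameliorate checksums"
g -C parent/dependency fetch -q origin
git -C parent/dependency checkout -q "$(git -C dependency rev-parse HEAD)"
git -C parent add dependency
git -C parent commit -q -m "Update 'dependency' submodule pointer"

cd clone
g pull -q --ff-only
# A leading '+' means the checkout no longer matches the recorded commit.
STATUS=$(git submodule status)
echo "$STATUS"
g submodule update --init
```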
That’s sufficient to consume the library. It gets more complicated if you want to
contribute your own update to the submodule, but no more complicated than the git subtree
contribution workflow.
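For the contribution case, one possible workflow is sketched below: commit on a feature branch inside the submodule, push that branch to the dependency’s own repository, then record the new commit in the parent. The branch name my-change is hypothetical, the repositories are throwaway local ones, and pushing to a non-bare local repository only works here because the pushed branch isn’t checked out there.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Needed only because this demo uses local paths as submodule URLs.
g() { git -c protocol.file.allow=always "$@"; }

WORK=$(mktemp -d)
cd "$WORK"

git init -q dependency
echo "toolchain files" > dependency/README
git -C dependency add README
git -C dependency commit -q -m "Initial commit"
git init -q parent
echo "parent" > parent/README
git -C parent add README
git -C parent commit -q -m "Initial commit"
g -C parent submodule add "$WORK/dependency" dependency
git -C parent commit -q -m "Add 'dependency' as submodule"

cd parent/dependency
# Work on a feature branch inside the submodule (hypothetical name).
git checkout -q -b my-change
echo "improved toolchain" >> README
git commit -q -am "Improve toolchain"
# The change has to land in the dependency's own repository ...
g push -q origin my-change
cd ..
# ... and the parent then records the new commit like any pointer bump.
git add dependency
git commit -q -m "Update 'dependency' submodule pointer to my-change"
```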
Finally, the git subtree workflow is dissimilar to the more typical dependency workflows we use elsewhere. If my library depends on gtest, I specify in CMake that I want gtest to be made available to my library, or do something like adding it to a package manager lockfile, but I don’t incorporate gtest’s entire git history into my repo.
Thanks for reading, and let me know what you think.