Kiriakos Krastillis's Blog

Why git subtree is not the answer

A lot has been said about git submodules lately. Many well-known developers seem to subscribe to the opinion that a native submodule primitive in a VCS is somehow a bad thing, and they start wildly proposing alternative solutions. In my role as a technical lead at Glanzkinder I was confronted with similar arguments more than once. Many times we launched into deep research and review procedures to investigate these claims. Now, some years later, we are still using submodules. We also never had any of the problems that many people seem to advertise. From time to time a less experienced new hire might come up with questions and the idea to scrap git submodule altogether, but that's about it. In fact, the criticism itself only ever came from younger developers who weren't yet completely fluent in things like Unix, protocols and VCS.

After collecting my thoughts for a while and reading up on our own past research, two big issues come to mind. As the analysis will show, they actually boil down to one specific problem that isn't even near the software layer.

The "this code doesn't exist!" issue. People who don't understand how git and the underlying transport protocols work tend to get lost in submodule hell. Usually it is a case of not following best practices. Not using git over ssh or the git protocol, or not authenticating via certificates, can lead to states where submodules don't come along automatically when checking out a project. This of course isn't a big issue; most experienced users can easily cobble together a manual fetch of the submodules via the CLI. Even if you are not experienced, all you have to do is look at the documentation and you are set for life.
The version conflict problem. In nature this is another variation of the "this code doesn't exist!" problem: people missing a version bump of a submodule. It happens when someone updates a submodule in their clone of the repository and pushes. The dependency on the new version of the submodule will, correctly, also propagate to the including repo. Users pulling the new commits of the base repo will also have to pull the changes to the submodule. So what if the submodule repo hasn't been pushed to, and the changes only exist on the local machine of the developer who made them? That is very unfortunate indeed, isn't it? One can of course just roll back that commit, but that isn't the point here. When using modules in a project, one has to comprehend the consequences. You can't go along and treat the whole repo as one monolithic block. You have to be conscious of the fact that the submodules must have a release lifecycle separate from the base repo. This also means that making changes to the submodule from within the base repo is actually non-standard procedure. Of course you have the flexibility to do it, but when committing, the changes to the submodule must be released (i.e. pushed to the upstream repo) before the base repo changes.
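Both issues can be spotted with stock git commands. The following is only a minimal sketch, assuming a small Python helper is an acceptable place for such a check; the names `parse_submodule_status`, `missing_submodules`, `check_repo`, `FETCH_CMD` and `SAFE_PUSH_CMD` are made up for this illustration, while the git invocations themselves are standard. It reads the one-character prefix that `git submodule status` prints in front of each submodule.

```python
import subprocess

# Prefix characters documented for `git submodule status`:
#   "-" not initialized, "+" checked-out commit differs from the recorded one,
#   "U" merge conflicts, " " in sync with the base repo.
STATES = {"-": "uninitialized", "+": "out-of-sync", "U": "conflict", " ": "ok"}

# Standard git invocations; only the Python names are hypothetical.
FETCH_CMD = ["git", "submodule", "update", "--init", "--recursive"]
SAFE_PUSH_CMD = ["git", "push", "--recurse-submodules=check"]


def parse_submodule_status(output):
    """Map submodule path -> state from `git submodule status` output.

    Simplification: paths containing spaces are not handled.
    """
    states = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        flag, fields = line[0], line[1:].split()
        # fields is [<sha1>, <path>] plus an optional "(<describe>)" suffix
        states[fields[1]] = STATES.get(flag, "unknown")
    return states


def missing_submodules(states):
    """Paths whose code "doesn't exist" yet in this checkout."""
    return sorted(p for p, state in states.items() if state == "uninitialized")


def check_repo(repo_dir="."):
    """Run git in repo_dir and return the uninitialized submodule paths."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return missing_submodules(parse_submodule_status(result.stdout))
```

If `check_repo()` returns any paths, running `git submodule update --init --recursive` fetches them, and cloning with `git clone --recurse-submodules` avoids that state in the first place. For the version conflict problem, `git push --recurse-submodules=check` aborts the push of the base repo when a referenced submodule commit is not available on any remote of the submodule.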

So this class of problems actually stems from a lack of understanding and discipline rather than from drawbacks of the software. If you are encountering such problems in your teams, the real culprit is not your git strategy but overall compliance with logical workflows. Try looking at the project from further away and reflect a bit more on dev operations; such problems will explain and solve themselves eventually once your basics are solid.

In my opinion the drawbacks of using git subtree are not acceptable:

You must learn about a new merge strategy (i.e. subtree).
This will actually break every sophisticated git workflow, whether you have automated scripts doing things or are doing anything advanced with git at all.
Contributing code back upstream for the sub-projects is slightly more complicated.
Yes it is!
The responsibility of not mixing super and sub-project code in commits lies with you.
If your projects get busy enough, you know you will get in trouble with this one eventually.
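The last point can at least be made visible. As a rough sketch, assuming the sub-project lives under a single directory prefix (the prefix `vendor/lib` and the function names `touches_both` and `changed_paths` are hypothetical), a script could flag commits that mix super-project and sub-project files:

```python
import subprocess


def touches_both(paths, prefix):
    """True if the paths fall both inside and outside the sub-project prefix."""
    prefix = prefix.rstrip("/") + "/"
    inside = any(p.startswith(prefix) for p in paths)
    outside = any(not p.startswith(prefix) for p in paths)
    return inside and outside


def changed_paths(commit, repo_dir="."):
    """List the files changed by a commit.

    Simplification: root commits and merge commits are not handled,
    since plain `git diff-tree` prints nothing for them.
    """
    result = subprocess.run(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

A call such as `touches_both(changed_paths("HEAD"), "vendor/lib")` would then tell you whether the latest commit mixes the two. With submodules no such check is needed, because the sub-project has its own repository and its own commits.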

Still reading: