From the title I expected it to be a tool for treating multiple separate repos as though they were all just one single monorepo. But from the description in the README, it seems to be for treating subsets of a monorepo as though they were separate repositories.
PS: The title, in case it is changed, is currently “josh: Get the advantages of a monorepo with multirepo setups”
That was what I expected from the description as well. After reading the readme it's not clear to me what problem this is trying to solve, and why this is the solution.
The problem is that large codebases tend to have a huge footprint if you need to clone the whole repo. Git as-is does not allow you to pull only a subset, i.e. specific paths representing a sub-project. That's what josh is trying to solve: a "virtual" repo that behaves like a real git repo but behind the scenes seamlessly integrates with the big monorepo.
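For the curious, the shape of it (going from memory of the README, so the exact proxy flags and URL syntax may differ; the org/repo/path names below are made up): you run the josh proxy in front of the real repo and clone a filtered view through it.

    # run the josh proxy in front of the real monorepo host
    docker run -p 8000:8000 -e JOSH_REMOTE=https://github.com joshproject/josh-proxy

    # clone just one subdirectory of the monorepo as if it were its own repo
    git clone http://localhost:8000/your-org/monorepo.git:/libs/widget.git

    # pushes to this partial clone get translated back into monorepo commits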
I believe git provides that functionality through sparse-checkout. You can clone a repository without checking it out, then use sparse-checkout to only pull the paths you want.
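Roughly (URL and paths are placeholders):

    # clone without populating the working tree
    git clone --no-checkout https://example.com/big-monorepo.git
    cd big-monorepo

    # limit the working tree to the paths you care about
    git sparse-checkout init --cone
    git sparse-checkout set services/payments libs/common

    # now populate only those paths
    git checkout main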
sparse-checkout only reduces the number of files copied from the local repo to the working directory. It doesn't affect the amount of downloaded data. For that you need shallow and partial clones (shallow clones give you a subset of history, partial clones give you a subset of the files within that history). Partial clones especially are a relatively new and not heavily used git feature.
Partial clone with the --filter feature seems complicated to use. You need a bunch of commands to set it up, and then it looks like you still need to be careful while using it.
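For reference, this is roughly the incantation (URL and paths are placeholders):

    # shallow clone: only the most recent commit's history
    git clone --depth 1 https://example.com/big-monorepo.git

    # partial clone: full history, blob contents fetched lazily on demand
    git clone --filter=blob:none https://example.com/big-monorepo.git

    # combining partial clone with sparse-checkout so git only ever
    # fetches blobs under the paths you asked for:
    git clone --filter=blob:none --no-checkout https://example.com/big-monorepo.git
    cd big-monorepo
    git sparse-checkout init --cone
    git sparse-checkout set services/payments
    git checkout main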
disclaimer: I haven't seen for myself what benefits monorepos actually provide so I don't fully grok them.
This is the kind of talk about monorepos that makes me think they are a bad idea. Why would someone want to maintain a monorepo and then pretend it's not a monorepo? Not just pretend it isn't, but invest not-insignificant time in the problem of pretending it's not a monorepo?
I am immediately thinking of the horribleness of how some of the (older) javascript frameworks re-invented the back button (and browser history in general) instead of.. ya know, using the browser.
One big advantage of a monorepo is that when you check out the tree you automatically get the versions of all the files that work together (assuming there's some CI!). If you want to refactor an API you can refactor its callers easily and check the whole thing in. Etc.
With each project having its own repo, you have to track the fact that Foobar 2.2 works with baizo 1.6-1.8 but not more recent versions.
Also conceptually it's easier when you are working with the client and the server at the same time, or the two mobile apps, and so on.
Of course people manage without this when the project has stuff that doesn't fit in a software repo (CAD designs, artwork, etc...there's a reason why that POS Perforce survives, for example). Solidworks has its own proprietary RCS that doesn't work with anything else.
IMHO if the project is relatively small (say <500K LoC) a monorepo is almost always the way to go. But with a big project it breaks down.
It’s basic git that breaks down, not the monorepo model.
And it’s less about LoC, and more about the number of files and how much binary stuff you put in your repo (and how often it changes). Git is really bad when binary data is involved.
Git has the facilities to keep monorepos clicking along (shallow clones and sparse checkout), but they aren’t along the “happy path”.
And many git repository hosting services, like Bitbucket, have limits on how large your repository can be. Sometimes you can upgrade these limits, but the fear of exceeding them often pushes teams toward one repository per module.
That’s an interesting claim considering Google, Facebook and Microsoft run monorepos. Heck, Apple does too internally, although just for the build team (snapshots of each project are submitted to them, but it all goes into a monorepo).
It's simple: monorepos allow (but do not force, nor guarantee) you to make bigger atomic changes to many projects at once.
You can update a library and all the downstream projects in a single commit. There's no race condition or caching problem of pulling an update without pulling/seeing the dependency update. You don't need to wait for dependency artifacts to build and propagate.
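For instance (hypothetical paths):

    # one atomic commit: change the shared library and every caller together
    git checkout -b logging-api-v2
    # ...edit libs/logging plus each service that calls it...
    git add libs/logging services/payments services/auth
    git commit -m "logging: new API, all callers updated in lockstep"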
You can create a turnkey build script that will build the world from source. You can skip any local artifact storage like Artifactory. You don't need to pull multiple repos in a serial fashion, no dependent pulls. You can structure your codebase such that if you pull one commit it can have no other dependencies.
The drawback is that Git happens to not make it easy to pull just one folder. Other tools like Perforce make it trivial.
When you have shared resources between two services (such as React components), you have two choices. You can keep a separate shared repo, which needs its own versioning, and keeping it in sync with development is a pain when multiple people are working on features that touch both shared components and individual services at once. Or you can use a monorepo, where the services and shared components can just be worked on through the same repo.
The same story is true with things like APIs or types where two services need to stay in sync.
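A made-up sketch of the monorepo version:

    monorepo/
      packages/
        shared-components/   # React components used by every app
      services/
        storefront/          # imports from packages/shared-components
        admin/               # same components, same commit, never out of sync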
> I am immediately thinking of the horribleness of how some of the (older) javascript frameworks re-invented the back button (and browser history in general) instead of.. ya know, using the browser.
They did that because the browser didn't support adding to the history via JavaScript.
But even now that the browser does support adding to the history via JavaScript ... is that really just "using the browser"? In many modern web apps, back-button history is not handled by the browser alone. This isn't an ancient thing left behind with old frameworks.
Monorepo is an alternative to having binary dependencies with a registry scheme such as npm or Maven (at the organization level). It's essentially only workable with tooling support that none of us has (unless you work for Google or one of the few other shops that have such in-house tooling). It isn't a workable approach using stock git or GitHub (but that won't stop people from trying, nor from claiming the contrary).
I wouldn't want to do it at Google's scale without Google's tooling. But my experience has been that, at the scales I've worked at (no more than a couple million LOC across the organization), the limiting factor isn't source control, it's the build system. Maven, for example, doesn't really understand monorepos, so it can be a bit difficult to figure out how to implement a policy for deciding what needs to be built when that's less heavy-handed than, "build everything always."
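The usual workaround is to approximate it yourself with git: diff against the last known-good commit and rebuild only the touched modules. A naive sketch (assumes each top-level directory is a Maven module, and "last-green" is a hypothetical tag your CI maintains; it also ignores inter-module dependencies, which is exactly the hard part):

    # list top-level module dirs touched since the last successful build
    changed=$(git diff --name-only last-green..HEAD | cut -d/ -f1 | sort -u)

    # rebuild just those modules
    for module in $changed; do
        (cd "$module" && mvn -q install)
    done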
I apologise for the title. HN has a short limit on title length, so I came up with my own. I thought this title did a decent job of presenting, in a few words, the main application that the authors gave top billing in the README.
I am not affiliated with the project.
josh claims to be reversible, so it could be used in either direction, which is where the multiple use cases come in: treating subsets of a repo as their own repo, or treating multiple repos as one. I would say there is some application overlap between this and git submodule/subtree/subrepo, and also tools like copybara.