From the title I expected it to be a tool for treating multiple separate repos as though they were all just one single monorepo. But from the description in the README, it seems to be for treating subsets of a monorepo as though they were separate repositories.
PS: The title, in case it is changed, is currently “josh: Get the advantages of a monorepo with multirepo setups”
That was what I expected from the description as well. After reading the readme it's not clear to me what problem this is trying to solve, and why this is the solution.
The problem is that large codebases tend to have a huge footprint if you need to clone the whole repo. Git as-is does not allow you to pull only a subset, i.e. specific paths representing a sub-project. That's what josh is trying to solve: a "virtual" repo that behaves like a real git repo but behind the scenes seamlessly integrates with the big monorepo.
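For the curious, the shape of it (going from memory of the README, so the exact proxy flags and URL syntax may differ; the org/repo/path names below are made up): you run the josh proxy in front of the real repo and clone a filtered view through it.

    # run the josh proxy in front of the real monorepo host
    docker run -p 8000:8000 -e JOSH_REMOTE=https://github.com joshproject/josh-proxy

    # clone just one subdirectory of the monorepo as if it were its own repo
    git clone http://localhost:8000/your-org/monorepo.git:/libs/widget.git

    # pushes to this partial clone get translated back into monorepo commits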
I believe git provides that functionality through sparse-checkout. You can clone a repository without checking it out, then use sparse-checkout to only pull the paths you want.
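Roughly (URL and paths are placeholders):

    # clone without populating the working tree
    git clone --no-checkout https://example.com/big-monorepo.git
    cd big-monorepo

    # limit the working tree to the paths you care about
    git sparse-checkout init --cone
    git sparse-checkout set services/payments libs/common

    # now populate only those paths
    git checkout main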
sparse-checkout only reduces the number of files copied from the local repo to the working directory. It doesn't affect the amount of downloaded data. For that you need shallow and partial clones (shallow clones give you a subset of history, partial clones give you a subset of the files within that history). Partial clones especially are a relatively new and not heavily used git feature.
Partial clone with the --filter feature seems complicated to use. You need a bunch of commands to set it up, and then it looks like you still need to be careful while using it.
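For reference, this is roughly the incantation (URL and paths are placeholders):

    # shallow clone: only the most recent commit's history
    git clone --depth 1 https://example.com/big-monorepo.git

    # partial clone: full history, blob contents fetched lazily on demand
    git clone --filter=blob:none https://example.com/big-monorepo.git

    # combining partial clone with sparse-checkout so git only ever
    # fetches blobs under the paths you asked for:
    git clone --filter=blob:none --no-checkout https://example.com/big-monorepo.git
    cd big-monorepo
    git sparse-checkout init --cone
    git sparse-checkout set services/payments
    git checkout main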
disclaimer: I haven't seen for myself what benefits monorepos actually provide so I don't fully grok them.
This is the kind of talk about monorepos that makes me think they are a bad idea. Why would someone want to maintain a monorepo and then pretend it's not a monorepo? Not just pretend it isn't, but invest not-insignificant time in the problem of pretending it's not a monorepo?
I am immediately thinking of the horribleness of how some of the (older) javascript frameworks re-invented the back button (and browser history in general) instead of.. ya know, using the browser.
One big advantage of a monorepo is that when you check out the tree you automatically get the versions of all the files that work together (assuming there's some CI!). If you want to refactor an API you can refactor its callers easily and check the whole thing in. Etc.
With each project having its own repo, you have to track the fact that Foobar 2.2 works with baizo 1.6-1.8 but not more recent versions.
Also conceptually it's easier when you are working with the client and the server at the same time, or the two mobile apps, and so on.
Of course people manage without this when the project has stuff that doesn't fit in a software repo (CAD designs, artwork, etc...there's a reason why that POS Perforce survives, for example). Solidworks has its own proprietary RCS that doesn't work with anything else.
IMHO if the project is relatively small (say <500K LoC) a monorepo is almost always the way to go. But with a big project it breaks down.
It’s basic git that breaks down, not the monorepo model.
And it’s less about LoC, and more about the number of files and how much binary stuff you put in your repo (and how often it changes). Git is really bad when binary data is involved.
Git has the facilities to keep monorepos clicking along (shallow clones and sparse checkout), but they aren’t along the “happy path”.
And many git repository hosting services, like Bitbucket, have limits on how large your repository can be. Sometimes you can upgrade these limits, but the fear of exceeding them often pushes teams toward one repository per module.
That’s an interesting claim considering Google, Facebook and Microsoft run monorepos. Heck, Apple does too internally, although just for the build team (snapshots of each project are submitted to them, but it all goes into a monorepo).
It's simple: monorepos allow (but do not force, nor guarantee) you to make bigger atomic changes to many projects at once.
You can update a library and all the downstream projects in a single commit. There's no race condition or caching problem of pulling an update without pulling/seeing the dependency update. You don't need to wait for dependency artifacts to build and propagate.
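For instance (hypothetical paths):

    # one atomic commit: change the shared library and every caller together
    git checkout -b logging-api-v2
    # ...edit libs/logging plus each service that calls it...
    git add libs/logging services/payments services/auth
    git commit -m "logging: new API, all callers updated in lockstep"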
You can create a turnkey build script that will build the world from source. You can skip any local artifact storage like Artifactory. You don't need to pull multiple repos in a serial fashion, no dependent pulls. You can structure your codebase such that if you pull one commit it can have no other dependencies.
The drawback is that Git happens to not make it easy to pull just one folder. Other tools like Perforce make it trivial.
When you have shared resources between two services (such as React components), you have two choices. You can keep a separate shared repo, which needs its own versioning, and keeping it in sync with development is a pain when multiple people are working on features that touch both shared components and individual services at once. Or you can use a monorepo, where the services and shared components can just be worked on through the same repo.
The same story is true with things like APIs or types where two services need to stay in sync.
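A made-up sketch of the monorepo version:

    monorepo/
      packages/
        shared-components/   # React components used by every app
      services/
        storefront/          # imports from packages/shared-components
        admin/               # same components, same commit, never out of sync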
> I am immediately thinking of the horribleness of how some of the (older) javascript frameworks re-invented the back button (and browser history in general) instead of.. ya know, using the browser.
They did that because the browser didn't support adding to the history via JavaScript.
But even now that the browser does support adding to the history via JavaScript ... is that really just "using the browser"? In many modern web apps, back-button history is not handled by the browser alone. This isn't an ancient thing left behind with old frameworks.
Monorepo is an alternative to having binary dependencies with a registry scheme such as npm or Maven (at the organization level). It's essentially only workable with tooling support that none of us has (unless you work for Google or one of the few other shops that have such in-house tooling). It isn't a workable approach using stock git or GitHub (but that won't stop people from trying, nor from claiming the contrary).
I wouldn't want to do it at Google's scale without Google's tooling. But my experience has been that, at the scales I've worked at (no more than a couple million LOC across the organization), the limiting factor isn't source control, it's the build system. Maven, for example, doesn't really understand monorepos, so it can be a bit difficult to figure out how to implement a policy for deciding what needs to be built when that's less heavy-handed than, "build everything always."
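The usual workaround is to approximate it yourself with git: diff against the last known-good commit and rebuild only the touched modules. A naive sketch (assumes each top-level directory is a Maven module, and "last-green" is a hypothetical tag your CI maintains; it also ignores inter-module dependencies, which is exactly the hard part):

    # list top-level module dirs touched since the last successful build
    changed=$(git diff --name-only last-green..HEAD | cut -d/ -f1 | sort -u)

    # rebuild just those modules
    for module in $changed; do
        (cd "$module" && mvn -q install)
    done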
I apologise for the title. HN has a short limit on title length, so I came up with my own. I thought this title did a decent job of presenting, in a few words, the main application that the authors gave top billing in the README.
I am not affiliated with the project.
josh claims to be reversible, so it could be used in either direction, which is where the multiple use cases come in: treating subsets of a repo as their own repo, or treating multiple repos as one. I would say there is some application overlap between this and git submodule/subtree/subrepo, and also tools like copybara.