
> You introduce a whole new set of failure modes due to going over the network.

A thousand times yes. Distributed systems are hard.

> Debugging is more difficult since you now can no longer step through your program in a debugger but rather have an opaque network request that you can't step into.

Yes. Folks underestimate how difficult this can be.

In theory it should be possible to have tooling to fix this, but I've not seen it in practice.

> You can no longer use editor/IDE features like go to definition.

Not a problem with a good editor.

> Version control becomes harder if the different services are in different repositories.

No organisation should have more than one regular-use repo (special-use repos, of course, are special). Multiple repos are a smell.




> No organisation should have more than one regular-use repo (special-use repos, of course, are special). Multiple repos are a smell.

I would modify this slightly. Larger organizations with independent teams may want to run on per-team repos. Conway's law is an observation about how code structure ends up mirroring organizational structure, but it sometimes also makes for good practice when organizing code deliberately. And of course, sometimes the smell is "this company is organized pathologically".

Another problem is that large monolithic repositories can be difficult to manage with currently available software. Git is no panacea and Perforce isn't either.


> No organisation should have more than one regular-use repo

Flat out wrong for any organization with multiple products. Which, let's be honest, is most of them.


I guess Facebook, Twitter, and Google are doing things "flat out wrong", then. Yes, that's a weak argument (argument from authority) but it is true that monolithic repositories have major advantages even for organizations with multiple products. Common libraries and infrastructure are much easier to work with in monolithic repositories.

My personal take on it, at this point, is that much of our knowledge of how to manage projects (things like individual project repos, semantic versioning, et cetera) is centered on the open-source world of a million mostly-independent programmers. Things change when you work in larger organizations with multiple projects. You even start to revisit basic ideas like semantic versioning in favor of other techniques like using CI across your entire codebase.


Those are huge organizations with commensurately large developer resources, and they simply work at a different scale than most people on HN. "It works for Google" is not an argument for anything.

Monorepos come with their own challenges. For example, if any of your code is open source (which means it must be hosted separately, e.g. on GitHub), you have to sync the open-source version with your private monorepo version.

Monorepos are large. Having to pull and rebase against unrelated changes on every sync puts an onerous burden on devs. When you're remote and on the road, limited bandwidth can block your ability to even pull.

And if you're going to do it like Google, you'll vendor everything -- absolutely everything (Go packages, Java libraries, NPM modules, C++ libraries) -- which requires building a whole toolchain to handle syncing with upstream, as well as a rigid workflow to prevent your private, vendored fork from drifting away from upstream.
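For a sense of what the syncing half of that toolchain can look like, here's a minimal sketch (the vendor/manifest.json layout is entirely hypothetical) that just reports vendored packages whose upstream branch has moved past the commit you pinned:

    #!/usr/bin/env python3
    # Sketch: flag vendored packages whose upstream has moved past our pin.
    # Assumes a hypothetical vendor/manifest.json like:
    #   [{"name": "libfoo",
    #     "upstream": "https://github.com/example/libfoo.git",
    #     "branch": "main",
    #     "pinned": "<full commit sha>"}]
    import json
    import subprocess

    def upstream_head(url: str, branch: str) -> str:
        # `git ls-remote <url> <ref>` prints "<sha>\t<ref>" for a matching ref.
        out = subprocess.check_output(
            ["git", "ls-remote", url, f"refs/heads/{branch}"], text=True)
        return out.split()[0] if out.strip() else ""

    def main() -> None:
        with open("vendor/manifest.json") as f:
            packages = json.load(f)
        for pkg in packages:
            head = upstream_head(pkg["upstream"], pkg["branch"])
            if head and head != pkg["pinned"]:
                print(f"{pkg['name']}: upstream {pkg['branch']} is at "
                      f"{head[:12]}, vendored copy is pinned at {pkg['pinned'][:12]}")

    if __name__ == "__main__":
        main()

Catching drift in the other direction (local patches piling up in the vendored copy) needs its own checks, which is part of why the workflow has to be so rigid.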

There are benefits to both approaches. There is no "one right way".


It seems we agree, we are both claiming that "there is no one right way".

I love Git, and I used submodules for years in personal projects. It started with a few support libraries shared between projects, or common scripts for deployment, but it quickly ballooned into a mess. I'm in the process of moving related personal projects to a monolithic repository, and in the process I'm giving up the ability to tag versions of individual projects or provide simple GitHub links to share my code.

Based on these experiences, I honestly think that the only major problem with monolithic repositories is that the software isn't good at handling it, and this problem could be solved with better software. If the problem is solved at some point in the future, I don't think the answer will look much like any of the existing VCSs.

Based on experiences in industry, my observation is that the choice of monolithic repository versus separate repository is highly specific to the organization.


> No organisation should have more than one regular-use repo (special-use repos, of course, are special). Multiple repos are a smell.

Mind elaborating on this?


> > You can no longer use editor/IDE features like go to definition.
> Not a problem with a good editor.

What editor are you thinking of that can jump from HTTP client API calls to the corresponding handler on the server?


> No organisation should have more than one regular-use repo (special-use repos, of course, are special). Multiple repos are a smell.

Totally agree with everything else, but gotta completely disagree on this last point. Monorepos are a huge smell. If there are multiple parts of a repo that are deployed independently, they should be isolated from each other.

Why? Because you're fighting human nature, otherwise. It's totally reasonable to think that once you excise some code from a repo, it's no longer there, but when you have multiple projects all in one repo, different services will be on different versions of that repo, and your change may have changed semantics enough that interaction bugs across systems may occur.

You may think that you caught all of the services using the code you refactored in that shared library, but perhaps an intermediate dependency switched from using that shared library to not using it, and the service using that intermediate library hasn't been upgraded, yet?

When separately-deployable components are in separate repositories, and libraries are actual versioned libraries in separate repositories, these relationships are explicit instead of implicit. Explicit can be `grep`ed, implicit cannot, so with the multi-repo approach you can write tools to verify that all services currently in production are no longer using an older, insecure shared library, or find out exactly which services are talking to which services by the IDLs they list as dependencies.
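As a rough sketch of the first kind of check, assuming local checkouts of the production services under a production-checkouts/ directory and a hypothetical deps.json in each one mapping library names to pinned versions (the paths, manifest format, and bad-version list are all assumptions, not anything standard):

    #!/usr/bin/env python3
    # Sketch: flag services still pinning a known-insecure version of a
    # shared library, based on their explicit dependency manifests.
    import json
    import pathlib

    # Hypothetical list of known-bad pins for a shared library.
    INSECURE = {"shared-auth-lib": {"1.0.0", "1.0.1", "1.1.0"}}

    def check_service(repo: pathlib.Path) -> list[str]:
        manifest = repo / "deps.json"  # hypothetical explicit dependency list
        if not manifest.is_file():
            return []
        deps = json.loads(manifest.read_text())  # {"lib-name": "1.0.0", ...}
        return [f"{repo.name}: {lib}=={version} is on the insecure list"
                for lib, version in deps.items()
                if version in INSECURE.get(lib, set())]

    def main() -> None:
        for repo in sorted(pathlib.Path("production-checkouts").iterdir()):
            for problem in check_service(repo):
                print(problem)

    if __name__ == "__main__":
        main()

The same explicit manifests are what let you answer the "which service talks to which" question mechanically, by reading off the listed IDL dependencies instead of spelunking through code.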

With the monorepo approach, on the other hand, you can get "fun" things like service A inspecting the source code of service B to determine whether a cache should be rebuilt (because who would forget to deploy service A and service B at the same time, anyway...), which is an example I have personally experienced.

My personal belief is that the monorepo approach was a solution from back when DVCSs were all terrible and most people were still on centralized VCSs like Subversion, which couldn't deal with branches and cross-repo dependencies well, so that's just what you had to do; Git and Mercurial, along with the nice language-level package managers, make this a non-issue.

Finally, there's an institutional bias against rocking the boat (which I totally agree with) and changing things that are already working fine, along with a "nobody ever got fired for buying IBM" kind of thing, with Google and Facebook being two prominent companies using monorepos. They can get away with it by having over a thousand engineers each to manage the infrastructure and build/rebuild their own VCSs to deal with the problems inherent to monorepos; most companies don't have the resources and/or skills to replicate that.

EDIT: Oh, I forgot: I'm not advocating a service-oriented architecture as the only way to do things. I'm just advocating that, whatever your architecture, you should isolate the deployables from each other and make all dependencies between them explicit, so you can more easily write tooling to automatically catch bad deploy states, and more easily train new hires on what talks to/uses what, since it's explicitly (and required to be) documented.

If that still means a monorepo for your company's single service and a couple of tiny repos for small libraries you open source, that's fine. If it means 1000 repos for each microservice you deploy multiple times a day, that's also fine (good luck!).

Most likely it means something like 3-10 repos for most companies, which seems like the right range per Miller's Law ( https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus... ) and therefore good for organizing code for human consumption.


> It's totally reasonable to think that once you excise some code from a repo, it's no longer there, but when you have multiple projects all in one repo, different services will be on different versions of that repo, and your change may have changed semantics enough that interaction bugs across systems may occur.

But having multiple repos doesn't prevent the equivalent situation from happening (and, I think, actually makes it much likelier): no matter what, you have to have the right processes in place to catch that sort of issue.

> You may think that you caught all of the services using the code you refactored in that shared library, but perhaps an intermediate dependency switched from using that shared library to not using it, and the service using that intermediate library hasn't been upgraded, yet?

That's the sort of problem which happens with multiple repos, but not (as often) with a single repo.

> Explicit can be `grep`ed, implicit cannot, so with the multi-repo approach you can write tools to verify that all services currently in production are no longer using an older, insecure shared library, or find out exactly which services are talking to which services by the IDLs they list as dependencies.

A monorepo is explicit, too, even more explicit than multiple repos: WYSIWYG. And you can always see if your services are using the same API by compiling them (with a statically-typed language, anyway).

The beautiful thing about a monorepo is that it forces one to confront incompatibilities when they happen, not at some unknown point down the road, when no one knows what changed and why.
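To make that concrete, here's a minimal sketch, with the caveat that it's Python rather than a statically-typed language, so a whole-repo type check (e.g. mypy) or test run stands in for compilation; the paths and names are made up:

    # libs/billing/client.py -- the single shared copy in the monorepo
    def charge(amount_cents: int, currency: str) -> str:
        """Charge a card and return a transaction id (stubbed for the sketch)."""
        return "txn-0000"

    # services/checkout/app.py -- a service calling the shared library
    from libs.billing.client import charge

    def complete_order(total_cents: int) -> str:
        return charge(total_cents, "USD")

    # If charge() later gains a required parameter or changes its return
    # type, one type-check or test run over the whole tree flags every
    # caller immediately, instead of the mismatch surfacing later in a
    # service still deployed against an older copy of the library.

With a statically-typed language, the compiler gives you the same check for free on every build.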



