It took me a while to find out what Canva actually does, but from https://www.canva.com it appears they are an online design / collab tool.
To be fair, that could mean a _lot_ of functionality and code providing a rich, SPA, JS heavy experience. Modern JS frameworks aren't exactly known for being concise.
But still, 60 million lines of code and half a million files is a sure sign that someone said at one point "sure, we can throw all these generated files into git!". Didn't someone on the team say "hey, it takes 10 seconds to run git status, can we move this junk out and do this another way??"
Given that 70% of their repo is generated files, that discussion and the tradeoffs involved don't get nearly enough attention from OP.
> Didn't someone on the team say "hey, it takes 10 seconds to run git status, can we move this junk out and do this another way??"
Why do you assume they didn't?
Just because they arrived at a different conclusion than you doesn't mean they didn't think about it. It might very well mean you didn't consider the tradeoffs they had to take into account, mainly because you're out of the loop.
One reason for doing this is that someone might be using a very small set of the monorepo, so having everything prebuilt means that in general things are very fast, without constantly having to rebuild chunks of your system.
This is, I think, very common with stuff like browsers, where you have artifact checkouts that basically include built stuff since otherwise you're sitting there compiling forever all the time.
Stuff like Bazel in theory helps with this, but such tools are either super idiosyncratic about how they work (meaning hard to adopt) or outright don't work.
I mean personally I would find that pretty annoying and unclean but I like DAGs.
The post itself covers in the intro that they made the conscious decision to go with a monorepo, accepting the downsides of it. Much more than this article though, I'd like to read one discussing that decision and why they went that direction.
Canva gives users without design tools expertise the ability to make fairly polished looking graphics with a super easy and intuitive interface. (As a designer, I can assure you that polished looking is not the same thing as designed.) It’s a very popular service, so they’re dealing with huge scale. Intuitive interfaces often come with complex mechanisms and lots of assets, and they have clients on every major mobile and desktop platform, and the web. They also do a ton of heavy graphics processing that is likely done in lower-level languages than the interface.
The likelihood of a 2000 employee software company simply not considering that they could streamline their build process is pretty slim.
They have obviously invested a lot over time into streamlining their build process: so much so that they're publishing an article about it.
All of the problems they are having are basically due to their use of a monorepo: they do explain that they made the decision early, but I wonder what advantages they are seeing over multiple repos that made all this trouble worth it?
We can talk about the advantages of monorepos, but your question is phrased in a way that makes me think that you don't see any "trouble" in multiple repositories.
I would encourage you to do some research and keep an open mind.
They seem to have purposely left that out of the scope of this article. There are myriad articles about the benefits and downsides of monorepos vs multiple repos.
In another sibling comment I mentioned how I looked through their engineering blog and didn't see any post where they have talked about any of the benefits they are enjoying due to their use of monorepo.
The question is specifically about their use case, since not everybody would hit the same bottlenecks as they did with git monorepos.
In other words, have they stopped and thought about whether it's still worth it (e.g. how often do their engineers make use of monorepo benefits like cross-project refactorings)?
Don't rely on this anecdotal heuristic. Have a look at some enterprises. My experience: "How to solve scaling issues?" - "Automation? No, another team." ;)
This is not a 15k employee enterprise riveting features onto a codebase from the 90s; it's a <10yo company that makes one product. Big difference in approach.
Hey hey, author here. The xlf files are translations that are coupled with the texts we set in the code, so they're not really generated; I admit that was misleading. What I wanted to get across is that they're not touched directly by engineers, but they're still created through our translation pipeline, where real humans translate them.
How do you split your XLIFF files? Does each project get one big one and the proliferation is simply due to number of languages, or do you have a more granular split (eg. if you've got one component, it will have dozens of XLIFF files for every language, instead of one per language)?
By the numbers you mention, 70% of the files being translations puts the ratio of code files to translation files at roughly 1:3, so unless you only support 3-5 languages it's definitely not one XLIFF file per source file. So at what granularity is it split?
(My experience is mostly with localizations using GNU gettext tools, and you usually do a small-finite-number of PO files per project per language, where that small-finite number is exactly one for like 99% of projects)
One of my biggest pet peeves: tracking generated files in version control.
The only exception is our generated OpenAPI spec, because we want people to be explicit about modifying the API, and have a CI task that verifies that the API and OpenAPI spec match.
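For what it's worth, that kind of check can be a pretty small script. Here's a minimal sketch in Python; the "make generate-openapi" target and the docs/openapi.json path are made-up names, so substitute whatever your project actually uses:

    #!/usr/bin/env python3
    # CI check: fail if the committed OpenAPI spec no longer matches the code.
    # Sketch only -- the generator command and spec path are hypothetical.
    import json
    import subprocess
    import sys
    from pathlib import Path

    COMMITTED_SPEC = Path("docs/openapi.json")     # hypothetical committed location
    REGENERATED_SPEC = Path("/tmp/openapi.json")   # scratch output

    # Regenerate the spec from the current code (hypothetical make target).
    subprocess.run(["make", "generate-openapi", f"OUT={REGENERATED_SPEC}"], check=True)

    committed = json.loads(COMMITTED_SPEC.read_text())
    regenerated = json.loads(REGENERATED_SPEC.read_text())

    if committed != regenerated:
        print("OpenAPI spec is out of date: regenerate it and commit the result.")
        sys.exit(1)
    print("OpenAPI spec matches the code.")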
One alternative that enables you to keep generated files out but still feel like there's an explicit human check in place is to add a gated confirmation step in CI to confirm that the changes to the generated spec match expectations.
Something like: "This change will result in the following new API endpoints: ... do you wish to continue?"
What does it compare against, though? Do we need to add more state to the CI? We kinda like having the interface be part of version control, with an audit chain that's part of the code.
Yeah, either you'd have to maintain the generated spec as a versioned artifact hosted wherever is most convenient for you, or the CI could actually generate the before and after specs based on the PR diff. If calculating the spec is computationally expensive (it shouldn't be), then the latter approach could be a problem.
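And the "here are the new endpoints" part is basically a set difference over the two specs' paths. A rough sketch in Python, assuming earlier CI steps have already produced the base and head specs as JSON files (how you generate those is up to your pipeline):

    #!/usr/bin/env python3
    # Print API endpoints added between two OpenAPI specs (base vs. head).
    # Sketch only: both spec files are assumed to be produced by earlier CI steps.
    import json
    import sys

    HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head"}

    def endpoints(spec):
        # Collect (METHOD, path) pairs from the OpenAPI "paths" object.
        return {
            (method.upper(), path)
            for path, ops in spec.get("paths", {}).items()
            for method in ops
            if method.lower() in HTTP_METHODS
        }

    with open(sys.argv[1]) as f:   # spec generated from the PR's base commit
        base = json.load(f)
    with open(sys.argv[2]) as f:   # spec generated from the PR's head commit
        head = json.load(f)

    added = sorted(endpoints(head) - endpoints(base))
    if added:
        print("This change will result in the following new API endpoints:")
        for method, path in added:
            print(f"  {method} {path}")
        sys.exit(1)   # non-zero exit lets CI gate here for manual approval
    print("No new API endpoints.")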
Lock files protect you from the version changing out from under you, but modules disappearing from NPM is a thing that happens. Yes, you can use artifactory or similar as a proxy but that requires infrastructure that you may not want to run. That is all to say: there are situations where committing node_modules is the least evil.
Well... unless some devs have M1 Macs and some of the Docker layers are not available for arm, or the other way around, not available for amd64. That gives you some interesting issues.
Docker is a congregation of technologies held together with duct tape and glue.
Eg. permissions handling is completely different on Macs with Docker Desktop from the Linux dockerd stuff: on Macs, it automatically translates user ownership for any "mounted" local storage (like your code repository), whereas on Linux user IDs and host system permissions are preserved. Have some developers use Macs and others use Linux, and attempt to do proper permissions setup (otherwise known as "don't run as root"), and you are looking for some fun debugging sessions.
At companies that don't check in node_modules or build folders and are using standard packaging tooling like Maven or Yarn or npm or what-have-you. Yes, I haven't experienced that in like 15 years.
npm didn't support lockfiles until version 5, released in 2017; Yarn had them at launch in 2016. Before that, committing node_modules was often used as a form of vendoring, to get reproducible builds.
If a new project these days commits node_modules to git, it's likely a mistake, but for legacy projects started before 2017 it was the lesser of two evils.
Hm, this project was started in 2017. The node_modules directory was for Serverless (a tool written in Javascript), not the website itself (which was written in AngularJS - probably not the best choice in 2017 either).
Prior to lock files (and potentially after, as checked-in files are beyond trivial to modify and review and that can be worthwhile) committing dependencies in some form was basically the only reasonable way to have reproducible builds, unless you wanted to build your own package manager / lock file implementation.
Based on how brittle GitHub Actions is, I'd be ready to commit node_modules, except that I'm building cross-platform software with native dependencies.
Not everyone has the skills to build the toolset and use it. My brother called last night for help changing some Sass variables in a Bootstrap theme. He's a data scientist and had no idea how to build Bootstrap's JS and apply the new variables. If Bootstrap came from npm fully built, over half of the problems he called me about (15 times!) would have been avoided.
People coming from the SVN world do not think that this is unusual or problematic. And unfortunately even recently I've seen SVN still in use at large legacy companies.
I don't think it's unfortunate. We use subversion for development in our team and it does everything we need it to.
We looked into git and didn't find it offered any features that would significantly improve our process, but found 1000 more ways to shoot ourselves in the foot
For many processes I think SVN is (and has been for many, many years) an absolutely fine method of version control.
> didn't find it offered any features that would significantly improve our process, but found 1000 more ways to shoot ourselves in the foot
You're not wrong about this.
I really like git for the cheap branching, which encourages branching and merging often. But SVN might have cheap branching now, as another commenter implies.
My experience is that anything dealing with a branch, especially but not exclusively creating branches, is very slow in SVN for a repo of any real size, basically anything with a framework.
I do not remember if "stat" was particularly slow, but SVN in general is slow.
Huh. 10 gigabyte svn repo at work spanning about 40 projects. Creating branches is virtually instantaneous. It's just a copy, which is a free operation (just a link). Curious as to why it would be slow for you.
svn cp https://server/svn/trunk/project/ https://server/svn/branches/project/ticket -m "making a branch here"
svn status, even for an entire repo checkout (which is not common) is also fast.
And yeah, it has the virtue of simplicity as well, doing very well at narrow and shallow checkouts, even though I'd love to have mercurial's feature set.
It's also rather good in the "wiki" situation since people can operate on their single files without needing to update, sync and merge.
> Creating branches is virtually instantaneous. It's just a copy which is a free operation (just a link).
Copy is not a "free" operation, but a symlink is close to "free" if you're measuring disk space.
What version of SVN are you using? I'm certain that older SVN versions would actually copy the entire project's files, not symlinks but real copies. That would take forever, and running out of disk space was a real concern.
An svn copy is just a link. It has always been a "free" operation (and yes, the analogy would be to a symlink).
I'm not aware of any version of svn that behaved the way you described and I've used it for a couple of decades.
I can perhaps imagine a large repo plus a broken svn client requiring checking out unneeded portions of trees to do a copy, but no client I've used works like that.
Hm. Another theory. Perhaps someone who knew nothing about svn and was using TortoiseSVN's Windows file manager integration was doing a Windows file manager copy, then checking that in as a "branch", with the only link being the commit message, instead of using svn's copy, which is free and properly links content. That would indeed be an expensive operation, and the wrong thing to do.
These days mercurial has a cached blame/annotate called "fast annotate", which I love because of one particular awesome feature, --deleted, which must be seen to be appreciated, I think.
I have this alias in my .hgrc file: fad=fastannotate -u -n -wbB --deleted
It's by the Facebook engineer Jun Wu, who also made the even more awesome "absorb".
> Given that 70% of their repo is generated files, that discussion and the tradeoffs involved don't get nearly enough attention from OP.
It's perfectly fine to use Git to track things other than source code. In fact, right on the manpage, Git calls itself "the stupid content tracker".
I've been using Git with git-annex to track archival files with their associated metadata.
We keep our data separate from our source code, and segment our data into individual Git repositories for each collection.
Git gives us many features that we would have had to build into our app in other ways (data integrity, fixity, etc), though this came with costs.
To my eye it probably would have been better for Canva to use multiple separate repositories instead of a monorepo, but I'm not them and their use-case is not mine.
I think it is a crowd-dumbing effect. Since hundreds of engineers share the monorepo, no one can or cares to make the decision, or is able to push for an alternative. Even when everyone is complaining, that is still far from everyone agreeing on an alternative. The crowd settles at the lowest common denominator.