Coming from OpenStack land where Ceph is used heavily, it's well known that you shouldn't run production Ceph on VMs and that CephFS (which is like an NFS endpoint for Ceph itself) has never been as robust as the underlying RADOS Block Device (RBD) stuff.
They probably could have saved themselves a lot of pain by talking to some Ceph experts still working inside RedHat for architectural and other design decisions.
I agree with the other poster who asked why they even need a gigantic distributed FS, and that it seems like a design miss.
Also -- if you look at the infra update from the linked article, they mention something about 3M updates/hour to a pg table ([1], slide 9) triggering continuous vacuums. This feels like using a db table as a queue, which is not going to be fun at moderate to high loads.
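To make that concrete: each claimed job means another UPDATE, and in PostgreSQL every UPDATE leaves a dead row version behind for autovacuum to clean up, so a hot queue table gets vacuumed more or less continuously. A rough sketch of what that pattern tends to look like, assuming psycopg2 and a hypothetical `builds` table with a `status` column (not GitLab's actual schema):

```python
# Rough sketch of a job queue built on a PostgreSQL table.
# The `builds` table and its columns are hypothetical, not GitLab's schema.
import time

import psycopg2

conn = psycopg2.connect("dbname=ci")


def claim_next_build():
    """Claim one pending build, or return None if the queue is empty."""
    with conn, conn.cursor() as cur:  # `with conn` commits on success
        cur.execute(
            """
            UPDATE builds
               SET status = 'running'
             WHERE id = (SELECT id
                           FROM builds
                          WHERE status = 'pending'
                          ORDER BY id
                          LIMIT 1
                            FOR UPDATE SKIP LOCKED)
            RETURNING id
            """
        )
        row = cur.fetchone()
        return row[0] if row else None


while True:
    build_id = claim_next_build()
    if build_id is None:
        time.sleep(1)  # idle runners keep polling, which is pure query load
        continue
    # Each claim above rewrote a row; at millions of claims per hour the
    # dead row versions keep autovacuum running almost continuously.
    print("picked up build", build_id)
```

SKIP LOCKED (PostgreSQL 9.5+) keeps workers from blocking each other on the claim, but it doesn't change the write churn that drives the vacuuming.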
It's not updates, it's mostly just querying; updates are not that bad. The main issue there was that the CI runner was keeping a lock in the database while going to the filesystem to get the commit, which generated a lot of contention.
Still, this is something we need to fix in our CI implementation because, as you say, databases are not good queueing systems.
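For readers who want to picture the contention: the problem is a transaction that takes a row lock and then waits on (network) filesystem I/O before releasing it. The sketch below is purely illustrative, with an invented `builds` table and `read_commit` helper rather than GitLab's real code; it contrasts that shape with a variant that commits the claim before touching the filesystem:

```python
# Illustration only: how holding a row lock across filesystem I/O causes
# contention.  Table name, columns, and read_commit are invented.
import psycopg2

conn = psycopg2.connect("dbname=ci")


def read_commit(repo_path, sha):
    """Placeholder for the slow part: reading the commit off the
    (network) filesystem."""
    ...


# Anti-pattern: the SELECT ... FOR UPDATE lock stays held until the
# transaction commits, i.e. for the whole filesystem round trip, so
# everything else that needs this row queues up behind the I/O.
def claim_and_fetch_slow(build_id):
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT repo_path, sha FROM builds WHERE id = %s FOR UPDATE",
            (build_id,),
        )
        row = cur.fetchone()
        if row is None:
            return None
        return read_commit(*row)  # row lock still held during the I/O


# Less contended: record the claim, commit, and only then hit the
# filesystem, so no database lock outlives the quick UPDATE.
def claim_and_fetch(build_id):
    with conn, conn.cursor() as cur:
        cur.execute(
            "UPDATE builds SET status = 'running' "
            "WHERE id = %s AND status = 'pending' "
            "RETURNING repo_path, sha",
            (build_id,),
        )
        row = cur.fetchone()
    if row is None:
        return None
    return read_commit(*row)  # transaction already committed
```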
> They probably could have saved themselves a lot of pain by talking to some Ceph experts still working inside RedHat for architectural and other design decisions.
We have been in contact with RedHat and various other Ceph experts ever since we started using it.
> I agree with the other poster who asked why they even need a gigantic distributed FS, and that it seems like a design miss.
Users can self-host GitLab. Using some complex custom block storage system would complicate this too much, especially since the vast majority of users won't need it.
You're right. We talked to experts, they warned us about running Ceph on VMs, and we tried it anyway; shame on us.
You do need either a distributed FS (GitHub made their own with DGit, http://githubengineering.com/introducing-dgit/, but we want to try to reuse an existing technology) or to buy a big storage appliance.
Bingo! Seasoned developers and architects with 15-20+ years of experience would very likely question using software stacks like CephFS that carry warnings about production use on their own website!
You really want no exotic 3rd-party stuff in your design, just plain-Jane components like ext3 and Ethernet switches.
Choosing a newer exotic distributed filesystem may really come back to bite you in the future.