
1) I made a typo: not "at least" but "at most", meaning each server can store up to 5TB of data; the servers have 2x3TB hard drives. The density is deliberately low because we use cheap hardware that fails regularly, and a small amount of data per server means fast recovery. 5TB is a soft limit and could differ between server groups, but at the moment it doesn't. Each group of 3 servers has a total usable capacity of 5TB because the data is fully mirrored.
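
To make the arithmetic concrete, here is a minimal sketch in Go; the numbers come from the paragraph above, but all the names are hypothetical, not from our actual codebase:

    package main

    import "fmt"

    // All numbers are from the comment above; names are hypothetical.
    const (
        drivesPerServer = 2
        driveSizeTB     = 3
        serversPerGroup = 3 // servers in a group are full mirrors of each other
        softLimitTB     = 5 // per-group soft limit, tunable per group
    )

    func main() {
        rawPerServer := drivesPerServer * driveSizeTB // 6 TB raw
        rawPerGroup := rawPerServer * serversPerGroup // 18 TB raw
        // Every byte is stored on all 3 servers, so usable capacity is one
        // copy's worth, capped by the soft limit rather than by raw space.
        usablePerGroup := softLimitTB
        fmt.Printf("raw: %d TB/server, %d TB/group; usable: %d TB/group\n",
            rawPerServer, rawPerGroup, usablePerGroup)
    }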

2) We have a centralised, replicated metadata layer that is stored on the same servers as the data itself. All chunks of an object are stored on a single server group, so there is no need to contact multiple servers to serve a file: connecting to any one server in the group is enough to get all the data. The metadata for an object may live on a different server group, though. Chunks are replicated to all 3 servers in a group through a sequential append-only log, which ensures that all servers end up with the same data. This can introduce replication lag; if the lag gets too big for some server, that server is removed from the group until its lag is back to normal (usually 1-3 seconds).
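
Schematically, the log plus lag-based eviction works roughly like this. This is a simplified illustration in Go, not our actual code; the types, field names, and the 3-second threshold are my assumptions for the example:

    package main

    import (
        "fmt"
        "time"
    )

    // Entry is one record in the sequential append-only log.
    type Entry struct {
        Seq   uint64
        Chunk []byte
    }

    // Replica tracks how far one server in the group has applied the log.
    type Replica struct {
        Addr       string
        AppliedSeq uint64
        AppliedAt  time.Time
    }

    const maxLag = 3 * time.Second // assumed threshold; 1-3s lag is normal

    // Group is one 3-server group holding identical copies of its chunks.
    type Group struct {
        log      []Entry
        nextSeq  uint64
        replicas []*Replica
    }

    // Append writes a chunk to the log; replicas apply entries strictly in
    // sequence order, so all servers converge on the same data.
    func (g *Group) Append(chunk []byte) {
        g.nextSeq++
        g.log = append(g.log, Entry{Seq: g.nextSeq, Chunk: chunk})
    }

    // Evict removes replicas whose replication lag is too large; a removed
    // server rejoins once it catches back up (not shown here).
    func (g *Group) Evict(now time.Time) {
        live := g.replicas[:0]
        for _, r := range g.replicas {
            behind := r.AppliedSeq < g.nextSeq
            if behind && now.Sub(r.AppliedAt) > maxLag {
                fmt.Printf("removing %s from group (lag > %v)\n", r.Addr, maxLag)
                continue
            }
            live = append(live, r)
        }
        g.replicas = live
    }

    func main() {
        now := time.Now()
        g := &Group{replicas: []*Replica{
            {Addr: "srv-a", AppliedAt: now},
            {Addr: "srv-b", AppliedAt: now},
            {Addr: "srv-c", AppliedAt: now.Add(-5 * time.Second)},
        }}
        g.Append([]byte("chunk-1"))
        g.replicas[0].AppliedSeq = 1
        g.replicas[1].AppliedSeq = 1
        // srv-c never applied entry 1 and last applied 5s ago: evicted.
        g.Evict(now)
        fmt.Printf("replicas left in group: %d\n", len(g.replicas))
    }

The point of the append-only log is that replication order is fixed for everyone, so a lagging server can always catch up by replaying from its last applied sequence number.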

Actually, it is much simpler than I've explained: the data layer, with its replication, consistency, and sharding, is completely transparent to the application layer, and it is really small and simple. Email me at support@s3for.me and I'll share the software details with you; you'll see how simple it is.



