It might need to rely on torrents or anonymity networks to avoid the kind of problems Books3 ran into when some jurisdictions restricted the use and availability of the data.
It also needs some form of deduplication, since I've noticed the same data is often repackaged in different formats or with slightly different specifications.
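As a rough illustration of what that deduplication could look like, here is a minimal sketch. It assumes each item has already been extracted to plain text, so format differences such as PDF vs. EPUB are out of scope. It then works in two passes. The first pass normalizes Unicode, case, punctuation and whitespace and hashes the result, which catches copies that differ only in formatting. The second pass estimates word-shingle Jaccard similarity with MinHash, which catches near-duplicates. All names, the 5-word shingle size, the 64 hash seeds and the 0.8 threshold are hypothetical choices for illustration, not any established pipeline.

```python
import hashlib
import re
import unicodedata
from typing import Iterable


def normalize(text: str) -> str:
    """Strip formatting noise: Unicode form, case, punctuation, whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation/markup symbols -> spaces
    return " ".join(text.split())          # collapse whitespace runs


def shingles(text: str, k: int = 5) -> set[str]:
    """Word k-grams of the normalized text (k=5 is an arbitrary choice)."""
    words = normalize(text).split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(shingle_set: set[str], num_perm: int = 64) -> list[int]:
    """MinHash signature: for each seeded hash, keep the minimum over shingles."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "little")  # a different salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in shingle_set
        ))
    return signature


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dedupe(docs: Iterable[str], threshold: float = 0.8) -> list[int]:
    """Return indices of documents to keep; the first occurrence wins.

    Hypothetical helper: the 0.8 near-duplicate threshold is an assumption.
    """
    seen_exact: set[str] = set()
    kept_signatures: list[list[int]] = []
    kept: list[int] = []
    for i, doc in enumerate(docs):
        digest = hashlib.sha256(normalize(doc).encode()).hexdigest()
        if digest in seen_exact:
            continue  # identical once formatting is normalized away
        sig = minhash(shingles(doc))
        if any(estimated_jaccard(sig, s) >= threshold for s in kept_signatures):
            continue  # near-duplicate of something already kept
        seen_exact.add(digest)
        kept_signatures.append(sig)
        kept.append(i)
    return kept


if __name__ == "__main__":
    docs = [
        "The quick brown fox jumps over the lazy dog.",
        "THE QUICK BROWN FOX\n\njumps over the lazy dog!",  # same text, reformatted
        "A completely different sentence about data pipelines.",
    ]
    print(dedupe(docs))
```

Comparing each new signature against every kept one is quadratic, so at the scale of a real corpus the signatures would normally be bucketed with locality-sensitive hashing (for example the `MinHashLSH` class in the `datasketch` library) so that only candidate pairs get compared.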