Having worked on, maintained, and architected 5 different social backends, and failed at 4 of them, here are my key takeaways:
data storage:
* keep your data structures as simple as possible
* don't chase the hype; stick to old(?), proven technologies
* make your db constraints as strict as possible in the early stages; you can relax them later for performance
* test the key features of your database choices (does transparent sharding really work?); you will see them fail...
* you don't need a graph db, you need a graph-like access layer over your data
* data older than 1 month will (mostly) not be accessed/modified at all; choose your shard key accordingly
* keep duplication of your data as small as possible in the early stages
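To make the "strict constraints" and "graph-like access layer" points concrete, here is a minimal sketch, assuming a plain relational store (SQLite here, only because it ships with Python). The table names, the `SocialGraph` class, and its methods are all made up for illustration; the idea is just that graph-style questions can sit behind a small layer over two boring tables.

```python
import sqlite3

# Hypothetical schema: strict from day one (NOT NULL, UNIQUE, FK, CHECK).
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(id),
    followee_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (follower_id, followee_id),
    CHECK (follower_id <> followee_id)
);
"""


class SocialGraph:
    """Hypothetical graph-like access layer over two plain tables."""

    def __init__(self, conn):
        self.conn = conn
        # SQLite does not enforce foreign keys unless asked to.
        self.conn.execute("PRAGMA foreign_keys = ON")
        self.conn.executescript(SCHEMA)

    def add_user(self, name):
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def follow(self, follower_id, followee_id):
        self.conn.execute(
            "INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)",
            (follower_id, followee_id),
        )

    def following(self, user_id):
        rows = self.conn.execute(
            "SELECT followee_id FROM follows WHERE follower_id = ? "
            "ORDER BY followee_id",
            (user_id,),
        )
        return [r[0] for r in rows]

    def friends_of_friends(self, user_id):
        """Two-hop traversal written as a self-join, no graph db involved."""
        rows = self.conn.execute(
            """
            SELECT DISTINCT f2.followee_id
            FROM follows f1
            JOIN follows f2 ON f2.follower_id = f1.followee_id
            WHERE f1.follower_id = ?
              AND f2.followee_id <> ?
              AND f2.followee_id NOT IN (
                  SELECT followee_id FROM follows WHERE follower_id = ?
              )
            ORDER BY f2.followee_id
            """,
            (user_id, user_id, user_id),
        )
        return [r[0] for r in rows]
```

Callers only see `following()` and `friends_of_friends()`, so the storage underneath can change later without touching them, and the constraints reject bad edges (self-follows, unknown users) at write time.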
indexing:
* the ">1 month old" rule might not apply here
* ACLs can be hard to manage; try to keep them simple
queuing:
* to me this is one of the most important elements. If you want a simple way to keep every component in sync with the others, use event publishing, really.
* have your retry mechanism, dead letter queue, and delayed processing in place
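A rough in-memory sketch of those three pieces together (retry, delayed processing, dead letter queue), assuming exponential backoff as the retry policy. `RetryingQueue` and its parameters are hypothetical names for illustration, not the API of any real broker; a real setup would get the same behavior from the broker's own features.

```python
import heapq
import itertools


class RetryingQueue:
    """Hypothetical in-memory queue with delays, retries and a dead letter list."""

    def __init__(self, max_attempts=3, base_delay=1.0):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.dead_letters = []          # (event, error) pairs that gave up
        self._heap = []                 # (ready_at, seq, attempts, event)
        self._seq = itertools.count()   # tie-breaker so events are never compared

    def publish(self, event, now=0.0, delay=0.0, attempts=0):
        """Schedule an event; it becomes visible at now + delay."""
        heapq.heappush(self._heap, (now + delay, next(self._seq), attempts, event))

    def process_due(self, handler, now):
        """Run handler on every event whose ready time is <= now."""
        while self._heap and self._heap[0][0] <= now:
            _, _, attempts, event = heapq.heappop(self._heap)
            try:
                handler(event)
            except Exception as exc:
                attempts += 1
                if attempts >= self.max_attempts:
                    # Out of retries: park it for a human or a repair job.
                    self.dead_letters.append((event, repr(exc)))
                else:
                    # Retry later with exponential backoff: 1x, 2x, 4x, ...
                    backoff = self.base_delay * 2 ** (attempts - 1)
                    self.publish(event, now=now, delay=backoff, attempts=attempts)
```

Time is passed in as `now` rather than read from a clock, which keeps the sketch deterministic and easy to test; a failing event is retried at 1s and 3s here and then lands in `dead_letters` instead of blocking everything behind it.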
> We can use ActiveMQ, which is the most reliable queuing software.
a very bold statement :)
caching:
* think about your cache invalidation strategy from the beginning
* keep your immutable and dynamic data caches separate in your code (at least visually)
* try not to mix your business logic and cache code
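One way this separation could look, as a sketch only: two cache classes with different rules (immutable data never expires, dynamic data has a TTL plus explicit invalidation), and a thin wrapper so the store underneath never sees cache code. All class names, fields, and sample values below are hypothetical.

```python
import time


class ImmutableCache:
    """For data that never changes once written: no expiry, no invalidation."""

    def __init__(self):
        self._data = {}

    def get_or_load(self, key, loader):
        if key not in self._data:
            self._data[key] = loader(key)
        return self._data[key]


class DynamicCache:
    """For data that changes: entries expire after ttl and can be invalidated."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._data[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        self._data.pop(key, None)


class ProfileStore:
    """Business logic / storage: knows nothing about caching."""

    def __init__(self):
        self.display_names = {1: "alice"}       # mutable (sample data)
        self.signup_dates = {1: "2015-01-01"}   # immutable (sample data)
        self.reads = 0

    def load_display_name(self, user_id):
        self.reads += 1
        return self.display_names[user_id]

    def load_signup_date(self, user_id):
        self.reads += 1
        return self.signup_dates[user_id]


class CachedProfiles:
    """Caching wrapper: the only place where cache and store meet."""

    def __init__(self, store, ttl=60.0, clock=time.monotonic):
        self._store = store
        self._immutable = ImmutableCache()
        self._dynamic = DynamicCache(ttl, clock)

    def signup_date(self, user_id):
        return self._immutable.get_or_load(user_id, self._store.load_signup_date)

    def display_name(self, user_id):
        return self._dynamic.get_or_load(user_id, self._store.load_display_name)

    def rename(self, user_id, new_name):
        # Invalidation lives right next to the write that makes the entry stale.
        self._store.display_names[user_id] = new_name
        self._dynamic.invalidate(user_id)
```

Because every write goes through `rename()`, the invalidation strategy is decided in one visible place rather than scattered across the code that happens to read the data.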
> We should be ready for anything in the order of billions of queries per seconds.
only if you are the next Facebook, or already Facebook...