I can speak only from my experience. I was on a team doing ads evaluations, and we had multiple other teams specializing in various bits & pieces of the Ads infrastructure. For example, one team would keep track of what amenities a hotel/motel had, and to fill in all these mundane but intricate details (a "bool has_pool" of sorts) they'd keep adding fields or introducing new proto messages to fully capture that data. Each team would own several, if not hundreds, of these protos, and the protos would be bundled as a `oneof` (i.e. a "union") or something like that in a bigger encapsulating structure. So in your row database (Bigtable), each team would have a column that stores their proto (or empty). We'd then run a batch process to read each such column, treat all the fields in the proto stored there as data, and expand those fields into individual columns in a columnar (read-only) DB.
Later an analyst/statistician/linguist/etc. could query that columnar DB for the data they were interested in. That's what I remember (I left several years ago, so things might've changed), but pretty much: instead of the typical row-database setup with a column for everything, you just have a column for your protobuf (a bit like storing JSON in Postgres), and then have the ETL process mow through each field in that "column" and create "columns" for each such field.
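Mechanically, that expansion is just protobuf reflection. Here's a minimal sketch in Python of what I mean; the `HotelAmenities` message, its fields, and the column naming are all made-up stand-ins (the real pipeline was internal infrastructure, not this code):

```python
# Minimal sketch, assuming the Python protobuf runtime (pip install protobuf).
# The message is built dynamically only so the example is self-contained;
# in reality each team's proto came from their own checked-in .proto files.
from google.protobuf import descriptor_pb2, message_factory

# Hypothetical per-team proto:
#   message HotelAmenities { optional bool has_pool = 1; optional int32 room_count = 2; }
file_proto = descriptor_pb2.FileDescriptorProto(name="hotel.proto", package="demo")
msg_proto = file_proto.message_type.add(name="HotelAmenities")
msg_proto.field.add(name="has_pool", number=1,
                    label=descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL,
                    type=descriptor_pb2.FieldDescriptorProto.TYPE_BOOL)
msg_proto.field.add(name="room_count", number=2,
                    label=descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL,
                    type=descriptor_pb2.FieldDescriptorProto.TYPE_INT32)
HotelAmenities = message_factory.GetMessages([file_proto])["demo.HotelAmenities"]

def flatten(msg, prefix=""):
    """Expand every leaf field of a proto into a {column_name: value} dict."""
    columns = {}
    for field in msg.DESCRIPTOR.fields:              # names come from reflection
        name = prefix + field.name
        value = getattr(msg, field.name)
        if field.label == field.LABEL_REPEATED:
            columns[name] = list(value)              # sketch: keep repeated fields whole
        elif field.type == field.TYPE_MESSAGE:
            columns.update(flatten(value, name + "."))  # nested proto -> dotted columns
        else:
            columns[name] = value
    return columns

# One Bigtable cell holds the serialized proto; the batch job parses and flattens it.
cell = HotelAmenities(has_pool=True, room_count=12).SerializeToString()
print(flatten(HotelAmenities.FromString(cell)))      # {'has_pool': True, 'room_count': 12}
```

Note that flatten() never names a field explicitly - everything comes off the descriptor, which is what makes the rest of this work.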
We had to do some workarounds though, as it was exporting too much, and it wasn't clear how we could know which fields would actually be asked about (there probably was a way, but it would've taken time to coordinate with another team, so it came down more, I think, to coordinating with internal "customers" to disallow exporting those fields).
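If I were to sketch that workaround (and this part is pure guesswork - I don't remember the actual mechanism), it'd be something like an allowlist applied to the flattened column names from the sketch above:

```python
# Pure guesswork: an allowlist agreed with the consuming teams, applied to
# the column names produced by flatten() above (the names here are made up).
EXPORT_ALLOWLIST = {"has_pool"}

def filtered_columns(msg):
    return {name: value for name, value in flatten(msg).items()
            if name in EXPORT_ALLOWLIST}

print(filtered_columns(HotelAmenities(has_pool=True, room_count=12)))
# {'has_pool': True}
```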
But the cool thing is that if a customer team added a new field to their proto, our process would see it and the field would get expanded automatically. If they deprecated a proto, there could be a way (not sure if there was one, but it could be added) to no longer export it. For this to work, though, you need the "protodb" to be able to introspect, i.e. to reflect on the actual field names in order to generate the columns.
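To make that concrete, here's the same reflection idea applied to schema evolution, reusing the hypothetical HotelAmenities from above. Newly added fields show up in the descriptor automatically; the deprecation check is my own addition, using the standard [deprecated = true] proto option, not something I know our pipeline had:

```python
# Same descriptors, used for schema evolution. A field added to the .proto
# appears in DESCRIPTOR.fields with no exporter change; the deprecated
# checks below are my assumption of what "stop exporting" could look like.
def export_columns(message_class):
    if message_class.DESCRIPTOR.GetOptions().deprecated:
        return []                                  # whole message marked deprecated
    return [f.name for f in message_class.DESCRIPTOR.fields
            if not f.GetOptions().deprecated]      # skip [deprecated = true] fields

print(export_columns(HotelAmenities))              # ['has_pool', 'room_count']
```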