
Protobuf is fairly unusual among serialization formats in that a message can, in principle, be extended indefinitely. (BSON is close, but has an explicit document size prefix.) For example, given the following definition:

    message Foo {
        repeated string bar = 1;
    }
Any repetition of `0A 03 41 42 43` (a value "ABC" for field #1; the key byte is `(1 << 3) | 2`, i.e. field 1 with the length-delimited wire type), including an empty sequence, is a valid protobuf message. In other words, there is no explicit encoding for "this is a message"! Submessages have to be delimited because otherwise they wouldn't be distinguishable from the parent message.
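To make that concrete, here is a rough hand-rolled sketch in Python (no protobuf library involved). The helper names `encode_string_field` and `parse_foo` are made up for illustration, and it only handles one-byte keys and one-byte lengths, which is enough for this example. For field 1 with the length-delimited wire type the key byte works out to `(1 << 3) | 2 = 0x0A`.

```python
def encode_string_field(field_number: int, value: str) -> bytes:
    """Encode one length-delimited field (wire type 2).

    Sketch only: assumes the key and the length each fit in a single byte.
    """
    payload = value.encode("utf-8")
    if not (1 <= field_number <= 15 and len(payload) <= 127):
        raise ValueError("this sketch only handles one-byte keys and lengths")
    key = (field_number << 3) | 2
    return bytes([key, len(payload)]) + payload


def parse_foo(data: bytes) -> list[str]:
    """Parse a Foo message: zero or more occurrences of field 1 (bar)."""
    bar = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated field header")
        key, length = data[pos], data[pos + 1]
        if key != 0x0A:
            raise ValueError(f"unexpected key byte {key:#04x}")
        if length > 127:
            raise ValueError("multi-byte lengths are not handled in this sketch")
        start, end = pos + 2, pos + 2 + length
        if end > len(data):
            raise ValueError("truncated payload")
        bar.append(data[start:end].decode("utf-8"))
        pos = end
    return bar


record = encode_string_field(1, "ABC")
print(record.hex(" "))          # the five bytes for one "ABC" entry
print(parse_foo(b""))           # empty input is a valid Foo with no entries
print(parse_foo(record * 3))    # three repetitions parse as three entries
```

Note that the parser never sees a "start of message" or "end of message" marker; it just consumes fields until the input runs out.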



This is an interesting design difference between protobuf and Cap’n Proto: one has “repeated field of type X” while the other has “field of type list of X”. So one allows adding a repeated/optional field to a schema without re-encoding any messages, while the other supports “field of type list of lists of X” etc.


> So one allows adding a repeated/optional field to a schema without re-encoding any messages,

Hmm don't they both allow that? Or am I misunderstanding what you mean here?

I guess the interesting (though only occasionally useful) thing about protobuf is if you concatenate two serialized messages of the same type and then parse the result, each repeated field in the first message will be concatenated with the same field in the second message.
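A quick way to see the concatenation behaviour without a protobuf library is a tiny wire-format reader. This is only a sketch: `encode_strings` and `parse_repeated_strings` are hypothetical helpers, and the reader only understands length-delimited string fields, which is all the example needs.

```python
from collections import defaultdict


def write_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, new position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def encode_strings(field_number: int, values: list[str]) -> bytes:
    """Encode each value as one occurrence of a repeated string field."""
    out = bytearray()
    for value in values:
        payload = value.encode("utf-8")
        out += write_varint((field_number << 3) | 2)  # wire type 2
        out += write_varint(len(payload))
        out += payload
    return bytes(out)


def parse_repeated_strings(data: bytes) -> dict[int, list[str]]:
    """Collect every length-delimited field, grouped by field number."""
    fields = defaultdict(list)
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = read_varint(data, pos)
        if pos + length > len(data):
            raise ValueError("truncated payload")
        fields[field_number].append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return dict(fields)


first = encode_strings(1, ["a", "b"]) + encode_strings(2, ["x"])
second = encode_strings(1, ["c"]) + encode_strings(2, ["y", "z"])
print(parse_repeated_strings(first + second))
```

Parsing `first + second` gives field 1 the entries from both messages in order, and likewise for field 2, because the parser has no way to tell where one message stopped and the next began.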


Yes, "submessages" in Protobuf have the same field serialisation as strings.

The field key bytes don't really encode tag and type; they encode tag and size class (fixed 32-bit, fixed 64-bit, or variable length, i.e. varint or length-prefixed).

Protobuf is a TLV format. In that regard, it's not unique at all.
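For anyone who wants to poke at the key layout described above, here is a small sketch. `split_key` is a made-up helper and it assumes a single-byte key (field numbers 1 to 15); the low three bits are the wire type and the remaining bits are the field number.

```python
# Wire type numbers as defined by the protobuf encoding.
WIRE_TYPES = {
    0: "varint",
    1: "fixed 64-bit",
    2: "length-delimited",
    5: "fixed 32-bit",
}


def split_key(key: int) -> tuple[int, str]:
    """Split a one-byte field key into (field number, wire type name)."""
    wire_type = key & 0x07
    return key >> 3, WIRE_TYPES.get(wire_type, "group or unused")


for key in (0x08, 0x09, 0x0A, 0x15):
    print(f"{key:#04x}", split_key(key))
```

So a string, a bytes field, and a submessage on the same field number all produce the same key byte; only the schema tells the reader how to interpret the payload.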



