Great to see this project is moving forward. Can you comment on whether (and which of) the criticisms and suggestions from the original HN discussion have been integrated or have influenced development for this release?
Oh boy, there were so many suggestions, I'd have to read through them all to remember which ones I ended up incorporating. There's so much text, though...
Off the top of my head, I remember that multiple people suggested that text should be allowed to contain NUL characters (not just at the end), which I agreed to. That's not terribly significant, though.
People involved on the mailing list have helped me with several design decisions. One contributor convinced me that my original syntax for unions was pretty stupid and helped me come up with something much better.
At another point, I implemented "inline" structs and lists (which are embedded into their parent without a pointer). This turned out to be overcomplicated for not all that much benefit and I was encouraged to ditch it. A week of work lost, but it was the right decision.
Contributors hacking on implementations in other languages made clear that they weren't interested in writing their code generators in Haskell, so the compiler now has a plugin system allowing code generators in any language.
This is a very incomplete list, obviously... I'd love to have more people telling me what I'm doing wrong. :)
> Using colons to denote types is still ugly and feels
> unnecessary.
Yeah, I hear you. My personal experience is that when I look at Go code, I find it hard to parse variable definitions since the type has no separator. But I'm sure if I wrote more Go, it would get easier. I worry that there is a bit more need for Cap'n Proto schemas to be easily readable to newcomers compared to Go code. On the other hand, in Cap'n Proto there is usually an ordinal number separating the name and type anyway (though not always).
So far not many people have complained, but I could definitely be convinced to drop the colon if that's what people want.
> Your last annotation example confuses the @field and
> $annotation syntax-- string $0 :Text $qux;
> Off the top of my head, I remember that multiple people suggested that text should be allowed to contain NUL characters (not just at the end), which I agreed to. That's not terribly significant, though.
This seems contrary to what the docs[1] say:
> Text is always UTF-8 encoded and NUL-terminated.
Text is still NUL-terminated, it's simply allowed to contain NUL bytes in the middle as well. It's up to applications how they want to interpret this, but enforcing the NUL byte at the end is important for apps that want to pass a char* pointer around and don't want buffer overruns if they forget to validate the NUL terminator themselves. (It's a key design goal for the Cap'n Proto implementation to protect the app from security issues as much as possible.)
I see. I haven't gotten to the encoding page yet, but that sounds perfectly reasonable to me.
Small question before I get there: I imagine such things are tagged with a length, so that you can actually determine where the text field really ends?
Yes. The actual getter for a Text field returns StringPtr, which is a simple pointer-length pair. If you want to pass that on to a C-style function you call .cStr(), with the caveat that the content is effectively truncated at the first NUL char.
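To make the pointer-length idea concrete, here is a minimal sketch. `TextView` is a hypothetical stand-in written for this example, not the real StringPtr class, and its `str()` member is an assumption; only `cStr()` and the pointer-length layout come from the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a pointer-length pair (NOT the real StringPtr).
struct TextView {
  const char* data;  // content bytes, followed by a terminating NUL
  std::size_t size;  // number of content bytes, excluding the terminator

  TextView(const char* d, std::size_t n) : data(d), size(n) {
    // The terminator is required even if the content has interior NULs.
    assert(d[n] == '\0');
  }

  // Full content, interior NUL bytes included (assumed helper).
  std::string str() const { return std::string(data, size); }

  // C-style view: consumers such as strlen() stop at the first NUL.
  const char* cStr() const { return data; }
};
```

With the five content bytes `a b \0 c d`, `str()` yields all five bytes, while `std::strlen(view.cStr())` sees only the two bytes before the first NUL.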
With respect to imports and the dynamic API: when/if you do get around to implementing the parser in C++, could you consider making the import statement relative to the path of the file that is currently being dynamically imported? In the Protocol Buffer runtime, it's relative to the path of the top-level file being imported, which means sub-files have to know the path they're located at in order to import anything else.
Actually, that's already how it works. :) Imports are relative to the importing file unless they start with a '/', in which case the import path is searched.
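A rough sketch of that lookup rule, for illustration only. `resolveImport` and its parameters are hypothetical names, not part of the Cap'n Proto API, and `knownFiles` stands in for the filesystem so the example stays self-contained. It does not normalize `..` segments.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not the compiler's actual code).
// Returns the resolved path, or an empty string if nothing matches.
std::string resolveImport(const std::string& importingFile,
                          const std::string& importPath,
                          const std::vector<std::string>& searchDirs,
                          const std::set<std::string>& knownFiles) {
  if (!importPath.empty() && importPath[0] == '/') {
    // Leading '/': try each import-path directory in order.
    for (const std::string& dir : searchDirs) {
      std::string candidate = dir + importPath;
      if (knownFiles.count(candidate)) return candidate;
    }
    return "";
  }
  // Otherwise resolve against the directory of the importing file.
  std::string::size_type slash = importingFile.rfind('/');
  std::string dir =
      (slash == std::string::npos) ? "" : importingFile.substr(0, slash + 1);
  std::string candidate = dir + importPath;
  return knownFiles.count(candidate) ? candidate : "";
}
```

For example, an import of `bar.capnp` from `src/a/foo.capnp` resolves to `src/a/bar.capnp`, while `/common/base.capnp` is looked up under each search directory regardless of where the importing file lives.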
The protobuf approach was somewhat necessitated by the fact that various parts of the protobuf implementation expected file names to be canonical, e.g. so that it could tell if two descriptors were for the same file by comparing the names. This meant it was necessary for the compiler to know where the top of the source tree was, and dealing with relative paths would have been pretty ugly. This worked fine for Google-internal usage, but was problematic for a lot of open source users.
Cap'n Proto does not use file names as canonical identifiers. Instead, it requires every file to assign itself a unique ID (a random 64-bit number). This introduces some other problems, but on the whole I think it comes out a lot nicer.
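As an illustration of identity-by-ID rather than identity-by-name, here is a small sketch. `SchemaRegistry` is a hypothetical class invented for this example, and the ID used with it below is an arbitrary constant, not a real schema ID.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry that keys loaded files by their 64-bit ID,
// so the path used to reach a file plays no part in its identity.
class SchemaRegistry {
 public:
  // Returns true if this ID is new, false if a file with the same ID
  // was already registered (possibly under a different path).
  bool add(std::uint64_t id, const std::string& path) {
    return byId_.insert(std::make_pair(id, path)).second;
  }

  std::size_t size() const { return byId_.size(); }

 private:
  std::map<std::uint64_t, std::string> byId_;
};
```

Registering ID `0x9eb32e19f86ee174` once as `a/foo.capnp` and again as `/abs/a/foo.capnp` leaves a single entry, since the second call sees the same ID and returns false.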
Can anybody explain how/why this type of data transfer is secure? By the sound of it, the main benefit is that we can just read a large file by using mmap on it. Are there not a lot of security considerations to doing that...? Especially when feeding this through into dynamic languages like Python.
Cap'n Proto doesn't allow you to transmit arbitrary native structs. It still has a defined message format, with pointers that can be (and are) validated by the receiver. It's just that the format is one that happens to be very convenient for typical CPUs to access, in particular without the need for any copying or decoding.
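To give a flavor of what receiver-side pointer validation can look like, here is a deliberately simplified bounds check. `WordRange` and `inBounds` are hypothetical, and the real Cap'n Proto pointer encoding is different; the sketch only shows the general idea of rejecting a pointer whose target falls outside the received segment.

```cpp
#include <cstdint>

// Hypothetical, simplified "pointer": an offset and a length, both in
// 8-byte words, within one segment. Not the actual wire format.
struct WordRange {
  std::uint64_t offsetWords;
  std::uint64_t lengthWords;
};

// True only if [offset, offset + length) lies inside the segment.
// Written as a subtraction so a huge offset or length cannot overflow.
bool inBounds(const WordRange& p, std::uint64_t segmentWords) {
  return p.offsetWords <= segmentWords &&
         p.lengthWords <= segmentWords - p.offsetWords;
}
```

A reader would run a check like this before dereferencing, so a malicious offset is rejected instead of causing an out-of-bounds read on the mapped buffer.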
In practice, the current implementation hasn't undergone security review yet, so I wouldn't recommend using it on data you don't trust. But it's intended to be secure, and will be by version 1.0.
I think the core of our protobuf library should support many different encoding formats. There is often no need to define and redefine types to support multiple wire formats: Protobuf, Thrift, Cap'n Proto, etc. One definition is enough for most.
https://news.ycombinator.com/item?id=5482081