Cap'n Proto beta release 0.1 is ready for Real Work

vosper · on June 27, 2013

Link to the previous discussion on HN in April:

https://news.ycombinator.com/item?id=5482081

Great to see this project is moving forward, can you comment on whether (and which of) the criticisms and suggestions from the original HN discussion have been integrated or have influenced development for this release?

kentonv · on June 27, 2013

Oh boy, there were so many suggestions, I'd have to read through them all to remember which ones I ended up incorporating. There's so much text, though...

Off the top of my head, I remember that multiple people suggested that text should be allowed to contain NUL characters (not just at the end), which I agreed to. That's not terribly significant, though.

People involved on the mailing list have helped me with several design decisions. One contributor convinced me that my original syntax for unions was pretty stupid and helped me come up with something much better.

At another point, I implemented "inline" structs and lists (which are embedded into their parent without a pointer). This turned out to be overcomplicated for not all that much benefit and I was encouraged to ditch it. A week of work lost, but it was the right decision.

Contributors hacking on implementations in other languages made clear that they weren't interested in writing their code generators in Haskell, so the compiler now has a plugin system allowing code generators in any language.

This is a very incomplete list, obviously... I'd love to have more people telling me what I'm doing wrong. :)

Scaevolus · on June 27, 2013

Using colons to denote types is still ugly and feels unnecessary.

Your last annotation example confuses the @field and $annotation syntax-- string $0 :Text $qux;

kentonv · on June 27, 2013

> Using colons to denote types is still ugly and feels unnecessary.

Yeah, I hear you. My personal experience is that when I look at Go code, I find it hard to parse variable definitions since the type has no separator. But I'm sure if I wrote more Go, it would get easier. I worry that there is a bit more need for Cap'n Proto schemas to be easily readable to newcomers compared to Go code. On the other hand, in Cap'n Proto there is usually an ordinal number separating the name and type anyway (though not always).

So far not many people have complained, but I could definitely be convinced to drop the colon if that's what people want.

> Your last annotation example confuses the @field and $annotation syntax-- string $0 :Text $qux;

Eek, fix pushed. Thanks!

simcop2387 · on June 27, 2013

> Off the top of my head, I remember that multiple people suggested that text should be allowed to contain NUL characters (not just at the end), which I agreed to. That's not terribly significant, though.

This seems contrary to what the docs[1] say:

> Text is always UTF-8 encoded and NUL-terminated.

Which is right?

[1] http://kentonv.github.io/capnproto/language.html

kentonv · on June 27, 2013

Text is still NUL-terminated, it's simply allowed to contain NUL bytes in the middle as well. It's up to applications how they want to interpret this, but enforcing the NUL byte at the end is important for apps that want to pass a char* pointer around and don't want buffer overruns if they forget to validate the NUL terminator themselves. (It's a key design goal for the Cap'n Proto implementation to protect the app from security issues as much as possible.)

The encoding page describes this a bit better: http://kentonv.github.io/capnproto/encoding.html#blobs

simcop2387 · on June 27, 2013

I see. I haven't gotten to the encoding page yet, but that sounds perfectly reasonable to me.

Small question before I get there, I imagine that such things are tagged with a length so that you can actually determine where the text field really ends then?

kentonv · on June 27, 2013

Yes. The actual getter for a Text field returns StringPtr, which is a simple pointer-length pair. If you want to pass that on to a C-style function you call .cStr(), with the caveat that the content is effectively truncated at the first NUL char.
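To make the difference between the length-based view and the C-style view concrete, here is a minimal sketch in Python. It is a toy model only: `make_text_blob`, `full_text`, and `c_style_text` are hypothetical helpers invented for illustration, not Cap'n Proto's API, and the real `StringPtr` is a C++ type.

```python
# Toy model of a length-tagged, NUL-terminated text blob.
# All helper names here are hypothetical, not part of Cap'n Proto.

def make_text_blob(text: str) -> bytes:
    """Encode text as UTF-8 and append the mandatory trailing NUL."""
    return text.encode("utf-8") + b"\x00"

def _check_terminator(blob: bytes) -> None:
    """Reject blobs that do not end with a NUL byte."""
    if not blob or blob[-1] != 0:
        raise ValueError("text blob must end with a NUL byte")

def full_text(blob: bytes) -> str:
    """Length-based read: everything except the trailing NUL."""
    _check_terminator(blob)
    return blob[:-1].decode("utf-8")

def c_style_text(blob: bytes) -> str:
    """C-style read: stop at the first NUL, as a char* consumer would."""
    _check_terminator(blob)
    end = blob.index(b"\x00")
    return blob[:end].decode("utf-8")

blob = make_text_blob("abc\x00def")
print(repr(full_text(blob)))     # keeps the interior NUL
print(repr(c_style_text(blob)))  # truncated at the first NUL
```

Because the terminator is checked before either read, a consumer that only looks for the first NUL can never run past the end of the blob, which is the overrun protection described above.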

DannoHung · on June 27, 2013

With respect to imports and the dynamic API: when/if you do get around to implementing the parser in C++, could you consider making the import statement relative to the path of the file that is currently being dynamically imported? In the Protocol Buffer runtime, it's relative to the path of the top-level file being imported, which means sub-files have to know the path they're located at in order to import anything else.

kentonv · on June 27, 2013

Actually, that's already how it works. :) Imports are relative to the importing file unless they start with a '/', in which case the import path is searched.
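The rule can be sketched roughly as follows. This is an illustration of the resolution logic as described in this comment, not the compiler's actual code; `resolve_import`, its parameters, and the file names are all hypothetical.

```python
import posixpath

def resolve_import(importing_file, import_path, search_dirs, exists):
    """Return the resolved path of an import, or None if it is not found.

    `exists` is a callable so the sketch runs without touching the disk.
    """
    if import_path.startswith("/"):
        # Leading '/': try each directory of the import path in order.
        relative = import_path.lstrip("/")
        for directory in search_dirs:
            candidate = posixpath.normpath(posixpath.join(directory, relative))
            if exists(candidate):
                return candidate
        return None
    # Otherwise: resolve against the directory of the importing file.
    base = posixpath.dirname(importing_file)
    candidate = posixpath.normpath(posixpath.join(base, import_path))
    return candidate if exists(candidate) else None

# Hypothetical file layout used only for this example.
files = {
    "src/app/main.capnp",
    "src/app/types.capnp",
    "/usr/include/common/date.capnp",
}
print(resolve_import("src/app/main.capnp", "types.capnp",
                     ["/usr/include"], files.__contains__))
print(resolve_import("src/app/main.capnp", "/common/date.capnp",
                     ["/usr/include"], files.__contains__))
```

Note that the importing file never needs to know where it sits in the source tree: a sibling is always reachable by its bare name.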

The protobuf approach was somewhat necessitated by the fact that various parts of the protobuf implementation expected file names to be canonical, e.g. so that it could tell if two descriptors were for the same file by comparing the names. This meant it was necessary for the compiler to know where the top of the source tree was, and dealing with relative paths would have been pretty ugly. This worked fine for Google-internal usage, but was problematic for a lot of open source users.

Cap'n Proto does not use file names as canonical identifiers. Instead, it requires every file to assign itself a unique ID (a random 64-bit number). This introduces some other problems, but on the whole I think it comes out a lot nicer.
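As a rough illustration of the idea (a sketch under assumptions, not how the compiler is implemented), identifying files by a random 64-bit ID means two different spellings of the same path do not need to be canonicalized. `new_file_id` and `SchemaRegistry` are hypothetical names.

```python
import secrets

def new_file_id() -> int:
    """Pick a random 64-bit number to serve as a file's unique ID."""
    return secrets.randbits(64)

class SchemaRegistry:
    """Tracks loaded files by ID rather than by file name."""

    def __init__(self):
        self._by_id = {}

    def register(self, file_id: int, path: str) -> str:
        """Return the path first registered for this ID."""
        return self._by_id.setdefault(file_id, path)

registry = SchemaRegistry()
file_id = new_file_id()
# The same file reached through two differently spelled paths.
first = registry.register(file_id, "src/app/types.capnp")
second = registry.register(file_id, "./src/app/../app/types.capnp")
print(f"{file_id:#018x}", first == second)
```

The second registration is recognized as the same file purely from the ID, with no knowledge of where the top of the source tree is.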

DannoHung · on June 28, 2013

Woo! Awesome!

RyanZAG · on June 27, 2013

Can anybody explain how/why this type of data transfer is secure? By the sound of it, the main benefit is that we can just read a large file by using mmap on it. Are there not a lot of security considerations to doing that...? Especially when feeding this through into dynamic languages like Python.

kentonv · on June 27, 2013

Cap'n Proto doesn't allow you to transmit arbitrary native structs. It still has a defined message format, with pointers that can be (and are) validated by the receiver. It's just that the format is one that happens to be very convenient for typical CPUs to access, in particular without the need for any copying or decoding.

In practice, the current implementation hasn't undergone security review yet, so I wouldn't recommend using it on data you don't trust. But it's intended to be secure, and will be by version 1.0.
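To illustrate what "pointers validated by the receiver" means in general terms, here is a minimal sketch using a made-up toy layout. The layout (one 8-byte pointer word holding a target word offset and a byte length) is an assumption for illustration and is not Cap'n Proto's actual encoding; `read_blob` is a hypothetical helper.

```python
import struct

WORD = 8  # toy format: the message is a sequence of 8-byte words

def read_blob(message: bytes, pointer_index: int) -> bytes:
    """Follow the toy pointer at word `pointer_index`, checking bounds first.

    Toy pointer word (assumed layout): two little-endian uint32 values,
    the target's word offset and its length in bytes.
    """
    start = pointer_index * WORD
    if pointer_index < 0 or start + WORD > len(message):
        raise ValueError("pointer word is out of bounds")
    target_word, length = struct.unpack_from("<II", message, start)
    begin = target_word * WORD
    end = begin + length
    if end > len(message):
        raise ValueError("pointer target is out of bounds")
    return message[begin:end]

# A well-formed message, and one whose pointer claims too many bytes.
good = struct.pack("<II", 1, 5) + b"hello\x00\x00\x00"
bad = struct.pack("<II", 1, 500) + b"hello\x00\x00\x00"

print(read_blob(good, 0))
try:
    read_blob(bad, 0)
except ValueError as err:
    print("rejected:", err)
```

The receiver never trusts an offset or length from the wire; it only reads the target after confirming the whole range lies inside the message it actually received.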

ocharles · on June 28, 2013

Now I'm just eagerly awaiting a Haskell and Perl library. Suppose that means I should get off my lazy ass and start contributing!

boothead · on June 28, 2013

Have a look at alphaheavy's protobuf implementation as a starting point.

enigmo · on June 28, 2013

Or just send some pull requests!

I think the core of our protobuf library should support many different encoding formats. There is often no need to define and redefine types to support multiple wire formats: Protobuf, Thrift, Cap'n Proto, etc. One definition is enough for most.

https://github.com/alphaHeavy/protobuf

kentonv · on June 28, 2013

Yes, please! :D

_pmf_ · on June 28, 2013

Allowing additional code generators to be written as plugins in a language-agnostic way is really nice.

jcarden · on June 27, 2013

Nice!