> Poor interprocess communication in operating systems. What's usually needed is...

Animats · on Oct 14, 2014

Well, there's Microsoft ".NET".

Marshaling is an important, and neglected, subject in language design. Compilers really should understand marshaling as a compilable operation. In many cases, marshaling can be compiled down to moves and adds. Done interpretively, or through "reflection", there's a huge overhead. If you're doing marshaling, you're probably doing it on a lot of data, so efficiency matters.
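To illustrate the compiled-versus-interpretive distinction, here is a minimal sketch in Python. It is an illustration only, not anything proposed in the thread. The standard `struct.Struct` class parses a layout string once and reuses it, while the reflective path re-inspects the dataclass fields on every call. The `Point` record and its little-endian layout are made up for the example. Both paths still run inside CPython's interpreter, so this shows the shape of the idea rather than the speedup a real compiler would get by lowering a fixed layout to loads, stores, and byte swaps.

```python
import struct
from dataclasses import dataclass, fields


# Hypothetical record type, invented for this example.
@dataclass
class Point:
    x: int
    y: int
    weight: float


# "Compiled" path: the layout string is parsed once, up front.
# "<iid" = little-endian, two signed 32-bit ints, one IEEE 754 double,
# no padding (16 bytes total).
POINT_LAYOUT = struct.Struct("<iid")


def pack_compiled(p: Point) -> bytes:
    return POINT_LAYOUT.pack(p.x, p.y, p.weight)


def unpack_compiled(buf: bytes) -> Point:
    return Point(*POINT_LAYOUT.unpack(buf))


# "Interpretive" path: walk the fields via reflection on every call and
# pick a format code for each one at runtime.
_FORMAT_CODES = {int: "i", float: "d"}


def pack_reflective(obj) -> bytes:
    out = bytearray()
    for f in fields(obj):
        code = _FORMAT_CODES[f.type]
        out += struct.pack("<" + code, getattr(obj, f.name))
    return bytes(out)
```

Both functions produce identical bytes. The difference is only in how much work is repeated per call.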

For Google protocol buffers, there are pre-compilers which generate efficient C, Go, Java, or Python. That works. They're not integrated into the language, so it's kind of clunky. Perhaps compilers should accept plug-ins for marshaling.

Most other cross-language systems are more interpretive. CORBA and SOAP libraries tend to do a lot of work for each call. This discourages their use for local calls.

Incidentally, there's a fear of the cost of copying in message passing systems. This is overrated. Most modern CPUs copy very fast and in reasonably wide chunks. If you're copying data that was just created and will immediately be used, everything will be in the fastest cache.

Fortunately, we can generally assume today that integers are 32- or 64-bit two's complement, floats are IEEE 754, and strings are Unicode. We don't have to worry about 36-bit machines, Cray, Univac, or Burroughs floats, or EBCDIC. (It's really time to insist that the only web encoding be UTF-8, by the way.) So marshaling need involve little conversion: endianness swaps at worst, and those are all moves.
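A small sketch of why byte order is "all moves": converting between byte orders is a fixed permutation of bytes. Python's `struct` exposes this through the `>` (big-endian) and `<` (little-endian) format prefixes. The helper names below are hypothetical, and the hand-written swap spells out the same permutation with shifts and masks.

```python
import struct


def pack_u32_big_endian(values: list[int]) -> bytes:
    """Marshal unsigned 32-bit ints in big-endian (network) byte order."""
    return struct.pack(f">{len(values)}I", *values)


def swap_bytes_u32(value: int) -> int:
    """Reverse the four bytes of a 32-bit value using only shifts and masks."""
    return (((value & 0x000000FF) << 24)
            | ((value & 0x0000FF00) << 8)
            | ((value & 0x00FF0000) >> 8)
            | ((value & 0xFF000000) >> 24))


# swap_bytes_u32(0x12345678) == 0x78563412
```

On a little-endian host, the `>` format performs the same byte permutation that `swap_bytes_u32` writes out by hand. No arithmetic on the values themselves is involved, and two's complement integers need no further conversion.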

cbd1984 · on Oct 14, 2014

> Marshaling is an important, and neglected, subject in language design. Compilers really should understand marshaling as a compilable operation.

I generally agree with what you're saying.

This is one of the things COBOL, of all languages, generally got right: You had a Data Definition language, which has been carried over to SQL, and the compiler could look at the code written in that language to create parsers automatically. Of course, COBOL having been COBOL, this was oriented to 80-column 9-edge-first fixed-format records with all the types an Eisenhower-era Data Processing Professional thought would be important.

The concept could use some updating, is what I'm saying.
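As a rough modern analogue of deriving a parser from a data definition, the sketch below takes a declarative fixed-width layout and builds a parser from it once, computing the field offsets up front. The field names, widths, and converters are invented for illustration. This is not real COBOL record-description syntax.

```python
from typing import Callable

# Hypothetical fixed-width layout, loosely in the spirit of a COBOL record
# description: (field name, width in characters, converter).
Layout = list[tuple[str, int, Callable[[str], object]]]

CUSTOMER_LAYOUT: Layout = [
    ("cust_id", 6, int),
    ("name", 20, str.rstrip),
    ("balance_cents", 9, int),
]


def make_parser(layout: Layout) -> Callable[[str], dict]:
    """Precompute field offsets once and return a parser for one record."""
    slices = []
    offset = 0
    for name, width, convert in layout:
        slices.append((name, offset, offset + width, convert))
        offset += width
    record_len = offset

    def parse(line: str) -> dict:
        # Fixed-format records must be exactly the declared length.
        if len(line) != record_len:
            raise ValueError(f"expected {record_len} chars, got {len(line)}")
        return {name: convert(line[start:end])
                for name, start, end, convert in slices}

    return parse
```

The layout is data, and the parser is generated from it. This is the part of the old idea that still seems worth keeping.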

> Most modern CPUs copy very fast and in reasonably wide chunks.

And most modern OSes can finagle bits in the page table to remove the need for copying.

> strings are Unicode

By which you mean UTF-32BE, naturally. ;)

> It's really time to insist that the only web encoding be UTF-8, by the way.

This might actually be doable, if only because of all the smilies that rely on Unicode to work and the fact that UTF-8 is the only Unicode encoding that stores English in one byte per character.
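A quick way to check the size claim, assuming Python 3.9+, is to encode the same text in UTF-8, UTF-16, and UTF-32 and compare byte lengths. The explicit little/big-endian codec variants are used so that no byte-order mark is counted.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return how many bytes `text` occupies in several Unicode encodings."""
    return {
        enc: len(text.encode(enc))
        for enc in ("utf-8", "utf-16-le", "utf-32-be")
    }


# encoded_sizes("hello")      -> {'utf-8': 5, 'utf-16-le': 10, 'utf-32-be': 20}
# encoded_sizes("\U0001F600") -> {'utf-8': 4, 'utf-16-le': 4, 'utf-32-be': 4}
```

ASCII text is 1 byte per character in UTF-8, versus 2 in UTF-16 and 4 in UTF-32. An emoji outside the Basic Multilingual Plane costs 4 bytes in all three.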

Animats · on Oct 16, 2014

> And most modern OSes can finagle bits in the page table to remove the need for copying.

That tends to be more trouble than it's worth. It usually means flushing caches, locking lots of things, and having to interrupt every CPU. Unless it's a really big data move (megabytes) it's probably a lose. Mach did that, and it didn't work out well.

sitkack · on Oct 14, 2014

Someday the heap will be a protocol.

vidarh · on Oct 14, 2014

> What's usually needed is a subroutine call

That does not imply it can only be satisfied with a subroutine call, but rather that an abstraction that looks and feels like a subroutine call is preferable to one that looks like a stream of bytes.

And the point is exactly to address data marshalling, which is a hard enough problem that reducing the number of applications that have to independently solve it would be a great benefit.

That most developers don't even tend to read from and write to sockets correctly and efficiently (based on a deeply unscientific sample of what I've seen through my career) implies to me that we really shouldn't trust most developers to get data marshalling right.

Worst case? Your app falls back on using that interface to exchange blocks of raw bytes if the provided model doesn't work for you.
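A minimal sketch of the "looks like a subroutine call, runs over a byte stream" idea, assuming Python 3 on a platform with `socket.socketpair()`. The wire format (a 4-byte big-endian length prefix followed by a UTF-8 JSON payload) and the names `serve` and `Proxy` are assumptions for illustration, not the protocol of any real RPC system. The caller writes `remote.add(2, 3)`, but underneath it is still just framed bytes, which is also where a raw-bytes fallback would live.

```python
import json
import socket
import struct
import threading

# Assumed framing: 4-byte big-endian length, then a UTF-8 JSON payload.
_LEN = struct.Struct(">I")


def _send_msg(sock, obj):
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(_LEN.pack(len(payload)) + payload)


def _recv_exact(sock, n):
    # recv() may return fewer bytes than asked for, so loop until done.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def _recv_msg(sock):
    (length,) = _LEN.unpack(_recv_exact(sock, _LEN.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


def serve(sock, handlers):
    """Answer requests until the client closes its end of the socket."""
    try:
        while True:
            req = _recv_msg(sock)
            result = handlers[req["method"]](*req["args"])
            _send_msg(sock, {"result": result})
    except ConnectionError:
        pass
    finally:
        sock.close()


class Proxy:
    """Turns attribute access into a remote call, e.g. proxy.add(2, 3)."""

    def __init__(self, sock):
        self._sock = sock

    def __getattr__(self, name):
        def call(*args):
            _send_msg(self._sock, {"method": name, "args": list(args)})
            return _recv_msg(self._sock)["result"]
        return call
```

Usage: create `client, server = socket.socketpair()`, run `serve(server, {"add": lambda a, b: a + b})` on a thread, then call `Proxy(client).add(2, 3)`. The explicit length prefix is what lets `_recv_exact` cope with partial reads, which is exactly the kind of socket detail that is easy to get wrong by hand.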