A Sketch of the Biggest Idea in Software Architecture (2022) (oilshell.org)
135 points by thunderbong 9 months ago | 67 comments



Is the narrow waist concept useful? Sure, everything is bytes, but most things overlay structure on top of bytes, and treating those structures as bytes is probably not useful.

You could use “wc” to count the lines/bytes in a json file but counting bytes is likely one of the least useful/interesting things you could do with that json file.


Unless you wanted to compress that json. If you insisted on the structure, you'd have to create a specific algorithm just for json. Because json is just bytes, you get to compress it with a generic byte compression algorithm.


Is json just bytes tho? Doesn’t it allow for variable width encodings?


No matter what encoding your JSON file is, gzip will output a compressed bag of bytes that, when unzipped, will result in the same file coming out the other end. This is true of movie codecs, Word 97 files, or anything, and none of the maintainers of those formats had to be consulted about this in order to make it work. That's what is meant by "thin waist" here.
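
A minimal sketch of that claim, using only Python's standard `gzip` and `json` modules. The sample document and the `roundtrip` helper are made up for illustration; the point is that the compressor never needs to know the payload is JSON or which encoding it uses.

```python
import gzip
import json

# Hypothetical sample document; any JSON value would do.
doc = {"name": "caf\u00e9", "tags": ["narrow", "waist"]}
text = json.dumps(doc, ensure_ascii=False)


def roundtrip(payload: bytes) -> bytes:
    """Compress and decompress without ever looking inside the payload."""
    return gzip.decompress(gzip.compress(payload))


# The same JSON text in two different encodings is two different byte strings.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")

# gzip treats both as opaque bytes and hands back exactly what it was given.
restored_utf8 = roundtrip(utf8_bytes)
restored_utf16 = roundtrip(utf16_bytes)
```

Interpreting the bytes as JSON is a separate layer: `json.loads(restored_utf8.decode("utf-8"))` yields the original document again.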


I know, but it’s not “just bytes” as per parent comment. You cannot infer the length of the content without decoding it. “By definition” it is variable width character data. I think it’s fair to be pedantic vs a fairly dramatic oversimplification.


Less specific interfaces let you do less interesting things, but are more resilient. It's an engineering tradeoff. Purpose-built interfaces that fully expose and understand domain-level semantics are great in certain circumstances, but other times you want a certain minimum abstraction (IP packets and 'bags-of-bytes' POSIX file semantics are good examples) that can be used to build better ones.

If the rollout of HTTP had required that all the IP routers on the internet be updated to account for it, we likely would not have it. Likewise, if we required that all the classic Unix text utilities like wc, sort, paste, etc. did meaningful things to JSON before we could standardize JSON, adoption would likely have suffered.


The basic unix tools do account for variable width though. Variable chars are baked into most OS. When you use these commands the decode is implicit.


You can transport it to an architecture of different endianness without loss of information or metadata and a transformation at destination.

There are important ways in which it is, in fact, "just bytes".


Endianness etc is a feature of the encoding. Most JSON implementations I’ve used require the raw bytes to first be decoded as such.


No, endianness is not a feature of UTF-8 encoding. There isn't a UTF-8LE and a UTF-8BE. That's because the codeunit is bytes.

Forget "decoding", you have to parse JSON. But you don't have to figure out how it's encoded first. Because it's a byte format. You already know.
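
Both sides of this exchange can be checked in a few lines. The sketch below (plain Python, no assumptions beyond the standard codecs) shows that UTF-8 is variable width, yet has a single byte order, whereas UTF-16 produces different bytes depending on endianness.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# UTF-8: the code unit is one byte, so there is only one possible byte sequence.
utf8 = ch.encode("utf-8")

# UTF-16: the code unit is two bytes, so byte order matters.
utf16_le = ch.encode("utf-16-le")
utf16_be = ch.encode("utf-16-be")

# Variable width: characters take 1 to 4 bytes in UTF-8.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
widths = [len(s.encode("utf-8")) for s in samples]
```

So a reader does have to decode multi-byte sequences to count characters, but never has to ask which byte order a UTF-8 stream was written in.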


There isn’t a UTF-8LE/BE because it is implicitly BE for wide characters. Any byte in a WC sequence cannot meaningfully be interpreted (exc character class, page etc) without its companions, so not just bytes. There is an element of presentation that must happen before “mere bytes” are eligible for JSON


By spec all JSON must be UTF-8. Anyone adding encodings to application/json is, at best, redundant.


UTF-8 is variable width my friend


The fact that JSON is UTF-8 doesn't contradict the fact that it's bytes!

That's a feature, not a bug.

i.e. "exterior designs are layered" - https://www.oilshell.org/blog/2023/06/ysh-design.html#exteri...

This is not a trivial point -- there are plenty of specs which are sloppy about text, try to abstract over code points, and result in non-working software.

A main example is the shitshow with libc and Unicode - https://thephd.dev/cuneicode-and-the-future-of-text-in-c#sta...

It suffers from what I call "interior fictions", like wchar_t.


Of course it’s all bytes. It’s all bytes. That doesn’t change the fact that you need to have some awareness of encoding before those bytes are fully sensible


encoding decides how bytes are interpreted


Yes. The encoded content is “just bytes” - once decoded it’s logically something else (var char data structured as json) that transcends bytes at the data level.


This guy just gave a cool name to an old concept.


(author here) It's an old name for an old concept :)

See the previous post The Internet Was Designed With a Narrow Waist, especially the appendix with some history

https://www.oilshell.org/blog/2022/02/diagrams.html

The name was apparently coined by Jon Postel in the late 1970's, but this fact is not well known. There are some misattributions in the appendix

I apparently learned the term from a talk by Van Jacobson on Named Data Networking at Google in 2006, but I forgot that I learned it that way!

After I wrote the post, a reader Caleb Malchik reminded me about Van Jacobson's talk, and then I went back and watched it, and realized that's where I got the term from! I then updated the appendix.

---

I have searched Hacker News for "narrow waist", and it surprised me that it's not a very well known idea, despite being old and important. (And again - I wrote these articles specifically to gather related material from others, so I welcome any updates or corrections.)

One place you can see it in passing is in a 2013 paper from Stephen Kell - The operating system: should there be one?

https://www.humprog.org/~stephen/research/papers/kell13opera...

Search for "waist in the hourglass"

---

If you think about it, it is not surprising that the "narrow waist" concept from networking is becoming more important to software. Software has become more and more networked for the last 2 decades, and I don't think that trend has even topped out.

It is hard to find any kind of software that is not networked. As an example, in 2002 I worked at EA on shrink wrapped video games. They were completely offline embedded systems. But 2004 was the first year we added networking to the game, and now pretty much all games are networked in some way.

Also, the Internet is probably the biggest and most successful thing ever engineered ... (not sure what you would compare it to, maybe nuclear reactors or aircraft carriers).

It would be surprising if there weren't some lessons to learn from its engineering!


Aircraft carriers and nuclear reactors are nothing like the Internet. Huge physical engineering projects are like a huge jigsaw puzzle: take one piece away and the picture is not complete. The Internet is a network; the closest physical thing to compare it with is the electric grid. But then again, the whole electric grid hums at one single frequency.


I would compare the Internet to the intermodal shipping container system, the international post, and not much else.


> Are we all not glad we don’t use the Unix method of communicating on the web? Right? Any arbitrary command string can be the argument list for your program, and any arbitrary set of characters can come out the other end. Let’s all write parsers.

- Rich Hickey: Simple Made Easy


I do remember this quote, but it's a little odd because Unix and the web have a very similar structure. HTTP has more metadata like Content-Type, but it's still fundamentally uniform bytes/text.

This is a feature not a bug! And it's INTENTIONAL for both Unix and HTTP.

The Uniform Interface Constraint, the Perlis-Thompson Principle, and LAYERING structure on top of "flat / featureless" mechanisms.
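
As a rough illustration of "layering structure on top of flat bytes", here is a sketch that takes a hand-written HTTP/1.1 response as raw bytes and only interprets the body after reading the `Content-Type` metadata. The response itself is invented for the example, and the parsing is deliberately naive (no chunked encoding, no repeated headers).

```python
import json

# Hypothetical response, written out as the bytes that would travel on the wire.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: 8\r\n"
    b"\r\n"
    b'{"a": 1}'
)

# Layer 1: split the flat byte stream into head and body.
head, body = raw.split(b"\r\n\r\n", 1)
status_line, *header_lines = head.split(b"\r\n")

headers = {}
for line in header_lines:
    name, _, value = line.partition(b":")
    headers[name.strip().lower().decode("ascii")] = value.strip().decode("ascii")

# Layer 2: only now is the body given a structured interpretation.
if headers.get("content-type") == "application/json":
    payload = json.loads(body)
else:
    payload = body  # stays an uninterpreted bag of bytes
```

Anything that only relays the message can stop after layer 1, or never parse it at all.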

---

It's also the reason for their success!

I'll refer to my recent comment on this - https://news.ycombinator.com/item?id=39925798

> So the reason that shell/Unix and HTTP are so common is a MATHEMATICAL property of software growth.

> How do you make a Zig program talk to a Mojo program? Probably with a byte stream.

> What about a Clojure program and a Common Lisp program? Probably a byte stream. (Ironically, S-expressions have no commonly used "exterior" interchange format)

and the follow-up with EDN - https://news.ycombinator.com/item?id=39931919

Do Common Lisp users have EDN libraries? Do Clojure users parse Common Lisp s-expressions?

---

Also, Oils adds parsers to Unix shell like JSON, so you don't have to write your own!

Shell was not a powerful enough language to be a real shell :) To interoperate between heterogeneous and diverse runtimes, to glue things together when their creators wanted monoliths (e.g. like Go and the JVM have problems interoperating with C)

---

I'm a big fan of Hickey, and have watched almost all his talks. I often refer to his explanations of programming with maps, which explains why protobufs work - https://lobste.rs/s/zdvg9y/maybe_not_rich_hickey

(something many protobuf users apparently don't understand)

I do recall this comment, but I don't recall him saying much else about Unix.

I think if you can live within the JVM, OK you can do everything in Clojure and avoid parsing. And even Datomic takes it to the next level -- he really walked the walk.

---

But I'd claim you can't build Google or LLMs on Clojure and Datomic. The bigger the software system, the more heterogeneous it is.

Bigger codebases are written in more languages. And they have more formats on the wire.

I even see startups of like 10 people using Rust AND Go now (and JavaScript), which is a little odd to me. I'm all about language diversity, but even I see the value of using ONE language for smaller or more homogeneous problems.

But even tiny software projects now use many different languages.


I first heard about this concept in the early 1990s from netpbm (then called pbmtools). The exact language seems to have been removed from the current manual[1] but the basic idea was that if you have N image formats, you can either write NxN tools to convert between them, or you can have a few common image formats (ie. pbm - bitmap, pgm - greyscale and ppm - fullcolour) and then write 2xN tools to convert between the universe of all image formats and the common formats. You then use Unix pipes to connect the tools together.

[1] https://netpbm.sourceforge.net/doc/
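
A toy sketch of the hub idea, not netpbm's actual code: the two "formats" (`decimal` and `hex`) and the in-memory `Hub` type are invented for illustration and stand in for real image formats and ppm. Each format supplies one reader and one writer, and any conversion goes through the hub.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[int, ...]
Hub = List[Pixel]  # hypothetical common format, standing in for ppm


def read_decimal(text: str) -> Hub:
    # "255,0,0;0,255,0" -> [(255, 0, 0), (0, 255, 0)]
    return [tuple(int(v) for v in px.split(",")) for px in text.split(";")]


def write_decimal(pixels: Hub) -> str:
    return ";".join(",".join(str(v) for v in px) for px in pixels)


def read_hex(text: str) -> Hub:
    # "ff0000 00ff00" -> [(255, 0, 0), (0, 255, 0)]
    return [tuple(int(px[i:i + 2], 16) for i in (0, 2, 4)) for px in text.split()]


def write_hex(pixels: Hub) -> str:
    return " ".join("".join(f"{v:02x}" for v in px) for px in pixels)


READERS: Dict[str, Callable[[str], Hub]] = {"decimal": read_decimal, "hex": read_hex}
WRITERS: Dict[str, Callable[[Hub], str]] = {"decimal": write_decimal, "hex": write_hex}


def convert(text: str, src: str, dst: str) -> str:
    """Convert between any two registered formats via the hub."""
    return WRITERS[dst](READERS[src](text))


def converter_counts(n: int) -> Dict[str, int]:
    """Tools needed for n formats: direct pairwise vs. through a hub."""
    return {"direct": n * (n - 1), "via_hub": 2 * n}
```

Adding a third format means writing two functions rather than four more pairwise converters; for 10 formats the count is 20 tools instead of 90 (roughly NxN, skipping identity conversions).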


Except in that case you're converting to and from an uncompressed buffer... that could be in ram and not use a file format at all?


It's much more useful to have a real file format.

For example, you can generate images from your own code by doing this (with a bit more shell quoting and error checking in the real code):

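  /* r, g, b are assumed to hold channel values in [0, 1]. */
  fp = popen ("ppmtojpeg > output.jpg", "w");
  fprintf (fp, "P3 1024 1024 255\n");  /* plain PPM header: magic, width, height, maxval */
  for (y = 0; y < 1024; ++y)
    for (x = 0; x < 1024; ++x)
      fprintf (fp, "%d %d %d\n", (int) (255*r[y][x]), (int) (255*g[y][x]), (int) (255*b[y][x]));
  pclose (fp);  /* a stream opened with popen must be closed with pclose, not fclose */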


This is the first time I have seen anyone popen’ing a pipeline like that, and writing to its stdin.

Bravo.


I miss a bit some of the old `xxx2yyy` programs. It's tempting to consider:

     FILE.csv.html.pdf.zip
...`csv2html | html2pdf | pdf2zip`, and vice-versa... how possible is it to make the pipeline reversible, eg: `zip2pdf | pdf2html | html2csv`, or in the absence of "reversibility", could you simply shortcut and get straight back to the original `*.csv`?
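
One way to think about the reversibility question: a pipeline can be run backwards only if every stage is lossless and has an inverse. The sketch below uses three stages that do have inverses (JSON encoding, zlib compression, base64); they are stand-ins chosen for illustration, not the `csv2html`/`html2pdf` tools themselves, and a rendering step such as HTML to PDF may well discard structure that cannot be recovered.

```python
import base64
import json
import zlib

# Each stage is an (encode, decode) pair; decode must undo encode exactly.
STAGES = [
    (lambda rows: json.dumps(rows).encode("utf-8"),
     lambda data: json.loads(data.decode("utf-8"))),
    (zlib.compress, zlib.decompress),
    (base64.b64encode, base64.b64decode),
]


def forward(value):
    """Apply every stage's encoder in order."""
    for encode, _ in STAGES:
        value = encode(value)
    return value


def backward(value):
    """Apply every stage's decoder in reverse order."""
    for _, decode in reversed(STAGES):
        value = decode(value)
    return value


rows = [["id", "name"], ["1", "Ada"]]  # hypothetical table
packed = forward(rows)
unpacked = backward(packed)
```

If any single stage were lossy, `backward` could not be written, and the only way back to the original would be to keep the original around.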


Other names for the same idea: "modularization", "loose coupling and strong cohesion", "program to interfaces, not implementations", ...


(author here) I will probably write more about this, but it's not just modularity, or interfaces. Or at least those words are clouded by associations with specific mechanisms in whatever programming languages one has used

One additional distinction that's relevant is strong vs. weak modules in the sense here:

https://thunderseethe.dev/posts/whats-in-a-module/#strong-vs...

Some of my comments here - https://lobste.rs/s/eccv1g/what_s_module

and here - https://lobste.rs/s/u7y4lk/modules_matter_most_for_masses

Harper doesn't use the term "strong modularity", but he says that REAL modularity is M x N, and that's closer to what I'm getting at:

https://existentialtype.wordpress.com/2011/04/16/modules-mat...

To be most useful, it is important that the relationship between M and A be many-to-many, and not many-to-one or one-to-one.

(Although I disagree with the assertions about static types -- I think he is making a very domain specific argument, nonetheless we agree on strong modularity)

It needs more analysis, but I'd claim there are at least two kinds of modularity and programming to interfaces, but the difference is not clear to most people

The interior vs. exterior distinction is a big one -- is the interface "owned by" one party, or does it stand alone?

Are you able to do "interior" refactorings that change the interface, or is it fixed, outside the project? This is a big difference in practice


Parsimony


That's Martin Fowler's job :)


Oh, yeah, more names for the same old thing in order to fragment the ecosystem and write more books that could be a mechanical translation of old books (search and replace old-word -> new-word).

That's definitely what the world needs.


parsimonious interfaces must have been taken


So many words to say "prefer tools that deal with text protocols and formats and never break backwards compatibility."

The article cites the HTTP protocol as an example of this. However, only HTTP/0.9 and HTTP/1.x are text formats; HTTP/2 and HTTP/3 are binary.


A comment from a previous discussion of this post is still true for me:

> The text tries to introduce the idea of a “narrow waist” as an architecture pattern. But I don’t fully understand how the author distinguishes this concept from similar ones, and some of the examples even seem to contradict the principles.

https://news.ycombinator.com/item?id=31777080


The article's a little meandering.

I think the 'waist' in this case is just 'interface', in the general sense - i.e. where two pieces of software meet.

You make the {N-waist-M} composition problem easier because not much detail can leak from N to M and vice-versa. N and M only need to know about e.g. JSON to compile, not each other.

I'd consider a slightly fatter waist to be something like LLVM. My language is wide and complicated, and all the various assembler outputs for different architectures are also wide and complicated, and the idea of LLVM is to bridge the two, but it still feels like it would be a pain to couple my language to LLVM (JSON is narrower than LLVM)


And, reading between the lines o/t article: stable interfaces. The more stuff exists that uses an interface, the slower it tends to evolve. So very-popular interfaces tend to be stable & long-lived by necessity, because 'breaking the world' is not an option.

Come to think of it: that also applies to physical infrastructure: building codes & materials, road networks, AC power grids, gas stations + associated infrastructure to supply those, telecom networks, etc etc. Anything deployed at scale tends to evolve slowly.

Come to think of it: ... [biology]


If I’ve understood then, a narrow waist architecture follows two principles:

- code to an interface not an implementation

- segregate interfaces, don’t overload them with responsibilities


That’s not really an architecture though, it’s just basic principles of modularization.


True, which is perhaps why it's unclear what the difference is between a "skinny waist" and hexagonal architecture, "ports and adapters".


hexagonal architecture gives you the direction of the "waist", not its thickness.

You could have a really skinny interface (one method call only):

    class TaxLogic {
        data = iDatabase.loadOneDatabaseRow(id);
        process(data);
    }

    interface IDatabase {
        loadOneDatabaseRow(id);
    }

    class PostgresDb implements IDatabase {
        loadOneDatabaseRow(id) {...}
    }
But this violates ports and adapters because the logic is on the outside and the database is on the inside.
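
For anyone who wants to run the one-method interface above, here is a hedged Python rendering. All names (`Database`, `InMemoryDb`, `tax_for`, the `income` field, the flat rate) are hypothetical; the sketch only shows how thin the interface is and takes no position on which side counts as "inside" in ports-and-adapters terms.

```python
from typing import Dict, Protocol


class Database(Protocol):
    """The whole waist: a single method."""

    def load_one_row(self, row_id: int) -> Dict[str, float]: ...


class InMemoryDb:
    """Stand-in for something like a Postgres-backed implementation."""

    def __init__(self, rows: Dict[int, Dict[str, float]]) -> None:
        self._rows = rows

    def load_one_row(self, row_id: int) -> Dict[str, float]:
        return self._rows[row_id]


class TaxLogic:
    """Depends only on the Database protocol, not on any concrete class."""

    def __init__(self, db: Database, rate: float) -> None:
        self._db = db
        self._rate = rate

    def tax_for(self, row_id: int) -> float:
        row = self._db.load_one_row(row_id)
        return row["income"] * self._rate


logic = TaxLogic(InMemoryDb({1: {"income": 1000.0}}), rate=0.25)
tax = logic.tax_for(1)
```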


This is just "loose coupling" expressed with different terminology.

It's not wrong but it's also nothing new - it's about as novel as object oriented programming or DRY.


I don't think it's loose coupling but more: don't build shitty overcomplicated interfaces for simple things.

What it doesn't go into is the fact you don't know if it's a shitty overcomplicated interface or not for about a decade at least, until everyone has built their universe on top of it.

What we have in the article is survivor bias. Things that are exemplars rather than the status quo and that's a good thing. But even considering what it explains, we will still blow our toes off 9 out of 10 times.


>shitty overcomplicated interfaces for simple things.

This is tight coupling.


No, tight coupling is a strong immutable dependency. That doesn't necessarily mean complexity. It could mean a leaky abstraction around a simple API, like half of POSIX etc.

What does screw you is things that are hard to reason about and you don't know if you can reason about them until many years have passed or you've tried to replace either side of the abstraction.


Shitty overcomplicated interfaces are directly caused by leaking abstractions.

This is tight coupling.


(author here) That's acknowledged in a few places in the post

So I'm squeezing many topics into this single post. I state the main points, with some justification.

So this issue deserves some more thought, and perhaps more terminology.

It is a hard thing to write about, so I basically dumped a bunch of thoughts. The writing process helps with software design and language design.

There were significant updates since the post was published, namely that Oils is based on the "EXTERIOR narrow waist", unlike other alternative shells:

https://www.oilshell.org/blog/2023/06/narrow-waist.html

https://www.oilshell.org/blog/2023/06/ysh-design.html

And there are some more updates on #blog-ideas on the Oils Zulip.

I have some FAQs like

- Is a narrow waist just an interface?

- Can the entire Internet be statically typed? Where are the static type definitions for DNS, HTTP, IRC, the Google Maps API, and Stripe API?

This is the same question as "Where are the static type definitions for grep and find and cut?"

I would like to publish some of these, but the code and language design are more important right now, and the writing is secondary


> JSON was an explicit design, and it's much better than CSV.

Interesting. Is it better, or is it easier (and cheaper) to parse?


The text parsing[1] is not really relevant, I feel. It's what comes after parsing that's a problem.

With CSV you have to guess at what the delimiter is, you have to guess if the first row are column names, you have to guess what each element is (string, date, integer, etc), you have to guess how escaped characters are represented.

The good news is that from a medium sized input, all of these would be pretty simple to guess while parsing each cell. The bad news is that, if you receive only a single row, you can't very well tell WTH the types are.

With JSON you at least get to read numbers as numbers and strings as strings (and special symbols as special symbols, like `null`).

The downside of JSON is that, like CSV, you still have to specify what the structure of the data is to a recipient, except that, unlike CSV, it's more difficult and prone to error.

[1] Although, parsing CSV is so simple you can frequently just write the parser yourself.
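
The type-guessing point is easy to see with Python's standard `csv` and `json` modules. The sample record is invented; the same three values are written once as CSV and once as JSON.

```python
import csv
import io
import json

# Hypothetical record: an integer id, a decimal price, and a missing note.
csv_text = "id,price,note\r\n7,19.90,\r\n"
json_text = '{"id": 7, "price": 19.90, "note": null}'

# CSV: every cell comes back as a string; types are left for the reader to guess.
csv_rows = list(csv.reader(io.StringIO(csv_text, newline="")))

# JSON: numbers, strings and null are already distinguished by the syntax.
json_obj = json.loads(json_text)
```

Note that the CSV side cannot even tell an empty string from a missing value, and whether the first row is a header is a convention the reader has to know.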


The problem with CSV is it doesn’t specify encoding at the data layer. Somewhat counterintuitively since it has the word “comma” in its name.

No, it’s more correctly thought of as a protocol for representing tabular structures as “delimited text”, but DTTF doesn’t have the same ring to it unfortunately.

This faffing around specifics makes CSV as a concept more flexible and “well defined enough” for its main user base, at the cost of simplicity and portability.


The CSV RFC is oriented toward CSV being a MIME type. The line separator in CSV is required to be CR-LF. This can occur in the middle of a quoted datum, and the spec doesn't say whether it represents an abstract newline character or those two literal bytes.


My understanding was this would terminate the record unless enclosed by “encapsulators”, whereupon indeed it would be interpreted as literal text.

Though defined as CRLF in the RFC, presumably for interoperability, you are typically free to define alternative record separators, as well as field separators and encapsulators, and most modern implementations would be smart enough to work with this.
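
As one data point on how an implementation behaves (not a statement about what the RFC requires), Python's `csv` module keeps a CR-LF inside a quoted field as two literal characters, and lets the caller pick a different delimiter and quote character. Both sample inputs below are invented.

```python
import csv
import io

# RFC 4180 style: CR-LF record separators, with one CR-LF inside a quoted field.
rfc_style = 'id,comment\r\n1,"first line\r\nsecond line"\r\n'
rfc_rows = list(csv.reader(io.StringIO(rfc_style, newline="")))

# A non-RFC dialect: semicolon delimiter, single-quote encapsulator, LF records.
other_style = "id;comment\n1;'a;b'\n"
other_rows = list(
    csv.reader(io.StringIO(other_style, newline=""), delimiter=";", quotechar="'")
)
```

Passing `newline=""` matters here: it stops the text layer from translating line endings before the CSV reader sees them.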


> With CSV you have to guess at what the delimiter is, you have to guess if the first row are column names, you have to guess what each element is (string, date, integer, etc), you have to guess how escaped characters are represented.

I don't think I've been in a situation where I wasn't either writing the export I'm ingesting in CSV because everything else is in CSV, or automatically rectifying imports based on earlier discovery because I didn't have a choice.

IMHO the biggest problem with CSV is that the schema you need exists as a poorly converted and maintained Word doc in a "Beware of the Leopard" location.


It's interesting to imagine a counterfactual world: in this world, "JSON" data is lousy with non-spec-compliant data. Single-quoted strings, bare keys, arrays with no commas, EBCDIC, everyone hates it, no one can get rid of it.

Meanwhile every CSV is pristine RFC 4180. You'd be laughed out of the room if you tried to fob off anything else as CSV.

There are reasons we're in this world and not that one, but the reasons aren't obvious.


CSV is not standardized: it might use a comma or a semicolon as separator, there are different conventions for quoting and escaping, and headers might be included or not.

JSON is also hierarchical which makes it appropriate for a wider range of data structures.

That said, I’m not sure there is any agreed upon standard for representing data tables in JSON, so it might not be any better for the CSV use case.


I think most people hold RFC 4180 to be the CSV standard, even if it’s not officially a standard. It is sufficiently parametrised to suit all variants while at the same time providing a language and structure that is portable across all.

Don’t get hung up on the fact that RFCs aren’t officially standards. It’s a publicly accessible standard which many people fall back on as a reference, which is more than you can say for a lot of ISO standards.


Oh, you can hold it all you want, popular implementations still break it.


Exactly, the files my users submit are rarely spec compliant. I have to process them properly regardless of correctness.


I like this very much, but I'm not quite sure if it actually touches the full depth of the subject or if it's accidental hindsight analysis.

I'm not sure I can't settle with the idea that big things get there because they are useful simple things that people get stuck with.


Protocol Buffers are the narrow waist of Google.


Language standards like POSIX shell, JavaScript, and C++.

...These are narrow waists because they solve the interoperability problem of {user programs ...} × {language implementations ...}

Calling those languages "narrow waist" due to interoperability might be true, but it says a lot more about how stuck we are as a field than how great those languages are at interoperability. These languages are, each in its own way, complicated at best.


So many comments posted using dictation in this thread!


The biggest ideas in software architecture are developed while stuck in monster traffic jams. "Siri, create a new blog post."


On the PTA here, the Porcelain Throne of Architecture.


Bring it on, the more the merrier in any endeavor to build the ultimate mousetrap. Now we just need a name for our humanity which will reverse our species' suicidal and violent tendencies.


Feature detection is better than version detection...??? Maybe for the web, but in the compiled language world this has led to autoconf and automake, which are a gigantic waist and also a gigantic waste. Rust-style Cargo library version specification solves so many problems that are obvious... We want to compile and link with the exact same version of library code that we built and tested on, because you cannot possibly test every feature, nor can you test bug compatibility. This increases portability, not decreases it; meanwhile C programs are not even portable between different versions of Linux due to library issues. I'm skeptical of some of these ideas.
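
For readers unfamiliar with the two strategies being contrasted, here is a small Python sketch. It is only an analogy for the autoconf vs. pinned-version debate, not an example of either tool. `math.isqrt` exists since Python 3.8; the fallback is a deliberately crude assumption that is only reliable for small inputs.

```python
import math
import sys


def int_sqrt_by_version(n: int) -> int:
    """Version detection: trust what a given release is documented to ship."""
    if sys.version_info >= (3, 8):
        return math.isqrt(n)
    return int(n ** 0.5)  # crude fallback, fine for small n only


def int_sqrt_by_feature(n: int) -> int:
    """Feature detection: probe for the capability itself."""
    isqrt = getattr(math, "isqrt", None)
    if isqrt is not None:
        return isqrt(n)
    return int(n ** 0.5)  # crude fallback, fine for small n only
```

Neither approach tests bug compatibility, which is the gap the comment above argues exact version pinning closes.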

The most successful thin waist of all is VBA and Excel, which of course are never mentioned.



