Text Is the Universal Interface (scale.com)
194 points by marban on Oct 5, 2022 | 77 comments



Did no one read beyond the first paragraph?

This is an article about the effectiveness of large language models, which have learned to do tasks beyond regurgitating text despite only having been trained to regurgitate text.

As long as you are able to describe the task as a text stream and are ok with having a text stream as output, GPT-3 might actually be able to do the task with just a couple of examples as context. I saw a talk at Black Hat about using GPT-3 as a spam filter. Because GPT-3 has trained on so much spam, it knows what spam looks like. It just needs a couple of examples, and off it goes.
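Roughly, a few-shot spam classifier is just a prompt with a couple of labeled examples. Here's a minimal sketch using the OpenAI completions API of that era; the model name, prompt wording, and examples are my own illustration, not anything from the Black Hat talk:

    import openai

    openai.api_key = "sk-..."  # your API key

    # A couple of labeled examples, then the message to classify.
    prompt = """Classify each message as SPAM or NOT SPAM.
    Message: "You have WON a $1000 gift card, click here to claim!"
    Label: SPAM
    Message: "Can we move tomorrow's standup to 10am?"
    Label: NOT SPAM
    Message: "URGENT: verify your account now or it will be suspended"
    Label:"""

    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt=prompt,
        max_tokens=3,
        temperature=0,  # make the classification as deterministic as possible
    )
    print(resp["choices"][0]["text"].strip())  # expected: SPAM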

> The most complicated reasoning programs in the world can be defined as a textual I/O stream to a leviathan living on some technology company’s servers.

The references to the UNIX philosophy feel a bit tacked on, honestly. Probably just there to grab attention.


It briefly discusses DALL-E and image models near the end, which seems to paper over a big part of current AI research, but that also wasn't the point of the article, which is fair. What I find interesting is that this reminds me how AI is being trained on a very narrow set of human inputs. Images, text, possibly some video, but there's a huge amount of data governing human behavior that is none of those things: facial expressions, body language, tone, diet, environment, etc. I'm sure it's coming, but it feels a long way off.


> Because GPT-3 has trained on so much spam, it knows what spam looks like. It just needs a couple of examples, and off it goes.

it's seen so much spam it only needs a few examples of spam to work?


No, he's not contradicting himself.

He's saying: it has seen so much spam (without it necessarily having been labeled as such) already, that it only needs to see a couple examples labeled as spam in order to recognize the pattern in all of the spam it has seen before.


oh, gotcha.


Can I use it to pivot my csv files?


what is this "article" of which you speak?


The answer is "no, nobody read the article".


[flagged]


I lost interest in the article itself, but I'm currently down a rabbit hole of one of its links [1]. It's the webpage of some random (random to me that is) person, but it has quite a bit of content (and links to other content). I consider this find a win!

[1] http://www.catb.org/~esr/


When I was 12 I followed ESR’s guide on “how to become a hacker”. I wanted to learn hacking but didn’t know where to start or have anyone to ask. I started with HTML, then C++, and using Linux. For me, it’s probably one of the most important things I read, because it ultimately altered the trajectory of my whole life.

http://www.catb.org/~esr/faqs/hacker-howto.html


Similar story for me, was encouraged on IRC by ESR (was randomly inquiring about a GPS software project that he turned out to be supporting - go figure).

His advice was that given ten years almost anybody can become a professional software developer.


Like the OP, I owe much in my life trajectory to esr's How to Become a Hacker.

He's one of the people I've never met/barely interacted with who are dear to me.

He must be in his mid-60s now, and for a few years he's neglected his once-lively blog.

He used to be in financial straits and tried to launch a cultural initiative to draw support for Load-Bearing Internet People like himself:

http://www.catb.org/esr/loadsharers/

He's still got a SubscribeStar if you're looking to lighten your pocket.


Eric S. Raymond is a fairly prolific open source contributor, but he's probably most famous as an author and essayist, specifically of _The Cathedral and the Bazaar_, which arguably did more to popularize the notion of open source than Stallman ever did for free software [0].

I particularly like his series of posts about his modern rewrite of Crowther and Woods' Adventure, starting here: http://esr.ibiblio.org/?p=7536

[0] It helps that ESR's arguments were largely practical in nature rather than ideological. In particular, he coined the aphorism, "given enough eyeballs, all bugs are shallow".


Just finished "The Cathedral and the Bazaar". Good read. Having read "The Innovators" by Walter Isaacson, it's interesting to note how similar the two takes on the history are. It's almost like they're the truth! Glad you posted your comment here. It pushed me to actually dive in and read this essay.

Edit: Oh boy, I accidentally read "A Brief History of Hackerdom". Now to dive into what I actually meant to read.


ESR is random to you? Oh youngling, much to learn you have.


Maybe images are the universal interface. With some of the advancements in ML, we have different decoders: image-to-text (OCR), layout information (object recognition), and other metadata (formatting, fonts, etc.).

Now, with diffusion-based models like Stable Diffusion and DALL-E, we have an encoder – text-to-image.

Natural analogy to how humans perceive the world and how we've designed our own human-computer interfaces.

[0] https://matt-rickard.com/screenshots-as-the-universal-api
[1] https://twitter.com/mattrickard/status/1577321709350268928


> Seeing these quite disparate tasks being tamed under one unlikely roof, we have to ask – what other difficult problems can simply be transcribed into text and asked to an oracular software intelligence? McIlroy must be smiling somewhere.

This quote gave me pause about the entire article. It sounds like they're talking about some long gone philosopher, but according to Wikipedia, Douglas McIlroy, while old, is still kicking [0], and might be able to provide his own impression on the idea that large language models have any relation to the Unix philosophy, without anyone projecting their own beliefs onto him.

[0] https://en.wikipedia.org/wiki/Douglas_McIlroy


As a visual thinker, I actually feel really limited by all these text interfaces. I hear there are a lot of people with aphantasia in this space, though (sama, emad), so for them anything other than this idea is probably unthinkable, but personally it really doesn't feel like the panacea that others make it out to be.


The idea is inherently very romantic and alluring for developers: the Unix philosophy. Arguably, yes, it makes total sense, but the difference is that the tools for developing software and combining it via Unix pipes made that a beautiful and simple experience. I don't think AI and language models are there yet. Ultimately we have to move just slightly beyond text to passing some structured data, because then we can pipe it not just between processes, but between actual APIs and services.

In the end a model likely ends up being deployed on top of a set of APIs, with a textual interface for the end user: a command-and-control system with built-in intelligence. Adept and Inflection are likely the kind of companies that get us there, but it's more likely Google develops it first (or already has), because Google Search is effectively a text box being piped into AI at this very moment to give you the answers you're looking for. It's only a matter of time before that turns into buying, calling, creating, processing, or ordering whatever you want too.


Text is universal because it's language, but it's not always the optimum interface.

Try pointing to a person deep in a large crowd versus using language. Language is also inexact and open to interpretation. Often simpler "direct" interfaces are more effective.


No, text is near-universal because it's language.

>Often simpler "direct" interfaces are more effective.

and can probably be used by people who cannot use text or language.


Yep, direct interfaces practically have a 100% hit (success) rate: an airplane console, a sound engineer's console. In daily life, semantically encoded buttons are more universal. Physics (mechanics), e.g. push/pull plus direction plus semantics, is universal in real life.


Almost anything can be modeled as a translation of one sequence of text characters into another.


Well, you've basically just defined a Turing machine, so...yeah, anything computable.


Down to right before machine code, isn't it all just strings of characters aka strings of bytes aka strings representing assembly commands and values?


really loving that the first thing in this article is a photo


The by-line caught my eye. The author is the roon that coined the terms "wordcel" and "shape rotator".


So are you a wordcel or a shape rotator, apparently aka mathcel? [0]

[0] https://knowyourmeme.com/memes/cultures/wordcel-shape-rotato...


Define text. Pure ASCII or Unicode? If Unicode, how much Unicode should be allowed? How do you identify the end of a Unicode text stream, other than closing the input stream (or explicitly waiting for an out-of-band signal like a form submission)?


> Define text.

UTF-8 for all new projects.

> If Unicode, how much Unicode should be allowed?

I think the question is more like "how much should be supported", which is project dependent. If it's just a question of "allowance" then the answer is probably "all of it". If you're a UTF-8 processor, you shouldn't go out of your way to discard or disallow certain codepoints without a good reason.

> How do you identify the end of a Unicode text stream, other than closing the input stream (or explicitly waiting for a out-of-band signal like a form submission)?

I think the UNIX-y answer here is "end of text stream" and "closed stream" are the same. But if you do want to wrap text streams in another text stream you have a couple of options: HTTP uses Content-Length and Transfer-Encoding: chunked (length-prefixing), while programming and markup languages use delimiter characters which must be escaped in the inner stream.

What you definitely should not do is, amusingly, what C does: Reserve a text character (NUL) to use as a delimiter, and hope that character doesn't appear in your content.
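For the length-prefixing option, here's a minimal sketch; the 4-byte big-endian length header is my own arbitrary choice, not any particular protocol:

    import io
    import struct

    def write_frame(stream, text):
        payload = text.encode("utf-8")
        stream.write(struct.pack(">I", len(payload)))  # 4-byte length prefix
        stream.write(payload)

    def read_frame(stream):
        (length,) = struct.unpack(">I", stream.read(4))
        return stream.read(length).decode("utf-8")

    buf = io.BytesIO()
    write_frame(buf, "hello\0world")  # an embedded NUL needs no escaping
    buf.seek(0)
    print(repr(read_frame(buf)))      # 'hello\x00world'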


> What you definitely should not do is, amusingly, what C does: Reserve a text character (NUL) to use as a delimiter, and hope that character doesn't appear in your content.

I see this sentiment often, but I don't get it. Which human language uses U+0000? What glyph does it have in any human language?

If we're using unicode to represent human languages, exactly what problem does U+0000 present?


Whether it's useful in human languages is irrelevant. It's a matter of supporting the entirety of a specified domain. You can't use NUL to mean something special if you purportedly support "UTF-8 data" because some downstream process (not necessarily an end user) might take you at your word and use NUL to represent something significant to them. Even with the fair assumption that no end user will ever use NUL to mean anything, only the highest layer of the stack can make NUL a reserved character. Better to just support the whole of UTF-8 and leave the sinning to downstream.


> Whether it's useful in human languages is irrelevant.

I dunno, I think this is the only relevant bit. The whole point of Unicode is to represent human languages.

It's not, and was never sold as, a way to represent anything other than human languages.

Once you get into the argument that "Well, we need to be prepared for peer systems that use unicode for something other than human languages", you may as well give up.

> might take you at your word and use NUL to represent something significant to them.

Then they aren't supporting human languages anyway. And while I see this argument often, I have yet to see anyone use U+0000 in any way other than simply ignoring it (because no human language uses it, so there's no point in displaying it or accepting it as input), or using it as a terminating character.

If you have some examples that actually reserve NUL for anything other than termination, I'd love to see them, because thus far all the warnings about how U+0000 should not be used as a terminator are a big nothing-burger.


It’s so easy to play games with specs like this though. I could choose any of a dozen of the original ASCII control characters, or U+FFFE, or some other character that I never expected to encounter and I’d probably be able to avoid ever hearing about the problems it caused, perhaps even because there were none (depending on how obscure the character). I maintain that it’s bad design. It’s not that much more expensive to wrap or escape your text properly. Hell, there are even unused bytes that can’t appear in valid UTF-8 that you could use instead. Just use FF and FE as your delimiters and sleep easy with full domain support.
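For what it's worth, that idea is trivial to sketch: 0xFE and 0xFF never occur in well-formed UTF-8, so one of them can frame records without any escaping (this is just an illustration, not a proposal for a real wire format):

    RECORD_SEP = b"\xff"  # never appears in valid UTF-8

    def pack(records):
        return RECORD_SEP.join(text.encode("utf-8") for text in records)

    def unpack(data):
        return [chunk.decode("utf-8") for chunk in data.split(RECORD_SEP)]

    framed = pack(["héllo", "wörld", "null byte \0 is fine too"])
    print(unpack(framed))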


You know, the funny thing is, I broadly agree with you :-)

If you're coding to the spec, code to the spec. If you're using something reserved (for something that it isn't reserved for) then you deserve everything you get.

The trouble is, I'm not going to get any problems: I have a better chance of winning the National Lottery Jackpot and never having to work again than running into a system that uses U+0000 for anything useful (other than termination).

If I read input until U+0000, there is never going to be a problem as that input did not come from a language and is thus not unicode anyway. If I am writing output and emit U+0000, the receiver (if they don't use it to delimit strings) isn't going to be able to display it as a language anyway, because there is no glyph for it in any language.

At this point in time, my feeling is that this particular ship has sailed. Enforcing U+0000 as reserved, but not for any language, is not a hill worth dying on, which is why I am always surprised to see the argument get made.

> Just use FF and FE as your delimiters and sleep easy with full domain support.

The BOM marks? Sure, that could work, but it has additional meaning. If, while reading UTF-8, you encounter 0xFFFE or 0xFEFF, it means that the string has ended and this new string is a UTF-16 string (either LE or BE), and you need to parse accordingly (or emit a message saying UTF-16 is not supported in this stream).


> The trouble is, I'm not going to get any problems: I have a better chance of winning the National Lottery Jackpot and never having to work again than running into a system that uses U+0000 for anything useful (other than termination).

It's not up to the wrapper protocol or UTF-8 processing application to decide whether the characters in the stream are sufficiently "useful". And you will encounter them, if you process enough text from enough sources. Web form, maybe not. But a database, stream processing system, or programming language will definitely run into NUL characters in text. They were probably terminators to somebody at some point, or perhaps they are intended to be seen as terminators by somebody downstream of your user. For example, I could write a Java source file that builds up a C in-memory layout, so my Java is writing NUL characters. I can write those NULs using the escape sequence '\0', or I could just put actual NUL characters in a String literal in the source file. Editors let me read and write it, the java compiler is fine with it, and it works as expected. That Java source file is a UTF-8 text file with NULs in it that I suppose are terminators... but not in the way you're implying. They're meant as terminators from somebody else's perspective.

But yeah, it is easy to look at that and go "no, gross, why aren't they just doing it the normal way, we're not supporting that". On the other hand, it's also easy to just support all of UTF-8 like javac and everything else I used in that example do.

> The BOM marks?

UTF-8 BOM is EF BB BF. FF and FE would appear in a UTF-16 stream, but never in UTF-8. In UTF-8 you can use the bytes C0, C1, and F5 through FF without colliding with bytes in decodable text. There is no "new UTF16 string" in the middle of your UTF-8 stream, at least not one that your UTF-8 processing application should care about. Your text encoding detector, maybe.


> UTF-8 for all new projects.

Unless and until Unicode gets proper support for Japanese, please don't do this.


I can’t tell from your comment, but the only thing I can think of that you might be referring to is Han unification?


Yep. Han unification in practice means Unicode-based programs replace Japanese characters with Chinese characters that look different (and ugly). The Unicode consortium alleges there are workarounds like switching fonts, but none of them have ever been effectively implemented in practice; at best you get a program that will display Japanese correctly (at the cost of displaying Chinese poorly) if some obscure configuration option is set.


"Define text" is a pretty good question, but "ASCII" v. "Unicode" is frankly one of those deeply nerdy things that won't matter much at all in the grand scheme of all of this.

I suspect those familiar with pre-computer languages/linguistics would be able to come up with a pretty good working definition here, but something like "communication systems that utilize a relatively small number of repeating symbols to convey meaning, often associated with vocalizations?"


"Pure" ASCII would be 7-bit US-ASCII derived directly from the teletypewriter command set. It consists of exactly 128 character codes, of which roughly 30% are unprintable. Anything beyond that, such as code page 437, are not "pure". It should also be noted that the first 128 character codes of UTF-8 are "pure" 7-bit US-ASCII...


8-bit-clean octet transfer; any interpretation of value and/or protocol, other than an out-of-band end-of-stream/termination, is up to the programs involved.

Is that valid UTF-8-encoded JSON? Newline-delimited record sets? Some format of data no one's thought up yet? Dunno; unless the program is instructed to do a specific thing with/to the data, it doesn't care.


> How do you identify the end of a Unicode text stream

"Why not just a zero byte?"


You'd need 4 of them to avoid ambiguity. The C string delineation model has a lot of issues we probably don't want to keep. :)


In some encodings? One zero is sufficient for UTF-8.
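That checks out: in UTF-8, continuation bytes are 0x80-0xBF and multi-byte lead bytes never include 0x00, so a zero byte can only ever be the encoding of U+0000 itself. A quick brute-force check, just as an illustration:

    # Verify that no UTF-8 encoding of a non-NUL character contains a zero byte.
    assert all(
        0x00 not in chr(cp).encode("utf-8")
        for cp in range(1, 0x110000)
        if not (0xD800 <= cp <= 0xDFFF)  # surrogates can't be encoded in UTF-8
    )
    print("0x00 appears only as U+0000")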


> define text

A text interface is anything non-GUI?


There are many more types of interfaces than GUIs and text-based interfaces. Voice user interfaces (VUIs), for instance, which have become commonplace in recent years, especially with virtual assistants like Alexa or Siri.


VUIs are just text interfaces with a different input system and more error checking. They implement operations that give you convenience in exchange for versatility.


Even if the two types of interfaces share strong similarities in their most commonplace usage, they are nevertheless accessed through two distinct mediums (from which they get their names (TUI / VUI)). Text is not speech, and speech is not text, and the possibility space of what can be realistically (in a human-friendly manner) provided as input or output on each is quite different. On VUIs, for instance, you can do things like query for a song by humming it [0], which would be very difficult to provide a purely textual alternative for in a TUI, whereas text-based interfaces make it easier to do things like looking at large swathes of structured content or copy-pasting a long piece of text to use as input.

[0]: https://blog.google/products/search/hum-to-search/


Where is the line between GUI and TUI? Or TUI and CLI?

Also, where does a system like Genera fit? Was it CLI, TUI, or GUI?!


GUIs can have representations of text, and text input fields though...


ASCII and Unicode are subtly different things.

You should compare ASCII with one of the UTF encodings of Unicode.

The 'Unicode' counterpart of ASCII would be just the English alphabet extended with digits and a few more special control characters.


That's what the GP is saying. You can't really tell a priori what encoding to use for a stream of "text" (bytes). Without some sort of metadata about the stream you just have to guess. Convention will help you make an informed guess but it's not guaranteed to be correct. Then stuff breaks in unexpected and stupid ways.
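In code, the "informed guess" usually looks something like trying UTF-8 first and falling back to a legacy single-byte encoding. The choice of cp1252 as the fallback here is an assumption, and nothing in the raw bytes tells you whether the guess was actually right:

    def guess_decode(data):
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            return data.decode("cp1252", errors="replace")  # best-effort fallback

    print(guess_decode("naïve".encode("utf-8")))   # decodes cleanly as UTF-8
    print(guess_decode("naïve".encode("cp1252")))  # UTF-8 fails, fallback guesses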


Constraining Unicode encodings to fewer than 4 bytes means we limit how many countries can use text interfaces in their language. Or how much data from those countries can be passed between programs.


We do not have enough countries to fill up all that space. For UTF-8 with a 4-byte restriction, less than 18% of the available space is currently allocated to blocks.


Text is a very high level construct. I wouldn't call it universal.


"Text Is the Universal Interface"

I've never understood this 'truth'.

How is "Temperature: 45C" somehow easier to parse/process than 0x01 0x00 0x00 0x00 0x4D

You still always have to consider the exact structure of the text (<Field> <colon> <space> <value> <unit>), same as with any binary format.
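Either way you're writing a parser against a schema. Here's a sketch of both, where the binary layout (a little-endian uint32 field tag followed by a one-byte value) is only my guess at what the hex example might mean:

    import re
    import struct

    # Text form: still needs a grammar.
    m = re.fullmatch(r"(?P<field>\w+): (?P<value>\d+)(?P<unit>[A-Z])", "Temperature: 45C")
    print(m.group("field"), int(m.group("value")), m.group("unit"))

    # Hypothetical binary form: still needs a layout.
    tag, value = struct.unpack("<IB", bytes([0x01, 0x00, 0x00, 0x00, 0x4D]))
    print(tag, value)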


In this context, `0x01 0x00 0x00 0x00 0x4D` is still text.


What is text though?

Aren't bits the universal interface? Bytes have different sizes, text has different encodings, ...


With regard to GPT-3, it doesn't know what bits are. I'm guessing it doesn't know what bytes are either, and that it operates on the level of some subset of Unicode codepoints, with one input-layer and output-layer neuron for each. But I could be wrong; a quick Google does not immediately confirm or refute this.
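For what it's worth, the publicly documented interface of these models is neither bits nor raw codepoints but byte-level BPE tokens. A quick way to see what that looks like, using the GPT-2 tokenizer from Hugging Face as a stand-in (treating it as a close proxy for GPT-3's tokenizer is my assumption):

    from transformers import GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    ids = tok.encode("Text is the universal interface")
    print(ids)                             # a handful of integer token ids
    print(tok.convert_ids_to_tokens(ids))  # subword pieces, not single characters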


Below is a short conversation you inspired me to have with OpenAI's text-davinci-002, mainly because it sounded like it'd be cute rather than because I think it proves anything (I'm no expert prompt engineer):

====

What is the byte value of the letter "c"?

The byte value of the letter "c" is 99.

Answer the next question with a sequence of byte values, one per letter, instead of with normal text.

What animal says "neigh"?

The animal that says "neigh" is a horse.

That was incorrect, you answered with normal text. Try again:

What animal says "neigh"?

The animal that says "neigh" is a horse.

The byte value of the letter "n" is 110, the byte value of the letter "e" is 101, the byte value of the letter "i" is 105, the byte value of the letter "g" is 103.

That was closer, but still not quite right. Here's an example:

What animal says "meow"?

99 97 116

Try again:

What animal says "bark"?

The animal that says "bark" is a dog.

The byte value of the letter "b" is 98, the byte value of the letter "a" is 97, the byte value of the letter "r" is 114, the byte value of the letter "k" is 107.


There is no meaningful difference. The choice of symbols to represent an alphabet is arbitrary.


So long as you can reference the correct context for interpreting the data, see: datalisp.is.


Console/shell? Learn once, use forever. GUI? Re-learn your interface every 5 years (or sooner) when some company decides it needs to change.


I can never remember all the flags and syntax. Also, I hate writing parsers all day, which is what you really have to do with text interfaces. UIs are discoverable.

There are some compromises in the middle somewhere. I'm liking Apple Shortcuts when it fits the problem domain I am within.


It depends upon the command line interface. A Cisco router CLI is very discoverable. Sitting at the prompt, hit '?' and get a list of commands. Type the command, then '?', get a list of options. At each point in typing out the command, you can always get a list of what you can type next. And even better, you only have to type enough to distinguish from multiple commands/options. So 'show interface Ethernet/0' and 'sh in Eth/0' are the same command.

And on Unix, there's always the man command. Also, most commands will respond to "command -h" or "command --help", but yes, it's a far cry from the Cisco router command line.


Oh god no I was a CCNA. IOS is horrible.

manpages aren't always that great. Unless they are on a BSD derivative...

PowerShell had some good ideas about making commands discoverable but the nuances and implementation really killed it for me.


I do forget flags as well, quite often. But instead of disliking the console for it, I open the manual with `man command-name` and use the '/' and 'n' keys to search for that option, with good success so far.

I do get mad tho when a program doesn't have a manpage.


> I can never remember all the flags and syntax.

That's a feature, not a bug.

If you find yourself often needing a specific arcane sequence, then wrap it up in a command (shell alias, function or script) so that you don't always have to type `-r -v --bad-copy -c -m`.

If you find yourself always needing a different set of arcane flags, then wrap it in a wish script or similar (I've made a pomodoro alarm that generates notifications using nothing but zenity).

With GUIs, OTOH, if you find yourself forgetting which rabbit-hole menu an option lives down, you're simply out of luck.


> I can never remember all the flags and syntax.

Why not write your own wrapper programs? I wish I could modify GUI menus that easily.


I for one endorse a general, casual and technical re-emphasis on the shell. That and a pastiche of Die Neue Typographie but geared toward GUIs.


Human-centric: mechanics (e.g. push/pull + direction) plus semantics is the universal interface in real life. Lots of semantically encoded buttons run the world.


History repeats itself.


Could you say more about that? Do you mean we are going back to text-based terminals? Sorry, I can't follow.


In a sense.

I'd argue the first major interface "breakthrough" after the iPhone was Siri/Alexa et al.


[flagged]


Computer software came to maturity in 2017 with the release of Fortnite.


Text is overrated. Text is valued because it's a powerful tool for command and control of people. Imperialist apologists like text.



