Fun fact (not fabricated BECAUSE IT'S APRIL 2ND OVER HERE):
The Japanese syllabaries (kana) don't have a concept of upper/lower case. There are two different types of kana (round hiragana and angular katakana), but they're roughly equal in terms of strength/stress, so they can't be used to express anger.
People on the 2channel forum (basically Japan's 4chan) came up with a brilliant idea: insert a space between each character so that the text looks a bit wider and has an extra oomph.
Example:
- normal: 今日はいい天気だね。 (it's a fine day, isn't it.)
- angry: 今 日 は い い 天 気 だ ね 。 (IT'S A F*KING NICE DAY ISN'T IT)
When you land on a site with a delayed paywall, you can copy out the content just before the popup, and then paste it into <body contenteditable> to view at your own leisure.
Writing horizontally, the dot would go above the character, so I'd assume there's support via composition.
It would still be a problem of IME and font support.
On input, there are already so many shortcuts and hacks (e.g. SHIFT/CAPS is already taken to force a switch from hiragana to katakana) that it's hard to imagine some natural combination that could be memorized.
On the font side, there's already a battle raging to get proper Japanese fonts on smartphones instead of the Chinese ones, so additional support for marginal features is an uphill battle, to say the least.
That's an interesting fact, but I can't help but feel like your comment is a bit too unaware of its own culture. After all, there's nothing that makes capitals inherently shouty, it's just another convention.
Saying "they didn't have capitals so they used spaces" would sound odd to an alien, who would wonder why capitals were necessary in the first place.
He's not talking to aliens, though. We all know what putting stress or emphasis on words means. I'd be surprised if that were an exclusively Western phenomenon.
It's fun to hear how it's done in scripts that don't support our (as in the average HN reader's) default way of doing it (which would be caps or italics/bold).
To assume that an English reader would know what "dots" are in your example seems unreasonable. As the sibling comment said, if you wrote that in Japanese it would make perfect sense. The sort of meta statement would be: "That other language doesn't have X, which the one I'm using has. It instead uses Y, which has approximately the same meaning."
The equivalent would be "in English you cannot use special marks for emphasis (as they do not exist), so they use different variants of their alphabet".
The comment is premised on how in Japanese there are "two" sets of "letters" with slightly different uses, but unlike lowercase/uppercase, the difference does not translate into emphasis.
Uppercase letters are linguistically related to emphasis independently from net-speak (proper names, I, beginnings of sentences), so to me it reads as "the thing that naturally works here is impossible there, so they have a different solution to the problem".
It is fine to take the perspective of your own cultural situation.
Many Asian languages do not have the concept of upper or lower case.
They contain vowels that go beyond the basic five "aeiou" of the English language (e.g. see [0]). These suffice to let the speaker know how exactly to say the word unlike in English where it has to be learned case by case based on whatever is popular or acceptable pronunciation.
A subset of these languages are tonal languages, which also have special characters that additionally allow the speaker to set the pitch of the word correctly, which then changes the meaning of the word [1].
"These suffice to let the speaker know how exactly to say the word unlike in English where it has to be learned case by case based on whatever is popular or acceptable pronunciation."
This is more due to how widespread English is, and how vowels/pronunciation have shifted over time. See, for example, the Great Vowel Shift [0].
Korean has exactly the same issues, albeit to a smaller degree. There are plenty of words that aren't pronounced the way they're spelled, due to grammatical rules: 종로, for example. Or cases where words sound exactly the same and you just need to know the context/spelling: 쫗다, 쫒다, 쫃다, 쫏다, 쫓다.
Then there's regional slang/pronunciation/dialect. Busan dialect is fairly different from 표준어, or the "official" standard language. This phenomenon is not unique to English in any way. Any language scaled up will develop these issues over time.
At least you can try to pronounce "cough" or "종로", instead of not being able to pronounce "願" at all because you don't already know the pronunciation.
> This is more due to how widespread English is, and how vowels/pronunciation have shifted over time.
I'd go so far as to say that English is now spoken by so many people in so many different regions, all of whom can now be heard by each other on a reasonably frequent basis, that it has forced English speakers to become so adept at vowel reconstruction that one could pronounce words with completely arbitrary vowels and still be understood.
"it has forced english speakers to become so adept at vowel-reconstruction that one could pronounce words with completely arbitrary vowels and still be understood."
This was still occurring before people were able to widely hear other regions' speakers, though.
However, things like "cough", "plough", "although", "thorough", etc. having different -correct- pronunciations is due to English taking in words from other languages.
> These suffice to let the speaker know how exactly to say the word unlike in English where it has to be learned case by case based on whatever is popular or acceptable pronunciation.
You can have three things:
1. A spoken language that evolves over time.
2. A writing system that accurately describes pronunciation.
3. A writing system that indicates history and etymology.
But you only get to pick two. English went with 1 and 3, which is arguably the optimal choice.
Why do you think #3 is more important than #2? Why is preserving history and etymology more important than ease of learning for new writers? Put another way, why should kids have to struggle with spelling so we can have a writing system that preserves linguistic history?
1. Because it's more important to know what words mean than it is to know how they sound.
My daughter reads a ton and learns a lot of words from reading. Fairly often, she mispronounces them, and that's OK. What's more valuable is that she can often infer the correct meaning of the word both from the surrounding context and from the parts that the word is made of. If we normalize spelling to match pronunciation, much of the latter gets lost.
It's easier to see that "mean" and "meant" are related than "meen" and "ment". "History" and "story" versus "histery" and "story".
2. Because pronunciation changes over time. If we continuously change spelling to match, it means older printed works get harder to read. In the worst case, they can appear to be saying different words than they intended.
3. Because pronunciation isn't uniform across regions.
Should "lawyer" be spelled "loyer" or "lawyer"? Is "crayon" spelled "crayon", "crayawn", "cran", or "crown"? Is it "caramel" or "carmel"?
Probably they mean spoken English, rather than written English. Written languages are traditionally considered secondary to spoken ones in linguistics, perhaps because they tend to be acquired several years later in childhood and several millennia† later in history. English is normally considered to have about 13–15 vowels, if we exclude the rhotics, depending on dialect: TRAP BATH PALM LOT CLOTH THOUGHT KIT DRESS STRUT FOOT FACE GOAT FLEECE GOOSE PRICE CHOICE MOUTH COMMA LETTER HAPPY, in Wells's standard lexical sets.
But, you say, that's 20 lexical sets, not 13–15? Well, no dialect distinguishes all 20. My idiolect (a slight variant of General American) realizes TRAP and BATH as [æ], PALM and LOT as [a], CLOTH and THOUGHT as [ɔ], KIT as [ɪ], DRESS as [ɛ], STRUT as [ʌ], FOOT as [ʊ], FACE as [ei], GOAT as [ʌu], FLEECE and HAPPY as [i], GOOSE as [u], PRICE as [ai], CHOICE as [ɔi], MOUTH as [æu], and COMMA as [ə]. That's 15, or 12 if you leave out PRICE, CHOICE, and MOUTH, which are diphthongs made of vowels that also occur isolated. (GOAT is debatable, usually analyzed as [oʊ] or [ou].)
Different dialects draw the boundaries in different places; for example, dialects with the "trap–bath split", such as RP, famously realize TRAP and BATH differently ([æ] and [a] in RP). Some dialects have fewer vowels; if we consider Indian English to be a single dialect, it may have more speakers than even GA, and most varieties of Indian English have fewer vowels than 12. I haven't found a good phonological analysis, but if you know any Indian English speakers and also know phonology, you know what I mean. https://en.wikipedia.org/wiki/Regional_differences_and_diale... goes into some detail.
______
† The historical gap might be much larger than this.
Sumerian cuneiform and Egyptian hieroglyphs date back about 5300 years, and they provide evidence that spoken language was considered to be universal among humans at the time—there is no suggestion of tribes that lacked language anywhere in the written record. Today there are still peoples without written language, and a few who only acquired written language within the last generation. So we have good evidence that it has taken at least 5300 years.
But Homo sapiens has been around for sixty times that long, over 300 millennia, and stone tools date back 2 million years. It strains credibility to imagine that the authors of the Lascaux cave paintings or the Denisovans who invented sewing were so unlike us as to lack speech; the origin of spoken language is usually dated to before 40kya. Unfortunately, no tape recorders have yet been found from that epoch, so the uncertainty of the antiquity of spoken language ranges over nearly a factor of 100. Maybe spoken language is a million years older than written language, or five million. Probably not ten million, though, or we'd be studying chimpanzee folklore.
I love how explicit some written languages seem to be. It sounds great to be able to reliably pronounce any word perfectly. I suppose it's a trade-off with complexity, though. Learning these more explicit languages seems really daunting. Maybe it's just bias?
English and French are the oddballs here. In virtually all other European languages you can reliably pronounce any written word of the language. You don't have to go to Asia for this.
I don't know about that. I'm Swedish, and there are a lot of words in our language whose pronunciation is impossible to deduce. I'd assume the same is true for the other Scandinavian languages as well, since they are very similar. Perhaps we are oddballs too, but it seems unlikely.
You might find the IPA interesting. With maybe an hour of studying to learn the letters/symbols and mouth movements you can reliably pronounce any word in any language so long as you’ve got the IPA spelling.
I've actually looked into the IPA at one point. It is extremely useful when learning the basics of a new language. It would be very tedious to try to look up every new word you come across, though. Alas, the worst part is that you don't know which words are pronounced differently than you assumed until you hear them or someone raises their eyebrows.
I don't understand people who read ALL CAPS as shouty; in fact, it was one of my first culture clashes on HN. I find the italicized emphasis mode harder to read and recognize.
Maybe it seems so odd to me because there are so many licenses, contracts, or government forms that use ALL CAPS as emphasis. I don't know.
It just doesn't translate to shouty in my mental reading voice.
To me (and probably most others), license texts and such absolutely look like they are shouting. I do not understand whence the convention of having them in ALL CAPS, and can only assume it's in itself some sort of a cultural association between ALL CAPS and IMPORTANCE. I have only seen it in English legal texts, anyway – is it even used in other languages? It looks to me like ALL CAPS was originally used to emphasize key points, and then an inevitable race to the bottom happened until EVERYTHING WAS IMPORTANT which really means that nothing is important.
When it comes to typography, nearly every type of emphasis employed in Western text except italics (and sᴍᴀʟʟ ᴄᴀᴘs, which see too little use these days, methinks) exists only due to technological limitations, particularly the extremely limited typographic options available to typewriters and, later, 7- or 8-bit text terminals. This includes ALL CAPS, s p a c i n g, and u͟n͟d͟e͟r͟l͟i͟n͟e͟d͟ text, never mind ASCII crutches like /pseudoitalics/, _pseudounderlined_, and ∗pseudoboldface∗.
I don't tend to read it as shouting unless the contents are clearly angry. I've always seemed to parse it as more of a monotone 80s/90s computer voice. I think growing up using DOS and BASIC made me just associate all-caps with computers.
Honestly I'm a little surprised that people who've likely seen their share of BAD COMMAND OR FILE NAME would still read that as shouting.
There is a similar thing in Korean Hangul (also unicased), where you put a full stop between each letter: "알겠습니다." ("I see.") vs. "알.겠.습.니.다." I believe it is an independent invention.
That kind [1] of language-script mismatch, mainly for humorous purposes, actually exists in Korean and is considered a kind of 한본어 (a portmanteau of 한국어, Korean language, and 일본어, Japanese language).
[1] In this case, Japanese そうですね "I see" written in Hangul.
Chinese phonology is a bit restrictive, but I have great fun writing short messages in other languages (Japanese, French, English) with Chinese characters. Of course the number of friends I can do that with is very limited, which in a way makes it even nicer.
Every time I read these letters or the mega-wide numbers in Japanese uploads on YouTube, I have a heart attack thinking my font cache is broken again.
Yes, writing a word in kanji or katakana sometimes works as emphasis. Conversely, writing a word in hiragana works as de-emphasis.
Writing a word in katakana that is usually written in kanji also works as emphasis, with a slightly different meaning: it tends to be used for stereotypes.
For example, 福島 (Fukushima) / 広島 (Hiroshima) is just the name of a prefecture, but they are sometimes written フクシマ / ヒロシマ to refer to the nuclear plant accident / the atomic bombing. (I really dislike this usage.)
Sometimes all-katakana is used in fiction to indicate foreign or robotic voices (like the Starmen in Mother 2). Writing a scream as a string of ア's instead of あ's gives it a piercing quality, more so when you add the dakuten marker (゛), even though it doesn't change the pronunciation in this case.
More of an indication that implications exist, e.g. ガンバる implies the word is normally written in kanji but isn't here, and ヒロシマ or フクシマ implies a nuclear context.
They can be used to convey tone in text, similarly to how italics, caps, symbols, and other decorations work in general; I think that's what they meant by emphasis.
No concept of case is also true of Chinese, Korean, Arabic, and if I'm not mistaken, most South Asian scripts as well.
There are an incredible number of other ways to add emphasis in Chinese, though, so it's not lacking anything, and I imagine the same is true of the other languages.
Case is a uniquely European script phenomenon, and one that came late in the development of most of the scripts (Cyrillic, because it was the last of the European scripts to be developed, has the shortest time between its unicameral origins and the development of upper- and lowercase).
I have a book published in the 1920s with a foreword by Stanley Morison, by an author who attempted to enhance the Hebrew alphabet by introducing upper- and lowercase letterforms to it, as well as to bring the letterforms more in line with the styles of the Latin-Greek-Cyrillic alphabets. It's—odd.
> I have a book published in the 1920s with a foreword by Stanley Morison, by an author who attempted to enhance the Hebrew alphabet by introducing upper- and lowercase letterforms to it, as well as to bring the letterforms more in line with the styles of the Latin-Greek-Cyrillic alphabets. It's—odd.
Before the eventual standardization of Hangul around the early 20th century, there were numerous attempts to "linearize" Hangul's characteristic syllabic blocks ("풀어쓰기" [1]). Many of them were influenced by Western alphabets and had two cases, and none were successful. And yes, they are also odd.
When I was an undergrad, I wrote some algorithms for composing Hangul letters into the "ideographs" in Metafont. It was kind of fun to build. The whole East Asian font project was too ambitious and never got finished, though. I was trying to enable algorithmic composition not just of Hangul but also of Kanji/Hanzi from radicals, but the latter was not as amenable to algorithmic composition.
And it's worth mentioning how relatively recently case was "invented" even for Latin and Greek/Greek-derived scripts (including Cyrillic, and probably Latin itself technically...)
The way we write modern text is more modern than most people realize, I think. The letter 'j' wasn't really used as a separate letter indicating a separate sound until sometime around the 16th century! Case is older, but not ancient. I've seen some Greek on 3rd/4th-century Middle Eastern ruins, and I sometimes struggle to read the "all upper case with no spaces" writing, but that's just how it was! No "lower case" until much later...
The characters we think of as lower case were starting to take familiar forms around the 3rd century, but they were just the handwritten form of the language. One set of letters with clear sharp lines that can be worked into stone, and another with curves that can be quickly handwritten.
Arabic script has a vaguely similar concept of a given letter having multiple (up to four: isolated, initial, medial, final) forms. The form depends on what letters precede and/or follow it (if any).
Neither do uppercase letterforms. Uppercase individual letters aren't emphasized, that's just grammar, much like Hebrew's final letters that are only used at the end of words (I'm guessing these indicated the ends of words to assist reading when spacing wasn't used--I've certainly found it useful as such).
I learned Thai in the last 10 years, and it doesn't have letter cases. I naturally began to use spacing for emphasis in "chat language". It's so natural to do this.
They do have small versions, though, that are sometimes used when typing, あ -> ぁ, like if you want to indicate stretching a vowel sound out for emphasis.
This is the most brilliant thing I've watched in a long time. I love the various ways Tom took this. His sense of humor, obscure references, and ability to tell a good story are outstanding. I also started watching the other videos and they're just as good.
Tom, if you are reading this, I am curious what a capital smiley face emoji looks like. Or the capitalest. What does nirvana look like in a tweetable small yellow circle? Or a lowercase one. Or a lowestcase one. What is the graphic depiction of the pit of despair?
You got me thinking. Sure, emojis are in color. But could you make them black and white? Or apply the transform to each color channel independently. I'm convinced there is a way, and if there is anyone who can find it, it is you.
And if you do go this way, why stop at emojis? The world needs to know what a capital Mona Lisa looks like. Or the capitalest.
I think you're on the path to general artificial intelligence here. iA.
If you haven't seen tom7's youtube channel: Every single video at https://www.youtube.com/c/suckerpinch/videos is a treat. Mind-bending, deeply funny, and amazingly work-intensive.
Seconded, thirded, and fourthed. My favorite is "NaN Gates and Flip FLOPS", where he dismisses 0 and 1 as being too ugly to be the foundation of all computation and reinvents computing via the beauty of rational numbers, which is to say, the horror of IEEE754 floating point (along with a working hardware implementation!): https://www.youtube.com/watch?v=5TFDG-y-EHs
Tom7 will probably be the only person ever to bring up strange loops as part of a self-referential narrative for reverse engineering an NES and programming a slideshow presentation with the diagrams of that reverse engineering... on the NES itself.
I love the concept and execution but it really is so sad that the results weren’t more compelling.
I do wonder whether the problem is that training on multiple fonts just doesn’t actually add more data points to the dataset. Fundamentally, you’re going to get a model that knows how to uppercase or lowercase each canonical Latin letter. This model might actually be quite good at generating appropriate lowercase forms for a font given the uppercase glyphs, but it’s not learning an abstract concept of ‘lowercaseness’.
Instead of extending the training set with more fonts, the only real source of additional training data would be more alphabets. Cyrillic and Greek both have case systems so just adding those would have more than doubled the number of training cases - I think the trick would be to start from the Unicode case mapping tables to generate your example data, to give your model more variety of upper/lowercase pairs to get its teeth into and really make it possible to ask it to uppercase an arbitrary letter form.
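Python's own Unicode tables make harvesting those extra pairs almost trivial. A minimal sketch of that data-generation step (the rasterization of each pair in each font is left out):

    # Enumerate (uppercase, lowercase) codepoint pairs straight from Unicode,
    # which covers Latin, Greek, Cyrillic, Armenian, and more in one pass.
    case_pairs = []
    for codepoint in range(0x10000):  # Basic Multilingual Plane
        ch = chr(codepoint)
        lo = ch.lower()
        if ch.isupper() and len(lo) == 1 and lo != ch:
            case_pairs.append((ch, lo))

    print(len(case_pairs), case_pairs[:5])
    # Each pair becomes a training example for every font covering both glyphs.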
The approach he used didn't really make sense. I get the impression Tom is so dedicated to his old custom hacked-up NN framework that he tailors the approach to what he can easily do in it.
If you asked someone else, "I want to automatically turn characters into 'more uppercase' or 'more lowercase' versions, and it has to involve neural networks; what do?", they would say something like "Easy enough! dump character maps into a standard StyleGAN2-ADA, train for a few days, then find the latent direction corresponding to uppercasing & lowercasing, and edit whatever font you please." (People have been generating fonts with RNNs or GANs for ages.) You could do this over a week or so, the tooling has gotten pretty easy to use.
And if you asked someone on the cutting edge, they'd suggest using OA CLIP through Aleph/BigSleep/etc. to automatically edit images using a text input prompt of "a lowercase letter" or "an uppercase letter", or one of the hybrids like StyleCLIP https://github.com/orpatashnik/StyleCLIP . This approach might take all of an hour or two (but the results would probably be worse).
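As a rough sketch of the CLIP half of that idea, here's how you might score a glyph bitmap for caseness with OpenAI's clip package (glyph.png and the two prompts are my own placeholders, not anything from the paper):

    import torch
    import clip  # https://github.com/openai/CLIP
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    prompts = clip.tokenize(["an uppercase letter", "a lowercase letter"]).to(device)
    image = preprocess(Image.open("glyph.png")).unsqueeze(0).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, prompts)
        probs = logits_per_image.softmax(dim=-1)

    print("P(upper), P(lower):", probs.tolist())

Tools like BigSleep/StyleCLIP then turn that kind of score into a gradient signal for editing the image itself.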
I still keep Tom's old fonts around on my computer for various purposes. They have a real character that I haven't found in other fonts: http://fonts.tom7.com/
It would be interesting to see if AI could be used to fill in some of the gaps in the fonts, e.g. numbers for his Angstrom font (https://www.dafont.com/angstrom.font).
Something like mixup augmentation might work here as well for training - blend two SDFs and say "this is 20% A and 80% B".
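Something like this, if the glyphs are stored as SDF arrays (a toy sketch; the mixup_sdf helper and the soft-label format are made up for illustration):

    import numpy as np

    def mixup_sdf(sdf_a, sdf_b, lam):
        # Blend two glyph SDFs; the label is soft: lam of A, (1 - lam) of B.
        blended = lam * sdf_a + (1.0 - lam) * sdf_b
        label = {"A": lam, "B": 1.0 - lam}
        return blended, label

    # "this is 20% A and 80% B":
    # blended, label = mixup_sdf(sdf_A, sdf_B, lam=0.2)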
Someone could also train a model that given some glyphs of a font predicts the rest of the glyphs. Then we can do weird things like give it glyphs from multiple fonts as input to make a hybrid font.
Personally I'm curious about training a model to predict Cyrillic or Greek letterforms from Latin ones to boost multilingual coverage. I'm sure there are some learnable relationships there like Я <=> R or И <=> H.
Do you mean relationship in terms of appearance or pronunciation?
Я maps to "ya" sound in Russian, И is "ee" and Н is "N" in Russian and "ee" in modern Greek but in ancient Greek it's something like "e" in "bed" in American English.
Appearance. I'm just talking about typefaces - absolutely nothing to do with pronunciation.
Some fonts do not have Cyrillic glyphs. If they could be generated by an adequately trained model based on the other glyphs, then that font's multilingual coverage could expand automatically.
Very interesting! Well, I'm playing with the concept a bit now. Currently writing a skia client to generate glyph SDFs as images so I can just use pytorch.
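If anyone wants to play along without writing a skia client, you can approximate the glyph-to-SDF step with PIL plus scipy's Euclidean distance transform (the font path and size here are placeholders):

    import numpy as np
    from PIL import Image, ImageDraw, ImageFont
    from scipy.ndimage import distance_transform_edt

    def glyph_sdf(char, font_path="DejaVuSans.ttf", size=64):
        font = ImageFont.truetype(font_path, size)
        img = Image.new("L", (size, size), 0)
        ImageDraw.Draw(img).text((0, 0), char, fill=255, font=font)
        inside = np.asarray(img) > 127
        # Signed distance: positive outside the glyph, negative inside.
        return distance_transform_edt(~inside) - distance_transform_edt(inside)

    sdf = glyph_sdf("g")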
Okay, based on the introduction of the linked paper, this is brilliant. I have seen few if any texts like this one, which treat a fun + interesting + simple idea with fun language while also presenting it in the form of a research article.
Do give this a chance
EDIT: I forgot what day it is. I really like the idea though and I think with a little more work it could really deliver
This guy is the Terry Pratchett of home-grown deep learning libraries...
So I did that and let it run for a month. Actually I had to start over several times with different parameters and initialization weights because it would get stuck (Figure 11) right away or as soon as I looked away from the computer. I prayed to the dark wizard of hyperparameter tuning until he smiled upon my initial conditions, knowing that somewhere he was adding another tick-mark next to my name in a tidy but ultimately terrifying Moleskine notebook that he bought on a whim in the Norman Y. Mineta San Jose International Airport on a business trip, and still feels was overpriced for what it is.
The database is just filled with garbage that is unusable for this project: Fonts that are completely illegible, fonts that are missing most of their characters, fonts with millions of control points, Comic Sans MS, fonts where every glyph is a drawing of a train, fonts where everything is fine except that just the lowercase r has a width of MAX INT, and so on.
07:48 “Comic Sands; so called because the letters look like little droppings of sand, or something. And they are kind of aesthetic. We could consider this an improvement.”
The conference "FUN with Algorithms" consists mostly of these kinds of things: CS done on toy problems, or other things making the papers a bit out of the ordinary (but still real science). Not presented in quite the same quirky way, though.
I wonder why we still have capital letters. I understand that Latin had more angular letters to inscribe on marble, compared to the more rounded lower case letters for day-to-day writing. Now that we have the technology to do rounded inscription, why hang on to this outmoded notation? The simplicity of the Latin alphabet is unnecessarily complicated by the presence of capital letters.
This is not as absurd as it sounds. After all, over the centuries, we have introduced punctuation to make it easier to read, unlike classical writing [1]. Why not carry through with remaining reforms?
As far as I know (or as I was taught at school in France), it was Charlemagne who introduced bicameral usage in the Latin script, with more easily legible "lower case" letters (which were not actually called "lower case" back then, of course) for most of the text, while keeping uppercase letters at the beginnings of sentences or for important words, also to improve legibility (serving as cues that help keep track of where you are in the text). This was basically modern script; only the lowercase forms have evolved somewhat.
Capital letters in modern script are therefore not a remnant of technical limitations, the mix of lowercase/uppercase was actually voluntarily introduced for legibility once these technical limitations were gone.
You can clearly see the letters B, C, D, G, O, P, Q, R, and S graven in bronze with all the same curviness they'd have if they were instead carved in marble. [1] They're carved that way because that is what the letters look like; if they're difficult to carve, that just means the carver has to suck it up.
The letter we commonly render U is V in Latin. That is not due to the needs of the medium; that is what the letter looks like. In Latin, there is no letter U. V is not pointed because it was difficult to carve C, D, O, Q, P, R, and S. V is pointed for the same reason M is -- because that is the shape of a V.
I think you have it backwards: it was technology rendering the difference between the two obsolete that brought capitalization about. Latin just used one script or the other depending on use case; it was when we had printing presses that we started codifying rules for how to mix the two scripts into one large character set.
I don't understand your argument, are you saying that removing capital letters would improve legibility? Note that as far as I know during the antiquity there was no semantic meaning associated with font case, you could use full caps in some settings and full lowercase in others. The image in your linked wikipedia article uses full uppercase for instance.
At any rate, it's not at all obvious to me that removing case would improve things. I'm currently reading a book by Valter Hugo Mãe, an author who (generally) only uses lowercase letters: https://svkt.org/~simias/up/20210402-153754_maquina.jpeg
In both cases I often find myself missing the end of a sentence, I think mainly because ',' and '.' are easy to mix up, and normally you expect '.' to be followed by an uppercase letter. Of course, part of the issue might just be lack of familiarity.
I'm not arguing that capitals are vital and we couldn't read and write correctly without them, but as you point out we could say the same of spaces and general punctuation. I can read unaccented French just fine for example, but it does force me to slow down at times and to infer more from context.
I always imagined a writing system that used uppercase/lowercase for "before and after a decimal point" without an explicit decimal.
Like how some advertising material renders "$499.99" as 499 in a large size, followed by 99 in a smaller, often superscript style, and no discrete decimal.
Well, if we can, are we now in a situation where the 'step' between 'cases' is no longer an integer sequence? That is to say, that the cases are fractional? Like, if middlecase is 0.5, can we apply the middlecase operation again and get 0.25/0.75?
As applying cases is then just 'addition', you can likely get the 'multiplication' and 'division' of cases too. 'Exponentials' are just around the corner.
If that is the situation, then we can now get transcendental cases: the 'pi' case, or the 'e' case; pick your favorite transcendental.
More interestingly, you can then pull out the imaginary case, using Euler. e^(i*pi) = -1. Maybe that the lower-er case is just the 'e' case to the power of the 'pi' case times the 'i' case. Whatever those operations may mean.
Of course, once you're at the imaginary cases, you might as well step up into derivatives and integrals; it's just curiosity, after all. Then you'll be doing partial derivatives, and then Lagrangians.
Eigencase-ing comes next, and then Maxwell's equations in case format. 'Del'ing your cases should be a real trip.
After only a bit of puzzling, you're doing quantum mechanics operations with your cases, because why not?
Case-ing operations have a fruitful future for any mathematician, it seems.
Hmmm. I wonder if you can do backprop on exp(f) as efficiently as on f? Then once you've learnt exp(f), you can just use exp(tf) to transform smoothly from one case to the other.
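For a linear map it works out nicely: PyTorch's matrix_exp is differentiable, so you could learn the generator M directly and sweep exp(t*M) from the identity (t=0) to the full transform (t=1). A toy sketch (the 16-dim glyph embedding is made up):

    import torch

    d = 16  # toy glyph-embedding dimension
    M = torch.nn.Parameter(0.1 * torch.randn(d, d))  # generator of the case map

    def fractional_case(x, t):
        # exp(t*M) applies "t's worth" of the transform; t=0.5 is middlecase.
        return x @ torch.matrix_exp(t * M).T

    x = torch.randn(1, d)
    loss = fractional_case(x, 0.5).pow(2).sum()
    loss.backward()  # gradients flow through matrix_exp, so M is learnable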
You see that bit at the start of the video where tom7 presses the caps-lock key, then types a letter, then presses the caps-lock key again? I worked with a designer (possessed of two hands and no visible impairments) who actually typed like that. Working alongside him was an excellent lesson in patience.
I did that up until about 5 years ago, and I've been a professional software engineer for 13 years or so, and programming longer than that. I had to remove the caps-lock key from my keyboard to force the behaviour out of my muscle memory.
Now though, I have the opposite bad habit, if I need to write a long string of uppercase letters, I don't turn on caps-lock, I instead keep my left little finger pressed on the shift key.
I chuckled too, but then I thought about it and... it might be slightly slower, but it will probably be better for your hands long term, since to hit shift you have to twist your hand awkwardly.
...what? I just checked on three different keyboards to make sure, but...no. Shift overlaps with A for a reason; you shouldn't be twisting your hand if you're using proper typing technique.
My only suggestion for improvement for the modeling is in the part where you tried to find the "ideal" letters for the model.
Instead of sampling generated random bitmaps and ranking those, you could instead initialize the inputs to random continuous values, then optimize the output score for the desired letter with respect to the random input (with your fixed model).
Indeed, this is how people get those trippy dog images you've probably seen.
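Concretely, the trippy-dog trick (activation maximization) looks something like this; model, its 26-way letter head, and the 64x64 input are my assumptions, not the paper's actual setup:

    import torch

    def ideal_letter(model, letter_index, steps=500):
        model.eval()
        for p in model.parameters():
            p.requires_grad_(False)  # freeze the model; only the input moves
        x = torch.rand(1, 1, 64, 64, requires_grad=True)  # random continuous init
        opt = torch.optim.Adam([x], lr=0.05)
        for _ in range(steps):
            opt.zero_grad()
            score = model(x)[0, letter_index]
            (-score).backward()      # gradient *ascent* on the letter's score
            opt.step()
            x.data.clamp_(0.0, 1.0)  # keep pixel values in range
        return x.detach()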
Out-of-fashion warning: tom7 uses Subversion, not git. How subversive. He really must belong to the institute for computational heresy. I press shift to reduce conflict all the time and I still feel conflicted.
This is a really fun project, no doubt, but I keep seeing stuff about fonts on HN. I was wondering why it is that HN likes fonts? What is it about them that makes them HN-interesting?
> It does genuinely matter that a designer should take trouble and take delight in his choice of typefaces. The trouble and delight are taken not merely "for art's sake" but for the sake of something so subtly and intimately connected with all that is human that it can be described by no other phrase than "the humanities". If "the tone of voice" of a typeface does not count, then nothing counts that distinguishes man from the other animals. The twinkle that softens a rebuke; the martyr's super-logic and the child's intuition; the fact that a fragment of moss can pull back into the memory a whole forest --- these are proofs that there really is reality in the imponderable, and that not only notation but connotation is part of the proper study of mankind.
- William Zinsser
Zinsser was actually talking about writers here. And it might be a bit hyperbolic, sure. But I think the fact is that people who love to program spend a lot of time staring at words, and given a chance they'll take interest in the clothes that words wear.
Huh. That makes sense from a frequency of exposure sense.
It's just that I'd be more interested in the main fonts that make practical sense for projects, and not so much in the quest for the holy grail of fonts that makes all men fall to their knees in awe of its aesthetic perfection. It doesn't matter how prettily you set bad content, anyway.
Arguably, lining figures are uppercase numbers and old-style figures are lowercase numbers. Certainly, the former work better in an all-caps setting than the latter, and the latter better in mixed-case typesetting than the former. Canadian and British postcodes are mixes of letters and numbers; I used to set them using small caps and old-style figures when I printed mailing labels for the magazine I published in the 90s.