.. I take it few people find morse code puns funny anymore.
Seriously, what's the point of this pedantry? What does having 3 basically identical characters add to the language other than pointless rules for insufferable pedants to power trip over? We've all been using - just fine. On what basis does the person writing this article believe these rules matter, are important, disambiguate language?
Call me a hopeless philistine, but I say down with the dash. One symbol is fine for word-compounding, numerical ranges, subtraction, mid-word line breaks. No one needs an en dash to tell them pages 3-8 is not a compound word.
By the same logic you might as well say:
"why are we even kerning fonts, who cares if there's a few gaps when i write »irl«."
The fact that using different dashes does encode meaning in a subtle sense does have relevance for semantics -- but that's, imho, almost secondary to this argument, as it's not as grammatically relevant as commas and periods, for example.
The primary importance of using the correct dashes is that it preserves a good flow for reading and is paramount to micro-typographic balance:
- A longer dash to link words that belong together is visually perceived as an interruption and doesn't feel like those two words are one
- In reverse, a shorter dash when switching context -- or interjecting another idea within a sentence -- doesn't slow the pace of the text flow enough, and your brain will read/intonate it the same way as when linking words.
- And lastly, neither of them preserves optical balance when displaying a numerical range, as numbers are wider than a hyphen but narrower than an em dash, which would result in either insufficient visual separation compared to the spaces following said numbers, or too much of an optical gap within an entity that belongs together.
That's the barebones set of dashes that are relevant for a balanced typographical appearance, not made-up pedantic complexity to annoy people. Otherwise we'd be talking about half and quarter em dashes and the like.
> you might as well say: why are we even kerning fonts [...] is paramount to micro-typographic balance [...] is visually perceived as an interruption [...] won't preserve optical balance
These are typesetters' concerns, not writers' concerns. They are all context-sensitive tweaks to what amounts to the same glyph.
If the rules for each have as well defined contexts as the article suggests, then it sounds like something more suited to ligatures and kerning.
Full glyph replacement ligatures were not something initially supported by all font formats, so perhaps the fact that they continue to exist as separate characters is more of a historical detail. It's something that could easily be added with new fonts though.
Ligatures typically change the appearance of a character; they do not change the meaning. Merging the hyphen and the en dash into the same character and then deriving the correct one from the context (spaces around it) would be a whole new use of "ligatures".
From a software "separation of concerns" viewpoint it feels wrong to me to have your font renderer infer meaning. A pre-processor that replaces hyphens with the correct dash – like Word does – feels more sane to me.
> Ligatures typically change the appearance of a character, they do not change the meaning
Anyone can use a hyphen for all three purposes right now and people would understand the meaning, because the meaning is primarily derived from the context of surrounding glyphs. Only typesetters would complain that a subtly more appropriate glyph should be used for the purposes of refined optics and geometry etc.
Therefore an endash and emdash ligature could only change the meaning IF the contexts of each use case overlap, i.e. if there is a valid glyph-based context in which endash and emdash are both valid... which I don't think there is, because that would be far too subtle.
Some fonts—like Inter—do this, but I see people complain that the font isn’t rendering exactly what they typed.
My favorite is that it will render 1920x1080, for example, as 1920×1080. I think the former looks terrible and unprofessional, especially when I see it in actual products rather than prose. So I really hope this catches on.
I’ve gone my entire life without knowing the difference and survived just fine.
It may not be entirely irrelevant, but it’s very close to it. A bit like saying your tie has to be knotted a specific way to look respectable. Very fun for the in-group, but completely incomprehensible to those outside.
Like, I’m not opposed to having a few silly things to learn just to separate those that can be bothered to pay attention from those that do not, but I’d be hard pressed to say it’s actually relevant outside of that.
Text doesn't become unreadable when the dashes are used incorrectly, but (for me) when they're used correctly, they do make the text easier to read and digest.
Thank you for the post. I still don't want to learn & spend mental energy on which of 3 different dashes to use, but now I do see why people would want to (and I think the reasoning is solid, even if I don't personally want to bother with it :) ).
You started by talking about kerning fonts, which is a great analogy.
Building on that - kerning is awesome because stuff looks better and I don't need to do anything for it to happen. Would it work to have my display system figure out which type of dash to use automatically?
Like, a dash inside a word should be short (under the assumption that you're linking the words together) and dashes with whitespace around them should be longer (under the assumption that you're switching context/injecting an idea into a sentence).
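The rule described above can be sketched as a small text pre-processor. This is a minimal sketch, not how any real font or editor does it: the function name `smart_dashes` and the two regex rules are assumptions made up for illustration.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"

def smart_dashes(text: str) -> str:
    """Pick a dash from context (hypothetical rules, for illustration only)."""
    # Rule 1: a hyphen squeezed directly between two digits is read as a range.
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    # Rule 2: one or two hyphens with whitespace on both sides are read as an aside.
    text = re.sub(r"(?<=\s)-{1,2}(?=\s)", EM_DASH, text)
    # Everything else (compound words, flags such as --all) is left untouched.
    return text

print(smart_dashes("see pages 3-8 of the well-known guide - the old one - today"))
```

Even this toy version shows where context runs out: a date like 2020-01-15, a thread size like 1/4-20, or a spaced subtraction like "5 - 3" would all be rewritten by these rules although none of them is a range or an aside.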
No it isn't--double hyphens are a great alternative to an em dash and are interpreted as such by many people and some software. GP's argument is for the grammatical functionality of differentiating dashes, not the specific symbols used.
That said, I don't use en dashes, if I want my numbers to line up I use a fixed-width font.
> No it isn't; double hyphens are a great alternative to an em dash and are interpreted as such by many people and some software. GP's argument is for the grammatical functionality of differentiating dashes, not the specific symbols used.
Semicolons are often better just replaced by periods. I sometimes use them but I've had at least one editor who refused to use them for news-oriented copy.
I find it unlikely that the comment was typed on a typewriter, or sent over a teletype. Computers and phones make it easy to type em-dashes if you want to do that. No sub-par alternatives are needed.
Did you know double dash is treated as one single longer than normal dash by default in iOS?
This character ‘—‘ looks like one long dash to me, even though I typed the dash button twice. What’s even crazier is if I type four dashes ‘——‘ it still looks like one even longer dash; even six ‘———‘ is a solid line, and I can delete it by pressing the backspace button once
I have no idea how to get my phone to display two short dashes side by side: ‘--‘ maybe I can fake it by putting an emoji in between, knowing that hackernews will filter it out. Let’s see what happens.
Edit - ooh that totally works. I’d never really paid attention to how this feature worked before.
I did, hence my other comment in this thread — proper dashes are easy.
“--” can also be input on iOS by entering “- -” and then deleting the space. Or you can disable the “Smart punctuation” setting and type what you actually mean.
Why do we stop with hyphen, n and m dash? There are at least 30 different use cases, we should not reuse only 3 versions of some short line. Let's make 30 versions, one for each meaning. (cynicism)
Then don't use them? As a reader, I certainly appreciate when people do. When writing documents or HTML I use them because it adds clarity. When typing on a web form, I'll usually use "--" because it's visually similar and much easier to type on a US keyboard. No one, pedants included, has ever tried to correct me on it.
I also use capitalization and punctuation when I type while many people do not. It'd be great if they did, since it makes reading easier and takes almost no additional effort, but I'm not going to let it ruin my day. The parent comment is about why the distinction in dashes matters and has virtually nothing to do with typography enthusiasm, but rather reader comprehension. If you don't want to integrate that information into your life, great, but that's not really a refutation. For my part, I found it interesting. Even though I use em-dashes I learned more about how they're helpful. If you don't want to use them, I'm almost positive no one is ever going to correct you.
I don't; but dumb tools replace -- with an em-dash, which breaks shit.
> If you don't want to use them, I'm almost positive no one is ever going to correct you.
Still have to look at it and suffer the consequences of dumb editors replacing -- with em-dashes when someone innocently tries to just write a command-line parameter
It also looks like you’re drawing attention to something — the use of the double dashes — in making a deliberate choice to break from the norm. Whereas if you just follow the way most people use dashes - single dashes, not double - then it doesn’t really stand out, it just looks ‘normal.’ You’re used to seeing it styled that way. It feels different.
> Personally, I think two hyphens also looks better than just one
It's context-dependent. (Aside: you wouldn't write "context--dependent", which is the use case of the hyphen.)
Ostensibly the en dash is primarily used for ranges, although that's a case where I'm inconsistent. I won't typically write "A - Z" or the technically correct "A–Z", as I think in that case I tend to write "A-Z", using a simple hyphen. I certainly won't write "A -- Z".
The em dash is even wider—it's not typically mistaken for a hyphen.
Em-dashes add a bit of a pause. And having them longer and taking a bit more of horizontal space makes it more intuitive. They also break a sentence into parts. Having them easily distinguishable helps navigate text and reduces overhead. Just like periods or paragraph breaks help you see parts of a text, or syntax highlighting helps you see lexemes in a program.
Using just one dash for everything will be readable in a text message or comment. But not in a (complicated) book, because there the benefit of these small things gets multiplied by the scale of the book.
IMHO, this is the main determination on when I decide to use em-dashes: is the text between them an aside of some kind? An alternative would be to use parentheses.
Personally I do not find " - ", as the GP suggests, to be as much of a visual cue as "—". And on macOS using different dashes is fairly straightforward:
* hyphen: the key next to zero, "-"
* en-dash: alt/option-"-": –
* em-dash: shift-alt/option-"-": —
Some apps (e.g. Mail) auto-convert double-"-" into an em-dash as well.
The way I was taught, you use the comma for a brief aside--em dashes are used for a larger diversion (and parentheses are for the most tenuous connections).
In other words, a reader should be able to skip reading the contents of parentheses with negligible impact on the context or meaning of the sentence. They should be able to skip reading the contents of em-dash-separated text without changing the meaning of the sentence. And text between commas should be considered integral to the sentence, while secondary to the primary gist.
What you reference is that commas are used to set off non-restrictive clauses, where the meaning of the sentence is clear without the additional clause. Though, the non-restrictive clause provides additional description of a word in the main sentence.
Such as:
Sometimes writing for money, rather than for art or pleasure, is really quite enjoyable.
Other than that, many people have come up with many writing styles. We mostly seem to be able to understand each other, so we are "all good".
I have written and read text for decades without knowing the difference between those, so whatever space one gets when pressing the spacebar seems to do the job just fine. And if in doubt LaTeX etc will handle the rest well enough if I care about sub-pixel precision of some margins.
Spaces can cause word wrap that can leave a dash at the end or beginning of a line, which is not beautiful. A spaceless em dash doesn't have the wrapping issues while retaining legibility. You could argue that that's a problem with word wrap algorithms, not punctuation, but that situation is not going to change any time soon.
Yea em dash with spaces looks better to me too, I find that it’s harder to read if the em dash is there without surrounding spaces. Looks too cramped, not separated enough.
I have never understood the classical rule of no spaces around em-dashes. If you’re going to use fancy dashes at all, an em-dash represents a clear pause, a break in thought — something more robust than a mere comma. Typesetting an em-dash sometimes literally touching the words on either side has the opposite effect, visually connecting those words rather than separating them, and unlike a lot of the typographical snobbery we sometimes engage in, that one is a well-known (at least to designers) effect of proximity. Personally I prefer a thin space rather than a full one in media where it’s possible, purely for cosmetic reasons, but I’d rather have a normal space than none.
I think that is not really true? There is the "Gedankenstrich" and one can see it in texts. Or do you mean, that it is so rare, that German language almost does not use it? I think that depends on the writer.
Hum, a hyphen is still an entity of its own (it may be even a short, slanted dash in some fonts), then there's the en-dash for association (e.g. "ZDF – Zweites Deutsches Fernsehen"), and there's the "Gedankenstrich", which performs more like a separator. Three typographical entities to express three different concepts. (But there's a tendency of mixing the en-dash with spaces and the "Gedankenstrich", as the latter also comes with surrounding spaces, which may appear overly exaggerated in some fonts.)
However, it is the en-dash, properly, rather than the hyphen. I quite like that punctuation.
Now, anyone typing random texts to a friend or a few need not care, but I think people that write in a professional capacity to more than a few people should know and care.
> In French the em-dash is almost nonexistent; we usually use parentheses instead.
The only French-speaking place I've seen em-dashes used in daily life was Québec. For some (good) reason, it seems administration took a lot of care in using correct typography. My voting district for example was Mercier–Hochelaga-Maisonneuve (the first dash being an en-dash, and the second one a hyphen) and I was always amazed at how all communication actually used these two different dashes.
I can't imagine this level of care in French or Belgian official communication.
Others have mentioned using spaces with an en-dash or hyphen instead of an em-dash. Having used a typewriter -back in the day- I learned to produce text like this.
How I learned the Unreadable: “Sometimes writing for money -rather than for art or pleasure- is really quite enjoyable.”
To the teacher I learned from this was a standard way of punctuating on a typewriter.
Not for me. It's readable, but my brain has to do more work. When I get to "money-rather" my brain trips up slightly, and then I'm confused until the next dash, then I go back and figure it out.
All possible and dealt with in under a second, but in the first example with the longer dash my brain recognises a parenthesis and I take a little "breath pause" before carrying on.
It’s not unreadable, just a tad more difficult. And as others have pointed out, there are other ways of making it easier again than using a specific character.
But the real point is: The information transported in both examples did not change its meaning and will be understood by the reader / receiver in both cases. If it’s not, it matters. As long as it is, it’s pedantic.
Personally, I think this sentence would benefit from a comma before the ‘or’. And in that case we could probably benefit from a clearer way of setting aside the parenthetical.
“Sometimes writing for money, rather than for art, or pleasure, is really quite enjoyable.”
– this seems awkward to me. This version, though:
“Sometimes writing for money—rather than for art, or pleasure—is really quite enjoyable.”
> “Sometimes writing for money, rather than for art, or pleasure, is really quite enjoyable.”
I think that changes the meaning, since it’s now a list of 3 items with an Oxford comma, rather than two lists, with the first list having 1 item, and the second list having 2 items. And I’m having a rough time even making sense of such revised meaning.
Expressed as pseudo-code, I read the original intent of that sentence as:
“money and not(art or pleasure) == enjoyable”
and that can be expanded, by De Morgan's law, into
“(money and not art and not pleasure) == enjoyable”
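As a quick check of the pseudo-code, a truth table shows which expansion preserves the meaning. This is a minimal sketch; the function names are made up for illustration. By De Morgan's law the negation distributes as "not art and not pleasure"; joining the two halves with "or" instead gives an expression that is also true when exactly one of art or pleasure is involved.

```python
from itertools import product

def original(money, art, pleasure):
    # money and not(art or pleasure)
    return money and not (art or pleasure)

def and_form(money, art, pleasure):
    # De Morgan expansion: both motives must be absent
    return money and not art and not pleasure

def or_form(money, art, pleasure):
    # Distributing with "or": only one motive needs to be absent
    return (money and not art) or (money and not pleasure)

rows = list(product([False, True], repeat=3))
and_matches = all(original(*r) == and_form(*r) for r in rows)
or_mismatches = [r for r in rows if original(*r) != or_form(*r)]

print(and_matches)    # True
print(or_mismatches)  # [(True, False, True), (True, True, False)]
```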
Having multiple options for how to offset parenthetical asides, far from being redundant (or even confusing), offers us—as writers and readers—more opportunities to express the tonal variations (or nuances) that we would – in spoken language – communicate through our voice and body language; moreover it lets us vary the visual, aesthetic quality of our prose – which is as much a part of the experience of reading as comprehension is.
A very common example is in designations for machine screw threads, e.g., 1/4-20. This is not a range of numbers spanning from 0.25 to 20.0, but rather a pair of numbers that define two metrics of a single thing, which combine to uniquely identify the thread.
Perhaps context is sufficient, but adding this to your examples gives us at least three scenarios where the single symbol would mean very different things with pairs of numbers: compounding, subtraction, and numerical ranges. If we add on the clause separation duties of the dashes mentioned in the article, we have four uses where a single symbol sits between two numbers and means entirely different things.
There's no shortage of mathematical notation and delimiting characters. E.g., you could write your machine screws as .25+20i. Obviously you raise e to the power of your screw and you get a rotation rate in the complex plane, and a width of screw in the complex plane as well.
Compounding and numerical ops are basically never confused. Machine screw is the only one of these where it's even plausible. Not that subtraction and range are ever ambiguous, but if they were, just use "#1 - #n" to denote "the numbers 1/n being used as labels for some range of options, not as numerical values".
All in all, we have plenty of characters. A minimal set of rules, minimal set of characters, rich in predictable patterns, is what makes for a good language. The existence of a whole slew of specialized characters, all basically indistinguishable and frankly unheard of to most, has to work hard to justify its right to live on my keyboard. We have parentheses, commas, colons both full and partial, brackets square and curvy, braces, slashes forward and back... More than enough permutations and code space for anyone's expressive needs. Why anyone would opt for more byzantine characters with more rules on top is beyond my imagination.
Then certainly we should remove those superfluous brackets. Commas suffice for parenthetical asides. Sentences already imply grouping. I am a bit upset at your use of double quotes above. After all, we have the single quote, which consumes half as many valuable pixels and does just as good a job of indicating quotation. Colons of any level of completion merely separate clauses, a task more than thoroughly covered by commas and periods. Context is, of course, a great disambiguator, so I see no reason to use any statement terminator besides a period. What possible confusion could arise.
While we are at it, we have so many words. Perhaps we should simplify to one of the several published standards of simplified English. After all, the number of combinations of a thousand words in sentences of arbitrary length is enormous. Why anyone would opt for more byzantine words with more nuanced definitions and rich history of usage, tradition, and cultural value is beyond me.
We could go on with grammar (I mean really, what the hell is pluperfect), spelling ('c', for example is useless on its own, its uses being filled alternately by k or s), fonts (wtf is a serif), capital and lowercase letters, and I am sure many other topics.
Why do we keep more words, punctuation, and other linguistic and typographical devices around than we need? A mix of inertia and legitimate uses and perceived value. It seems to me that many people seem to draw a line between what is acceptable and what is not based on whatever they are comfortable and familiar with by the time they reach the end of their schooling.
I know your examples are intentionally extreme to prove a point, I'm biting anyway.
Parenthetical type grammar with an explicit start character and end character is pivotal for encoding information unambiguously. You can't replicate that with any system that uses the same characters for the start and end, because it would be ambiguous as to whether you are starting a nested context or ending the present one. Double, single, and even the rare triple quote allow for nested quotation. In principle a clean open and close quotation mark would also solve this (no subtle pixel hunting). You're right that we don't truly need four redundant variations on bracketing, but reducing it to just one is probably too few, as it would be representing too many possible things at once. How about one pair for a narrative context (aka a quote), one pair for linguistic recursion (like I'm doing right now), one pair for collections of objects such as a list or a set. Colons probably could be skipped; everything beyond that is strawmanning me. A certain small number of delimiters / particles / whatever are needed to have expressive completeness. You need to be able to build sequential lists, unordered lists, one of several possibility sets, and / or / not type relations. In other words, a natural language at the very least needs some sort of regex subsystem, but it need not be much more sophisticated than regex. I'm not a grammar denialist; in fact, quite the opposite. I want the information coded in simple grammar rules, not ad hoc arbitrary tables continually expanding.
I say this as someone who had a 12th grade vocabulary in 5th grade and it's only gone up since: vocabulary is a waste of time.
Actually, I'm almost with you on 'c', but I'd rather throw out 'k' because it's one of the few that don't fit on a 7 segment display. Capital letters also don't add much information. Yes actually, I'm fine with all of those going away. I couldn't tell you why the people who design wayfinding signage avoid serifs like a pox, yet other design fields refuse to read without them. With or without seems to read just fine. I really don't care too much either way. Letters would be better if they all worked more like EFHLT. Right now, too many clashing elements. Some are boxy, some are round, some have sharp diagonals. I'm not saying it has to be a 7 segment design, but it would certainly be pleasing if learning the alphabet, its ordering, how to write it, could all happen much faster by just noticing a few easy repeating patterns. Yes actually, let's do language reform.
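The point about explicit start and end characters can be illustrated with a short scanner. This is a minimal sketch with a hypothetical function name: because the opener and closer differ, a single left-to-right pass always knows whether it is entering or leaving a nested context.

```python
def max_depth(text: str, open_ch: str = "(", close_ch: str = ")") -> int:
    """Return the deepest nesting level of open_ch/close_ch pairs in text."""
    depth = deepest = 0
    for ch in text:
        if ch == open_ch:
            depth += 1                      # entering a nested context
            deepest = max(deepest, depth)
        elif ch == close_ch:
            depth -= 1                      # leaving the current context
            if depth < 0:
                raise ValueError("close without matching open")
    if depth != 0:
        raise ValueError("unclosed delimiter")
    return deepest

print(max_depth("an aside (with a nested (inner) aside) ends"))  # 2
```

With a single symmetric mark such as a straight double quote, the same pass has no way to decide whether the second mark closes the first context or opens a nested one, which is the ambiguity the comment describes.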
>It seems to me that many people seem to draw a line between what is acceptable and what is not based on whatever they are comfortable and familiar with by the time they reach the end of their schooling.
Well I'll agree with you there. All too often pointless pedantry comes down to "my school must be right otherwise I am wrong". Love or hate my reasoning, at least you can't accuse me of doing that.
> Parenthetical type grammar with an explicit start character and end character is pivotal for encoding information unambiguously.
You argue against multiple types of dashes because context is sufficient, despite there being typographical ambiguity. But you insist that we must have typographically unambiguous bracket characters. I must admit that I am struggling in this conversation to determine when we can depend on context and when we need unambiguous markers. Perhaps I am just incapable of picking up on the subtle context that backs up this position of yours. (:
> everything beyond that is strawmanning me
In fact, you will find examples of real human languages that exhibit more extreme versions of the things I have suggested.
There are languages with simpler tense systems than what English has. Slavic languages, for example, tend not to have a pluperfect. So, the example of removing tenses is based in reality.
Hawaiian has an alphabet of just 13 letters. So, removing letters from the 26 in the English alphabet is based in reality.
The Dictionnaire de l'Académie française is being updated to its 9th edition and is expected to have ~60K words[0], whereas English dictionaries report an order of magnitude more[1] (even with the issues in the linked source, this is a large gap). Basic English[2] has a vocabulary of less than 1,000 words (if you desire a vast overhaul of the existing norms of typography, I hope that you are at least willing to entertain prior art in the area of overhauling the use of natural language as a valid example, even if you disagree with the intention or outcome). If you wanted me to go to extremes (which again, I did not in the post you replied to), I could have just suggested we use Toki Pona. Of course, if I did suggest such a conlang, you may have been correct that I was strawmanning you and going to extremes just for a point. Nevertheless, we can definitely conclude that there are, in fact, natural human languages with substantially fewer words than modern English, and there are definitely constructed and artificially restricted natural languages with enormously smaller vocabularies.
You need not agree that these examples constitute best practice, or that they represent desirable goals in the continued evolution of language and written communication. I hope, though, that you can recognize that none of these are strawmen, but based in reality, many in natural languages, and some in artificially constrained natural languages for specific purposes. If anything, I presented examples that do not represent the extremes of any position (I could easily have brought up languages with no written representation, for example). I merely selected additional examples that conform to a broad categorization of removing stuff from modern English.
I welcome further discussion on the topic, but I worry you might dismiss things I say you disagree with, as you have done once above by ascribing an intention of strawmanning you, and as you seem wont to do with typographical conventions you dislike. And if you want to eliminate the punctuation you dislike, what might you do to a person whose arguments you dismiss? (;
It seems though, that you just don’t like the various dashes, which is totally fine. Many other people and I find value in them. Still more probably just go along because, as I said, a big part of language norms comes from inertia. The point of language (other than perhaps some, but not all, artistic expression) is communication. Why abandon the norms that facilitate this communication? Is it better to stand on preference (or perhaps principle) and harm your attempts at communication or to yield to norms and be better understood (though perhaps annoyed)? I do not know that there is a correct answer to this question.
I do hope, though, that I have disabused you of the fanciful notions that I was cherry-picking ideas that are extreme just to prove a point and that I was strawmanning your argument. I have shown above numerous examples that back up each of my suggestions, grounded in the reality of natural human languages. Further, I have shown several examples that are truly extreme to show that my original suggestions were not “intentionally extreme to prove a point.”
I don't care about multiple types of parentheses per se; I do care about there being a spanning set of grammatical constructs. I don't think period and comma alone would be enough. You need to have constructs for compressing and abstracting. "John/Paul/Ringo/George were in the Beatles." Notice how I just made 4 sentences for the price of one. I could have written "John was in the Beatles", "Paul was in the Beatles" ... all four statements fully unrolled. You need constructs which let you FOIL sentence structure just like in math class, presenting (option A, B and C) to (you, and everyone else). You also need a handful of "client server type" interaction structures. Header information. A thing to indicate if the content is a question, request, demand, greeting etc. Grammar is not about encoding literal speech pausing; it's about encoding how to deserialize the linear sequence of words.
In theory you could just make "(" and ")" the universal sub-context denoting symbol. You would just need a different extra symbol to clarify what a parenthesis means. The three systems make sense. One for data-agnostic compression like a JSON object / foiling a math expression, one for relaying text itself as an object in the domain of discussion rather than as the thing being said (aka a "quotation"), and one for scopes that are part of the discussion per se (not quotation).
Context suffices when the parts of speech have no chance of being in the same slot: compound words and numbers, for example. Your machine screw example was pretty rare. I think the dashes are too specialized in meaning and too hard to tell apart to justify code points in the docs and buttons on my keyboard. If need be, distinguish the various flavors of hyphen with some rule about touching the letter or having two in a row. Our symbol set is reasonable. Not as succinct as Hawaiian, not so bloated as Chinese. 13 chars fits in 4 bits. 26 chars fits in 5. With great strain you can maybe find a workable set of grammatical symbols without blowing past 32 chars, but will probably end up using a 6th. I'm against bloating the raw number of symbols and rules everyone has to rote learn, not dashes in particular. If it's already in frequent use like all the paren styles then fine, but let's not make anything worse than it has to be.
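The "FOIL" idea above, one compressed sentence standing for several unrolled ones, can be sketched as a Cartesian product over the slash-separated slots. This is an illustrative sketch only; the `expand` helper and the template syntax are assumptions, not an established notation.

```python
from itertools import product
from typing import List, Sequence

def expand(template: str, *slots: Sequence[str]) -> List[str]:
    """Unroll a template into one sentence per combination of slot values."""
    return [template.format(*combo) for combo in product(*slots)]

beatles = expand("{} was in the Beatles.", ["John", "Paul", "Ringo", "George"])
offers = expand("Presenting {} to {}.",
                ["option A", "option B", "option C"],
                ["you", "everyone else"])

for sentence in beatles:
    print(sentence)
print(len(offers))  # 6
```

One slot with four values unrolls into four sentences; two slots with three and two values unroll into six, just as (a + b + c)(x + y) multiplies out into six terms.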
> Grammar is not about encoding literal speech pausing,
This is absolutely correct.
> its about encoding how to deserialize the linear sequence of words
This is absolutely incorrect. Grammar is the collection of rules that prescribes the combination of words to make valid collections of the same in a language. Specifically, grammar is distinct from semantics, which is concerned with meaning. A nonsense statement may be grammatically correct.
Punctuation is the collection of non-character glyphs that are used to capture the nuances of spoken language into a written form.
Punctuation is orthogonal to grammar.
Put more briefly: spoken language has grammar and no punctuation; written language has the same grammar as the same spoken language and also punctuation.
Parenthetical asides are represented in spoken language with some combination of marker words, pauses, tone of voice, word choice, and perhaps other indicators I may have forgotten. The purpose of punctuation is to lend some of the nuance of spoken communication to the otherwise sparse written word.
The argument of the number of bits to encode glyphs is also orthogonal to the purpose or usefulness of language, writing, and communication. Computers are tools. A keyboard should justify the paucity of its glyphs, rather than the other way around. Once we get here, we are in the realm of pure opinion and preference, which I don't have much interest in pursuing.
> A nonsense statement may be grammatically correct.
I'm sure then you already know, colorless green ideas sleep furiously.
I see no contradiction in what you are calling incorrect. Whatever representation our brain uses for concepts and thoughts, sharing that object at some point requires us to pack it into a linear sequence of words which can then be reliably unpacked on the other side. The very nature of verbal communication forces the existence of serialization/deserialization rules. Those rules are what we call grammar. Grammar may be somewhat orthogonal to semantics, as you observe it is possible to encode valid nonsense, but the grammar exists to encode semantics and is thus to some degree tied to it. The grammar rule of "subject verb object" doesn't only tell you how to check the validity of "colorless dreams sleep furiously", it tells you how to deserialize that sentence back into a hierarchy tree of constituents and their relations. It just so happens to unpack as an object of useless constituents and impossible relations.
Punctuation may be orthogonal to grammar in the general case, but in this particular language they are highly coincident. Virtually all punctuation marks are grammatical particles. It doesn't have to be like this. Some languages have "audible parenthesis" words. Others have words for marking the end of a sentence as a question. Calling punctuation marks a non character seems a bit artificial. Let's just call them the non-audible characters, in analogy with non-printable characters.
The argument about bits was apparently lost in transmission. I assure you this isn't a preference and opinion thing. Information theory applies just as well to natural language encodings as it does to computer protocols. The basic principles of information entropy and optimal transmission encoding show up in every language: the least frequently used words are the longest. In an analysis of conversations across languages, researchers found the bit rate to be constant. Some spoken languages are seemingly very fast, but that's because the information density per word is lower. The brain's bit rate is a constant. Irrespective of if we are using a computer or not, the size of an alphabet is measured in bits. The bits in the alphabet determine how much you can possibly say per character. On an extreme end, Chinese has over 5000 characters. That's around 13 bits of information per character, at the low low cost of memorizing all of them. For comparison, ignoring capitalization and punctuation, English is a 5 bit alphabet meaning the same amount of information fits into 3 letter words. The Hawaiian alphabet can cover 80% of those possibilities with just 3 letters, and the remainder with a 4th. Think about how powerful that is. Is memorizing 5000 arbitrary squiggles worth it to compress the width of words down by ~3 chars?
The number of bits that are in an alphabet also determines the minimum number of unique design elements needed to construct letters for it. 7 segment displays are a great example. As I said, our characters fit on 5 bits. That's the minimum. Now when our letters came about, they didn't know about bits and they certainly weren't doing this on purpose, but almost every letter can be expressed on a 7 segment display. In other words, writing a letter only wastes two bits per character relative to saying it.
When you learn a new ligature in an Arabic script, you've doubled the number of letters you know. When you learn a new Chinese character, you've learned a new Chinese character. Language is a transmission medium. It's a tool. My takes here are no more preference and opinion than the allocation of the radio spectrum. There's an optimization tradeoff to be had between the limited character choices of Hawaiian and the extreme rote memorization of Chinese. Going from 13 to 26 characters does double the learning time, but the learning time at that stage was short anyway. Going from what we currently have to perhaps 60ish characters (6 bits) doubles it again. Maybe that's tolerable. The next step up is ~128 characters. There may be things you can do quicker with a large set of symbols, but the ROI for learning all those symbols doesn't pay off. Around 5 to 6 bits is where most writing systems settle.
And that's why bloating the raw glyph table with letters and marks is the wrong solution.
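For anyone who wants to check the arithmetic, here is a rough Python sketch. The alphabet sizes (13, 26, 5000) are the round figures used above, and the helper names are made up for illustration; it only counts fixed-length codes and says nothing about real letter frequencies.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Whole bits needed to give every symbol in the alphabet a distinct code."""
    return (alphabet_size - 1).bit_length()


def letters_needed(alphabet_size: int, meanings: int) -> int:
    """Shortest fixed word length whose combinations cover `meanings` items."""
    length = 1
    while alphabet_size ** length < meanings:
        length += 1
    return length


# Round figures taken from the comment above, not from any dataset.
for name, size in [("Hawaiian", 13), ("English", 26), ("Chinese (rough)", 5000)]:
    print(name, size, bits_per_symbol(size), letters_needed(size, 5000))
```

One caveat when reading the numbers: rounding Hawaiian up to 4 bits gives 16^3 = 4096 three-symbol combinations, which is roughly 80% of 5000, whereas the 13 actual letters give 13^3 = 2197. Either way a fourth letter covers the rest.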
But apparently only insufferable pedants care about clarity. That's why we should stop using those pointless number glyphs too and just write them out in unary using hyphens. -/------------------------- is just fine.
Reminds me of this guy I met at a CTF. He decided that punctuation generally is unnecessary. What's the use of having so many different symbols if the only thing they denote is pauses between words.
so when he wrote something . he used only periods to denote pauses . no other punctuation symbols . no capital letters . some people were thinking that his periods stand for perl concatenation operators . i dont know if he is still doing this . i hope he stopped
actually i kinda love that . punctuation is semi arbitrary anyway . and this is actually much easier to read than the usual literary english full of semicolons and dashes . mimics speech much better too .
Some people do talk like that . All complete thoughts . Sequential.
Other people—and I very much count myself among them—have a less linear, more tree-like mode of expression; where the ideas, instead of building on what came before, are being laid out out of order – the ideas aren’t completed – and more complex punctuation is needed to establish the relationships between those thoughts.
It sounds like I’m saying the former is less sophisticated than the latter. I don’t think that’s true.
I think we should probably try to express our ideas in a way that doesn’t require out-of-sequence reasoning. Short, simple sentences. With clear meanings. Building on one another. Much easier to follow.
The tree-like mode of endless nested parentheticals and asides is just a rendering of an incomplete thought process.
Not better or more sophisticated. Just still in progress.
The article is not actually very pedantic - at one point, the author encourages us to break the rules - and I feel it has been offered in the sense of "printers have developed these variations on the basic dash, and if you choose to use them, it is probably best to use them in the same sense as printers themselves do."
In several significant computer typography systems, the notation for an en dash is a doubled hyphen (--), and for an em dash a tripled one (---). Notably LaTeX and Markdown (Pandoc flavoured: <https://pandoc.org/MANUAL.html>).
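As a toy illustration of that convention (not how LaTeX or Pandoc actually implement it, and the function name is hypothetical), the substitution can be sketched in a couple of lines of Python. The only subtlety is ordering: the triple hyphen has to be replaced first.

```python
def smarten_dashes(text: str) -> str:
    """Map '---' to an em dash (U+2014) and '--' to an en dash (U+2013)."""
    # Replace the longer run first; otherwise '---' would be read as '--' plus '-'.
    return text.replace("---", "\u2014").replace("--", "\u2013")


print(smarten_dashes("pages 3--8 --- give or take"))
```

A single hyphen is left alone, so compound words pass through unchanged.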
In LaTeX I’ve been using \textemdash instead. I don’t actually know why, just, usually these sort of longer names tend to have some niche edge case they handle better.
Em-dashes and parentheticals should be used sparingly, so it isn’t too annoying to do all the extra typing.
My preference is for spare markup where at all possible. Less typing, less mental overhead, clearer source text.
If it's necessary to be explicit for clarity and proper rendering, then sure. But otherwise, the less friction the better.
After years of procrastinating in learning LaTeX (the Lion Book turned out to be a clear, delightful, and highly useful reference), one of the pleasant surprises was that paragraphs are simply denoted by two carriage returns. After years of hand-coding HTML where matching <p> and </p> tags (among many others) was a constant occupational hazard, this was just ... pleasing.
Markdown has a similar philosophy, if a far more restricted set of capabilities. That set is however sufficient for a tremendous number of documents, and if it's ultimately insufficient it still remains a useful way to get started with writing.
I've been through enough different HTML variants whilst not having to adhere strictly to standards that I'm moderately fuzzy what the current state of the standard is. I hear the current standard is also ... large.[1]
But even if it's not strictly necessary to balance <p> tags ... it is necessary to do so with many other HTML elements, and missing or mis-typed tags can utterly bork a page, particularly if there's any complexity to it.
(Hand-crafting tends to minimise that complexity, but it's still possible to get reasonably twisted.)
That said, checking one of my favourite HTML5 references, whose page source itself is a beautiful example of clean HTML ... I see that Mark Pilgrim in fact omits the close tags on his paragraphs:
It reminds me of the strong feelings about Comic Sans.
The guy who created it said something like, “If you love Comic Sans you don’t know much about typography and should probably get a new hobby. And if you hate Comic Sans you don’t know much about typography and should probably get a new hobby.”
I feel the same about this. The average person has about a billion things to improve in their writing before the “correct” use of different dashes should become something they think about.
Depending on the audience, I think the article is justified and gives a good overview. Just thinking of scientific papers, where sometimes you spend a full year carefully laying out the words. Being concise here helps improve legibility and is definitely worth the effort.
... with all due respect to folks who choose the hard and extremely frustrating academic career path, the inefficiency is so absurd that it truly only can exist in these gigantic institution-sized machines. (And in similar sized corporate money-makers.)
Most papers are fundamentally flawed, unfortunately, due to lacking sufficient information and data for replication, being underpowered (and not controlling for many factors).
It took decades to get to some minimally sensible standards (preregistration, conflict of interest declarations, awareness of the most common stats issues, power analysis), but we're still far from doing effective science.
Money is still handed out based on feels, hypes, name recognition (when it's not blinded) for laughably small projects, instead of focusing on establishing longer term ones and/or improving the actual science output (ie. data and hypothesis generation) of existing ones.
(Yes, of course, academia approximates this. Yes, yes. Everything's fine. We'll have a usable model of Alzheimer's any second now! Aaany second. Just let this new totally effective model of depression/obesity/learning/ME-CFS out of the door first.)
Arguably there's a place for both an em-dash and a hyphen. (For your example, a hyphen would be pretty normal style anyway.) But in a world where double quotes is a massively overloaded punctuation mark we probably don't need an en-dash at least.
TLDR; Using the right dashes is about the UX of text. If you don't care about UX of the reader your points are sound.
I however--as a typographer--strongly disagree. Typography is both about beautiful typesetting as well as making sure that the information contained in the text is understood easily.
The former is obvious to me. It may not be to you but that doesn't make your reasoning right.
As an analogy, there are quite a few people among my friends & acquaintances who cook occasionally or rarely. They usually share the trait that they care more about eating than how something tastes. Bluntly spoken.
They commonly have one kind of oil in their kitchen (most often sunflower) and they use it when the recipe demands "oil".
Usually recipes specify what oil to use. It may say olive oil or peanut oil or sesame oil. They won't have these oils and they don't care.
Even though the effect of using a different oil is profound on many levels (not even only taste). If you care, that is. Same with the dashes. Text looks and reads very different when those different dashes are used correctly.
Which leads to the information part. Why do we have these different dashes? They actually map to spoken language.
A hyphen is used to pull things together. A word can be hyphenated (should be read as if the hyphen didn't exist) or two words can be pulled together (making the pause between them shorter): "ever-changing" is pronounced differently than "ever changing".
An en dash used between points in time or space conveys that. A distance. The spoken pause is usually longer.
And finally, an em dash, like a comma, conveys an even longer pause between the words it separates.
I must say, truth is an absolute defense, and I'm certainly one to both value calories down the gullet crude and efficiently, and to not be terribly aware of what goes on in the font fetishizing circles (no disrespect). But I do understand information coding and that obsession over a good design. For me, it's been subways and metros. I've been doing redesigns, obsessive recoloring, obsessively flipping between colors and shapes and other markers in an attempt to compress all that information down to the entropy limit. So I get it. I just don't get it with typography. It's all just letters to my viewing. Once the physical squiggle has been recognized for the abstract symbol it represents, the symbol and not the squiggle is all I remember seeing. I honestly couldn't tell you the last font I ever looked at, let alone if it had serifs or [insert typography feature, no really that's the full extent I know]. I can't say I'd ever noticed (or benefited from) a distinction between dash length. Any component of a letter under a certain length scale I mentally dismiss as likely printing dirt anyway. If it works for other people, well great and mad respect for it.
So I get it. Visual design language serves a purpose. An important purpose. It's not the artful navel gazing outsiders think it is. Well, maybe some people are like that, but there really is objective purpose under it all. I'd even say I agree about rules for hyphens touching their neighbors or not. For compound words it should be a train-like-in-construction whereas in a delimiter roll like range of items it should go Boston - DC.
I just can't see having a whole dedicated set of minutely different characters fit for this purpose. I dislike it for the same reason I dislike lego sets that have a particular piece in them which isn't used for anything else in any other set and never will be. It ruins the elegance of the system. It offloads a minor design problem onto somewhere it doesn't belong (namely the character set). I want to know everything while learning as little as possible. Which is why I strive for encodings that express as much as they can with as few elements as possible.
If it were me, I'd just have '.' , '-' and '_' exist at mid, bottom and top heights and be done with it. Don't like my line length? Make it whatever length you want, either dotted, dashed or continuous. Solves every use case, extremely composable, every permutation that should logically be there, is there. .,;:' notice anything incomplete? LHTIFE notice what's missing? qbhrnujdp damn that's frustrating. KRBPF where's the rest of the set?
I agree. I’m usually a stickler for punctuation and spelling, but I can barely tell the difference between these three hyphens. And that is when they are right next to each other. If they were alone in a document? There is no way I could know which is being used. If they aren’t easily distinguishable, I don’t see the point in using three separate symbols.
My takeaway wasn't that the article was being pedantic, just that it was being informative.
What's the point of punctuation? The point is that ambiguity exists in human communication. Where accuracy and precision are important — for example in formal communication — different punctuation marks and rules help prevent misunderstandings.
When engaged in less formal communication, or when the stakes of miscommunication are lower, these rules seem (as you observe) unnecessary. I think that insisting on proper syntax, spelling, grammar, or whatever else in an online forum like HN would be silly. But, internet forums aren't the entire world, and it is conceivable to me that there may be places where people need to depend on the meaning of their message being conveyed reliably.
I know the correct usages but often avoid them as it doesn't confuse the reader, but can break copy/paste usage. I can get by with ASCII hyphen/dash and double-dash for em-dash. I particularly dislike autocorrection of punctuation into more pleasing forms (e.g. smart quotes/apostrophes). This is one reason I tend to do outlining in Github issues more often than G.Docs.
Of course I'm mostly writing about computer/software topics and don't write for publications or a non-technical audience.
...and when the discussion on whether to use [mnxyz+]-dashes has finally been sorted we can start on which font to use to render these dashes, whether they should be proportionally rendered, how to handle ligatures with dashes, to RGBA or not to RGBA dashes, hinted dashes versus unhinted dashes, the big difference between the visually identical dashes in language A versus language B, et ce.te.ra.
The article misses the rather important piece of trivia about technology compromises that what it has been calling “hyphen” is actually U+002D HYPHEN-MINUS, rather than U+2010 HYPHEN. The situation there is a real mess: HYPHEN-MINUS is ugly in many fonts due to compromising between the ideal appearances of a hyphen and a minus sign, and HYPHEN is often missing from the font, leading to falling back to a hyphen from a different font rather than HYPHEN-MINUS from the same font (which is clearly more desirable, but technically unappealing).
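If you want to see which character you actually typed, Python's standard `unicodedata` module will report the official Unicode name for each code point. A minimal sketch (the `describe` helper and the choice of characters to list are mine, for illustration):

```python
import unicodedata

# The usual suspects, written as escapes so they can't be confused in the source.
DASHES = ["\u002D", "\u2010", "\u2013", "\u2014", "\u2212"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in DASHES:
    print(describe(ch))
```

The key on a standard keyboard produces the first of these, U+002D, not U+2010.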
A comment led to the follow-up https://www.punctuationmatters.com/the-difference-between-a-..., but it’s still very insufficient, only dealing with MINUS SIGN and assuming HYPHEN-MINUS was exclusively a hyphen. And appears to have suffered from the same replacement of lone HYPHEN-MINUS with EN DASH as this article.
I get why you wrote those words in all caps but it still feels like you’re yelling emphatically about nothing, and that coincidentally sums up how I feel about the rest of this topic.
It’s also very likely to be hypocritical: how many topics on HN are tuned towards a very specific kind of focus/nerdom? And what’s the point of commenting “aha, good for me that I don’t care about this!”?
I guess the difference here is that someone’s boss might complain that they should follow this article, since we all write stuff from time to time.
Agreed it’s very HN. But it’s not just bad. Hackers are usually hard-wired to reduce entropy—we’re quick to point out when something is redundant or unnecessarily ambiguous. Formalia is also used for gatekeeping, which the HN Zeitgeist doesn’t like.
That said, personally I need my different dashes, commas and parentheses for my excessive wavering.
Made it 54 years without ever hearing about mdash/ndash/hyphen distinction. I've just been using the hyphen character for everything. Must have been absent that day in grade school.
This guide and most guides like it tend to miss the most important and powerful use of the em-dash and make it out like you can use it for anything but really they are just missing the wonderful simplicity of the em-dash and how versatile that simplicity is. The em-dash raises and lowers the narrative voice. In fiction this provides a way to provide insight into the narrator; an em-dash tells us we are switching from the story the narrator is telling us to the thoughts of the narrator, a second em-dash or a period lowers the voice back down to the story the narrator is conveying. This is the sense of dialog being introduced with em-dashes instead of being quoted, a new line starting with an em-dash lowers the narrative voice, narrator hands story off to character.
The simplified rules for the em-dash are pretty much intuited and prescribed versions of this which gut the effectiveness of the em-dash. In general use an em-dash should be used to denote thoughts without having to restructure/delete what you just wrote to accommodate that thought.
Edit: I oversimplified. Consistency is what is important, using an em-dash like a comma that isn't a comma leads to ambiguity when you also use commas. A writer who avoids semicolons and quotes all dialog can use an em-dash very differently than raising the voice, but they can also use a semi-colon very differently than its standard accepted role, that is what these simple guides miss, the consistency of usage, they just list all of the various ways you could use any given mark and people start using an em-dash to "fix" their long run-on sentence with all of its commas.
The closest thing we have to standard use allows for wonderfully complex sentences which can convey great meaning but consistent and well defined use is most important.
comma - connects independent and dependent clauses
em-dash - raises and lowers the voice
semicolon - connects independent clauses in a more direct way than the paragraph
colon - elaborates an idea
parenthesis - an aside, stated instead of thought
period - end of thought
Question mark and exclamation points do not need to be at the end of a sentence, they can double as comma, semicolon, or colon.
I seem to be missing a nuance of HN's line breaks and formatting.
Reasonable choices. And a good description of a specific use for the em dash. But I think it’s a poor mind that can only conceive of a single use for a punctuation mark.
We could also use em dashes to signal excitedly running from one thought to the next—as if we’re just riffing on an idea—too fast to be interrupted—wouldn’t that be amazing?
Or we can use the em dash to slow us down—to pause and reflect on what we just said.
Or in dialog:
“Perhaps we can use it to signal an unexpected inter—“
“-rogation?”
“No, an interruption.”
“Yes, that would make more sense.”
“Oh! I just thought of something—we could also use it to indicate stunned silence.”
It is easy to conceive of uses, having a consistent style which conveys what you want to most any reader is another thing. If you had written all those examples without using the text to explain them the reader would have to stop and think about what you are doing and that is not a good thing.
To quote myself "consistent and well defined use is most important," and I have repeated this sentiment in most if not every post I have made in this thread. My point to the previous comment was that if you were not consistent it would not make sense, if his examples did not explain themselves then they would leave the reader stopping to figure out what is going on at each punctuation mark. You can break your own conventions within a work but those conventions need to be well established before you do so and you need a good reason to do so, breaking your own conventions because relying on the punctuation is easier than relying on the language or on whim is a terrible idea.
I realized it on my own in an intuitive sense, my writing before I properly learned it shows this use but eventually I read some things on punctuation and fixed my naive use of the em-dash and other punctuation marks. I think "raising the voice" might be the old fashioned term but I can not remember the more current term or even if there is one, put some time last night into trying to find it but search engines are nearly useless and return page after page of sites conflating voice and tone or prescription punctuation guides which just list uses with no care about consistency in style.
I realized it on my own in an intuitive sense - my writing before I properly learned it shows this use but eventually I read some things on punctuation and fixed my naive use of the em-dash and other punctuation marks. I think "raising the voice" might be the old fashioned term but I can not remember the more current term or even if there is one - put some time last night into trying to find it but search engines are nearly useless and return page after page of sites conflating voice and tone, or prescription punctuation guides which just list uses with no care about consistency in style.
Do you have a reference for this? Never heard that particular framing about the narrative voice before. You call it a versatile simplicity, but to me it sounds rather restrictive and specific, to be honest.
Search engines seem to really fail here, they are just giving me more guides like the one here, I can not get them to give me anything about narrative voice beyond conflations of narrative voice and tone. You can see this use in a great deal of literature which uses the em-dash to introduce dialog in place of quotes, I believe Beckett would apply but it has been years since I have read him so can not say for certain. Most of the authors known for their long complex sentences follow the conventions I outlined in my edit even if they do not use the em-dash for dialog.
>sounds rather restrictive and specific, to be honest.
Write a single sentence which clearly and concisely includes exposition, thought, aside, rhetorical question, self rebuttal and conclusion without following the "standard" I included in my edit. This is what allows writers like James, Joyce, Gass, Gaddis, Wallace, Pynchon, etc to write their wonderfully long and complex sentences and by complex I am referring to meaning as much as structure, we can have great meaning with simple structures but we have to accept a certain amount of ambiguity with that. Sure that challenge can be executed as a paragraph but then it ceases being a single thought, it is a collection of thoughts and that is a very different thing.
If you'll indulge me, I actually think your final paragraph could be copyedited to illustrate all of your suggested 'standard' rules — though in your own rendering you only used commas and periods.
> Write a single sentence, which clearly and concisely includes exposition, thought, aside, rhetorical question, self rebuttal and conclusion, without following the "standard" I included in my edit: This is what allows writers (like James, Joyce, Gass, Gaddis, Wallace, Pynchon, etc) to write their wonderfully long and complex sentences (and by complex I am referring to meaning as much as structure); we can have great meaning with simple structures, but we have to accept a certain amount of ambiguity with that—sure, that challenge can be executed as a paragraph, but then it ceases being a single thought; it is a collection of thoughts, and that is a very different thing.
I tried to stick to your 'standard', though you might disagree on some of my choices. I would say I found it a little constraining. Here's an alternative edit that doesn't follow your rules but – I find – creates a more fluid reading of your original words:
> Write a single sentence, which clearly and concisely includes: exposition; thought; aside; rhetorical question; self rebuttal; and conclusion – without following the "standard" I included in my edit. This is what allows writers like James, Joyce, Gass, Gaddis, Wallace, Pynchon, etc, to write their wonderfully long and complex sentences—and by complex I am referring to meaning, as much as structure. We can have great meaning with simple structures – but we have to accept a certain amount of ambiguity with that. Sure, that challenge can be executed as a paragraph; but then it ceases being a single thought—it is a collection of thoughts, and that is a very different thing.
All of which I hope goes to show that these choices are a matter of taste, not absolute rules
That paragraph of mine and most of my posts could stand some editing, I am terrible at editing on a screen and my casual use tends to be comma heavy. I think your edits wonderfully highlight an issue I believe I brought up in one of my posts in this thread, trying to "fix" things by changing the punctuation rarely works and restructuring the sentence(s) is almost always a better path.
>All of which I hope goes to show that these choices are a matter of taste, not absolute rules
Throughout this exchange I have avoided calling these rules and used convention instead, and that is what punctuation use is. Conventions are easy to break but you should have a good reason to do so if you want something to be readable and you must be consistent in your choices. I have tried to emphasize this throughout and I have repeated it many times, consistency of use is what really matters, punctuation marks are pretty much sign posts for the reader and as long as they remain consistent in their use and are well formed then most readers will have no issues figuring out any use.
Imagine if a town decided one day that they could save some money on removing a no longer needed stop sign by simply agreeing that it is not a stop sign, it is a small town and they can just pass the word that the stop sign on third is now a 30mph sign. This works quite well and saves some money so now the town continues with this and starts changing the meaning of other signs. It is not difficult to see why this would be troublesome, eventually people will get confused and no one from out of town will have a clue. Thankfully our governments are consistent with their signs and a stop sign is a stop sign.
I gave a look through my books and English is wonderfully ambivalent when it comes to punctuation outside of prescriptive grammars. The descriptive grammars largely (if not completely) ignore punctuation and focus on spoken language, even the Cambridge Encyclopedia of The English Language reduces punctuation "rules" to a single page and reduces hyphen/en/em-dash to a typographical convention and does not say much more than the dash is often used in informal writing to replace other punctuation marks. All we really have here is convention and consistency, can you meet the challenge I outlined without following the conventions I laid out? It can be done but it will be considerably more verbose than it would be following those conventions which is not a bad thing. Authors like McCarthy, Krasznahorkai, Ellman, Bernhard have all built their style around breaking those conventions (yes, two are translations when it comes to English but they break the conventions in their own languages as well.) Even Joyce breaks the convention and he does it within single works, switches between adherence and breaking, but not many have pulled that off in the way he did.
It is a really complex thing and part of what makes English literature what it is. We have conventions which have evolved over time when it comes to punctuation and we have prescription, but we don't really have rules unless you are writing tech documents or journal submissions. It comes down to having a clear and consistent use more than anything else and using every punctuation mark for any accepted use based on whim is not clear or consistent.
The problem is that the real world, human thoughts and other things that language needs to try to express are not "sane." So if we are to have a common basis for communication, the guidelines will tend to get "insane."
This is way too much pedantry and hyper-hyphen-focus. Honestly, I don't care about endashes or emdashes. I've never seen them in business or personal writing, and I probably never will. They add nothing to anyone's communications.
Perhaps, typesetting still uses these, but that's okay. They can keep doing so, since these probably add aesthetic appeal to how flyers are designed.
I also noticed a pundit-battle brewing in the depths of the hyphen-m&ndash-soup.
The article:
Let’s make that even more clear.
THE EN DASH IS ABOUT AS WIDE AS AN UPPERCASE N; THE EM DASH IS
AS WIDE AS AN M.
Yet, from another dash-hyphen pundit... [1]
En and em dashes aren’t called that because they’re as wide as
a lowercase “n” and a lowercase “m.” They’re called that
because those are the specific typography jargon words that
refer to the height of a physical piece of type (the “em,”
also called the “mutton” to reduce confusion) and half that
height (the “en,” also called the “nut”). An em dash was
originally as wide as the font is tall.
> I've never seen them in business or personal writing, and I probably never will.
En dash is all over the place in personal/business writing, even just in email, thanks to Word and Outlook autocorrecting a hyphen to an en dash whenever it's between two spaces (rightfully in my opinion). If you've never seen it then that surely says more about what you notice than the content of what you've read.
That doesn't necessarily contradict your point – if you never notice the distinction then what's the point? But it's different from how I read the implication of your post.
(Funnily enough, without thinking, I put an en dash in the paragraph above by holding down on hyphen in the Android keyboard, and only caught myself after I did it.)
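To make the autocorrect behaviour described above concrete, here is a minimal sketch of the "hyphen between two spaces becomes an en dash" rule. It is not Word's or Outlook's actual implementation, just the rule as stated in this comment; the function name `autocorrect_spaced_hyphen` is made up for illustration.

```python
import re

EN_DASH = "\u2013"  # EN DASH


def autocorrect_spaced_hyphen(text: str) -> str:
    """Replace a lone hyphen-minus that sits between two spaces with an en dash."""
    # Only a single "-" with a space on each side is touched, so hyphenated
    # words ("well-known") and unspaced ranges ("3-8") are left alone.
    return re.sub(r"(?<= )-(?= )", EN_DASH, text)


print(autocorrect_spaced_hyphen("a well-known rule - or so I thought"))
```

A purely structural rule like this cannot tell a spaced range from a parenthetical, which is presumably why the real feature is a switchable setting.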
You might not be able to pick out the bassline in many of your favorite songs but that doesn't mean you wouldn't miss it were it not there.
---
"Spelling, grammar, and punctuation are a kind of magic; their purpose is to be invisible. If the sleight of hand works, we will not notice a comma or a quotation mark but will translate each instantly into a pause or an awareness of voice [...] When the mechanics are incorrectly used, the trick is revealed and the magic fails; the reader's focus is shifted from the story to its surface."
- Janet Burroway, Writing Fiction: A Guide to Narrative Craft
Following this thread, the discussion isn't about the existence or absence of punctuation. The discussion is about the case of three specific punctuation marks, which appear extremely similar if not identical. These punctuation marks are being discussed after reading an article about their differences, which are only apparent to those among us who find memorization more important than clarity.
In this exact context, the question is whether all three punctuation marks are needed when literally none of them is distinctive enough as punctuation from the other two. If you read the comment to which I had replied, you will see them also make that point.
FYI this is a pretty condescending response to come back to. From the site guidelines:
> Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.
Moving on from that.
> In this exact context, the question is whether all three punctuation marks are needed when literally none of them is distinctive enough as punctuation from the other two. If you read the comment to which I had replied, you will see them also make that point.
Indeed I did read it, I disagree.
Going back to your original comment I don't think it's reasonable that you've never seen an em-dash in "business or personal writing" but I would totally accept that you haven't noticed the punctuation in those contexts. This is partly my point, if these marks are used correctly then it makes sense you've never spotted them.
I'm saying that the people who read literature containing en and em dashes would notice the difference were they not there. I'd echo what another commenter said: these marks wouldn't be missed until they're gone but we would definitely miss them.
I wasn't expressing condescension; I was saying that you would have had a better understanding of what was being discussed had you read the thread. You should work at following the exact site guideline that you quoted, instead of making assumptions about my day-to-day communications and commenting about those non-existent communications.
> Going back to your original comment I don't think it's reasonable that you've never seen an em-dash in "business or personal writing" but I would totally accept that you haven't noticed the punctuation in those contexts. This is partly my point, if these marks are used correctly then it makes sense you've never spotted them.
What I was saying is that I don't see them in my day-to-day activities, which I don't. You are making assumptions about what type of communications I am involved with daily, and the types of people I communicate with. I communicate with cryptographers and security professionals who all use mono-spaced text. I also communicate with C-level people who barely need to use punctuation other than a period. They have mastered brevity and communicate exceedingly well.
It would make far more sense if you wrote, "I don't think it's reasonable that chownie has never seen an em-dash in business or personal writing."
The comment you're replying to makes perfect sense in exactly the context you describe, so your reply seems quite bizarre. Maybe you didn't read it properly? (See how unhelpful that tone is?)
For the analogy about not explicitly hearing the bassline, but the music still being affected by it: maybe you interpreted it just a little too directly? The analogue to removing the bassline is not the total removal of all three of those punctuation marks. Instead, it's the removal of the distinction between them. I think it was pretty clear (and apt).
> if you never notice the distinction then what's the point?
Given there is usage of en dash in the wild as you mentioned, there's a possibility this may be a case of "you don't know what you got 'til it's gone."
Ah, I am cut to the quick. In truth, one must sometimes be cruel to be kind. To one such as I, neither the hyphen nor the dash are a dish fit for the gods. In tragic travesty, it's all Greek to me. All that glitters isn't gold! [1]
[1] a bunch of Shakespeare's sayings scraped together, after they were trampled in a mosh pit.
> I don't care about endashes or emdashes. I've never seen them in business or personal writing, and I probably never will.
There’s an en dash in the first line of text on apple.com right now. There are en dashes, em dashes, and hyphens in the most recent press release on that site, all used correctly.
> This is way too much pedantry and hyper-hyphen-focus. Honestly, I don't care about endashes or emdashes. I've never seen them in business or personal writing, and I probably never will. They add nothing to anyone's communications.
You have definitely seen them. All professional writing outlets, like e.g. the New York Times, use em-dashes, curly quotes, and other “typographic” characters that one is supposed to use in American English.
And newspapers in my own country follow the typographical rules. Even though no one uses it in informal communication on HN or FB. (Well, some on HN do.)
Except, I didn't write I hadn't seen them. "I've never seen them in business or personal writing".
We can discuss that I chose the word "seen", when I meant "noticed", but there is no doubt that I didn't write what you intimated. I have seen the dashes in formal writing and in newspapers.
A too-hurried reading is worse than not reading at all.
> THE EN DASH IS ABOUT AS WIDE AS AN UPPERCASE N; THE EM DASH IS AS WIDE AS AN M.
> They’re called that because those are the specific typography jargon words that refer to the height of a physical piece of type (the “em,” also called the “mutton” to reduce confusion) and half that height (the “en,” also called the “nut”).
An em was traditionally the width of an uppercase M and an en half that (around the width of an uppercase N). Nowadays, this relationship doesn't necessarily hold: one em is equal to the font size (e.g., a 12 pt font has one em = 12 pt).
Ironically on a punctuation blog, it looks like he has a punctuation typo in his title. In the headline, the semi-colon after "hyphen" should actually be a colon. So the corrected headline is "En dash, em dash and hyphen: what’s the difference?"
A colon is used in this context, when you're introducing the question that follows.
> Some people prefer the way a “space-en-dash-space” looks.
I think this isn’t just a matter of personal preference, but it’s also largely a cultural thing – in German, for example, the “space-en-dash-space” form is common.
This is true for a lot of other punctuation as well. For instance, in Germany, we quote „like this“ instead of “like this”. Whereas in Switzerland or France, it’s common to quote using Guillemets, as in «Hello there!». This style can also be found in German texts, though it’s less common than quotation marks, and it would typically be used »inversely«.
> in Germany, we quote „like this“ instead of “like this”
This is also the traditional style in Dutch; it's what I was taught at school. These days many just use "upper quotes". You can still find the traditional style in books and some newspapers, but others have switched over the years.
In traditional Ethiopian you would use ፡ as a word separator, and ። as a full stop. Over time, people have started to "just" use the space as a word separator. There's some Wikipedia pages that mix both styles; for example on [1] you can see ፡ being used for the first three paragraphs and then it switches to a space. I rather like being able to see the evolution of language/typography on a single page.
Thanks. I wasn't aware of this type of script either; I like it. Sort of a "missing link" (of course there is no historical relationship) between Kana in Japanese and Hangul in Korean.
Since you're quoting France, it's worth noting that there, double punctuations (?:!;) are preceded by a half-space (although in practice it's always a full space). Likewise, guillemets are surrounded by spaces (the space inside the guillemets might be a half-space, I'm not entirely sure). So it would be « Hello there ! »
Ahhhhh, thanks for that! I'm German speaking, and I must admit I questioned the intellectual capacity of some people I conversed with, due to that. In German there is even a slur for it: "Deppenleerzeichen" (fool's whitespace). Now that clears things up.
Just a convention. I used to snigger at the English language convention of capitalising the next sentence in a letter/email after the address - after all, you're still in the same sentence, so why capitalise it. But, it's a conventional thing, so now I do it myself.
Eastern Europeans often drop articles because that’s (apparently) what they do in some Slavic languages. That’s a minor second/third-language quirk, not about an intellectual deficiency (lack of capacity).
Of course, some extra whitespace is even more harmless.
I think most people have these biases in one form or another. It's mostly a matter of your experiences I've found. Rural dialects especially trip me up. I don't know a lot of them personally, and most of the ones I see on TV are talking about.. rural stuff. Also, I think a lot of rural people who go to universities naturally end up toning down their dialects because they tend to be in the minority. So it's kind of rarer to see an academic with a thick southern accent. It will often be less pronounced. On the other hand Eastern European English accents have the opposite effect because most of the Eastern Europeans I've seen speak at length are chess grandmasters and physicists.
The only thing we can really do is try to notice these biases in ourselves and ignore them as best we can.
I never heard about the "Deppen Leerzeichen" in the context of punctuation, but always when German texts split up compound words with a space for no reason.
An unbreakable half-space, to be pedantic (though in this case the pedantry makes sense: you don't want your punctuation mark to end up on the next line)
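As a rough illustration of the French convention discussed here, the sketch below puts a narrow no-break space (U+202F) in front of "double" punctuation. The function name is hypothetical, and treating all four marks the same way is an assumption: style guides differ on which space goes before the colon, and guillemets are not handled at all.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE: the "unbreakable half-space"


def french_punctuation_spacing(text: str) -> str:
    """Put one narrow no-break space before each run of ? ! ; : characters."""
    # Any ordinary spaces already typed before the mark are replaced, so
    # "there !" and "there!" both end up with exactly one U+202F.
    # Naive on purpose: it would also touch the colon in a URL.
    return re.sub(r" *([?!;:]+)", NNBSP + r"\1", text)


print(french_punctuation_spacing("Hello there !"))
```

Because the inserted space is non-breaking, the punctuation mark cannot wrap onto the next line on its own.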
And for extra fun, while the French word for space (espace) is masculine gender (un espace) for most of its meanings, in typography, it's feminine (une espace).
I consider it a crime not to have any spaces between em-dashes and adjacent words. Traditionally, I guess, there were spaces of different sizes. Hair-thin spaces were typeset before and after em-dashes --- that's what I do in LaTeX using (\,). But, because different sized spaces have never been a thing on the Web, let alone plain text, people have preferred to not use any spaces, for some reason.
I wouldn't call it a crime, but a convention. In Europe it's an n-dash with surrounding spaces, in the US it's an m-dash without spaces. For me, the former is nicer, but crimes are maybe a tad more serious.
Unspaced em & en dashes tend to stay glued to the surrounding words when there should instead be "word" wrapping at one end or the other of the dash. It is a crime against text aesthetics. We have met the criminals, and they is us - software types.
Not to mention, ems and ens are not ASCII and thus not strictly kosher.
That looks well thought out. I use a QWERTY layout with similar reasoning applied to the Option/AltGr levels (but entirely different in specific placements) and I routinely type various dashes and quotes without conscious thought, any more than I consciously think about Shift-level punctuation.
In Spain the RAE (equivalent to the Oxford Dictionary) recommends «this», but you will almost never find it except in professional printing. They are not on the keyboard, so everybody uses "this".
It's a shame when technology fails us in this way - I just mean that computers are created to be our tools, and if we want to easily write «this», we can make that happen. If we only have people with this mindset (computers are our tools) in the right places.
I get you mean ChatGPT has solved the problem, but it feels as if it's solved the problem without answering the deep questions. We still don't really get how the brain does it or the answer to any of the deep linguistic questions; instead we get two systems capable of language which no one understands. But at least it's useful! So maybe there are natural language parsers yet to be written, if for nothing else than to finally test our understanding of natural language parsing.
There are many other quote styles - my language uses „these signs” (which we call "ghilimele", similarly to French "guillemets").
EDIT: Seems HN is eating up the right signs... You can see them on Wikipedia here, they essentially look like two small commas: https://ro.wikipedia.org/wiki/Ghilimele
Oh, you quoted correctly, but the display of the right quotes is messed up. They should go from upper left bottom to upper right top, but instead show as upper left top to upper right bottom.
To clarify, I was referring to the mere technical fact that even if you type in a character like `“` (U+201C, “Left Double Quotation Mark”) using one font, it isn’t guaranteed to be rendered in the exact same style in a different font.
E.g., when I type a comment on HN and enter said `“` in the input text field, it uses my system’s default monospace font (Courier), which renders the character so that the stroke appears to go from bottom left (thick) to top right (thin). After I submit my comment, HN uses Verdana (the one from my system), which renders the very same character so that the stroke appears to go from the top left (thick) to the bottom right (thin). It’s the same Unicode character, but both fonts happen to render them differently according to how the font maker laid out and mapped the respective characters. (I can observe the same behaviour when I compare both fonts in my word processor, so it’s not HN-specific.)
“” look like 66 99 in conventional serif text fonts, but have wide variation in sans-serif and decorative fonts where they often resemble ‶″ or ″‶ .
„‟ are more consistent in current computer fonts by virtue of their Unicode names strongly suggesting a particular appearance.
“ U+201C LEFT DOUBLE QUOTATION MARK
” U+201D RIGHT DOUBLE QUOTATION MARK
„ U+201E DOUBLE LOW-9 QUOTATION MARK
‟ U+201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
〝 U+301D REVERSED DOUBLE PRIME QUOTATION MARK
〞 U+301E DOUBLE PRIME QUOTATION MARK
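For anyone who wants to check a list like the one above themselves, here is a small sketch using Python's standard `unicodedata` module to print the code point and official name of each mark. The helper name `describe` is just for illustration.

```python
import unicodedata

# The six quotation marks listed above, written as escapes so that no font
# or input method can silently swap them.
MARKS = "\u201c\u201d\u201e\u201f\u301d\u301e"


def describe(ch: str) -> str:
    """Return a 'U+XXXX NAME' string for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in MARKS:
    print(ch, describe(ch))
```

The names come from the Unicode character database, not from the font, so they stay the same however a given typeface chooses to draw the glyph.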
I do often wonder whether we should maintain traditional typography when moving to a digital age because punctuation evolves as language does. If we’ve deemed it unnecessary to have separate symbols for each of the dashes and everyone uses language that way then that’s fine.
We can also ask this question about smart quotes, you’ll notice I’ve been using the U+2019 as the apostrophe here and I could “quote” like this. It's a question of how much ambiguity it causes, how easy it is to input, and how subjectively aesthetically pleasing it is.
My personal opinion for hyphens is:
- Ambiguity: most can be cleared up with spaces, and for examples like 3-8 if it’s numbers we can tell it’s a range from context
- Ease of input: one character is a lot easier to decide between than 3 (or 4 if you include minus), and if there are rules for software to be able to input the correct character every time then the differences in characters become redundant
- Subjective aesthetics: I quite like the consistent compactness of the single hyphen
And for quotes:
- Ambiguity: They show when quotes start and end which is quite nice and we can have nested quotes. But these are things that are not critical to meaning and simply make it easier
- Ease of input: Usually automated but can absolutely tear through code if pasted in the wrong place. If we deem these smart quotes useful enough then they can coexist with typewriter quotes peacefully if we do not run the quote formatting on code blocks (which is where code should be anyway)
- Subjective aesthetics: I do like the look of smart quotes but would be willing to use straight quotes
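On the "tear through code if pasted in the wrong place" point: one possible workaround is to map smart quotes back to typewriter quotes before the text reaches an interpreter. This is only a sketch; the mapping table and the name `straighten_quotes` are my own choices, and the table covers just a handful of common marks.

```python
# Map a few common typographic quotes to their ASCII counterparts.
SMART_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u201e": '"',  # DOUBLE LOW-9 QUOTATION MARK
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
})


def straighten_quotes(text: str) -> str:
    """Replace the typographic quotes listed above with straight quotes."""
    return text.translate(SMART_TO_STRAIGHT)


# A line of code after a word processor has "helpfully" curled its quotes.
pasted = "print(\u201chello\u201d)"
print(straighten_quotes(pasted))
```

Going the other way (straight to curly) needs context to decide between opening and closing marks, which is why that direction is usually left to the editor's autocorrect.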
The pragmatic thing is to stay glued to the typewriter and then escape our nested strings with Unix toothpicks everywhere.
> Ambiguity: They show when quotes start and end which is quite nice and we can have nested quotes. But these are things that are not critical to meaning and simply make it easier
Typographic conventions go further than that.
In Norwegian it’s `«»` for one level of nesting. For nested quotes you are supposed to use something else. Maybe `‘’` (single quotes) for the second level and then `“”` (American English double quotes).
Maybe American English uses `“”` and then `‘’`.
In my opinion that’s not necessary. At least for text storage.
Part of my complaint about that is that although I think the different punctuation marks are great, using them is a pain because of keyboard layouts.
It's easy to find a hyphen (or something close enough) on your physical keyboard, but there's no em dash. OSes also make it a pain to automate even when they claim otherwise.
I go out of my way to use em dashes but do I think others would? No way. So is lack of use because of lack of utility or because of idiosyncrasies in keyboards?
Hyphens are great for some things but are too short to visually offset text.
The Mac layouts handle the dashes well in my opinion (quotes not so much). Option+‘-’ is ‘–’ (en dash), Option+Shift+‘-’ is ‘—’ (em dash). Option is equivalent to AltGr in the Windows PC world.
No, ≈ is used for approximation, ~ is just the most similar ASCII character, and it became ingrained by people used to using old computers. Just like * is not a multiplication sign, but × is.
In other words, tilde is used for approximation just like the asterisk is used for multiplication and “literally” is used figuratively. We can argue over those uses being correct or incorrect, but they are used like that.
Thus I agree that using tilde for numeric ranges would be confusing. Might as well just use a hyphen, which is easier to type and most people won’t notice the difference from the correct character (en-dash).
Using that form of reasoning, it could be claimed that, say, “espresso” is pronounced “expresso”, because some people do pronounce it like that.
But that would be disingenuous, since “is pronounced” does not generally mean “is sometimes, by some people, pronounced”, but “is supposed to be pronounced” or “is properly pronounced”. The same goes for “tilde is used for approximation”; no it isn’t. It would be different if scbrg had written “tilde is sometimes used for approximation”; it would have indicated a possible interpretation of the first meaning, and not the second.
> It would be different if scbrg had written “tilde is sometimes used for approximation”;
Oh, dear lord. I apologize for leaving out this very important word. I thought it was fairly clear that I didn't mean it was the only symbol used for approximation, pretty much like how, I don't know... nothing is the only thing used for anything.
Whatever phrase, symbol, word or tool in general you find, you can be fairly certain that there's something else that could be used instead.
In the really real world, people tend to use the symbols that are easy to type with their keyboards. Ironically, this is a bit like what TFA complains about; people always use the hyphen that's available with one keystroke when in fact they "should" (for some arbitrary value of "should") use a handful of different ones. And they use tilde for approximation, because nobody knows how to type a fucking ≈. You'll also note that they use " when they "should" have used “, ” or any of the umpteen other variants of quotation marks.
When it comes to ambiguity, which was what this sub thread was about, how things are often used is actually quite important. Because, you know, it's what people actually write that you have to disambiguate, not what they should have written.
Using × or ∙ for multiplication is, IIUC, a cultural differentiator – just like English uses . as a decimal separator, but many Europeans use , for the same purpose. But in Unicode, × is “MULTIPLICATION SIGN” and ∙ is “BULLET OPERATOR”, and * is more visually similar to × than ∙, so I assume that’s where it originates.
It is easy to say this doesn’t matter, and personally, I couldn’t care less which is used. However, professionally, I have twice in the past two months had to deal with text that was edited by a line editor for my organisation, where they strongly criticised our use of these punctuation markers.
And, after much cursing, and my team spending time changing the text, I reflected, and came to like those punctuation markers. Took me a long time, but I have been converted.
Moving through text using Cmd + Left/Right arrow will jump over two words if there’s an em dash between them with no spaces. As a frequent em dash user that was very annoying, so I switched to adding spaces — to hell with the APA.
In Polish em dash is supposed to be surrounded with spaces. I got that rule ingrained in my subconscious so heavily that I feel very uneasy looking at em dashes without spaces even in English. Same way if someone didn’t put a space after a full stop. So I’ve decided to go the British way and use en dash surrounded with spaces. And, after doing that, em dash really feels way too long. :)
Interestingly, if I copy that first character in the table early enough in the page load, it's a hyphen. If I copy it later, it's an en dash. Considering that this article is from 2010, I assume there's some JS added in the last 12 or so years that's autoconverting it.
I guess they respected their own recommendations: "when you're trying to illustrate what a hyphen looks like" was not one of the recommended uses of a hyphen!
I should also note that this whole point seems at best a point for typography geeks. These are three almost identical marks that have very similar uses. I am completely convinced that no one has ever disambiguated a phrase by noticing that something is a hyphen and not an en-dash or vice-versa.
For a somewhat more advanced (and IMHO much more beautifully typeset) but still succinct overview of em dash (and some other dashes) in practical use, see https://twos.dev/dashes.html.
Suitable for those who are familiar with punctuation basics but may want a refresher, and AFAICT gets some things more correct (e.g., the numbers in a range are generally separated by a figure dash, not en dash).
It ... somewhat ... saddens me that HN's parser doesn't distinguish these as Markdown-based comment systems do:
Hyphen: -
En dash: --
Em dash: ---
On usage --- I find the practice of using the em-dash without bounding spaces (typical of most modern style-guides) is visually distracting and more difficult to read than when spaces are provided around the punctuation (as I've done here, and my stylometric stalkers may file as a personal identification tell).
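The hyphen / `--` / `---` convention above is simple enough to sketch in a few lines. This is not how any particular Markdown implementation does it, just the rule as written in the list; `convert_dashes` is a hypothetical name.

```python
import re

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def convert_dashes(text: str) -> str:
    """Turn '---' into an em dash and '--' into an en dash; leave '-' alone."""
    # Handle the longer run first so "---" is not read as "--" plus "-".
    # The lookarounds skip runs of four or more hyphens, which are more
    # likely a horizontal rule than punctuation.
    text = re.sub(r"(?<!-)---(?!-)", EM_DASH, text)
    text = re.sub(r"(?<!-)--(?!-)", EN_DASH, text)
    return text


print(convert_dashes("pages 3--8 --- roughly --- of a well-known book"))
```

A parser doing this for real would also have to skip code spans and URLs, which this sketch does not attempt.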
For a long time, I thought there is actually a word "noone" pronounced with an "oo" sound like "noon". You know, like no one says "whom" anymore but you still see it written.
I'm surprised no one has brought up the excessive waste of energy that has occurred when m-dashes have been misused when the correct character should have been a hyphen or an n-dash. Those additional pixels have no doubt contributed to kilograms of mankind's carbon footprint.
Not to mention the extra key strokes required to type an em dash! They have surely accelerated the onset of people's carpal tunnel syndrome by as much as a couple of minutes.
I can't get myself to care. I've written two novels, and hundreds of thousands of words of articles and internal documents, and I can't for the life of me remember the rules for this, and neither can most other people. For my novels, my editor fixed it, because there is the odd pedant who cares and leaves negative reviews if these things are not "right". For everything else I just use hyphens. It does not matter - it is clear from context.
Is there a similar article that we can get to chide people that use the double dot ellipsis (..)? It's not a thing, but I see it everywhere from casual conversation to business's websites. I despise it.
Oh man, I didn’t know other people did this, I invented it in my friend group. Never in proper writing, only texting. To me it conveys a tone that other punctuation can’t replicate. For example:
Ok.. - Ok, but I’m unsure about this
Ok - Ok
Ok… - Ok, but I’m sad or resigned about this, and I want you to address that
Ok…? - I don’t know where you’re going with this, explain yourself
Even writing “Ok, but I’m unsure about this” isn’t the same, because that calls more attention to your hesitation. If you don’t use “..”, your only alternative is to spend a minute basically doing translation work between inflected English and monotone English, maybe arriving at something like “Ok, I’ll try”, or more likely just give up and communicate in lower fidelity.
One distinction that I had missed for a long time is that the en-dash is used instead of the hyphen to connect words in the “and” sense, such as “read–eval–print loop” or “Myers–Briggs personality type”. I find that the en-dash makes it a bit more clear that the words are sort of “on equal footing” and it’s not one word modifying the other one.
They are only mildly similar in appearance, and they have wildly different uses and purposes, all attested for hundreds of years. Unification would very obviously be a terrible idea. ASCII unified these and more for technical reasons, and it meant that nuance or correctness was occasionally lost, people doubled and/or tripled the character to make dashes, and the results were just plain ugly.
Look, even that HYPHEN-MINUS unification that ASCII foisted on us is problematic without considering dashes, because HYPHEN and MINUS SIGN were often fairly different in appearance, and still should normally be at least somewhat different, even after a few decades of misuse due to the bad unification. A hyphen is much shorter, typically lower-placed, and in serif fonts often slanted (the left end lower than the right), whereas the minus sign is the horizontal half of a plus sign.
> Unification would very obviously be a terrible idea
Why? In the entirety of my school education I never heard a mention that different kinds of dashes exist at all and I still have no idea what their individual purposes are, yet it never had any impact on my understanding of text. Maybe I'm overlooking something, but if people have no problems with reading/writing despite "decades of misuse due to the bad unification", then it's not so obvious to me how unification is such a bad idea.
They’re both drawn and used differently. Just because you can error-correct (and may not even know the difference—though I’d honestly expect almost all native English readers to at least recognise some difference between a hyphen and a dash) doesn’t make it right. It’d be similar to unifying 0 and O (which is likewise something that has been done before in some situations for technical reasons).
That's still not an explanation for why unifying them is bad. Most handwritten text has more variation in every single symbol than the printed dash vs. hyphen. I don't think anyone able to comprehend written text has trouble with that.
> It’d be similar to unifying 0 and O
Which similarly has no noteworthy impact on text readability because digits and letters aren't mixed in English words. Look at a single handwritten dash or zero without context and you won't be able to tell for any of the two whether it's a dash/hyphen or 0/O. And yet that's been good enough in practice for centuries.
To me your argument sounds more like a spaces vs. tabs debate - yes, technically a tab has a different meaning and usually a different length, but in reality it's completely irrelevant when readability of code is concerned.
The unification of minus, hyphen, en-dash and em-dash is entirely natural. Back when I was in school ~25 years ago, in newly-non-communist Romania where ASCII was at best a distant idea, no one taught any difference between these signs. We did have different names for the minus sign and the dash used in writing (and Romanian uses a lot of dashes), but that's it.
We were taught to use the exact same sign for compound words, for other Romanian orthography, for separating words at the end of a line, and as one option for introducing parenthetical clauses - like this. And it was the same sign we used for minus in math class. A slightly longer dash was often used for one particular purpose*, though even that was not explicitly stated, and you wouldn't get lower marks even in calligraphy classes for using shorter dashes instead.
* Romanian uses these longer dashes when representing lines of dialogue, especially in literature, as in:
Minus wasn't mentioned in the article, but the distinction between hyphen-minus (U+002D) and minus sign (U+2212) is very important.
When you put plus and minus side by side (+−) such as in a financial context, the two horizontal lines should be in the same vertical position and both characters should have the same width. Whereas plus and hyphen-minus (+-) will have the hyphen-minus narrower and higher/lower.
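To show the two characters side by side, here is a small sketch that prints their Unicode names and formats signed amounts with the real minus sign. The helper `format_signed` and the two-decimal format are assumptions for illustration, not anything from the article.

```python
import unicodedata

HYPHEN_MINUS = "-"     # U+002D, the key on the keyboard
MINUS_SIGN = "\u2212"  # U+2212, designed to pair with "+"


def format_signed(value: float) -> str:
    """Format a number with an explicit '+' or a true minus sign."""
    sign = "+" if value >= 0 else MINUS_SIGN
    return f"{sign}{abs(value):.2f}"


for ch in (HYPHEN_MINUS, MINUS_SIGN):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(format_signed(12.5), format_signed(-3.5))
```

Note that a number formatted this way no longer parses with `float()` in every tool, so it makes sense for display only, not for storage.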
Awesome. I've been using em-dashes ever since I've been able to type them (and sometimes before with things like `&mdash;` in Markdown/HTML-supporting contexts). PowerToys Quick Accent finally added en- and em-dashes, and I've enjoyed being able to type them anywhere without having to sacrifice my clipboard.
For a little more on these with a focus on CSS and practical issues, I wrote an article a couple years back called Advanced Dashes: https://twos.dev/dashes.html
I don't think it's so much a matter of en-dash breaking the rules and em-dash following them. Robert Bringhurst argues, in Elements of Typographic Style, that em-dash is one of several Victorian innovations that he feels impede the flow of text, and en-dash (with spaces) is a more modern, typographically balanced alternative.
I conclude there are therefore two different rules you can choose from (as in many points of style) and I – a hacker, artist, and typographic enthusiast – picked the one I like best.
Though I revert to em-dash when parodying Emily Dickinson—as one should—
My opinion on this matter is entirely driven by the fact that I got used to easy '--' and '---' en/em dashes in LaTeX. So now when I use Word, I added autocorrection triggers for '--' and '_-' to the same effect.
But, it makes this whole em dash-filled thread very confusing to read as everyone (from my viewpoint) is using en dashes (--) where em dashes belong!
En dashes are ugly bastards that have very little benefit. Word and Google Docs inaccurately convert hyphens into en dashes, and I never have gotten a satisfactory answer why. I used to see en dashes to connect a compound modifier to yet another word, like “billiard-ball–size hail” (that’s an en dash between “ball” and “size”), but I could never trace this “rule” to any style guide. Another popular case was a proper noun connected to another word, like “New York–based author.”
Computers require contortions to create em and en dashes (especially on PCs), so they’ve mostly gotten ditched. The choice of spaces or no spaces around an em dash created with two hyphens is largely stylistic, but I see spaces far more often than the closed-up version. (Oddly, in books and magazines it’s the opposite, at least in the U.S.; the closed-up version of the “true” em dash is more prevalent.)
Hyphens and hyphenation rules really do deserve more attention, in my opinion. I was an editor in the publishing world before moving to tech 15 years ago, and a lot of the hyphenation could get really stupid in that universe. E.g., nobody’s going to misread “ice cream cone,” but some copy editors will insist on “ice-cream cone.” Same with “credit-card bill.” But there are lots of very technical documents I edit now that scream out for clarification, and the humble hyphen has been a godsend in making this painfully boring and headache-inducing matter easier to read.
Also, I’m on mobile so can’t verify, but it looks like the author is using an en dash throughout his post when an em dash is called for.
> Word and Google Docs inaccurately convert hyphens into en dashes
In Word's case, and I think GDocs as well, it is a switchable autocorrect setting, that mostly (not perfectly, because you can't do it purely structurally without semantic analysis) does it correctly, not inaccurately, AFAICT.
> En dashes are ugly bastards that have very little benefit. Word and Google Docs inaccurately convert hyphens into en dashes, and I never have gotten a satisfactory answer why. I used to see en dashes to connect a compound modifier to yet another word, like “billiard-ball–size hail” (that’s an en dash between “ball” and “size”), but I could never trace this “rule” to any style guide.
I’m fairly sure that the Chicago Manual specifies this rule (my older Chicago Manual isn’t handy, and the non-exhaustive information in the public FAQ [0] doesn’t cover it, though it does address the closely-related rule on using an en-dash to connect a modifier to an open compound.)
> Another popular case was a proper noun connected to another word, like “New York–based author.”
I’m pretty sure that’s not because New York is a proper noun, but because it is an open compound.
> Computers require contortions to create em and en dashes
They require a tiny bit of setup to make it easy, outside of the applications which already make it easy.
> Also, I’m on mobile so can’t verify, but it looks like the author is using an en dash throughout his post when an em dash is called for.
Some style guides call for an en-dash (usually, set open) in the places where the more common rule is to use an em-dash (usually, set closed.)
> I’m pretty sure that’s not because New York is a proper noun, but because it is an open compound.
You are correct -- it's not because it's a proper noun. The rule I recall is that if a proper noun is also a compound modifier, the en dash is warranted. (I think this is in Chicago.) The example you gave from Chicago's FAQ is sort of the same rule but with the en dash coming at the beginning.
And the author of the original blog post caveated his use of the en dash (instead of the em dash), which I missed when I read it the first time, and speaks to your point of some style guides calling for an en:
> Choosing between the en dash or em dash is not a big deal. In my writing (as a manager corresponding with government officials and politicians, and also as a marketer communicating with real people) I use ‘space-n-dash-space’ instead of the em-dash – just to keep everyone happy.
It's even more complicated as there's an additional near-identical character in Unicode. Copied from Wikipedia:
- is a hyphen-minus (ASCII 2D, Unicode 002D), normally used as a hyphen, or in math expressions as a minus sign
– is an en dash (Unicode 2013).
— is an em dash (Unicode 2014).
− is a minus (Unicode 2212).
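If you want to check which of these you actually have in a piece of text, here is a minimal Python sketch using the standard-library `unicodedata` module. The helper name `describe` is just something made up for illustration, and the characters are written as escapes so they survive copy and paste:

```python
import unicodedata

# The four look-alike characters from the list above, written as escapes.
DASHES = ["\u002d", "\u2013", "\u2014", "\u2212"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in DASHES:
    print(describe(ch))
```

Pasting a suspicious character into `describe` is usually the quickest way to tell a minus sign from an en dash when the font makes them look the same.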
If I use a dash on line one of some work - like this, the second line's dash is completely different (even when using columns).
So, I really don't trouble myself with the whole idea of "The primary importance of using the correct dashes is that it preserves a good flow for reading and is paramount to micro-typographic balance."
When I learned the uses of the three dashes back in design school, I felt like I had learned some new English superpowers -- and the rules are easy to remember, too.
It's silly to me that some folks find it cumbersome, rather than an opportunity to be more precise in their writing, not to mention, more aesthetic.
I seem to have taken a weird personal style of using dashes in the context that an en dash is used for a splitting clause – like this – whereas an em dash is used only before a finishing clause — like this.
Curious to see if there are any uses for steganography with this (or just identifying people by writing style).
In traditional typography, the em-rule is optional and can be replaced
by a (spaced) en-rule.
In TeX, en-rules are represented by two hyphens (--) and the em-rule
by three (---). This is always appropriate except in typeset documents
(i.e. PDFs produced by TeX). Either you typeset something or you don't.
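For plain text that later needs real dashes, the TeX convention above can be mechanically converted. A rough Python sketch, assuming the text contains no code, math, or verbatim sections where a double hyphen means something else (the function name `tex_dashes` is made up for this example):

```python
def tex_dashes(text: str) -> str:
    """Replace TeX-style '---' and '--' with em and en dashes."""
    # Handle the longer sequence first so '---' is not read as '--' plus '-'.
    text = text.replace("---", "\u2014")
    return text.replace("--", "\u2013")

print(tex_dashes("pages 3--8 --- more or less"))
```

Single hyphens are left alone, so ordinary compounds like "well-known" pass through unchanged.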
It isn't that punctuation doesn't matter; however, non phoneme based typographical elements are really hard to defend. Worse, characters that are not present on the vast majority of input mechanisms? Really? This is the line people are going to draw in the hopes of not dying?
Personally I’ve always preferred a minus sign closer, like the latter. While the subtraction operator looks better as the former. But I think this is just a calculator-ism that has infected my math syntax.
But especially for matrix inversion, the super wide subtraction symbol just looks awful to me. A little calculator style minus symbol is also nice because it’ll clear the matrix more easily…
The issue of non-ASCII-constrained environments is that it's still not easily accessible on most keyboards.
I do know and use the compose key but it's not the same as having a standard key for it. Trying on a mobile device, long pressing the dash key there suggests 2 dashes (not sure if the second choice is en-dash or em-dash), which is something, but that's not the 4 types discussed here.
Do you know one where it's easy to type? There are many international keyboards but I don't think any has that many dashes. Compose key is the best I think.
It’s easier to type HYPHEN-MINUS‐the‐keyboard-key, but text should be displayed using the appropriate glyph, depending on its semantic meaning, which is never HYPHEN-MINUS‐the-glyph.
(This discussion is similar to the classic net discussion about TAB‐the‐ASCII‐character versus TAB‐the‐keyboard‐key, with some people having trouble conceptualizing the difference.)
ISO 8601 specifies to use a hyphen. In freeform text this also makes sense in that you could use an en-dash to specify a date range: 2023-03-12–2023-04-10. (ISO 8601 uses slashes for time intervals.)
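To make the two forms concrete, here is a small Python sketch using the standard `datetime` module. The function names are made up for this example; the first joins two ISO dates with an en dash (U+2013) for prose, the second uses the slash form that ISO 8601 defines for intervals:

```python
from datetime import date

def prose_range(start: date, end: date) -> str:
    """Join two ISO 8601 dates with an en dash (U+2013) for freeform text."""
    return f"{start.isoformat()}\u2013{end.isoformat()}"

def iso_interval(start: date, end: date) -> str:
    """Join two ISO 8601 dates with a slash, the ISO 8601 interval form."""
    return f"{start.isoformat()}/{end.isoformat()}"

start, end = date(2023, 3, 12), date(2023, 4, 10)
print(prose_range(start, end))
print(iso_interval(start, end))
```

The hyphens inside each date stay plain hyphen-minus characters, so only the separator between the two dates differs.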
This gives me a bad feeling as I am immediately imagining copy&pasting some code with dashes replaced by this aesthetic Unicode and having to fix every occurrence of it before I can run it...
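If that does happen, the cleanup can at least be scripted. A minimal Python sketch that maps the common dash look-alikes back to the ASCII hyphen-minus; which characters to include is a judgment call rather than any standard list, and `asciify_dashes` is a made-up name:

```python
# Map dash-like characters to the ASCII hyphen-minus that most languages expect.
# The selection here is an assumption; extend or trim it as needed.
DASH_LIKE = {
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2212": "-",  # minus sign
}
_TABLE = str.maketrans(DASH_LIKE)

def asciify_dashes(source: str) -> str:
    """Return source with every dash look-alike replaced by '-'."""
    return source.translate(_TABLE)

print(asciify_dashes("x = a \u2212 b"))
```

Note that this is blunt: it also rewrites dashes inside string literals and comments, so it is only safe when the code was never supposed to contain them.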
I've always known about the three, but the author is correct in saying it's about availability. You used to have to go into the "special character" pop up, click on the n- or m-dash, go back to the document and paste it in. In more formal documents, maybe I'd do that, but most of the time it is just a big pain in the ass, and most people don't know the difference, so why bother. I do use the space dash space now for the n-dash. But where is the m-dash??? Usually under the "Special Characters" option. I went into the Tools menu to check if Special Characters is there...nope. Format menu choice?? Nope. Insert? Ah! There it is...after having to look through each of the above very slowly to see if the Special Character option is there. Now I have to look at the characters - there used to be very few special characters and you could find the m-dash. Now there are thousands of special characters and I don't have the time to look through them all. So now I have to go to the help documentation to search for the m-dash.
OK, in the help search box, nothing comes up under "m dash," "m-dash," "em-dash," or "em dash." Not even showing up under "dash." Fuck. OK, so now I have to go to the ASCII table to find the ASCII decimal code. I found it - the ASCII code for m-dash is 151, but how do I put that in the document??? I search online help - no help.
I go back to the Special Characters option under the Insert menu. OH! There's a search box on top. I type in "m dash" THERE IT IS!!
So I had to go through all that, just to find the m-dash. Why? Sheesh, what a nosebleed. You might say, "Of course, why didn't you do that in the first place?" Because 1) I'm an imperfect being, and 2) I've used all of those other ways before - I didn't start on computers in the last 5 years, I've gone through a lot of changes so I know a lot of ways to do the same thing. And from app to app, some still work one way, some not.
I didn't do my old standby, though. That would have been my next step — just go to Google search, type in m-dash as I'll always find it there, then just copy and paste the m-dash. That almost always works. Why didn't I do that first just now? Because I just wanted to do it within the word processor, because that's the way it "should" be. And eventually was, after much work.
So, fuck all the special characters. I just use commonly understood equivalents if I can.
But, the point is that I've always known the difference between all three, it's just — why even bother? It's a colossal pain in the ass.
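On the "code 151" part: strictly speaking ASCII stops at 127, and 151 is the em dash in the Windows-1252 code page, which is probably what that table was showing. On Windows, holding Alt and typing 0151 on the numeric keypad is the usual way to enter it. A quick Python check, assuming the standard `cp1252` codec (the helper name is made up):

```python
import unicodedata

def cp1252_name(byte_value: int) -> str:
    """Decode one Windows-1252 byte and return its code point and Unicode name."""
    ch = bytes([byte_value]).decode("cp1252")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(cp1252_name(150))
print(cp1252_name(151))
```

Byte 150 in the same code page is the en dash, so the two sit next to each other there.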
How about stop fucking with people's brains. Hyphen - with spaces when needed - works just fine. I personally prefer round brackets (I have no idea if it is "legal" though).
I understand it matters in publishing but then leave it to them to make things pretty.
Does anyone still care for the en-dash today outside of very formal literature?
It seems to be quasi dead and anyway often indistinguishable if not written side by side with a hyphen. Furthermore I would argue that if the meaning of your sentence is ambiguous when a hyphen is used instead of an en-dash, you should reformulate it.
In an era when “they” can be a singular reference, worrying about the typography of various dashes seems like a pointless concern. Whether a hyphen, en-dash, or em-dash is used has far less impact on clarity than pronoun-antecedent disagreement.
and the modern hyphen that sits on the same line as the text is from… Gutenberg. 1455. 15th C.
maybe after almost 700 years we can stop complaining that non-binary and trans people are ruining language, and start accepting that they can be singular and plural. and that it has uses to refer to persons of unknown gender or as a standin for a known gender. it’s pretty common and is not going to change because you don’t like it.
Not according to all the grammar I ever learned. It’s not a trans/gender thing. It’s lazy and unclear writing. It would be much better to have declared a new singular gender neutral pronoun than having to disambiguate “they” every time it is used.
Everyone complaining about pedantry here, while I'm thinking this doesn't go nearly far enough. I'd like to propose that all punctuation marks should come in minutely different sizes with different meanings, not just the various dashes.
Take the full stop. There should be a second version that's about .25 pts larger and means an emphatic full stop (not to be confused with a bold full stop, of course. That would be dumb). But why stop there? Let's add another one that's raised a tiny fraction from the baseline. About 0.1 pts should do it. This one should mean a shorter pause, somewhere between a full stop and a comma.
And anyone who can't discern the difference, such as dyslexics and the short sighted, should be publicly whipped and forced to wear a hat of shame for a week. The hats, of course, being very slightly different shades of grey depending on which incorrect punctuation mark they used.
m dash: --
n dash: -.
.. I take it few people find morse code puns funny anymore.
Seriously, what's the point of this pedantry? What does having 3 basically identical characters add to the language other than pointless rules for insufferable pedants to power trip over? We've all been using - just fine. On what basis does the person writing this article believe these rules matter, are important, disambiguate language?
Call me a hopeless philistine, but I say down with the dash. One symbol is fine for word-compounding, numerical ranges, subtraction, mid word line breaks. No one needs an em dash to tell them pages 3-8 is not a compound word.