The Tragedy of the Common Lisp: Why Large Languages Explode

btilly · on June 21, 2019

After programming for 20 years, I've noticed that programmers are aware of the potential benefits of any abstraction that they have internalized, and blithely unaware of the costs. Because, having climbed that learning curve, it is now free to them.

The result is that programmers introduce complexity lightly, and when they walk into a new language/organization/etc are inclined to add random abstractions that they are used to from past experience.

And yes, I am not immune.

azeirah · on June 21, 2019

I wouldn't look at it that negatively. Programming is like speaking a language, I don't mean what programming language you "speak" (ie, c++, or java, or whatever), but how you're used to speaking in it.

Like real language, you probably prefer certain idioms, you have your own style and you know certain words better than others. This is similar to how we program, we know certain algorithms, yet are oblivious to others, we have our favorite abstractions and have our etc.

There are downsides to these habits for both spoken language and computer programming languages, no doubt, you can get stuck in a rabbit hole without seeing you're in one. In the end, a similar approach works for both, read and write widely if you want to be great at these, meet different groups, work on uncomfortable projects, you know the drill.

Just like how you expand your language and therefore your world; socially.

darkpuma · on June 21, 2019

> "After programming for 20 years, I've noticed that programmers are aware of the potential benefits of any abstraction that they have internalized, and blithely unaware of the costs."

'Lisp programmers know the value of everything and the cost of nothing.'

braythwayt · on June 21, 2019

Underrated comment!

Here’s the footnote:

“Perlisisms: EPIGRAMS IN PROGRAMMING by Alan J. Perlis”

http://www.cs.yale.edu/homes/perlis-alan/quotes.html

tempguy9999 · on June 21, 2019

Your criticism of abstractions is very much in the abstract.

What abstractions?

What costs of these? In human comprehension, in runtime, in reliability, in mem/cpu?

Can you give examples which I can usefully learn so as to avoid?

TIA

btilly · on June 21, 2019

Any and all abstractions. The cost is usually in all of the above.

Examples that I commonly encounter include OO, closures, dependency injection frameworks, complex configuration systems, various code generation systems, and on and on and on.

In general the tradeoff is this. For those who have internalized the abstraction, they can think about more complex things. Those who have not internalized the abstraction find it hard to figure out how the system works at all until they internalize it. So when you work on code that has a lot of abstractions under the hood it becomes either a black box (that occasionally you dig into) or (very often) a requirement that you understand X before you can even start to work on the code.

A rule of thumb that I use is how long the stack backtrace is. If every bug creates a stack backtrace that is dozens of frames, there are a lot of abstraction layers in place. And when all of the layers are actively being worked on, it adds up - quickly.

Deciding whether given abstractions are worthwhile for a given problem involves a judgment call. Unfortunately the people who are in the best position to make those calls tend to be the most senior, and tend to be the least aware of the costs of the abstractions that they add.

tempguy9999 · on June 21, 2019

A nice reply, upvoted.

I would note that if you start your list with OO then you're including abstractions that almost all professional programmers would consider not abstractions but basic tools. It's impossible to work without such abstractions, except by working in purely procedural code, and even then...

I would add that if "a requirement that you understand X before you can even start to work on the code" is the case then your abstraction has - arguably, I may be wrong! - failed. My DOM iterator abstraction would take some effort of understanding to maintain, which was my concern, but it was a very simple black box to use.

btilly · on June 21, 2019

A simple example of an abstraction for which this is not true is the MVC design. If you don't understand how it works, you don't know where to begin looking to make a change to the system.

And yes, some abstractions truly are basic tools. And the more experience you have, the more basic they seem, and the more such tools you have.

I am not arguing against abstraction per se. What I am arguing is that abstractions bring a cost, and that cost adds up. A project needs to find an appropriate balance, and there is a tendency for experienced programmers to draw that balance at a point that may or may not be appropriate for the people who have to maintain the system after them.

jrochkind1 · on June 24, 2019

Which is why what really matters in programming is "communities of shared abstractions."

This is part of what you get with a language, but it depends on the language. (We've seen Javascript users split into several such communities, I'd say).

Or it can be what you get with a framework -- one of the main values of Rails is that people who have learn it's abstractions can look at each other's code and understand it. This applies to third-party extensions to Rails that get especially popular too, like say Devise. (The downside is when those abstractions aren't good, and you want to use something else instead... now everybody finds your code difficult to understand).

When we talk about a language having a good stdlib, a lot of what we're talking about is providing a good set of abstractions that everyone will learn, making their code understandable as well as interoperable with other developers'. JS's lack of much of a stdlib may be not unrelated to JS schism into several communities of shared abstractions...

I don't think it's really about "minimizing abstractions", it's not even possible to do so -- it's about which abstractions how. The important point you make is that abstractions that are understood by a community of programmers, from which those who work on your code are likely to be from -- actually practically have less cost than abstractions that will be unfamiliar with them.

I rememeber when OO was super confusing to me...

tempguy9999 · on June 21, 2019

Replying to myself as I thought of an example from my own past which is relevant.

After discovering the eye-opening expressivity of functional programming years ago with Dylan and Haskell, I had a substantial project in javascript. I found JS supported first class closures and lambdas, which allowed me to make iterators.

The novelty was they were iterators not over lists but over trees (DOM trees in this case but of course any tree could be made iterable).

I understood there would be a learning cost for anyone who took over from me so I left plenty of docs and pointers. It would not be a small cost either, my successors would have to learn to think differently, at a higher level and possibly rather alien (to them) way.

But was it worth the cost, bloody hell yes! Not having that iterable and rather declarative abstraction over DOM trees would have greatly bloated the code and consequently brought in bugs by the bagful. I would do in a couple of lines what would have taken half a page to do, in many locations.

So it had a human cost but if you could overcome that, a seriously huge human benefit.

dalyons · on June 21, 2019

haha did your successors think it was worth the cost?

tempguy9999 · on June 22, 2019

What happened was interesting and unexpected. I was building on top of an application (call it H) which had the DOM trees to manipulate, and JS to manipulate it with. I was reporting new bugs in H continuously, 1 or 2 every day.

Because of the abstractions I'd done (it wasn't only tree iterators) I could largely work around the bugs invisibly - I pushed the bug-special-case-handling code down to make it invisible when using my abstractions.

That made it all too viable to continue using an crappy, flaky product far longer than would have been possible - or sensible - without those abstractions. I'd turned the abstractions' value into a liability!

I finally told them it wasn't worth continuing with H, they junked it AFAIK and I walked away.

zimpenfish · on June 22, 2019

More importantly, I suppose, did the company?

GorgeRonde · on June 21, 2019

The absence of abstraction being what exactly ?

Also beware of that kind of pseudo-wise thinking. Is there a point in climbing up a learning curve anymore ? Maybe you have a constant and massive stream of fresh bodies to throw at your "simple" solutions before laying them aside once those abstractions have clogged up their heads ... And maybe if you can unskilled workers at such a scale, it's because you have massive fundings as well...

More a question of financial optimization than software engineering I think

btilly · on June 21, 2019

Before dismissing it out of hand, take a look at the Go language. It was designed to make specific kinds of common abstractions hard exactly because, when working at scale, programmers routinely create disasters by layering abstractions in a way that nobody can understand the consequences of.

jkachmar · on June 22, 2019

This is exactly the kind of pseudo-wisdom that I read the GP as referring to, though.

In the case of Go, the core team saw the pain of indirection-masquerading-as-abstraction in complex Java/C++ codebases and considered the whole thing to be a boondoggle. As a result of this we’ve been saddled with a popular language in which two massive projects (gvisor and kubernetes) have had to hack their own expressivity into the language just to build complex software (i.e. codegen’d generics)

I worry about the cyclic nature of progress in our industry, where wonderful advancements can be made and then walked back or under-utilized because we aren’t patient enough to learn them thoroughly.

pjmlp · on June 22, 2019

Go is Java 1.0 all over again.

If it's enterprise adoption ever goes beyond Kubernetes and Docker, expect GoEE and Go Design Patterns to make their appearance.

Worse, since its plugin support is really cramped down, expect any enterprise grade CMS to be built on hundreds of processes.

This happens all the time with simple languages, tons of library boilerplate code.

einpoklum · on June 23, 2019

That means you have probably not been programming C++.

In C++, the cost of abstraction is a fundamental consideration, and while you can still ignore it - most resources about using C++, and gradually even the language itself, tend to nudge you towards avoiding abstraction costs in various ways (sometimes ugly, sometimes elegant). Plus, core language designers and compiler architects bend over backwards to reduce the cost of various abstractions to nothing or very little.

Plus, it happens that sometimes, you can use stronger abstractions to _reduce_ cost rather than increase it.

lacker · on June 21, 2019

The Algol, Smalltalk, Pascal, and early Scheme languages were prized for being small and beautiful.

Those languages are also not very popular at all today. Perhaps "small and beautiful" are not the right metrics to optimize programming languages for.

In many ways, keeping the core JavaScript language small has led to the JavaScript ecosystem being too large and sprawling. For example, look at module importing, something which is a fairly normal part of most programming languages. It still works in a completely different way in Node.js and in browsers. I think if there had been a standard way to handle JavaScript imports years ago, the practical experience of working in JavaScript would be simpler today.

braythwayt · on June 21, 2019

Programming languages can be categorized in a number of ways: imperative, applicative, logic-based, problem-oriented, etc. But they all seem to be either an “agglutination of features” or a “crystallization of style.” COBOL, PL/1, Ada, etc., belong to the first kind; LISP, APL– and Smalltalk–are the second kind. It is probably not an accident that the agglutinative languages all seem to have been instigated by committees, and the crystallization languages by a single person.

Alan Kay's "The Early History of Smalltalk:"

http://gagne.homedns.org/~tgagne/contrib/EarlyHistoryST.html

mhuffman · on June 21, 2019

> Those languages are also not very popular at all today. Perhaps "small and beautiful" are not the right metrics to optimize programming languages for.

Perhaps popularity isn't.

Smithalicious · on June 22, 2019

I think that popularity is probably indicative of other qualities of a language, and at the very least, I think being popular is a good feature in itself (in terms of getting support, libraries, documentation etc)

pjmlp · on June 22, 2019

Platforms are products that sell languages, not the other way around.

So being popular just means having the luck to be on a platform that is doing well.

Usually most programming languages fade way when that platform stops being relevant, as history has proven a couple of times.

F-0X · on June 21, 2019

I quite strongly disagree. My favourite languages, Scheme, C, Go, share precisely the "small and focused" philosophy, and its a big reason why I like them so much. Your point about javascript is not very strong either, I imagine the real reason is its origins as a browser-embedded language meant that the same style of imports in Node.js were not feasible. Even if this is not true, I would then shift the blame to the implementor of Node.

Scheme in particular deserves so much more attention, it is so capable and it is a joy to write.

pjmlp · on June 22, 2019

C appears small.

I doubt even veteran C developers know by heart its 200 UB documented use cases on ISO C, or the specificalities of any C compiler other than the one they daily use, and which features of it are actually part of ISO C.

Unless they happen to do ISO work all day long.

microtherion · on June 21, 2019

Pascal also cheated on size, by omitting some facilities that were almost indispensable for practical programs, leading pretty much all implementations do add them in incompatible ways.

pjmlp · on June 22, 2019

The first version yes, ISO Extended Pascal fixed that, however by then everyone was mostly focused on being compatible with Turbo Pascal.

cat199 · on June 21, 2019

> Those languages are also not very popular at all today.

But yet, C (arguably 'current algol') and Python (arguably 'current scheme') are..

yes: python is verry loosely like scheme, this is meant in the sense of a 'dynamic loosely typed language you can interact with in a repl'

astrobe_ · on June 21, 2019

Lua is a far better counter-example as a successful small and (conceptually) beautiful language.

cat199 · on June 22, 2019

yes, probably so. was going for 'popularity' here

darkpuma · on June 21, 2019

> "dynamic loosely typed"

Both Python and Scheme are dynamic but strongly typed.

cat199 · on June 22, 2019

i'm going to disagree here.

yes, they have strong core types, but there is nothing preventing you calling some function with completely invalid arguments..

even within the 'interpreted fp' world, there are better examples (e.g. ML family)

dodobirdlord · on June 22, 2019

I think you may be missing some terminology here. Strong typing is about whether entities have definite types and resist type coercion. Ad-hoc typing in dynamic languages can make this hard to see, but it's still there. What would prevent you from calling a function with invalid arguments would be static typing, which Python and Scheme do not have.

em-bee · on June 22, 2019

the significant difference between python and scheme here is that python is not small.

but then, scheme is heading in the same direction if you look at the latest standards development

klez · on June 22, 2019

Depends on what you mean by "latest". R6RS is arguably huge. That's why they decided to split R7RS in two, a small core which is done and a huge version which, ironically, is still in the making.

pjmlp · on June 22, 2019

R7RS is still in the making, not because it is bigger, rather due to politics that many don't agree with it ever happening.

The small core exists, because that was the only subset their were willing to agree on.

em-bee · on June 22, 2019

right, it may not happen, but at least some people want to add stuff to scheme that will make it larger, to eventually reach the point where it no longer can be called a small language.

i don't know if/when that point will be reached. my comment was hinting at the potential for scheme to become larger.

threatofrain · on June 21, 2019

> Once a language gets beyond a certain complexity — say LaTeX, Common Lisp, C++, PL/1, modern Java — the experience of programming in it is more like carving out a subset of features for one’s personal use out of what seems like an infinite sea of features, most of which we become resigned to never learning.

This is how I felt about Racket, and I felt I had to do an opinionated pruning before I could use it for teaching, but the problem is that the docs don’t have such a boundary for learners so you feel tempted to rewrite the docs.

gmfawcett · on June 21, 2019

Last I checked, Racket literally has sublanguages defined for teaching and learning: "Beginning Student", "Beginning Student with List Abbreviations", etc.

https://docs.racket-lang.org/htdp-langs/index.html

threatofrain · on June 21, 2019

I wouldn't call those sub-languages, I think they're rightly different languages (of eerie similarity) born from specific pedagogical or academic vision. So the upkeep in dealing with the mismatch between proper Racket and these student languages wasn't something I was interested in. Students should be able to consult general Racket resources.

neilv · on June 21, 2019

Exactly, I think that the Racket teaching languages were started to support teaching young/new students (and not necessarily people who are or will be CS majors) a particular way of approaching program design problems.

Racket itself arose from building a cross-platform toolset to support building those programming tools. It also became a toolset and testbed for PL research.

I appreciate what you're saying about Racket having a big library. I was recently talking about how to to teach Racket to experienced programmers. I'd put them down in front of a full `#lang racket/base`, but start with teaching them only a specific subset of R7RS, and incrementally introduce a few more concepts and exercises to try for a day or more (things you can practice while writing your own real code, even if it briefly seems harder than something we told you that you could use before). If instead I just dumped the full Guide and Reference on experienced people, they'd become productive pretty quickly, albeit colored heavily by what they previously knew, and they might have a much longer path to becoming strong in some unfamiliar fundamentals.

BTW, another educational thing that Rackets `#lang` does is support students working through SICP use DrRacket. It emulates the particular older Scheme variant used in SICP, gives an IDE maybe a bit easier for new students, and runs on the student's own modern computer.

sigzero · on June 21, 2019

Indeed. You can even MAKE your own language with Racket to suite teaching needs.

pgtan · on June 21, 2019

Or you can rediscover LOGO, which is LISP to the bones:

? show (map [[x y] [output se :x sum :y #]] [a b c] [10 20 30])

[[a 11] [b 22] [c 33]]

darkpuma · on June 21, 2019

The racket docs having such great cross-referencing with hyperlinks is probably a decent part of why it's been relatively so popular. Cross-referenced documentation invites deeper investigation and learning.

kerkeslager · on June 21, 2019

I feel like this is the real success of Python: the "one and preferably only one obvious way to do it" idea means that new ways of doing something are rarely added, and when new they are added, old ones are likely deprecated. Complain all you want about the 2 to 3 transition, but it's resulted in a simpler, easier-to-use language. If the community really feels a different way is better, they add it in libraries. My only criticism of Python in this respect is that they didn't take it far enough: Do we need higher-order functions AND loops? Do we need classes AND first-class closures?

Common Lisp isn't even close to the most complicated language out there. Every

opportune · on June 21, 2019

I don't think this is really true of python, often the "correct" (pythonic, theoretically best performance) way to do something involves using more complicated language constructs than most people are familiar with. For example, when to use list/dict comprehensions, when to use reduce functions, when to use generators. Most beginner programmers and people coming from C-inspired languages will do things the "obvious" yet incorrect way by doing a for-loop of appends.

m463 · on June 21, 2019

I think that python is in general more cohesive than other languages. I like that perl has both "if" and "unless", which is expressive, but it give multiple ways to do the same thing.

I also think pythonic and theoretically best performance might not need to correlate.

I would personally stick to for-loops for general but tricky code and leave things complications like nested comprehensions to places like the guts of libraries or classes that make the tradeoff to have simplified externals.

(for example argparse - very nice externals, tricky tricky guts)

a1369209993 · on June 21, 2019

> Complain all you want about the 2 to 3 transition

Okay.

> it's resulted in a simpler, easier-to-use language.

Does `print(len("ẅ"))`[0] still produce a value (2) that is neither the number of characters (1) nor the number of bytes (3)?

0: "print\x28len\x28\x22\x77\xCC\x88\x22\x29\x29"

kerkeslager · on June 22, 2019

It took me a bit to understand what you were trying to do. Here's a paste of my Python 3 shell session, showing that Python 3 does indeed return the number of characters.

    ~$ python3
    Python 3.7.3 (default, Mar 27 2019, 09:23:15) 
    [Clang 10.0.1 (clang-1001.0.46.3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> len("ẅ")
    1

Python 3 uses UTF 32 internally, so the byte representation you placed below your post is not how Python 3 represents it. Instead, it looks like this:

    >>> "ẅ".encode('utf32')
    b'\xff\xfe\x00\x00\x85\x1e\x00\x00'

This has disadvantages (memory usage) but for most cases where Python is used, it's an advantage (faster random access, more intuitive for situations like the one you've proposed).

zephyrfalcon · on June 22, 2019

So, I see:

    Python 3.7.3 (default, Mar 27 2019, 09:23:32)
    [Clang 9.0.0 (clang-900.0.39.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> len("ẅ")
    2

I wonder why this is? Is the Clang version relevant here?

EDIT: Your "ẅ" doesn't seem to be the same as the OP's "ẅ", although they look the same at first glance.

    >>> "ẅ".encode('utf-8')
    b'w\xcc\x88'
    >>> "ẅ".encode('utf-8')
    b'\xe1\xba\x85'

EDIT 2. More info:

    >>> import unicodedata
    >>> w1 = "ẅ"
    >>> w2 = "ẅ"
    >>> unicodedata.name(w1)
    'LATIN SMALL LETTER W WITH DIAERESIS'
    >>> unicodedata.name(w2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: name() argument 1 must be a unicode character, not str
    >>> unicodedata.name(w2[0])
    'LATIN SMALL LETTER W'
    >>> unicodedata.name(w2[1])
    'COMBINING DIAERESIS'

So the second version (w2) does seem to consist of two separate "characters", LATIN SMALL LETTER W and COMBINING DIAERESIS, which is apparently not the same as the single-character LATIN SMALL LETTER W WITH DIAERESIS. I guess these are actually Unicode code points and not so much "characters" to a human reader, but as another poster pointed out, what the number of characters should be in a string isn't always clear-cut.

a1369209993 · on June 22, 2019

Correct, w2 is one character (latin small w with umlaut) represented by two unicode code points. I didn't realize there was a NFC code point for that character; try "\x66\xCC\x88" (f̈) or "\x77\xCC\xBB" (w̻) instead.

> the number of characters should be in a string isn't always clear-cut.

This is why I use examples from latin-with-diacritics, where there is no ambiguity in character segmentation.

kerkeslager · on June 23, 2019

Interesting, I learned a bit about Unicode here. It looks like copy/pasting combined the two code points into one when I ran my code.

Still, to the original point, I think this is more of a criticism of Unicode than of Python. It seems to me that the answer is to not use combining diacritics, and that Unicode shouldn't include those.

a1369209993 · on June 23, 2019

> this is more of a criticism of Unicode than of Python

True, although it's more specifically a criticism of Python for using Unicode, where these kinds of warts are pervasive. See also "\xC7\xB1" (U+01F1 "Ǳ") which is two bytes, one code point, and two characters with no correspondence to those bytes.

> the answer is to not use combining diacritics

This doesn't actually work, sadly, because you can't represent eg "f̈"[0] without some means of composing arbitrary base characters with arbitrary diacritics.

0: If unicode has a added a specific NFC code point for that particular character, then that's bad example but the general point still stands.

wool_gather · on June 22, 2019

Well spotted. It was probably normalized when it was copy-pasted.

ferbivore · on June 21, 2019

len() counts code points, not abstract characters. Your "ẅ" contains two of them, U+77 and U+0308.

zephyrfalcon · on June 21, 2019

Apparently yes. A quick inspection suggests that this is the same in Ruby, Haskell, SWI-Prolog, Gauche Scheme, SBCL and D. (I might not have the latest version of everything, so maybe this has been fixed in some of them... assuming it needs fixing. Maybe there's a reason for the answer to be 2 if so many language implementations insist on it. Or, they all use the same faulty algorithm. I don't know.)

a1369209993 · on June 22, 2019

> they all use the same faulty algorithm

Well, yes. To be fair, it's not like any of them make a secret of the fact that they're mistakenly counting unicode code points instead of characters.

Smithalicious · on June 22, 2019

Are they really doing so "mistakenly"? I feel like there's more to this.

dodobirdlord · on June 22, 2019

It's not mistakenly. Unicode's complexity is a bit more than trivial, and since much work has gone into abstracting over it many people are surprised when the complexity rears up at them.

Consider, for example, the wonderful piece of writing in the answer to this question.

https://stackoverflow.com/questions/1732348/regex-match-open...

How many characters do you suppose are in this string?

.

"TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚ N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ"

.

And what should Python tell you the length of this string is?

jrochkind1 · on June 24, 2019

Yes, what you said. It's not a mistake. It's a... useful abstraction.

Unicode is complicated in some ways because the domain it is dealing with (representing all possible human written communication, basically) is complicated. Unicode is pretty ingenious. It pays to invest in learning about it, rather than assuming your "naive" conclusions are what it "should" do (and unicode's standard docs are pretty readable).

Unicode does offer an algorithm for segmenting text into "grapheme clusters", specifically "user-perceived characters." https://unicode.org/reports/tr29/

It's worth reading that document when deciding what you think the "right" thing to do with "len()" is.

The "user-perceived character segmentation" algorithm is complicated, it has a performance cost... and it's implemented in terms of the lower-level codepoint abstraction.

Dealing with codepoints is the right thing for most platforms to do, as the basic API. Codepoints are the basic API into unicode.

It's true that they ideally ought to also give you access to TR29 character segmentation. And most don't. Cause it's hard and confusing and nobody's done it I guess. It would be nice.

If you want to know "well, howe come codepoints are the basic unicode abstraction/API? Why couldn't user-perceived characters be?" Then start reading other unicode docs too, and eventually you'll understand how we got here. (For starters, a "user-perceived character" can actually be locale-dependent, what's two characters in one language may be one in another).

a1369209993 · on June 25, 2019

> It's not a mistake. It's a... useful abstraction.

It is specifially a abstraction that is not useful.

> It's worth reading that document when deciding what you think the "right" thing to do with "len()" is.

Technically not - the right thing to do is return the number of characters[0] - but the character segmentation parts are worth reading when deciding how to decode UTF-8 bytes into characters in the first place, so the distinction is somewhat academic.

> a [character] can actually be locale-dependent, what's two characters in one language may be one in another

[citation needed]; ch, ij, dz, etc are not examples, but I'm admitted not exhaustively familiar with non-latin scripts[1], so I would be interested to see what other scripts do.

0: or bytes, but that's trivial

1: Which is why I hate Unicode; I'd prefer to pawn that work off on someone else and just import a library, but Unicode has ensured that all available libraries are always unusably broken.

joshuamorton · on June 27, 2019

> Technically not - the right thing to do is return the number of characters[0]

> 0: or bytes, but that's trivial

In what encoding? The utf-8, utf-32, and utf-16 encodings of the same string are different numbers of bytes.

a1369209993 · on June 27, 2019

Number of bytes would apply in cases - like the len() of a python3 bytes or python str object, or something like C's strlen function - where you're not operating on characters in the first place. It's trivial precisely because there is no encoding.

"\xC4\xAC" is two bytes regardless of whether you interpret it as [latin capital i + breve] or [hangul gyeoh] or [latin capital a + umlaut][not sign] ("Ĭ" / "곃" / "Ä¬").

a1369209993 · on June 22, 2019

23 if I'm counting correctly (there's a space in "PO NY" for some reason). I would also accept 209 from a language that elected not to deal with large amounts of complexity in string handling. The problem with Unicode is they go to enormous amounts of effort to deliberately give a wrong answer.

lerax · on June 22, 2019

[hyperbolic] Common Lisp deserves more love. People are just too much jealous to admit that CL is one of the best language of the world. How someone can think that JavaScript, PHP, Java and other stuff alike are at least non-miserable in comparison with something so powerful as Common Lisp? [/hyperbolic]

Keeping the humor-ish stuff aside, actually taking CL here as example it's someway ignorant. That's doesn't make any sense to me. Seems that all that opinion are based in nothing more that nothing when it concerns about Common Lisp literature experience and personal experience with the language.

Some languages are popular, but they being popular doesn't proofs that they are good languages. This is a fact based on the type of problems most of the people are trying to solve.

CL being big it's some way comparable to C++, but for big problems, sometimes we need complex tools. CL is not for kids, neither C++. Common Lisp it's a language for hackers.

nydel · on June 23, 2019

my neck aches from nods in agreement. i know i’ve written at leeaasst one #’TRAMADOL-SIMULATOR but i’ll likely just lambda the notion, slightly faster &or funner than grepping my filesystem for “\.l*sp$”.

nonbirithm · on June 21, 2019

I kind of wish there was a lisp with the "recompile on error and continue" feature of Common Lisp but without a massive standard library. A standalone SBCL program seems to be around 40MB at minimum. It feels like it would be doable if only CL wasn't designed with the kitchen sink included. Recently I've gotten into Janet[0] and really like the language, although I do miss CL's recompilation magic at times. It feels like a Lisp dialect with similar style to Lua: tables, coroutines, small language core, embeddable as a single C file, etc.

[0] https://www.janet-lang.org

_19qg · on June 21, 2019

SBCL's programs are 40MB because they contain all dev tools and the code is native code, which is also large.

There are a bunch of Common Lisp implementations which are smaller (CLISP), can create small applications (Lispworks), can be embedded (ECL) or can compile to smallish C code (mocl).

sansnomme · on June 21, 2019

Also, Common Lisp's standard library cannot compare to Python, Java etc. They have a lot of language manipulation tools but not actually anything that will help you ship quickly if you are building generic CRUD style applications. CL tends to shine when it comes to extremely domain specific niche stuff where you practically have to write your own tooling from scratch.

soiheard · on June 22, 2019

Roswell comes to mind for application building and distro management. Add in ningle for web app building and woo for the server (the benchmarks are really good), and you're basically good to go.

I think the main problem is there's too many implementations of the same thing in CL, so devs basically have to agree on a very specific subset of tools for every project, which adds a fair bit of overhead.

https://roswell.github.io/ https://github.com/fukamachi/woo https://github.com/fukamachi/ningle

p.s. Not the author, but a big fan.

sansnomme · on June 22, 2019

Last I tried it's not reliable. E.g. generating binaries through VM images is coin toss.

xenophonf · on June 21, 2019

Just clang on my system is around 80MB, so when you consider that an SBCL image includes a compiler, a linker, and a loader—and if you have Quicklisp installed, a build system and a package manager—40MB starts to look like a bargain!

I personally wonder if CL's "computer program as an image" instead of the Unix process-oriented architecture will see a comeback in this modern era of virtualization and rump kernels and containers. In theory a Docker image for an SBCL program would be just the Lisp image itself plus the small handful of standard POSIX libraries on which the Linux version of SBCL depends (e.g., libc, libm, libpthread, etc.)

FPGAhacker · on June 21, 2019

You might find Fennel interesting:

https://github.com/bakpakin/Fennel

It's a lua based lisp. Technomancy of clojure fame has taken a shine to it and become a large contributor.

mindB · on June 21, 2019

If it's the same project I'm thinking of, Janet is actually by the same developer as fennel. Janet was kind of his followup project to fennel.

FPGAhacker · on June 21, 2019

That's interesting, you are right, same person.

kazinator · on June 21, 2019

You might like TXR Lisp.

https://www.nongnu.org/txr

dmos62 · on June 21, 2019

> recompile on error and continue

I'm not familiar with this. Anyone care to explain?

Jtsummers · on June 21, 2019

https://malisper.me/debugging-lisp-part-1-recompilation/

That's a good example. When an error is signaled you're able to modify the program (via recompilation of functions, changing of values) and restart at a point you select. There are other things you can do, but this is a particularly powerful option.

aidenn0 · on June 22, 2019

SBCL is so large because it's not particularly optimized for size. A stripped libecl.so is a couple of megabytes.

e40 · on June 22, 2019

I just tested this in Allegro CL 10.1: with and w/o the compiler: 15MB and 11MB.

klipt · on June 21, 2019

Why not just use a smarter compiler/linker that trims out unused library code?

gmfawcett · on June 21, 2019

Some CL implementations do this ("tree shaking", as it is called). But some of the more popular open-source implementations (SBCL, Clozure CL) do not.

darkpuma · on June 21, 2019

It can also be error prone. Analysising arbitrary lisp code to see which functions could conceivably ever be called isn't as trivial as it might first seems.

dTal · on June 22, 2019

Can you elaborate? It seems trivial - just scan every function call. Surely it's not possible to call a function without typing its name somewhere? (Obviously if there's funny business with 'eval' this won't work, but then it reduces to the halting problem in the general case so that's fine. Most programs don't use eval.)

callinyouin · on June 21, 2019

I get that the article used Common Lisp to create the punny title and much of its contents are broadly about why its important to keep languages relatively small, but I'm sort of disappointed most of the language-specific talk was about JS. I'm in the middle of learning CL at the moment (I have a mostly Java, Python, C++ background, in that order of experience) and would be interested to hear from experienced Lispers on why some other Lisp would be a better choice at a language level (so not based on convenience, tooling, popularity, etc... okay, maybe tooling).

Mikeb85 · on June 21, 2019

Racket seems to be favoured individuals who are using it to create a DSL or accomplish very specific tasks, but CL is definitely the most used Lisp (well, maybe apart from Clojure) to create 'big' applications.

dreamcompiler · on June 21, 2019

This comparison is inappropriate. When Common Lisp was standardized, the goal was to unify several different Lisp dialects, not to make the language "small." Furthermore, there was no standard library for Common Lisp. The language itself is its standard library. If anything, the language by itself is too small for a lot of modern applications. (The situation is different now; there are hundreds of libraries to fill in the missing pieces of Common Lisp. But they're not standardized like the core language is.)

_19qg · on June 21, 2019

> the goal was to unify several different Lisp dialects

the goal was to unify several similar Lisp dialects as a Maclisp successor

dang · on June 21, 2019

Related from 2015: https://news.ycombinator.com/item?id=9738866

didibus · on June 22, 2019

As an aside, I recently tried Common Lisp, and it's actually pretty good, you'd be surprised.

mateuszf · on June 21, 2019

Isn't c++ a large language? And yet it's still used.

cjensen · on June 21, 2019

Sure. But there are a lot of areas where you just shrug and say "here be dragons" and move on.

For example, consider the new for loop syntax

  for (auto item: a().b().c()) ...

Does this make you nervous? It should. In the above, suppose a(), b(), and c() are all returning objects. Which of these objects remains valid until the end of the for loop? And this is a dragon in one of the most helpful parts of new C++.

mateuszf · on June 22, 2019

Yeah, I'm not a big fan of the language, I'm only saying it's used a lot. Seems that in this case there was not a lot of portable fast alternatives for system / device / high performance programming.

bluGill · on June 21, 2019

C++ is only slightly larger than any other language if you subtract the standard libraries that make a language useful. If you add standard libraries C++ is a tiny language.

C++ does have a lot of weird, inconsistent warts that make it tricky to learn/use everything. Which is why most people who advocate to use C++ talk about modern C++ which is C++ where you stay away from those warts.

the_why_of_y · on June 22, 2019

"C++ a tiny language" - just in case anybody is missing the obvious sarcasm, people who have done it say that implementing a C++ front end takes a decade of effort: https://news.ycombinator.com/item?id=17130870

bluGill · on June 24, 2019

Those two statements are not incompatible. C++ has a lot of tricky areas that are hard to get right. It is still a tiny language.

didibus · on June 22, 2019

I think the word Large is pretty ambiguous. In my opinion, it is more about incoherent, inconsistent, complected, and with no clear way to learn more about things you do not understand. A language which has everything I would need already available and ready to use, in a simple and straightforward way is pretty great otherwise.

qwerty456127 · on June 22, 2019

The same happens to CPU instruction sets. Old stuff should get deprecated once better features are added (letting you write more concise and efficient code) but you can't deprecate that as it would break too much legacy code people rely on.

carapace · on June 21, 2019

Awesome to see Mark S. Miller on the front page. One of my favorite computer nerd jokes: Ted Nelson points out that Mark Miller's name is his vocation. :D Get it? He's a symbol grinder, a mark miller. I love it.

amelius · on June 21, 2019

What exactly is the link with the tragedy of the commons?

https://en.wikipedia.org/wiki/Tragedy_of_the_commons

nerdponx · on June 21, 2019

I think it's just a fun name. But maybe it's allegorical in that everyone "took what they wanted from the commons" (got their preferred features implemented), leaving the whole thing a mess (bloated language).

btilly · on June 21, 2019

The word "Common" is a reference to the language Common Lisp: https://en.wikipedia.org/wiki/Common_Lisp.

No explicit link with the tragedy of the commons was implied. Implicitly, of course, it applies - there are many participants, each of which is aware of their own gain and unaware of how their actions cost everyone else. The result is that when they all get their changes in, the result is terrible for everyone.

For another example, C++ is a terrible language, but it contains many good ones.

amelius · on June 22, 2019

> No explicit link with the tragedy of the commons was implied.

Okay, but the article says:

> Adapted from a 2015 es-discuss thread. “Common Lisp” is not the topic. It serves only as one of many illustrative counter-examples.

So if "Common Lisp" is not the topic, and the tragedy of the commons is not the topic, then it seems the title is not very well chosen.

gnaritas · on June 21, 2019

Lisp isn't a large language, it's a small language with a large library.

nabla9 · on June 21, 2019

I think comp.lang.lisp had this discussion 10 years ago and the "core language" semantics of Common Lisp is something like 25-30 functions/operators. Rest of the language could be a separated into libraries.

But even with the all those 'libraries' COMMON-LISP package has just 978 external symbols.

kazinator · on June 22, 2019

In TXR Lisp, I seem to have nearly double that in the analogous public library package called usr:

  1> (len (keep-if [orf boundp mboundp fboundp] (package-symbols 'usr)))
  1713

That's just a one-man project coming up to ten years two months hence.

That doesn't count any structure types or their slot names, FFI types, and local macros involved in syntaxes like awk and such.

I would say that Common Lisp shows amazing restraint, given its scope and number of people involved.

ska · on June 21, 2019

Common lisp is large-ish at least.

pfdietz · on June 21, 2019

Common Lisp is small enough that one person could write a reasonably complete test suite for it in his spare time.

deckard1 · on June 21, 2019

I'm not sure too many people could write a test suite for LOOP in his spare time. Never mind the rest of the trickier parts of the language such as MOP, CLOS, packages, conditions, restarts, etc. etc.

pfdietz · on June 21, 2019

https://github.com/pfdietz/ansi-test

>.>

zimpenfish · on June 22, 2019

> in his spare time

I don't wanna be -that guy- but ... github lists 5 contributors on that repo. "his" is doing a lot of work in your claim there.

pfdietz · on June 22, 2019

I stopped working on that for a number of years, and others maintained it, but the original test suite was just mine. Look at the ILC 2005 paper in the doc/ directory there.

ska · on June 21, 2019

maybe medium? Compare to scheme... this is why I said large-ish, not large. There are lot's of corners to get into in CL.

momokoko · on June 21, 2019

> The Algol, Smalltalk, Pascal, and early Scheme languages were prized for being small and beautiful.

But they did not achieve widespread adoption.

Widely used languages need to apply to a wide range of use cases. For example the fat arrow(=>) class syntax that binds this.

    class Foo {
      bar = (baz) => {
        // do something
      }
    }

For entry level JavaScript developers working in React, this made one of the most common hangups go away for them due to the automatic binding of `this`. But for a some other situations, that might end up an anti-pattern.

That's always the issue with small, clean implementations. They work when they only apply to one domain. Once they need to be used for different things, they then need to add new features. It is no different for software, blenders, bicycles etc. Once something is used for multiple domains, it now needs to become larger and more complex. Its basically a rule of nature.

stcredzero · on June 21, 2019

But they did not achieve widespread adoption.

That's always the issue with small, clean implementations. They work when they only apply to one domain.

Sorry, but this is pure BS with regards to Smalltalk. I worked for a Smalltalk vendor for 5 years, then consulted in the language for several years after that. I had a front row seat to these issues. Having a small language does not make it suitable for only one domain. The primary reason for Smalltalk not achieving widespread adoption was due to a series of decisions which, in retrospect, were highly anti-adoption. It started with no operator precedence in the language design, which acted to alienate engineers and scientists. Then the industry went through a "we're the boutique language of the Fortune 500" phase with $5000 and $10000 per-seat licenses.

It is no different for software, blenders, bicycles etc. Once something is used for multiple domains, it now needs to become larger and more complex. Its basically a rule of nature.

That just doesn't work for software. The reason why Python is so popular with certain scientific communities isn't at all specific language features to support them. It's general utility plus specific libraries. The reason why Smalltalk got popular at one time in finance and energy trading didn't have anything to do with specific language features or libraries. It was purely rapid development!

There is something which a small language can be prone to: fragmentation. It was so easy to roll your own Smalltalk. This, combined with a toothless language standard meant that the community got balkanized into multiple communities with incompatible code.

dbcurtis · on June 21, 2019

> > The Algol, Smalltalk, Pascal, and early Scheme languages were prized for being small and beautiful.

> But they did not achieve widespread adoption.

> Widely used languages need to apply to a wide range of use cases.

A long time ago, an old-timer, even more old-time than me (and I did CS home works on punch cards in PL/I...) was a C fanatic, and was dragging his mainframe colleagues into Unix, said it best: "C does not get in your way."

He (and I) came from an era when eventually pretty much every language eventually "got in your way" and you had to learn the assembly language calling and linking conventions and implement a few nuts & bolts to get your job done. C was the first popular language that didn't eventually knee-cap you like that. That is why C won and Pascal was an intense but passing fad.

To me, that is the essential message for language designers, more important than "keeping it small". Stay out of the programmer's way. Of course, there are different ways of staying out of the way.... Python's "duck typing" is a way of keeping the type checker out of your way. Rust's "unsafe" is a way of keeping the borrow checker out of your way when you really, really, need to carve with your sharpest knife.

So I would argue that "applies to a wide variety of use cases" simplifies to: does not get in your way no matter what your application is.

As an aside: The old timer was a role model to me in one important aspect: Learning to recognize progress when you see it. He not only lived through the transition from core to solid state memory, he lived through the transition from transistor logic to integrated circuits. But he was evangelizing C and Unix when many his age thought the list of interesting programming languages had len()==3: FORTRAN, COBOL, assembly. I took that lesson to heart, and try very hard to answer the question: "Is this progress, or is this just different?".

nostrademons · on June 21, 2019

Go does this really, really well. I'm not a huge fan of Go-the-language (largely because I enjoy having a little more abstractive power), but I have to admit that one of its huge strengths is that it's a language designed to take the focus off the language and onto the program you're writing. It avoided several of the pitfalls of more "advanced" languages (eg. Common Lisp, Haskell, even Rust and Python 3.5+) in the process, which is perhaps why it's seen more mainstream adoption.

Kotlin's another language that looked at say Scala (which is an immensely powerful language, but easily lets you get lost in abstractions), decided "We don't need that feature unless it actively helps remove friction when writing programs", and ended up with a lot more real-world programs written in it.

sdegutis · on June 21, 2019

For what it's worth, that class-field syntax proposal never settled well with many JavaScript and React developers, and now that Hooks are out, a significant number of us have abandoned class syntax altogether in favor of functions with hooks. (There are other proposals that add to the class syntax that overall just don't seem to mesh well together and seem to create as many issues as they solve, like #private_variables, and the @decorators.)

protomyth · on June 21, 2019

Pascal was pretty popular. Turbo Pascal sold a lot of copies for good reasons.

NeedMoreTea · on June 21, 2019

Algol was huge - for the scope of its era, and the ancestor of many modern languages so its influence was outsize too. Where modern books that aren't language specific might express ideas and algorithms in C, most books used to use Algol. There was an entire generation of later languages that were called Algol-like: B, C, Pascal, Ada, Modula 2. Probably others.