Lua Doesn't Suck – Strange Loop 2010 20-minute talk video (kylecordes.com)
90 points by kylecordes on Oct 17, 2010 | 59 comments



I haven't seen the video [do they discuss this?], but the biggest problem I have with Lua is its lack of good unicode support. The Lua wiki suggests slnunicode and ICU4Lua, but neither library has been updated for a year.

I'd love to use Lua instead of Python for some tasks, given that LuaJIT kicks ass. But poor support for something as important as unicode is a deal breaker.

Am I just missing some information in this regard?


> Am I just missing some information in this regard?

A major, and non-negotiable, design goal of Lua is that it targets the ANSI C environment to maximise portability. ANSI C does not define Unicode support.

Incidentally, those two libraries might be unchanged because they're stable. The rate of change in the main lua implementation is quite measured.


More specifically, Unicode is one of those things (like bignum support) where Lua's design for embedding assumes that if it's a priority, your project will already have settled on a library for it, so anything Lua came bundled with would just get in the way. The GMP library, for example, is almost four times larger than the whole Lua runtime.

Besides, Lua strings are interned raw byte-arrays (like atoms in Erlang or Lisp), and can contain \0s and other arbitrary binary data. While the standard string library is ignorant of Unicode, nothing prevents extra libraries from providing Unicode-aware string operations.
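The byte-array point is easy to see from the standard interpreter (a small sketch, assuming stock Lua 5.1):

```lua
-- Lua strings are binary-safe byte arrays: \0 and arbitrary bytes are fine.
local s = "a\0b"
assert(#s == 3)               -- length counts raw bytes, \0 included

-- UTF-8 data can be stored as-is; the core just doesn't interpret it.
local word = "h\195\164"      -- "ha-umlaut" as two UTF-8-encoded characters
assert(#word == 3)            -- 2 characters, but 3 bytes
```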

Lua also left out native regular expression support, because it would be larger than the rest of the standard libraries put together. You can just load LPEG if it matters.


I wouldn't have a problem with external unicode support if there was one "standard" recommended library that had a strong community behind it. My whole point is that that doesn't appear to exist.


Sure. This is a real issue for new Lua users, though in the long run it's not a showstopper.

For roughly half the people using Lua, library choices have already been decided by the big C++ (or whatever) project they're adding Lua to, so the question never arises in the first place. That's the primary use case for Lua. It's a really nice standalone language, too, but it was designed for embedding. Many of its strengths (such as the small, orthogonal, clean design) come from that focus, but so do some quirks. If you use Lua as a standalone language, you'll occasionally need to do some digging for libraries, particularly if you're not comfortable with C. The situation has been improving significantly over the last year or two, though.

I don't have a specific Unicode library recommendation, but would suggest checking the mailing list archives (http://lua-users.org/lists/lua-l/).


(I'm the guy in the video.)

I haven't had a need for Unicode in Lua, but when I stumbled across some information about it, the essential story was that there is nothing stopping you from writing code to manipulate Unicode data (obviously), but there is also nothing built in to help you. All the built in string stuff is ANSI/ASCII/whatever.

However, the language Io (which often gets mentioned in the same breath as Lua, though it's quite different in some ways) has a stronger Unicode story: http://www.iolanguage.com/scm/io/docs/IoGuide.html#Unicode


Are you affiliated with Lambda Lounge in STL?


Yes! I know the guy who started Lambda Lounge (Alex Miller http://tech.puredanger.com/about ) and heartily supported its founding; I spoke at it every once in a while; and my firm (Oasis Digital http://oasisdigital.com/ ) sponsors Lambda Lounge by paying for travel expenses for out-of-town speakers.

Lambda Lounge is the best software dev user group ever.


I'm Alex Miller, the founder of Lambda Lounge. If you want to drop me a line email me at

contact at puredanger.com


The Lua philosophy is far different from the philosophy of other languages. That's why videos like these, that cover the advantages before the tradeoffs, are a good thing. Knowing the positive things about something really affects my perception of the tradeoffs.

Also, Unicode is an odd thing to consider an essential part of a programming language. Usually when I hear that, I assume the person sees it as a matter of accommodating speakers of other languages. As a counterpoint, I'd point out that Ruby took much longer than Python to get Unicode support, even though Ruby's creator, Matz, is Japanese.


As a Finnish software dev I can say that these days native support for Unicode is a must. We have a couple of special characters in the alphabet (åäö), and if you have to do a lot of manual work to use them, the programming language is pretty much unusable for real-world stuff that involves any Finnish.

The fact that Ruby took so long to get real Unicode support is due to Japanese developers resisting Unicode in favor of their own encodings, EUC-JP and Shift JIS, just as Finnish users clung to ISO-8859-1(5) for so long.


Is Unicode something that needs to be in the core of a language, or is it sufficient to leave it to libraries, if the language's design doesn't prevent it?

On this computer (OpenBSD/i386), icu has over 1 MB of libraries and a 15 MB data file. The whole Lua distribution fits in one 200k library. Bloating the core language with that seems impractical.


In my opinion, it needs to be in the core.

Imagine you need to go through a library every time your string includes or might include the letter "s" or "v". Basically you'd need to use this library for all your strings. But then you lose compatibility with 'normal' string type and need to be constantly aware of the difference. You might want to use some other library that doesn't support this Unicode library at all, etc. It quickly becomes a very painful world to live in.

As you might imagine, just having support for Unicode baked in the language is very nice. Defaulting to Unicode for all text is even better.

I can understand why Lua in particular doesn't come with Unicode support out of the box, being so small. My comment was written in response to the more general claim that Unicode support is a strange thing to consider essential in a programming language.

(According to http://www.bckelk.ukfsn.org/words/etaoin.html, s/ä and v/ö are comparable in frequency.)


I think we have slightly different ideas about the language core vs. library distinction. In C, for example, printf and strlen are library functions (stdio.h and string.h), while structs are part of the core language.

All the language core needs for Unicode is reasonable support for tagging string literals (i.e., U"blah") and a binary-safe string type. It's best if there's either a standard or de facto community standard library for doing Unicode string ops, but it doesn't need core support anymore than the Linux kernel needs to know about parsing HTTP.


Is Unicode something that needs to be in the core of a language, or is it sufficient to leave it to libraries, if the language's design doesn't prevent it?

At the very least, there should be a first-class type that maps 1-to-1 with a Unicode CodePoint. Then there should be easy ways to do common operations on strings in terms of CodePoints. (Like string comparison, substring matching, concatenation.) Furthermore, the encodings should be handled in a transparent way.

If the goal is to keep Lua to a 200k core, then there should be a mechanism to add such functionality as if it's built in.

EDIT: "transparent" meaning, it looks like core functionality.


> If the goal is to keep Lua to a 200k core, then there should be a mechanism to add such functionality as if it's built in.

There is. See "metatables" - the behavior of tables (Lua dicts) and userdata (handles to C pointers, or raw C pointers) is intentionally left minimal but extensible, so that new first-class ("transparent") types can be added.
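For illustration, a hypothetical "ustring" type (the name and methods are made up for this sketch) can be made to feel built in via metamethods:

```lua
-- Sketch of a hypothetical "ustring" type made transparent via metatables.
local ustr = {}
ustr.__index = ustr

function ustr.new(bytes)
  return setmetatable({ bytes = bytes }, ustr)
end

-- Metamethods make the standard operators work on the new type:
function ustr.__concat(a, b) return ustr.new(a.bytes .. b.bytes) end
function ustr.__tostring(self) return self.bytes end
function ustr:len() return #self.bytes end  -- stand-in for a real codepoint count

local a, b = ustr.new("foo"), ustr.new("bar")
assert(tostring(a .. b) == "foobar")  -- ".." behaves as if built in
assert(a:len() == 3)
```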

For example, Lua doesn't have full regexp support*, but there's LPEG (http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html), a library that adds a PEG-based matching/parsing engine (which is a superset of REs). It's no less usable for being in a library rather than the core.

* Though what it does have (http://www.lua.org/manual/5.1/manual.html#5.4) is often good enough - the main thing missing is alternation, e.g. "a+(ab|bc)?d".
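For concreteness, the patterns Lua does ship with (the standard string library, Lua 5.1) look like this:

```lua
-- Built-in Lua patterns cover everyday matching, including captures:
local k, v = string.match("key=value", "(%w+)=(%w+)")
assert(k == "key" and v == "value")

-- Global substitution with a pattern:
assert(string.gsub("aaad", "a+", "X") == "Xd")

-- But there is no "|" alternation, so "a+(ab|bc)?d" can't be one pattern.
```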


The characters you listed are defined in latin-1. They really don't serve as a good example for the urgent need to use unicode.


They are not in ASCII, and that is the problem. You wouldn't believe how many times I've tripped over Python 2.x's UnicodeDecodeError for example.

Moreover, I wasn't searching for the best example -- merely using something I have personal experience of.


> Also, Unicode is an odd thing to consider to be an essential part of a programming language.

It really depends on what the language is used for. Web-facing applications are, in general, going to receive unicode input, so for them it matters. But for an embedded programmer, for example, it might be utterly irrelevant.

Horses for courses, really. It just so happens that Lua has other properties that are very attractive to web apps too, which the presentation outlined.

-- and yes, still alive and commenting :P


But Ruby supported different encodings from the start, such as Shift_JIS, UTF-8, and EUC-JP; it's just that the support was not "good enough". The reason it took so long to get an improved one is also that the core Ruby devs wanted to support more encodings than just Unicode.


slnunicode hasn't changed or been updated, and has no forum posts or bugs since 2006 [http://luaforge.net/projects/sln/]

ICU4Lua hasn't been touched since April 2009, and is at 0.2B. [http://luaforge.net/projects/icu-lua/]

That sort of status just doesn't inspire confidence in a language. I understand you can use it for game programming, or that you can write your own libraries for it, but that sort of thing doesn't encourage adoption.


ICU4Lua is a wrapper for an existing, stable library. Does (for example) the Python wrapper for sqlite get a lot of commits these days? And if not, does that make it dead and abandoned?


Why is it in beta? Why does the mailing list [http://lists.luaforge.net/pipermail/icu-lua-users/] have no activity on it?


Unless something is a major standalone project, discussion usually happens on the main Lua mailing list instead. About the beta label - I don't know. gmail was in beta for a long time, too, though.


> I haven't seen the video [do they discuss this?], but the biggest problem I have with Lua is its lack of good unicode support.

Hear hear. I too don't care much for the argument that Lua doesn't have unicode because C doesn't have unicode. For two reasons. First of all, Lua can be straightforwardly embedded into a host of other languages (for example, http://tinyurl.com/39qhxzw). Second, who in their right mind thought that a language designed for embedding into commercial applications shouldn't have good I18N? (And honestly, how in the world could someone from Brazil develop a programming language that doesn't even support their own spoken language? It's as if Ruby was developed by someone in Japan.)

A bit of history. From what I have gathered in its documentation, Lua steals quite unabashedly from NewtonScript. NewtonScript was a proto-style OO language developed in a hurry for the Newton when the Dylan language wasn't going to deliver. Like Lua, it's a language designed originally for an interpreter, which runs embedded in an outer C++ environment and must interoperate with it. But NewtonScript doesn't just use Unicode pervasively: it was the first major language to do so. And C++ interoperability with the language is just fine. So this Lua-needs-to-work-with-C-and-thus-can't-do-unicode thing is nonsense both in fact and precedent.

And while we're on the subject of NewtonScript: Lua's let's-almost-do-proto-style-OO-but-require-the-user-to-do-extra-work is really incredibly annoying. Lua should have had proto OO built into the language, like NewtonScript, and not just "available", in a hacked way, through its meta model. Lua wants to be more "general" than NewtonScript, but it just winds up being (IMO) rather less usable.


Lua is as old as NewtonScript. Its authors have mentioned Scheme, SNOBOL, awk, bibtex, Icon, and (IIRC) Self as influences, but I don't recall them mentioning NewtonScript. Javascript also has a lot in common with Lua. I think it's because the trade-offs inherent in embedded scripting languages are making them converge on a similar overall language design, rather than plagiarism.

If you're putting Lua in a commercial application, the commercial application itself will provide the i18n, and Lua can use it with very little trouble. Lua's strings are raw byte arrays - you can load arbitrary binary data in them. A library that reads UTF-8 strings (say) can work with them just fine. (And while I don't speak Portuguese, there are examples in PiL that use it without problems.)

I disagree with you about whether metatables are annoying, but it's a matter of taste. I haven't ever used NewtonScript, but I do use metatables for quite a bit more than just prototype OO - it's easy to turn a table into a proxy + cache to a function, for example. Also, my Lua redis library (http://github.com/silentbicycle/sidereal) turns table reads and writes into syntactic sugar for redis db key, list, and set operations.


I'll have to look up the specific direct references I saw that the Lua authors made to NewtonScript which prompted this: though note that the sole reference to Self in Programming in Lua is made in the same breath as NewtonScript. At any rate, I very strongly disagree with you about the i18n. Let's say you're making a video game. Let's call it, oh, I dunno, how about "World of Warcraft". You've decided to craft much of the level design in Lua so you don't have to write it in C++. Now you want to port it to the Mongolian market, complete with Mongolian storyline, instructions, character dialogue, you name it. All this stuff was in Lua strings in English. If you had a decent I18N system in your scripting language you'd just type Mongolian in those strings instead. This is a real problem.

> I disagree with you about whether metatables are annoying, but it's a matter of taste.

I didn't say metatables are annoying in and of themselves. I said they're annoying as a hacked-together substitute for a true proto OO.

Lua has many good things. But I18N and a usable OO aren't among them.


Lua "strings" are raw byte arrays with a saved length. You can store JPEG data in Lua strings. Storing Mongolian is not a big deal. You'll need a lib to e.g. calculate UTF-8 lengths, but all that needs to happen is the Lua community (or a project's company) agreeing on a specific Unicode library. There are no technical limitations there.
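A UTF-8 length counter, for example, fits in a few lines of pure Lua (a sketch that assumes well-formed input; real code should validate the byte sequences):

```lua
-- Count UTF-8 characters by counting bytes that are NOT continuation
-- bytes (continuation bytes have the form 10xxxxxx, i.e. 0x80..0xBF).
local function utf8_len(s)
  local n = 0
  for i = 1, #s do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then n = n + 1 end
  end
  return n
end

assert(utf8_len("abc") == 3)
assert(utf8_len("h\195\164") == 2)  -- 3 bytes, 2 characters
```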


> Storing Mongolian is not a big deal.

...

> You'll need a lib to e.g. calculate UTF-8 lengths,

...

> There are no technical limitations there.

You cannot be seriously making this argument. Why not just code everything in assembly?


I understand it's annoying that Lua doesn't have a single, officially recommended Unicode library* (in the core distribution or otherwise), but as problems go, it's easily solved by loading an existing, freely available library and getting on with life. It's much less trouble than (say) removing the GIL from Python or retroactively fixing weird operator precedence in C.

* Though there are recommendations at http://lua-users.org/wiki/LuaUnicode .

In practice, it could look like this:

   U = require "unicode"
   s = U"some unicode"      -- using a library sure is hard
   length = s:len()
   replaced = s:gsub("thing1", "thing2") -- global substitute
   etc.
You can add regular expressions and bignums to Lua pretty painlessly, too.


Agreed; only, the fact that no one has done it conclusively so far says to me that the Lua community isn't/hasn't been thinking about these sorts of things, and makes me worry about the other holes I will encounter if I do delve into the language. I'd rather invest my time and energy into learning how to use Cython to speed up Python, or Clojure to make Java more fun.


http://luaforge.net/projects/ul-str/

Last updated June 21st, 2010. The readme seems to indicate it supports all the important stuff: compare, concatenate, encode, lower<->upper, regexp, URL escaping.

Says beta and I haven't tried it yet tho : /


Thanks, that's useful to hear. It isn't on the wiki page: http://lua-users.org/wiki/LuaUnicode (which hasn't been touched since Sep 2009).


luaforge (http://luaforge.net) seems to still be active and doing a pretty good job aggregating libraries. Check out the "luarocks" and "kepler" project pages for ideas. Or just grab some MySQL bindings and start from scratch. Let HNers know if you build a Lua blog/CMS etc... it's been on my todo list but I probably won't get to it until December.


For a time I was seriously considering using Lua for building web applications, but after playing with it for four days I decided the community just wasn't ready and I wasn't prepared to put in the work myself. A lot of the libraries were incomplete or being changed (lots of bitrot), and very few had unit tests. As a language, though, Lua is pretty cool: a cleaner, smaller, faster JavaScript.


Full version of "Programming in Lua", available for free on Google Books: http://books.google.com/books?id=ZV5hXZ8QPKIC


That book is a little dated because it deals with Lua 5.0, but it is still good. The differences between 5.0 and 5.1 can be found here: http://www.lua.org/manual/5.1/manual.html#7 and a new version of the book can be found on Amazon.


It will still give you a good taste of the language, but PiL 2nd ed. and the reference manual will be much more useful in the long run.


Also available on its official site: http://www.lua.org/pil/


I'd say Lua is great, well, good; it's just that it's entirely reasonable, which makes it kind of boring ;) It has been the least surprising language I've used.


There's deep stuff there (look into coroutines and metatables), but Lua has been carefully designed so that it doesn't get in the way. If all you want from Lua is a (JSON-like) format for data dumps or config files, you can safely ignore the rest of the language. It's like the opposite of C++ that way. :)
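The config-file use reads like JSON with Lua syntax (a sketch; the setting names are invented for illustration):

```lua
-- A Lua "config file" is just a table constructor:
local config = {
  host = "localhost",          -- hypothetical settings
  port = 8080,
  tags = { "web", "dev" },
}

assert(config.port == 8080)
assert(config.tags[2] == "dev")
```

In a real embedding, the host program would load such a file with lua_dofile / dofile and read the fields out of the resulting table.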


That is exactly how I introduced Lua into one of our projects: as a very-little-code way to handle a complex configuration situation. It grew into full scriptability.


Very unscientific but hopefully interesting:

    $ time php -r 'echo "Hello, world!";' # smaller php binaries can be compiled
    real 0m0.555s
    $ time perl -e 'print "Hello, world!";'
    real 0m0.385s
    $ time ruby -e 'puts "Hello, world!"' # mri
    real 0m0.073s
    $ time python -c 'print("Hello, world!")'
    real 0m0.067s
    $ time lua -e 'print "Hello, world!"'
    real 0m0.013s
    $ time js -e 'print("Hello, world!")' # spidermonkey
    real 0m0.010s
    $ time awk 'BEGIN { print("Hello, world!") }'
    real 0m0.004s


You're right about one thing. That is very unscientific. You're mainly testing your file system cache, not these programming language implementations.

E.g. perl(1) is faster than ruby(1) at printing "Hello, world!". But since you presumably had ruby hot in cache and not perl, the latter seems to be almost 4 times as slow.

Here's a better benchmark, which runs each of these 500 times and takes the average: http://gist.github.com/630868

Which yields these results:

               Rate clojure   php emacs python  ruby   js perl  awk  lua shell     C
    clojure 0.844/s      --  -97%  -97%   -98%  -99% -99% -99% -99% -99%  -99% -100%
    php      24.6/s   2812%    --   -2%   -43%  -75% -76% -78% -82% -82%  -84%  -89%
    emacs    25.2/s   2882%    2%    --   -41%  -74% -75% -78% -81% -82%  -84%  -88%
    python   43.0/s   4995%   75%   71%     --  -56% -58% -62% -68% -69%  -72%  -80%
    ruby     96.7/s  11361%  294%  284%   125%    --  -5% -15% -28% -29%  -38%  -55%
    js        101/s  11919%  313%  303%   136%    5%   -- -11% -24% -26%  -35%  -53%
    perl      114/s  13397%  364%  353%   165%   18%  12%   -- -15% -17%  -27%  -47%
    awk       134/s  15743%  444%  431%   211%   38%  32%  17%   --  -2%  -14%  -38%
    lua       137/s  16134%  458%  444%   219%   42%  35%  20%   2%   --  -12%  -36%
    shell     156/s  18359%  534%  519%   262%   61%  54%  37%  17%  14%    --  -28%
    C         216/s  25440%  777%  756%   401%  123% 113%  89%  61%  57%   38%    --
Update: Added a C program and ran the shell program in a sub-shell (since perl's system function preloads a shell). Didn't add silentbicycle's SWI Prolog and OCaml since he didn't provide the source.


What we're essentially measuring here is runtime startup and (in some cases) byte-compiling.

           Rate python swipl ruby perl  lua luac bash ocaml  awk subshell ocamlopt    c py_pyc shell
    python   46.3/s     --  -17% -56% -70% -76% -78% -78%  -79% -83%     -87%     -88% -88%   -90%  -92%
    swipl    55.9/s    21%    -- -46% -63% -70% -73% -73%  -74% -80%     -84%     -85% -86%   -88%  -90%
    ruby      104/s   125%   86%   -- -31% -45% -49% -51%  -52% -62%     -71%     -72% -74%   -79%  -82%
    perl      152/s   229%  172%  46%   -- -20% -26% -28%  -30% -44%     -57%     -60% -61%   -69%  -74%
    lua       189/s   309%  239%  82%  25%   --  -8% -10%  -13% -31%     -47%     -50% -52%   -61%  -67%
    luac      206/s   345%  268%  98%  35%   9%   --  -2%   -6% -25%     -42%     -46% -48%   -58%  -64%
    bash      211/s   356%  277% 103%  39%  11%   3%   --   -3% -23%     -41%     -44% -46%   -57%  -63%
    ocaml     218/s   372%  290% 110%  44%  15%   6%   3%    -- -20%     -39%     -42% -45%   -55%  -62%
    awk       273/s   491%  389% 162%  80%  44%  33%  30%   25%   --     -23%     -28% -31%   -44%  -52%
    subshell  357/s   672%  539% 243% 135%  89%  74%  69%   64%  31%       --      -6%  -9%   -26%  -38%
    ocamlopt  379/s   719%  577% 264% 149% 100%  84%  80%   73%  39%       6%       --  -4%   -22%  -34%
    c         394/s   751%  604% 278% 159% 108%  91%  87%   80%  44%      10%       4%   --   -19%  -31%
    py_pyc    485/s   950%  768% 366% 219% 156% 136% 130%  122%  78%      36%      28%  23%     --  -16%
    shell     575/s  1143%  928% 452% 278% 203% 179% 172%  163% 110%      61%      52%  46%    18%    --
I added hello world programs for C, SWI Prolog, and OCaml (byte and native compilers). Timings on OpenBSD/amd64.

Edit: Since you can precompile Lua, I added that as 'luac'. It's not usually done, since Lua compiles VERY quickly, and the source is more portable. It's a data point, though. I also added a .pyc for Python - Python byte-compiles more slowly. There's little difference between lua and luac, but the difference between python and py_pyc is huge.

I also added shell with a new subshell, both sh and bash. And, source, as requested.

OCaml:

    let _ = print_string "Hello world.\n"
Compile with "ocamlc -o ochello foo.ml" for bytecode, "ocamlopt -o ochello.opt foo.ml" for native.

SWI Prolog:

    swipl => sub { system qq[swipl -g "write('Hello world.\n')." -t "halt."] },
luac:

    print "Hello, world."
And then compile with "luac hellow.lua", run as "lua luac.out".


Also interesting is how much memory is allocated running the empty program (detected with Valgrind):

    Lua 5.1.4: 34k allocated, all memory freed.
    Perl 5.10: 195k allocated (140k leaked!)
    Ruby 1.8.7: 670k allocated (665kb still reachable at exit)
    OpenJDK 6: 1.5M allocated (1M in use at exit).
    Python 2.6.5: 3M allocated (1.2MB in use at exit)


You forgot my favorite:

  $ time clojure -e '(println "Hello, world!")'
  real	0m0.781s
Yeah, I know it's the JVM but still.


The most senseless benchmark ever.


Great talk, Kyle. Also, the video turned out remarkably well given the camera it came from.


Thanks. Here is the exact camera I used:

http://www.usa.canon.com/cusa/support/consumer/digital_camer...

For the sake of any readers here who don't read the blog post: I do NOT recommend this camera for video. I happened to have it sitting around for family snapshots and chose to experiment with it here.


I've sat down to learn Lua a number of times, because I really like the idea of it (small, fast, safe, etc). It really has quite a beautiful minimalism to it.

But each time I try I get to the part where I learn that array indices start at 1 and I get annoyed. Have we not learned anything since BASIC? Sigh.


I found that adopting the native idioms meant never having to notice the 1-based indexing. Everything is a hash table, and if you find yourself using literal numeric keys, there's probably a nicer way to do it.

The lua users wiki is full of interesting ideas for tackling the basics in ways that leverage lua's simple flexibility.
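In idiomatic iteration, the starting index simply never appears (a sketch using the standard ipairs and table.insert):

```lua
-- Loop over a sequence without writing any index arithmetic:
local langs = { "Lua", "Python", "Ruby" }
local upper = {}
for _, name in ipairs(langs) do
  table.insert(upper, string.upper(name))  -- no explicit indices anywhere
end

assert(#upper == 3)
assert(upper[1] == "LUA")
```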


Actually, Lua's table object is split into two parts. The array part, which is accessed using numerical indices, and the dictionary. It's done like that to enable fast array access, and I do find it an incredible useful feature in Lua.

In my experience 1-based indexing is not a problem when you're just writing Lua code. I've sometimes been frustrated when creating Lua bindings to C-modules which involved some kind of array access though.


That is entirely possible. I can tell when someone doesn't know Perl or Python, because they write for loops like a C coder instead of looping over containers. So I can believe that you don't come into contact with that detail very often.

It just bugs me that such an obvious wart exists in an otherwise very clean language.


The reason the arrays index from 1 is that Lua is designed to pare down to a data description & configuration language for non-programmers, and they felt that starting arrays from zero would be confusing. I don't like it it either, but in practice it's a minor issue.

People who write off Lua because of indexing from 1, Python because of the significant whitespace, Lisp for its parens, etc. probably haven't gotten to the really interesting stuff yet.


Where "obvious" = something only someone with a foreign viewpoint thinks about.

("foreign" with regards to programming language.)


What do you object to in counting from 1? (Other than it's not traditional.)

In C, it's an offset into a typed block of memory, so 0 indexing makes sense. But lua does not deal directly with memory.


It has nothing to do with indexing memory; it has to do with 1-based arrays causing "+1" and "-1" to be sprinkled all over the place. I find Dijkstra's argument compelling: http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EW...


My experience with Lua hasn't seen this problem materialize.



