The Notation: Ken Iverson Centenary (kx.com)
84 points by 5jt on Dec 18, 2020 | 31 comments



> ”K programs routinely outperform hand-coded C. This is theoretically impossible. K compiles into C. Every k program has a C equivalent that runs exactly as fast. Yet it is true in practice. Why? Because it is easier to see your error in four lines of code than in four hundred lines of C.”

How true is this? I hear this claim a lot but haven't seen any real benchmarks. E.g., https://benchmarksgame-team.pages.debian.net/benchmarksgame/... doesn't have K or any related languages.

I'm guessing K is fast due to autovectorization for certain cases, but are there benchmarks that provide hard numbers? The existence of a benchmark prohibition clause in the kdb license makes me skeptical of its performance claims.

https://tech.marksblogg.com/benchmarks.html has a kdb benchmark, but due to the use of Xeon Phis, it can't be compared with other benchmarks there.


He wrote solutions for Shootout (predecessor to that Debian page) problems a few years ago; a lack of k in benchmarks is not due to a lack of k being volunteered: http://www.kparc.com/z/comp.k

k is really fast. Half of the things on the Shakti mailing list are just Arthur getting really excited about how significantly he's beating x or y or z in performance and giving numbers for it. `grep`ping it now I see 40 in half a year that explicitly contain the word "benchmark," though not all of these are comparing to other things (some are just comparing to different k releases), and there are more comparisons without that word.

Arthur doesn't work at Kx anymore, by the way. He's at Shakti now. Shakti has a different (but still draconian/non-(A)GPL) license. It probably doesn't have the benchmark clause, but I don't care enough to check (I prefer J to k and don't have a proprietary k on my system).


> He wrote solutions for Shootout (predecessor to that Debian page) problems a few years ago; a lack of k in benchmarks is not due to a lack of k being volunteered

That's 6 of the programs; there were at least 4 others ;-)

I lacked and still lack the knowledge to figure out if those snippets are doing what they should.

For example, do those scripts set arg n from the command line and read from stdin? Do those scripts write correctly formatted output to stdout?

What a pity that page does not show measurements for those scripts, and a comparison with some of the C programs written for the benchmarks game.


Thanks for that link. Do you know if the results are posted anywhere?

https://shakti.com/license.php says "Customer shall not... distribute ... any report regarding the performance of the Software benchmarks..." which I would need to agree to if I want to download the binaries at https://shakti.sh/benchmark/


You could check with archive.org and see if it was ever posted on the Shootout site, but when I checked, coverage was spotty. The point of these benchmarks is everyone being able to run them themselves, though.


> was ever posted

No, K was never installed.

> The point of these benchmarks is everyone being able to run them themselves, though.

Exactly.

And compare with programs in some other language:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


Have you looked at any of the open-source k implementations? (Kona, ok, klong, ngn/k...)


If you look at my post history, it's actually primarily free software APL implementations! ngn/k is amazing. John Earnest's work with his own variants is really fantastic. I dislike klong and Kona.


How has your experience been with J for on-disk time-series DBs?


Not a developer, just a lone data wrangler, but FWIW Jd has made me much faster at exploring data. I now use Jd to create my datasets and use R just for the models. IMO, it's far better than using R to tidy things.


> How true is this? I hear this claim a lot but haven't seen any real benchmarks ...

I think you're trying to read into the statement something other than what Arthur meant; the last part of the quote is just as important as the first. Allow me to try to explain.

It's pretty easy for an experienced C programmer to beat K at some things, for example:

    int i,n;for(i=n=0;i<1000000;++i)n+=i;
is faster (with gcc -O2) than the "equivalent":

    +/!1000000
which is more literally:

    int N=1000000,*a=malloc(N*sizeof(int)),i;
    for(i=0;i<N;++i)a[i]=i;
    int n=a[0];for(i=1;i<N;++i)n+=a[i];
    free(a);
but even an experienced C programmer will feel some fatigue trying to convert a K program to C in this way, and the K implementation is certainly faster than a literal translation (largely because it doesn't use malloc, but yes, also because of careful vectorisation and parallelisation of many of the operators).
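To make the vectorisation point concrete, here's a sketch of my own (not k's actual source; the names are made up) of why an interpreter's primitives can run at hand-tuned-C speed: each primitive is one tight loop, written once, compiled with full optimisation, and reused by every program that calls it:

    /* hypothetical runtime primitives, illustrative names only */
    #include <stdint.h>
    #include <stddef.h>

    /* generic +/ over an int vector: written once, compiled with -O3,
       autovectorised by the compiler, shared by every call site */
    int64_t sum_ints(const int64_t *a, size_t n) {
        int64_t s = 0;
        for (size_t i = 0; i < n; ++i) s += a[i];
        return s;
    }

    /* a fused special case an interpreter could dispatch to for +/!n,
       skipping the allocation of !n entirely (k's !n is 0..n-1) */
    int64_t sum_iota(int64_t n) { return n * (n - 1) / 2; }

The hand translation has to rediscover such tricks at every site; the interpreter applies them everywhere.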

It's my experience this difference adds up faster than anything else, and that's why a 20kloc C++ implementation of a HIBP checker was beaten 10x by a 5-line k/q solution:

https://news.ycombinator.com/item?id=22467866

But make no mistake, I don't use k because it's fast, but because it makes me fast: five lines of code ain't squat to get right over 20k lines, so I'd prefer the k solution even if it were "only" just as fast as someone's C++ solution.

It does help tremendously though, that it's usually much much faster.


I think you're missing the point. K certainly isn't faster than C. If you have a C guru, they should always be able to write faster code than K.

The difference is it should be a lot easier for someone like me (not a C guru) to write a K program that does something useful. I'll also be able to see the whole program, as it will likely be a few lines long. If I wrote some C code, it would take me forever to write and forever to debug. I see this as similar to any scripting language (e.g., Python).


Absolutely. The advantage is terseness of expression.

Less code means fewer bugs.


Arthur's new company, Shakti, has some of the benchmarks they've talked about[1][2].

[1] https://shakti.sh/

[2] https://shakti.sh/benchmark/about?eula=shakti.com/license


That shows kdb is faster than BigQuery, Spark & other similar systems. But it doesn't compare K to C.


What would be interesting, and perhaps informative, is if we could share some of our favorite k programs and then people could try to write faster implementations in C.

I like benchmarks that I can easily run on any computer myself.

For me, a major part of k's appeal is the (original) author's appreciation for small interpreter size and language terseness. Generally, I think k will always beat C, and probably any scripting language you can think of, in that department.

The issue of verbosity is so polarising amongst the vocal programmers who comment online that even if k were faster than whatever language (and free, open source, etc.), I would bet many of them still would reject the idea of using it.


My guess is that K is faster in some situations because the nature of the language requires programmers to use the most direct solution to a problem, getting all the code and data into the cache. C being a verbose language makes this harder to achieve, unless you spend some time optimizing the memory layout of your data.
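To illustrate the layout point (my example, with made-up names): kdb stores tables column by column, which in C terms is a struct-of-arrays layout that you would have to choose deliberately:

    /* array-of-structs: each record is contiguous, columns interleaved */
    struct trade_aos { double price; long size; };

    /* struct-of-arrays: each column is contiguous, as in a kdb table */
    struct trades_soa { double *price; long *size; };

    /* scanning only prices in the SoA layout touches only price data,
       so every cache line fetched is fully useful */
    double total_price(const struct trades_soa *t, long n) {
        double s = 0;
        for (long i = 0; i < n; ++i) s += t->price[i];
        return s;
    }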


There are lies, there are damn lies, and there are benchmarks.

Also, consider how few comments there are in this thread. What does that usually mean on HN?


> "Iverson was more interested in how quickly a person could understand an algorithm"

Has there been any study of whether J is quicker or slower than APL for this? Subjectively, APL is pretty and enticing in a way that J isn't.

It's very strange to me that someone who promoted Iverson Notation as a better math notation, and wrote Notation as a Tool of Thought, could apparently completely switch notation from 30 years of established APL symbols to pairs of ASCII symbols, effectively overnight, and carry on as if nothing was any different.

How much of J's success is because J was free and the big APLs were a lot of money, so J has been "the free way to get an array language experience" for years?


It's also easier to do scripting with J since you can just use files. The (Dyalog) APL workspace infrastructure is very powerful but a bit heavy to use when all you want is a simple script. Of course, large banks and Fortune 500 companies seldom use just simple scripts, so that makes sense I guess.


AFAIK the newest version of Dyalog can run scripts like so: 'dyalog -script whatever.apl'


I think the notation is not the only difference in J. I believe that tacit programming is one of the improvements proposed by J, but I might be wrong.


Tacit programming was certainly brought to the foreground by J, but function trains apparently predate it by a couple of years, and have since been implemented in Dyalog APL and NARS 2000.

The idea of functions having rank, and modifying rank, was also made central to J, but has been included in Dyalog APL more recently - as much as they can without breaking backward compatibility, I think.

But it's the "notation" change I'm struck by - the linked article says "discuss the future prospects of "the notation" in its various forms." as if APL and J and K are the same notation, when, at a casual visual glance, they aren't; and in the sense that J differs from the APL of 1990 with rank and tacit functions and linguistic naming and behaviours (gerunds, conjunctions, et al.), they aren't the same notation semantically either.


Compared to the various instantiations of Pidgin Algol, they're the same. (obligatory "next 700" reference...)


Iverson bracket notation is named after him: https://en.wikipedia.org/wiki/Iverson_bracket


> Glory days. Long before spreadsheets, IBM managers wrote their budgets in APL, and lucrative timesharing services supported users around the world.

Imagine a world where spreadsheets grew from being a living, easily accessible version of everything APL offered (which has been true for about two decades) into everything Q (and kdb+) supports, including serving web content and cross-platform database queries.


How does using APL/J/K compare to using Julia, Numpy or Matlab? The fact that in the latter you still have the safety blanket of procedural programming makes it seem easier to learn and more productive.


Control flow in APLs is mostly imperative, and with explicit code, i.e. code that mentions variables, you're effectively writing procedural code. I'd say that APLs are harder to get started with, but easier to use right. With things such as Numpy, you need to know lots of things about the API that might change without warning. With APL, you learn the primitives and you're off to the races. The downside is that less popularity means far fewer library efforts.


Dyalog APL has a safety blanket of procedural programming[1]; it's another strange aspect of APL and the internet that it goes almost completely overlooked - unremarked upon and never discussed. E.g., this is roughly prompt-for-username and validate-username code:

    ∇result←verifyUserName userName

     isValid←1
     message←''

     :If (userName≡'')∨(8<≢userName)
         message←'Please enter a name between 1 and 8 characters long'
         isValid←0
     :ElseIf ' ' ∊ userName
         message←'Please enter a name with no spaces'
         isValid←0
     :EndIf
     result←isValid message
    ∇



    ∇userName←getUserName

      nl←⎕UCS 13 10   ⍝ carriage return, newline

      :Repeat
          ⍞←'Enter a username: ',nl
          newName←⍞       ⍝ read from keyboard input

          nameIsValid message←verifyUserName newName

          :If 0≡nameIsValid
              ⍞←message
          :EndIf

      :Until 1≡nameIsValid
      userName←newName
    ∇

It's unexciting compared to the dense, showoffy codegolf APL that normally gets posted on the internet, and what you still don't get with it is things like `string.Length` or `or` or `Console.WriteLine()` etc. That is, procedural/imperative APL still looks like APL and doesn't save you from having to know the APL symbols or execution model. But it could save you from having to wrangle some weird mathematical dodge around a simple :If/:Else branch, or from trying to use ⍣ when you really just want to write :For i :In 2 4 6 8 10

It has an OOP/Class system as well, but that doesn't save you from having to know APL either.

[1] I don't think NARS2000 does, and I'm not sure about the bigger IBM / Sharp APLs. I don't think J or K do.


Very interesting article.


I found J in 2011 or early 2012. I fell in love with it due to the terse, expressive notation, set against pages of typical programming code from other languages. I have since moved to Dyalog APL. Roger Hui, of APL/J fame, now contributes to Dyalog APL, so a lot of the novel ideas in J have made their way back into APL (Dyalog and NARS).

I read my first book on neural networks in 1988, and I got the matrix math and the implementation of them, but this year I was able to really grasp them in a more basic way with an implementation of a Convolutional Neural Network in APL in only 10 functions/lines of code [1]. Amazing! What's even more surprising is that they manually translated it to SAC (Single Assignment C), and it is faster than TensorFlow. The interpreter is not bad either - 20x less time to init than TF, but 20x and 5x slower to train and test respectively. Compilers for APL are being worked on to make this possible without the manual translation.

To me, the similarity to the math formulas, and being able to view and work through the code in front of me in one view, is priceless. I also enjoy it, and I believe (no evidence here) that it exercises my mind on the problem more directly than winding my mind around pages of Python/C or other PLs. Certainly a lot of the original ideas of APL, and the current successes of things like Pandas and NumPy, owe a lot to the array languages in the APL family.

There's an example of an Extreme Learning Machine in J, but I don't have the link at the moment. I go back and forth between J and APL, and I am currently learning Rust. Somebody coded APL in Rust, but it has not been fleshed out. I find myself attacking problems in J/APL on my desktop, and sometimes that's it - I don't require another solution - or if I do, I recreate it in C/Python or now Rust.

Aaron Hsu does amazing work in APL. He is a former Schemer [2].

A taste of J or APL with an average function implementation:

J: avg=: +/%#

APL: avg←+/÷≢

They both fold '+' over a vector/array to sum it, then divide the sum by the tally (count) of terms given.
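For comparison, a rough C rendering of the same average (my sketch, assuming a vector of doubles):

    #include <stddef.h>

    /* avg: the +/ fold and the divide-by-tally, spelled out */
    double avg(const double *v, size_t n) {
        double s = 0;
        for (size_t i = 0; i < n; ++i) s += v[i]; /* +/ : sum reduction */
        return s / n;                             /* ÷≢ : divide by count */
    }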

Great tribute to Ken Iverson!

10-line CNN implemented in APL (the stencil operator ⌺ is great! It allows you to move a window over your array):

    blog←{⍺×⍵×1-⍵}
    backbias←{+/,⍵}
    logistic←{÷1+*-⍵}
    maxpos←{(,⍵)⍳⌈/,⍵}
    backavgpool←{2⌿2/⍵÷4}⍤2
    meansqerr←{÷∘2+/,(⍺-⍵)*2}
    avgpool←{÷∘4{+/,⍵}⌺(2 2⍴2)⍤2⊢⍵}
    conv←{s←1+(⍴⍵)-⍴⍺ ⋄ ⊃+/,⍺×(⍳⍴⍺){s↑⍺↓⍵}¨⊂⍵}
    backin←{(d w in)←⍵ ⋄ ⊃+/,w{(⍴in)↑(-⍵+⍴d)↑⍺×d}¨⍳⍴w}
    multiconv←{(a ws bs)←⍵ ⋄ bs{⍺+⍵ conv a}⍤(0,(⍴⍴a))⊢ws}
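To unpack one of those: avgpool averages each non-overlapping 2×2 window of every 2D slice. Roughly the same thing in C (my sketch, assuming a single channel with even dimensions):

    /* 2x2 average pooling on one h-by-w channel, halving each dimension */
    void avgpool2x2(const double *in, double *out, int h, int w) {
        for (int i = 0; i < h / 2; ++i)
            for (int j = 0; j < w / 2; ++j) {
                const double *p = in + (2 * i) * w + (2 * j);
                out[i * (w / 2) + j] = (p[0] + p[1] + p[w] + p[w + 1]) / 4.0;
            }
    }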

[1] https://dl.acm.org/doi/10.1145/3315454.3329960 (PDF link is on page)

[2] https://aplwiki.com/wiki/Aaron_Hsu



