
The linked document only contains a warning about how versioning is weird, and a description of the syntax. No examples beyond trivial one-liners.

What problem is K trying to solve? What does a K program look like?




I've only played around with k and APL in my spare time, so I can't speak to real-world problems. It is a ridiculously powerful query language: where in SQL you have only started writing `SELECT ...`, in k you are already done. But you need very good tacit knowledge of algorithms and of the weird syntax to be productive, as in "oh, I need to calculate an integral image of this time series, but that's just a pre-scan over addition", boom, you're done. The theory of array programming with a focus on combinators is also an interesting perspective on functional programming. IMHO not something you should write full programs in, but that hasn't stopped some from trying.
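For the curious, here's a rough NumPy sketch of that "pre-scan over addition" idea (in k the whole computation is the plus-scan, `+\`); the data and variable names are made up for illustration:

    import numpy as np

    # Running total of a time series (a 1-D "integral image"): just a
    # prefix scan over addition.
    prices = np.array([101.2, 101.5, 100.9, 102.3, 102.0])
    print(np.cumsum(prices))  # ~[101.2 202.7 303.6 405.9 507.9]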


This was a helpful comment. After the article, the question that popped into my head was… so ok should I try and compare this to like BLAS or something like Jax?

But this sort of language is more about writing to and reading from disk efficiently, right? I guess SIMD-type optimizations would be less of a thing.


I think that array languages have historically used memory-mapped files for IO and treat them like a big data frame, but other implementations also support streaming IO. It's up to the implementers of the runtime to use SIMD instructions if they deem this optimal, but it's not something you would use yourself.
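Roughly, "memory-mapped files treated like a big data frame" means something like this NumPy sketch (the file name and dtype are made up; nothing is read until you touch it):

    import numpy as np

    # Map a column of 64-bit floats stored on disk straight into the address
    # space; the OS pages data in on demand, so there is no up-front read.
    prices = np.memmap("prices.f64", dtype=np.float64, mode="r")
    print(prices[:10].sum())  # touches only the first page(s) of the file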


Personally I think the best comparison would be Python+Pandas/polars+... or R+tidyverse+..., the key thing being there's less need for the "..." in a language with good table manipulation etc built in.


I feel like measuring things in characters is not meaningful; tokens are what count. Replacing "SELECT" with "SEL" would not improve SQL in the slightest.


A one-liner in k tends to be equivalent to a much larger program in another language.

Here's a program in k. I'm not sure exactly what it does. I think it might be a json encoder/decoder:

https://github.com/KxSystems/kdb/blob/master/e/json.k


It says a lot that the name of the file is more informative about what the code does than the entirety of the file itself. "Readability is a property of the reader" indeed, but also of the writer...


Dialup modems on a bad connection used to generate more readable code.


It appears you accidentally linked to a log where someone fell on his keyboard.


I think Whitney's greatest achievement isn't even any of his languages—though they are very impressive—but that he convinced banks to pay him millions of dollars to write IOCCC style code!


The problem solved by K is the long-term employment of people writing K. You can't be fired if you're the only one who more or less understands the codebase.


This is true about more software development than you realize.


K is a fast vector language, used (primarily) for time series data analysis.

>What does a K program look like?

You might want to check out https://news.ycombinator.com/item?id=40335921

beagle3 and geocar both have various comments you might want to search for.


> a fast vector language

With an Oracle-style DeWitt clause[1] prohibiting public benchmarks.

[1] https://mlochbaum.github.io/BQN/implementation/kclaims.html


Shakti (the latest K implementation by the author of K) claims [1] to load a 50 GB CSV in 1.6 seconds, which according to them takes 265 seconds with Polars. Has anyone independently verified these claims? Is Polars really leaving two orders of magnitude of performance on the table?

[1]: https://shakti.com/ -> Compare -> h2o.k


Disclaimer: I work for Polars inc.

As a sanity check I just cloned https://github.com/h2oai/db-benchmark, ran the data generation script, and ran the following on a 64-core AMD EPYC (AWS c7a.16xlarge):

    import polars as pl
    # Scan the CSV lazily and sum column v1; only collect() triggers execution.
    lf = pl.scan_csv("G1_1e9_1e2_0_0.csv")
    print(lf.select(pl.col.v1.sum()).collect())
The above script ran in 7.58 seconds.

If I change the collect() to collect(new_streaming=True) to use the new streaming engine I've been working on, it runs in 6.90 seconds.

I can't realistically time the full "read CSV to memory" case with this 50 GB file on this machine, as we start swapping (this machine has 128 GiB of memory) and/or evicting data from the disk cache (it has a slow EC2 SSD attached), and we do have a blow-up of memory usage (which could be as simple as loading small integers into an 8-byte UInt64 column). I think it's likely that on K's machine the "read full CSV to memory" approach also started swapping, giving the large runtime. However, in Polars you'd typically write your query using LazyFrames, which means we don't actually have to load the full CSV into memory.
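To illustrate the LazyFrame point, here is a sketch of the lazy style (column names follow the db-benchmark schema; the particular filter is just for illustration):

    import polars as pl

    # The filter and column selection are pushed into the CSV scan, so the
    # whole 50 GB file never has to be materialized in memory at once.
    lf = pl.scan_csv("G1_1e9_1e2_0_0.csv")
    print(lf.filter(pl.col("v3") > 50.0).select(pl.col("v1").sum()).collect())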

EDIT: running on an m7a.16xlarge with twice the memory (256 GiB), once the CSV file is in the disk cache, Polars can parse the full CSV file into an in-memory DataFrame in 7.68 seconds.

K's claim that it parses the full 50 GB CSV in 1.6 seconds, if true, is very impressive regardless.


Honestly, 7 seconds even just to parse the CSV is already pretty impressive; 7 GB/s would be simdjson speeds if you did it on a single core. Do you have a single-threaded parser with really well-tuned SIMD, or a speculative parallel one, or ...?


We have a single-threaded chunker that scans serially over the file. This chunker exclusively finds unquoted newlines (using SIMD) to identify clean parallelization boundaries; it doesn't do any further parsing. Those parallelization boundaries are then used to feed worker threads chunks of data to properly parse into our in-memory representation (which mostly follows Arrow).
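For readers following along, here's a very de-SIMD-ified Python sketch of that structure (not Polars' actual code; it only handles RFC 4180 style doubled quotes and ignores details like \r\n):

    def find_chunk_boundaries(buf: bytes, target_chunk: int) -> list[int]:
        # Serial pass: track whether we are inside a quoted field, and record
        # the offset just past an unquoted newline roughly every target_chunk
        # bytes. Workers then parse [b[i], b[i+1]) ranges independently.
        QUOTE, NEWLINE = ord('"'), ord('\n')
        boundaries, in_quotes, last = [0], False, 0
        for i, byte in enumerate(buf):
            if byte == QUOTE:
                in_quotes = not in_quotes  # a doubled "" toggles twice, net no-op
            elif byte == NEWLINE and not in_quotes and i + 1 - last >= target_chunk:
                boundaries.append(i + 1)
                last = i + 1
        if boundaries[-1] != len(buf):
            boundaries.append(len(buf))
        return boundaries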


Would you know how much of the total runtime is devoted to the initial chunking process? Amdahl's law would prefer an entirely speculative approach in the limit, but I could imagine that the 2x overhead might not be worth it for reasonable file sizes and core counts.

(But even then, 1.6 s would be quite a feat. It makes me wonder if the K implementation is partially lazy, as you say typical Polars usage is.)


It seems from a profile that on the eager engine the serial scanner is able to feed ~32 threads worth of decoding: https://share.firefox.dev/4hS1eJa.

It might be worth speculating, or at least optimizing the serial chunker more. You could theoretically start a second serial chunker from the end working backwards but that would not be wise with our ordered streams, as the decoded data would have to be buffered for a long time.

Similarly on the new streaming engine, each thread is active ~half of the time, except the thread running the chunking task: https://share.firefox.dev/3WQV9og.

Note that in a lot of realistic workloads on the streaming engine, compute can happen in between decodes, completely hiding the bottleneck. Also, all of the above is with the file completely in the file cache; if fed from a slow SSD, the chunker isn't a bottleneck whatsoever.


Seems easy enough to use a parallel scan if you're willing to accept a little work inefficiency, right? Assign each scanner thread a block; first each one counts/XORs how many quotes are in its block, then an exclusive scan on those (the last thread's result is unused) gives you the quoting state at the start of each block. And hopefully that block's still in the core's cache.
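Something like this, in throwaway Python (the per-block counting is where the parallel/SIMD work would actually go):

    def quote_state_at_block_starts(blocks: list[bytes]) -> list[bool]:
        # Phase 1 (parallelizable): each block independently computes the
        # parity of its quote count.
        parities = [buf.count(b'"') % 2 == 1 for buf in blocks]
        # Phase 2 (cheap, serial): exclusive XOR-scan of those parities gives
        # the "inside a quoted field?" state at the start of every block.
        states, cur = [], False
        for p in parities:
            states.append(cur)
            cur ^= p
        return states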

Or since newlines in strings should be rare, maybe it works to save the index of every newline and tag it with the parity of preceding quotes in the block. Then you get the true parity once each thread's finished its block and filter with that, which is faster than going back over the block unless there were tons of newlines.


Yes, I did already propose (at the office) a parity-agnostic chunker (we only need the number of lines plus a split point from the chunker) that can do the work in parallel and only needs a small moment of synchronization to find out which of the two parities applies before locking in a final answer. There would still be a global serial dependency, but on blocks rather than on bytes.

But we only have a finite amount of time and tons and tons of work, so no one has gotten around to it yet. At least now we know that it might be worthwhile for >= ~32 core machines. PRs welcome :)


All right, just threw me off a little that you'd consider speculating or backwards decoding as I wouldn't expect them to be easier, or significantly faster (or maybe you consider parity-independence to be speculation? I can see it).


Yes, I meant parity-independence with speculation. Essentially you assume either you are or are not within a string at the start and do your computation based on that assumption, then throw away the result with the unsound assumption. Both assumptions can share most of their computation I believe, so I can understand one might see it from the other perspective where you'd start with calling it parity-independence rather than speculation with shared computation.


There might also be the option of just optimistically assuming that, at points in the file with a run of, say, >4K bytes of proper newlines with proper comma counts between them, you probably aren't in the middle of a multiline string, and parsing it as such (of course with a proper fallback if this turns out to be false; at that point you'll at least know that the whole run is inside a multiline string).

Also, if you encounter a double-quote character anywhere with a comma on one side and neither a newline, double-quote nor comma on the other, you immediately know 100% whether it starts or ends a string.


> [1]: https://shakti.com/ -> Compare -> h2o.k

You can link to the subsections: https://shakti.com/compare/h2o.k


Some snark in here, I'll try and give it a fair shake. Whitney's site mentions '300 spartans' as the rough number of people using k, although it's probably more than that.

Two reasons k folks like k: first, if you believe that programmer working memory, as in the number of chars or lines of code you personally can hold in your head, is limited, then it might make sense to be as terse as possible -- this will significantly increase the range of things you can reason about.

Second, if such a language were to focus more on array- and vector-level manipulation, then for certain sorts of math tasks you might be pretty close to grad-student nirvana -- programming looks like using a chalkboard to work out a strategy for some processing, and then straightforwardly translating that strategy without mucking around with all the 100s of lines of weird shit that, say, Python or Java make you do to process something in bulk and in parallel.

On top of this, Whitney is a mad genius, and his k interpreters tend to be SCREAMING fast and, like, a couple of hundred kilobytes compiled. Over time the language has built connections to large-scale data processing jobs (as in, you run a microsecond-or-shorter-timeframe strategy based on realtime depth data from 500 different stocks, say), and it has benefitted from the path dependence you get there.

Anyway, back to the top: it exists as both a rallying cry for and a great tool for a certain sort of engineer who wants to make millions of dollars and refer to him/herself as a "Spartan" of coders.


K solves the problem of the bank account for two groups of people: Kx Systems and quants.


Absolutely not being sarcastic: one problem it solves is that it is very hard to read as a beginner, so it can be intimidating (although it becomes much easier to read a bit later). This, coupled with the general arrogance of k/q practitioners (again, not really saying this in a negative way) and the fact that k, kdb, etc. deliberately don't give you guardrails, makes people who write k/q seem a bit 'mythical' and makes them feel very clever.

So I think k, q and kdb are fun to work with, but one of the major components of their success is that they allowed a community (in finance) to evolve that can earn 50-150% more than their peer groups who do the same work in Java or C++. Ten years ago a Kx course cost $1500 per person per day.


Note that those are typical prices for enterprise-level certifications, including for some products that Java or C++ devs might need to interact with when working in those kinds of environments.


Hmm. I work in finance writing C++ and Java and I doubt other people in finance make 50-150% more than me because they know `q`.


I don't know. If you're writing Java you may not be working on the same types of problems.


It's the first page of a 5-page post/book. Make sure to check out the other 4 pages linked in the footer.


This is kind of the problem with every introductory text to an APL-family language.

I get the idea that one either already knows one needs an array programming language, or doesn't grok why anyone would need one.


Yeah—true. I wrote it as “the missing manual” for ngn/k, enough to get someone over the initial hump. It’s not a “Mastering k” tome.



