> So far, the only programming language I know of that supports this type of crazy data transformations is JAI,

As far as I know, JAI is indeed the only language that explicitly lets you switch between AoS and SoA with one bit, but the APL family of languages (APL, J, K, Shakti, and a couple more) has basically, for 60 years now, taken the "data oriented" SoA approach for storage, and provides the language support that makes it as easy to use as the "object oriented" AoS approach. (For some definition of "as easy as": the languages themselves are generally not considered easy to use, but within the language, treating things either way is straightforward, at most a "flip" away, and usually not even that is needed.)
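To make the "flip" idea concrete for people who don't read APL-family code, here is a loose plain-Python analogue. It is only a sketch: `flip_to_soa` and `flip_to_aos` are names I made up, and this says nothing about how any APL/J/K implementation actually stores its tables.

```python
# Array-of-structs (AoS): one record per ant.
aos = [
    {"name": "bob", "age": 1.1},
    {"name": "alice", "age": 0.5},
    {"name": "carol", "age": 1.2},
]


def flip_to_soa(records):
    """Turn a list of same-keyed dicts into a dict of column lists (SoA)."""
    if not records:
        return {}
    return {key: [r[key] for r in records] for key in records[0]}


def flip_to_aos(columns):
    """Turn a dict of equal-length column lists back into a list of dicts."""
    keys = list(columns)
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


soa = flip_to_soa(aos)    # column view: soa["age"] is a single list
round_trip = flip_to_aos(soa)  # and back to one dict per ant
```

The point is that both views carry the same information, so a language (or library) can store columns and still let you think in records.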

Additionally, Nim macros (and I suspect Rust and D as well) allow this to be done at the library level. Lisp does too, of course, but Nim/Rust/D are much closer to the Algol family-and-friends list given in the article.
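As a rough illustration of what "library-level" means here, the sketch below derives a column-per-field container from an ordinary record type. It uses Python runtime reflection rather than compile-time macros, so it is only an analogy to what Nim/Rust/D could do; the `SoA` class and the `Ant` fields are hypothetical names, not any real library's API.

```python
from dataclasses import dataclass, fields


@dataclass
class Ant:
    name: str
    color: str
    age: float
    warrior: bool


class SoA:
    """Hypothetical container holding one list per field of a dataclass."""

    def __init__(self, record_type):
        self._type = record_type
        self._names = [f.name for f in fields(record_type)]
        for n in self._names:
            setattr(self, n, [])  # one column per field

    def append(self, record):
        # Scatter the record's fields into their columns.
        for n in self._names:
            getattr(self, n).append(getattr(record, n))

    def __len__(self):
        return len(getattr(self, self._names[0]))

    def __getitem__(self, i):
        # Gather one record back out of the columns, AoS-style.
        return self._type(*(getattr(self, n)[i] for n in self._names))


ants = SoA(Ant)
ants.append(Ant("bob", "red", 1.1, True))
ants.append(Ant("alice", "blue", 0.5, False))
```

User code still writes `ants[1]` or `ants.append(...)` as if it had an array of `Ant`, while the storage underneath is columnar.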




I also think it's a shame he didn't bring up NumPy & Pandas, or R. He's just created a data frame and then complained that there are no functions to sort it, but we do have those. Here they are:

    import pandas as pd

    ants = pd.DataFrame({
        "name": ["bob", "alice", "carol"],
        "color": ["red", "blue", "red"],
        "age": [1.1, 0.5, 1.2],
        "warrior": [True, False, True]})
    # Or use pd.read_csv to get the data in.

    # Count number of warriors.  True => 1, False => 0
    ants["warrior"].sum()  # Returns 2

    # Count old red ants
    ((ants.color == "red") & (ants.age > 1.0)).sum()  # 2 again

    # Sort by age (returns a new DataFrame, youngest first)
    ants.sort_values(by="age")

I think Pandas and NumPy should be up there as making the language easy to use in a SoA way.
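For anyone wondering what sorting a SoA table involves without a library, here is a minimal stdlib sketch: compute one permutation from the key column, then apply it to every column. `sort_columns` is a made-up helper for illustration, and this is not a claim about how Pandas implements `sort_values` internally.

```python
ants = {
    "name": ["bob", "alice", "carol"],
    "age": [1.1, 0.5, 1.2],
    "warrior": [True, False, True],
}


def sort_columns(columns, by):
    """Return new columns with every row reordered by the `by` column."""
    key_col = columns[by]
    # The permutation that sorts the key column (an "argsort").
    order = sorted(range(len(key_col)), key=key_col.__getitem__)
    # Apply the same permutation to each column so rows stay aligned.
    return {name: [col[i] for i in order] for name, col in columns.items()}


by_age = sort_columns(ants, "age")
```

The original `ants` columns are left untouched; `by_age` holds the reordered copies.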


Yep. I think Pandas could and should see a lot more use outside data science as performant in-memory data storage in Python.

The best thing? Most data scientists don't even care about SoA vs AoS: tabular structure is easy to grok, easy to use AND performant by default!


By coincidence, there is a new Nim blog post demonstrating exactly that, with the same ant example: https://nim-lang.org/blog/2021/05/01/this-month-with-nim.htm...


ISPC also has first class support for "(array of) structure of arrays", see: https://ispc.github.io/ispc.html#structure-of-array-types

For example:

  soa<8> Point pts[...];

declares an array of struct of arrays, where each inner struct array contains 8 elements. This would have the advantage of playing well with the streaming prefetcher if you're working with all fields of the `Point` type, and allowing the compiler to use and increment a single pointer for loading/storing, accessing each field with a small compile time offset (that can get folded into addressing calculations, making them essentially free).
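To show the indexing this layout implies, here is a small Python model of it (Python only for readability; it says nothing about what ISPC generates). The `x`/`y` fields, `to_aosoa`, and `aosoa_locate` are assumptions for illustration, and a partial final block is left short here, where a real layout would presumably pad it.

```python
WIDTH = 8  # mirrors the 8 in soa<8>


def aosoa_locate(i, width=WIDTH):
    """Map logical element i to (block index, lane within that block)."""
    return divmod(i, width)


def to_aosoa(xs, ys, width=WIDTH):
    """Pack two equal-length coordinate lists into blocks of `width` lanes."""
    blocks = []
    for start in range(0, len(xs), width):
        # Within a block, each field's lanes are stored contiguously.
        blocks.append({"x": xs[start:start + width],
                       "y": ys[start:start + width]})
    return blocks


xs = list(range(20))
ys = [10 * v for v in xs]
blocks = to_aosoa(xs, ys)

block, lane = aosoa_locate(13)  # element 13 sits in block 1, lane 5
point_13 = (blocks[block]["x"][lane], blocks[block]["y"][lane])
```

All fields of nearby points land in the same block, which is the locality property described above.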

Julia's StructArrays package is also very convenient: https://github.com/JuliaArrays/StructArrays.jl



> Additionally, Nim macros (and I suspect Rust and D as well)

Aye, somebody posted their SoA library on /r/rust just a few days back (https://crates.io/crates/soa_derive), and I don’t think it was the first such.



