
One-based indexing is also used in Fortran, which still seems to be used for a great deal of numerical computing today. Additionally, BLAS and LAPACK, important linear algebra libraries, are written in Fortran.

I am somewhat confused by your discussion of startup times. Since Julia is a "programming language for technical computing", what scenario are you imagining where startup times would be a significant concern?




Not just FORTRAN, but R and MATLAB also use one-based indexing. It's also, at least historically, the convention for matrix notation. IMO do whatever attracts more users, as that is what Julia needs most. Without a large community moving from MATLAB, R, and other languages, Julia will never take off.


Erlang also.

In most functional languages, using list indices is an anti-pattern. Pattern matching and generalized iteration are a much more elegant way to handle most things you would use an index for.


Well, sort of. Erlang has the array module, which does actually index at 0, intentionally to feel like an array from another language.

The primitive collection types index at 1, but, as you said, they are almost never indexed that way. I'm not sure of the motivation, but the fact that it feels clunky to use them that way is a benefit: it raises resistance when you're using them wrong (as indexing them almost always is).


No less than Dykstra has weighed in on the numbering of arrays.

http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

Having worked with both, I'm inclined to prefer 0-based addressing in most cases. It's slightly less intuitive, but it generally leads to cleaner code.
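Here is a small Python sketch of the property the EWD831 note leans on. With 0-based indices and half-open ranges `[lo, hi)`, the length of a range is `hi - lo`, and adjacent ranges share a bound with no overlap and no +1/-1 adjustments. The helper names `split_at` and `chunks` are hypothetical.

```python
def split_at(xs: list, k: int) -> tuple[list, list]:
    """Split xs into the half-open ranges [0, k) and [k, len(xs))."""
    return xs[:k], xs[k:]


def chunks(xs: list, size: int) -> list[list]:
    """Cut xs into consecutive half-open windows [lo, lo + size)."""
    return [xs[lo:lo + size] for lo in range(0, len(xs), size)]


data = list("abcdefg")
left, right = split_at(data, 3)
# Each piece has length hi - lo: 3 - 0 and 7 - 3.
assert len(left) == 3 and len(right) == 4
# The pieces rejoin exactly, with no shared or skipped element.
assert left + right == data
```

With 1-based closed ranges, the same split would be `1..k` and `k+1..n`, which is where the extra `+1` comes from.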


That argument never felt very convincing to me. Perhaps because it really is about aesthetics, not some deep philosophical ideas connected to numbers.

I see it as rather simple. If you're listing offsets, starting at 0 makes sense, because there'll be something there. But if you're counting elements, starting at 1 makes more sense. I don't have 9 fingers offset from my first one, which I call finger zero; I have ten fingers, starting with the first one.

A set with no members is the empty set; once you add one element, you have a set of one. An empty list contains no elements. It does not have a first member, because it's empty. Add an element, and the list will have one member, in the first position, not at offset zero from the beginning.
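In Python, both views live side by side. `enumerate` yields 0-based offsets by default and ordinal positions when given `start=1`, and `len` counts members, so an empty list has length 0 and no first element. Here is a small sketch. The finger list is just illustrative data, with the thumb counted as finger one.

```python
fingers = ["thumb", "index", "middle", "ring", "little"]

# Offsets: how far each element sits from the start of the list.
offsets = {name: off for off, name in enumerate(fingers)}
# Ordinals: first, second, third... as a person would count them.
ordinals = {name: pos for pos, name in enumerate(fingers, start=1)}

assert offsets["middle"] == 2
assert ordinals["middle"] == 3
# The element count equals the last ordinal, or one past the last offset.
assert len(fingers) == ordinals["little"] == offsets["little"] + 1
# An empty list has zero members and therefore no first element.
assert len([]) == 0
```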

You can name the letters of a string by their positions in an array, but I don't see that as a compelling reason to call the first letter of a word the letter at offset zero from where the word is held in memory. At best, it's a low-level optimization; at worst, it's just a leaky abstraction. Maybe you want to store the length of the word there instead (as Pascal does), for a different trade-off in what is optimized.
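Below is a sketch of the length-prefixed layout mentioned above, modeled on classic Pascal short strings. That the layout matches a particular dialect, such as Turbo Pascal's `ShortString`, is an assumption here. Byte 0 stores the length, so the letters naturally sit at positions 1 through n. The helper names are hypothetical.

```python
def make_short_string(text: str) -> bytearray:
    """Pack text as [length byte][letters...], Pascal short-string style."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix holds at most 255 letters")
    return bytearray([len(data)]) + data


def length(s: bytearray) -> int:
    # The length is a single read of byte 0, with no scan for a terminator.
    return s[0]


def letter(s: bytearray, i: int) -> str:
    """Return the i-th letter, counting from 1 like Pascal's s[i]."""
    if not 1 <= i <= length(s):
        raise IndexError(i)
    return chr(s[i])


word = make_short_string("ice")
assert length(word) == 3
assert letter(word, 1) == "i"
```

Compared with a C-style NUL-terminated string, whose first letter sits at offset 0, this trades a 255-character limit for a constant-time length.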


Indexing is not for counting elements, it's to reach a particular one.


I still find it more useful to call it my tenth finger rather than the ninth from the first. And I'm fine with my middle finger being finger number three, the one that sits between the pair made up of fingers one and two and the pair made up of fingers four and five. I don't really see how it's any better to count up from offset zero at the first finger to offset two to get to the middle finger.


No less than Dykstra spouted a half-hearted list of some properties and then made a proclamation based entirely on aesthetics.


no less than dykstra was convicted of grand theft auto. https://en.wikipedia.org/wiki/Lenny_Dykstra

I think you mean Dijkstra, who did say that, because of an argument with mathematicians about indexing from 1.


In Dutch, ij and ÿ are interchangeable.


My Dutch professor in college actually wrote out the "ij", but the tail of the "i" connected with the "j", making it appear to be "ÿ". It kind of blew my mind the first time I saw him write it out haha


In cursive, it is often indistinguishable too.


In print, NO. You really can't interchange the two. As already said, they may look similar in handwriting, but we would never mean the actual ÿ, since that is a different letter, used in Greek and French (among others).

There's even a section about this on wikipedia: https://en.m.wikipedia.org/wiki/IJ_(digraph) subsection "technical details".


At the very least it happens in Flanders. I've seen ijs ("ice") capitalized in cursive as Ys[0] on an ice truck, for example.

Edit: from Wikipedia[1]: "It used to be common, in particular when writing in capitals, to write Y instead of IJ."

So it's an obsolete practice...

----

0. something along those lines: http://alphabetprintables.org/alphabet_printables_cursive/up...

1. https://en.wikipedia.org/wiki/IJ_%28digraph%29


I don't buy the Dijkstra argument; 1-based is more intuitive for me and, I would guess, for most non-programmers. Funny, though, how Americans use 1-based indexing for building floor numbers while Europeans use zero-based: the ground floor is 0 in a European lift and 1 in a US lift.


> what scenario are you imagining where startup times would be a significant concern?

Starting a REPL or running a computation that doesn't take long, adjusting parameters, and re-running. Startup time may not be large in absolute numbers, but it's noticeable and adds up quickly, especially once you use more packages than the toy programs I used as an example.


OK, that puts some bounds on it: are we talking about tenths of a second as significant? Or half a second? There is work on this (https://github.com/JuliaLang/IJulia.jl/issues/346), and later versions of Julia 0.4 feel pretty snappy to me compared to R (which is the other tool I generally use nowadays).


Sorry if I was being vague, but in what situation would you re-run a short computation by executing the whole program again instead of using subroutines? Are we talking about a data scientist performing initial, exploratory analysis on a very small subset of data?


> Are we talking about a data scientist performing initial, exploratory analysis on a very small subset of data?

Yes, that's one example. Also when debugging one usually uses small data sets. There are plenty of cases where runtime is short.

I think the problem is that Julia is somewhat vague about how it should be used. If it stated explicitly that it is intended to be used in a MATLAB-like fashion, with one long-running instance, that would save people from trying to use it like Python or other dynamic languages.


In that case, a fix seems as easy as keeping an IJulia kernel running in the background all day.


I agree. I'd estimate I restart the REPL about three or four times a day while developing. My analysis runs take minutes or longer, so a few seconds spinning up Julia and packages just doesn't amount to much.


Fortran was meant to be a high-level language at the time, so it chose 1-based indexing. When C and other lower-level languages came on the scene, they went with zero-based indexing, since an index that is a plain offset from the start of the array is more natural at the hardware level. CBLAS, the C interface to BLAS, is zero-based.

Intel's Math Kernel Library (MKL), a performant math library hand-tuned for Intel processors, is zero-based in its CBLAS interface (MKL-CBLAS).

It all depends on what you are familiar with, and on staying consistent. I just use zero-based indexing in J, in C, and in my low-level numerical work.
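Here is one way to make the "closer to the hardware" point concrete. In a flat buffer, element `i` of a 0-based array starts at byte offset `i * itemsize`, while a 1-based index needs `(i - 1) * itemsize`. The sketch below uses Python's standard `struct` module on packed doubles. The helper names are hypothetical.

```python
import struct

ITEM = struct.calcsize("<d")  # bytes per little-endian double (8)


def pack_doubles(values: list[float]) -> bytes:
    """Lay the values out contiguously, like a C array of doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def get0(buf: bytes, i: int) -> float:
    # 0-based: the index is the offset, measured in elements.
    return struct.unpack_from("<d", buf, i * ITEM)[0]


def get1(buf: bytes, i: int) -> float:
    # 1-based: the same load, after subtracting one from the index.
    return struct.unpack_from("<d", buf, (i - 1) * ITEM)[0]


buf = pack_doubles([1.5, 2.5, 3.5])
assert get0(buf, 0) == get1(buf, 1) == 1.5
assert get0(buf, 2) == get1(buf, 3) == 3.5
```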


It is all a matter of what the compiler does; array accesses should generate the same assembly instructions either way.

I have used quite a few languages whose base index could be 0, 1, or whatever I chose, even enumerations.



And the Fortran-convention BLAS API is quite a bit more widely used than the CBLAS API. idamax in the Fortran API, and pivot vectors in LAPACK factorizations, return 1-based indices.
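Here is a pure-Python sketch of the convention difference. It reimplements the semantics rather than calling BLAS or LAPACK, and all function names are hypothetical. The Fortran `idamax` reports the position of the first element with the largest absolute value, counting from 1. The C wrapper `cblas_idamax` reports the same element counting from 0. LAPACK pivot vectors (`ipiv`) are 1-based as well, so 0-based code has to shift them before use. For simplicity, the sketch assumes a non-empty vector with unit stride.

```python
def idamax_fortran_style(x: list[float]) -> int:
    """1-based position of the first entry with the largest |x[i]|."""
    # max() returns the first maximal index when there are ties.
    best = max(range(len(x)), key=lambda i: abs(x[i]))
    return best + 1


def idamax_cblas_style(x: list[float]) -> int:
    """The same element, reported as a 0-based index."""
    return idamax_fortran_style(x) - 1


def ipiv_to_zero_based(ipiv: list[int]) -> list[int]:
    """Shift a LAPACK-style 1-based pivot vector for use in 0-based code."""
    return [p - 1 for p in ipiv]


def apply_pivots(rows: list, ipiv: list[int]) -> list:
    """Apply row interchanges in order: row i is swapped with row ipiv[i]."""
    out = list(rows)
    for i, p in enumerate(ipiv_to_zero_based(ipiv)):
        out[i], out[p] = out[p], out[i]
    return out


x = [3.0, -7.0, 7.0, 1.0]
assert idamax_fortran_style(x) == 2  # second element, counting from 1
assert idamax_cblas_style(x) == 1    # the same element, counting from 0
assert x[idamax_cblas_style(x)] == -7.0
```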


Fortran has the rather unusual feature that you can change the indexing to an arbitrary offset. You can declare arrays to be indexed from 0 if you want, but it defaults to 1 and almost nobody uses this feature for obvious reasons.


I index from -N:N so that 0 is in the centre of my computational boxes.
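For readers without a Fortran compiler at hand, here is a Python sketch of what a declaration along the lines of `real :: a(-N:N)` gives you: an array whose valid indices run from -N to N, with 0 in the middle. The class `OffsetArray` is hypothetical and only mimics the idea. It keeps its bounds as attributes so the index range stays visible.

```python
class OffsetArray:
    """A 1-D array indexed from lower to upper inclusive, Fortran style."""

    def __init__(self, lower: int, upper: int, fill: float = 0.0):
        if upper < lower:
            raise ValueError("upper must be >= lower")
        self.lower, self.upper = lower, upper
        self._data = [fill] * (upper - lower + 1)

    def _slot(self, i: int) -> int:
        if not self.lower <= i <= self.upper:
            raise IndexError(f"{i} outside {self.lower}:{self.upper}")
        return i - self.lower  # shift into Python's 0-based storage

    def __getitem__(self, i: int) -> float:
        return self._data[self._slot(i)]

    def __setitem__(self, i: int, value: float) -> None:
        self._data[self._slot(i)] = value


N = 3
box = OffsetArray(-N, N)
for i in range(-N, N + 1):
    box[i] = float(i * i)  # e.g. a quantity symmetric about the centre

assert box[0] == 0.0 and box[-N] == box[N] == 9.0
```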


Yeah, something like this is the intended use, but it isn't used much at least in the Fortran I've seen. Mostly because it's really confusing for someone reading your code, as once the array is declared there's no indication what the index range is.


BASIC did as well IIRC.


Pascal as well I think. I dunno which was the first to support it, but it's not exactly a popular thing in modern programming.


Pascal even let you index with non-numeric types, so you could have an array indexed by the characters a to z, for example.
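Here is a Python sketch of what a Pascal declaration such as `counts: array['a'..'z'] of integer` provides. The character itself is the index, and the shift to storage uses the character's ordinal. The class `CharIndexed` is hypothetical and written only to mirror that idea.

```python
class CharIndexed:
    """An array indexed by the characters first..last, like array['a'..'z']."""

    def __init__(self, first: str, last: str, fill: int = 0):
        self.first, self.last = first, last
        self._data = [fill] * (ord(last) - ord(first) + 1)

    def _slot(self, c: str) -> int:
        if not (len(c) == 1 and self.first <= c <= self.last):
            raise IndexError(c)
        return ord(c) - ord(self.first)  # 'a' maps to storage slot 0

    def __getitem__(self, c: str) -> int:
        return self._data[self._slot(c)]

    def __setitem__(self, c: str, value: int) -> None:
        self._data[self._slot(c)] = value


counts = CharIndexed("a", "z")
for ch in "banana":
    counts[ch] += 1

assert counts["a"] == 3 and counts["n"] == 2 and counts["z"] == 0
```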


It was already there in Algol-60.



