
Has anyone compared this with Incanter, another Lisp-based (Clojure) stats environment?

http://incanter.org/

Activity on both seems to have died out recently... This is a shame, as I was thinking about this issue recently. It seems that there is a real need for a stats environment based around a straightforward general-purpose programming language that can be easily parallelized and can easily access GPU functions. Although GPU functionality is still pretty narrow, it's growing, and linear algebra, which GPUs handle well, has a clear place in statistical computing.

I have not yet seen anything that fits this bill, but Incanter is the closest. Clojure has the Calx library for OpenCL. Last I asked around, nobody had put the peanut butter in the chocolate just yet.




My Kaggle partner was trying out Incanter and gave up due to performance reasons. Apparently, when Incanter loads a data set, it loads every cell as a hash as opposed to loading rows or columns as vectors. Correct me if I'm wrong - he would love to use it.
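
To make the storage difference concrete, here is a rough Python sketch. It is not Incanter's actual code, and I have not verified how Incanter lays out a data set; it only illustrates the general cost of one hash per cell versus one packed vector per column. The helper names and the synthetic data are made up for the example.

```python
import sys
from array import array

N_ROWS, N_COLS = 1000, 4

def make_value(r, c):
    """Synthetic cell value (hypothetical data, for illustration only)."""
    return float(r * N_COLS + c)

def per_cell_hashes(n_rows, n_cols):
    """One small dict per cell: flexible, but every cell pays dict overhead."""
    return [[{"value": make_value(r, c)} for c in range(n_cols)]
            for r in range(n_rows)]

def per_column_vectors(n_rows, n_cols):
    """One packed array of doubles per column: about 8 bytes per cell."""
    return [array("d", (make_value(r, c) for r in range(n_rows)))
            for c in range(n_cols)]

def approx_size_cells(table):
    """Shallow size of the row lists, each cell dict, and each boxed float."""
    total = sys.getsizeof(table)
    for row in table:
        total += sys.getsizeof(row)
        for cell in row:
            total += sys.getsizeof(cell) + sys.getsizeof(cell["value"])
    return total

def approx_size_columns(columns):
    """Size of the outer list plus each packed column buffer."""
    return sys.getsizeof(columns) + sum(sys.getsizeof(col) for col in columns)

cells = per_cell_hashes(N_ROWS, N_COLS)
columns = per_column_vectors(N_ROWS, N_COLS)
cell_bytes = approx_size_cells(cells)
column_bytes = approx_size_columns(columns)
print(f"per-cell hashes: {cell_bytes} bytes")
print(f"column vectors:  {column_bytes} bytes")
```

The exact numbers depend on the interpreter, but the per-cell layout should come out many times larger, and a column operation on it has to chase a pointer for every value instead of scanning one contiguous buffer.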


LUSH is still fairly active. It moves a little slowly, but this is good since it's stable and a lot more mature than Clojure. The authors are still using it for real projects, so it's likely to stay around.


I am curious why you think numpy/scipy does not fit that bill? Python is a straightforward general-purpose programming language, has a lot of stats functionality, and can access GPU functions. While Python's story for parallel programming is pretty poor in general, it is actually quite decent for numerical computation, because parallelizing that kind of work is usually not very difficult.


I suppose it's that last part. I tried using some of the IPython parallel stuff, but found it to be pretty clumsy (although this was about two years ago). Is there a better solution for parallelization in Python?


I have never used the parallel stuff in IPython, but my understanding was that it is aimed more at cluster-type work. It is also being significantly rewritten, so things are changing there.

But the way most people do parallel work in numpy is: first use BLAS/LAPACK, which is multithreaded, then use multiprocessing, etc. It depends on what you are trying to achieve, but my understanding is that as soon as you use arrays and the like, Clojure is not that helpful compared to a much more primitive runtime such as Python, because all the parallel goodies of Clojure are no longer available. But I have never seriously used Clojure, so I may be dead wrong.
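
A minimal sketch of the multiprocessing step, assuming the work splits into independent chunks. The function names are hypothetical, and the per-chunk work is a pure-Python stand-in so the example runs with only the standard library; in real code each chunk would be a numpy call. The BLAS/LAPACK step needs no code of its own, since the threading happens inside the library when numpy is linked against a multithreaded build.

```python
import math
from multiprocessing import Pool

def chunk_bounds(n_items, n_chunks):
    """Split range(n_items) into at most n_chunks contiguous (start, stop) pairs."""
    if n_items <= 0:
        return []
    size = math.ceil(n_items / n_chunks)
    return [(start, min(start + size, n_items))
            for start in range(0, n_items, size)]

def partial_sum_of_squares(bounds):
    """Work done by one process: sum of x*x over its slice of the range."""
    start, stop = bounds
    return sum(float(x) * float(x) for x in range(start, stop))

def parallel_sum_of_squares(n_items, n_workers=4):
    """Map the chunks over a process pool and combine the partial results."""
    with Pool(processes=n_workers) as pool:
        parts = pool.map(partial_sum_of_squares, chunk_bounds(n_items, n_workers))
    return sum(parts)

if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this module.
    print(parallel_sum_of_squares(100_000))
```

Each worker is a separate process, so arguments and results are pickled across the boundary. That is cheap for a pair of indices as here, but it can dominate the runtime if large arrays are shipped back and forth.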


Thanks, I'll try that approach.


Don't hesitate to ask on the mailing list (while I know numpy fairly well inside out, I am not the best person to talk about parallelism). There are also a lot of talks available in the SciPy conference videos. See http://conference.scipy.org/scipy2010/schedule.html for 2010.


Incanter is somewhat slow. I still use it in preference to R, since I already know Clojure and don't use R enough to be comfortable with it. I certainly wouldn't use it for real-time machine vision, which is one of the use cases for Lush.



