Hacker News | tlmr's comments

What about just a smallish, single-machine corpus of documents (1000)?


The size of the corpus isn't the issue (apart from processing time of course).

The key factor in estimating how big a job it is, is how complex your entity extraction and inference rulesets are.


Check out Continuum's Blaze project for Python. It provides table and array abstractions for a variety of backends, including SQL solutions.


Of modules or entire programs?


I don't know exactly the current status or roadmap (if there is one), but the idea is to have both eventually. Packages and files will be cached automatically so that they load much more quickly, and it will be possible (eventually) to output standalone executables.


How would you compare this to using python with an easy JIT like numba?


Whenever I have tried to use Numba, I find it is slower than standard python. OTOH, Julia just works. Gives me the same performance as C.


Actually, both in terms of speed and in terms of writing code, they are remarkably similar approaches.

numba forces you into certain ways of writing code to achieve significant speed ups, but usually that's not too restrictive.
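
To make "certain ways of writing code" concrete, here is a minimal sketch. It assumes the phrase refers to Numba's nopython mode preferring plain, type-stable numeric loops, which is my reading rather than something spelled out here. The function name `leibniz_pi` is made up for illustration, and the try/except fallback is only there so the snippet still runs (slowly) without Numba installed.

```python
try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    # Assumption for illustration only: without Numba, fall back to a
    # no-op decorator so the same code runs as ordinary Python.
    def njit(func):
        return func


@njit
def leibniz_pi(n_terms):
    """Approximate pi with the first n_terms of the Leibniz series."""
    total = 0.0   # stays a float for the whole loop (type-stable)
    sign = 1.0
    for k in range(n_terms):
        total += sign / (2 * k + 1)
        sign = -sign
    return 4.0 * total


if __name__ == "__main__":
    print(leibniz_pi(1_000_000))
```

The loop is written out explicitly and only touches ints and floats. Code that leans on arbitrary Python objects inside the hot loop is typically where the restrictions start to bite.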

If you are used to writing numerical Python code, you'll feel right at home in Julia - especially with the amazing PyCall module.


I haven't used numba or benchmarked it, so I'm not really qualified to comment. I do however vastly prefer the MATLAB-like syntax of julia to that of python.


Is this going to hamper it from being the next python for both general and data science programming? Do you think I should invest in learning julia, or continue with python?


I often say this, but I think if you have doubts about whether you should learn Julia or Python, then you should certainly learn Python.


What do you think of the CMU course vs. the textbook in question?


The CMU course is very traditional. It covers basic exploratory data analysis (summary statistics, plotting data), basic probability, and hypothesis testing and estimation. There's no programming, nothing Bayesian, and only brief discussion of regression.

(I taught 36-201, the intro stats course that was used to build the OLI course, this summer.)

Statistical Inference, on the other hand, seems to take a Bayesian perspective and is very much not your standard intro stats class. It looks interesting and I'll have to skim through some of it.


What about Julia?


Yeah, what about Julia?

which makes as much sense as asking

- What about R?
- What about GNU DAP?
- What about FORTRAN?

for writing a TCP server.


How does the upcoming and even current numba+blaze power duo not allow library designers to easily write fast code?


Well, that was the point of my original message. I am using numbapro and easily getting to C speed with R-like vectorized convenience right now. It's why I question the "speed" argument as a non-argument when compared with Python. And I haven't even started using the CUDA approach... You allude to another point though: Python is not standing still. Python and its environment are a mighty high mountain for Julia to climb if it's not going to move the game forward significantly enough to compensate for its big ecosystem disadvantage. Julia cannot just do incremental improvement - it doesn't have enough momentum to make that a winning strategy. It needs to leapfrog to take on Python.

I should add one more point though. The post mentions Matlab 15 times and Python only 7. It's possible this whole Julia effort will be successful with the Matlab crowd which, up to now, has been watching with horrified fascination from the sidelines as open source ate its lunch.


Nice. Do you find yourself having to contort code to work with numbapro?


You mean like blaze in python?: http://blaze.pydata.org/docs/v_0_6_5/index.html


Blaze has a bit of a broader focus than what I was talking about, since blaze mostly offloads the actual computation to a particular backend. But a combination of blaze and a custom lowering of the computation into machine code using numba would be similar (although without the type safety for guaranteeing that certain optimizations are possible)


I believe that the numba compilation is planned for the coming year.


I don't think so. Have you checked out seaborn, blaze and bokeh?


Not OP, but I've also been looking for a way to leave ggplot2 behind and make the jump to 100% Python for data analysis. These look neat.

Seaborn - http://web.stanford.edu/~mwaskom/software/seaborn/

Blaze - http://blaze.pydata.org/docs/v_0_6_5/index.html

Bokeh - http://bokeh.pydata.org/


There's also the confusingly named ggplot (for python): https://github.com/yhat/ggplot


I've used that library, and I really don't like it at all. They tried to bring the R syntax to Python, which ends up looking awful and misses the point of the Graphical Grammar. In the same way that every language has its own way of expressing control flow, every language should have its own way to express the Graphical Grammar. We don't need R's ggplot2 in Python; we need a pythonic way to express the Graphical Grammar.

If I had stronger python-fu I would love to build "GGPy".


Isn't GGPy seaborn?


Maybe. The plots look a lot like ggplot2 plots, and the syntax looks like Python, but I haven't dug into it to see if it builds plots using the Graphical Grammar under the hood.


We've shifted to plot.ly for visualization and moved away from expensive BI tools. We set up Python/pandas scripts on cron to output real-time data to our local plot.ly web server, which makes a local copy of the data and updates the chart. You simply embed the chart as an iframe wherever you want internally, and BOOM, you've got a real-time chart (a beautiful one, I may add).

www.plot.ly


$60 per user per month is a bit steep...


Considering enterprise BI software/platforms these days can cost upwards of 4k per month, this is roundoff error.


I agree, just doesn't fit the projects I wish to use it for.

