I was looking at doing BLAST searches on genetic data in Erlang and checked out benchmarks for the algorithms in different languages. I was shocked to see the performance benchmark for FASTA on one of the benchmark sites: Java was only slightly slower than C++, about 20% off the mark. http://shootout.alioth.debian.org/u32q/fulldata.php?test=fas...
Found the url:
Java -server: 7.54 seconds
C gnu gcc: 9.33 seconds
C gnu gcc #4: 5.29 seconds
C++ GNU g++ #4: 6.21 seconds
I'd always thought of Java as slow, from the early Swing days. Nope. Java is really fast. Not sure how globally applicable this is, but I thought it was neat. Considering what its features buy you... 20% for this algorithm is pretty cheap. Although for C I guess it's a bit more.
Java is fast if a) you have lots of memory and b) you can write off the VM startup time. Now memory is cheap and you can buy boxes off the shelf these days with 128G in, but at the point you run out and need to swap, your performance is destroyed. Of course this is true for all languages, but C (et al) won't hit that problem 'til much much later. That's clearly shown in the statistics.
As for point b, no-one is using Java to write command-line tools for that very reason.
Wait a second... :-) gaius is completely right about the memory issues with Java and this is the major difference between Java and C++ nowadays.
But what you see in the shootout tables is irrelevant for applications that hold a lot of data in memory. These numbers comprise almost exclusively the JVM's own memory, as the benchmark problems use almost no data. If you load gigs of data into memory, the 15MB or so of JVM baseline memory consumption becomes negligible.
My own experience is that Java uses about twice as much memory as C++ does. Where a lot of strings are used, Java's choice of UTF-16 for representing strings has a big effect on top of using references instead of values everywhere.
C#'s value types can help a lot to reduce memory consumption by the way.
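A quick way to see the reference-vs-value overhead for yourself is something like the following (rough sketch, not a rigorous benchmark; the numbers vary wildly by JVM, heap settings and GC timing, and you may need to bump -Xmx):

    // Rough sketch: compare heap growth from a primitive int[] vs. an Integer[].
    public class BoxedVsPrimitive {
        static long usedHeap() {
            Runtime rt = Runtime.getRuntime();
            System.gc();                       // best-effort; only a hint to the VM
            return rt.totalMemory() - rt.freeMemory();
        }

        public static void main(String[] args) {
            final int n = 5000000;

            long before = usedHeap();
            int[] primitives = new int[n];     // n * 4 bytes of payload, one object header
            long afterPrimitives = usedHeap();

            Integer[] boxed = new Integer[n];  // n references plus up to n small objects
            for (int i = 0; i < n; i++) boxed[i] = i;
            long afterBoxed = usedHeap();

            System.out.printf("int[]     ~%d MB%n", (afterPrimitives - before) / (1 << 20));
            System.out.printf("Integer[] ~%d MB%n", (afterBoxed - afterPrimitives) / (1 << 20));

            // Keep the arrays reachable so the GC can't discard them early.
            System.out.println(primitives.length + boxed.length);
        }
    }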
You're right, I should've been more specific about which ones I mean, but if you look at the numbers they show exactly what I was explaining.
Those tests with very low memory usage show Java using 16 times as much memory as C++ because almost all of it is VM memory, whereas the tests using a lot of memory show Java using only around twice as much memory.
At least that's the general picture. I know very well that it depends hugely on what kind of data structures are being used and so on, so it's going to vary.
I've noticed a curious trend with managed language programmers: they tend to say that language X is almost as fast as C, or has C-like speed when in reality it is significantly slower. In your case, "slightly" means 20%.
You have a simulation that will take either four days to run or five. That's not tiny. And that's not so rare a problem. Not everyone is doing web apps.
I remain skeptical of these micro benchmarks. Almost every time I've messed around with porting pieces of C++ to Java or C# and benchmarking it I get a large difference in favor of C++.
Hadoop is a platform for traversing very large data sets. It is not really for "number crunching." One can use any language for the map/reduce pieces of Hadoop.
Actually, plenty of guys use Hadoop for CPU bound problems.
And the fastest way to use it this way is Java, which is what the number crunching users of Hadoop use. Which is to say that the emerging big data compute platform is Java. Which is to say that you're wrong.
Hadoop is not "for CPU bound problems." It's for huge data. There are much better frameworks for the problem of splitting a parallelizable computation across machines. Nobody would use Hadoop just for that.
Why would Java be the fastest way to do computations with Hadoop? You can use many languages for the map/reduce pieces. The fastest language would be fastest language, not necessarily Java.
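For what it's worth, the per-record logic of a Hadoop job really is just ordinary Java, and Hadoop Streaming lets you plug in any language that can read stdin and write stdout. The map half of the canonical word-count example looks roughly like this (job setup and the reducer omitted, so it's only a sketch of the shape):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (token, 1) for every whitespace-separated token in each input line.
    public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }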
Slight tangent: why does Android use a VM, but not a standard JVM?
As said in this article: [C/C++ is better than Java when] very small footprint is required (these days that does not include most phones). You can get JVMs that run well in a few hundred KB.
The situations where a VM, but not a JVM are sensible seem to be totally squashed between C/C++ on one side and the JVM on the other.
There was some theory that it was to do with licensing (http://www.betaversion.org/~stefano/linotype/news/110/) but the argument got a bit obscure for me. Something along the lines of Google couldn't officially call it Java because it wouldn't meet the licensing requirements of the test suite.
But not being able to call it "Java™" seems like a small price compared with having an incompatible VM.
The technical arguments for Dalvik I remember are twofold:
(a, minor) the dalvik vm is a closer fit to the underlying cpu (being a register-based vm iirc) which makes it simpler to implement and/or a little more performant
(b, major) the dalvik vm (contra the jvm) is designed to more-easily support the "single vm instance with multiple logically independent apps running in it" model; this reduces the vm-induced memory overhead b/c the vm and any system-wide libraries are only loaded once.
There's probably also some truth to the trademark argument.
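To make point (a) concrete, for a trivial method the stack-based JVM and the register-based Dalvik VM execute roughly the following instruction streams (an illustrative sketch only; the real output of javap and dexdump differs in detail):

    public class AddExample {
        static int add(int a, int b) {
            return a + b;
            // JVM (stack machine), roughly:
            //   iload_0            // push a
            //   iload_1            // push b
            //   iadd               // pop two, push sum
            //   ireturn
            //
            // Dalvik (register machine), roughly:
            //   add-int v0, p0, p1 // v0 = a + b
            //   return v0
        }
    }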
These days, that INCLUDES most phones. Even if Symbian C++ is painful to write (weird API, Qt will make it much better), most programs are written in C++ and not Java. Java is as slow as a dog on S60 phones.
WinMo phones? It's not even available, but I've run .NET apps on a Pocket PC with a 500 MHz+ processor and ~128 MB RAM, and the GUI is sluggish.
iPhone? Not available, but the memory constraints are drastic. See Noel Llopis's blog posts about this.
I worry when people do benchmarks like this. People ignore the context that half the post is about and pick out the three lines where X performs better than Y. The Y crowd dislikes this, so it jumps at the article attacking it (eventually degrading into a flamewar), ignoring of course that the first thing the author says is "it depends", and that he does everything he can to establish that he is NOT saying X is faster than Y, but rather that some optimization X provides increases the performance of that particular operation (backed with some simple benchmarks).
Many of these points apply to programming languages that are arguably more enjoyable than Java, including JavaScript (TraceMonkey), Python (PyPy), Common Lisp (SBCL), and Haskell (GHC).
For most workloads, even hard-core number crunching, it is pretty straightforward to get near-C performance out of those languages. I am not sure why anyone would even bother with Java.
If it's straightforward to get near-C performance out of those languages, why do they lose out so badly in the language shootout? e.g. TraceMonkey vs Java -
Java varies between 10 and 100 times as fast as TraceMonkey in all the tests. If you think it is straightforward to write code that equals C performance, you should do so and submit it to the shootout to prove your point. Personally, I don't believe you can :-)
It's straightforward because talk is cheap. Maybe jrockway could write some near-C speed Python or JavaScript code to enlighten us. And by near-C speed I don't mean 20% slower...
No, I wasn't. I was talking about the speed-critical tight loops. Once you let GHC or TraceMonkey or PyPy or ... know what the machine types of your data are, they will generate very fast code; often the same instructions that the C compiler would generate.
C is fast because the programmer explicitly names the machine types to be used. Other languages are slow because their operations can't always be expressed in terms of operations the machine can do. (Consider Perl and "2 + 2". This is "slow" because 2 is an "SV", which means that before anything can happen, the integer 2 has to be unboxed from the SV, the runtime has to decide which "+" to use, and then the result has to be boxed back into an "SV". This is much slower than running the "ADD" instruction. But if you can somehow specify, "this is not a special boxed value, just treat it as a machine integer and don't worry about overflow or anything special", then you are doing the same thing as C. Modern VMs (and static compilers) can guess this for you and generate the right code.)
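The same boxing tax shows up inside Java itself if you use the wrapper types instead of primitives. A toy comparison (not a rigorous benchmark; a real measurement would need proper warm-up and something like JMH):

    // Sums the same range with a primitive long and with a boxed Long.
    public class BoxingCost {
        public static void main(String[] args) {
            final int n = 100000000;

            long t0 = System.nanoTime();
            long primitiveSum = 0;
            for (int i = 0; i < n; i++) {
                primitiveSum += i;        // plain machine adds, no allocation
            }
            long t1 = System.nanoTime();

            Long boxedSum = 0L;
            for (int i = 0; i < n; i++) {
                boxedSum += i;            // unbox, add, re-box on every iteration
            }
            long t2 = System.nanoTime();

            System.out.printf("primitive: %d ms, boxed: %d ms (sums %d / %d)%n",
                    (t1 - t0) / 1000000, (t2 - t1) / 1000000,
                    primitiveSum, boxedSum);
        }
    }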
After the industry accepted Java as an enterprise language, a lot of development went into optimizing the compiler, and these were not academic projects like your examples (except maybe TraceMonkey).
Add to this the already huge amount of existing libraries and the fact that Java is a perfect middle-ground language in every way, and you have the answer.
Personally I don't like Java, mainly because of the 'framework culture', but I have to admit that a lot of progress has been made in compiling and running Java code.
Still, despite all that effort, I can feel it when I use a non-statically-compiled interactive application (e.g. text editors, games, etc.). Low-latency responses to your actions and fast startup times do not generally seem to be properties of dynamic languages.
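You can see the warm-up half of that effect in a few lines: in a fresh JVM the first pass over a hot loop is usually much slower than later passes in the same process (rough sketch; the actual numbers depend entirely on the VM and hardware):

    // Times several rounds of the same work in one process to show JIT warm-up.
    public class WarmUp {
        static long work(int n) {
            long acc = 0;
            for (int i = 0; i < n; i++) {
                acc += (long) i * i % 7;
            }
            return acc;
        }

        public static void main(String[] args) {
            for (int round = 1; round <= 5; round++) {
                long t0 = System.nanoTime();
                long result = work(20000000);
                long t1 = System.nanoTime();
                System.out.printf("round %d: %d ms (result %d)%n",
                        round, (t1 - t0) / 1000000, result);
            }
        }
    }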
By "straightforward" to you mean "theoretically possible?" If that were the case, nearly everything would be written in Python. Maybe there is a specific case you're thinking of because in the general case it's non-trivial and possibly impossible. You're not getting near-C performance by simply dropping to C are you? That's not a terribly acceptable answer.
I don't know why these things come up all the time. Java is pretty fast for a managed language, certainly faster than a lot of people think, but it's not C. Python, Ruby, etc.? They're way behind in performance, particularly for tight-loop number-crunching. That doesn't make them bad choices, though. There isn't a purely objective answer; I think it was Top Gear a couple of weeks ago that said "You don't buy a supercar with your mind, because it doesn't make sense; you buy them with your heart, because you love them." The same thing is true here. And even if you do try to weigh it up, there are all sorts of other variables you'd have to factor in: tooling, libraries, dev time, availability of quality developers, maturity, likelihood of future support (you laugh, but with Ruby, Python, and Perl all going through these giant shifts, they may never be the same or as unified), and whether you can actually compile your code or have to distribute source. With all those things in play, Java has a fairly compelling story even though a lot of people want to hate it.
The better discussion, at least to me, is with so many compelling options that perform as well as they do, why does anyone bother with C? With various buffer overflows being so prevalent and it being so easy to have a bug that poses a real security risk, is it worth that?
I've (just about) finished moving my project to JRuby. The combination of the Ruby language for most of the code, the vast selection of libraries, and the ability to drop down into Java for performance sensitive code is a joyous thing.
Yes, but since Lua has been designed from the beginning to embed in a C program, its developers take the C API very seriously. (At the least, you can use Lua as a C library providing a great string/atom system, script & config file parser, and a garbage-collected Python-like repl for automated testing and debugging.)