
Optimized C/C++ will usually beat Go, while Go will usually far outstrip the CPython interpreter[1]. Go itself doesn't really have any SIMD vectorization that would make it a "fast" language for things like machine learning or machine vision; hand-coded C/C++ is always going to come out ahead there. If you're talking Python vs. Go though, Go wins hands down.
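
To illustrate what I mean (a sketch of my own, not taken from the benchmarks): the obvious Go dot product below compiles to scalar floating-point instructions, whereas a C compiler at -O3 can often auto-vectorize the same loop, or you can hand-write SSE/AVX intrinsics and process 4-8 values per instruction.

    // dot.go -- a plain scalar loop; gc emits one multiply and one add
    // per element, while optimised C could process 4 float32 values per
    // SSE instruction (8 with AVX).
    package main

    import "fmt"

    func dot(a, b []float32) float32 {
        var sum float32
        for i := range a {
            sum += a[i] * b[i]
        }
        return sum
    }

    func main() {
        a := []float32{1, 2, 3, 4}
        b := []float32{5, 6, 7, 8}
        fmt.Println(dot(a, b)) // 70
    }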

To be honest, my personal view is that we're better off reducing our dependence on SIMD CPU instructions and using GPUs for this sort of highly parallel processing instead. Most processors sold these days come with a GPU built in, so why not use the SIMD units already sitting there rather than duplicating them on the CPU? This is just my opinion though, and is somewhat beside the question.

Go has an excellent interface to C (cgo) built in, and SWIG can also be used to wrap C/C++ libraries so they're usable from Go. If your aim is core-for-core speed, your best option is probably to take a highly optimised C/C++ library and write a wrapper that lets you call it from Go. That route will be at least as fast as any Python implementation using the same technique, if not much faster. If your goal is a highly concurrent and safe implementation, it's better to write such libraries from scratch in Go, which is the approach most Go libraries take.
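
For example, a minimal cgo wrapper looks something like this (just a sketch; the package and function names are made up, and the C function is defined inline in the preamble to keep it self-contained, where a real wrapper would #include the library's header and link it with a #cgo LDFLAGS line):

    package blas

    /*
    // In a real wrapper you'd #include the library's header and link it
    // with something like:  #cgo LDFLAGS: -lopenblas
    // A tiny C function is defined inline here just for the sketch.
    static double ddot(int n, double *x, double *y) {
        double s = 0;
        int i;
        for (i = 0; i < n; i++) {
            s += x[i] * y[i];
        }
        return s;
    }
    */
    import "C"

    import "unsafe"

    // Ddot computes the dot product of two equal-length slices by handing
    // the underlying arrays straight to the C function above.
    func Ddot(x, y []float64) float64 {
        return float64(C.ddot(C.int(len(x)),
            (*C.double)(unsafe.Pointer(&x[0])),
            (*C.double)(unsafe.Pointer(&y[0]))))
    }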

Just as with benchmarks, I don't think anyone can give you a non-subjective answer on whether Go will be faster (so take mine with a grain of salt).

[1] http://shootout.alioth.debian.org/u64q/benchmark.php?test=al...




I/O bandwidth between the GPU and main memory is a significant limiting factor, giving SIMD instructions on the CPU an advantage.


For big data sets or complex SIMD algorithms, the I/O bandwidth overhead is tiny compared to the speedup achieved by moving the calculation to the GPU.

For calculations that don't work well on the GPU because of small data sets, simple operations, or bandwidth constraints, we could just run the code in parallel across multiple cores using multiple goroutines.
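
Something like this rough sketch (chunking strategy and names are just mine): split the slice into one chunk per core and fan the work out over goroutines.

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // parallelScale multiplies every element of data by k, splitting the
    // slice into one chunk per CPU core and processing each chunk in its
    // own goroutine.
    func parallelScale(data []float64, k float64) {
        n := runtime.NumCPU()
        runtime.GOMAXPROCS(n) // make sure all cores actually get used
        chunk := (len(data) + n - 1) / n

        var wg sync.WaitGroup
        for start := 0; start < len(data); start += chunk {
            end := start + chunk
            if end > len(data) {
                end = len(data)
            }
            wg.Add(1)
            go func(part []float64) {
                defer wg.Done()
                for i := range part {
                    part[i] *= k
                }
            }(data[start:end])
        }
        wg.Wait()
    }

    func main() {
        data := []float64{1, 2, 3, 4, 5, 6, 7, 8}
        parallelScale(data, 2)
        fmt.Println(data) // [2 4 6 8 10 12 14 16]
    }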

I think eventually (and this seems to be the direction companies like AMD are heading) we'll have a couple of big cores (maybe up to 4) sitting next to a bunch of small, wimpy GPU-like cores that handle SIMD, making SIMD on the big cores all but redundant. We're not there yet, but AMD and Intel are both working on getting their on-chip GPUs to share memory directly with the processor. At the moment the focus is mainly gaming performance, so that textures etc. don't have to be copied from main memory to the GPU, but the same capability will greatly benefit GPGPU. Once we have that heterogeneous architecture and newer, faster memory technologies, the problems with using the GPU for SIMD work will disappear.

But for the moment, with the real-world technology constraints we have, you're absolutely right on the limitations of GPGPU.


Running code in parallel across multiple cores is going to lose to SIMD: the vector units multiply throughput within each core, so scalar code needs several extra cores just to break even. I don't think SIMD is going away anytime soon.



