Ask HN: How to learn OpenCL
25 points by lettergram on Jan 26, 2014 | hide | past | favorite | 16 comments
OpenCL seems to have only minimal examples, and although there is a fair amount of documentation, I feel the need to ask: are there any good books/tutorials on how to use/learn OpenCL?



Intel and AMD both have some good documentation on their site for getting going with OpenCL, so that's a good place to start. For example:

http://developer.amd.com/tools-and-sdks/heterogeneous-comput...

The Khronos website has a huge page with a list of OpenCL tutorials and books:

https://www.khronos.org/opencl/resources

Amazon has a number of OpenCL books available:

* http://www.amazon.com/OpenCL-Action-Accelerate-Graphics-Comp...

* http://www.amazon.com/Heterogeneous-Computing-OpenCL-Second-...

* http://www.amazon.com/OpenCL-Programming-Guide-Aaftab-Munshi...

This book is available on Amazon but the previous edition is available for free:

http://www.fixstars.com/en/opencl/book/

Intel's website also has some "Getting Started" articles and optimization guides for OpenCL (for CPU, GPU, and Xeon Phi):

http://software.intel.com/en-us/vcsource/tools/opencl


This is just a small sample of the OpenCL kernels available in the AMD SDK (it runs everywhere, at least it did last time I checked).

  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/AESEncryptDecrypt_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BinarySearch_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BinomialOption_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BitonicSort_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BlackScholesDP_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BlackScholes_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BoxFilterGL_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BoxFilter_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/BufferBandwidth_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/ConstantBandwidth_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/DCT_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/DeviceFission_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/DwtHaar1D_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/EigenValue_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/FFT_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/FastWalshTransform_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/FloydWarshall_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/FluidSimulation2D_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/GaussianNoise_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/GlobalMemoryBandwidth_Kernels.cl
  AMD-APP-SDK-v2.4-lnx64/samples/opencl/bin/x86/HelloCL_Kernels.cl
I'd hook into OpenCL from the high level language of your choice. Look at

* http://mathema.tician.de/software/pyopencl/

* http://www.drdobbs.com/open-source/easy-opencl-with-python/2...

Or JRuby or Jython with https://code.google.com/p/aparapi/ (you still have to write your inner kernel in .java and send that to aparapi)

If you go the "whole-stack-in-c" route much of your time will be spent doing memory operations.
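For a sense of what the high-level route looks like, here is a minimal PyOpenCL vector-add sketch (my own, not taken from the linked tutorials). It assumes the pyopencl and numpy packages plus a working OpenCL driver; the imports are kept inside the function so the kernel source can be read even on a machine without an OpenCL runtime:

```python
# The kernel is plain OpenCL C, handed to the driver as a string.
KERNEL_SRC = """
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out)
{
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
"""

def run_vec_add(a, b):
    # Imports live here so this file loads without pyopencl/numpy installed.
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags

    # PyOpenCL handles buffer creation and host<->device copies for you.
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # Build at runtime, launch one work-item per element, copy back.
    prog = cl.Program(ctx, KERNEL_SRC).build()
    prog.vec_add(queue, a.shape, None, a_buf, b_buf, out_buf)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)
    return out
```

Compare that with the couple of hundred lines of clGetPlatformIDs/clCreateBuffer/clEnqueueWriteBuffer boilerplate the equivalent C host program needs.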


If you're a complete beginner in data parallel programming, and you're having trouble finding good intro material for OpenCL, it might almost be worthwhile to check out CUDA instead. In terms of the programming model, OpenCL and CUDA are identical - significant differences don't come about until you start optimizing for specific devices.
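To make the "identical programming model" claim concrete, here is a rough vocabulary map between the two (my own summary, not from either vendor's docs):

```python
# CUDA term -> closest OpenCL equivalent. Most CUDA tutorials translate
# to OpenCL almost mechanically using this table.
CUDA_TO_OPENCL = {
    "thread": "work-item",
    "thread block": "work-group",
    "grid": "NDRange",
    "shared memory": "local memory",
    "__global__ function": "__kernel function",
    "threadIdx.x": "get_local_id(0)",
    "blockIdx.x": "get_group_id(0)",
    "blockDim.x": "get_local_size(0)",
    "blockIdx.x * blockDim.x + threadIdx.x": "get_global_id(0)",
}
```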

I learned CUDA first on my own, and then took an OpenCL class and found that the whole first section was completely redundant. There's also a pretty great wealth of CUDA material online and a few published books if that's your sort of thing.


To add some reasons why you'd want to learn CUDA first: it turns out that simple things are a lot simpler, and take a lot less code, in Cuda than in OpenCL. With Cuda, your kernel and host code will be close together in the same file. You'll need a /lot/ less boilerplate than is needed in OpenCL to accomplish even the simplest things. OpenCL exposes you to a lot more concepts, and a lot more extrinsic complexity, than Cuda. All this means that just playing around is a lot easier to do in Cuda.

Just take a look at some simple examples in both, and you'll quickly see what I mean. Even though I'm a fan of OpenCL because it is available on more platforms, Cuda is a lot better suited as a learning platform.


A word of warning: OpenCL, and heterogeneous computing in general, is very very difficult. It will take a lot of effort to get even the simplest hello world application working.

And when writing OpenCL, even though you are using a single API, if you want high performance you will need to rewrite parts of your application for each piece of hardware you intend to run on. This is obvious once you consider that the code may end up running on Intel x86 CPUs, or on Intel, AMD or Nvidia GPU architectures, which are all very different. If you're lucky, it's enough to rewrite your kernel code (the code running on the device). But you might also need to change the host-side code (running on the CPU) and the way you manage memory and DMA transfers, etc.

Finally, when it comes to the basics, OpenCL is not too different from CUDA and there's a lot more material available on CUDA (because it's been around longer and is perhaps used a bit more). You should be able to pick up a book or a tutorial on CUDA and translate it to OpenCL without too much effort.
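As a tiny example of how mechanical the translation usually is, here is the same element-wise kernel in both dialects, held as plain strings purely for comparison (nothing is compiled here, and the kernel itself is just an illustration of mine):

```python
# CUDA version: computes its own global index from block/thread IDs.
CUDA_KERNEL = """
__global__ void scale(const float *x, float *y, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = k * x[i];
}
"""

# OpenCL version: same body, with the index expression and qualifiers
# swapped for their OpenCL equivalents.
OPENCL_KERNEL = """
__kernel void scale(__global const float *x, __global float *y,
                    float k, int n)
{
    int i = get_global_id(0);
    if (i < n)
        y[i] = k * x[i];
}
"""
```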

That said, even though it may take quite a lot of learning to get started, parallel programming on GPUs is quite fun, and it is very rewarding to see your code run with very high performance.


Disclaimer: this is a rant, it is obvious that I don't like OpenCL and that I think that it was designed by monkeys so take it with a grain of salt.

In short: don't learn OpenCL. Both CUDA and C++AMP are good languages for programming heterogeneous machines and nVidia's Thrust and Microsoft's PPL are both excellent libraries to write efficient and reusable code. These language extensions are also strongly typed and come with really good tools. My advice is: learn any of them instead.

Why not OpenCL? AMD's Bolt library is living proof that OpenCL is fxxxxx up beyond all repair. It is not meant for humans to write, nor for machines to understand.

Kernels are just character strings!!! This is just so wrong! Forget about using functors and lambdas as kernels, and forget about mixing kernels with templates. You will be better off using Python and PyOpenCL (which is great) than using C or C++. In C++, generating kernels is really hard, and generating kernels from expression templates is insanely hard.

Furthermore, this also means that the kernel language is not typed at all from the host's point of view!! Forgetting a semicolon in your kernel's C code only shows up as a runtime error! Do you want syntax highlighting? Write your kernels in separate files! This is even worse than the way people used to write functors far away from the call site in C++03; at least those were in the same file!
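To illustrate the string problem concretely, here is a sketch of the textual "templating" you end up doing to get generic kernels (a hypothetical helper of mine, not any real API); any typo survives happily until the driver tries to build the source at runtime:

```python
# A "generic" fill kernel, parameterized by splicing a type name into text.
KERNEL_TEMPLATE = """
__kernel void fill(__global {ctype} *out, {ctype} value)
{{
    out[get_global_id(0)] = value;
}}
"""

def make_fill_kernel(ctype):
    # Nothing here checks that `ctype` is a valid OpenCL type. Passing
    # "flaot" still yields a perfectly valid Python string; the mistake
    # only surfaces when clBuildProgram compiles it at runtime.
    return KERNEL_TEMPLATE.format(ctype=ctype)
```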

As stated above, my advice is: don't learn it. Let it die. Your time is better spent learning CUDA/C++AMP and their libraries. The design rules for OpenCL have been "let's not learn anything from OpenGL" + "we need something, this is something, let's standardize this". This has of course resulted in a hilarious language that came after CUDA and was worse in every possible way.


If you're using C++, check out Boost.Compute [1]. It provides a high-level STL-like API for OpenCL (without preventing you from directly using the low-level OpenCL APIs). It simplifies common tasks such as copying data to/from the device and also provides a number of built-in algorithms (e.g. sorting/reducing/transforming, etc).

[1] https://github.com/kylelutz/compute


I started with this[1] blog post, then spent quite a bit of time adding proper error checking to the example code to figure out why it failed :). The author has since merged my changes, so it may now be a worthwhile example to start from. I haven't done much with OpenCL yet, though: in the end I found that my ~7-8 year old laptop ran SIMD-optimized C code faster on the host CPU than on the GPU (I wrote this[2] with heavy SIMD optimization work, though I'm not sure anymore whether that's what I tested against OpenCL), which is one reason why.

[1] http://www.thebigblob.com/getting-started-with-opencl-and-gp... [2] https://github.com/pflanze/mandelbrot/tree/master/c


Perhaps Apple's documentation and WWDC video might be of help: https://developer.apple.com/opencl/

To watch the video you need to be a registered Apple developer.


To clarify -- you can read the documentation and code samples without having to register as an Apple Developer. You only need to do that to watch the tutorial videos or access the developer forums.


One great way to start is to use OpenCL libraries. We work on clMath (https://github.com/clMathLibraries) and ArrayFire (http://arrayfire.com) which are both easy to pick up. Once you get comfortable with libraries, you can start trying to write your own kernels, and you'll know which things you'll need to write that aren't already in a freely available library. Good luck!


Mess around with open-source applications (such as Rodinia benchmark suite and NAS parallel benchmarks) after going through the basic tutorials on AMD, Intel and Nvidia webpages.


A good open-source OpenCL application might be cgminer or bfgminer, though you might have to get an older version; it looks like the latest ones have abandoned OpenCL/GPU support.


Any idea why they abandoned it? CPU-only mining seems to be kinda pointless these days...


Cgminer and bfgminer are focused on usb based ASIC miners now. Bfgminer still supports cpu and gpu mining, it's just unsupported and not compiled by default.


You might wish to check out the University of Illinois's "Heterogeneous Parallel Programming" course, offered through Coursera:

https://www.coursera.org/course/hetero

The course is currently ongoing, but it's not too late to enroll. The course is mainly focused on CUDA (since the professor believes it's easier to learn), but covers OpenCL as well.



