KGPU - Augmenting Linux with the CUDA GPU (github.com/wbsun)
77 points by anfractuosity on Dec 16, 2012 | 32 comments



Why is this CUDA and not OpenCL? There's no reason to legitimize Nvidia's proprietary nonsense; it just enables their bad behavior.


The GPU code seems to be largely isolated to the memory operations in gpuops.cu. I'm not altogether sure, but a quick review suggests that kind of mapping is supported by OpenCL, so one could rewrite that module and the whole thing would work without CUDA. Of course, on the compiler side it is going to be a while before users can move away from nvcc for Nvidia graphics card support.

However, like the parent, I'd really like to see a generic OpenCL vectorization kernel module. I strongly suspect this work was directly or indirectly underwritten by Nvidia, so I guess someone (Intel?) needs to step up and fund similar academic projects.
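To make the gpuops.cu point concrete: the CUDA calls in question map fairly directly onto the OpenCL host API. A minimal sketch of what the allocation/copy helpers might look like (the function names here are hypothetical, not the actual kgpu symbols, and a context and command queue are assumed to already exist):

    /* Rough OpenCL stand-ins for cudaMalloc/cudaMemcpy-style helpers. */
    #include <CL/cl.h>

    /* Allocate 'size' bytes of device memory (analogous to cudaMalloc). */
    cl_mem gpu_alloc(cl_context ctx, size_t size)
    {
        cl_int err;
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, size, NULL, &err);
        return (err == CL_SUCCESS) ? buf : NULL;
    }

    /* Host-to-device copy (analogous to cudaMemcpyHostToDevice). */
    cl_int gpu_copy_to_device(cl_command_queue q, cl_mem dst,
                              const void *src, size_t size)
    {
        /* Blocking for simplicity; a real port would use non-blocking
         * writes plus events to overlap copies with computation. */
        return clEnqueueWriteBuffer(q, dst, CL_TRUE, 0, size, src, 0, NULL, NULL);
    }

    /* Device-to-host copy (analogous to cudaMemcpyDeviceToHost). */
    cl_int gpu_copy_to_host(cl_command_queue q, void *dst,
                            cl_mem src, size_t size)
    {
        return clEnqueueReadBuffer(q, src, CL_TRUE, 0, size, dst, 0, NULL, NULL);
    }

The pinned/mapped memory side would presumably go through CL_MEM_ALLOC_HOST_PTR and clEnqueueMapBuffer, and the kernels themselves would still need porting from CUDA C to OpenCL C, but the host-side plumbing looks like a mostly one-to-one translation.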


Suspicion confirmed:

"KGPU is a project of the Flux Research Group at the University of Utah. It is supported by NVIDIA through a graduate fellowship awarded to Weibin Sun."

http://code.google.com/p/kgpu/


It really makes you wonder why AMD does not do the same thing; the hardware is essentially free to them, and how much could a fellowship cost? Does anyone know if research grants like this are a tax write-off?



Nonetheless, it's a bad idea to favor proprietary languages that lock you into a particular company's products.

CUDA should have died when OpenCL came about.

Of course, since such lock-in is to Nvidia's benefit, it's understandable why they keep promoting their proprietary solution...


Like what, you mean instantly? What if OpenCL sucked? Should CUDA still die?


What sort of vitriol is this? How dare they invent a new technology and build an API to it.


Didn't you know all good APIs are designed from scratch in open committees, instead of being standardized after multiple competing implementations?


You do NOT want to do this. Unlike the CPU, which can be preempted, current GPUs cannot be. If you give them 2-30 minutes of work to do, they will not return until that work is done. Windows gets around this by resetting the GPU if it doesn't respond for more than a few seconds; Linux and OS X have no such luck, at least not yet.

But once reset, the state of the GPU is often unknown. Not a good thing if you are embedding GPU code in your kernel.


Where is this useful? The bus speed across to the GPU is so slow that I thought it was only meaningful for near-autonomous operations.


Right now, the bandwidth to modern GPUs is actually pretty decent (16 GB/s bidirectional), but the latency is still horrid. This means that you need rather large operations for offloading to pay off. I think doing RAID-5 or full-disk encryption with large blocks might just barely be worth it.

However, with AMD and Intel integrated GPUs, this is about to change. AMD is doing a lot of work on HSA, which can be summarized as "GPU and CPU share the same memory, and can communicate by passing pointers". I can see this kind of work being really useful in the near future.
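To put rough numbers on the latency point above (the bandwidth and latency figures here are illustrative assumptions, not measurements of any particular card):

    /* Back-of-the-envelope: when does the copy time for a block start to
     * dominate the fixed cost of sending work to the GPU at all? */
    #include <stdio.h>

    int main(void)
    {
        const double bw      = 16e9;   /* assumed PCIe bandwidth, bytes/s */
        const double latency = 20e-6;  /* assumed launch + round trip, s  */
        const size_t sizes[] = { 4096, 65536, 1 << 20, 4 << 20 };

        for (int i = 0; i < 4; i++) {
            double copy_us = sizes[i] / bw * 1e6;
            printf("%8zu bytes: copy %7.1f us, fixed overhead %5.1f us\n",
                   sizes[i], copy_us, latency * 1e6);
        }
        return 0;
    }

With those assumptions, a 4 KB block spends nearly two orders of magnitude more time in fixed overhead than in the copy itself, while a 4 MB block mostly hides it; hence the "large blocks" qualifier above.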


More and more CPUs have AES instructions, and my old Lenovo IdeaPad has a crypto coprocessor. Do you think the GPU offload will be worth it when the system has hardware-accelerated crypto?


The question should be whether dedicated hardware for accelerated crypto will be worth it when GPU offload is suitable for it.

Although in that particular context (security), you might enjoy the isolation of dedicated hardware as opposed to sharing the GPU with others. The GPU solution, of course, has the advantage of being able to adapt to new ciphers, etc.


Specialized, single-purpose hardware is right now some 10x more energy-efficient for the same task than a GPU (and some 50x more efficient than a CPU). Given that modern chips are not limited by transistor density but by energy density, we're going to see more special-purpose hardware in our chips, not less.


Adapting/implementing the newest and hottest cipher on the block is not something that the crypto community advocates. Do you really think crypto accelerated hardware is going to fall behind and not support the ciphers that the crypto community (academia/industry) endorses?


I've been waiting for what, a decade, since VIA introduced PadLock, to get hardware-accelerated encryption in mainstream CPUs. And just recently, basic support has been introduced, but only for AES and nothing else (and if I'm not mistaken (I probably am), PadLock is vastly superior to the offerings of AMD and Intel :P).

So yes, crypto-accelerated hardware is behind and does not support the ciphers that the crypto community (academia/industry) endorses, and in all likelihood it will never bother to catch up, since doing it on the GPU will be good enough. Even if it takes another decade.


What algos are you missing?

The AES-NI instruction set was proposed in 2008, and the first Intel CPUs with it started shipping almost three years ago. [1] Soekris has had the vpnXXXX crypto accelerators for as long as I can remember. [2]

[1] http://ark.intel.com/search/advanced/?s=t&AESTech=true [2] http://soekris.com/
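For the curious, those instructions are exposed as compiler intrinsics. A minimal sketch of encrypting one AES-128 block with AES-NI, assuming the 11 round keys have already been expanded (real code also needs the key schedule, built with _mm_aeskeygenassist_si128, and a proper mode of operation rather than a bare block):

    /* One AES-128 block with AES-NI intrinsics; compile with -maes. */
    #include <wmmintrin.h>

    static __m128i aes128_encrypt_block(__m128i block, const __m128i rk[11])
    {
        block = _mm_xor_si128(block, rk[0]);            /* initial whitening */
        for (int i = 1; i < 10; i++)
            block = _mm_aesenc_si128(block, rk[i]);     /* rounds 1-9        */
        return _mm_aesenclast_si128(block, rk[10]);     /* final round       */
    }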


Blowfish or Twofish wouldn't hurt, but I'd be happy with AES; too bad none of my devices has hardware acceleration for it.

The fact that it was only proposed in 2008 is quite telling by itself. And when Intel introduced it, it was in their high-end product lines; it would be a challenge to find a processor where AES-NI is less needed.

A suitable integrated GPU would penetrate the market much better and ultimately reach products which today use the Atom processor: the very same product segment where you can barely use encryption today (in contrast to an i7, which saturates a fast SSD's AES encryption throughput without breaking a sweat, even without hardware acceleration).

A similar solution would also most likely allow me to encrypt files on my phone without a large performance impact, even if the manufacturer couldn't care less about security features.


That depends on which team you're playing for, no?


I am not trying to be difficult, but I have no idea what you are talking about. I looked through your past comments and you seem to be a competent commenter; can you clarify what you meant?


Whether you are attacking the encryption, or whether you're a user of encryption, i.e. a defender.

If you're attacking, having flexibility is advantageous.


I had thought in the past that storing the index of a database (not the data, just the index) on the card and using that to handle complex queries might be interesting. Not sure if that has a practical, real-world use, though.


Not so much RAID5, which is just an XOR operation that is as good as free on a modern CPU, but RAID6 where a more computation-intensive Reed-Solomon code is used.
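To make the contrast concrete: RAID5 parity is just a running XOR across the data stripes, which a modern CPU does at close to memory bandwidth. A sketch of the byte-level idea (a hypothetical helper, not the kernel's actual code, which works on wide vectors rather than single bytes):

    #include <stddef.h>
    #include <stdint.h>

    /* P parity for one stripe: XOR the corresponding bytes of every
     * data chunk across 'ndisks' data disks. */
    void raid5_parity(uint8_t *parity, uint8_t *const *data,
                      size_t ndisks, size_t stripe_len)
    {
        for (size_t i = 0; i < stripe_len; i++) {
            uint8_t p = 0;
            for (size_t d = 0; d < ndisks; d++)
                p ^= data[d][i];
            parity[i] = p;
        }
    }

RAID6's second syndrome multiplies each chunk by a distinct power of a generator in GF(2^8) before XORing, and that multiplication is the part that actually costs CPU cycles.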


One situation where you might see immediate results is high-volume routing; see this project, for example. [1] They were using the GPU to saturate multiple 10GbE interfaces with a commodity processor.

[1] http://shader.kaist.edu/packetshader/


I actually spoke to them about doing a similar thing, and they said they were using a GPU because they were pushing lots of sub-1500-MTU packets, and that commodity hardware could probably saturate multiple 10GbE interfaces on its own if they were just pushing large packets for high throughput.


The README suggests RAID processing, file system encryption, and AES.


Agreed. I'm also struggling to see what kind of massively parallel operations need to be done in kernel space in the first place.


Maybe it doesn't have to saturate the GPU to be worthwhile. If you can just banish some cache-busting, streaming work, like RAID processing, to a tiny sliver of the GPU, it could be a win.


Would this be possible with OpenCL, too?


I wonder whether this would conflict with other user applications using the GPU.


I don't understand :)

Could someone explain what this kind of technology means in practice? Does this mean I can GPU-accelerate my old code?



