Sure thing, here are two examples:

1) My current project at work is a GPU-accelerated keyword-matching engine. The project was started before I joined the company, so I had no say in the choice of Python. Keywords change infrequently, while we analyze a continuous stream of incoming text. There are several million keywords, ranging from tiny to enormous. Aho-Corasick (http://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_string_mat...) is pretty much the ideal algorithm for this scenario, and it's what our GPU matching kernel implements.

AC requires preprocessing the keywords into a deterministic finite automaton (essentially a keyword trie augmented with suffix/failure links). This is very expensive for millions of keywords with a large total character count: the DFA grows to something like 10GB while being built.
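To make the build cost concrete, here's a minimal pure-Python sketch of the textbook construction (trie insertion, then a breadth-first pass to wire up the failure links) plus the matching loop. This is only an illustration of the algorithm, not our actual GPU kernel; note that every node costs a Python dict plus a set, which hints at why the structure balloons at our scale:

    from collections import deque

    def build_automaton(keywords):
        goto = [{}]        # goto[state][char] -> next state
        fail = [0]         # fail[state] -> longest proper-suffix state
        out  = [set()]     # out[state]  -> keywords ending at this state

        # Phase 1: insert every keyword into a prefix trie.
        for kw in keywords:
            s = 0
            for ch in kw:
                if ch not in goto[s]:
                    goto.append({}); fail.append(0); out.append(set())
                    goto[s][ch] = len(goto) - 1
                s = goto[s][ch]
            out[s].add(kw)

        # Phase 2: breadth-first pass to compute failure links.
        queue = deque(goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, nxt in goto[s].items():
                queue.append(nxt)
                f = fail[s]
                while f and ch not in goto[f]:
                    f = fail[f]
                fail[nxt] = goto[f].get(ch, 0)
                out[nxt] |= out[fail[nxt]]   # inherit suffix matches
        return goto, fail, out

    def search(text, goto, fail, out):
        # Single pass over the input, reporting (start, keyword).
        s, hits = 0, []
        for i, ch in enumerate(text):
            while s and ch not in goto[s]:
                s = fail[s]
            s = goto[s].get(ch, 0)
            for kw in out[s]:
                hits.append((i - len(kw) + 1, kw))
        return hits

    # sorted(search("ushers", *build_automaton(["he", "she", "his", "hers"])))
    # -> [(1, 'she'), (2, 'he'), (2, 'hers')]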

Meanwhile, the main engine loop has to run continuously while keywords are updated in the background. The engine is a service available to other systems on our network, so it uses multiple threads for concurrent I/O. The problem is that the GPU's throughput is so ridiculously high that the CPU can't keep it fed with data. I've profiled it, and this is not a memory-bound problem: the CPU simply cannot keep up with the document streams we send it.

The concurrent I/O threads cannot reasonably be split across processes because they need a shared memory space for the data structures driving the engine. So the background keyword updating is clearly a problem if it runs in the same process as the rest of the engine. I spent a lot of time figuring out how to move the keyword updating into its own process via the multiprocessing module. It's a complete hack to work around the failings of Python (I can go into more depth about the implementation issues if you'd like), and it is why I loathe the GIL.
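If it helps, the shape of the workaround looks roughly like this (a sketch only: load_keywords is a hypothetical loader, and build_automaton is the toy version from the sketch above). The build runs in a child process so it can't stall the engine threads, but the finished structure then has to be pickled back across the process boundary:

    import multiprocessing as mp

    def _rebuild(keyword_path, conn):
        # Child process: the expensive DFA build happens here,
        # outside the engine's address space and GIL.
        keywords = load_keywords(keyword_path)   # hypothetical loader
        conn.send(build_automaton(keywords))     # pickled back to parent
        conn.close()

    def start_rebuild(keyword_path):
        recv_end, send_end = mp.Pipe(duplex=False)
        mp.Process(target=_rebuild, args=(keyword_path, send_end),
                   daemon=True).start()
        return recv_end   # engine polls this, then swaps in the automaton

Serializing a multi-gigabyte automaton through a pipe is itself a serious cost, which is exactly the kind of overhead a shared memory space would eliminate.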

We use Cython for some aspects of the code, but it has yielded very little gain for the keyword updating. It's difficult to rewrite parts of the keyword updater as more optimized Cython because the updater uses some language features that do not seem to be supported in Cython.

2) For a personal project, I need to do a lot of timeseries processing. I'm using Python to prototype, with the intention of either optimizing it eventually or rewriting it in a more suitable language. While working through gigabytes of data, I've found parsing timestamps to be particularly CPU-intensive. Most data I send to a worker process has to be returned in some form eventually, so communication costs are huge. So huge, in fact, that I only see a 10% speedup from splitting the workload evenly across six cores. Profiling reveals that the majority of the "processing" time is actually spent waiting for data to be sent back to the main process. This would not be a problem with a shared memory space.
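To illustrate where the time goes, here's a minimal sketch of the pattern with synthetic data (the six-way split mirrors my six cores; the timestamp format is made up for the example). The parsing itself parallelizes fine, but every worker's output has to be pickled, piped back, and unpickled in the parent:

    import multiprocessing as mp
    from datetime import datetime

    def parse_chunk(lines):
        # CPU-bound work: one datetime object per input line.
        return [datetime.strptime(l[:19], "%Y-%m-%d %H:%M:%S")
                for l in lines]

    if __name__ == "__main__":
        lines = ["2014-06-01 12:34:56,42.0"] * 600000   # synthetic data
        chunks = [lines[i::6] for i in range(6)]
        with mp.Pool(6) as pool:
            # Per-item work is cheap, so shipping the parsed results
            # back to the parent dominates the wall-clock time.
            parsed = pool.map(parse_chunk, chunks)

With a shared heap (i.e., threads that could actually run in parallel), the results would just be references and the transfer cost would vanish.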


Interesting, thanks for the tip! I'll look into it and think about whether it makes sense for my timeseries analysis.



