That's the kind of thing I stumble across all the time. Indexing all the symbols in a codebase:

  from collections import Counter
  from pathlib import Path

  here = Path('.')               # directory being indexed
  results = Counter()
  for file in here.glob('*.py'):
      symbols = parse(file)      # parse() = whatever symbol extractor you're using
      results.update(symbols)
Scanning image metadata:

  import png  # pypng

  for image in here.glob('*.png'):
      reader = png.Reader(filename=str(image))
      width, height, rows, info = reader.read()  # info dict carries the metadata
      ...
Now that I think about it, most of my use cases involve doing expensive things to all the files in a directory, but in ways where it'd be really sweet if I could do it all in the same process space instead of using a multiprocessing pool (which is otherwise an excellent way to skin that cat).
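
For concreteness, here's roughly what I'm picturing: a minimal sketch using concurrent.futures threads so everything stays in one process space (same hypothetical parse() as above):

  from collections import Counter
  from concurrent.futures import ThreadPoolExecutor
  from pathlib import Path

  here = Path('.')
  results = Counter()
  with ThreadPoolExecutor() as pool:
      # every worker shares this process, so Counter results merge
      # directly; nothing gets pickled across a pipe
      for symbols in pool.map(parse, here.glob('*.py')):
          results.update(symbols)

With the GIL this only pays off when parse() blocks or releases the GIL; without it, the threads could actually chew through the parsing in parallel.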

I've never let that stop me from getting the job done. There's always a way, and if we can't use tool A, then we'll make tool B work. It'll still be nice if it pans out that decent threading is at least an option.




These are "embarrassingly parallel" examples that multiprocessing is ok for, though. There was always the small caveat that you can't pickle a file handle, but it was never a real problem. Threads are more useful when you have lots of shared state and mutexes.
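
The usual shape, just as a sketch: map over paths rather than open file objects, since a Path pickles fine and a handle doesn't (assuming the parent's hypothetical parse() lives at module level so the workers can import it):

  from collections import Counter
  from multiprocessing import Pool
  from pathlib import Path

  if __name__ == '__main__':
      results = Counter()
      with Pool() as pool:
          # only the paths cross the process boundary; each worker
          # opens its own files inside parse()
          for symbols in pool.map(parse, Path('.').glob('*.py')):
              results.update(symbols)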

I think these examples would also perform well with GIL'd threads, since the actual Python part is just waiting on blocking calls that do the expensive work. But maybe not.


> Threads are more useful if you have lots of shared state and mutexes.

That's what always bites me with these things. If the tasks are truly completely separable, awesome! But they never seem to be as separable as I wish they were.
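
Right, and as soon as the workers have to touch a shared structure you're writing something like this (a sketch; parse() is still the hypothetical helper from upthread):

  import threading
  from collections import Counter

  results = Counter()
  results_lock = threading.Lock()

  def worker(files):
      local = Counter()
      for file in files:
          local.update(parse(file))
      # hold the mutex only for the merge, not once per file
      with results_lock:
          results.update(local)

Aggregating into a local Counter first keeps contention down to one locked merge per worker, but it's still shared state you have to reason about.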



