Slightly tangential, but I'd like to shamelessly plug my HDFS client library (with nice Python bindings)[0].
If you want to access files on HDFS from your Python tasks, beyond the usual map / shuffle inputs you get from the streaming API, it might come in handy. It doesn't go through the JVM (the library is written in C), so it might save a little latency for short Python tasks.
The main thing you need to do is publish your Python package to PyPI. This should fill in the gaps in your Python packaging knowledge: http://guide.python-distribute.org/
[0]: https://github.com/cemeyer/hadoofus
Also, I'm pretty new to publishing my own open source libraries. If people would be so kind, I'd love some constructive criticism. Thanks HN!