Natural Language Processing in Python

andrewguenther · on Dec 2, 2014

NLTK is a wonderful toolkit. Its selection of corpora is great and its many utility functions for processing text are incredibly useful and easily extendable.

That being said, a lot of the ML, porter, and stemmer implementations are a bit behind the current cutting edge in the field. If you are interested in using NLTK for serious projects, I highly recommend writing custom implementations of these modules or using other libraries.

andreasvc · on Dec 2, 2014

It should be noted that NLTK is a library intended for didactic purposes. Efficient implementation and/or state-of-the-art algorithms are not priorities.

acosmism · on Dec 3, 2014

Agreed. I gave a tutorial on this a while back (https://www.youtube.com/watch?v=kKe4M4iSclc), and NLTK is a quick way to prototype something neat, but yes, if you need more than "toy" functionality, there are currently better tools for the job.

ehurrell · on Dec 3, 2014

Seconding the power of NLTK for prototyping: it's very easy to build something quickly to test an approach on your data, and, at least in my experience, it is designed in such a way that I could easily replace elements with my own code.
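
To illustrate the "replace elements with my own code" point, here is a minimal sketch of a prototype whose steps are plain callables. The names `simple_tokenize`, `naive_stem`, and `count_stems` are hypothetical and not part of NLTK, and the stemmer is a crude stand-in rather than the Porter algorithm.

```python
import re


def simple_tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip a few common English suffixes (crude stand-in, not Porter)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_stems(text, tokenize=simple_tokenize, stem=naive_stem):
    """Count stems in text; both steps are callables and can be swapped."""
    counts = {}
    for token in tokenize(text):
        key = stem(token)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With NLTK installed, `nltk.stem.PorterStemmer().stem` could be passed as `stem` (and `nltk.word_tokenize` as `tokenize`, once its tokenizer data is downloaded) without touching the rest of the code, and either could later be swapped for your own implementation.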

bane · on Dec 3, 2014

Thanks for the talk, I enjoyed the video. Do you have any good pointers to building a named entity extractor with NLTK?

acosmism · on Dec 4, 2014

Thank you, I'm glad you enjoyed it - I hope to have one on more advanced topics at some point. If you are looking for a named entity extractor sample, I have a sample from my talk on GitHub: https://github.com/shanbady/NLTK-Boston-Python-Meetup/blob/m...

The sample uses the built-in named entity tagger, but NLTK also supports the Stanford named entity tagger: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanfo...
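
For reference, a rough sketch of what using the built-in tagger can look like. This is not the code from the linked sample; the helper names `entities_from_tree` and `extract_entities` are made up for illustration. It assumes NLTK 3 and that the tokenizer, POS tagger, and NE chunker models have already been fetched with `nltk.download` (the exact resource names depend on the NLTK version).

```python
def entities_from_tree(tree):
    """Collect (label, text) pairs from a chunk tree such as nltk.ne_chunk returns."""
    found = []
    for node in tree:
        # Entity chunks are labelled subtrees; ordinary tokens are (word, tag) tuples.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node.leaves())
            found.append((node.label(), text))
    return found


def extract_entities(sentence):
    """Tokenize, POS-tag, and NE-chunk one sentence with NLTK's built-in models."""
    import nltk  # imported here so the helper above works without NLTK installed

    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    tree = nltk.ne_chunk(tagged)
    return entities_from_tree(tree)
```

Calling `extract_entities` on a sentence should give a list of pairs such as a `PERSON` or `ORGANIZATION` label with the matching text, though what the built-in chunker actually finds will vary with the input.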

bane · on Dec 4, 2014

Thanks for the links. Please put me on the list for when a video of your second talk is out.

rajibsingh · on Dec 2, 2014

Which other libraries would you recommend?

nyir · on Dec 2, 2014

There are also Python wrappers for Stanford CoreNLP.

andreasvc · on Dec 2, 2014

You could look at pattern and TextBlob.

For machine learning, turn to scikit-learn.
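
As a small sketch of what that can look like for text, assuming scikit-learn is installed: a bag-of-words classifier built from `TfidfVectorizer` and `MultinomialNB`. The four training sentences and their labels are invented toy data, there only to make the example run.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, made up for illustration only.
train_texts = [
    "wonderful toolkit with useful utilities",
    "great corpora and useful functions",
    "slow and outdated implementation",
    "outdated stemmer with slow parsing",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Vectorize the text into TF-IDF features, then fit a naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

prediction = model.predict(["a wonderful and useful toolkit"])[0]
print(prediction)
```

Because the pipeline is a single estimator, swapping `MultinomialNB` for another scikit-learn classifier is a one-line change.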