Hacker News
Natural Language Processing in Python (trackmaven.com)
107 points by fheisler on Dec 2, 2014 | 10 comments



NLTK is a wonderful toolkit. Its selection of corpora is great, and its many utility functions for processing text are incredibly useful and easily extensible.

That being said, a lot of the ML, porter, and stemmer implementations are a bit behind the current cutting edge in the field. If you are interested in using NLTK for serious projects, I highly recommend writing custom implementations of these modules or using other libraries.


It should be noted that NLTK is a library intended for didactic purposes. Efficient implementation and/or state-of-the-art algorithms are not priorities.


Agreed. I also gave a tutorial on this a while back (https://www.youtube.com/watch?v=kKe4M4iSclc), and nltk is a quick way to prototype something neat, but yes, if you need more than "toy" functionality, there are currently better tools for the job.


Seconding the power of NLTK for prototyping. It's very easy to build something quickly to test an approach on your data and, at least in my experience, it is designed in such a way that I could easily replace elements with my own code.
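
To make the "replace elements with my own code" point concrete, here is a minimal sketch of a drop-in stemmer. NLTK's stemmers (for example nltk.stem.PorterStemmer) expose a stem(word) method, so any object with that method can stand in for them. SuffixStemmer and stem_tokens are hypothetical names invented for this illustration, and the suffix rules are deliberately crude, not a real stemming algorithm.

```python
class SuffixStemmer:
    """Hypothetical toy stemmer: strips a few common English suffixes."""

    SUFFIXES = ("ing", "ed", "es", "s")

    def stem(self, word):
        word = word.lower()
        for suffix in self.SUFFIXES:
            # Keep at least three characters so short words are left alone.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return word[: -len(suffix)]
        return word


def stem_tokens(tokens, stemmer):
    """Apply any object with a .stem(word) method to a list of tokens."""
    return [stemmer.stem(token) for token in tokens]


print(stem_tokens(["Parsing", "tagged", "corpora", "models"], SuffixStemmer()))
```

Since stem_tokens only relies on the stem method, passing an NLTK stemmer instance instead of SuffixStemmer should work the same way, which is what makes swapping pieces in and out cheap while prototyping.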


Thanks for the talk, I enjoyed the video. Do you have any good pointers to building a named entity extractor with NLTK?


Thank you, I'm glad you enjoyed it. I hope to have one on more advanced topics at some point. If you are looking for a named entity extractor sample, I have one from my talk on GitHub: https://github.com/shanbady/NLTK-Boston-Python-Meetup/blob/m...

The sample uses the built-in named entity tagger, but nltk also has support for leveraging the Stanford named entity tagger: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanfo...
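
For readers who cannot open the linked sample, the following is a rough sketch of the built-in approach, not the code from the talk. It assumes NLTK 3 and that the tokenizer, tagger, and NE chunker data packages have already been fetched with nltk.download() (package names differ between NLTK versions). The helper names entities_from_tree and extract_entities are made up for this example. The demo at the bottom uses a hand-built tree in the same shape ne_chunk returns, so it runs without any model downloads.

```python
import nltk
from nltk import Tree  # chunk-tree type returned by nltk.ne_chunk


def entities_from_tree(tree):
    """Collect (entity text, label) pairs from a chunk tree."""
    found = []
    for node in tree:
        # Named entities are subtrees; ordinary tokens are plain (word, tag) tuples.
        if isinstance(node, Tree):
            words = " ".join(word for word, _tag in node.leaves())
            found.append((words, node.label()))
    return found


def extract_entities(text):
    """Tokenize, POS-tag, and chunk text with NLTK's built-in models.

    Assumption: the required NLTK data packages are already installed.
    """
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    return entities_from_tree(nltk.ne_chunk(tagged))


# Hand-built tree mimicking ne_chunk output, so no models are needed here.
sample = Tree("S", [
    Tree("PERSON", [("Ada", "NNP"), ("Lovelace", "NNP")]),
    ("visited", "VBD"),
    Tree("GPE", [("London", "NNP")]),
])
print(entities_from_tree(sample))
```

With the data packages installed, extract_entities("some sentence") would run the full tokenize, tag, and chunk pipeline. The labels it returns depend on the bundled model.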


Thanks for the links. Please put me on the list for when a video of your second talk is out.


Which other libraries would you recommend?


There are also Python wrappers for Stanford CoreNLP.


You could look at pattern and TextBlob.

For machine learning, turn to scikit-learn.



