Hacker News new | past | comments | ask | show | jobs | submit login

> Otherwise I think it lacks structure and can't be harvested automatically easily

Indeed, it depends on the language and your goals - I had a very high success rate plucking out Russian grammatical tables from English Wiktionary with a few hours of scripting the data cleaning (https://github.com/thombles/declensions). I have a theory that you could get better results using an offline archive of the page sources but haven't tried this yet.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: