
What’s the state of Wiktionary like in your opinion?



I hadn’t used Wiktionary for a few years, so I just spent some time looking through it. It was pretty good a few years ago, and now it looks even better. I’m sure many people find it very useful. The amount of information on each page, though, might make it a bit intimidating to some users.

It also seems to have some unevenness in coverage. For example, the entry for the word “anecdata” (a word discussed recently at [1]) has five illustrative quotations, which are quite handy [2]. The entry for the more mundane “anecdote,” however, has none [3]. Such unevenness might be inevitable in volunteer dictionary projects, as volunteers like to work on the more interesting words.

[1] https://news.ycombinator.com/item?id=28375767

[2] https://en.wiktionary.org/wiki/anecdata

[3] https://en.wiktionary.org/wiki/anecdote


I use Wiktionary pretty often, and it has come in particularly useful this past week!

We're translating some strings in our software's user interface, and checking the abbreviations and acronyms used. Sometimes there are amusing or [nsfw] connotations in other languages! Thank you Wiktionary for warning us about abbreviating "low pressure" as "LP" in Taiwan.

https://en.wiktionary.org/wiki/LP#Noun_2


I don't use it often enough, either as a user or in my projects, to have a definite opinion. There are some pairs of words (about 5K) in Sino-Vietnamese that come with their chu nom writing, which was very helpful for one of my projects. Otherwise I think it lacks structure and can't easily be harvested automatically (I don't think Wikidata integrates it all, and that website is a non-starter for me). Also, every language is structured differently, so Wiktionary can hardly be commented on as a whole.


https://en.wiktionary.org/wiki/Help:FAQ:

Q: Is it possible to download Wiktionary?

A: Yes. https://dumps.wikimedia.org/enwiktionary/ should have the latest copy of the main namespace. The cleanest navigation page is https://dumps.wikimedia.org/. Just download a -articles.xml.bz2 file and some software to read it (for *nix, for Windows).

Q: Can I use data from Wiktionary in my program?

A: As long as you meet the conditions of the GNU Free Documentation License or Creative Commons Attribution/Share-Alike License, certainly.

Latest dump for English is from September 1. I wouldn’t know whether it has all the data or how easy it is to parse it.
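For anyone who wants to check how easy the dump is to parse, here is a minimal sketch in Python using only the standard library. It assumes the -articles.xml.bz2 file is a standard MediaWiki XML export with <page>, <title> and <text> elements; the function names (iter_pages, iter_dump) are made up for illustration, and I haven't run it against the full September dump.

```python
import bz2
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the XML namespace prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export.

    `source` is a file path or a binary file object. Pages are handled
    one at a time, so the whole dump is never held in memory as text.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the children of the page just processed


def iter_dump(path):
    """Same as iter_pages, but reads a bz2-compressed dump file."""
    with bz2.open(path, "rb") as fh:
        yield from iter_pages(fh)
```

Something like `for title, text in iter_dump("enwiktionary-latest-pages-articles.xml.bz2"): ...` would then walk the entries (the file name here is a guess at what the download is called). What you get back is raw wikitext, so pulling out definitions or tables is still a separate job.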


> Otherwise I think it lacks structure and can't be harvested automatically easily

Indeed, it depends on the language and your goals - I had a very high success rate plucking out Russian grammatical tables from English Wiktionary with a few hours of scripting the data cleaning (https://github.com/thombles/declensions). I have a theory that you could get better results using an offline archive of the page sources but haven't tried this yet.
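As a rough illustration of what working from page sources could look like (untested against real dump data, and not how the declensions repo works): English Wiktionary puts each language under its own level-2 heading such as ==Russian==, so a first step might be to cut the wikitext into per-language sections. The function name language_sections is made up for this sketch.

```python
import re

# Level-2 headings only: "==Russian==" matches, "===Noun===" does not.
LANG_HEADING = re.compile(r"^==([^=\n]+)==[ \t]*$", re.MULTILINE)


def language_sections(wikitext):
    """Map each level-2 heading (language name) to the wikitext below it."""
    sections = {}
    matches = list(LANG_HEADING.finditer(wikitext))
    for i, m in enumerate(matches):
        # A section runs until the next level-2 heading or the end of the page.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[m.group(1).strip()] = wikitext[m.end():end].strip()
    return sections
```

`language_sections(text).get("Russian")` would then give just the Russian part of a page, or None if there isn't one. The grammatical tables themselves are generated by templates, so extracting them from the source would still need template-specific handling.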



