As a credited contributor to the EDICT[1] Japanese/English dictionary, I am very pleased to see its successor JMdict[2] actively supported by this project. Bravo!
And as someone who now also speaks Italian, I am even more pleased to see that Italian support will be added tomorrow.
It is wonderful to see such a useful tool released as an open-source, self-hosted project. (^_^)
Can you please explain what you mean by actively supporting JMdict? I hope I didn't make an attribution mistake or misunderstand something. My understanding is that I can use those files in my project as long as I follow the license guidelines.
It makes me really happy that so many people are interested in it. :)
Sorry for the confusing language choice on my part. I just meant that I think it's great that your project supports JMdict. I think how you are using JMdict is indeed totally okay! :^)
I think the Jellyfin integration could be more than just a niche feature. I've used https://www.languagereactor.com/, but that only supports Netflix & YouTube, which is a bit limiting.
Reasons it's useful:
* If you've got both Native & Target Language subtitles, you can see a natural translation if you're struggling to understand something
* If there isn't a Native translation, then you can machine-translate one - especially useful early on to catch common idioms/etc that aren't just the sum of each individual word.
* Jellyfin also supports eBooks, although its reader isn't great; still, if someone has already built their library, it would be nice to be able to reuse it somehow.
I would be very interested in seeing that particular feature expand, but I don't imagine it's at all simple!
Tangentially related, but I could see some desire for Calibre support as well, somehow. Calibre was very much designed to be completely stand-alone and it doesn't really support other apps trying to read its database, but it is possible.
I'd also really like some language-specific features, like separable-verb handling for German (see this comment: https://news.ycombinator.com/item?id=38915786) - it's relatively important, and lacking support really limits the usefulness of vocab tools. It would also be a nightmare to handle for subtitles, since it's not always clear where a sentence ends, but such is life - subtitles are sadly not aimed at language learners. For books and not-terrible podcast transcripts, though, it wouldn't be so bad.
I thought of it as a niche feature because I expected most users to come from language learning communities, where most people aren't into self-hosting. So even if someone set up a server just for this, chances are they don't have Jellyfin and aren't interested in it either. But I've seen several comments about it, and it seems like a lot of people here are from the self-hosting community, so maybe it's more popular than I thought.
I'm also planning to support YouTube and improve on Jellyfin support, but I'll work on other issues and features first.
Well, part of it is being on Hacker News, which will definitely skew towards "self-host everything!", and on top of that Jellyfin is genuinely free and open-source while the more popular alternative (Plex) isn't, so it's probably extra popular here again, and not necessarily reflective of its popularity amongst self-hosters in general!
I definitely wouldn't expect it to be high on the list of priorities, but I do appreciate that it's under consideration at the very least.
I know Christmas is over, but my letter to Santa would include:
- some Anki sync feature (over an external Anki sync server or any other solution)
- a non-docker install guide
- of course more languages!
I've been looking for a tool to study vocabulary this way, especially in languages I'm already fluent in, to learn more nuances or specific meanings of some words. Having tried several things, I settled on the bookmark feature of my Wiktionary Android apps (Livio's, which are nice) and a small sync/script chain that would let me review words, compare definitions in different dictionaries, choose the best one, edit/complete it, and make an Anki card of it. The whole process was still tedious.
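For the Anki step specifically, scripting it turned out to be the least painful part - something like the genanki Python library can build a deck file directly. A rough sketch of the idea (the IDs, field names, and sample words below are arbitrary placeholders):

```python
# Rough sketch: turn reviewed (word, definition) pairs into an Anki
# deck with genanki. Model/deck IDs and field names are placeholders.
import genanki

model = genanki.Model(
    1607392319,  # arbitrary unique model ID
    'Vocab Card',
    fields=[{'name': 'Word'}, {'name': 'Definition'}],
    templates=[{
        'name': 'Card 1',
        'qfmt': '{{Word}}',
        'afmt': '{{FrontSide}}<hr id="answer">{{Definition}}',
    }],
)

deck = genanki.Deck(2059400110, 'Wiktionary Bookmarks')  # arbitrary deck ID
for word, definition in [('Fernweh', 'longing for faraway places')]:
    deck.add_note(genanki.Note(model=model, fields=[word, definition]))

genanki.Package(deck).write_to_file('bookmarks.apkg')
```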
Very cool! How do you handle segmenting sentences into individual words in Japanese? I've been building a similar app for Android, but gave up on Japanese partly because segmenting was so unreliable.
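For context, what I tried was the usual approach - a morphological analyzer like MeCab, here through the fugashi wrapper. A minimal sketch of what that looks like, though on real-world text the results were often shakier than this makes it seem:

```python
# Sketch: segmenting Japanese with MeCab via the fugashi wrapper
# (pip install 'fugashi[unidic-lite]'). Accuracy depends a lot on
# the dictionary, which is where things got unreliable for me.
from fugashi import Tagger

tagger = Tagger()
for word in tagger('日本語の文章を単語に分割する'):
    # word.surface is the token text; word.feature has POS details
    print(word.surface, word.feature.pos1)
```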
I don’t see a link on that page where I can download the software. (I am exceptionally slow-thinking today, so it may be in a very obvious place and I have overlooked it.)
I've added Chinese. However, I couldn't find a dictionary for it yet, and it might need a custom font for Chinese characters. DeepL works with it as well. If it has issues, I will fix them soon.
I'll release a v0.4 update tomorrow or the day after. It makes a lot of things simpler, so I'd recommend waiting for it before you install. After that update I'll work on Chinese dictionaries and issues; it will take 1-2 days. It will have two built-in dictionaries for Chinese: CC-CEDICT and Wiktionary.
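For anyone curious about the import side: CC-CEDICT is just a plain text file with lines like `中國 中国 [Zhong1 guo2] /China/`, so parsing it is mostly one regex. A rough sketch, not necessarily the exact code that will ship:

```python
# Sketch: parsing CC-CEDICT's plain-text format, where each entry is
#   TRADITIONAL SIMPLIFIED [pin1 yin1] /gloss 1/gloss 2/
import re

ENTRY = re.compile(r'^(\S+) (\S+) \[([^\]]+)\] /(.+)/$')

def parse_cedict(path):
    entries = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            if line.startswith('#'):   # skip header comments
                continue
            m = ENTRY.match(line.strip())
            if m:
                trad, simp, pinyin, glosses = m.groups()
                entries.append((trad, simp, pinyin, glosses.split('/')))
    return entries
```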
I have been working on a similar project on-and-off in my spare time. The only remotely interesting feature that other similar software may not have is that it actually tries to parse/analyze sentences (with an NLP lib). It's made specifically for German, and the reason I wanted to make it is that no existing software managed to handle separable verbs properly - for example, learning "Wir fangen jetzt an." is just wrong if you learn 'fangen' and 'an' separately; dictionary-wise, what you actually care about is 'anfangen'.
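To make the idea concrete - this isn't the exact lib my project uses, but spaCy's German models, for example, mark a separated prefix with the `svp` dependency label, so reattaching it to its verb looks roughly like this:

```python
# Sketch: spaCy's German model labels a separated verb prefix with
# the 'svp' dependency, so the infinitive can be reassembled.
# (pip install spacy && python -m spacy download de_core_news_sm)
import spacy

nlp = spacy.load('de_core_news_sm')
doc = nlp('Wir fangen jetzt an.')

for tok in doc:
    if tok.dep_ == 'svp':
        # prefix + lemma of its head verb: 'an' + 'fangen' -> 'anfangen'
        print(tok.text + tok.head.lemma_)
```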
It unfortunately does have false positives (I believe a complete solution would require LLMs rather than the much less complicated NLP algorithms - I just don't want to send whole books to ChatGPT, as that would quickly become expensive), but I found it usable, so I've now made it public: https://github.com/tenaf0/lwt
I don't want to "advertise" it even more, as the NLP lib is run by academia as a free service, and I don't want to overburden it (I have been planning on hosting it myself, but didn't yet get there).
You have my full support for your project, as I think natural language processing is a very exciting and underutilised technology for language learning. But if you want a low-tech solution, I've found Wiktionary to be ideal. Wiktionary has all the declensions and prefixes for German verbs; to use your example, the entry for anfangen shows the full conjugation, including separated forms like "fängt an".
I chose to add Wiktionary to Kiwix Android (an 8GB download) for offline use. In addition, I can search by right-clicking or tapping and holding on a word. All that information is available because of the (mostly manual) work done by Wiktionary contributors, and it reaches a very high standard. There is usually more digression and explanation in Wiktionary's usage notes than in, say, the Collins German-English dictionary, which is a rather good thing for language learners.
FWIW, English Wiktionary appears (!) to have fewer German words than German Wiktionary. I've run into this trying to extract words from eBooks (then converting them to the "base" form, essentially to de-duplicate). I think it's mostly compound or more niche words, but I imagine you'd still run into them at least occasionally with most written works.
There's a nice project for converting and extracting the data from English Wiktionary into JSON, but it doesn't support any other language editions, AFAIK, which is a bit of a shame but also not very surprising - Wiktionary is a lot more complex, technically, than I expected!
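For the base-form step, a JSON export makes it pretty mechanical. A rough sketch, assuming entries come one per line with `word` and `forms` fields (which is roughly the shape such exports use; the filename is made up):

```python
# Sketch: building an inflected-form -> base-form map from a Wiktionary
# JSON export (one JSON object per line). The 'word'/'forms' field names
# and the filename are assumptions about the export's shape.
import json

form_to_base = {}
with open('wiktionary-de.jsonl', encoding='utf-8') as f:
    for line in f:
        entry = json.loads(line)
        for form in entry.get('forms', []):
            form_to_base.setdefault(form.get('form'), entry['word'])

print(form_to_base.get('fängst'))  # -> 'fangen', if the entry is present
```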
The latter. I'm very definitely not at that level either, but when German words from books couldn't be found on English Wiktionary, I was able to find them on German Wiktionary. One example would be "Weihnachtsfest" - I'm not sure it's "officially" a compound word, though if you know "Weihnacht" and "Fest", then the meaning should be clear. In any case, it shows up as a single word, and trying to "split" words made up of other words is an exercise in insanity.
Another example is "krächzender", which might also give some idea of the particular pains of processing German text. It's not in English Wiktionary, but "krächzen" is, as a verb. So "krächzender" is an adjectival form of the verb, and if you know "krächzen" and the general rules around adjective formation, it would probably be obvious. But would you rely on a computer to parse those rules, or would you want a table with all the declensions laid out? And if you're building a vocab list for a book, is it a separate entry in the list, or does it fall under the verb?
Obviously, German Wiktionary only has definitions & explanations in German so it's not great for beginners, but any tool that's trying to automatically do stuff with German text would likely benefit from using German Wiktionary.
I have no idea if the same holds for other languages, but I wouldn't be surprised if it did for other major languages with large Wikipedia communities (e.g., French or Spanish, but maybe not Chinese).
Interesting! I have a partially-built related tool that extracts "words" from e-books, so I could build flashcard lists and make sure I knew the majority of the words used - most of them are common words, but every book has a decently-sized selection of specialised vocabulary. I did think about trying something fancy with an LLM or NLP for figuring out the separable verbs, but in the end I took a very... brute-force approach: basically grabbing the final word in the "phrase", then prepending it to every word in the phrase one by one and asking "is this a known separable verb?". I'm not sure how well it worked, but that's a different story.
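In rough Python, the brute force looked something like this (`known_verbs` standing in for a set of separable infinitives pulled from a dictionary dump):

```python
# Sketch of the brute-force check described above. known_verbs stands
# in for a set of separable infinitives from a dictionary dump.
known_verbs = {'anfangen', 'aufhören', 'mitkommen'}

def find_separable(clause):
    """Prepend the clause's final word to each earlier word in turn."""
    prefix = clause[-1].lower()            # e.g. 'an'
    for word in clause[:-1]:
        candidate = prefix + word.lower()  # e.g. 'an' + 'fangen'
        if candidate in known_verbs:
            return candidate
    return None

print(find_separable(['Wir', 'fangen', 'jetzt', 'an']))  # -> 'anfangen'
```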
This looks great! One feature request I'd make is to load up two versions of the same text in both your source and target languages to have them displayed side by side. Bonus would be to have the audiobook as well, or some sort of text to speech. Basically, L-R method (Assimil) [1] but for any book!
Looks amazing and I'm keen to try it out. However, I cannot find the sources or setup instructions anywhere on the linked page. Please add at least a link to the GitHub repository to the page so people who stumble upon it can find their way.
I've added Chinese as an "experimental" language. I couldn't find a dictionary for it yet, and it might need a custom font. DeepL works as well. I will fix the font issue soon.
Regarding 2: I noticed that when learning French, once you know how to form proper questions, conversations take a quantum leap.
It also helps to learn the 30 or so most important verbs.
Good luck.
What about writing? With Korean, learning to sound out the characters is usually the first step, even if you don't know what the words mean. I'm trying to learn some Thai within the next 10 months or so, and I've heard it's the same way (learn to read first).
This clearly takes a lot of inspiration from LingQ, but fixes some of LingQ's more glaring problems, such as letting you use a real dictionary instead of relying on definitions crowdsourced from other learners using the app (and therefore full of quality problems and inaccuracies). On the other hand, it sounds like some nice features aren't implemented yet, or maybe not even planned, so LingQ may still be a good option if you don't want the hassle of self-hosting a webapp and hunting down your own resources, and don't mind paying the subscription fee.
Easy importing of lessons from YouTube and Netflix, the built-in libraries of lessons, guidance on what content might be most appropriate for your current level based on known vocabulary, the mobile apps with playlists and an audio player - things like that.
(Disclaimer: I haven't actually used LinguaCafe, but am a longtime LingQ user, so I'm not really making a fair comparison. I know LingQ's feature set much, much better.)
Last time I checked, it couldn't handle expressions that are not just consecutive tokens - for example, German separable verbs. I tried fixing that here: https://news.ycombinator.com/item?id=38915786
If you're shopping around for new ways to learn languages by watching movies/TV, I'm working on another language learning application. I just wrote up the basic features this weekend. [1]
We support many languages out of the box - would love to hear what's making you consider LinguaCafe over LingQ :)
I would be delighted to beta test. I would want to work on my spoken Portuguese so I could interact more naturally with my Brazilian colleagues.
I study French, German, Swedish, Mandarin, Japanese, Portuguese, Latin, dabbling in Polish, though it’s hard to find shows dubbed in Latin :). Someday will get to Russian and maybe learn Icelandic as a way of getting closer to the roots of English… but alas life is not forever.
I’ve written some LLM-based software for generating podcasts (www.anyglot.com, though the server is currently offline). This project showed me that GPT-4 is excellent at generating content in English and at various NLP tasks, but not at translation, which was better left to Google. ElevenLabs voices are fantastic, but their Japanese would invent weird kanji readings - though that was when Multilingual V2 had just come out, so maybe they've fixed it already.
This is incredibly cool. I have been trying to do this exact workflow manually by reading something in Kindle and copy/pasting to DeepL and Anki and it sucks. If the author is here, I'm wondering if you would be open to PRs for other languages? I'd like to try this for French or Italian.
This looks really well thought out. It's honestly something I've thought about trying to put together for myself for my own language learning efforts. I'm looking forward to trying it out.
Looks good! I've been thinking about building something similar, but as a desktop app (and maybe browser extension) that would work with whatever text you have on the screen. It seems doable by (ab)using the OS accessibility APIs. I find it hard to stick to importing texts, reading them in the app, and marking the words. Having something that works in the background and can tell you where you've previously seen words in different contexts would be ideal for me.
Are you familiar with Language Reactor or Migaku? I think there are a couple others too. They're all implemented as browser extensions, but that works out pretty well because most content that's useful for language learning gets accessed through a browser these days, anyway.
I wasn't - they look interesting, thank you, I'll try them out! Indeed the browser is the most important, even though it would be nice to have something generic that works in any app.
Super smart idea. I essentially do this sort of thing manually to expand my vocabulary in the other languages I speak.
What does adding a new language involve other than adding a dictionary? It doesn't seem like there are too many language-specific features at first blush.
Having built a similar app, maybe I can comment on this.
1. Languages that have conjugation (I am -> you are -> she is...) need a way to recognize different forms of the same word, as dictionaries typically don't have this information.
2. In some languages, nearby words dramatically modify a word's meaning (in Spanish quito=I take away, me quito=I take off). The modifier words can appear quite far away in the sentence (especially in German I think).
3. Languages that don't put spaces between words are a nightmare (see the segmentation sketch after this list).
4. Even for languages that do put spaces, the set of characters that act as a separator can differ slightly.
5. Languages like Chinese and Japanese, with their enormous 'alphabets', need UI changes to help learners with the pronunciation of new characters.
6. Fonts and text entry! Does your framework support all 10-jillion Chinese characters and the 5+ different types of on-screen keyboards that people use? Did you remember that Arabic is read right-to-left?
Those are the main challenges I encountered implementing Spanish, French, Chinese and Japanese. I have no idea what new challenges would come up in Finnish, Hindi, Swahili...
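To make point 3 concrete: for Chinese, one common workaround is a statistical segmenter such as jieba - a minimal sketch, with the exact split depending on the model:

```python
# Sketch: statistical word segmentation for Chinese (which has no
# spaces) using jieba (pip install jieba).
import jieba

print(jieba.lcut('我喜欢学习中文'))
# e.g. ['我', '喜欢', '学习', '中文']
```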
Looks great! By the way, I use Apple Books (this also works on Kindle) to do something similar - if you press and hold to highlight a section of text, you can translate it, which has done wonders for building my vocabulary in context.
Looks really promising! Well done. Would love to host an installation for our local learning community. Hopefully we’ll get multiple user accounts soon.
It’s great that you can track your progress in this app!
When I was learning German, I used the dictionary lookup on Kindle a lot and made a web app to extract that vocabulary as Anki flashcards. It’s available on https://fluentcards.com. The code is open source on GitHub.
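In case it's useful to anyone: the Kindle keeps those lookups in a SQLite file, vocab.db, under system/vocabulary/ on the device. A minimal sketch of reading it, assuming the commonly documented schema:

```python
# Sketch: reading Kindle dictionary lookups from vocab.db (found under
# system/vocabulary/ on the device). Table and column names follow the
# commonly documented schema, so treat them as assumptions.
import sqlite3

conn = sqlite3.connect('vocab.db')
rows = conn.execute(
    'SELECT w.word, w.stem, l.usage '
    'FROM LOOKUPS l JOIN WORDS w ON w.id = l.word_key'
)
for word, stem, usage in rows:
    print(f'{stem}\t{usage}')  # stem + example sentence -> flashcard
```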
Being the resident licensing pedant, I'll point out that neither of those repos has any licensing information aside from package.json, and I gravely doubt that's strong enough for any contributor's comfort level.
It specifically refers to client/server software where the server-side component is run within one's own server environment, and usually describes FOSS alternatives to SaaS webapps.
Normal desktop applications aren't self-hosted, as they aren't hosted in this sense to begin with.
I didn't think this many people would be interested. I'll write a guide for Jellyfin, then add Italian, French and Dutch languages tomorrow.