LinguaCafe: Self-hosted software for language learners to read foreign languages

simjanos-dev · on Jan 8, 2024

Hey guys. I wrote LinguaCafe. I didn't know it was posted here, I've just read it now.

I didn't think this many people would be interested. I'll write a guide for Jellyfin, then add Italian, French and Dutch languages tomorrow.

justinmayer · on Jan 8, 2024

As a credited contributor to the EDICT[1] Japanese/English dictionary, I am very pleased to see its successor JMdict[2] actively supported by this project. Bravo!

And as someone who now also speaks Italian, I am even more pleased to see that Italian support will be added tomorrow.

It is wonderful to see such a useful tool released as an open-source, self-hosted project. (^_^)

[1] EDICT: http://edrdg.org/jmdict/edict_doc_2009.html

[2] JMdict: https://en.wikipedia.org/wiki/JMdict

simjanos-dev · on Jan 8, 2024

Hi!

Thank you for helping me learn Japanese! :)

Can you please explain what do you mean by actively supporting JMDict? I hope I didn't make an attribution mistake, or misunderstood something. My understanding is that I can use those files in my project as long as I follow the license guidelines.

It makes me really happy that so many people are interested in it. :)

justinmayer · on Jan 8, 2024

Sorry for the confusing language choice on my part. I just meant that I think it's great that your project supports JMdict. I think how you are using JMdict is indeed totally okay! :^)

simjanos-dev · on Jan 8, 2024

Oh, okay. Thank you!

ipsi · on Jan 8, 2024

I think the Jellyfin integration could be more than just a niche feature. I've used https://www.languagereactor.com/, but that only supports Netflix & YouTube, which is a bit limiting.

Reasons it's useful: * If you've got both Native & Target Language subtitles, you can see a natural translation if you're struggling to understand something * If there isn't a Native translation, then you can machine-translate one - especially useful early on to catch common idioms/etc that aren't just the sum of each individual word. * Jellyfin also supports eBooks, although its reader isn't great - but if someone has already built their library, it would be nice to be able to re-use it somehow.

I would be very interested in seeing that particular feature expand, but I don't imagine it's at all simple!

Tangentially related, but I could see some desire for Calibre support as well, somehow. Calibre was very much designed to be completely stand-alone and it doesn't really support other apps trying to read its database, but it is possible.

I'd also really like some language-specific features, like separable-verb handling for German (see this comment: https://news.ycombinator.com/item?id=38915786) - it's relatively important and lacking support really limits the usefulness of vocab tools. It would also be a nightmare to handle for subtitles, since it's not always clear where a sentence ends, but such is life - subtitles are sadly not aimed at language leaners. For books and not-terrible Podcast transcripts, though, it wouldn't be so bad.

simjanos-dev · on Jan 9, 2024

Hi!

I thought of it as a niche feature because I thought most of the users would come from language learning communities, where most people are not into self-hosting. So even if someone would set up a server just for this, chances are they do not have or interested in Jellyfin also. But I've seen several comments about it, and it seems like a lot of people are from the self-hosting community so maybe it's more popular.

I'm also planning to support YouTube and improve on Jellyfin support, but I'll work on other issues and features first.

ipsi · on Jan 10, 2024

Well, part of it is being on Hacker News, which will definitely skew towards "self-host everything!", and on top of that Jellyfin is genuinely free and open-source while the more popular alternative (Plex) isn't, so probably more popular here again, and not necessarily reflective of the popularity amongst self-hosters in general!

I definitely wouldn't expect it to be high on the list of priorities, but I do appreciate that it's under consideration at the very least.

pm3003 · on Jan 8, 2024

Seems great, I'll test it soon !

I know Christmas is over, but my letter to Santa would include: - some Anki sync feature (over an external Anki sync server or any other solution) - a non-docker install guide - of course more languages!

I've been looking for a tool to study vocabulary this way, especially in languages I'm already fluent in, to learn more nuances or specific meanings to some words. Having tried several things I settled on the bookmark feature of my Wiktionary Android apps (Livio's, which are nice), and a small sync/script chain that would let me review words, compare definitions in different dictionaries, choose the best and edit/complete it, and make an Anki card of it. The whole process was still tedious.

qnleigh · on Jan 8, 2024

Very cool! How do you handle segmenting sentences into individual words in Japanese? I've been building a similar app for Android, but gave up on Japanese partly because segmenting was so unreliable.

simjanos-dev · on Jan 8, 2024

Hi! Thank you so much. I am using Spacy tokenizer with python.

jdeisenberg · on Jan 8, 2024

I don’t see a link on that page where I can download the software. (I am exceptionally slow-thinking today, so it may be in a very obvious place and I have overlooked it.)

BigElephant · on Jan 8, 2024

https://github.com/simjanos-dev/LinguaCafe?tab=readme-ov-fil...

nexawave-ai · on Jan 8, 2024

This is very cool. Yes, please add French and Dutch. Dankjewel!

simjanos-dev · on Jan 9, 2024

Hi!

French and Dutch are added.

kegs_ · on Jan 8, 2024

How are you using the service on a boox tablet? Follow-up, what kind of battery draw does it have on the tablet?

simjanos-dev · on Jan 8, 2024

I installed it on a PC, and access it from my tablet's browser. I do not know how much battery draw it has.

Beijinger · on Jan 8, 2024

Chinese please.

simjanos-dev · on Jan 9, 2024

Hi!

I've added Chinese. However i couldn't find a dictionary for it yet, and it might need a custom font for Chinese characters. DeepL works with it as well. If it has issues, I will fix it soon.

Beijinger · on Jan 14, 2024

Thank you so much. There is also Perapeara Dictionary Plug in for Firefox. Maybe can be exctracted.

simjanos-dev · on Jan 14, 2024

Hi!

I'll release a v0.4 update tomorrow or after. It makes a lot of things more simpler, I would recommend to wait for it before you install it. After that update I'll work on Chinese dictionaries and issues. It will take 1-2 days. It will have two built-in dictionaries for Chinese: cc-cedict and wiktionary.

allan_s · on Jan 9, 2024

I m on my phone but hou have cedict for chinese-english as well as cfdict for chinese to french

tenaf0 · on Jan 8, 2024

I have been working on a similar project on-and-off in my spare time, the only remotely interesting feature that other similar software may not have is that it actually tries to parse/analyze sentences (with an NLP lib). It's made specifically for German, and the reason why I wanted to make it is that no existing software managed to handle separable verbs properly - for example learning "Wir fangen jetzt an." is just wrong if you learn it as 'fangen' and 'an' separately, you actually care about 'anfangen', dictionary-wise.

It unfortunately does have false-positives (a complete solution would require LLMs, I believe over the much less complicated NLP algorithms - I just don't want to send whole books to ChatGPT, as that would quickly become expensive), but I found it usable, so I made it public now: https://github.com/tenaf0/lwt

I don't want to "advertise" it even more, as the NLP lib is run by academia as a free service, and I don't want to overburden it (I have been planning on hosting it myself, but didn't yet get there).

seabass-labrax · on Jan 8, 2024

You have my full support for your project, as I think natural language processing is a very exciting and underutilised technology for language learning. But if you want a low-tech solution, I've found Wiktionary to be ideal. Wiktionary has all the declensions and prefixes for German verbs; to use your example:

https://en.wiktionary.org/wiki/f%C3%A4ngt_an

tells you what the word is, and gives a link back to:

https://en.wiktionary.org/wiki/anfangen#German

I chose to add Wiktionary to Kiwix Android (8GB download) for offline use. In addition, I can search by right-clicking or tap+holding on a word. All that information is available because of the (mostly manual) work done by Wiktionary contributors, but it reaches a very high standard. There is usually more digression and explanation for the usage notes in Wiktionary than, say, Collins German-English dictionary, which is a rather good thing for language learners.

ipsi · on Jan 9, 2024

FWIW, English Wikitionary (appears to!) have fewer words than German Wiktionary. I've run into this trying to extract words from eBooks (then converting to the "base" form, to essentially de-duplicate). I think it's mostly compound or more niche words, but I imagine you'd still run into them at least occasionally with most written works.

There's a nice project for converting and extracting the data from English Wiktionary into JSON but it doesn't support any other languages, AFAIK, which is a bit of a shame but also not very surprising - Wiktionary is a lot more complex, technically, than I expected!

seabass-labrax · on Jan 9, 2024

Interesting to hear that - I'm still at the level of German where I wouldn't know what I'm missing. For clarification: are you saying that:

- the English Wiktionary has fewer English words than the German Wiktionary has German words, or

- the English Wiktionary has fewer German words than the German Wiktionary does?

ipsi · on Jan 10, 2024

The latter. I'm very definitely not at that level either, but looking at German words from books that couldn't be found on English Wiktionary, I was able to find them on German Wiktionary. One example would be "Weihnachtsfest" - not sure it's "officially" a compound word, though if you know "Weihnacht" and "Fest", then the meaning should be clear. In any case, it shows up as a single word and trying to "split" words made up of other words is an exercise in insanity.

Another example is "krächzender", which might also serve to give some idea of the particular pains in processing German text. It's not in English Wiktionary, but krächzen is, and is a verb. So "krächzender" is the adjectival form of the verb, and if you know "krächzen" and the general rules around adjective formation it would probably be obvious. But would you rely on a computer to parse those rules, or would you want a table with all the declensions laid out? And if you're building a vocab list for a book, is it a separate entry in the list, or does it fall under the verb?

Obviously, German Wiktionary only has definitions & explanations in German so it's not great for beginners, but any tool that's trying to automatically do stuff with German text would likely benefit from using German Wiktionary.

I have no idea if it's true for other languages, but I wouldn't be surprised if it's also true for other major languages spoken by Wikipedia users (e.g., French, Spanish, but maybe not Chinese).

ipsi · on Jan 8, 2024

Interesting! I have a partially-built, related, tool, to extract "words" from e-books, so I could build flashcard lists and make sure I knew the majority of words that were used - most of them would be common words but every book has a decently-sized selection of specialised vocabulary. I did think about trying to get something fancy done with an LLM or an NLP for figuring out the separable verbs, but in the end, I took a very... brute-force approach, basically grabbing the final word in the "phrase", then prepending that to every word in the phrase one by one and asking "is this a known separable verb?" - I'm not sure how well it worked, but that's a different story.

jchook · on Jan 8, 2024

You could potentially use an NLP library like SpaCy, or even bundle with a free fine-tuned LLM like Mistral 7b.

The fine-tuned mistral models are known to out-perform GPT-4 on their specific tasks.

anadem · on Jan 8, 2024

I'd love to try it, but is there a way to get it? Maybe I'm missing a link but I can't figure out a way to try it.

Ah, found, it's here: https://github.com/simjanos-dev/LinguaCafe

i_am_a_squirrel · on Jan 8, 2024

+1 OP should add a link!

mtalantikite · on Jan 8, 2024

This looks great! One feature request I'd make is to load up two versions of the same text in both your source and target languages to have them displayed side by side. Bonus would be to have the audiobook as well, or some sort of text to speech. Basically, L-R method (Assimil) [1] but for any book!

[1] https://learnanylanguage.fandom.com/wiki/Listening-Reading_M...

seusscat · on Jan 8, 2024

Looks amazing and I'm keen to try it out. However, I cannot find the sources or set up instructions anywhere on the linked page. Please add at least a link to the github repository to the page so people that stumble upon it can find their way

yellow_lead · on Jan 8, 2024

Looks great. I would love to give it a try if it had Chinese support. The japanese support looks good though, maybe another reason to try learning.

simjanos-dev · on Jan 9, 2024

Hi!

I've added Chinese as an "experimental" language. I couldn't find a dictionary for it yet, and it might need a custom font type. DeepL works as well. I will fix the font issue soon.

yellow_lead · on Jan 9, 2024

Thanks! I'll give it a try. There's some various dictionaries I can recommend later, maybe in a GitHub issue.

zerop · on Jan 8, 2024

My method of learning new languages is always by starting to learn everyday conversations in the language.

1. Learn the translation of the commonly used everyday words

2. Learn the rule to build sentences in different tenses (Verb conjugation)

3. Keep practising in everyday conversations, starting with most simple ones and gradually learn more.

Tepix · on Jan 8, 2024

Regarding 2. i noticed that when learning french, once you know how to form proper questions, conversations take a quantum leap. It also helps to learn the 30ish verbs that are the most important. Good luck.

novagameco · on Jan 8, 2024

What about writing? With Korean, learning to sound out the characters is usually the first step, even if you don't know what the words mean. I'm trying to learn some Thai within the next 10 months or so and I heard it's the same way (learn to read first)

bunderbunder · on Jan 8, 2024

Very nice.

This clearly takes a lot of inspiration from LingQ, but fixes some of LingQ's more glaring challenges such as letting you use a real dictionary, instead of relying on definitions that were crowdsourced from other learners using the app. (And therefore full of quality problems an inaccuracies.) On the other hand, it sounds like some nice features aren't implemented yet, or maybe not even planned, so maybe LingQ is still a good option if you don't want to hassle with self-hosting a webapp or hunting down your own resources, and don't mind paying the subscription fee.

All in all, though, it looks very promising!

barrell · on Jan 8, 2024

I’d be curious to hear what niceties you feel LingQ has that it’s missing if you don’t mind sharing

bunderbunder · on Jan 8, 2024

Easy importing of lessons from YouTube and Netflix, the built-in libraries of lessons, guidance on what content might be most appropriate to your current level based on known vocabulary, the mobile apps with playlists and audio player, things like that.

(Disclaimer: I haven't actually used LinguaCafe, but am a longtime LingQ user, so I'm not really making a fair comparison. I know LingQ's feature set much, much better.)

tenaf0 · on Jan 8, 2024

Last time I checked, it couldn’t handle expressions that are not just tokens one after the other. For example, German separable verbs. I tried fixing it here: https://news.ycombinator.com/item?id=38915786

tenaf0 · on Jan 8, 2024

(Misunderstood the question, please ignore my above comment)

gstamp · on Jan 9, 2024

Importing audio along with the text is a big one. Failing that some sort of TTS would be very useful.

yurishimo · on Jan 8, 2024

This looks sweet! The jellyfin integration especially looks awesome as I find watching videos an excellent way to actively absorb new vocabulary.

My current self study centers around movies/tv and Linq, which this tool seems very similar to.

I'm learning Dutch, so it's a bummer that it's not supported currently, but I'm keen to dig in and see how much effort it takes to add a new language.

barrell · on Jan 8, 2024

If you're shopping around for new ways to learn languages from watching movies/tv, I'm working on another language learning application. I just wrote up the basic features this weekend [1]

We support many languages out of the box, would love to hear what's making you consider LinguaCafe over LingQ :)

[1] https://blog.phrasing.app/phrasing-first-look/

BigElephant · on Jan 8, 2024

Hello, when do you plan on launching the beta version?

barrell · on Jan 9, 2024

I'm onboarding my second cohort of beta applicants this week, I hope to have a public beta by the end of the quarter or so!

I noticed a lot of signups from this post, I'll try to do another round of onboarding this month.

clbrmbr · on Jan 9, 2024

I would be delighted to beta test. I would want to work on my spoken Portuguese so I could interact more naturally with my Brazilian colleagues.

I study French, German, Swedish, Mandarin, Japanese, Portuguese, Latin, dabbling in Polish, though it’s hard to find shows dubbed in Latin :). Someday will get to Russian and maybe learn Icelandic as a way of getting closer to the roots of English… but alas life is not forever.

I’ve written some LLM-based software for generating podcasts (www.anyglot.com, but the server currently offline). This project showed me that GPT4 is excellent at generating content in English, and in doing various NLP tasks but not translation which was better left for Google. ElevenLabs voices are fantastic but their Japanese would invent weird Kanji readings, tho that was when Multilingual V2 just came out so maybe they’ve fixed that already.

barrell · on Jan 16, 2024

If you apply for the beta, you should get an invite sometime this month :) or feel free to reach out to me directly ben [at] phrasing [dot] app

seusscat · on Jan 8, 2024

I agree there. The Jellyfin integration here is the absolute killer feature. I hope to see the first documentation on how to set it up soon.

simjanos-dev · on Jan 9, 2024

Hi!

I've added Dutch.

burkaman · on Jan 8, 2024

This is incredibly cool. I have been trying to do this exact workflow manually by reading something in Kindle and copy/pasting to DeepL and Anki and it sucks. If the author is here, I'm wondering if you would be open to PRs for other languages? I'd like to try this for French or Italian.

simjanos-dev · on Jan 9, 2024

Hi!

I've added French and Italian.

burkaman · on Jan 9, 2024

Amazing, thank you! I'll try it out later this week.

mionhe · on Jan 8, 2024

This looks like a well thought out. It's honestly something I've thought about trying to put together for myself due my own language learning effort. I'm looking forward to trying it out.

tracnar · on Jan 8, 2024

Looks good! I've been thinking about building something similar but as a desktop app (and maybe browser extension) which would work with whatever text you have on the screen. It seems doable by (ab)using the OS accessibility APIs. I find it hard to stick to importing text, reading them in the app, and marking the words. Having something which works in the background and can tell you where you've previously seen words in different contexts would be ideal for me.

bunderbunder · on Jan 8, 2024

Are you familiar with Language Reactor or Migaku? I think there are a couple others too. They're all implemented as browser extensions, but that works out pretty well because most content that's useful for language learning gets accessed through a browser these days, anyway.

tracnar · on Jan 8, 2024

I wasn't, they look interesting thank you, I'll try it out! Indeed the browser is the most important, even though it would be nice to have something generic for any app.

wahnfrieden · on Jan 8, 2024

My Manabi Reader app is a browser

https://reader.manabi.io

wahnfrieden · on Jan 8, 2024

I made an iOS / macOS app with similar functionality, Manabi Reader. It has its own flashcard companion app and also integrates with Anki.

https://reader.manabi.io

Japanese only but I am expanding it to more languages early this year.

jszymborski · on Jan 9, 2024

Super smart idea. I ostensibly do this sort of thing manually to expand my vocabulary in other languages I speak.

What does adding a new language involve other than adding a dictionary? It doesn't seem like there are too many language-specific features at first blush.

qnleigh · on Jan 11, 2024

Having built a similar app, maybe I can comment on this.

1. Languages that have conjugation (I am -> you are -> she is...) need a way to recognize different forms of the same word, as dictionaries typically don't have this information. 2. In some languages, nearby words dramatically modify a word's meaning (in Spanish quito=I take away, me quito=I take off). The modifier words can appear quite far away in the sentence (especially in German I think). 3. Languages that don't put spaces between words are a nightmare. 4. Even for languages that do put spaces, the set of characters that act as a separator can differ slightly. 4. Languages like Chinese and Japanese with their enourmous 'alphabets' need UI changes to help learners with pronounciation of new characters. 5. Fonts and text entry! Does your framework support all 10-jillion Chinese characters and the 5+ different types of on-screen keyboards that people use? Did you remember that Arabic is read right-to-left?

Those are all of the main challenges I encounted implementing Spanish, French, Chinese and Japanese. I have no idea what new challenges would come up in Finish, Hindi, Swahili...

outside1234 · on Jan 8, 2024

Looks great! By the way, I use Apple Books (also works in Kindle) to do something similar - if you press and highlight a section of text, you can translate it, which has done wonders for building my vocabulary in context.

znjxz · on Jan 10, 2024

Thank you very much. I think this project would be great for my son who just started elementary school to learn English. Please add English.

qnleigh · on Jan 11, 2024

What's his native language? One would need a dictionary going from English into that.

z3n0n · on Jan 8, 2024

Looks really promising! Well done. Would love to host an installation for our local learning community. Hopefully we’ll get multiple user accounts soon.

huimang · on Jan 8, 2024

This is something I've thought about building for a while.

I would definitely use it if there were a Korean option!

simjanos-dev · on Jan 9, 2024

Hi!

I've added Korean as an "experimental" language. You can try it out if you would like to.

AlchemistCamp · on Jan 9, 2024

Interesting. This looks like an open source version of Lingq, minus the content library.

katspaugh · on Jan 8, 2024

Nice work!

It’s great that you can track your progress in this app!

When I was learning German, I used the dictionary lookup on Kindle a lot and made a web app to extract that vocabulary as Anki flashcards. It’s available on https://fluentcards.com. The code is open source on GitHub.

mdaniel · on Jan 8, 2024

because I didn't see any obvious link to said repo, for convenience: https://github.com/katspaugh/fluentcards and https://github.com/katspaugh/fluentcards-grammar

Being the resident licensing pedant, I'll point out that neither of those repos have any licensing information aside from package.json and I doubt gravely that's strong enough for any contributor's comfort level

katspaugh · on Jan 9, 2024

Added licenses, thanks!

acheong08 · on Jan 8, 2024

Nice! I was just trying to learn Japanese this week but Duolingo is painful

CyberDildonics · on Jan 8, 2024

Wouldn't "self hosted software" just be software?

s0ss · on Jan 8, 2024

Sure. But the clarifying words are useful in this context. It’s software; a web app that you can host yourself.

Gormo · on Jan 8, 2024

It specifically refers to client/server software where the server-side component is run within one's own server environment, and usually describes FOSS alternatives to SaaS webapps.

Normal desktop applications aren't self-hosted, as they aren't hosted in this sense to begin with.

davely · on Jan 8, 2024

I found the additional context to be helpful in quickly understanding how / where I could run it.