Thanks for working on that! Didn't know it was so bad. The following is a possibly stupid idea, but I'd like to hear your thoughts:
What if you just rendered the content into HTML, "screen scraped" the text, and then converted it into a more useful format (Markdown, JSON, etc.)? Is that plausible?
That would let a basic UI change on Wikipedia break your code. Scraping is sometimes necessary, but in my experience it's usually not the best option, and it's pretty annoying to do.
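To make the fragility concrete, here's a rough sketch of the scrape-then-convert idea using only Python's standard library. The sample markup, the `content` class name, and the `ParagraphScraper` / `scrape` names are all made up for illustration; this is not Wikipedia's real page structure.

```python
import json
from html.parser import HTMLParser


class ParagraphScraper(HTMLParser):
    """Collects the text of every <p> inside <div class="content">.

    The "content" class is a hypothetical selector, not Wikipedia's actual markup.
    """

    def __init__(self):
        super().__init__()
        self.div_depth = 0          # > 0 while inside the target div
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # Enter the target div, or track nested divs once inside it.
            if self.div_depth or ("class", "content") in attrs:
                self.div_depth += 1
        elif tag == "p" and self.div_depth:
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def scrape(html):
    """Return the non-empty paragraph texts found in the target div."""
    parser = ParagraphScraper()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]


# Made-up "before" and "after" markup: same text, only the wrapper class renamed.
OLD_LAYOUT = '<div class="content"><p>Hello <b>world</b>.</p><p>Second.</p></div>'
NEW_LAYOUT = '<div class="article-body"><p>Hello <b>world</b>.</p><p>Second.</p></div>'

print(json.dumps({"paragraphs": scrape(OLD_LAYOUT)}))
print(json.dumps({"paragraphs": scrape(NEW_LAYOUT)}))
```

The first call finds both paragraphs; the second finds nothing, and it does so silently. That's the failure mode I mean: a cosmetic rename on their side and your output is empty without any error.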
You can get what amounts to an HTML dump (indexed and compressed into a single huge archive) from Kiwix, although they only publish one about twice a year.