
I was very actively involved in MediaWiki development & Wikimedia ops (less so the latter) in 2004-2006, back when IIRC there were just 1-4 paid Wikimedia employees.

It was a very different time, and the whole thing was run much more like a typical open source project.

I think the whole project has gone in completely the wrong direction since then. Wikipedia itself is still awesome, but what's not awesome is that the typical reader / contributor experience is pretty much the same as it was in 2005.

Moreover, because of the growing number of employees & the need for revenue, the foundation's main goal is still to host a centrally maintained site that must get your pageviews & donations directly.

The goal of Wikipedia should be to spread the content as far & wide as possible, the way OpenStreetMap operates is a much better model. Success should be measured as a function of how likely any given person is to see factually accurate content sourced from Wikipedia, and it shouldn't matter if they're viewing it on some third party site.

Instead it's still run as one massive monolithic website, and it's still hard to get any sort of machine readable data out of it. This IMO should have been the main focus of Wikimedia's development efforts.




> the foundation's main goal is still to host a centrally maintained site

The Wikimedia universe is way bigger than one site. There's Wikidata, Commons, Wikisource, Wiktionary, Wikivoyage, Wikibooks and so on. And there are a lot of language versions too - English is not the only way to store knowledge, you know.

> The goal of Wikipedia should be to spread the content as far & wide as possible

That requires a) creating the content and b) presenting the content in a form consumable by users. Creating tools for this is far from trivial, especially if you want it to be consumable and not just an unpalatable infodump accessible only to the most determined.

> Instead it's still run as one massive monolithic website

This is not accurate. A lot has changed since 2004. It's not one monolithic website, it's a constellation of dozens, if not hundreds, of communities. They are using common technical infrastructure (datacenters, operations, etc.) and common software (MediaWiki, plus a myriad of extensions for specific projects), but they are separate sites and separate communities, united by the common goal of making knowledge available to everyone.

> it's still hard to get any sort of machine readable data out of it

Please check out Wikidata. This is literally the goal of the project. You may also be interested in the "structured Commons" and "structured Wiktionary" projects, both in active development as we speak.

> This IMO should have been the main focus of Wikimedia's development efforts.

It is - one of the focuses; for a project of this size, there are always several directions. BTW, right now Wikimedia is in the process of formulating a movement strategy, with active community participation. You are welcome to contribute: https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/...

Disclosure: working for WMF, not speaking for anybody but myself.


> The goal of Wikipedia should be to spread the content as far & wide as possible

>> That requires a) creating the content and b) presenting the content in a form consumable by users. Creating tools for this is far from trivial, especially if you want it to be consumable and not just an unpalatable infodump accessible only to the most determined.

Yes, and as emphasized in the article, WMF has done a terrible job at building better tools. For crying out loud, we are still typing in by hand the complete bibliographic information for each cited reference.

Your other comments are similar. The fact that "WMF is trying", or has a named task force whose formal mission includes a complaint, is not enough to justify years of high spending.


> Yes, and as emphasized in the article, WMF has done a terrible job at building better tools.

I respectfully disagree. I think WMF has done a pretty good job. Could it be better? Of course, everything could. Is it "terrible"? Not even close.

> For crying out loud, we are still typing in by hand the complete bibliographic information for each cited reference.

https://www.mediawiki.org/wiki/Citoid ? In any case "it misses my pet feature" and "the whole multi-year effort is terrible" are not exactly the same thing.
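
For what it's worth, Citoid is also exposed over Wikipedia's public REST API, so the lookup VisualEditor does can be scripted. A minimal Python sketch, assuming the /api/rest_v1/data/citation/ endpoint; the exact path and response fields are my understanding of the documented interface and should be checked against the current API docs:

    import requests
    from urllib.parse import quote

    # Ask Citoid (via Wikipedia's REST API) for citation metadata for a DOI or URL.
    # Endpoint path and response shape are assumptions based on the public docs;
    # verify against https://en.wikipedia.org/api/rest_v1/ before relying on this.
    target = quote("https://doi.org/10.1000/182", safe="")
    url = "https://en.wikipedia.org/api/rest_v1/data/citation/mediawiki/" + target
    resp = requests.get(url, headers={"User-Agent": "citoid-example/0.1"})
    for citation in resp.json():          # a list of citation objects
        print(citation.get("title"), citation.get("date"))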

> is not enough to justify years of high spending.

I think the work that has been done and is being done justifies it. All this work is publicly documented. If you think it's too much and you have ideas about how to do it better, you're welcome to comment. I cannot comment on your value judgements - you may feel some projects are more valuable and not being done, and you are entitled to that. There's a process which gets some things done and leaves some things out, and by its nature not everybody will be satisfied. I only want to correct completely factually false claims in the op-ed, and I believe I have done so. If I can help with more information, you are welcome to ask. As for value judgements, I think we'll have to agree to disagree here.


> In any case "it misses my pet feature" and "the whole multi-year effort is terrible" are not exactly the same thing.

It's clear from context that this is just an example. The issues with the Wikipedia editing UI are legion and described in many other places.

> You think it's too much and you have the ideas how to do it better - you're welcome to comment

Clean house. Put the people who built Zotero in charge.


> The issues with the Wikipedia editing UI are legion

Any existing UI can be analyzed to find a legion of issues; no UI is ever perfect, especially over time and changing requirements. Wikipedia's UI is certainly not perfect, and much work remains to be done (and is being done), but I would stop far short of calling the work that has already been done "terrible".

> Clean house. Put the people who built Zotero in charge.

Err, I am having a hard time making sense of this advice - why exactly should the people who built reference management software be running the Wikimedia Foundation?


> I would stop far short of calling the work that has already been done "terrible".

You already declared you weren't going to debate me on this point, so I don't know why you're bringing it up again, especially since you're not saying anything substantive.

> why exactly should the people who built reference management software be running the Wikimedia Foundation?

Because they are a philanthropically funded non-profit that builds great academic/research software on a small budget while responding rapidly to user feedback.

If your objections center around the fact that WMF does a lot more than develop Wikipedia software, then you are missing the whole point of this thread: that WMF's primary contribution is Wikipedia, and almost everything else is secondary. So long as it's being funded by private citizens because of the value they get from Wikipedia, then this should be the focus. Yes, that means the people running Wikipedia conferences and local meetups will have less power.


> If your objections center around the fact that WMF does a lot more than develop Wikipedia software

WMF does a lot more than develop one piece of software to manage citations, yes. Nothing wrong with the software; I'm sure the people who made it are awesome. But it's like discovering the US federal government didn't solve a problem with a faulty light on your street and proposing that the electrician who did should thus be President of the USA. Nothing wrong with the electrician or fixing the light, and maybe he'd even be a great President, but that in no way follows from his ability to fix the light. Those are just completely unrelated things.

> that WMF's primary contribution is Wikipedia, and almost everything else is secondary

Not so for some time. Also, Wikipedia as a project is way bigger than just software.

> So long as it's being funded by private citizens because of the value they get from Wikipedia, then this should be the focus.

It is. I mean the value and improving it (again, if we correct from Wikipedia to "Wiki sites to gather and disseminate knowledge", which are more than just Wikipedia). But opinions on how to improve that value may not be limited to "improve this one particular feature".

> Yes, that means the people running Wikipedia conferences and local meetups will have less power.

Than who? And why? There are processes that decide which directions are prioritized and which are not. Right now two of them are happening as we speak - board elections and strategy consultation. Any decision that happens leaves somebody unsatisfied, because it's not possible to satisfy everyone. That doesn't mean everything is terrible, sorry.


> > that WMF's primary contribution is Wikipedia, and almost everything else is secondary

> Not so for some time. Also, Wikipedia as a project is way bigger than just software.

Well, my donation certainly is aimed that way, and the insistent nagscreen certainly made me think "yeah, I don't want this resource to go away".

And that is Wikipedia. I occasionally use some of the other Wikimedia projects, but they should be secondary; it's definitely specifically that one great body of knowledge that got me to donate.

Wiktionary is the project I use second most. If they were to beg for donations lest it go away, I'd be like "eh".

It's only Wikipedia that gets me to "no wait, this is super important, take my money" every year.


> But it's like discovering US federal government didn't solve a problem with a faulty light on your street...

I was really confused by this comment until I realized you thought I was suggesting Zotero run things because they power Citoid. In fact, as any of the people I eat lunch with can tell you, I have been singing the praises of Zotero for years.

http://blog.jessriedel.com/2014/11/12/zotero-is-great-tex-sh...

The fact that Citoid is very flawed, but the part of it that actually works is made by Zotero, was merely a delicious coincidence.

Your remaining comments then do a better job than I could possibly hope of illustrating exactly the pathological attitude that afflicts non-profits. The whole point of my criticism, which is partially shared by the OP and many others in this thread, is that the proper focus of WMF is determined by the people who donate their money and, especially, their time. (That's a normative claim.) The fact that you responded to these points by saying "No, actually, we at the WMF have expanded well beyond such trifling concerns as the base functionality of Wikipedia" perfectly captures this destructive mindset.

> if we correct from Wikipedia to "Wiki sites to gather and disseminate knowledge", which are more than just Wikipedia

Incorrect. For instance, I use and love Wikivoyage, but I do not pretend that the millions of people who donate to Wikipedia intend to subsidize it! Yes, if Wikivoyage ends up better off through Wikipedia-financed improvements to the general Wikimedia software, all the better. But my friends should not be made to feel like Wikipedia will shut down if they don't donate yearly just so WMF can hold more conferences.

> But opinions on how to improve that value may not be limited to "improve this one particular feature".

Again, for the second time, the comment on the antiquated citation process was an illustrative example. I have resisted diving into the millions of issues with Wikipedia's software.

>> Yes, that means the people running Wikipedia conferences and local meetups will have less power.

> Than who?

Well, not "less power than other people", but "less power than they did before", i.e., fewer resources and less influence. (WMF can simply get smaller, so that no one gets more power.) But for clarity, I'm happy to suggest that more institutional power within WMF should be given to technical people, to (say) Zotero staff or other people from software non-profits with a better track record, and to anyone who internalizes the idea that the non-profit exists to as a servant to the people who donate time and money.

> There are processes that decide which directions are prioritized and which are not.

Oh thank goodness! There are processes! Just like there are processes for new Wikipedians to dispute the deletion of content they write.

I guess so long as a nation is nominally democratic we don't ever have to worry about it being badly run. And if anyone complains, we can just say they should vote and be satisfied. After all, we can't make everyone happy, so if people are unhappy there's no reason to worry about it!


>> that WMF's primary contribution is Wikipedia, and almost everything else is secondary

> Not so for some time.

Oh yeah? To me, as Johnny Q. Public, definitely so, now as always.

Could well be that you're doing Crom-knows-what too, nowadays -- but who cares? Why should we?

> Also, Wikipedia as a project is way bigger than just software.

Yup, you're right there: It's all about the knowledge, the actual _content_ of the on-line encyclopedia.

Which worked perfectly fine with the software of ca. 2004, so why waste millions and millions for, AFAICS, hardly any benefit at all compared to that?


> Which worked perfectly fine with the software of ca. 2004, so why waste millions

Because it's not 2004 anymore. What worked perfectly in 2004 (which, btw, it didn't - people complained back then no less than they do now) doesn't work that perfectly now. 10 years is a long time on the Internet, and the project has grown since then.


A) Oh bullpucky. It worked well enough _for the essentials._ The only important sense in which "the project has grown since then" is the number of WP pages and pageviews, and all your visual editors and umpteen new wikithis and wikithat projects haven't changed how that works in any significant way.

B) You forgot to answer the primary question: What is there to say that all the other shtuff the WMF is doing lately isn't secondary to Wikipedia, that Wikipedia isn't by far its most significant product and project? Do you have any actual support for your thesis that this is not the case ("Not so for some time"); _who does_ actually care about all that -- besides yourselves! -- and _why should_ Johnny Q. Eyeball care about any of it?


>the whole point of this thread: that WMF's primary contribution is Wikipedia, and almost everything else is secondary. So long as it's being funded by private citizens because of the value they get from Wikipedia, then this should be the focus.

I have to agree. I keep seeing pleas for donations on Wikipedia when I browse it, but now that I'm reading that they're spending most of that money on other bullshit besides Wikipedia itself, I no longer feel any need or duty to donate. I don't use all that other stuff, nor do I care about it; I only care about Wikipedia itself. Surely I'm not the only person who feels this way; anyone reading this article is going to see all the largesse that WMF is spending on, and many are going to question these donation pleas, which likely means donations are going to fall.


"One massive monolithic website" is, I think, meant to be read as referring to the WMF sites being a thing that have a shared telecommunications Single Point of Failure—a "choke point" where a given piece of information can only get from a given WMF site, to a user, by travelling through WMF-managed Internet infrastructure.

Remember Napster, back in the day? It was able to be shut down because it had an SPOF: Napster-the-corporation owned and maintained all the "supernodes" that formed the backbone of the network.

Or consider the Great Firewall of China. If the Great Firewall can block your site/content entirely with a single rule, you have an SPOF.

The answer to such problems isn't simple sharding-by-content-type into "communities" like you're talking about; this is still centralized, in the sense of "centralized allocation."

Instead, to answer such problems, you need true distribution. This can take the form of protocols allowing Wiki articles to be accessed and edited in a peer-to-peer fashion with no focal point that can be blocked; this can take the form of Wikipedia "apps" that are offline-first, such that you can "bring Wikipedia with you" to places where state actors would rather you don't have it; this can take the form of preloaded "Wikipedia mirror in a box" appliances (plus a syncing logistics solution, ala AWS Snowball) which can be used by local libraries in countries with little internet access to allow people there access to Wikipedia.


> WMF sites being a thing that have a shared telecommunications Single Point of Failure

In fact, one of the long-term projects in WMF is making sure the infrastructure is resistant to single-point-of-failure problems - up to a whole data center going down. We are pretty close to it (not sure if 100%, but if not, close to it). Of course, if you consider the existence of WMF itself to be a point of failure, that's another question; by that logic, the existence of Wikipedia can be treated as a single point of failure too. Anybody is welcome to create a new Wikipedia, but that's certainly not a point of criticism of WMF.

> It was able to be shut down because it had an SPOF: Napster-the-corporation owned and maintained all the "supernodes"

WMF does not own the content or the code; both are openly accessible and extensively mirrored. WMF does own the hardware - I don't think there's a way to do anything about that, unless somebody wants to donate a data center :)

> If the Great Firewall can block your site/content entirely with a single rule, you have an SPOF.

True, though there are ways around it, as currently witnessed with Turkey blocking Wikipedia. See e.g. https://github.com/ipfs/distributed-wikipedia-mirror.

> Instead, to answer such problems, you need true distribution.

I am skeptical about the possibility of making a community work using "true distribution". Even though we have good means to distribute hardware and software, be it production or development code, we still do not have any way to make a community without gathering points. I won't say it is impossible; I'd say I have yet to see anybody do it. But if somebody wants to try, all power to them. You can read more about Wikimedia discussions on the topic here: https://strategy.wikimedia.org/wiki/Proposal:Distributed_Wik...

> such that you can "bring Wikipedia with you"

That already exists; there are several offline Wikipedia setups and projects: https://strategy.wikimedia.org/wiki/Offline/List_of_Offline_...

> this can take the form of preloaded "Wikipedia mirror in a box" appliances

We are pretty close to this - you can install a working MediaWiki setup very quickly (Vagrant, or I think there are some other containers too; I use Vagrant), and the dumps are there. It won't be a 100% copy of the real site, since there are some complex operational structures that ensure caching, high availability, etc. which are kinda hard to put into a box - they are public (mostly as Puppet recipes), but implementing them is not an out-of-the-box experience. But you can make a simple mirror with relatively low effort (probably several hours, excluding the time to actually load the data; that depends on how beefy your hardware is :)

Most of this, btw, is made possible by the work of WMF Engineers :)


> We are pretty close to this ... [things you'd expect ops staff to do]

That doesn't come close at all, from the perspective of a librarian who wants a "copy of Wikipedia" for their library, no? It assumes a ton of IT knowledge, just from the point where you need to combine software with hardware with database dumps.

The average library staff who'd want to set this up in some African village would be less on the "knows what to do with a VM image" end of the knowledge spectrum, and more toward the "can plug in and go through the configuration wizard for a NAS/router/streaming box" end.

Once I can tell such a person to buy some little box with a 4TB hard disk inside it, that you plug in, go to the URL printed on the top, and there Wikipedia is—and then it can keep itself up to date, with a combination of "large patches that get mailed on USB sticks that you plug in, wait, and then drop back into the mail", and critical quick updates to text content for WikiNews et al that it can manage to do using a 20kbps line that's only on for two hours per day—then you'll have something.


I presume you have tried Kiwix? For less than $100, you can install the full Wikipedia (with reduced-size graphics) on a cheap Android tablet with a 64GB card. The first-time installation is a little clumsy, but the experience once it's local is solid: http://www.kiwix.org/downloads/.

I don't think "critical updates" are really that necessary. Swapping SD cards a couple times a year would solve most of it. I think it's pretty incredible (and useful) to to be able to have access to all that information for such a low cost even if it's a few months (or even years) out of date.


> That doesn't come close at all, from the perspective of a librarian who wants a "copy of Wikipedia" for their library, no?

Depends on what you mean by copy. If it's just a static data source, any offline project would do it. If it has to update, it's trickier, but some offline projects do that too. If you want to run a full clone of the Web's fifth most popular website, yes, it requires some effort. Sorry, no magic here :)

> "can plug in and go through the configuration wizard for a NAS/router/streaming box."

There are boxes that are integrated with one or another of the offline projects. There's also Wikipedia Zero, which, in a world where mobile coverage is becoming more and more widespread even in poor regions, may be an even better alternative.


> and it's still hard to get any sort of machine readable data out of it.

Huh? https://dumps.wikimedia.org/

Doesn't that qualify?


That gives you Wikitext encapsulated in XML. How do you get at the content of the Wikitext?

I work on a Wikitext parser [1]. So do many other people, in different ways. Wikitext syntax is horrible and it mixes content and presentation indiscriminately (for example, it contains most of HTML as a sub-syntax).

The problem is basically unsolvable, as the result of parsing a Wiki page is defined only by a complete implementation of MediaWiki (with all its recursively-evaluated template pages, PHP code, and Lua code), but if you run that whole stack what you get in the end is HTML -- just the presentation, not the content you presumably wanted.

So people solve various pieces of the problem instead, creating approximate parsers that oversimplify various situations to meet their needs.

One of these solutions is DBPedia [2], but if you use DBPedia you have to watch out for the parts that are false or misleading due to parse errors.

[1] https://github.com/LuminosoInsight/wikiparsec

[2] http://wiki.dbpedia.org/
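
To make the "approximate parsers" point concrete, here is roughly what that looks like with the third-party mwparserfromhell library (not wikiparsec; just one common choice). It can pull out templates and strip markup, but what each template *means* is still undefined:

    import mwparserfromhell

    text = ("'''Earth''' is the third planet from the Sun, "
            "{{convert|149600000|km|mi}} away.<ref>NASA factsheet</ref>")
    code = mwparserfromhell.parse(text)

    # Templates come back as syntax nodes; whether one is an infobox, a unit
    # conversion, or a citation is not recorded anywhere machine-readable.
    for tpl in code.filter_templates():
        print(str(tpl.name).strip(), [str(p.value) for p in tpl.params])

    # strip_code() gives approximate plain text: presentation is removed, but
    # content that lived inside templates is dropped along with it.
    print(code.strip_code())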


"That gives you Wikitext encapsulated in XML."

avar: "The goal of Wikipedia should be to spread the content as far & wide as possible, the way OpenStreetMap operates is a better model."

I am confused.

Doesn't OSM data come encapsulated in XML or some binary format?

As for dispersion of content, I could have sworn I have seen Wikipedia content on non-Wikipedia websites. Is there some restriction that prohibits this?

I have seen Wikipedia data offered in DNS TXT records as well.


For each article there is some metadata, but the entire text of an article is just a blob inside one XML element.

For anyone who has not worked with the Wikipedia data dumps extensively before, trust us that it is not easily machine-readable and that even solutions like DBPedia / Wikidata are not yet suitable for many purposes.
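
To illustrate the "blob inside one XML element" point, this is roughly all you can stream out of a pages-articles dump without a wikitext parser - a minimal Python sketch (the export schema's namespace version varies between dumps):

    import bz2
    import xml.etree.ElementTree as ET

    # Namespace of the dump's XML schema; the version number changes over time.
    NS = "{http://www.mediawiki.org/xml/export-0.10/}"

    with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as f:
        for _, elem in ET.iterparse(f):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                # The entire article is one opaque wikitext string here;
                # infoboxes, links and refs are all buried inside it.
                wikitext = elem.findtext(NS + "revision/" + NS + "text") or ""
                print(title, len(wikitext))
                elem.clear()  # free memory as we stream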


As someone who contributes to many knowledge projects, including Wikipedia and Wikidata frequently, I'm curious about what you mean that Wikidata is not yet suitable for any purposes. Am I wasting my time contributing to it? I thought that it was helping a lot of machines understand data. Can you please explain further?


Please reread, for many purposes! I love Wikipedia.

The wiki markup is extremely complicated and, being user created, it is also inconsistent and error prone. I believe the MediaWiki parser itself is something like a single 5000-line PHP function! None of the alternate parsers I've tried are perfect. There is a ton of information encoded in the semi-structured markup, but it's still not easy to turn that into actual structured data. That's where the problem lies.


> I believe the MediaWiki parser itself is something like a single 5000-line PHP function!

It's not. I'm on mobile so it's not easy to link, but the PHP version of the parser is nothing like a single function. There is also a Node.js version of the parser under active development with the goal of replacing the PHP parser.


Thanks, I had heard that somewhere but stand corrected.


"... into actual structured data."

Would there be some particular structure that everyone would agree on?

Alternatively, what structure do you want?

Because the current format is so messy, I just focus on what I believe is most important: titles and externallinks. IMO, often the most interesting content in an article is lifted from content found via the external links. I also would like to capture the talk pages. Maybe just the contributing usernames and IP addresses.

Opinions or explanations that have no supporting reference are inexpensive. One can always find these for free on the web. No problem recruiting "contributors" for that sort of "content".

Back to the question: I am curious what structure you envision would be best for Wikipedia data. Assume hypothetically that a "perfect" parser has been written for you to do the transformation.


The structure I need for my particular project (ConceptNet) is:

* The definitions from each Wiktionary entry.

* The links between those definitions, whether they are explicit templated links in a Translation or Etymology section, or vaguer links such as words in double-brackets in the definition. (These links carry a lot of information, and they're why I started my own parser instead of using DBnary.)

* The relations being conveyed by those links. (Synonyms? Hypernyms? Part of the definition of the word?)

* The links should clarify the language of the word they are linking to. (This takes some heuristics and some guessing so far, because Wiktionary pages define every word in any language that uses the same string of letters, and often the link target doesn't specify the language.)

* The languages involved should be identified by BCP 47 language codes, not by their names, because names are ambiguous. (Every Wiktionary but the English one is good at this.)

There are probably analogous relations to be extracted from Wikipedia, but it seems like an even bigger task whose fruit is higher-hanging.

Don't get me wrong: Wiktionary is an amazing, world-changing source of multilingual knowledge. Wiktionary plus Games With A Purpose are most of the reason why ConceptNet works so well and is mopping the floor with word2vec. And that's why I'm so desperate to get at what the knowledge is.
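
To make that wish list concrete, here is a hypothetical record of the kind described above. The field names and values are purely illustrative - they are not ConceptNet's or Wiktionary's actual schema:

    # Purely illustrative: what one parsed Wiktionary sense might look like if
    # the structure described above existed. None of these names are a real schema.
    entry = {
        "word": "chat",
        "language": "fr",  # BCP 47 code, not the ambiguous language *name*
        "definitions": [
            {
                "gloss": "A cat (domestic feline).",
                "links": [
                    # every link resolves to a word in an explicit language,
                    # together with the relation it conveys
                    {"target": "cat",   "language": "en", "relation": "Translation"},
                    {"target": "félin", "language": "fr", "relation": "RelatedTo"},
                ],
            }
        ],
    }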


I don't think you are using this in the way it was meant to be used. Wikipedia is a user-edited, human-centered project. Humans are error prone, and that's something you are going to have to live with if you want to re-purpose the data.

The burden of repurposing falls on you, and Wikipedia makes the exact same data that they have at their disposal available to you. To expect it in a more structured format that is usable by you and your project, but that goes beyond what Wikipedia needs in order to function, is asking a bit much, I think.

They make the dumps available, they make the parser they use available; what more could you reasonably ask for that does not exceed the intended use case for Wikipedia?

AFAICS, any work they do that would make your life easier but increases the burden on Wikipedia contributors would be too much.

But since you are already so far along with this and you have your parser, what you could do is release your own intermediate-format dumps that would make the lives of other researchers easier.


Yeah, I understand that. I'm re-purposing the data and it's my job to decide how that works.

But this could be easier. What I hate about Wikimedia's format is templates. They are not very human-editable (try editing a template sometime; unless you're an absolute pro, you will break thousands of articles and be kindly asked never to do that again) and not very computer-parseable. They're just the first thing someone thought of that worked with MediaWiki's existing feature set and put the right thing on the screen.

Replacing templates with something better -- which would require a project-wide engineering effort -- could make things more accessible to everyone.

FWIW, I do make the intermediate results of my parser downloadable, although to consider them "released" would require documenting them. For example: [1]

[1] https://s3.amazonaws.com/conceptnet/precomputed-data/2016/wi...


Agreed, editing anything more complex than simple text, i.e. a table or a note, is a chore. And I'm an advanced user!


The GP said Wikidata isn't suitable for many purposes, which is different from any.

It's a nice agreed-upon vocabulary for linked data. But you still need the data that the vocabulary refers to. The information you can get without ever leaving the Wikidata representation is still too sparse.


He's saying that Wikipedia doesn't give you clean, usable data, it gives you data with weird markup everywhere.


Thanks for working on that! Didn't know it was so bad. The following is a possibly stupid idea, but I'd like to hear your thoughts:

What if you just rendered the content into HTML, "screen scraped" the text, and then converted it into a more useful format (Markdown, JSON, etc.)? Is that plausible?


That would allow a basic UI change on Wikipedia to break your code. Sometimes it is necessary, but not usually the best option in my experience, and it's pretty annoying to do.


Wikipedia used to do HTML dumps but stopped a long time ago, unfortunately.


You can get what amounts to an HTML dump (which is then indexed and compressed in a single huge archive) from Kiwix. Although they do them basically twice a year or so.


You should have a look at [1], which outputs an HTML rendering of pages with a lot of metadata.

[1] https://en.wikipedia.org/api/rest_v1/#!/Page_content/get_pag...
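
For reference, that endpoint can be hit directly. A quick Python sketch against the Parsoid HTML route of the Wikimedia REST API (the page title and whatever you extract from the returned HTML are up to you):

    import requests

    # Parsoid-rendered HTML via the Wikimedia REST API; the markup carries
    # data-mw/typeof attributes that expose template and link metadata.
    title = "Earth"  # any article title
    url = "https://en.wikipedia.org/api/rest_v1/page/html/" + title
    resp = requests.get(url, headers={"User-Agent": "rest-html-example/0.1"})
    print(resp.status_code, len(resp.text))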


You could download the search indexes, also on the dumps site, which have the text content among other things.


Go click through those links. Most of them are hardly maintained. E.g. the last static HTML dump was in 2008. The current enwiki raw data dump is in progress and reads:

"2017-05-07 23:24:34: enwiki (ID 13918) 103 pages (0.0|1.2/sec all|curr), 921000 revs (14.4|11.4/sec all|curr), 100.0%|100.0% prefetched (all|curr), ETA 2019-01-23 15:17:29 [max 779130995]"

There are real logistical challenges in making these dumps and making them _useful_. For all Wikimedia's spending, they have not invested sufficiently in this area.


No, not at all.

Years back Wikipedia released HTML dumps of the entire site, which was closer to providing the actual content of Wikipedia as structured data, but that was discontinued.


Random thought. Why can't something like Wikipedia be run distributed through a blockchain? Edits are just transactions that are broadcast over the network. I imagine the total cost of that to individual contributors of nodes would be less than the millions they're paying right now.

EDIT: Turns out someone is already working on it: https://lunyr.com/


Saw this the other day: someone is working on putting Wikipedia on MaidSafe.

https://safenetforum.org/t/safe-drive-wikipedia-on-safe-tech...

https://github.com/loureirorg/wikirofs

Not sure how far along it is, but it looks interesting.


Most of Wikipedia's money goes to pay for things besides hosting. Centralized hosting also happens to be more efficient than a decentralized version.


Don't they have MediaWiki APIs and dumps? Look at other wiki sites. Also there is Kiwix and various offline apps. Have you seen them?

Thoughts?

Also what's the difference between WikiData and DBPedia?


> Also what's the difference between WikiData and DBPedia?

Wikidata is a Wikimedia project with the aim of creating a structured knowledge base. It is mostly filled and curated by humans: https://www.wikidata.org

DBPedia is a knowledge base whose content is extracted from Wikipedia (mostly from the infoboxes). It is a project run by researchers: http://dbpedia.org
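
A concrete way to see the difference: Wikidata's structured statements can be queried directly over SPARQL. A small Python sketch against the public query service (P31/P36/Q6256 are the standard "instance of", "capital" and "country" identifiers):

    import requests

    # Wikidata Query Service public SPARQL endpoint
    ENDPOINT = "https://query.wikidata.org/sparql"

    QUERY = """
    SELECT ?countryLabel ?capitalLabel WHERE {
      ?country wdt:P31 wd:Q6256 .   # instance of (P31): country (Q6256)
      ?country wdt:P36 ?capital .   # capital (P36)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 10
    """

    resp = requests.get(ENDPOINT,
                        params={"query": QUERY, "format": "json"},
                        headers={"User-Agent": "wikidata-example/0.1"})
    for row in resp.json()["results"]["bindings"]:
        print(row["countryLabel"]["value"], "-", row["capitalLabel"]["value"])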


How would you quantify the success of spreading data as far and as wide as possible?


Completely with you. The goal of Wikipedia should be to spread the content and allow more new content.

Sadly it seems the opposite is true: whole parts of Wikipedia are infested by cancer (aka corrupt/out-of-their-mind admins who are acting in their own world/turf and interest). Have a closer look at certain languages like de.wikipedia.org, where more new and old articles get deleted than the content can grow (source: various German news media, incl. Heise.de, reported about it). And why is Wikia a thing? And why is it from the Wikipedia founder - does he have a double interest!? And now he is starting a news offspring as well - something like the Wikipedia front page and WikiNews, just under his own company. And on the other side, Wikipedia banned trivia sections to make the Wikia offspring even possible (happened 10 years ago, but you probably remember it; yet Wikipedia deleted/buried the trivia section history). Why even delete non-corporate-spam articles? Why are fictional manga creatures all over Wikipedia, but info about movie characters all deleted? Many Wikipedia admins seem to be deletionists who care only about their turf; they care about "their own" articles and revert changes to them just for their own sake. Look at the Wikidata project. Why is it implemented by a German Wikipedia org that has little to do with the international Wikimedia Foundation? It's not a sister project, they do their own fundraiser, and news media have reported not-so-nice things over the years.

Look at the OpenStreetMap project; it works a lot better. Maybe the Wikipedia project should be transferred over to or forked by the OpenStreetMap project. And delete all admin rights, and start over with the somewhat toxic community that scares away most normal people who don't want to engage in childish turf wars and see their contributions deleted and cut down for no reason but admin power play.



