Hacker News
Wikifunctions (wikimediafoundation.org)
296 points by edward on Dec 6, 2023 | 149 comments



Previously: https://news.ycombinator.com/item?id=36927695

cf. https://www.wikifunctions.org/wiki/Wikifunctions:About

> Wikifunctions is a Wikimedia project for everyone to collaboratively create and maintain a library of code functions to support the Wikimedia projects and beyond, for everyone to call and re-use in the world's natural and programming languages.

It's a support project of the Abstract Wikipedia initiative to model facts with data not specific to any spoken or written language, building on and supporting things like Wikidata.


Thanks! Macroexpanded:

Welcome to Wikifunctions - https://news.ycombinator.com/item?id=36927695 - July 2023 (163 comments)


It sounds like some kind of anti-Sapir-Whorf hypothesis - that we can express things entirely without using any language at all. I think Wikidata is taking it as far as is practically possible, and Abstract Wikipedia takes it way further, so I'm very curious to see how it turns out. Sometimes people just need to do things that look crazy to see what happens.


You don't need language to do maths — the Greeks went pretty far with geometry alone. While language makes it easier to understand things, it's not strictly necessary: you can learn a lot from mimicry, for instance.


I don't really understand how this is different from https://rosettacode.org/


Here's an example function which shows that they can have multiple implementations (click the Details tab) - in this case there's an implementation in Python and one in JavaScript: https://www.wikifunctions.org/view/en/Z10070

And a much more interesting example which helps illustrate why Wikipedia built this: "genitive case of Bangla word in Python": https://www.wikifunctions.org/view/en/Z10594


All the functions are numbered and call each other. substring_exists(haystack, needle) is implemented as

    function Z10070( Z10070K1, Z10070K2 ) {
        return Z10070K1.includes(Z10070K2);
    }

This is atrocious! Might as well come straight out of a decompiler.


And an explicit goal of the project: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Google.or...

> In particular, according to Keet, Grammatical Framework notably lacks the flexibility to handle certain aspects of Niger-Congo B languages’ morphology. The traditional answer to this kind of critique would be to say, “Let’s just organize an effort to work on Grammatical Framework.” The design philosophy behind Abstract Wikipedia is to make as few assumptions about a contributor’s facility with English and programming experience as possible. Organized efforts around Grammatical Framework (or otherwise) are not a bad idea at all. They may well solve some problems for some languages. However, the contributor pools for under-resourced languages are already small. Demanding expertise with specific natural languages like English and also specific programming paradigms contracts those pools still further.

and

> A solution designed by a small group of Westerners is likely to produce a system that replicates the trends of an imperialist English-focused Western-thinking industry. Existing tools tell a one-voice story; they are built by the same people (in a socio-cultural sense) and produce the same outcome, which is crafted to the needs of their creators (or by the limits of their understanding). Tools are then used to build tools, which will be again used to build more tools; step by step, every level of architectural decision-making limits more and more the space that can be benefitted by these efforts.

and

> The evaluation acknowledges little to no value to the effort that Wikifunctions is investing in being multilingual in terms of natural languages, even for implementations. It is unclear as to how a solution that is built on top of (the existing Lua-based Scribunto scripting system) would succeed in being multilingual, and not just be an English project which the other languages may be invited to use, if they understand it, but would be mostly blocked from contributing to.


The plan actually seems to be to automatically replace the Z values in the editor views with human-readable labels in the contributor's preferred language.

This of course only works if all functions, all parameters, etc. have had labels submitted for your language.
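A rough sketch of how that substitution could work, including what happens when a label is missing. The label table, the German labels, and the `localize` helper are all made up for illustration; this is not Wikifunctions' actual mechanism.

```python
import re

# Hypothetical label store: ZID -> {language code -> label}.
# The real Wikifunctions data model is not shown in this thread.
LABELS = {
    "Z10070": {"en": "substring_exists", "de": "teilstring_existiert"},
    "Z10070K1": {"en": "haystack", "de": "heuhaufen"},
    "Z10070K2": {"en": "needle"},  # no German label submitted yet
}

# Matches function IDs (Z10070) and argument keys (Z10070K1).
ZID_PATTERN = re.compile(r"\bZ\d+(?:K\d+)?\b")


def localize(source: str, lang: str, fallback: str = "en") -> str:
    """Replace every ZID in `source` with its label in `lang`.

    Falls back to `fallback`, then to the raw ZID, when no label exists.
    """
    def label_for(match: re.Match) -> str:
        zid = match.group(0)
        labels = LABELS.get(zid, {})
        return labels.get(lang) or labels.get(fallback) or zid

    return ZID_PATTERN.sub(label_for, source)


stored = "def Z10070(Z10070K1, Z10070K2): return Z10070K2 in Z10070K1"
print(localize(stored, "en"))
print(localize(stored, "de"))
```

In the German view the second parameter still shows up as `needle`, which is the "only works if labels have been submitted" problem in miniature.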

They indicate wanting to cater to cultures that might not use a base-10 representation of numbers, so they would presumably auto-transform numeric literals for people from those cultures.

They are worried about programming languages being English-based, but seem to think that similar auto-translation of keywords is a good enough approach. That is why they NEED to support multiple languages: so that they might hypothetically support some language that supports base-60 math and keywords based on [insert language here].

Plenty of room for abuse here, as it would be really easy to edit functions being used for text generation of less popular languages to output offensive text, for things not tested by the test cases. And nobody except people who know that language will be able to tell for sure that this is not a legit contribution.

Then of course people will be trying to code sandbox and then container escapes, cryptominers, etc. I expect a giant game of whack-a-mole between the project and the admins.

On that note, even ignoring the abuse, or people trying to utilize the service for free compute resources, I can see the costs of this service becoming substantial really quickly, just from things like legitimate contributions of test cases and new language implementations, and running the test cases against all the different implementations.


> And nobody except people who know that language will be able to tell for sure that this is not a legit contribution

This is already the case with Wikipedia: there was the recent story that most of the Scots Wikipedia was written by someone who does not know the language, in the style of "English, as she is written".


Ah so pretty much how Excel has localized function names


Which solves about zero problems and creates many. Let's stop this woke nonsense and instead focus on improving the English-speaking ability of the world. Having a common language everyone understands only has benefits.


> ...Westerners... > ...Western-thinking

When did we become Westerners and start to think in the same way?

I recognised a trend for such expressions in the news, too, in the last few years. It's misleading and creates a "we against them" mentality.


When I think of 'Western culture', I think of the culture that developed in Protestant Europe and English-speaking America from the early modern period onwards, and aspects of which are diffusing through the rest of the world. I agree that it's a bit too broad of a characterisation to really be of use, but it is a thing.


For many, many centuries. It's not anything from the last few years, nor is it "we against them".

The West is essentially all the world's nations whose cultures are derived from or greatly influenced by Latin speakers and Judeo-Christian values, which traces its influence back to Ancient Rome.

It's a very valuable and important thing to recognize, because things that people in the US and Europe often implicitly assume are human universals, aren't. They're only universal across the West, and not found in e.g. China or India.

So it's not creating division. It's recognizing difference that already exists, but that not everyone is always already aware of, because we tend to exist in our own little bubbles.


Allegedly, somewhere between 1500 and 1980. "The decline of the west" by Spengler was written a century ago.


> A solution designed by a small group of Westerners is likely to produce a system that replicates the trends of an imperialist English-focused Western-thinking industry.

WTF. Virtue-signaling identity politics has no place in computer science.


> WTF. Virtue-signaling identity politics has no place in computer science.

The fact that I couldn't type my name for decades in most IT systems because of the charsets used is not "virtue-signaling" to complain about, thank you.


You could just use whatever romanized version of your name is the standard for your language and writing system without any real downside except EEWWWWW ANGLOCENTRISM NOOOOO. That way people from other cultures around the world (who generally don't have trouble learning the Latin alphabet in addition to their own) will be able to do something with your name rather than seeing random symbols they can't even attempt to pronounce, won't remember and certainly won't be able to re-type.


Please don't be a dick and dismiss it as "EEWWWWW ANGLOCENTRISM NOOOOO". Fuck off.


When in Rome, do as the Romans (and use ASCII). Please don't be a dick and DDOS the Romans' IT systems with needing to support the vagaries of every foreign dialect in the world.


I'm not in Rome. I'm trying to book an airplane ticket in my country. But it's not possible to write my name, even though my name contains letters found in our alphabet.

> with needing to support the vagaries of every foreign dialect in the world

Then don't impact the Roman software on people outside Rome. Basically the definition of imperialism.

Edit:

> I think the Internet was a better place before Eternal September, when it was the exclusive precinct of relatively affluent white male nerds. (...) I think elitism is right.

Ah, I see this from you in another thread. So you're basically against other cultures and groups being present online. No wonder you want everyone to conform to your definition of how things should be. Too bad the ship has sailed, we're here.


> I'm not in Rome.

You (or others you depend on) are choosing to use "Roman" software.

> I'm trying to book an airplane ticket in my country. But it's not possible to write my name, even though my name contains letters found in our alphabet.

It is possible to write your name in a way that lets the airline and airports identify you, which is what matters. It might not be possible to write it exactly the way you'd like but that's not causing any real issues except making you upset.

It's also pretty ironic to complain about having to deal with other cultures for (presumably) international travel.

> Then don't impact the Roman software on people outside Rome. Basically the definition of imperialism.

"Anglocentric" software isn't forced on anyone. You can campaign for your country to disregard all English writings and re-discover all that knowledge from first principles instead if you'd prefer. But that would make you much worse off, wouldn't it?

English-first computing dominates precisely because not having to deal with localization (or at least not initially) lets people focus on getting what actually matters to work. Don't like it? Write your own software instead of complaining that relying on other people's work means you sometimes have to adapt a bit.

> So you're basically against other cultures and groups being present online.

Plenty of people from other cultures (including me) have no problem with learning English to interact with the world and tap into the vast pool of English knowledge and tools. Language != Culture.

> Too bad the ship has sailed, we're here.

So is English as the lingua franca of the computing age.


Thanks for again dismissing people's feelings and saying they don't matter. I'm not gonna bother replying to your ill-conceived points.


Hate to be the one to break it to you but there's no such thing as "apolitical" technology. Everything designed by humans has and always will have some kind of ideological or political underpinnings or context. Just closing your ears and pretending that everyone is neutral and objective is foolish.


They could have said it in a more neutral way:

> A solution designed by a small group of people is likely to produce a system that replicates trends that are specific to this group and not universal enough for our needs.


What? It's a true statement; it has little to do with computer science, but it does have to do with software engineering. Did the word "imperialist" trigger reflexive typing?

In my opinion this project is about 20 layers too abstract to actually produce working and useful software. But it's not virtue signaling if they're actually trying to do the thing, is it? Virtue signaling is when you act like you're doing something virtuous and then don't.

If they're trying to create a programming syntax that maps to every known human language, good on them. They'll probably fail, but it's not really a stupid concept, especially when your organization's purpose is to spread knowledge to the whole world.


It’s all a bit silly, any global encyclopedia is almost imperialist by definition. Organizing and categorizing knowledge in a centralized way is at its roots an imperialist project (see the history of science and its development alongside the rise of the British power for example). So might as well pick the best tools for the job (English and well known programming languages). Doing otherwise makes it seem like someone didn’t completely think through their motivations. Non-imperialists don’t build encyclopedias of global knowledge. Let local stay local.


No. A global encyclopaedia might be global, but does not have to be imperialist, if the control over the content and the ability to contribute to the encyclopaedia is not limited to imperial structures.

Unless you equate "global" with "imperialist", in which case your statement is true by definition. But then everything that is done on a global scale is necessarily imperialist, and then one of the two terms does not seem useful anymore.

Equity can be global.

That would require the equation to be wrong.


> In my opinion this project is about 20 layers too abstract to actually produce working and useful software. But it's not virtue signaling if they're actually trying to do the thing, is it? Virtue signaling is when you act like you're doing something virtuous and then don't.

If one believes avoiding well-known tools and (English-based) programming languages is likely to actually make it easier for contributors using those languages to contribute to Wikipedia in those languages, then it's real. If one believes that this is not actually going to result in more usable tools, and the intention of avoiding tools from English-speaking Westerners is signalling rather than actually getting better results, then it's virtue signalling.


But they actually wrote that in English (while being fellows of Google - one of the most imperial corporations in existence - as I understand) and are sure to use a lot of existing - and English-based - tools to implement it. I mean, if they abandoned the whole Western English-based foundation they rest on - including the Von Neumann architecture of the computers they use, for example - and started building it from physical laws back up (oh wait, physical laws were also mostly discovered by Westerners and they probably learned them in English, which already imbued them with the imperialist perspective) then maybe it could be recognized as an honest - if mind-bogglingly misguided - effort to expunge the presumed imperialist bias. Since it is never going to happen, it can not be anything but virtue signalling.


By that account, above discussion of what is and isn't virtue signalling is itself virtue signalling--mere platitudes rather than contributing to actionable outcomes.

I'd think us, or any foundation for that matter, should be able to lay out an agenda before outcomes are achieved, how (un)realistic they may be. A dot on the horizon to aim for, not necessarily achievable but a heading at least.


> By that account, above discussion of what is and isn't virtue signalling is itself virtue signalling--mere platitudes rather than contributing to actionable outcomes.

If you're just saying it for the upvotes, yes. If you're actually trying to get people to shift their efforts towards more effective things (such as translation support libraries that don't insist on avoiding English at every level), then no, it's legitimate effort.

> I'd think us, or any foundation for that matter, should be able to lay out an agenda before outcomes are achieved, how (un)realistic they may be. A dot on the horizon to aim for, not necessarily achievable but a heading at least.

An aim is fine. It's a question of sincerity.


It's more vice-signalling. His complaint is a reaction to try and show others he doesn't care about the thoughtfulness behind the original statement.


s/virtue signaling identity politics/caring about other people/g


I don't see how this is caring. This is dismissing and invalidating the work of a large class of people (Westerners and English-speaking people) by applying a jargon label "imperialist" (what does that even mean here? That they are about to don pith helmets and rush out to re-conquer India?) and implying that this label automatically makes their contributions less worthy - to which implication you just added the implication that they are presumed to be incapable of caring about other people. Doesn't look particularly helpful for anything; rather, quite alienating.


I know it's triggering to you. But it's just choosing to be nice to people. Not a bad thing. Your comment just comes across as vice-signalling.


I always wonder why can't one choose to be nice to people without using divisive language, emotionally charged accusatory jargon (like "imperialist"), devaluing somebody's sincere contributions and framing the situation in group vs. group terms? Like, if the point is that you choose to be nice to people, why can't you start with being, you know, nice to people?

For example, if the same sentence was formulated as something like "expanding the contributor community by including people knowledgeable and proficient in languages not commonly spoken in the West and soliciting feedback from such contributors would greatly expand the appeal of the resulting solution and make it more widely applicable" - wouldn't that sound more nice to people?


It's pretending to be nice to one group by dunking on another group largely responsible for the foundations they are standing on.


I don't know why he rails against English.

Computer programming uses a small subset of English in code and the industry nearly always uses the American socio-cultural kind of English.

English is the fullest language we have, with all the infamy of pilfered concepts from the European continent.

The language/natural language distinction is reductionist, halving all language, when in reality it is a structure that cannot be halved.

He recommends reducing languages down to their most basic elements. This is a mistake that doesn't go anywhere. You end up with a pile of sticks.

All his criticisms of English (one voice, limited understanding, etc.) can be leveled at any living, breathing human.

I wonder where the experiment goes. His criticisms of English should be better.


OK, so it's just overcomplicated for woke reasons.

Having a lingua franca for computing is a good thing. All these efforts to localize computing are nothing but a giant waste of time.


This is brilliant! It serves as a single source of truth, builds on first principles, and acts like a universal test case.

Just as each Wikipedia article is about one "thing", each Wikifunction performs one job. Once you know a Wikifunction works reliably, you know every other Wikifunction which depends on it can theoretically trust its output.

This project seems really exciting, especially in the context of AI. Very cool and I hope to see it flourish.


Yes the whole goal with this is to extend the general notion of FaaS (Functions as a Service) into something that can work seamlessly, both within their own projects and across the Web. Similar to what Wikidata is already doing for common real-world entities and concepts. Of course, the devil is in the details with these kinds of things; it's likely that implementations of even the most basic functions will behave in subtly different ways across languages. Their testing framework is supposed to mitigate this but it looks quite ad-hoc.


> it's likely that implementations of even the most basic functions will behave in subtly different ways across languages

Stares at C/C++


I assume it's intentional, so that source code can be stored in a form that's independent of natural language. They can just call to a LSP backend for the implementation language to rename identifiers in the source code before showing them to the user, much like Wikidata does today with its semantic content.


I think you're right about the goal, but in practice it looks more like they are making code that is readable in 0 languages instead of at least 1 language. Unsurprisingly, people are coming up with workarounds for this by adding some boilerplate code; these are the first two lines from the Python example above (all the local variables in the function are in English too):

  def Z10591(Z10591K1, Z10591K2):
    word, is_ra_plural = Z10591K1, Z10591K2


so we will need to learn to read gzip (and other compressed binary formats) as if they were not compressed

this is also related to how 2 hour long movies will diminish into a niche... and movies of the future will be more like [0]

this observation is on the same trend as operas; which way back would last all evening (over 3 hours was typical)

[0] https://www.youtube.com/watch?v=6Asx_XhPH80


You get used to it. I…I don’t even see the huffman code. All I see is blonde, brunette, red-head.


> so we will need to learn to read gzip

this rant gave me a proper laugh. thank you.


There's no reason why English words couldn't be translated just as well as Z11410, but now editors will have an extra layer of indirection.


And what happens when they try to implement a language that doesn't allow function names like these? Or has multiple possible syntaxes for a function? Or differentiates between methods, lambdas, procs, etc? Or straight up doesn't have functions but is still very capable of solving the problem?


They are unique hashes, similar to Unison language. Interesting.


How is the 'genitive case of Bangla word' a good example? This function's signature contains many hard-coded, language-specific assumptions.

- According to the function, Bangla has only a few irregularities; the rest is an algorithmic transformation of the given word

- Many languages have a huge number of irregularities

- Many languages express case information somewhere else in the sentence-construct, so it's not just a word-transform.

This very much seems to try to replicate the machine-translation efforts of bygone ages.
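To make the "few irregularities, rest is algorithmic" shape concrete, here is a minimal sketch of the same pattern for English possessives. It is not the Bangla function from the link; the table, the rule, and the `english_genitive` name are assumptions chosen only to show the structure, and the rule is heavily simplified.

```python
# Irregular forms are looked up first; everything else goes through a rule.
# English only, and heavily simplified: this is NOT the Bangla function.
IRREGULAR_GENITIVES = {
    "I": "my", "you": "your", "he": "his", "she": "her",
    "it": "its", "we": "our", "they": "their", "who": "whose",
}


def english_genitive(word: str) -> str:
    """Return a possessive form of a single English word."""
    if word in IRREGULAR_GENITIVES:
        return IRREGULAR_GENITIVES[word]
    # One common style convention: words ending in "s" take a bare apostrophe.
    if word.endswith("s"):
        return word + "'"
    return word + "'s"


for word in ["dog", "dogs", "they"]:
    print(word, "->", english_genitive(word))
```

The signature takes one word and returns one word, which is exactly the assumption that breaks for languages that mark case elsewhere in the sentence.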


Whoa, I never knew (and accidentally discovered through this) that empty strings are valid substrings of other strings.


It’s a bit like the empty set that is a subset of any other set.
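Both halves of that analogy are easy to check in Python (as far as I know, JavaScript's `includes`, used in the implementation quoted earlier, treats the empty string the same way). The `is_substring` helper is just a hypothetical stand-in for substring_exists.

```python
def is_substring(haystack: str, needle: str) -> bool:
    """Hypothetical stand-in for substring_exists(haystack, needle)."""
    return needle in haystack


# The empty string counts as a substring of any string, including itself.
print(is_substring("wikifunctions", ""))  # True
print(is_substring("", ""))               # True
print("wikifunctions".find(""))           # 0

# The set analogue: the empty set is a subset of every set, including itself.
print(set() <= {1, 2, 3})                 # True
print(set() <= set())                     # True
```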


It depends on language, library, and which substring function. And in some languages, like Bash IIRC, some functions and scripts handle differently `foo=""` (variable foo is set to empty string) and `foo=` (variable foo is not set).


`foo=` does set the variable to empty in Bash and other POSIX compliant shells.


So long as you consider "empty strings" to be "strings" in the first place. String of what?

It's like saying you have an "empty cup of water".


As long as you consider cups of water and cups of milk to be different objects (i.e. cups are not interchangeable, and you can't just throw out milk and fill it with water) it is a completely sensible statement.

For example, in Jewish religious families, there are often two sets of kitchen utensils - for meat products and for dairy/other products, since mixing those makes the food unfit for consumption by a religious person. In this situation, if you said "I have an empty meat pot" and somebody answered you "No, I need an empty milk pot, give me another one!" - it would make total sense. Even though both pots are empty, they are not the same kind of pot for their users.

Similarly, if a string is a sequence of certain objects (characters, bytes, runes, graphemes, whatever you please) then an empty sequence of such objects is not the same as an empty sequence of other objects - you could append a character to it, but you can't append a database connection to it, for example, and expect something sane to happen.

So yes, "empty cup of water" is exactly what it is, because in most programming languages, empty cups are not all alike. In some languages, there are sequence types that are agnostic towards their elements, and then an empty sequence of that kind wouldn't be a sequence of anything - it'd be just a separate entity. But strings rarely are implemented this way, for many practical reasons.
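Python happens to show both situations, so here is a small sketch. The `can_append` helper is made up for the demonstration: empty `str` and empty `bytes` are different kinds of empty cup, while an empty `list` is the element-agnostic kind.

```python
def can_append(empty, item) -> bool:
    """Return True if `item` can be concatenated onto `empty`."""
    try:
        empty + item
        return True
    except TypeError:
        return False


# Two empty "cups" of different kinds are not equal to each other.
print("" == b"")                # False

# An empty text string accepts text but not bytes, and vice versa.
print(can_append("", "a"))      # True
print(can_append("", b"a"))     # False
print(can_append(b"", "a"))     # False

# A list is agnostic about its elements: one empty list fits them all.
print(can_append([], ["a"]))    # True
print(can_append([], [1]))      # True
```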


People can come up with whatever weird beliefs they want; that doesn't mean this actually makes sense.


You entirely missed the point of the example. The point is not that you are supposed to embrace their beliefs. The point is that objects that are all alike in one system can be very different in another, and if you assume they are all alike, you will misunderstand what is going on. Such as, if you assume that an empty string is just "nothing", you will misunderstand how strings - and in general, typed sequences - work in most programming languages.

It's like a bad student who, when the teacher says "assume the train departs from the station at 9 am" to formulate a math problem, objects "but the train actually departs from our station at 10am!". Way to miss the point!


You just... Blew my mind.

So, but wait... Is the genitive "of water" functioning like an assertion? "All of the contents of the cup are water."

And wouldn't that assertion, in the case where the cup has no contents, be a vacuous truth?[0]

In other words, a cup of water, if it actually has water in it, cannot also be a cup of magma. But there's nothing stopping a cup with no contents at all being "of" both.

[0] https://en.wikipedia.org/wiki/Vacuous_truth
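Python's built-in `all` behaves exactly like that assertion, for what it's worth. A tiny sketch, with the cup modelled as a plain list and `all_contents_are` as a made-up helper name:

```python
def all_contents_are(cup: list, substance: str) -> bool:
    """Assert that everything in the cup is `substance`."""
    return all(item == substance for item in cup)


full_cup = ["water", "water"]
empty_cup = []

print(all_contents_are(full_cup, "water"))   # True
print(all_contents_are(full_cup, "magma"))   # False

# With nothing in the cup, both assertions hold vacuously.
print(all_contents_are(empty_cup, "water"))  # True
print(all_contents_are(empty_cup, "magma"))  # True
```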


It holds true mathematically speaking. You can make (mathematical) sets of everything but there's only one empty set.

It's a bit like that with pointers in C: a char* is not an int* but null pointers are convertible to any pointer type.


It's time to put down the encyclopedia and pick up Plato and Aristotle


Well, not just _other_ strings.


As long as your language includes an empty string concept, why not? Similarly, is an empty array a subarray of any array? And on the other end, can you have an empty number boolean?


But then, if empty strings are part of a string, at what index do you find an empty string within a string? You cannot dereference the space between letters.


At every index of course, including one past the last character.

You don't need to dereference anything here as the empty string has length 0.
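A quick way to see this in Python (the `empty_match_positions` helper is only for illustration): searching for the empty string from each starting offset finds it right there, and a zero-length slice never touches a character.

```python
def empty_match_positions(text: str) -> list:
    """Return every index at which the empty string is found in `text`."""
    return [i for i in range(len(text) + 1) if text.find("", i) == i]


text = "abc"
print(empty_match_positions(text))  # [0, 1, 2, 3]

# One match per gap: before each character, plus one past the last.
print(text.count(""))               # 4

# A zero-length slice is valid even at the end, where text[3] would fail.
print(repr(text[3:3]))              # ''
```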


I see thanks


Gonna dig deeper, but I get the sense Wikipedia is preparing a native data format for LLM ingestion.


You're probably thinking of something more along the lines of Wikidata [1], which is over 10 years old.

[1]: https://www.wikidata.org/wiki/Wikidata:Main_Page


It's very much designed for and around Wikidata.

> Wikifunctions will allow easy access to large knowledge bases such as Wikidata, but also to binary input and output files

https://diff.wikimedia.org/2023/08/07/wikifunctions-is-start...

> Using the simple facts housed in Wikidata, you will be able to write functions that make calculations, provide a person’s age, estimate population densities, and more, and integrate the results into Wikipedia.

https://www.wikifunctions.org/wiki/Wikifunctions:FAQ

> In the future:

> - It will be possible to call Wikifunctions functions from other Wikimedia projects, and integrate their results in the output of the page.

> - It will be possible to use data from Wikidata in functions.


Wikidata cannot express the "semantic" content of average encyclopedic text, which is an express goal of Wikifunctions. So they will have to expand the data model quite a bit compared to what Wikidata has today. (This can be done quite cleanly though, since the whole point of RDF is to be able to express general graphs, and this aspect has been made even stronger with RDF* which aligns fairly well with the "frame" semantics Wikifunctions plans to use for the above purpose.)


Wikifunctions are arguably the prerequisite to making that effort, since even if you extended Wikidata with more semantic connections, you couldn't do anything with them.

This is a large part of why Abstract Wikipedia's goal is NLG, and almost all of the initial Wikifunctions facilitate NLG.


> NLG

what is NLG?..


Natural Language Generation



Yes but there are significant restrictions as to what can be expressed there - it's limited to assertions of the form 'well-known entity X has pre-defined property P with value Y (or "some value" or "no value"), with one further layer of "qualifiers"'. RDF itself is fully compositional, and RDF* extends that compositionality even further, providing a standard means to reify arbitrary RDF statements and make complex assertions about them.


And it has full data dumps and query service (https://query.wikidata.org/) so if you wanted to use it in any LLM project or as side service, there's absolutely no problem with that.


This was my first impression as well. Huge untapped possibilities with a project like this...


isn't the whole point of LLMs that their "native data format" is unstructured?


I think I need an explainer for this. Is this like the number sequences library but for pure functions?


The description is a little vague and hand-wavey. Here's a concrete example:

A lot of Wikipedia sites have scripts embedded in the wikitext which automatically generate or transform information on a page, e.g. automatically performing unit conversions to generate text like "I would walk 500 miles (804.67 km)", performing date math to automatically generate and update a person's age based on their birthdate, or querying structured data from Wikidata [1] to display in an infobox. One example of these scripts is the {{convert}} [2] template on the English Wikipedia.
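To give a feel for the kind of logic involved, here is a minimal Python sketch of the two calculations mentioned above. It is not how {{convert}} or any actual wiki script is written; the function names and the sample birthdate are made up.

```python
from datetime import date

KM_PER_MILE = 1.609344  # exact, by definition of the international mile


def convert_miles(miles: float) -> str:
    """Format a distance in miles with its kilometre equivalent."""
    km = miles * KM_PER_MILE
    return f"{miles:g} miles ({km:.2f} km)"


def age_on(birthdate: date, today: date) -> int:
    """Return the age in whole years on the given day."""
    years = today.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


print("I would walk " + convert_miles(500))
print(age_on(date(1990, 12, 7), date(2023, 12, 6)))  # 32
```

The real templates also have to deal with rounding rules, unit names, and localized number formatting, which is where most of the complexity comes from.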

Initially, these scripts were written in MediaWiki template logic [3], and were maintained individually on each wiki. This quickly proved unmaintainable, and some of those scripts were rewritten in Lua using the Scribunto extension [4], but these were still per-wiki, and there were frequently issues where different wikis would copy scripts from each other and introduce their own incompatible features.

The Wikifunctions project is an attempt to centralize development of these scripts, much like how Wikimedia Commons [5] centralizes hosting of freely licensed images and other media.

[1]: https://wikidata.org/

[2]: https://en.wikipedia.org/wiki/Template:Convert

[3]: https://www.mediawiki.org/wiki/Help:Extension:ParserFunction...

[4]: https://www.mediawiki.org/wiki/Extension:Scribunto

[5]: https://commons.wikimedia.org/


It'd be an even better example if there was an equivalent conversion function on Wikifunctions to link to. There doesn't appear to be one yet.

The project's response to the funding Google.org Fellows' evaluation is a deeper explanation that explicitly compares the goals of Wikifunctions to Scribunto (those Lua functions) — and is especially notable in the context of replacing templates and Lua modules since Google.org made a bunch of recommendations to further decouple Abstract Wikipedia's goals from Wikifunctions that Wikimedia thanked them for and subsequently ignored: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Google.or...

The catalogue of functions useful to Wikimedia is much more prosaic than the {{Convert}} template and in line with the Google.org examination — it's more about Abstract Wikipedia's goals of natural language generation (NLG): https://www.wikifunctions.org/wiki/Wikifunctions:Catalogue

The list of all functions is similarly low-level: language transformations, form validation, and string/substring operations: https://www.wikifunctions.org/wiki/Special:ListObjectsByType...


> The description is a little vague and hand-wavey. Here's a concrete example:

A much better description of what Wikifunctions is (the Wikimedia blog post seems more about where it fits in a broader strategy) is on the Wikifunctions About page:

https://www.wikifunctions.org/wiki/Wikifunctions:About


That makes way more sense to me, and my handful of edits and additions to Wiktionary's Finnish lexicon have definitely had me wondering how exactly those {{shortcode looking things}} actually worked. I'm seriously considering getting involved in this project now thanks to your explanation; that seems like it could be a very good thing indeed.


For a fairly horrifying example of how parser function templates looked, here's an old version of the Convert template from the English Wikipedia:

https://en.wikipedia.org/w/index.php?title=Template:Convert&...

This isn't even all of the code for the template, either; there were a bunch of subtemplates for individual operations. And now you probably have a decent idea of why Wikimedia wants to move away from that. :)


I've not kept up with the template complexity situation: do you have a sense for whether they're succeeding in simplifying things yet? (besides the benefit that Wikifunctions should bring)


They aren't simplifying templates so much as moving them to a language better suited for the complexity. There are some similarly complex Lua modules that have rats' nests of dependencies and piles of undocumented code.

Templates like {{Convert}} are also by and large unrelated to Wikifunctions, which is initially and primarily more concerned with lower-level natural language generation problems, like cross-language conjugation, that are relevant to Abstract Wikipedia.

Or, in the Google.org evaluation I mention in a sibling: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Google.or...

> Instead of attempting to invent so much from scratch, Wikifunctions could have been based on Scribunto. We think this is a huge missed opportunity. One of the longest-standing requests from the Wikimedia community has been to provide a means of reusing code across different projects by centralizing common code on a global wiki. This is a tractable, well-scoped problem (at least in comparison with Wikifunctions as currently conceived).

> “V1” of Wikifunctions could have been a new, central wiki for common Lua code. Structuring the development plan for Wikifunctions in this way would have meant a much shorter, surer path to delivering value to Wikimedians. It would have satisfied a long-standing community request, and (we suspect) earned the team quite a lot of goodwill. The team could then innovate on top of this initial version by building new graphical interfaces for producing and interacting with code, and by extending the capabilities of querying Wikidata.

> We are not sure why this was overlooked, particularly in light of the project requirements described above. In conversations with the Abstract Wikipedia team, team members suggested to us that this option was considered and rejected because Scribunto is an expert system and that Lua is based on English, and that these properties made it incompatible with the Wikifunction goal of broad accessibility. We think this is a bad argument, for the following reasons:

> - Lua has been successfully adopted by many projects whose primary language (as in, the language spoken by the users and developers) is not English. It is used successfully in hundreds of different language editions of Wikipedia.

> - Wikifunctions can provide localized, graphical user interfaces that generate Lua code behind the scenes (see Blockly, for example). Lua can also be used as a “transpilation” target for text-based programming languages (there is precedent for this on Wikimedia wikis). As an intermediate representation, Lua would be far more efficient and robust than an ad-hoc function composition language.

Which are all great points explicitly disregarded by Abstract Wikipedia for Wikifunctions.


> We are not sure why this was overlooked, particularly in light of the project requirements described above.

I can guess the actual reason for this and it amounts to "managerial pride causing NIH". It's no big secret that the WMF is so overly ambitious with the projects it works on that it would throw away any and all prior art they've already made, just to make something that seems new (not to mention that many WMF projects kinda just quietly strand and are left to rot because of that).

I really wouldn't be surprised if some manager at the WMF decided that "no, using Scribunto for this is off the table", just because Scribunto is an existing project and they wanted the project to have additional complexity (so they wouldn't have to consider the viability of the project since a "demo" would then be years out).


The whole point of Wikifunctions is to expand capabilities beyond what Scribunto/Lua provides at present. There's nothing obviously wrong with starting from scratch and adding features such as multi-language implementation or allowing for test cases out of the box, that Scribunto has never even come close to providing.
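
A rough Python sketch of the "several implementations, one shared set of test cases" idea, for anyone who hasn't clicked through. This is only an illustration: the names below are made up, and it is not how Wikifunctions actually stores or evaluates implementations.

```python
from typing import Callable, Dict, List, Tuple


# Two interchangeable implementations of the same contract: reverse a string.
def reverse_by_slice(s: str) -> str:
    return s[::-1]


def reverse_by_loop(s: str) -> str:
    out: List[str] = []
    for ch in s:
        out.insert(0, ch)  # prepend each character
    return "".join(out)


IMPLEMENTATIONS: Dict[str, Callable[[str], str]] = {
    "slice": reverse_by_slice,
    "loop": reverse_by_loop,
}

# Test cases belong to the function, not to any single implementation.
TEST_CASES: List[Tuple[str, str]] = [("abc", "cba"), ("", ""), ("a", "a")]


def run_tests(
    impls: Dict[str, Callable[[str], str]],
    cases: List[Tuple[str, str]],
) -> Dict[str, bool]:
    """Report, per implementation, whether it passes every shared test case."""
    return {
        name: all(fn(arg) == expected for arg, expected in cases)
        for name, fn in impls.items()
    }


print(run_tests(IMPLEMENTATIONS, TEST_CASES))
```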


> The whole point of Wikifunctions is to expand capabilities beyond what Scribunto/Lua provides

The whole point of Wikifunctions is to support Abstract Wikipedia. Abstract Wikipedia deemed Scribunto/Lua as irrelevant ("out of scope", specifically) to their work.

The Google.org Fellows suggestion was to consider detaching Wikifunctions from Abstract Wikipedia, since Wikifunctions has value beyond Abstract Wikipedia's goals as a general-purpose community library of computing functions, but could still be used to serve the purposes Abstract wants from it.

Wikimedia said no to that suggestion, so here we are.


Thank you so much for this. I'm pretty Wiki-literate and was having a hard time understanding this from their initial pages.


Adding to what others have said, it contains functions that can be used to do grammatical transformations of nouns/verbs/... as in some languages those rules can be quite complicated. So having a central function that converts singular->plural for example is a huge win for maintainability
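
As a toy illustration of what such a singular->plural function might look like, here is a deliberately naive sketch for English in Python. The function name and the tiny irregular-word table are my own assumptions; real rules (and other languages) are much messier, which is exactly why one shared, well-tested implementation helps.

```python
# Hypothetical, incomplete table of irregular plurals.
IRREGULAR = {"child": "children", "mouse": "mice", "person": "people"}


def pluralize_en(noun: str) -> str:
    """Return a naive English plural of a lowercase singular noun."""
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    # Sibilant endings take "-es": box -> boxes, church -> churches.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + "y" becomes "-ies": city -> cities (but day -> days).
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


for word in ("cat", "box", "city", "day", "child"):
    print(word, "->", pluralize_en(word))
```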


It is a formalization of knowledge with formal expressions. One could say it is a formalization of knowledge with executable/computational semantics. WikiMedia is actually an AI company and that's why they are always asking for donations.


Awful interface. I have to click three times and scroll to see code in a tiny font, the name of the function is repeated on the page like 7 times.


Python is an awful choice of language for this (although I understand why they used it, it's so popular). What do they mean by 'Python'?

Most programming languages make considerable effort in supporting old versions (gcc has c90 and c++98 for example, Fortran goes even earlier). With Java, they make very strong efforts to support older code (as long as it uses the libraries which are labelled as stable). Old JavaScript code also tends to work well, as there are lots of old websites that don't want breaking.

Python on the other hand has gone out of its way to nuke Python 2, and many new versions of Python 3 delete something from the standard library.


Python 3 came out 15 years ago. Are we still really having this discussion?


New Python versions do continue to remove features from the standard library, which can break programs. Most other languages I'm familiar with make efforts to only do this in extreme cases (like C removed gets, but basically nothing else).

Also, I have Python 2 programs I'm keeping running, and expect to continue doing so for several years to come. Many libraries never made the jump to Python 3.


Terrible UI. I found two example functions (call them `f` and `g`), found the dropdown to change the input type of `f` from "string" to "function call", and even managed to get `g` suggested when I entered its unique identifier - but cannot set up any configuration to compute `f(g(x))` (even after exhaustively trying all options).

It seems like it should be a core functionality of such a project to easily set up complex computations as function compositions.


Probably gotta define h(x) = f(g(x)) and then invoke h. Or they should build a REPL or Fiddle interface to let you test arbitrary expressions.
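
In plain Python the idea would be something like the sketch below. The `compose` helper and the two example functions are hypothetical stand-ins, and this says nothing about how the Wikifunctions UI itself would express it.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(f: Callable[[B], C], g: Callable[[A], B]) -> Callable[[A], C]:
    """Return h such that h(x) == f(g(x))."""
    def h(x: A) -> C:
        return f(g(x))
    return h


# Stand-ins for two existing functions.
def g(s: str) -> str:
    return s.strip()


def f(s: str) -> str:
    return s.upper()


h = compose(f, g)
print(h("  hello "))  # HELLO
```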


Mixed feelings about the longevity of this. It feels like it wants to be an open source alternative to WolframAlpha, but without the UX of "just typing a question". One has to navigate and search for functions that do the thing they want done.

Wanna convert a string to the NATO phonetic alphabet? Sure, that function is there... You gotta search for it first though. Does it accept lowercase inputs? No, not at the time of writing this comment... But you know, someone might fix that. No problem.
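
For reference, a case-insensitive version is only a few lines. This is my own sketch in Python, not the code hosted on Wikifunctions; passing digits and punctuation through unchanged and dropping whitespace are assumptions I picked for the example.

```python
CODE_WORDS = [
    "Alfa", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf",
    "Hotel", "India", "Juliett", "Kilo", "Lima", "Mike", "November",
    "Oscar", "Papa", "Quebec", "Romeo", "Sierra", "Tango", "Uniform",
    "Victor", "Whiskey", "X-ray", "Yankee", "Zulu",
]
# Map each uppercase letter to its code word via the word's first letter.
NATO = {word[0]: word for word in CODE_WORDS}


def to_nato(text: str) -> str:
    """Spell out text letter by letter, accepting upper- and lowercase."""
    words = []
    for ch in text:
        if ch.isspace():
            continue  # assumption: whitespace is dropped
        # Characters without a code word (digits, punctuation) pass through.
        words.append(NATO.get(ch.upper(), ch))
    return " ".join(words)


print(to_nato("hn"))  # Hotel November
```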

It sounds like a project that would be dead on arrival if it were launched by and for any other community. If wiki maintainers like this? Then great!

I do see value if people could integrate functions within Wikipedia articles. Maybe it makes sense for certain articles to include some dynamic widgets in them. Unsure I'd like that but it may work.


I'm kind of disappointed. Based on the name, I thought it's a searchable collection of functions, a mixture of OEIS, searchonmath, and a classic formula encyclopedia. But it seems to be only about little algorithmic snippets without explanation or definition?


The details page for each function seems to let you look at the implementations of that function. I expect more interesting ones to show up in the coming months or years.


I like the idea but the catalog has the look of something that will quickly grow unmanageable. I hope it doesn't end up like other Wiki projects that die from lack of momentum/direction like Wikinews. To be clear, the latter still runs but hardly anyone uses it.

https://www.wikifunctions.org/wiki/Wikifunctions:Catalogue

Does this have the potential to become some standard reference for code proving/transpilation purposes, or is it likely to end up like past attempts to unify all programming languages?


Related: Hoogle [1] a function search index for the Haskell language. It searches functions by signature.

[1] https://hoogle.haskell.org/


Needs some UI work but this is a really great idea. Hope it gains traction


Amazing to see RustPython being used in the wild. I wonder why they are using it. Is it easier to sandbox? Smaller memory footprint than CPython and PyPy? Just because it's cool?


I gather this might be changing per [1] as Wikifunctions is investigating the possibility of implementing interpreters such as CPython with WebAssembly.

[1] https://phabricator.wikimedia.org/T308250


Very cool. How long until someone writes PyPI/npm packages to import/use these functions by Z number?

Speaking of which, are there already packages to access Wikidata by objects' Q numbers? The two could work together quite well, with a bit of caching.
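
Even without a dedicated package, a Q-number lookup with caching is short to sketch using only the Python standard library and Wikidata's Special:EntityData JSON endpoint. The function names and the User-Agent string are my own placeholders, and I'm assuming the entity is not a redirect (otherwise the returned key can differ from the requested ID).

```python
import json
import re
from functools import lru_cache
from typing import Optional
from urllib.request import Request, urlopen


def entity_url(qid: str) -> str:
    """Build the Special:EntityData URL for a Q number such as 'Q42'."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Q number: {qid!r}")
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


@lru_cache(maxsize=256)  # the "bit of caching": repeat lookups skip the network
def fetch_entity(qid: str) -> dict:
    """Download and return the entity record (performs a network request)."""
    request = Request(entity_url(qid), headers={"User-Agent": "hn-example/0.1"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)["entities"][qid]


def label(entity: dict, lang: str = "en") -> Optional[str]:
    """Return the entity's label in the given language, if it has one."""
    return entity.get("labels", {}).get(lang, {}).get("value")


# Usage (needs network access):
#   print(label(fetch_entity("Q42")))
```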


Cool project! I hope they improve their UI.


So this is something like Rosetta Stone, but also has features of the Unison language


It would be cool if these functions were available via API endpoints :)


Yet another Wikimedia project with way too lofty ambitions... this is what Wikimedia donation money gets spent on rather than the operation of Wikipedia. For reference, the greater project this is a part of has been in development since 2013[0] and seems to be an attempt to basically remove the editor from Wikipedia.

For the uninitiated - this is part of an attempt to basically turn Wikidata[1] into article text. From what I understand, the idea behind this is to create a bunch of text rendering functions to make it easier to provide automatic translations of rendered text.

Don't get me wrong, the idea here is neat, but from what I understand the Wikipedia community has far greater needs that are not being met than some lofty goal to drive even more editors out of the site and replace them with data entry drones. It's a bit too often that you hear about some ailing extension and the WMF response being "no budget for it", while a lot of it gets spent on projects like this.

[0]: https://en.wikipedia.org/wiki/Abstract_Wikipedia

[1]: https://www.wikidata.org/wiki/Wikidata:Main_Page - basically, Wikipedia data meant to be in a machine-readable format.


The goal of Wikipedia is for every "person on the planet [to be] given free access to the sum of all human knowledge".

The community is a means to that end, not the end.


> Yet another Wikimedia project with way too lofty ambitions...

Perhaps, but so what. Wikidata was "a project with way too lofty ambitions" back when it started (a.k.a. make the Semantic Web something that actually works in a useful way) and it's been wildly successful. It now gets more edits per minute than the most popular Wikipedia and gets used all over the place by Big Tech firms.


What are some of those greater needs that are not being met?


It's been a hot minute since I last looked at the most recent list, but the short of it is that the WMF likes to release extensions onto Wikipedias that ship with a ton of technical bugs, aren't tested beforehand by the Wikipedia community for usability (which is ironic given the WMF's obsession with making tools as accessible as possible) and when the complaints get leveled, the response from the WMF basically ends up being "yeah so there's no budget for that". Rinse and repeat for a ton of tiny technical issues that affect all their extensions on the regular. It's a bit odd to be putting so many engineers and so much money onto lofty projects when the maintenance of Wikimedia extensions is in such an extremely precarious state already.

The most evergreen example of this is and probably always will be the rollout of VisualEditor, their WYSIWYG editor. It was completely unwanted, was proven to be not ready for general use (due to all the problems that plague "combining a WYSIWYG editor with a code view"), yet forcibly deployed on all Wikipedia languages without much room for warning or feedback. After that, the WMF promised communities the option to opt-out if they didn't want it... before violating it on the German Wikipedia because they voted to not roll it out. To my knowledge, VisualEditor hasn't improved over the years (beyond "rewrite Parsoid into PHP"), but editors have just gritted their teeth. That's the extreme example, but the WMF's relationship with building features for Wikipedia is... pretty much in a vacuum from what users want. A lot of their projects seem to be conceived by WMF managers first, without consultation with their existing community of volunteers.

Which is I guess fine for moonshot projects like Wikifunctions (since they don't involve the volunteers before release) but results in a lot of friction when a manager's pet project is forcibly rolled out on all language Wikipedias. Very few of their big extension deployments have gone smoothly with the community. I distinctly remember the Minerva skin (mobile Wikipedia) needing tons of changes and improvements before it was usable as well, after it was already rolled out and caused a ton of negative feedback.


VisualEditor has actually rolled out lots of features since it first came out, and more are in the pipeline that rely on the same infrastructure (particularly around helping novices find things they can help with). You don't have to use it if you don't like it, but it's been huge for helping less technical users become contributors.


Both can be, and are, true.

The Visual Editor works better now for its designed purpose than it did before, and rewriting Parsoid in PHP made it easier to deploy and maintain across projects (and in general MediaWiki). For editing text and non-templated content, which is the bulk of the work and the entry point for new contributors, it's matured into a useful tool.

The Visual Editor was also designed around English and European Wikipedia usage, and it still causes problems for other Wikipedia languages and for Wikimedia projects outside of Wikipedia. It has strong built-in opinions about the usage and design of skins and templates that other Wikimedia projects disagreed with.

And it's still frustrating as tech, both in bugs and by design, and especially when used in situations where wikitext editing is still common or required. It has caching and session bugs that either don't affect or aren't as destructive to other editing tools, or even pre-PHP Parsoid versions of VE. It still makes unnecessary wikitext changes — and arbitrary, if you weren't in the discussions where they were implemented or closed out as Won't Fix — to formatting and templates based on its own strong opinions. It breaks completely on interwiki redirects, a bigger problem for projects outside of Wikipedia than inside of it.

It's a useful, arguably invaluable, tool for new contributors that was badly designed at launch, has improved since, but is still fundamentally incompatible with many use cases into which it's been forced.


Thanks, that was really informative.


Not assaulting users with giant ad banners which contain deceitful language about the need for funding to trick people who are on average much worse off than employees of SF-headquartered Wikimedia.


Is it me or does it not seem very well thought out? Every example I've seen only has implementations in JavaScript and/or Python, without any mention of which version of the language the code is meant to run in (a critical piece of info for certain languages). I haven't seen any other languages nor a way to search by language. What a "string" means in one language can be completely different in another language. The primitive data types that the project assumes are not really supported across all programming languages.

Also if anyone hasn't already seen them, similar projects already exist and are more complete. E.g.

* https://rosettacode.org/

* https://programming-idioms.org/

* https://the-algorithms.com/

Not to mention LeetCode, CodeWars, Project Euler, Exercism can kinda serve the same role.


> Is it me or does it not seem very well thought out?

Well, the first thing I expected to see when I clicked on a function was... the function.

It takes three or four clicks to get that, and the code is at the bottom of a fairly information dense page, where none of the information seems to have any relevance to using the function in your own code.

The other thing I expected to see is some discussion of the implementation, possibly allowing me to choose based on my needs (e.g. if I need to run a function 10,000 times each frame I might choose a different version than if I need to run it occasionally).

However, perhaps my expectations are wrong? Is this a tool for computer scientists to use in mapping the standard functions in various languages, rather than a reference resource for the working coder? The blog post doesn't go into any useful details; it's more like a standard marketing team announcement. They do mention it's part of a wider initiative but it's not clear what that will be.

I do agree though that, especially in the age of AI autocomplete tools, I will never find this useful.

... actually, maybe it's a great resource for LLMs to learn from?


> Is this a tool for computer scientists to use in mapping the standard functions in various languages, rather than a reference resource for the working coder?

Neither of these. The video on https://www.wikifunctions.org/ gives a better explanation. If I understand it correctly, the project seems to be aimed at providing building blocks to more complex functions. Once the basic functions are defined in programming languages and described using natural language, you'll then be able to create new functions without writing any code, just by using existing blocks. Because each function is supposed to be described in multiple natural languages, you will be able to utilize them even if you don't speak English. Whatever you create you'll be able to embed on Wikipedia pages to perform calculations (e.g. automatic unit conversions) or string transformations.
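
A very rough Python sketch of how I picture that: functions carry labels in several natural languages, and a new function is just a list of existing blocks picked by label. The IDs, labels, and helper names are all invented for illustration and are not real Wikifunctions identifiers or its actual data model.

```python
from typing import Callable, Dict, List

# Hypothetical registry: each block has labels per language and one implementation.
REGISTRY: Dict[str, dict] = {
    "F1": {
        "labels": {"en": "miles to kilometres", "de": "Meilen in Kilometer"},
        "impl": lambda x: x * 1.609344,
    },
    "F2": {
        "labels": {"en": "round to two decimals", "de": "auf zwei Stellen runden"},
        "impl": lambda x: round(x, 2),
    },
}


def find_by_label(label: str) -> str:
    """Look up a block's ID by its label in any language."""
    for function_id, entry in REGISTRY.items():
        if label in entry["labels"].values():
            return function_id
    raise KeyError(label)


def run_pipeline(labels: List[str], value: float) -> float:
    """Apply the named blocks in order, feeding each result into the next."""
    for label in labels:
        impl: Callable[[float], float] = REGISTRY[find_by_label(label)]["impl"]
        value = impl(value)
    return value


# The same pipeline can be assembled using labels from different languages.
print(run_pipeline(["Meilen in Kilometer", "round to two decimals"], 500))
```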


https://mas.to/@vrandecic/110866878265998471

> I love Rosetta Code and we took a look at it. IIUC Rosetta Code does not allow to run the code, or to compose existing functions together to create new ones. Also, Rosetta Code's aims to have complete sets of implementations for their tasks, which is not our focus.


Wikimedia Foundation and letting their main product languish while they pursue badly-thought-out side projects? Name a more iconic duo.


Wikidata, Wiktionary, and Wiki Commons all heavily contribute to and complement Wikipedia.

Wikinews is fine. It's not original but it's also not harmful to have more in that space.

Wikibooks and Wikiversity seem like they should be merged. Same for Wikisource and Wikiquote.

Wikivoyage is just not well made and there are other better projects out there.

Wikispecies should just be a result of Wikidata, not a new project. Catalogue of Life (taxonomy) is much more thorough, rigorous, complete, and maintainable. No academic would use Wikispecies over COL. Misguided hobbyists might and that's a problem. And Open Tree of Life (phylogeny) already (thoroughly) serves the purpose of centralizing phylogenetic studies so there's really no unique thing Wikispecies provides and it gets outcompeted on the things it does provide.

Wikifunctions... I'm gonna be honest. I really think it's not well thought out and wish it didn't exist. I know a lot of people have already poured a lot of effort into it and I always hate to shit on that, but as a long time Wikipedian I hate to see Wikimedia's name backing this up. Especially when I feel there are already other efforts out there that are more complete, better thought out, and better organized that would benefit from more contributors.


Wikibooks and Wikiversity are pretty different. The former is about book-like or tutorial-like content (not just textbooks, but mostly any non-fictional content) that doesn't fit in the reference-work format of Wikipedia, whereas the latter is a catchall project for useful education-focused content. Wikisource hosts complete primary sources where allowed by copyright, while Wikiquote focuses on short well-known citations, including from modern works.

But yes I'm pretty sure that Wikispecies would not exist if it was proposed today, Wikidata has basically superseded it. The Wiktionary is a lot more popular but it could also end up being broadly superseded by Wikidata's lexical content.


I'd argue that sister projects such as Wikidata, Wikiversity, etc, are greatly useful and that the solution to improving them is not to neglect them, but to provide more resources to them.


Wikidata was launched by the same people launching Wikifunctions, in service of the same broader goals around abstraction of facts and semantic usage. A lot of similar design decisions are being made there, and it's arguably why Google.org invested in the Wikifunctions project (more and better programmatic interfaces to Wikidata = more semantic context across more languages that Google gets for free).


I'm not super active in the wiki community anymore but last time I heard it being discussed, Wikiversity seemed like something that caused more issues than it solved for wiki admins. Is that not the case anymore? Did it get better?


That's a byproduct of a lack of volunteer contributors rather than the project design itself.


Maybe so, but Wikipedia itself is fraying at the seams these days. Fewer and fewer administrators every year, fewer and fewer committed editors, and more and more tech rot. See this comment https://news.ycombinator.com/item?id=38550441 for more context.


[flagged]


I also can’t see anything when I disable my screen output. Very disappointed.


This is in fact OK. If you hate js that much then just use CURL and read it offline. I personally do not like js much either but complaining about every instance of it is pointless.


How am I supposed to use CURL to read it offline if it requires JS?

Won't work in links2 or lynx either.


Open the document in an editor and parse the script tags, find a way to reverse engineer it. It seems like you're just complaining for the sake of it.


Then there's something wrong with your browser. This is a fairly standard WordPress site and the content is all present as perfectly standard HTML.


No, I'm referring to each Wikifunction page on the Wikifunctions site itself.


How do you propose the functions should run if you don't have JavaScript enabled?


I don't want to run the functions, I want to view them.

Also the functions execute on the server anyway (plus JavaScript isn't even the only supported language), so execution should also be possible without a browser having JavaScript enabled.


They are server side functions, not client side.


C’mon it’s 2023.


20 years after the term "progressive enhancement" was first used in a tech talk[0]. People really ought to have received the message by now: javascript as a client-side feature is a way to enhance a website rather than to build its basic functionality.

0: https://web.archive.org/web/20141108064903/http://hesketh.co...


> javascript as a client-side feature is a way to enhance a website rather than to build its basic functionality.

And who decided that, exactly?


The free market. Much as I miss static HTML, the people like widgets.


The Free Market™ has very much decided it doesn't care whether a website is accessible with JavaScript disabled.


That is not how anyone wants to interact with the modern web.



