It's amusing that he argues (correctly) that "there is no Great Chain of Being with humans at the top," but then claims that LLMs cannot tell us anything about language because they can learn "impossible languages" that infants cannot learn. Isn't that an anthropocentric argument, saying that what a language is is inherently defined by human cognition?
Yes, studying human language is indeed inherently defined by what humans do, just as -- as he points out, if you'd understood the article -- studying insect navigation is defined by what insects do, not by what navigation systems humans could design.
I read this paper and I still feel lost as to how this can even be possible. It seems to understand how to tokenize, merge lines, remove tokens, etc. for arbitrary programming languages. Is there another paper that explains this algorithm on its own?
I would guess `pass_lines` is the most important pass for non-C code; as far as I can tell (it's written in unreadable Perl), it just removes lines.
So while it can work for languages other than C, most of its features are C-specific, so it's not going to work nearly as well. Still, I'd never heard of C-Reduce before; pretty cool tool!
Unreadable Perl?! I clicked on the link expecting something super-terse and unstructured, but it's anything but. What the hell is wrong with people casually dropping remarks like that?
I was wondering the same thing, but I guess the key is that you don't actually have to do it correctly every time. Just tokenizing based on a few common characteristics (brace pairing, quotes, indentation, newlines, etc.) should let you trim a ton without knowing anything about the language, I imagine?
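To make that concrete, here's a toy sketch of the idea (my own code, not C-Reduce's actual algorithm): greedily drop chunks of lines and keep a candidate only if a user-supplied "interestingness" test still passes:

    import subprocess

    def still_interesting(lines, test_cmd, path):
        # Write the candidate out and re-run the user-supplied test script;
        # exit code 0 means the property we care about still holds.
        with open(path, "w") as f:
            f.writelines(lines)
        return subprocess.run(test_cmd, shell=True).returncode == 0

    def reduce_lines(lines, test_cmd, path):
        # Greedy delta-debugging-style reduction over chunks of lines.
        chunk = max(len(lines) // 2, 1)
        while chunk >= 1:
            i = 0
            while i < len(lines):
                candidate = lines[:i] + lines[i + chunk:]
                if candidate and still_interesting(candidate, test_cmd, path):
                    lines = candidate   # chunk was irrelevant; drop it
                else:
                    i += chunk          # chunk mattered; keep it, move on
            chunk //= 2
        return lines

Brace pairing, quotes, etc. would just mean smarter candidate generation than raw line chunks, but the keep-it-only-if-still-interesting loop is the same.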
My real worry is what happens if this ends up running dangerous code. Like, what if you have a disabled line that writes instead of reading, and it randomly gets reactivated?
> Just tokenizing based on a few common characteristics (brace pairing, quotes, indentation, newlines, etc.) should let you trim a ton without knowing anything about the language, I imagine?
Yep, here's an explanation from a related tool, comby-reducer, which was spawned by the parser-parser-combinator[0] approach to syntactic analysis and transformation: https://comby.dev/blog/2021/03/26/comby-reducer - it's based on exactly what you've said.
It's intended for producing compiler test cases, so the reduced code shouldn't ever actually be run.
CPython includes a way to run only the parsing/compiling-to-bytecode stage. You can also use it like they did here and actually run the code - but that really depends on how much you trust every possible subset of your code.
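E.g. an interestingness test that never executes the candidate can use the built-in compile() (or python -m py_compile from the shell). A sketch, not what they actually ran:

    import sys

    # Parse and byte-compile the candidate without executing any of it.
    src = open(sys.argv[1]).read()
    try:
        compile(src, sys.argv[1], "exec")
        sys.exit(0)   # still syntactically valid -> keep reducing
    except SyntaxError:
        sys.exit(1)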
The author claims that RST is nice because it's extensible, but as someone who's written some Sphinx extensions, I can tell you that extending RST is not as pleasant a task as the author makes it out to be. I don't know what it's like in the Markdown world, but the underlying tools (Sphinx and docutils) are very difficult to work with, and they make many implicit assumptions that render certain things difficult or impossible.
I've written a few Sphinx extensions ... it seemed OK to me: a bit awkward, but this is the kind of thing you write rarely and use often, so how nice extension-writing is barely matters in the long term.
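For anyone curious what the skeleton looks like, a minimal extension is genuinely short (the "hello" directive name is made up, but Directive and add_directive are the real docutils/Sphinx API):

    from docutils import nodes
    from docutils.parsers.rst import Directive

    class HelloDirective(Directive):
        # Renders `.. hello::` as a fixed paragraph node.
        def run(self):
            return [nodes.paragraph(text="Hello from a custom directive")]

    def setup(app):
        # Sphinx calls setup() when the extension is listed in conf.py.
        app.add_directive("hello", HelloDirective)
        return {"version": "0.1", "parallel_read_safe": True}

The pain the parent describes starts once you go beyond this and have to deal with docutils' node and transform internals.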
The important thing to note about SymPy Gamma is that it does only the mathematics part of WolframAlpha. It's also relatively new. There is no natural language input. There are no non-mathematical capabilities. The syntax should match Python syntax for the most part, though there are extensions to allow things like "sin x" or "x^2" or "2 x". All this will hopefully improve in the future (and pull requests are welcome!).
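SymPy itself ships extended parsing transformations for exactly this; I don't know whether Gamma's parser uses these hooks directly, but the idea looks like:

    from sympy.parsing.sympy_parser import (
        parse_expr, standard_transformations,
        implicit_multiplication_application, convert_xor)

    # convert_xor turns ^ into **; implicit_multiplication_application
    # handles "2 x" and function application like "sin x".
    T = standard_transformations + (implicit_multiplication_application, convert_xor)

    print(parse_expr("2 x + x^2", transformations=T))   # -> x**2 + 2*x
    print(parse_expr("sin x", transformations=T))       # -> sin(x)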
Most of the code was written by David Li (who is actually a high school student). You can watch a presentation about it here: http://conference.scipy.org/scipy2013/presentation_detail.ph.... It started out as a "because we can" toy, and it's gotten much better.
The real benefit of SymPy Gamma over WolframAlpha is that there are no barriers around it, since it's entirely open source (BSD-licensed). For example, if you start computing something interesting and want to try more, you can move to SymPy Live (http://live.sympy.org/) and compute in a more session-like environment. Or you can use SymPy locally on your own computer.
Regarding the comments that WolframAlpha is mostly used for play, I'm not so sure about that. It's invaluable to students as a calculator. Sure, Google can compute 100 * pi, but it falls apart when you try to compute integrate(sin(x) * x, x). When I was in college (which was last year), I saw people use it all the time. It's been very successful in making computer algebra accessible to virtually everyone.
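For comparison, that integral is a one-liner in plain SymPy:

    from sympy import symbols, sin, integrate

    x = symbols("x")
    print(integrate(sin(x) * x, x))   # -> -x*cos(x) + sin(x)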
By the way, probably the best feature of SymPy Gamma right now is the integration steps. See for instance the "integral steps" section of http://www.sympygamma.com/input/?i=integrate%28sin%28x%29*x%.... This used to be a free feature of WolframAlpha, and it's extremely useful if you're learning integration in calculus. It doesn't work for all integrals, because not all integrals are computed the way you would compute them by hand.
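If I remember right, the step data comes from SymPy's manualintegrate module, which applies textbook rules (here, integration by parts) rather than the general algorithms; that's also why some integrals have no steps:

    from sympy import symbols, sin
    from sympy.integrals.manualintegrate import manualintegrate

    x = symbols("x")
    # Uses by-hand rules, so each rule application can be
    # recorded and rendered as a step for the student.
    print(manualintegrate(sin(x) * x, x))   # -> -x*cos(x) + sin(x)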