To be honest, what I don't like about this is that it once again operates on the character level. I feel this brings us back to all the issues we had with the C preprocessor and, in addition, makes any IDE analysis/assistance hard to impossible.
I feel the tool would be more useful if you could process the target language's AST instead. This would give you hygienic macros and make the code easier to analyse (it might solve the whitespace problems as well, since a formatter could render the whole tree at the end, after any code generation had been applied).
There is a big difference between Cog and the C preprocessor -- the generated code actually lives within the file, and your IDE / type checker / linter will process it. Sure, the macros aren't hygienic, but for lightweight macros (which, in non-Lisps, are the majority of the macros people write imo) hygiene isn't a must-have.
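For anyone who hasn't seen it, a minimal Cog block looks roughly like this (the names are invented); everything between ]]] and [[[end]]] is regenerated output that lives in the file, so ordinary tooling sees it:

    # [[[cog
    # import cog
    # for name in ["x", "y", "z"]:
    #     cog.outl(f"{name}: int = 0")
    # ]]]
    x: int = 0
    y: int = 0
    z: int = 0
    # [[[end]]]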
I have found Cog to be very helpful while writing Rust code. I love Rust, but I'd prefer to write my macros in a different language (like Python). I also find debugging macro expansions to be much more painful than debugging Cog macros.
Maybe I don't understand the nuances here, but I don't see how Cog code isn't hygienic: the Python namespace it operates in has nothing to do with the namespaces in the containing file as a whole. There's no way for names chosen by the Cog-writer to conflict with the names in the larger file.
Rather language-specific, but this is exactly the use case of Lisp's homoiconicity. The language's code and data are structured identically, so you can write code that takes other code as input.
The file containing the code is text. What AST do you mean? Cog doesn't have to understand anything about the structure of the containing file, so it can be used on any text file.
Processing the language's abstract syntax tree is much less error-prone, for one. C macros (and any other text-based macro system, including Cog) can be harder to debug than an AST version. While you could implement something sophisticated, like list comprehensions, in Cog, it would be a lot more complex and buggy than the AST-modifying equivalent[0].
That’s probably not what the author (you?) had in mind for Cog, but it is effectively a macro system so tying into the AST could be a big advantage. The only nice part about the text version is that it doesn’t presuppose a host language.
Less error-prone in one sense, more error-prone in another. Any time you have an AST integration, you will run into versioning and integration issues. Consider for example a modern ES6 stack with Babel -- will all your third party tooling recognize the latest syntaxes recognized by Babel? And if so, will it all output AST to text in the correct way? Probably not. Same goes for versioning in languages like Python 2 vs 3 where the AST is only slightly different.
It's much simpler for a tool like this to be "dumb" -- leave the correct syntax to the human, since it's a lot easier for a human to deal with on a one-off basis than to have a group of developers writing and maintaining dozens of AST parsers and code generators.
Given the number of code injection vulnerabilities and escaping confusions we see, I disagree with the assertion that humans are good at understanding code in the same way that parsers are.
Also, any modern IDE already needs to parse the AST anyway to provide any half-decent inspection features. The grammars/specs of most popular languages are also readily available and well-maintained.
On the contrary, if you add search/replace-style macro processing, you actually make the code more difficult to formally analyze, because then not even an IDE could build a meaningful AST without actually expanding the macros (which in this case would mean executing arbitrary Python code).
> Also, any modern IDE already needs to parse the AST anyway to provide any half-decent inspection features. The grammars/specs of most popular languages are also readily available and well-maintained.
But do those IDEs actually execute the in-language generators? Which AST are they reporting, the preprocessed or postprocessed?
I've heavily used M4 with C code before and what I liked was the ability to see and run tools against the postprocessed code. Textual replacement can be thought of as a "worse is better" approach, a principle behind many of the systems that people enjoy and even find to be elegant (at least as long as they don't look too closely).
I can understand the priority of making the tool language-neutral, but by making it agnostic of the underlying structure, this can also cause lots of "code injection"-like problems where the generated code behaves in many unintuitive ways. See e.g. https://stuff.mit.edu/afs/athena/project/rhel-doc/3/rhel-cpp... for examples.
The Swift code base uses something similar that the core team wrote: GYB, Generate Your Boilerplate. It’s used to generate several variants of similar code that would be cumbersome to maintain otherwise.
I use this extensively in a production codebase to help keep things DRY across multiple languages/filetypes (Java, XML, LESS, HTML) while not locking me into a framework: at the end of the day, if I want to stop using cog, I'm still left with completely normal code.
I've layered a kind of DSL (more Python in comments with a different marker) on top of cog so multiple files can reference the same metadata (domain model fields in my case) when doing the codegen.
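As a rough sketch of that idea (the module name and fields here are invented), a shared metadata module:

    # model_meta.py -- hypothetical single source of truth
    FIELDS = [("id", "long"), ("name", "String")]

can then be imported by the cog blocks in several files, e.g. in a Java source file:

    // [[[cog
    // import cog
    // from model_meta import FIELDS
    // for fname, ftype in FIELDS:
    //     cog.outl(f"private {ftype} {fname};")
    // ]]]
    private long id;
    private String name;
    // [[[end]]]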
Having worked with script generated C++ in the past, this looks annoying as hell to debug.
Whereas macros and templates have compiler support to give line numbers inside the macro/template, generated code errors have an extra step of having to look at the C++, find the error in the generator, rinse, repeat.
Lambdas are better, but if you have to repeat yourself in a way that's too syntactically weird for a template or lambda, you can "#define MY_MACRO(...)" and end it with "#undef MY_MACRO" to keep the namespace clean.
Funny you say that, as we use Cog to avoid having to deal with C++ templates and their associated pains. The nice thing about Cog is that it operates like any Linux command-line tool, just at the text level, and as a result, if something has gone wrong at the compilation phase, you can see what was fed into the compiler to see what exactly was generated. Interpreting C++ template error outputs is an art in and of itself.
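For example (a hedged sketch; the function and types are invented), you can stamp out plain overloads instead of writing a template, and the compiler then only ever complains about ordinary code:

    // [[[cog
    // import cog
    // for t in ["int", "float", "double"]:
    //     cog.outl(f"inline {t} twice({t} x) {{ return x + x; }}")
    // ]]]
    inline int twice(int x) { return x + x; }
    inline float twice(float x) { return x + x; }
    inline double twice(double x) { return x + x; }
    // [[[end]]]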
Sometimes you're working with a language that doesn't support a sophisticated preprocessor, or where it's convenient to be able to use a higher-level language to generate constants/do math at compile time.
AFAIK, VC doesn't support line numbers in macros. If you want better line numbers in generated code, you can generate #line pragmas.
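A rough sketch of that trick from a cog block (the file name is invented); #line makes the compiler attribute subsequent lines to a name and position you choose:

    // [[[cog
    // import cog
    // cog.outl('#line 1 "generated_by_cog.inc"')
    // cog.outl("int lookup[] = {1, 2, 3};")
    // ]]]
    #line 1 "generated_by_cog.inc"
    int lookup[] = {1, 2, 3};
    // [[[end]]]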
Typically, I don't debug any generated code though, because it's only boilerplate that has been tested a million times already. So that's a bit of a non-issue in the real world.
This is cool and I could see myself using this, but I wonder why it's necessary to `import cog` in the examples? Seems like it'd be better to just include cog implicitly in the namespace by design for these snippets since you're practically always going to use it.
I wrote a generic version of cog that can use any language as the generator code. It's called gocog, because it's written in go, but once compiled, it's a static binary, and you don't need go on the host machine.
It's directly built off of cog's ideas and mimics much of cog's interface. (I worked with Ned, cog's author, back in the day, and really enjoyed having cog to write boilerplate for me.)
gocog is some of the first code I wrote in Go, so it's not super pretty code, but it's a very useful little tool for generating boilerplate.
This is better in the sense that the user doesn't need to learn anything new except the language they are programming in and Python (even though, theoretically, manipulating the AST is less error-prone, etc.). I use JavaScript everywhere, but I still haven't learned how to make Babel plugins/macros, because copy/pasting snippets of code two or three times is easier than learning.
It's a pity that people still haven't made a language that has a super intuitive macro system like Lisp (homoiconicity, the AST is the language) and an intuitive syntax like Python.
I actually believe that this is partly because most Lisp users don't like the idea of new syntax, and because none of the major Lisps (CL, Clojure, Scheme) has syntactic sugar by default.
I would appreciate it if a new CL tutorial appeared that used infix notation with reader macros (`#I`), or a Clojure tutorial that used the infix package (https://github.com/rm-hull/infix).
It would be great for beginners because 1. they wouldn't be scared off by prefix notation and 2. it shows (a part of) what Lisp macros can do (introduce syntactic sugar in a way that is natural to the language).
I built something like this in ~2006 called "PHPinPHP" because I wanted to generate PHP classes from my mysql schema. It even used "[[[...]]]" blocks like cog does.
I eventually realized that A) generated code is completely unmaintainable; and B) the reason I thought I needed code generation is because my base language wasn't flexible enough.
Later on I switched to python and haven't yet hit a problem that I need code generation to solve.
* Insert generated C++ and Python boilerplate code
* Generate parametrized tests based on external data files (see the sketch after this list)
* Copy code from one place to another and keep them up-to-date
* Pin dependencies across multiple projects using a single source of truth
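For the parametrized-test item, a hypothetical sketch (cases.csv, its contents, and parse() are all invented):

    # [[[cog
    # import cog, csv
    # with open("cases.csv") as f:
    #     for i, (inp, expected) in enumerate(csv.reader(f)):
    #         cog.outl(f"def test_case_{i}():")
    #         cog.outl(f"    assert parse({inp!r}) == {expected!r}")
    # ]]]
    def test_case_0():
        assert parse('1+1') == '2'
    def test_case_1():
        assert parse('2*3') == '6'
    # [[[end]]]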
I can't remember if I saw Cog first, and wrote a different version that fit my needs better, or if I only found Cog afterwards...
It uses a more PHP-esque syntax for inserting Python code. It has inline expression syntax, and quote functions, which IMO make it nicer than Cog for using as a code preprocessor--it's easy to make e.g. function specializations, or loop over code blocks. It's not very well documented though, and is probably missing some nice features.
Why does this need to be a standalone tool? I can already do this in Emacs by pasting emacs-lisp code snippets and executing them while editing the file, inserting the output into it. Do other editors not have this feature?
What I like about it (I use it) is that it lets me drive code in low-level, somewhat math-incapable languages at compile time. I have a project right now that does real-time signal processing in an FPGA. The project itself is organized in Verilog, which is cumbersome to do raw math in. I use cog in my build to do a bunch of preprocessing math in Python. For example, since I use cog to calculate a bunch of constants (for coefficients and the like), I can change the sample rate in one control file and the compilation process will re-do all the math for me.
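In practice such a block looks something like this (the numbers and names here are invented; the sample rate would really come from the control file):

    // [[[cog
    // import cog, math
    // SAMPLE_RATE = 48000
    // cutoff_hz = 1000
    // alpha = math.exp(-2 * math.pi * cutoff_hz / SAMPLE_RATE)
    // cog.outl(f"localparam ALPHA_Q15 = {round(alpha * 2**15)};")
    // ]]]
    localparam ALPHA_Q15 = 28748;
    // [[[end]]]

Re-running cog after changing SAMPLE_RATE redoes the arithmetic and rewrites the constant in place.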
Doing it manually in an editor is fine, but it's useful to automate the process also. Just because something can be done in an editor doesn't mean you don't want to be able to do it at the command line too.
I'm not exactly sure I understand the use-case, but as far as the example used in the article, shell-command-on-region accomplishes the same kind of thing. Why not leave the "generation code" as a comment in your file the same way cog does?
Template-generators like cog are meant to run periodically, for example every time you 'compile' your project. Often they contain dynamic elements which can change between each run.
Using your Emacs command would defeat that purpose, because you would need to find the region and re-execute it manually at every run, again and again. And you would need to document the command anyway, because nobody can remember all those regions. So why not automate this task?
> Why does this need to be a standalone tool? I can already do this in Emacs...
It needs to be standalone so that you can make it part of your build process instead of messing around in some text editor that not everybody wants to use.
C++ macros can't read a configuration file to generate code (at least I don't want to know that they can!). And Cog works in any text file, so it can be used for languages (like HTML) that don't have macros.
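In HTML, for instance, the whole generator can live inside a comment (the list items are invented):

    <!-- [[[cog
    import cog
    for i in range(3):
        cog.outl(f"<li>Item {i}</li>")
    ]]] -->
    <li>Item 0</li>
    <li>Item 1</li>
    <li>Item 2</li>
    <!-- [[[end]]] -->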
One use case for this could be dumping generated algorithms from sympy. I was doing some constraint programming and ended up writing something very similar, albeit poorly and ad hoc, by generating .c files that I #included into other .c files; it was very messy. The use case was to write some mathematical relations and generate C functions to calculate their differentials. It was a lot of manual copy-pasting until I came up with the #include trick, but this would have been better.
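Something roughly like this would do it (the expression is invented, and sympy's exact output formatting may differ), using sympy.ccode to emit the derivative as C:

    // [[[cog
    // import cog
    // import sympy as sp
    // x = sp.symbols("x")
    // expr = sp.sin(x) * sp.exp(x)
    // cog.outl("double d_expr(double x) {")
    // cog.outl(f"    return {sp.ccode(sp.diff(expr, x))};")
    // cog.outl("}")
    // ]]]
    double d_expr(double x) {
        return exp(x)*sin(x) + exp(x)*cos(x);
    }
    // [[[end]]]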
Your example could do with some syntax highlighting. There's so much ugly punctuation in it I didn't even notice the actual code at the end. Plus my first impression was that I would never put something so unreadable in my source, whereas in a nice green comment I wouldn't care so much.
In Java I solved the problem of whitespace by just running my result through google-java-format, but I see how Python's offside rule would make that totally impossible.
Interesting stuff. Quite a while ago I wrote something vaguely similar using JScript to make a demo of a sort of 'mathematical' document editor as part of the VB Classic Wikibook, see https://en.wikibooks.org/wiki/Visual_Basic/JArithmetic
I think jq would be a better DSL for this, not least because it's easy to integrate libjq into C/C++/Rust programs.
I've also been thinking of building a trivial little library to use jq for configuration files, where jq syntax is more convenient than JSON, and where you can also write or alter configuration objects using path-based assignments, so you get to choose JSON style or TOML style.
Or you can generate pieces of code with non-embedded Python scripts at the earliest stage of make and inline them with the host language's preprocessor.
This way you'll have the same functionality but with standard tooling for every language. This means conventional debugging, static analysis, testing etc.
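A minimal sketch of that flow (the file names are invented): a plain Python script writes an include file at the start of the build, and the host file pulls it in with its normal include mechanism:

    # gen_tables.py -- a standalone generator, debuggable and
    # testable like any ordinary Python program
    with open("tables.inc", "w") as f:
        for i in range(4):
            f.write(f"{{ {i}, {i * i} }},\n")

The C file then just says #include "tables.inc" inside the array initializer.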
No-no, I mean if the generating code is in a separate Python file, then I can debug it with pdb, I can profile it with cProfile, I can run Pylint on it -- all the standard tools.
I like the idea of everything sharing the same file, too, but it does make working with the Python part a bit more difficult.
Also, with keeping it separate, the "every language" part comes up. It doesn't have to be Python if it's a separate code generator. Whatever suits you will work. You can generate code for C in C++. Or assembly in Common Lisp. Everything in anything.
We have something at work that's very similar but Perl-based, for our VHDL source. It saves a lot of boilerplate for conversion functions, null types, read/write CPU functions, etc. VHDL has pretty awful templating so it's really useful; you just need to use it sparingly, else it becomes unreadable very quickly.
I do this using cog for Verilog code. It lets me generate signal processing constants/math in the signal processing code itself, driven by a single control file (with, e.g., the system sample rate), without having to rely on Verilog's awkward (where even available) math.
So, it's like the m4 macro processor, which the author used like XSLT. It could be a good call today if Cog had access to Python's AST instead of plain text and interacted with Swagger/OpenAPI or something -- I mean tools like autorest.