Then I added features like news and an RSS feed, a way to automatically list my research publications and course materials, a list of books filterable by tags, etc. So it is still a Makefile, though the Makefile itself is a bit simpler than it used to be; it now calls a few Bash scripts that in particular make use of the awesome xml2 and 2xml utilities to manipulate HTML in a line-oriented manner with the coreutils (mostly grep and sed).
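For anyone who hasn't seen them, the xml2 family flattens markup into one path=value line per node, so the usual line tools apply. A minimal sketch using the html2/2html variants from the same package (the file name and the edit are made up):

html2 < page.html \
  | sed 's|^/html/head/title=.*|/html/head/title=New title|' \
  | 2html > page.new.html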
On top of that I have a few git hooks that call make automatically when needed, in particular on the remote server where the website is hosted, so that the public version is rebuilt when I push updates to the repository there.
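The hook itself can be tiny. A minimal sketch of a post-receive hook on the bare remote, assuming the site's worktree lives at /var/www/site, the branch is main, and the Makefile sits at the worktree root:

#!/bin/sh
# deploy the pushed branch into the worktree, then rebuild the site
GIT_WORK_TREE=/var/www/site git checkout -f main
make -C /var/www/site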
It's been working like a charm for years! My git history goes back to 2009.
EDIT: I just had a look at the first commits…
beccad7 (FIRST_VERSION) Initial commit
d1cc6d7 adding link to Google Reader shared items
6ccfd0c fix typo
d337959 adding link to Identi.ca account
One of the nifty parts about having a static site generated offline from trusted inputs is that it doesn't matter whether the generator components are "abandoned" or complete.
Ehhh... assuming their dependencies and your operating system maintain compatibility with it on the order of decades, yes. Which do exist, but they're understandably rare.
And it's complicated by this only being knowable in retrospect, as you can't predict the future. "Not abandoned" is a positive sign for "if we failed to predict the future correctly, it'll be fixed", rather than mostly relying on luck.
(Thankfully full-blown simulation is often an option nowadays too)
If converting markup to/from a line format to put awk, perl, and other line-oriented tools to use is your thing, there's also the ESIS format, understood by traditional SGML tools and even used by the SGML formal test suites.
This defeats a big part of why you’d want a build system in the first place (incremental builds), but at least if you know the page you want to regenerate you can still `make` that file directly.
If there’s a common workaround for this pattern in makefiles I’d love to learn it.
Not sure if it’s a common pattern, but my solution to this was to always run a command that deletes all “unexpected” files, using GNU Make’s “shell” function to enumerate files and the “filter-out” function to filter out “expected” outputs. Edit: I ensure this runs every time using an ugly hack: running the command as part of variable expansion via the “shell” function.
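To make the trick concrete, a rough sketch in GNU Make (the src/out layout is my assumption, not the parent's):

SOURCES  := $(wildcard src/*.md)
EXPECTED := $(patsubst src/%.md,out/%.html,$(SOURCES))
STALE    := $(filter-out $(EXPECTED),$(wildcard out/*.html))
# assigning with := forces the $(shell ...) to run on every invocation of make
_cleanup := $(if $(STALE),$(shell rm -f $(STALE)))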
File deletions and renames are common problems with many revision control / build systems.
Other than the nuclear option ("make clean"), another is to have a specific rename / remove make target, so:
make rm sourcefile
or
make mv sourcefile newsourcefile
... which will handle the deletion and renaming of both the original and generated targets.
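One way such targets might look, as an untested sketch that assumes src/%.md sources generating out/%.html:

ARGS := $(filter-out rm mv,$(MAKECMDGOALS))

rm:
	rm -f $(ARGS) $(patsubst src/%.md,out/%.html,$(ARGS))

mv:
	mv $(word 1,$(ARGS)) $(word 2,$(ARGS))
	rm -f $(patsubst src/%.md,out/%.html,$(word 1,$(ARGS)))

ifneq ($(ARGS),)
# swallow the file arguments so make doesn't try to build them as goals
$(ARGS):
	@:
endif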
In practice for even fairly large blog and online projects, a make clean / make all cycle is reasonably quick (seconds, perhaps minutes), and is often called for when revising templates or other elements of the site design. If you're operating at a scale where rebuild time is a concern, you probably want to be using an actual CMS (content management system) in which source is managed in a database and generated dynamically on client access.
The cleanest way to do it is essentially "make install". You do all the heavy build steps into a build directory, and then the final stage is to delete the "output" directory and copy all the files you need there. Incremental builds should still be pretty fast since the only repeated action is copying files (and you could link them if you want instead).
Conventionally, install puts the outputs where they will live for use. It does so in a way that 'make uninstall' will leave things as they were before the install. The install target should also run any pre- and post-install commands. There's also a 'make dist' convention, to build a release tarball.
This is the way, because intermediate build artefacts also end up in `build/`. You don't want those in your `output/` directory, but you also don't want to delete them because they help speed up the incremental builds.
Edit: `make install` also protects you against broken builds breaking your live site.
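A minimal sketch of that final copy stage, assuming pages are built into build/ and served from output/:

install: all
	rsync -a --delete build/ output/

The --delete takes care of stale files in output/ without touching the cached artefacts in build/.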
But I think the best solution (that also works with make) is to have a "make dist" target that creates a final .tar.gz archive of the result. If the rule is written properly then it won't contain any stale files. The disadvantage is that for large projects it may be slow, but you are not supposed to use this rule during development (where it is useless anyway), only for releases (which can still be built incrementally -- only the final .tar.gz needs to be created from scratch).
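A rough sketch of such a dist rule, archiving only the declared outputs so stale files in the build directory can't sneak in (the paths are assumptions):

PAGES := $(patsubst src/%.md,build/%.html,$(wildcard src/*.md))

dist: $(PAGES)
	tar -czf site.tar.gz $(PAGES)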
Not sure if anyone actually uses it, but I would approach the problem with find, comm, and a sprinkle of sed:
comm -23 <(find build -type f -iname "*.html" -printf "%P\n" | sed 's/\.html$//' | sort) \
<(find source -type f -iname "*.md" -printf "%P\n" | sed 's/\.md$//' | sort)
The find commands get you a list of all the files (and only files -- directories will have to be removed in a separate step) in each of the build and source folders, sed chops off the extension, and comm -23 compares them, printing only the files unique to the build folder, which you can then deal with as you see fit (e.g., by feeding them to xargs rm).
I think the best solution is to use something like webpack or vite or whatever. These usually have their own dev server and can watch directories for changes.
My personal site is also using a custom make-like ssg, but after spending a disproportionate amount of time writing the bundling/packaging code, I decided to just switch over to one of these tools. It’s a solved problem, and it greatly reduced the complexity of my site.
My initial reaction is: I should probably get around to learning about Flakes. I’m not sure I’d want each blog post to pin its templates, but it’s nice to have that choice.
A flake is a function, right? Invoking `nix build` resolves its inputs and computes its outputs. So if the static site is the output, wouldn't the content have to be one of the inputs?
Personally, I think I'd use two flakes, one that builds the content into something that's ready to hand-off to code, and a second one that turns it into a usable site. That way you wouldn't end up with a new input for each post, but instead would have a versioned something which represents your content all bundled together, and then the site builder consumes it as a single dependency--but conceptually it's the same.
I guess I'm just saying that it's not conventional, but it's a pretty logical conclusion to reach.
I admire how nimble you are. I aspire to write blog posts at the drop of a hat like this, but I rarely do.
I also like the use of flake inputs for content.
It reminds me of a world that I've been imagining where the conclusions in scientific papers are generated as flake outputs (an adjacent output would be the human-readable thing, a PDF or whatever).
In this world, you can just run `nix flake update && nix build`, and if a paper that you cite published an update which invalidates your conclusion, you know right away because their output is your input, so your build fails.
We think about repeatable builds being for executable binaries, but they could equally well be for conclusions and assumptions.
Perhaps nix is too big of a hammer for the job, but it seems like the best shot we have at achieving this without also constraining the scientist re: tooling.
I realize that you don't want to be storing mountains of data in the nix store, but it would work just as well if the output in question is an IPFS CID, to be resolved during the build instead. The publisher can then be in charge of keeping that CID pinned and of notifying scientists when their "build" starts failing.
> I admire how nimble you are. I aspire to write blog posts at the drop of a hat like this, but I rarely do.
Thanks! I took up blogging more often recently, and for me, having a manageable system is a large part of that. The last thing I want to happen on a Sunday evening is breaking some page of my website. That being said, I hope to one day make the workflow easier.
> It reminds me of a world that I've been imagining where the conclusions in scientific papers are generated as flake outputs (an adjacent output would be the human-readable thing, a PDF or whatever).
I happen to be a reviewer for software artifacts in a scientific journal, and I often use Nix here. Not that many projects use it, but if I'm able to reproduce an artifact with Nix, then I know the author has not missed any implicit dependencies. I like to imagine it's also useful feedback for the authors, whether they use Nix or not.
> I realize that you don't want to be storing mountains of data in the nix store, but it would work just as well if the output in question is an IPFS CID, to be resolved during the build instead.
I maintain separate build servers of my own using Nix integrations (so-called remote builders), and the Nix cache is already quite large, sitting at around 500GB. I host these at Hetzner.
I have also thought about adding IPFS integration for my website, but haven't gotten around to it.
> I happen to be a reviewer for software artifacts in a scientific journal
That's very cool. I have a question for you.
I'm taking a bioinformatics class, despite not having the chemistry prerequisites. I'm getting a crash course in biochem, and the rest of the class benefits from having an expert in what-kind-of-quotes-to-use.
I've been thinking: would it be helpful if the care and maintenance of these compute environments wasn't left to each scientist but was instead aggregated (perhaps per-class or per-university)?
We're setting these chemists up with conda in Ubuntu in WSL in a terminal whose startup command activates the conda environment. Not exactly a recipe for reproducibility after they get a new laptop.
What if certain compute-heavy classes published flakes which the students could...
a) use while taking the class so we stop wasting time on troubleshooting ssl deps via conda
b) reference in publications after the fact. They could say:
> Here's a Jupyter notebook, download it and run it in the UCCS biochem environment like so: `nix run github:UCCS/CHEM4573?rev=16afd67`, its output lets us make the following conclusions...
I know it would be helpful for the students in the class. Do you think it would be helpful to them later on when they were publishing things?
I'm thinking about packaging the dependencies for this class, giving it to the teacher, and pitching it to the university:
> Set up a technical fellows program. Waive tuition for us nerds and in exchange we'll support your students and faculty through the maintenance of these environments.
I don't mind paying tuition so much, but I'd like to do something to get a bit more cross pollination going between scientists in need of tech support and techies in need of something meaningful to work on.
Am I dreaming here, or would it solve some problems? Do you think I have a shot at convincing anybody?
Not sure what your issue with reproducibility using conda is.
We (a team of RSEs working with many researchers) have had good success with storing conda environment files in git alongside the code; it takes only a few commands to get a working environment. We provide classroom training to researchers and distribute the training material and environments this way.
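For anyone unfamiliar, the student-side workflow is then roughly two commands (environment.yml is the usual convention; the environment name is made up):

conda env create -f environment.yml
conda activate my-course-env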
I don't have the link to the github issue handy, but I remember that the key was to ask for a lower version of python and then supply the `--force-reinstall` param. The fact that it only happened to some students was evidence that conda wasn't as hermetically sealed as nix.
My real gripe is that in that issue, the app developer couldn't really help--since it was a packaging problem--and the conda folks were unaware because the users had gone straight to the app developer. If there must be a third party doing curation, it seems to me that they should be more narrowly focused on whatever particular suite of tools enables whatever particular group of people--not on individual packages.
I know that conda lets users do this too, but I don't think the environments compose as well as they do with nix. If you want Jim's environment, but with Susie's custom build of foo-tool, you can just take both as inputs, override foo-tool as desired, and output the composition. Your maintenance burden remains small. If conda handles environments with this kind of compositional attitude, I'm unaware of it.
> I don't have the link to the github issue handy, but I remember that the key was to ask for a lower version of python and then supply the `--force-reinstall` param. The fact that it only happened to some students was evidence that conda wasn't as hermetically sealed as nix.
Conda can be a bit fiddly, it makes life much easier if you specify the required version of python when you create a new environment.
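I.e. something along these lines, with the name and version as placeholders:

conda create -n chem-env python=3.10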
>I know that conda lets users do this too, but I don't think that the environments compose as well as they do with nix.
Yes, though we find Conda great for most of our use cases, we still have to resort to creating containers.
> I've been thinking: would it be helpful if the care and maintenance of these compute environments wasn't left to each scientist but was instead aggregated (perhaps per-class or per-university)?
This is definitely something that Nix can abstract quite well. In my company we have [an infrastructure of computers](https://github.com/ponkila/homestaking-infra) that we manage with NixOS. We have set the system up so that `cd`ing into a directory "conjures" up the environment using devenv or direnv. We don't do anything too fancy yet, but we have a project commencing next month in which we start to also manage routers this way. We speculate that this will let us do things like the following: register a new node, and it gets automatically imported by the router, which establishes DNS entries and SSH keys for each user. The idea is that we could have different "views" of the infrastructure depending on the user, which the router could control. For administrators, we have a separate UI created with React that pulls NixOS configuration declarations from a git repository (note: these don't have to be public) and shows how the nodes connect with each other. The UI is still under construction, but imagine this with more nodes: https://imgur.com/a/obBfRk0. We have this set up at https://homestakeros.com.
Depending on the project you are working on, you could then have a subset of the infrastructure shown to the user, with things such as SSH aliases and other niceties set up on `cd`-in. When you `cd` out, your view is destroyed.
We have quite overengineered this approach -- we run the nodes from RAM. NixOS has the idea of "erase your darlings", which means having a temporary rootfs. We have gone the extra mile in that we don't even store the OS on the computer: the computers boot via PXE and load the latest image from the router (though any HTTP server will do; I boot some from Cloudflare). We do this because it also forces the administrators to document the changes they make -- there is nothing worse than having to call people up during downtime and trying to work your way back from whatever the mutations were. PXE booting establishes a working initial state for each node -- you just reboot the computer, and you are guaranteed to get back to a working initial state. I'm personally big on this -- all my servers and even my laptop work like this. We upgrade servers using kexec -- the NixOS configurations produce self-contained kexec scripts and ISO images for hypervisors (some stakeholders insist on running on Proxmox). I've suggested some kernel changes to NixOS which would allow bootstrapping arbitrary-size initial ramdisks, because otherwise you are limited to a 2GB file size.
> We're setting these chemists up with conda in Ubuntu in WSL in a terminal whose startup command activates the conda environment. Not exactly setting them up for reproducibility if they ever move to a different laptop.
Python specifically is a PITA to set up with Nix; dream2nix etc. might help, but it's definitely the hardest environment to set up of all the languages I've tried -- even GPGPU environments are easier. Oftentimes the problem is not only the packaging, but also the infrastructure used. For that, you could also publish the NixOS configurations and maybe distribute the kexec or ISO images.
A notable thing is that devenv also allows creating containers from the devShell environment, which may further help your case. Researchers could reference docker images instead of everyone having to use Nix.
In any case, I put some emails on my HN profile so we can also take the discussion off platform -- we are looking for test users for the holistic approach using PXE, and we are currently funded until Q3 next year.
Reading through your link, I caught myself wondering whether I would put up with all those boilerplate nix steps just to add a new page to the site.
Don't get me wrong, I get that you gain a great deal of flexibility out of doing it the way you do, but if we think about the task at hand, adding a page to a predefined blog, it seems a bit involved.
A fair comment, I do not disagree. I do plan to one day have the root Nix file do an `ls`, so that manually updating the root flake for both the inputs and the RSS feed becomes redundant.
I was instantly inspired by Karl's work on his "blog.sh" shell script[0] that he mentions in this article. I took it and tweaked it to create my own minimalist SSG called "barf"[1]. That wouldn't exist if Karl didn't share his wonderful work publicly!
Adding a pinch of m4 [1] can give you a bit more flexibility while sticking with the same barebones approach.
I used to maintain a small website built like that some 20 years back. But I can't see the model working today, personal websites excluded. The problem is that the approach essentially enforces Web 1.0 roles: You either need every contributing user to be html-proficient, or someone willing to assume the drudgery of being the "webmaster".
There is no such thing as a "pinch of m4". You start a clean project promising that you won't touch m4 this time. Then you add a small m4 invocation to save yourself from some boilerplate.
A year later, when you are trying to figure out why all instances of the word "cat" are silently disappearing from your website, you dig through 5 layers of macro expansions to discover that a junior dev tried implementing a for loop instead of copying it from the manual and messed up the quotation marks.
Having solved the immediate issue, you decide that debugging your DSL is too hard, so you import the m4 macro file you have been copying between projects. You then spend a day replacing all usages of 'define' with your macro-creating macro that adds comments to the output, enabling your stacktrace-generation script to work.
Next project, I am putting down a hard rule: no m4! (Except for maybe that one instance)
Not "this" story. Everything above has happened on several projects. The cat thing comes up because it is tricky to expand two macros next to each other without whitespace. So if you do:
define(`foo',`hello')
define(`bar',`world')
foo bar
foobar
You will get:
hello world
foobar
Working around this gets tricky, so someone inevitably ends up writing a cat-like macro such that you can do
cat(foo,bar)
To get
helloworld.
A side effect of this is that now "cat" is really "cat()" which expands to "". You can work around this by doing `cat'. However, if `cat' is used as an argument to another macro (such as a for loop), the quotation only prevents expansion the first time. When the for macro is expanded, the quotation marks are stripped, giving you just "cat", which gets expanded again. A correctly written for macro would add new quotes as needed, but I have never seen someone correctly write such a macro without just copying it.
Not sure if I have seen this interaction specifically with for and cat, but I have seen an interaction like it on almost every project that used m4.
You can place an empty "expansion" in that line to get the behavior you want without an additional cat-like function
foo`'bar
I only know of this feature because I recently read the manual page for m4, and it's mentioned rather early in there, but it might not have been as well emphasized in past iterations of the manual.
I've only ever used m4 via autoconf and sendmail configuration files, so I don't know if it's m4 that has the bizarre syntax or whether it's autoconf's and sendmail's use of it. I'm not sure I've ever tried to use m4 directly for anything.
That's why I wrote a "pinch" in the OC. m4 is arguably better than cat as a barebones template engine. The moment you start doing anything beyond simple includes and variable interpolation, it is time to switch to something more modern and robust.
I guess autoconf/sendmail still use m4 because there wasn't anything better at the time that didn't come with a kitchen sink attached.
Rather than relying on generic text substitution using m4 or perl or whatever, I suggest using SGML, the basis and common superset of HTML and XML, which comes with easy type-checked text macro (entity) expansion for free, or even type-aware parametric macro expansion. Here "type" refers to the regular content type of a markup element (i.e. its allowed child elements and their expected order), but it also covers expansion and escaping into attributes or other contexts such as CDATA or RCDATA. Only SGML can extend to the case of properly expanding/escaping potentially malicious user comments with custom rules, such as allowing span-level markup but disallowing script elements; it can also do markdown or other wiki-syntax expansion into HTML, import external/syndicated HTML content, produce RSS and outlines for navigation, etc. It works well for nontrivial static site preparation tasks on the command line; cf. the linked tutorial and command-line reference.
A comprehensive package for processing, converting, and serving SGML on the command line, on the server side, or in the browser; see [1]. Also features SGML DTDs (grammars) for W3C HTML 5, 5.1, 5.2, and Review Drafts January 2020 and 2023, which are the latest non-volatile W3C and WHATWG HTML recommendations/spec versions.
Edit: your comment is a welcome reminder to improve the site, which isn't an easy thing to do however due to sheer volume of the material, even though it's using SGML for shared boilerplate inclusion, ToC/site nav and page nav generation, etc. (in fact, by calling sgmlproc from a Makefile)
Instead of `m4` or `sed` find and replace, the author should try `envsubst`. It's a program that replaces bash style variable references (for example `$TITLE`) with their value in the environment.
I agree that `envsubst` is a good choice for this. Unfortunately, it is not part of posix, so you can't rely on it being present everywhere. But as part of gettext, it is still very common.
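A minimal sketch (the template file and $TITLE are hypothetical); listing the variables in the second argument restricts substitution to just those:

TITLE="Hello, world" envsubst '$TITLE' < template.html > page.html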
At the dawn of the age of PHP, I created a user management system (registration, verification, admin interface, …) that was based on well-established ideas (how login worked at Yahoo, Amazon, and every other major site) but got no traction at all as an open source project. In any language that wasn't PHP it was necessary to write an "authentication module", which was about 50 lines of cookie-handling code. Multiple times I managed to put several existing apps together and make an advanced web site.
About 10 years ago the idea suddenly got traction once it was legitimized by the SaaS fad. I would tell people "don't you know they're going to end the free tier or go out of business, or both?" and sure enough they did.
Anyhow, I bring it up because the system used M4 to interpolate variables into PHP, other languages, shell scripts, SQL scripts, etc.
Ugh, I know exactly how this feels. You resist the urge so hard to say “I told you so” and instead relish in the fact that you saw it. “The Way”, so to speak.
I remember having to write cgi cookie-handling code. I remember having to write session-cookie sync code. PHP was a small slice of heaven in the cgi world. Until it wasn't. Still, being able to import libraries of script functions without having to recompile was wizardry. The problem with php now is that they let a certain product somewhat dictate their direction. Class namespaces with backslashes is the ugliest design choice.
What was your oss project that couldn’t get traction?
It was called Tapir User Manager, but the web site was down for some time. It was an open source failure but a career success because I used it in around 8 projects, including the arXiv preprint service, a voice chat service that got 400,000+ users, and the web site for our county green party (which had national impact).
I too had a small web site with M4 around 1999/2000. Why M4? Because I'd learned enough of it to be useful/dangerous when wrestling with Sendmail, and it seemed to do the trick (at least when the trick was simply "be easier than manually editing lots of HTML files every time there's a site-wide change").
I suspect I was never doing anything complicated enough to encounter the gotchas mentioned by other commenters...
I like it that (almost) every dev blog I come across on HN has an RSS feed.
For every interesting article that I read here I follow the feed. Whether you have a Wordpress site, a Bear Blog, a Micro blog, a blog on Havenweb, or a feed on your self-built site, I add them to the 'Really Social Sites' module of Hey Homepage.
Ultimately, I would like to publish this list of blogs, just like Kagi now does with their Small Web initiative. But I guess curating is key to adding quality. And when I think about curating, starting some kind of online magazine seems only natural.
I'm trying to understand (as a dev) if there is something "wrong with me" for not wanting to have my own blog. Where do people get the "entitlement" (I mean that in the best way possible) to share with other people/assume other people care what they are working on? It feels like a competition sometimes. "I need to work on something as cool as possible so I'll get some likes/impressions on my blog".
Collaboration is obviously cool and only works by making it all public; I just don't know where "I'm doing this because I think it's cool" ends and "I'm going to put effort into sharing it with others to get reactions" begins.
I have a blog, but I mostly assume people _don't_ care what I'm doing or thinking. Some of my posts have probably never been read by anybody. I still personally find it worthwhile for a few reasons:
- The mere possibility that someone will see it pushes me to put more thought and effort into what I write. Sometimes this reveals weaknesses in my ideas that I would have glossed over if I were just writing private notes for myself; sometimes it leads me to actually change my opinions. It also means the blog posts are easier for me to understand / get value out of than notes are if I come back and reread them years later.
- It creates opportunities for people to connect with me which can pay off at unexpected times. Occasionally people have reached out to me to say a post helped them or resonated with them, or to give a thoughtful reply or ask a question. Those sorts of interactions are really satisfying even if they're rare. (One time, I was interviewing for a dev job and the interviewer asked a question about a post I'd written on the philosophy of John Rawls, and how it could connect to software engineering. I found that absolutely delightful.)
- It's just nice to have an outlet when I feel like writing about something.
I don't have a blog myself but am this close to creating one.
Some guy said that it's a progression.
You start using the web by being a casual reader. At some point you get more comfortable in public spaces and start replying small comments like you would reply to someone afk.
Then you start reading more and more about specific subjects, amassing knowledge, and your replies have more content. They start being organized. They have a structure, to guide future readers and show them how you came up with your conclusion. They have links to sources. They leave open doors for the parts you don't know.
Then you start writing more and more comments, with more and more content, as a result of your experience.
Then comes a moment where you realize you're going to write the same thing for the nth time, and being a good engineer with a focus on DRY, you want to write your thoughts once and for all and link to it every time. This is the moment you start writing a blog that you actually maintain: you write not because you feel the need to write more, but because you want to write less and direct people to it rather than repeating yourself.
I don't think there's something wrong with you. I also think there's nothing wrong with people sharing _interesting_ stuff, whether they do it ultimately for shallow likes or for ... you know... just sharing _interesting_ stuff.
On a side note, I get the "entitlement" from nobody. I take it. I also mean that in the best way possible. Nobody's asking for my software, my (future) articles, my point of view, etc. Still, I make stuff and sometimes share stuff. I think it can be a net value for some people (definitely not for everyone). This is only the reasoning behind it, the main motivator was me realizing I matter as a human being and I have only one life to live. I learned that because of experiencing a 'dark night of the soul' a couple of years back. Luckily I got through. And to be honest, if it wasn't for the internet - made up of personal websites and real people sharing their own experience on forums - that taught me everything there is to know about Cluster B disordered personalities (just an example, cough nothing personal cough), I don't think I would be sitting here typing this lengthy response.
I realized I can not sit back, enjoy the decline of the internet, and only complain about it. I would love to see the web have a lot of personal websites and blogs about every kind of subject, so I started to build a website software. The web/internet, and all the information shared and made easily accessible, made me able to save myself. I was probably helped more by some random dude who put up a website fifteen years ago with everything he knew about certain stuff than I was helped by anything else.
Odd take. If you spend several hours figuring something out, it’s quite neighborly to write it up for the next person. “Shoulders of giants” and all that.
I’m certainly grateful for their help, and have even written up a few of my own.
Not saying you're right or wrong, but I myself don't want to look at it like it's a competition of the loudest people.
I've read so many blogs through HN over the last years, and every one of 'em had something interesting to say while also portraying something personal from the author. Whether that's a nice layout, nice color scheme, or even some nice jokes in their bio text.
To me, it can not get any more human than this. Pure individuals connecting on a world wide web. By links, by email, by RSS feeds. All without big tech.
I agree with everything you wrote. What I was trying to communicate is that there's no shame in not feeling the urge to share; as it happens, the vast majority of us, like the GP post, don't, but that's not easy to see or quantify.
Aside, I almost wrote “silent majority” but that seemed like it was veering towards politics so I went with vocal minority; I suspect there is a better term out there but I didn’t find it quickly.
I admit I interpreted more in your short post than was there. People definitely should not feel shame for not feeling an urge to share!
I still encourage people to share though, because I think a lot of people would like to read personal stuff about topics that interest them. Doesn't even have to be with your name and all next to it, anonymous/pseudonymous homepages are usually possible.
Therefore I offer free websites (on a subdomain, though) for people who would like to write or post photos about their hobbies. And know that there are many more ways to get online; just look at the OP of this thread with a nice SSG.
Writing my longer reply I realized that early social media is a strong counterpoint — people absolutely loved to share when the barrier to doing so was low, the platforms hadn’t been given over to commercialization, and it was less obvious that those details were going to be ingested into an advertising profile. It sounds like you offer a bit of that without the motive or intent that turned mainstream social media into what it is and I think that’s great!
Yeah, somewhere between the homepages and webrings of the nineties and the added social functionality of the early social media platforms. Ideally without the platforms and their incentives. The web itself is already a social platform, a social medium. No need for more layers, especially if they ultimately are against my interests. I think RSS still holds the potential to connect individual websites/people, albeit in a slightly (or maybe even fundamental?) different way than the social media platforms do.
Question: what would be your number one topic/subject to blog about, other than anything tech related?
A friend of mine described using make to generate scientific papers. He explained that if he changed a single test file, the entire paper could be regenerated with a single command, including rerunning the tests and regenerating the graphs for the changed test.
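A hypothetical Makefile in that spirit (every file name and command here is made up) would look something like:

paper.pdf: paper.tex figures/results.pdf
	latexmk -pdf paper.tex

results/summary.csv: $(wildcard tests/*.py) data/measurements.csv
	python tests/run_all.py > $@

figures/results.pdf: results/summary.csv plot.gp
	gnuplot plot.gp

Touch one test and only the affected results, the figure, and the PDF get rebuilt.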
It's a neat idea, though I have to point out that if you're already pushing to Github, you could just push the source and Github will publish your markdown as a hosted page: https://pages.github.com/
I love the code [1]. Mine [2] is a bit over engineered because I wanted hot-reloading (without JS), and it was a delightful yak shave.
But the basic idea is the same --- heredocs for templating, using a plaintext -> html compiler (pandoc in my case), an intermediate CSV for index generation. Also some handy sed-fu [3] to lift out front matter. Classic :)
case ${file_type} in
org )
# Multiline processing of org-style header/preamble syntax, boxed
# between begin/end markers we have defined. We use org-mode's own
# comment line syntax to write the begin/end markers.
# cf. https://orgmode.org/guide/Comment-Lines.html
sed -n -E \
-e '/^\#\s+shite_meta/I,/^\#\s+shite_meta/I{/\#\s+shite_meta.*/Id; s/^\#\+(\w+)\:\s+(.*)/\L\1\E,\2/Ip}'
;;
md )
# Multiline processing of Jekyll-style YAML front matter, boxed
# between `---` separators.
sed -n -E \
-e '/^\-{3,}/,/^\-{3,}/{/^\-{3,}.*/d; s/^(\w+)\:\s+(.*)/\L\1\E,\2/Ip}'
;;
html )
# Use HTML meta tags and parse them, according to this convention:
# <meta name="KEY" content="VALUE">
# cf. https://developer.mozilla.org/en-US/docs/Learn/HTML/Introduction_to_HTML/The_head_metadata_in_HTML
sed -n -E \
-e 's;^\s?<meta\s+name="?(\w+)"?\s+content="(.*)">;\L\1\E,\2;Ip'
;;
esac
Huh, I previously skim-read the code and didn't notice the GEMINI regex detail. I wonder why they're doing that.
Re: namespace organisation. I thought about that a lot, and decided to adopt namespace-only convention for symmetry between text file layout, html file layout, and url scheme.
I've treated Date/time as metadata, which I can use to organise index pages. If I get to years worth of posts, then I'll group them by year/month or something reasonable. Likewise tags. I debated tags _and_ categories. But I decided on "everything is a post with tags, and categories will emerge based on topical coverage + post format".
The benefit of make is that large programs that are built by slow compilers can be incrementally rebuilt much faster in the face of small changes. Something that would take 40 minutes to do a full rebuild can build in three seconds or whatever.
If your static site can be generated from scratch in under a second by just catting a few hundred HTML files with a common header, there is no benefit to using make over a script. You only risk performing an incomplete build due to a bug in the dependencies.
Wow, this is almost exactly what I was planning to do for my site. For another small project, I wrote a tiny shell script as a makeshift "bundler" (just embeds the CSS and JS inside the HTML) with the goal of also being able to serve the unbuilt files locally:
and the script just deletes the <link> tag and replaces the /style/ comment with the contents of styles.css. Definitely not my finest work but it worked well enough.
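Something in that spirit can be done with a couple of sed commands; a rough sketch, guessing at the exact link tag and a /*style*/ placeholder from the description:

sed -e '/<link rel="stylesheet" href="styles.css">/d' -e '/\/\*style\*\//{
r styles.css
d
}' index.html > index.bundled.html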
Interesting. So I'm a weird sort, I imagine, in that I'm the type that has been using Linux and shell scripts for 20+ years, but never actually done any big-time coding, and thus I really don't know "make."
Point being, I do something very similar to this; except I first simply write/create my website in Zim-wiki, but then I have a bunch of little tasks to "clean up," i.e. fix/modify some links and then use the Canvas API to update my main course page (which, because I hate Canvas that much, simply links out to my own site).
Makefiles honestly are just glorified shell scripts. Some of the syntax is a little odd, but you trade that for a more standardized format and the ability to add different build options without mucking around with argparse yourself.
As someone who also writes a lot of shell scripts and has for decades, I’d guess that if you learn just a little make, you’ll find lots of non-coding uses for it and wish you’d learned it earlier. ;) It’s just another great unix tool that is sometimes very handy to augment shell scripts when you need it, not unlike find+grep+sort, cut+join, awk+sed, gnu parallel, etc. I think make is under-appreciated for its uses outside of code compilation. I use it anytime I’m making gnuplots, or doing image processing on the command line, for example. Whenever you have a batch of files to process, and the batch might need to re-run, and it transforms into other files or a single big file, then make may be the right tool.
Make has at its core one thing that would be pretty tedious to do in shell scripts: update the target (output) file only if it’s older than the prerequisite (dependency/input) file(s). This applies transitively to files that depend on other files that might change during the run of make, which is the part that really separates make from a shell script.
The thing you do when the target needs updating is run a little snippet of shell script, so that part you already know.
After learning how a rule works, you can combine it with ‘pattern rules’ to abstract a rule over all files that share a common prefix or common extension. Suddenly you have a loop but without any loop syntax, and can process a thousand files on demand with 2 lines of make - and without modification you can change a single input file, have it process only a single output file, and not waste your time re-running the 999 other files.
Also pro-tip: make will run in parallel if you use -j, and it will do it without breaking any dependency chains. If you have a process that turns text files into sql files and then turns those sql files into html files (a possibly nonsensical example), make running in parallel will not blindly update html files first; it will run parallel jobs in the correct dependency order. You can use make to build something like gnu parallel, but one that is able to resume a batch job that was interrupted in the middle!
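For anyone curious, a minimal sketch of that chained pattern-rule setup (txt2sql and sql2html are placeholder commands):

HTML := $(patsubst src/%.txt,out/%.html,$(wildcard src/*.txt))

all: $(HTML)

out/%.sql: src/%.txt
	@mkdir -p $(@D)
	./txt2sql $< > $@

out/%.html: out/%.sql
	./sql2html $< > $@

With -j8, make runs different files in parallel while still doing txt -> sql -> html in order for each one.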
My number one reason to use make is to have a single centralized location for project commands. If I see a Makefile at the root, I can quickly scan it and have an overview of what high level actions I might execute.
Note that I have recently switched to Just (https://github.com/casey/just). While it doesn't technically have the exact feature set of make, it covers all of the ground for what I typically do. It gets the benefit of history and discards a lot of make cruft to make for a more predictable experience.
The other thing I've been working on is a personal calendar/todo-list thing; after years of trying to use other people's things, I realized I know just enough to make something that works with how I think and what I need (in short, a little like Linux Remind, but with the ability to mark things as done, in a way, and the flexibility for "due" dates to go "stale" and still show up).
Anyway, the units/items are individual files with content + some tags with dates.
And I've been doing a thing where I source a file with a ton of functions. And..it looks like this is a better version of that. Wild. hmm.
I used a shell script for that, but had vaguely thought of changing to a Makefile for a while, and finally did now, thanks to the article reminding me of that; it is more appropriate. The shell script still invokes make, and then rsync, since rsync seems less appropriate for a Makefile. But now it synchronizes fewer files.
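The wrapper is nothing fancy; roughly this shape, with the paths and host made up:

#!/bin/sh
set -e
make
rsync -az --delete out/ user@host:/var/www/site/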
As a side note, I am quite happy with XSLT templates to produce the pages (instead of attaching a static header, as in the article), as well as to generate indexes and an Atom feed.
It’s fun to make your own SSG tool, and this is a great example of keeping it simple.
It’s also interesting to read so many comments of people doing similar things.
For my own site, I find that I want an SSG tool that is simple, intuitive, and stays out of the way. With these goals in mind, I have been able to slowly improve my tool over and over. It’s been awesome to be able to do more using less.
Mine is "an SSG is just a source to HTML compiler and compositor, plus file organiser".
I reviewed a few tools (jekyll, emacs.love/weblorg, hugo), and ended up making mine in big part because I went down the rabbit hole of "well, why is this part here? why is it like this? why can't we do this other thing? wow this bit is cool, how do I make it myself?".
The syntax takes a little trial and error and usually finding real-world examples, but I like "make".
I had one project that involved downloading a jdk, using it to build a project specific jre, patching some class files with local java sources, bundling it, etc.
Without being a make expert, it took me a couple of hours of reading, finding examples, etc...but now I have the dependency stuff working perfectly. Where it now only re-downloads or re-builds things if something upstream from it changed, handles errors well and points out what broke, etc.
All that to say: for some things, it's worth looking into for its built-in dependency chain stuff. When you need to run arbitrary outside scripts/tools, it sometimes does things that other build tools can't (gradle in my case couldn't easily do this, or at least I couldn't figure out how).
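A very rough sketch of that sort of chain (the URL, module list, and paths are all placeholders; a real one would be more involved):

jdk.tar.gz:
	curl -L -o $@ https://example.com/jdk-21.tar.gz

jdk: jdk.tar.gz
	mkdir -p $@
	tar -xzf $< -C $@ --strip-components=1

custom-jre: jdk
	rm -rf $@
	jdk/bin/jlink --module-path jdk/jmods --add-modules java.base --output $@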
Make is excellent for tasks where you have a file that needs to exist and steps that reliably cause it to exist.
Excellent too for building a tree of things that depend on prior stages; in your example needing java to run a java applet which generates a file.
The syntax takes some getting used to, but in those cases there is little better.
But I do find people using it as a glorified task runner, and while it works, it's quite possibly the least ergonomic tool available, especially for things like making docker images, which I see extremely often.
There's not really any better option though. Once you start setting environment variables, customising tagging, adding file prerequisites, handling building from a different directory (a monorepo requiring access to shared resources), you need some sort of wrapper, with options, argument parsing, and some rudimentary dependency checking. Make is a super-low-barrier-to-entry solution that, with a small workaround (phony targets), gives you all of the above.
GitHub’s newer version of pages that lets you deploy via GitHub Actions rather than being forced into using Jekyll is just so amazing. I have converted a bunch of static sites to using it as hosting.
I was there 3,000 years ago… I used SSIs in the late 90s/early 2000s before replacing all that with PHP, which was a huge step up at the time. I am more than familiar.
The convenience here is largely in being able to use GitHub Pages to host the page while being able to do almost anything you want for a build process. It’s really neat.
I did something similar for mine: I do markdown-to-html using pandoc, then replace the language labels using find (so that prism.js works). I've got it all running via a little Python script (I would've done bash but I'm terrible at it) to generate all the files easily, rather than going through one-by-one: https://git.askiiart.net/askiiart/askiiart-net/src/branch/ma...
I might move to something make-based like this, looks interesting.
Update: I made it into a bash script, and now it only runs on changed or new files. Far more efficient, both because it's just bash, and because it only runs on what's needed.
Most Static Site Generators generate a blog from markdown, which is not feasible for projects like company websites, etc. For such projects I like Middleman (https://middlemanapp.com), which provides layouts/partials and things like haml templates.
This amazing course by Avdi Grimm on make and rake for the same purpose has completely changed my understanding of rake, and I recommend checking it out:
Well, I implemented the main idea in a day or two. That being the pp preprocessor. The rest, I really can't remember, it was mainly grunt work to see what are the minimum things a web site is required to have. I still have some stuff to remove.
Up until a certain point, yes. Then you start wanting back links, navigation, etc., and doing that with make alone doesn’t quite work, especially if you have a deep tree of files - single folder sites don’t typically have a lot of content in them.
(My site is generated by a Python commit webhook that indexes new files, diffs an Azure storage container and uploads updated files only).
Make provides incremental execution of the build graph. It is aware of the dependency graph, and the state of inputs and outputs. It will (ideally) only execute the minimal build rules necessary.
A shell script that checks file mtimes to determine what has already been built, and therefore what can be skipped, is close in spirit.
Make variants like GNU Make have additional functionality, like automatic concurrent execution of parts of the build graph.
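Such a script might look roughly like this, assuming src/*.md is converted to out/*.html with pandoc as a stand-in converter:

for src in src/*.md; do
  out="out/$(basename "${src%.md}").html"
  # rebuild only if the output is missing or older than the source
  if [ ! -e "$out" ] || [ "$src" -nt "$out" ]; then
    pandoc "$src" -o "$out"
  fi
done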
One of the things I did during the pandemic lockdown was work on the simplest possible blog in a single html file. Something that requires essentially no technical knowledge beyond typing text into a file. I recently dusted it off and yesterday I posted the most recent iteration.
A sed script that modifies HTML fragments isn't nothing. And this one only handles one thing that you might want to customize in the header of each page. It doesn't do things like handle pages having different <title> tags. Every feature like this that you come up with becomes another thing to maintain, and another thing that can catch you out later. When you come back to fix this later, will you even remember what it's supposed to do?
From a web publishing point of view, make and sed are exotic dependencies. There's not going to be a bunch of helpful pointers online to help you debug issues with using them for this purpose. When you google how to fix your sed regex to match specific HTML attributes and not match others, you're going to find stack overflow posts about Zalgo, not quick answers.
Or maybe don't make a site generator and just make a website using HTML files? You'll spend a lot less time painting the shed and a lot more time actually putting your ideas/content on the website.
If you want templating then use server-side includes. It's much less attack surface than, say, PHP or some complex CMS, but you can still just make a footer.html and stick it at the bottom of every other .html page easily, and avoid the one problem with file-based sites: updating shared bits.
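For reference, the include directive itself (assuming the server is configured to process SSI for the page) is a single line in each page:

<!--#include virtual="/footer.html" -->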
If only there were a way to make consistent structure in an online document (such as hypertext) and separate the styling into distinct files. Even better, what if we could make separate styling for mobile, desktop, and printing, all with the same content?
If only it were possible using existing standard web technology. Sadly it was never designed with such goals in mind.