Show HN: A Web-to-RSS Parser in Common Lisp

armitron · on Nov 12, 2016

This is far from ideal (and a bad idea in general):

  (load "~/quicklisp/setup.lisp")
   
  (ql:quickload '(:datafly        ;; for database access
  
                  ;; WEB SERVER:
                  :hunchentoot    ;; for providing a web server
                  :cl-who         ;; for building the HTML output
                  :parenscript    ;; for the avoiding of the horrible JS syntax
                  :smackjack      ;; for AJAX requests
                  :lass           ;; for building the (S)CSS styles

You should not use Quicklisp to silently install dependencies as part of your normal operation sequence, because:

+ It imposes its own view on how libraries are retrieved, stored, managed and updated.

+ It is vulnerable to man-in-the-middle attacks.

+ Used in this fashion, it effectively becomes part of your program. The more people that do this, the more ingrained it becomes.

+ Plenty of us do not use Quicklisp at all, preferring other management schemes.

A better scheme for your project would be to have an install-deps-via-quicklisp.lisp script that pulls down and installs the dependencies via Quicklisp, for those who want to go that route, plus a README that lists all the dependencies (with their GitHub repos or homepages). That way you decouple normal execution from dependency installation, and those of us who do not use Quicklisp are satisfied.

TeMPOraL · on Nov 12, 2016

Wouldn't the "righter" way be to create an ASDF system definition file? This doesn't tie you to Quicklisp, but Quicklisp can handle ASDF system definitions just fine.

armitron · on Nov 12, 2016

Sure, I forgot to mention that.

ruricolist · on Nov 13, 2016

Out of curiosity, what do you use instead of Quicklisp?

armitron · on Nov 13, 2016

I've rolled my own. The way it works is:

+ It parses source.txt files from quicklisp-projects and builds its own database. Then it can go out and fetch those projects directly [no curation] from their homepages over https/git/hg and so on.

  So, I find quicklisp-projects (https://github.com/quicklisp/quicklisp-projects)
  pretty useful as a centralized index of CL software,
  and my thing piggybacks on top of it.

+ Alternatively, you give it a Github/Bitbucket/Gitlab/... URL and it will fetch and register the project for you.

Finally, for every project that it has installed/registered, either via following the source URLs from quicklisp-projects or directly, it can automatically check if upstream has been updated and fetch those updates with optional merge/rebase.

This matches the way I work, which is mainly on top of repos, and it also gives me direct access (no curation) to the projects without having to worry about Xach's curated repo being compromised or quicklisp client not doing HTTPS/certificate verification / checksum checks.

ruricolist · on Nov 13, 2016

That's fascinating. You've essentially kept the social infrastructure of Quicklisp (quicklisp-projects, repos exposed to Quicklisp) but replaced the technical infrastructure.

rhabarba · on Nov 12, 2016

Thank you. But what would be the preferred way to load the installed libraries then? (Sorry, I rarely read style books.)

PuercoPop · on Nov 12, 2016

One can use quicklisp bundles so that the user doesn't have to install quicklisp and download dependencies themselves

https://www.quicklisp.org/beta/bundles.html

armitron · on Nov 12, 2016

Again, these are not ideal for software distribution in the general case (but can be useful for certain cases, so fine as an extra option).

Do you see many Python projects being distributed as virtualenv tarballs? That is roughly what Quicklisp bundles are the equivalent of.

Let me present a sane scheme for dependency management, in terms of software distribution:

+ Create an ASDF system definition.

+ Create a README and list all dependencies with links to the code, version requirements (if any) and other notes/gotchas. This is useful to have regardless.

* Optionally, create an install-deps-via-quicklisp.lisp script that pulls down the given dependencies via Quicklisp.

* Optionally, create a quicklisp bundle.

Quicklisp (and the way quicklisp does things) should never be forced on the user. Don't get me wrong, it simplifies dependency management and has made things easier for newcomers but it compromises on other fronts and is not (and should not become) the standard way of managing dependencies in CL-land since a lot of its design choices do not mesh well with many real-world scenarios. Always leave a fallback.
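
A minimal sketch of what such an install-deps-via-quicklisp.lisp could look like. It assumes Quicklisp already sits in its default ~/quicklisp/ location, and it borrows the system names from the snippet quoted at the top of the thread; adjust both to the actual project:

  ;;;; install-deps-via-quicklisp.lisp
  ;;;; Run by hand, once, e.g.: sbcl --load install-deps-via-quicklisp.lisp
  ;;;; Not loaded during normal operation of the program.

  ;; Assumption: Quicklisp is installed in its default location.
  (load (merge-pathnames "quicklisp/setup.lisp" (user-homedir-pathname)))

  ;; LOAD reads one top-level form at a time, so the QL package already
  ;; exists by the time this form is read.
  (ql:quickload '(:datafly :hunchentoot :cl-who :parenscript :smackjack :lass))

The program itself then only needs ASDF to find and load the systems, whichever way they got onto the machine.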

rhabarba · on Nov 12, 2016

I'm officially too dumb for that.

    (loop for system in (list "sxql" "cl-ppcre" "dexador" "clss" "plump" "plump-sexp" "datafly" "xml-emitter" "local-time" "hunchentoot" "cl-who" "lass" "smackjack" "parenscript") do (asdf:load-system system))

I get a load of warnings about "redefining functions". Rather annoying...

armitron · on Nov 12, 2016

You do not need to explicitly load them, ASDF will do it for you. You do need to declare the dependencies though.

Here is an example (example.asd)

  (defsystem :example
    :name "Example"
    :description "Example."
    :author "foo@bar"
    :serial t
    :license "BSD"
    :version "1.0"
    :depends-on (:sxql :cl-ppcre :dexador :clss :plump)
    :components ((:file "packages")
                 (:file "file1")
                 (:file "file2")
                 (:file "file3")))

You don't need to specify the .lisp extension for files. Your packages.lisp may be:

  (in-package #:cl-user)

  (defpackage #:example
    (:use #:cl #:sxql #:cl-ppcre)
    (:export #:*global-symbol*
             #:exported-function))

Then you can load your system through ASDF, including all its dependencies, like so:

  (asdf:oos 'asdf:load-op :example)

This assumes that example.asd is inside a directory that can be found in:

  asdf:*central-registry*
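
For illustration, registering a checkout by hand could look like the following. The path is a placeholder, and asdf:load-system is the current spelling of the older asdf:oos call:

  (require :asdf)

  ;; Placeholder path; the trailing slash matters, since ASDF expects a
  ;; directory pathname here.
  (push #p"/path/to/example/" asdf:*central-registry*)

  (asdf:load-system :example)

Recent ASDF versions (3.1.2 and later, as far as I know) also scan ~/common-lisp/ by default, so a checkout placed there should need no registration at all.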

rhabarba · on Nov 12, 2016

Ah, I see, thank you. So I'll have to "install" the script into the registry before being able to just "use" it? :(

fiddlerwoaroof · on Nov 13, 2016

The easiest way to do it is to create a .asd for your project and then symlink or copy your project into ~/quicklisp/local-projects (i.e. the folder "local-projects" in the same directory as setup.lisp). Then, either run (ql:register-local-projects) or restart lisp and then do (ql:quickload :my-project) to load it.

rhabarba · on Nov 13, 2016

Hmm, that makes "just clone the repository and you're done" rather hard. I'll reconsider...

I added the list of packages to the README and I'll think about a good way to use ASDF without QL and/or usability loss later. Thank you though!
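
One possible way to keep "just clone the repository and you're done" without Quicklisp is a small loader script in the repository root that registers its own directory with ASDF. A sketch, where load.lisp and the system name :example are hypothetical, and the dependencies are assumed to be findable by ASDF already:

  ;;;; load.lisp (hypothetical), placed next to example.asd
  ;;;; Usage: sbcl --load load.lisp

  (require :asdf)

  ;; *LOAD-TRUENAME* is the file being loaded; stripping name and type
  ;; leaves the directory that contains this script and the .asd file.
  (push (make-pathname :name nil :type nil :version nil
                       :defaults *load-truename*)
        asdf:*central-registry*)

  (asdf:load-system :example)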

rhabarba · on Nov 12, 2016

I see, thanks. Now I need to read up about ASDF systems (I haven't fully understood defsystem yet). I'll extend the README tonight. :-)

rhabarba · on Nov 12, 2016

That would basically include Quicklisp again. I thought I shouldn't?

PuercoPop · on Nov 12, 2016

No, it creates a bundle you can load using only ASDF. The developer uses Quicklisp, not the client.
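
Roughly, the two sides could look like this. The system names are taken from the thread, "bundle/" is an arbitrary output directory, and the authoritative details are on the Quicklisp bundles page (https://www.quicklisp.org/beta/bundles.html):

  ;; Developer side, in an image with Quicklisp loaded:
  ;; export the named systems and everything they depend on.
  (ql:bundle-systems '("hunchentoot" "cl-who") :to "bundle/")

  ;; Client side, no Quicklisp needed: loading bundle.lisp makes the
  ;; bundled systems visible to ASDF.
  (load "bundle/bundle.lisp")
  (asdf:load-system "hunchentoot")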

armitron · on Nov 12, 2016

Use ASDF (https://common-lisp.net/project/asdf/), it is the de-facto standard in CL.

Quicklisp does not replace ASDF (it uses ASDF internally).

StreamBright · on Nov 12, 2016

I am still amazed sometimes how naturally Lisp handles data and code. You can just represent html with css and all as a nested data structure and use it when you need it. It does look or feel weird to have that next to your business logic.
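
A small taste of what that looks like with cl-who, one of the libraries the project uses (the function and its argument are made up for illustration):

  ;; Assumes cl-who is loaded, e.g. via (asdf:load-system :cl-who).
  (defun feed-list-html (titles)
    "Render TITLES, a list of strings, as an HTML list."
    (cl-who:with-html-output-to-string (out)
      (:ul :class "feeds"
        (dolist (title titles)
          ;; HTM switches back to HTML mode inside Lisp code;
          ;; ESC prints a string with HTML special characters escaped.
          (cl-who:htm (:li (cl-who:esc title)))))))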

junke · on Nov 12, 2016

This request handler, in particular, produces HTML/CSS and Javascript in the same scope: https://bitbucket.org/tux_/rssparser.lisp/src/7a9f5ed45aca8b...

wtbob · on Nov 13, 2016

What blows my mind is how, after seeing that, some folks would want to program in anything other than Lisp. Code, HTML, CSS, JavaScript, all with a single uniform representation!

Seriously, anyone who's not taken a look: take a look.

nextos · on Nov 13, 2016

Offtopic, I use Clojure these days.

Is Common Lisp still worth considering?

wtbob · on Nov 13, 2016

> Is Common Lisp still worth considering?

I think so. The more I use it, the more I realise how well-put-together it really is. There are numerous places that annoyed me once, before I understood them, and which I now appreciate (pathnames, logical pathnames and Lisp-2 all jump immediately to mind, but there are a few others). The only thing I don't like is the upcasing, but by setting PRINT-CASE appropriately I rarely have to notice that.

The ecosystem is actually quite wonderful these days. Quicklisp is a godsend — xach deserves an award.

All in all, Lisp feels like a well-engineered solution. Not perfect, but really damn good — and better than anything else I've used.
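
For reference, the variable in question is *print-case*; a quick REPL check:

  ;; Symbols are still upcased by the reader; only printing changes.
  (setf *print-case* :downcase)
  (prin1-to-string 'hello)   ; => "hello"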

rhabarba · on Nov 13, 2016

Does your runtime still barf JVM stack traces? (attr.: http://www.loper-os.org/?p=42&cpage=2#comment-16767)

Leaving this aside (after all, it might or might not be a matter of taste), it totally is. The Common Lisp ecosystem is pretty well alive, despite the rise of Racket and Clojure. You should really give it a try one day. :-)

nextos · on Nov 13, 2016

Common Lisp was the first Lisp I tried. But I found the library ecosystem a bit disappointing. This was more than a decade ago, though. No idea if Quicklisp has made everything more lively.

Clojure is nice. I like the emphasis on functional programming, and a very clean set of basic data structures and operations on them. Plus lots of great libraries.

Nonetheless, being on the JVM seems both a blessing and a curse. And I would love if it was a lot more performant. Clasp is tempting: https://drmeister.wordpress.com/2015/11/23/why-common-lisp-f.... I wish concurrency was better.

rhabarba · on Nov 13, 2016

Clasp seems to be much slower than SBCL (yet?). But yes, the number of libraries has surely grown over the past few years, probably even too much:

http://eudoxia.me/article/common-lisp-sotu-2015

rhabarba · on Nov 12, 2016

Inline HTML/CSS would work too, but writing half of the software as a string would feel wrong.

faleidel · on Nov 12, 2016

We should do a web-to-markdown thing since so many websites are unreadable.

The funny thing is that Chrome for cellphones already has something like that with the "make this website mobile friendly".

Anyway, nice project!

bootload · on Nov 13, 2016

"We should do a web-to-markdown thing since so much websites are unreadable."

THE FASCINATOR: that cheeky hacker Aaron beat you to this ~ http://www.aaronsw.com/2002/html2text/html2text.py and http://www.aaronsw.com/2002/html2text/ ... old, interesting to see if it's still usable after almost 15yrs.

Latest code at: https://github.com/aaronsw/html2text

rhabarba · on Nov 14, 2016

There are quite some forks, this could be interesting to watch, thanks.

sedachv · on Nov 14, 2016

> We should do a web-to-mardown thing since so much websites are unreadable.

Opera used to have really great settings (something like 20 different presets) for overriding CSS. Really made things a lot more legible and was a quick fix for broken websites. Sometimes I still use Lynx.

danappelxx · on Nov 12, 2016

Safari (for macOS and iOS) also has this with reader mode. Works well for most websites.

throwanem · on Nov 12, 2016

Firefox in recent versions, too.

bhrgunatha · on Nov 13, 2016

I've been disappointed so much with the changes to firefox UI over the last couple of years, so it's great to have something positive. Reader is a resounding success with me, given the terrible low contrast, difficult to read, web design trends over the same time span.

rhabarba · on Nov 12, 2016

Thanks. :-)

bootload · on Nov 12, 2016

"written because a disappointing number of websites still does not have an RSS or Atom feed"

Or a way to programmatically parse websites, process with a bit of code, to view sites off-line. Really interesting piece of code.

rhabarba · on Nov 13, 2016

Thanks. :)

srgseg · on Nov 12, 2016

FYI this functionality is 'automagically' implemented by specifying just a web site URL within the http://www.protopage.com RSS reader. It scans a web site and figures out which are the article headlines and links.

rhabarba · on Nov 12, 2016

Which might (and probably will) fail for a lot of websites.

nreece · on Nov 13, 2016

Pretty cool example of Lisp's capabilities.

When trying to create feeds from webpages in the real-world, there are plenty of pitfalls involved, for example: handling JavaScript content, graceful retries, bypass IP blocks, throttle & rate-limit requests, accessing public social media (Facebook, Twitter, Google) etc.

At Feedity (https://feedity.com), we've developed our own little system over the years using .NET (C#) and node.js, with a bunch of tweaks and optimizations, for generating custom feeds from public webpages.

noobermin · on Nov 12, 2016

Random question, how many people still use RSS? Just out of curiosity.

EDIT: And while I'm at it, are his/her packages what one usually uses in Common Lisp for web development?

tomtoise · on Nov 12, 2016

I use Feedly and IFTTT to pipe my feed of 150 or so lists straight into a dedicated Slack channel for monitoring new articles coming out on InfoSec. It's pretty convenient, but also not too intrusive: if you need to switch off, you can just mute the dedicated Slack channel for a bit.

rcarmo · on Nov 12, 2016

I follow around 200 feeds on Feedly. It is pretty much my sole source of business and personal interest news, because I can consume it all using a single app (Reeder) in a very effective way.

symlinkk · on Nov 12, 2016

That's a ton of feeds. Don't you end up with thousands of unread articles every day? And how do you sift through all of them to find the most relevant / important news? When I open nytimes.com I see the top story right there on the front page. When I open Reeder, I see an enormous list of articles sorted in reverse chronological order.

rcarmo · on Nov 12, 2016

I get around 1500 articles a day, yes, but following feeds doesn't mean you need to actually _read_ it all :)

I simply scan through the headlines. Typically business news is repetitive enough that you can get a good feel for how "hot" a topic is, and then I just read one or two pieces on it (usually from the "best" writers) and discard the other 50.

Even headline-only feeds are useful in that regard - they're just signal boosters.

On the other hand, I tend to read a fair number of personal/tech blog posts in their entirety - around 20 a day, over breakfast.

bhrgunatha · on Nov 13, 2016

This is absolutely the best feature of RSS for me. Rather than having to open a browser, discover the quirks of its navigation and then page backwards and forwards through articles, RSS gives a sane way to skim through those things and focus on the interesting parts.

Plus keeping up with podcasts which tend to be published less often than articles.

rhabarba · on Nov 12, 2016

RSS makes it easier to shuffle through the headlines.

kripy · on Nov 12, 2016

Using Feedbin and Reader but only check feeds over the weekend when I can spend time reading and sorting.

One thing on this approach: you are screwed if the website isn't using well structured HTML. Having to write rules for individual websites will become cumbersome and will break if they decide to change their markup.

I keep track of a few websites off the command line by doing a checksum of the HTML page and then doing a diff. It doesn't work if the HTML includes a timestamp but it does work for the websites that I am watching.
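
The same idea can be sketched in plain Common Lisp. Assumptions: the page body has already been fetched as a string (for instance with dexador, which the project uses), the function and file names are made up, and sxhash stands in for a real checksum such as SHA-256, so it is only good for "did anything change" checks within one Lisp implementation:

  (defun page-changed-p (html state-file)
    "Return true if HTML hashes differently from the value saved in STATE-FILE.
  The new hash is then written to STATE-FILE for the next run."
    (let ((new (sxhash html))
          ;; A missing state file yields NIL instead of signalling an error.
          (old (with-open-file (in state-file :if-does-not-exist nil)
                 (and in
                      (let ((*read-eval* nil))
                        (read in nil nil))))))
      (with-open-file (out state-file :direction :output
                                      :if-exists :supersede
                                      :if-does-not-exist :create)
        (prin1 new out))
      (not (eql new old))))

The first call for a given state file always reports a change, since there is nothing to compare against yet. As noted above, a timestamp embedded in the HTML defeats this kind of check.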

nyolfen · on Nov 12, 2016

many? here's a thread from last night with dozens of comments about rss readers https://news.ycombinator.com/item?id=12932429

shakna · on Nov 12, 2016

I use RSS everywhere I can. It allows me to build a central source for everything I'm interested in.

Quicklisp is par for the course with Common Lisp nowadays, and the dependencies all look fairly normal to me.

If you were going for production code, I'd throw Hunchentoot behind Nginx.

jlg23 · on Nov 12, 2016

> EDIT: And while I'm at it, are his/er packages what one usually uses in common lisp for web development?

No - but there are no packages one "usually uses in common lisp for web development". There are choices for every aspect and one is expected to make an informed decision or simply write yet another implementation of the same functionality if preferable in the specific project.

Of the packages mentioned I only use cl-who regularly; I have a modified hunchentoot for dev work and my own css generator. Using a package for handling ajax requests never occurred to me because for all my projects a 5 line macro did the job.

protomyth · on Nov 12, 2016

I go with 211 feeds in a desktop app. It allows me to group news and take a quick look from a lot of sources. I don't follow news orgs that don't use RSS.

netghost · on Nov 12, 2016

I find it really valuable, and follow a few dozen sites.

I'm curious, if folks don't use an RSS reader, how do they find articles and follow sites they're interested in?

symlinkk · on Nov 12, 2016

Twitter / Facebook / Reddit

omaranto · on Nov 12, 2016

For infrequently updated websites that aren't very popular (so their new articles wouldn't show up on your Facebook or Twitter feeds), I don't think there are good alternatives to RSS: I guess you can stop reading them or remember to visit them every now and then.

Certainly the stop-reading-those-websites option is easier to set up than an RSS reader. Maybe that's the "solution" most people adopt?

brewski · on Nov 12, 2016

I use rss2email. Have it running as a nightly cronjob on my raspberry pi.

icanhackit · on Nov 14, 2016

Big time RSS user. NetNewsWire under OS X and iOS with cloud sync. Means I'm able to punch through local news, world news, sub-Reddits, tech news, blogs etc in around 15-20 minutes a day - usually when I'm perched on the porcelain.

Although that added efficiency just means I trawl HN for longer...

webwanderings · on Nov 12, 2016

I use Digg RSS Reader with hundreds of feeds. It became my only go to place after I abandoned social media.

xrstf · on Nov 12, 2016

I have been running my own instance of Tiny Tiny RSS (with ~30 feeds) for years and couldn't be happier. Migrated from Feedly because of their usability and annoying upgrade reminders.

pjmlp · on Nov 12, 2016

I do, using a native client.

rhabarba · on Nov 12, 2016

at EDIT: I don't care as long as it does the job. There's Caveman2 for a full-featured web stack but I thought it was overkill. After all, the web UI is - more or less - an "add-on". :-)

eth0up · on Nov 12, 2016

"...how many people still use RSS?"

+1

burtonator · on Nov 14, 2016

I posted here the other day about the death of RSS:

https://www.spinn3r.com/blog/2016/11/09/The-Death-of-RSS-Lon...

And this kind of codifies my point even though a number of people in the comments here were calling me crazy ;)

rhabarba · on Nov 14, 2016

"Visual Content is better". Exhibit 1: Your website, instantly greeting me with a large "pleeeasse subscribe" pop-up. Wouldn't happen with RSS. :-)

zouhair · on Nov 15, 2016

feed43.com

evmar · on Nov 13, 2016

> This software was written because a disappointing number of websites still does not have an RSS or Atom feed so I could subscribe to their updates, e.g. the KiTTY website.

It seems the KiTTY website does have an RSS feed at http://www.9bis.net/kitty/data/rss/rssen.xml

rhabarba · on Nov 13, 2016

Yes, it does. But you might see that this does not just list new KiTTY updates but anything that happens on that website; which is not exactly handy, is it?

k1m · on Nov 13, 2016

This looks really nice. We've got something similar (written in PHP) at FiveFilters.org http://createfeed.fivefilters.org

rhabarba · on Nov 14, 2016

I had a look at your software a while ago. I would probably use it if it was free. (I totally respect you for writing it though!)

williamle8300 · on Nov 12, 2016

Which RSS reader do you use?

rhabarba · on Nov 12, 2016

I use NewsBlur but I also keep a Tiny Tiny RSS installation ready as a fallback.

nercht12 · on Nov 13, 2016

liferea.

cheiVia0 · on Nov 13, 2016

Does this work with Facebook? They turned off RSS recently and it was very annoying.

rhabarba · on Nov 13, 2016

It can't log in for you, sorry. :-) But it should be able to process publicly visible contents.

narrowrail · on Nov 13, 2016

This could be paired with a browser extension or selenium?

rhabarba · on Nov 13, 2016

Technically, yes.