Hacker News
Show HN: A Web-to-RSS Parser in Common Lisp (bitbucket.org/tux_)
125 points by rhabarba on Nov 12, 2016 | 76 comments



This is far from ideal (and a bad idea in general):

  (load "~/quicklisp/setup.lisp")
   
  (ql:quickload '(:datafly        ;; for database access
  
                  ;; WEB SERVER:
                  :hunchentoot    ;; for providing a web server
                  :cl-who         ;; for building the HTML output
                  :parenscript    ;; for the avoiding of the horrible JS syntax
                  :smackjack      ;; for AJAX requests
                  :lass           ;; for building the (S)CSS styles

You should not use Quicklisp to silently install dependencies as part of your normal operation sequence, because:

+ It imposes its own view on how libraries are retrieved, stored, managed and updated.

+ It is vulnerable to man-in-the-middle attacks.

+ Used in this fashion, it effectively becomes part of your program. The more people that do this, the more ingrained it becomes.

+ Plenty of us do not use Quicklisp at all, preferring other management schemes.

A better scheme for your project would be to have an install-deps-via-quicklisp.lisp script file that pulls down and installs the dependencies via Quicklisp, for those who want to go that route, and also a README that lists all the dependencies (with their associated GitHub repos or homepages). That way you decouple normal execution from dependency installation, and those of us who do not use Quicklisp are satisfied.
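A minimal sketch of what such an install-deps-via-quicklisp.lisp file might contain. It assumes Quicklisp is installed in its default location (~/quicklisp/), and the system names are the ones listed further down in this thread; adjust both to the actual project.

```lisp
;;;; install-deps-via-quicklisp.lisp
;;;; Optional helper: run once, by hand, and only if you use Quicklisp.
;;;; The application itself never loads this file.

;; Assumption: Quicklisp lives in its default location.
(load "~/quicklisp/setup.lisp")

;; Download (if necessary) and load every dependency.
;; System names are taken from the list discussed in this thread.
(ql:quickload '(:sxql :cl-ppcre :dexador :clss :plump :plump-sexp
                :datafly :xml-emitter :local-time
                :hunchentoot :cl-who :lass :smackjack :parenscript))
```

It could be run once with something like `sbcl --load install-deps-via-quicklisp.lisp`; people who manage libraries differently simply ignore the file and follow the README.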


Wouldn't the "righter" way be to create an ASDF system definition file? This doesn't tie you to Quicklisp, but Quicklisp can handle ASDF system definitions just fine.


Sure, I forgot to mention that.


Out of curiosity, what do you use instead of Quicklisp?


I've rolled my own. The way it works is:

+ It parses source.txt files from quicklisp-projects and builds its own database. Then it can go out and fetch those projects directly [no curation] from their homepages over https/git/hg and so on.

  So, quicklisp-projects (https://github.com/quicklisp/quicklisp-projects)
  I find to be pretty useful as a centralized index of CL
  software, and my thing piggybacks on top of it.
+ Alternatively, you give it a Github/Bitbucket/Gitlab/... URL and it will fetch and register the project for you.

Finally, for every project that it has installed/registered, either via following the source URLs from quicklisp-projects or directly, it can automatically check if upstream has been updated and fetch those updates with optional merge/rebase.

This matches the way I work, which is mainly on top of repos, and it also gives me direct access (no curation) to the projects without having to worry about Xach's curated repo being compromised or quicklisp client not doing HTTPS/certificate verification / checksum checks.


That's fascinating. You've essentially kept the social infrastructure of Quicklisp (quicklisp-projects, repos exposed to Quicklisp) but replaced the technical infrastructure.


Thank you. But what would be the preferred way to load the installed libraries then? (Sorry, I rarely read style books.)


One can use Quicklisp bundles so that the user doesn't have to install Quicklisp and download the dependencies themselves:

https://www.quicklisp.org/beta/bundles.html


Again, these are not ideal for software distribution in the general case (but can be useful for certain cases, so fine as an extra option).

Do you see many Python projects being distributed as virtualenv tarballs? Quicklisp bundles may be seen as the equivalent of those.

Let me present a sane scheme for dependency management, in terms of software distribution:

+ Create an ASDF system definition.

+ Create a README and list all dependencies with links to the code, version requirements (if any) and other notes/gotchas. This is useful to have regardless.

* Optionally, create an install-deps-via-quicklisp.lisp script that pulls down the given dependencies via Quicklisp.

* Optionally, create a quicklisp bundle.

Quicklisp (and the way quicklisp does things) should never be forced on the user. Don't get me wrong, it simplifies dependency management and has made things easier for newcomers but it compromises on other fronts and is not (and should not become) the standard way of managing dependencies in CL-land since a lot of its design choices do not mesh well with many real-world scenarios. Always leave a fallback.


I'm officially too dumb for that.

    (loop for system in (list "sxql" "cl-ppcre" "dexador" "clss" "plump" "plump-sexp" "datafly" "xml-emitter" "local-time" "hunchentoot" "cl-who" "lass" "smackjack" "parenscript") do (asdf:load-system system))
I get a load of warnings about "redefining functions". Rather annoying...


You do not need to load them explicitly; ASDF will do it for you. You do need to declare the dependencies, though.

Here is an example (example.asd):

  (defsystem :example
    :name "Example"
    :description "Example."
    :author "foo@bar"
    :serial t
    :license "BSD"
    :version "1.0"
    :depends-on (:sxql :cl-ppcre :dexador :clss :plump)
    :components ((:file "packages")
                 (:file "file1")
                 (:file "file2")
                 (:file "file3")))
You don't need to specify .lisp extension for files. Your packages.lisp may be:

  (in-package #:cl-user)

  (defpackage #:example
    (:use #:cl #:sxql #:cl-ppcre)
    (:export #:*global-symbol*
             #:exported-function))
Then you can load your system through ASDF, including all its dependencies, like so:

  (asdf:oos 'asdf:load-op :example)
This assumes that example.asd is inside a directory that can be found in:

  asdf:*central-registry*
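One way to satisfy that without Quicklisp, as a sketch: push the project directory onto the registry before loading. The path below is hypothetical and stands for wherever the repository was cloned; asdf:load-system is the shorthand for the oos/load-op call above.

```lisp
;; Make sure ASDF itself is available (most implementations bundle it).
(require "asdf")

;; Hypothetical path: the directory that contains example.asd.
;; The trailing slash matters, ASDF expects a directory pathname here.
(push #p"/home/me/src/example/" asdf:*central-registry*)

;; Loads :example together with everything in its :depends-on list,
;; provided those systems can be found by ASDF as well.
(asdf:load-system :example)
```

The push can go into the implementation's init file (for example ~/.sbclrc) so that it only has to be written once.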


Ah, I see, thank you. So I'll have to "install" the script into the registry before being able to just "use" it? :(


The easiest way to do it is to create a .asd for your project and then symlink or copy your project into ~/quicklisp/local-projects (i.e. the folder "local-projects" in the same directory as setup.lisp). Then, either run (ql:register-local-projects) or restart lisp and then do (ql:quickload :my-project) to load it.


Hmm, that makes "just clone the repository and you're done" rather hard. I'll reconsider...

I added the list of packages to the README and I'll think about a good way to use ASDF without QL and/or usability loss later. Thank you though!


I see, thanks. Now I need to read up about ASDF systems (I haven't fully understood defsystem yet). I'll extend the README tonight. :-)


That would basically include Quicklisp again. I thought I shouldn't?


No, it creates a bundle you can load using only ASDF. The developer uses Quicklisp, not the client.


Use ASDF (https://common-lisp.net/project/asdf/), it is the de-facto standard in CL.

Quicklisp does not replace ASDF (it uses ASDF internally).


I am still amazed sometimes how naturally Lisp handles data and code. You can just represent HTML, CSS and all as a nested data structure and use it when you need it. It doesn't look or feel weird to have that next to your business logic.
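To make the "nested data structure" point concrete, here is a toy renderer. It is not cl-who's API (cl-who does this with macros and far more features); the names render and *page* are made up for the illustration, and no HTML escaping or attribute handling is attempted.

```lisp
;; Toy illustration only: a hand-rolled renderer, NOT cl-who.
;; A node is either a string or a list of the form (TAG CHILD...).
(defun render (node)
  "Render NODE to an HTML string. No escaping, no attributes."
  (if (stringp node)
      node
      (destructuring-bind (tag &rest children) node
        (let ((name (string-downcase (symbol-name tag))))
          (format nil "<~a>~{~a~}</~a>"
                  name
                  (mapcar #'render children)
                  name)))))

;; The markup is ordinary list data, so it can be built, stored and
;; transformed with the usual list functions before it is rendered.
(defparameter *page*
  '(:ul (:li "first feed") (:li "second feed")))

(render *page*)
;; => "<ul><li>first feed</li><li>second feed</li></ul>"
```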


This request handler, in particular, produces HTML/CSS and Javascript in the same scope: https://bitbucket.org/tux_/rssparser.lisp/src/7a9f5ed45aca8b...


What blows my mind is how, after seeing that, some folks would want to program in anything other than Lisp. Code, HTML, CSS, JavaScript, all with a single uniform representation!

Seriously, anyone who's not taken a look: take a look.


Offtopic, I use Clojure these days.

Is Common Lisp still worth considering?


> Is Common Lisp still worth considering?

I think so. The more I use it, the more I realise how well-put-together it really is. There are numerous places that annoyed me once, before I understood them, and which I now appreciate (pathnames, logical pathnames and Lisp-2 all jump immediately to mind, but there are a few others). The only thing I don't like is the upcasing, but by setting PRINT-CASE appropriately I rarely have to notice that.
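For anyone who has not met the upcasing yet, a small sketch: the reader still upcases symbol names, and *print-case* only changes how they are printed. The helper name print-downcased is made up for this example.

```lisp
;; *print-case* is a standard special variable; binding it with LET
;; keeps the change local instead of altering the whole session.
(defun print-downcased (object)
  "Return the printed representation of OBJECT with symbols downcased."
  (let ((*print-case* :downcase))
    (prin1-to-string object)))

;; The symbol's name is still upper case internally
;; (with the standard readtable)...
(symbol-name 'hello-world)      ; => "HELLO-WORLD"

;; ...but it prints in lower case.
(print-downcased 'hello-world)  ; => "hello-world"
```

Putting (setf *print-case* :downcase) into an init file has the same effect for the whole session.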

The ecosystem is actually quite wonderful these days. Quicklisp is a godsend — xach deserves an award.

All in all, Lisp feels like a well-engineered solution. Not perfect, but really damn good — and better than anything else I've used.


Does your runtime still barf JVM stack traces? (attr.: http://www.loper-os.org/?p=42&cpage=2#comment-16767)

Leaving this aside (after all, it might or might not be a matter of taste), it totally is. The Common Lisp ecosystem is very much alive, despite the rise of Racket and Clojure. You should really give it a try one day. :-)


Common Lisp was the first Lisp I tried. But I found the library ecosystem a bit disappointing. This was more than a decade ago, though. No idea if Quicklisp has made everything more lively.

Clojure is nice. I like the emphasis on functional programming, and a very clean set of basic data structures and operations on them. Plus lots of great libraries.

Nonetheless, being on the JVM seems both a blessing and a curse. And I would love if it was a lot more performant. Clasp is tempting: https://drmeister.wordpress.com/2015/11/23/why-common-lisp-f.... I wish concurrency was better.


Clasp seems to be much slower than SBCL (so far?). But yes, the number of libraries has surely grown over the past few years, probably even too much:

http://eudoxia.me/article/common-lisp-sotu-2015


Inline HTML/CSS would work too, but writing half of the software as a string would feel wrong.


We should do a web-to-markdown thing since so many websites are unreadable.

The funny thing is that Chrome for mobile already has something like that with the "make this website mobile friendly" option.

Anyway, nice project!


"We should do a web-to-markdown thing since so many websites are unreadable."

THE FASCINATOR: that cheeky hacker Aaron beat you to this ~ http://www.aaronsw.com/2002/html2text/html2text.py and http://www.aaronsw.com/2002/html2text/ ... old, interesting to see if it's still usable after almost 15yrs.

Latest code at: https://github.com/aaronsw/html2text


There are quite a few forks; this could be interesting to watch, thanks.


> We should do a web-to-markdown thing since so many websites are unreadable.

Opera used to have really great settings (something like 20 different presets) for overriding CSS. Really made things a lot more legible and was a quick fix for broken websites. Sometimes I still use Lynx.


Safari (for macOS and iOS) also has this with reader mode. Works well for most websites.


Firefox in recent versions, too.


I've been disappointed so much with the changes to firefox UI over the last couple of years, so it's great to have something positive. Reader is a resounding success with me, given the terrible low contrast, difficult to read, web design trends over the same time span.


Thanks. :-)


"written because a disappointing number of websites still does not have an RSS or Atom feed"

Or a way to programmatically parse websites and process them with a bit of code, to view sites offline. Really interesting piece of code.


Thanks. :)


FYI this functionality is 'automagically' implemented by specifying just a web site URL within the http://www.protopage.com RSS reader. It scans a web site and figures out which are the article headlines and links.


Which might (and probably will) fail for a lot of websites.


Pretty cool example of Lisp's capabilities.

When trying to create feeds from webpages in the real world, there are plenty of pitfalls involved, for example: handling JavaScript content, retrying gracefully, bypassing IP blocks, throttling and rate-limiting requests, accessing public social media (Facebook, Twitter, Google), etc.

At Feedity (https://feedity.com), we've developed our own little system over the years using .NET (C#) and node.js, with a bunch of tweaks and optimizations, for generating custom feeds from public webpages.


Random question, how many people still use RSS? Just out of curiosity.

EDIT: And while I'm at it, are his/her packages what one usually uses in Common Lisp for web development?


I use Feedly and IFTTT to pipe my feed of 150 or so lists straight into a dedicated Slack channel for monitoring new articles coming out on InfoSec. It's pretty convenient, but also not too intrusive: if you need to switch off, you can just mute the dedicated Slack channel for a bit.


I follow around 200 feeds on Feedly. It is pretty much my sole source of business and personal interest news, because I can consume it all using a single app (Reeder) in a very effective way.


That's a ton of feeds. Don't you end up with thousands of unread articles every day? And how do you sift through all of them to find the most relevant / important news? When I open nytimes.com I see the top story right there on the front page. When I open Reeder, I see an enormous list of articles sorted in reverse chronological order.


I get around 1500 articles a day, yes, but following feeds doesn't mean you need to actually _read_ it all :)

I simply scan through the headlines. Typically business news is repetitive enough that you can get a good feel for how "hot" a topic is, and then I just read one or two pieces on it (usually from the "best" writers) and discard the other 50.

Even headline-only feeds are useful in that regard - they're just signal boosters.

On the other hand, I tend to read a fair number of personal/tech blog posts in their entirety - around 20 a day, over breakfast.


This is absolutely the best feature of RSS for me. Rather than having to open a browser, discover the quirks of each site's navigation and then page backwards and forwards through articles, RSS gives me a sane way to skim through those things and focus on the interesting parts.

Plus keeping up with podcasts which tend to be published less often than articles.


RSS makes it easier to shuffle through the headlines.


Using Feedbin and Reader but only check feeds over the weekend when I can spend time reading and sorting.

One thing on this approach: you are screwed if the website isn't using well structured HTML. Having to write rules for individual websites will become cumbersome and will break if they decide to change their markup.

I keep track of a few websites off the command line by doing a checksum of the HTML page and then doing a diff. It doesn't work if the HTML includes a timestamp but it does work for the websites that I am watching.


many? here's a thread from last night with dozens of comments about rss readers https://news.ycombinator.com/item?id=12932429


I use RSS everywhere I can. It allows me to build a central source for everything I'm interested in.

Quicklisp is par for the course with Common Lisp nowadays, and the dependencies all look fairly normal to me.

If you were going for production code, I'd throw Hunchentoot behind Nginx.


> EDIT: And while I'm at it, are his/her packages what one usually uses in Common Lisp for web development?

No - but there are no packages one "usually uses in common lisp for web development". There are choices for every aspect and one is expected to make an informed decision or simply write yet another implementation of the same functionality if preferable in the specific project.

Of the packages mentioned I only use cl-who regularly; I have a modified hunchentoot for dev work and my own css generator. Using a package for handling ajax requests never occurred to me because for all my projects a 5 line macro did the job.


I go with 211 feeds in a desktop app. It allows me to group news and take a quick look from a lot of sources. I don't follow news orgs that don't use RSS.


I find it really valuable, and follow a few dozen sites.

I'm curious, if folks don't use an RSS reader, how do they find articles and follow sites they're interested in?


Twitter / Facebook / Reddit


For infrequently updated websites that aren't very popular (so their new articles wouldn't show up on your Facebook or Twitter feeds), I don't think there are good alternatives to RSS: I guess you can stop reading them or remember to visit them every now and then.

Certainly the stop-reading-those-websites option is easier to set up than an RSS reader. Maybe that's the "solution" most people adopt?


I use rss2email. Have it running as a nightly cronjob on my raspberry pi.


Big-time RSS user. NetNewsWire under OS X and iOS with cloud sync. Means I'm able to punch through local news, world news, subreddits, tech news, blogs etc. in around 15-20 minutes a day - usually when I'm perched on the porcelain.

Although that added efficiency just means I trawl HN for longer...


I use Digg RSS Reader with hundreds of feeds. It became my only go to place after I abandoned social media.


I have been running my own instance of Tiny Tiny RSS (with ~30 feeds) for years and couldn't be happier. Migrated from Feedly because of its usability problems and annoying upgrade reminders.


I do, using a native client.


Re EDIT: I don't care as long as it does the job. There's Caveman2 for a full-featured web stack but I thought it was overkill. After all, the web UI is - more or less - an "add-on". :-)


"...how many people still use RSS?"

+1


I posted here the other day about the death of RSS:

https://www.spinn3r.com/blog/2016/11/09/The-Death-of-RSS-Lon...

And this kind of codifies my point even though a number of people in the comments here were calling me crazy ;)


"Visual Content is better". Exhibit 1: Your website, instantly greeting me with a large "pleeeasse subscribe" pop-up. Wouldn't happen with RSS. :-)


feed43.com


> This software was written because a disappointing number of websites still does not have an RSS or Atom feed so I could subscribe to their updates, e.g. the KiTTY website.

It seems the KiTTY website does have an RSS feed at http://www.9bis.net/kitty/data/rss/rssen.xml


Yes, it does. But you might see that this does not just list new KiTTY updates but anything that happens on that website; which is not exactly handy, is it?


This looks really nice. We've got something similar (written in PHP) at FiveFilters.org http://createfeed.fivefilters.org


I had a look at your software a while ago. I would probably use it if it was free. (I totally respect you for writing it though!)


Which RSS reader do you use?


I use NewsBlur but I also keep a Tiny Tiny RSS installation ready as a fallback.


liferea.


Does this work with Facebook? They turned off RSS recently and it was very annoying.


It can't log in for you, sorry. :-) But it should be able to process publicly visible contents.


This could be paired with a browser extension or selenium?


Technically, yes.

