
Does this have an option to periodically check the page for updates and save a revision?

My ideal bookmark/page archiver would have this workflow:

1) Find a page I like or find valuable for whatever reason, so I click on a browser addon button.

2) A little dialog would then show up from the button, allowing me to set the following:

2a) Add tags; it would also offer suggested tags I could accept or remove.

2b) Set an optional update frequency, preferably with a back-off option: check less and less often, at first only when no changes are found, and eventually on an absolute schedule regardless of changes.

2c) Set any page-specific technical save settings.

3) Once done, I click a “save” button in the dialog, and the page would be saved as a single HTML file, like the browser addon “SingleFile” does (which has the kind of adjustable default settings mentioned above). This allows saving pages with their simple javascript/dynamic functionality intact instead of as essentially a static image, and it also inlines some media: see https://addons.mozilla.org/en-US/firefox/addon/single-file/. That said, perhaps a WARC file would be better when it comes to handling things like compression, multiple revisions, indexing, and possibly following links to download and store linked media.

4) Then it would automatically open the saved page in the browser, so I could have a quick look and make sure it’s not broken for some reason.

5) Finally, it would occasionally check for updates, saving a revision each time. On future visits to the page, the addon would show a little badge to let me know the page has already been saved and is being watched.
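To make 2b and 5 concrete, here’s a rough sketch of the loop I have in mind, using the warcio library to append each revision to a WARC file. The hash-based change check, the intervals, and the doubling back-off are all placeholder assumptions on my part, not any existing addon’s behavior:

    import hashlib
    import time
    import urllib.request
    from io import BytesIO

    from warcio.statusandheaders import StatusAndHeaders
    from warcio.warcwriter import WARCWriter

    BASE_INTERVAL = 3600           # first re-check after an hour
    MAX_INTERVAL = 30 * 24 * 3600  # back off to monthly at most

    def watch(url, warc_path):
        last_hash, interval = None, BASE_INTERVAL
        while True:
            body = urllib.request.urlopen(url).read()
            digest = hashlib.sha256(body).hexdigest()
            if digest != last_hash:
                # changed (or first fetch): append a revision, reset the clock
                with open(warc_path, "ab") as fh:
                    writer = WARCWriter(fh, gzip=True)
                    headers = StatusAndHeaders(
                        "200 OK",
                        [("Content-Type", "text/html; charset=utf-8")],
                        protocol="HTTP/1.1",
                    )
                    writer.write_record(
                        writer.create_warc_record(
                            url, "response",
                            payload=BytesIO(body),
                            http_headers=headers,
                        )
                    )
                last_hash, interval = digest, BASE_INTERVAL
            else:
                # unchanged: check half as often next time
                interval = min(interval * 2, MAX_INTERVAL)
            time.sleep(interval)

A naive content hash like this would flag pages with timestamps or per-request tokens as “changed” on every check, so a real implementation would need to normalize the HTML first. Appending works here because each gzipped WARC record is its own gzip member.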

It kinda sounds like I want a browser-integrated front end with sane and intuitive settings for HTTrack. As an example, let’s say I find a post on Hacker News full of insightful comments about something and want to save it. The post might be new, so comments are going to continue to be added (or possibly removed, though this is more of a reddit problem) after I’ve saved the link and page. It’d also be nice to automatically grab the linked webpage for context. Something that makes this easy would be great.
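In CLI terms, and with the caveat that these flags are from memory and the item id is made up, that’s roughly:

    httrack "https://news.ycombinator.com/item?id=12345" -O ~/archive/hn-12345 -r1 -%e1

-O sets the output directory, -r1 keeps the mirror to the thread page itself, and -%e1 lets it follow external links one hop so the submitted article comes along; re-running later with --update should refresh the copy. Double-check the man page, though, since HTTrack’s depth semantics are easy to get wrong.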

It might also be nice to be able to select comments for highlighting (picking elements the way uBlock’s element picker does?).




Other things that come to mind as complications:

The page as presented in my current browser session can look vastly different from what a logged-out guest with no browser addons would see.

Many websites require browser addons to be tolerable. Reddit likes to hide the end of comment chains to artificially inflate their fucking click metrics, and addons are required to load those comments inline. Saving pages with ublock enabled is also a must. I think selenium can do this: https://stackoverflow.com/questions/52153398/how-can-i-add-a...
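For what it’s worth, with Selenium’s Python bindings and Firefox the addon part is only a couple of lines; the paths and URL here are placeholders:

    from selenium import webdriver

    driver = webdriver.Firefox()
    # .xpi downloaded from addons.mozilla.org; temporary=True installs it
    # for this session only rather than permanently into the profile
    driver.install_addon("/path/to/ublock_origin.xpi", temporary=True)
    driver.get("https://old.reddit.com/r/programming/comments/abc123/")
    html = driver.page_source  # the DOM as rendered after addons have run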

So being able to use a login token or auto-login with a dedicated account would be useful. It’s probably best to create a special archive-only user for each website; otherwise it’d be a nightmare trying to redact elements such as your username, favorites, subscriptions, etc., and to make sure the redactions aren’t broken by a future site redesign.
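Continuing the Selenium snippet above, injecting the archive account’s session token is similarly short, though the cookie name (“session”) is site-specific and just a guess here:

    # Selenium only lets you set cookies for the currently loaded domain,
    # so navigate there first, then inject the token and reload
    driver.get("https://example.com/")
    driver.add_cookie({"name": "session", "value": "ARCHIVE_ACCOUNT_TOKEN"})
    driver.refresh()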


I suggest trying out HamsterBase (note that it is not open source).

1. It integrates directly with SingleFile, enabling one-click web page saving. Because the page is captured inside the browser, the effects of all your other addons are included in the snapshot.

2. It provides an open-source plugin, https://github.com/hamsterbase/hamsterbase-highlighter, which lets you annotate directly in the browser and automatically saves a snapshot of the page when you annotate. When you visit the page again, it displays your previous snapshots.

3. All data is stored on your local device; both a Docker version and a desktop version are available, and the different versions can synchronize with each other peer-to-peer. (A rough idea of the Docker setup is sketched below the list.)

4. It provides full-text search across all of your saved pages.
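For the Docker version, the setup is the usual single container; the image name, port, and volume path below are my assumptions, so check the HamsterBase docs for the real values:

    # image name, port, and data path are assumptions; see the official docs
    docker run -d -p 3001:3001 -v /srv/hamsterbase:/data hamsterbase/hamsterbase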



