I'd be interested to know how you fully resolve external dependencies. For example, do you pull in JS libraries that are loaded dynamically by other JS files (as opposed to those that are simply referenced statically as script includes in the HTML)? If so, are you rendering the page in a headless browser to do this?
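(For concreteness, here's roughly what I have in mind; this is just my own sketch using Playwright, not a guess at your actual stack, and the URL is a placeholder. The idea is to let a real browser engine execute the page and log every resource it ends up requesting, including scripts pulled in by other scripts.)

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()

        # Record every resource the page requests, including JS files
        # that other JS files load dynamically after the initial HTML.
        resources = []
        page.on("request", lambda request: resources.append(request.url))

        # Wait until network activity settles so late-loaded scripts are caught.
        page.goto("https://example.com", wait_until="networkidle")

        for url in resources:
            print(url)

        browser.close()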
By the way, I think the idea for the service is great, although a little too pricey for me to start using yet ;-) I always need to search my bookmarks. As a proxy for doing this, I currently use Google's "search visited pages" feature: when you're logged in to Google and search, you now get the option to constrain the search to only those pages that you have visited in the past - a superset of bookmarks, but useful nonetheless.
Connected to this: a lot of the content in pages is often pulled in via JS.
For example, Facebook's page as seen via links is basically a long list of script tags without any actual content.
Without JavaScript evaluation, it seems a lot of content would be lost.
Based on my usage of Firebug, I'm under the impression that even if content was put onto the page with JS (or the HTML source is badly broken), the browser will still build a valid DOM structure for the page and should be able to save it out as HTML/CSS.
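(Again, just a sketch of what I mean, assuming a headless browser such as Playwright; the URL and output filename are placeholders. Serializing the live DOM after scripts have run captures content that never appears in the raw HTML source.)

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com", wait_until="networkidle")

        # The DOM at this point reflects whatever JS has injected,
        # so serializing it preserves content missing from the source.
        rendered_html = page.content()

        with open("snapshot.html", "w", encoding="utf-8") as f:
            f.write(rendered_html)

        browser.close()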