
The tool captures screenshots in addition to text.



That seems awfully similar to archive.org's Wayback Machine. I do like to see all these archival projects though; they are certainly worthwhile.


irchiver captures the text on the page, and separately OCRs the screenshots (specifically, the screenshot of your viewport). So you can search just what was actually shown on screen, or everything that was in the page. Both techniques have pros and cons.

While archive.org is fantastic, it can only capture pages that are both 1) publicly accessible (i.e. no social media content) and happen to be crawled, and 2) static (you're out of luck if the content you want is loaded dynamically, or changes depending on user input).
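
To make the two search modes concrete, here is a minimal sketch of searching page text and OCR'd viewport text separately. This is not irchiver's actual code or data model; the `Capture` record, its field names, and the `search` function are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """Hypothetical record for one archived page visit."""
    url: str
    page_text: str      # text extracted from the page itself
    viewport_text: str  # text OCR'd from the viewport screenshot


def search(captures, query, source="both"):
    """Return URLs of captures whose text contains the query.

    source: "page" (text in the page), "viewport" (text that was
    visible in the screenshot), or "both" (either one).
    """
    if source not in ("page", "viewport", "both"):
        raise ValueError(f"unknown source: {source!r}")
    needle = query.lower()
    hits = []
    for capture in captures:
        in_page = needle in capture.page_text.lower()
        in_viewport = needle in capture.viewport_text.lower()
        if source == "page" and in_page:
            hits.append(capture.url)
        elif source == "viewport" and in_viewport:
            hits.append(capture.url)
        elif source == "both" and (in_page or in_viewport):
            hits.append(capture.url)
    return hits


captures = [
    # Footer text is in the page but was below the fold, so it was never OCR'd.
    Capture("https://example.com/a", "Headline. Footer note.", "Headline."),
    # Text rendered inside an image shows up only in the OCR output.
    Capture("https://example.com/b", "Caption.", "Caption. Chart label."),
]
print(search(captures, "footer", source="page"))
print(search(captures, "chart label", source="viewport"))
```

The trade-off is visible in the sample data: page text catches content you never scrolled to, while OCR catches text that only exists as pixels.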


IIRC archive.org does save the JS and things it downloads, so you can replay them when you visit the archived site later.


I guess the difference in this case is that the JS saved by archive.org relies on future browsers being backwards compatible, whereas irchiver relies on much less to stay readable, which is good. Although I don't think JavaScript will ever get a major update (as in one that breaks compatibility), I believe relying on that is not a perfect way to archive web content. We have seen this kind of backwards-compatibility breakage before with the deprecation of Adobe Flash, and it could theoretically happen elsewhere in the web stack.


Agreed. I wish they would archive the DOM too, like archive.is does, instead of just the requests.



