



How does one mirror a whole site on the Wayback Machine, rather than just saving individual URLs?

Or just download everything and then upload it?


I keep my own local mirrors because I suspect we will wake up one day and find that something has happened to the Wayback Machine. So I just zipped the mirror up and uploaded it as a "text collection" item. Here's my wget mirror shell function:

  wget-mirror () {
    # Usage: wget-mirror URL
    # Trailing backslashes join the wrapped lines into a single wget command.
    wget --mirror --convert-links --adjust-extension \
      --page-requisites --no-parent --content-disposition \
      --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" \
      --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:119.0) Gecko/20100101 Firefox/119.0" \
      --restrict-file-names="windows,nocontrol" -e robots=off \
      --no-check-certificate "$1"
  }

(Linebreaks added to avoid yuge horizontal scrollbar)

I suppose the new hotness would be to stuff the entire thing into a WARC/WACZ, but I haven't looked into that yet since I do my mirrors to a compression-enabled ZFS filesystem already anyway.
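
For anyone who does want to try the WARC route, wget can write one itself via its --warc-file option. Below is a rough sketch, not something from the setup described above: the function name wget_mirror_warc and the default basename "mirror" are made up for illustration. It uses --recursive --level=inf instead of --mirror because --mirror turns on timestamping, which wget disables (with a warning) when WARC output is requested.

```shell
# Hypothetical WARC variant of a mirror function (illustrative name, not from the post).
# Usage: wget_mirror_warc URL [WARC_BASENAME]
wget_mirror_warc () {
  if [ -z "$1" ]; then
    echo "usage: wget_mirror_warc URL [WARC_BASENAME]" >&2
    return 1
  fi
  url="$1"
  name="${2:-mirror}"   # assumed default; wget writes "$name.warc.gz"

  # --warc-file records every request/response into a (gzip-compressed) WARC;
  # --warc-cdx additionally writes a CDX index file alongside it.
  wget --recursive --level=inf --page-requisites --no-parent \
    --warc-file="$name" --warc-cdx \
    -e robots=off \
    "$url"
}
```

Calling wget_mirror_warc "https://example.com/" example should leave example.warc.gz and example.cdx in the current directory, next to the usual downloaded tree. WACZ packaging is a separate step that this sketch does not cover.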



