Hacker News

I don't think that works. It's not remotely browsable or searchable. It would be quite challenging to put these scrapes up, anyway. They're regular wget crawls with a regular directory/file structure, the problem is that there's so much material and so many files that it can be almost impossible to find what you are looking for... (Plus you need to rewrite links into relative links to make everything render properly.)
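For what it's worth, wget can handle the link rewriting at crawl time rather than as a post-processing step. A sketch of the usual mirroring flags (example.com stands in for the actual site being crawled):

```shell
# --mirror          : recursion + timestamping, suitable for re-crawls
# --convert-links   : rewrite links in downloaded pages to relative ones,
#                     so the mirror renders properly from the local disk
# --adjust-extension: append .html to pages served without an extension
# --page-requisites : also fetch CSS/JS/images needed to render each page
# --no-parent       : don't ascend above the starting directory
wget --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent \
     https://example.com/
```

Note that --convert-links only runs after the whole crawl finishes, so an interrupted crawl still leaves absolute links behind.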


Hmm. Now I'm thinking that I might end up using your idea (scraping the dark web) and using something like httrack[0] to get exactly that: structure.

[0] https://en.wikipedia.org/wiki/HTTrack
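For anyone trying it, HTTrack's command-line form looks roughly like this (a sketch; example.com is a placeholder, and the filter syntax is HTTrack's own +/- pattern language):

```shell
# Mirror a site into ./mirror, restricted to the example.com domain.
#   -O  : output directory for the mirror
#   "+*.example.com/*" : scan filter, only follow links matching this pattern
#   -v  : verbose output
httrack "https://example.com/" -O "./mirror" "+*.example.com/*" -v
```

Unlike wget, HTTrack rewrites links incrementally and can resume an interrupted mirror with the --continue option, which is part of the "magic" the comment below refers to.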


I once tried using HTTrack, but I found it was doing too much magic under the hood and was hard to work with. As dumb as wget is (that blacklist bug is over 12 years old now!), at least it's understandable.


Thanks for saving me the headache :)



