
At some point they must become more cost-efficient through pure market economics, which implies less load on sites. Much of the scraping I see is still very dumb and repetitive, like Googlebot circa 2001.

(Blocking Chinese IP ranges with the help of a geoip DB helps a lot in the short term. Azure as a whole is the second-largest source of pure idiocy.)
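A minimal sketch of what that kind of blocking looks like, assuming you have already exported a list of CIDR ranges from a geoip database (the ranges below are documentation/test networks standing in for real ones):

```python
import ipaddress

# Placeholder CIDR blocks. In practice these would come from a geoip DB
# export (e.g. a country-level range list), not be hard-coded.
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # TEST-NET-3, stand-in range
    ipaddress.ip_network("198.51.100.0/24"),  # TEST-NET-2, stand-in range
]

def is_blocked(client_ip: str) -> bool:
    """Return True if the client IP falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

In a real deployment you would do this at the edge (firewall or reverse proxy) rather than in application code, but the lookup is the same idea.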



They seem to have so much bubble money at the moment that the cost of scraping is probably a rounding error in their pocket change.


So the cost of caching should be a rounding error as well. If the Internet Archive can afford to cache vast swathes of the web, then surely the big AI companies can too.
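The caching in question is trivial to sketch: wrap the crawler's fetch in a TTL cache so repeat visits within the window never touch the origin site. This is an illustrative sketch, not any particular crawler's implementation; `fetch` is a hypothetical callable that performs the actual HTTP request.

```python
import time

class CachedFetcher:
    """Serve repeat requests from memory instead of re-hitting the site."""

    def __init__(self, fetch, ttl_seconds=3600):
        self.fetch = fetch          # hypothetical fetch(url) -> body callable
        self.ttl = ttl_seconds
        self.cache = {}             # url -> (timestamp, body)

    def get(self, url):
        now = time.time()
        hit = self.cache.get(url)
        if hit and now - hit[0] < self.ttl:
            return hit[1]           # cache hit: zero load on the origin
        body = self.fetch(url)      # cache miss: fetch once, remember it
        self.cache[url] = (now, body)
        return body
```

Production crawlers would persist this to disk and honor `ETag`/`Last-Modified` headers, but even this in-memory version eliminates the dumb repeat fetches described above.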


Exactly.



