
Are robots.txt and those Cloudflare measures in some way anticompetitive and/or against net neutrality?


>Are robots.txt and those Cloudflare measures in some way anticompetitive?

An article by Matt Wells of Gigablast on Cloudflare. Gigablast has one of the largest independent search indexes.

https://www.gigablast.com/blog.html


While I generally agree with Matt about the anti-competitive behavior of said tech giants, I just want to point out that:

  1. The contents of robots.txt are created by the site owner, not those giants.

  2. robots.txt is generally non-enforceable, unless someone like Cloudflare translates it into proxy/DDoS settings (and can also actually distinguish a real Googlebot from a fake one).

  3. The above behavior of Cloudflare would be very problematic (in antitrust terms) if it is not what the site owner wanted to achieve by writing said robots.txt. Now this is the (m/b)illion-dollar question: *What is the site owner actually trying to achieve?* Are they only after DDoS protection, or do they actually want to restrict other crawlers? Is it even legal to restrict other crawlers?
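To make point 2 concrete: robots.txt is just a text file the owner publishes; whether anyone obeys it is entirely up to the crawler. A minimal sketch with Python's stdlib parser (the rules and bot names here are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt authored by a site owner: only Googlebot is welcome.
robots_txt = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The rules only matter if the crawler volunteers to check them.
print(rp.can_fetch("Googlebot", "/listings"))     # True
print(rp.can_fetch("SomeOtherBot", "/listings"))  # False
# A crawler that simply never calls can_fetch() is not blocked by anything
# here; actual enforcement requires a proxy (e.g. Cloudflare) acting on
# these rules at the network edge.
```

Note that nothing in the file itself distinguishes a real Googlebot from a crawler that merely claims that user-agent string; that verification has to happen elsewhere.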

From my limited experience, I have come to several conclusions.

First, there isn't a very big short-term incentive for site owners to tolerate an unlimited number of crawlers, and they usually don't bother. Heck, most of the time you talk to an SEO specialist, they only talk about Google. It is somehow assumed that if your site is indexed well in Google, all the other search engines (Bing, DDG, ...) will work automagically.

Second, there is this category of listing sites where most of the value is in the database itself (e.g. LinkedIn, Craigslist, various property/car/you-name-it listings). They generally don't want to be crawled by their wannabe competitors. But is it legal to restrict anyone but Google/Bing? From a legal standpoint, one can explore the analogy between a physical business and a website. There is public and private access. Public access is the case when people can just walk into your office/store and "look around". You can take some measures against "misbehavers", but you are not allowed to restrict access to only "good" people. Furthermore, visitors are not bound by anything more than the country's law. With private access, on the other hand, you require a login and some form of a contract (e.g. ToS). In that case, crawling and copying portions of the site may be specifically excluded in such a ToS, as long as it doesn't contradict the law.

There are still some problems left, though. Once you are blocked by such a data aggregator, the burden of proof is on your side, and since there still isn't a clear ruling about these matters[1], how many average Joes will take it to court? And even if you do file, where do you draw the line between crawling and DDoS?
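On that last question: there is no bright line, but one conventional (if non-binding) signal is the Crawl-delay directive some sites put in robots.txt. A polite crawler throttles itself to the stated rate; a DDoS by definition ignores any such limit. A minimal sketch with Python's stdlib parser (the delay value, bot name, and helper are hypothetical):

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking all bots to wait 2 seconds between requests.
robots_txt = """\
User-agent: *
Crawl-delay: 2
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# crawl_delay() returns None if the site states no delay;
# treat that as "no mandated pause".
delay = rp.crawl_delay("ExampleBot") or 0

def polite_fetch(urls, fetch):
    """Fetch each URL, sleeping `delay` seconds between requests."""
    for url in urls:
        fetch(url)
        time.sleep(delay)
```

Like the rest of robots.txt, this is purely voluntary on the crawler's side, which is exactly why the legal line stays blurry.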

  [1] https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-linkedin-protects-scraping-public-data


Perhaps Google can be seen as a public utility, like the phone wires, and would have to provide access to its index.


Oh, wow, and he mentions that Cloudflare has received up to $110M from Microsoft, Google and Baidu!

https://techcrunch.com/2015/09/22/cloudflare-locks-down-110m...



