Frequently asked questions

How do we know AI companies/bots respect robots.txt?

The short answer is that we don't. robots.txt is a well-established standard, but compliance is voluntary and there is no enforcement mechanism.
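To make "voluntary" concrete, here is a minimal sketch of the check a well-behaved crawler performs before fetching a page, using Python's standard `urllib.robotparser`. The rules, the `GPTBot` and `ExampleBrowser` tokens, the `example.com` URL, and the `is_allowed` helper are illustrative assumptions, not content from this repository. Nothing forces a crawler to run this check or to honor its result.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption for this sketch):
# one named crawler is disallowed everywhere, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent_token: str, url: str) -> bool:
    """Return True if the parsed rules permit agent_token to fetch url.

    agent_token is the crawler's product token (e.g. "GPTBot"),
    not a full User-Agent header value.
    """
    return parser.can_fetch(agent_token, url)
```

With these rules, `is_allowed("GPTBot", "https://example.com/article")` is `False` and `is_allowed("ExampleBrowser", "https://example.com/article")` is `True`. The decision is made entirely on the crawler's side, which is why robots.txt alone cannot guarantee anything.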

Can we block crawlers based on user agent strings?

Yes, provided the crawlers identify themselves and your application/hosting supports doing so.
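As a rough sketch of what server-side blocking can look like, the WSGI middleware below returns 403 when the User-Agent header contains a blocked token. The token list and the names `BLOCKED_AGENT_TOKENS`, `is_blocked`, and `block_ai_crawlers` are hypothetical choices for this example. It assumes a Python WSGI application; other stacks would express the same check in their web server or proxy configuration.

```python
# Illustrative tokens only (an assumption); substitute the list you maintain.
BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot")


def is_blocked(user_agent: str) -> bool:
    """Case-insensitive substring match against the blocked tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_AGENT_TOKENS)


def block_ai_crawlers(app):
    """Wrap a WSGI app so that requests from blocked agents get a 403."""

    def middleware(environ, start_response):
        # WSGI exposes the User-Agent header as HTTP_USER_AGENT.
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

This only stops crawlers that identify themselves honestly: a crawler that sends a browser-like User-Agent passes straight through.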

Why should we block these crawlers?

They're extractive, confer no benefit on the creators of the data they ingest, and have wide-ranging negative externalities.

How Tech Giants Cut Corners to Harvest Data for A.I.

OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.

How AI copyright lawsuits could make the whole industry go extinct

The New York Times' lawsuit against OpenAI is part of a broader, industry-shaking copyright challenge that could define the future of AI.

How can I contribute?

Open a pull request. It will be reviewed and acted upon appropriately. We really appreciate contributions: this is a community effort.