chore: move FAQ into repo

2025-05-20 01:03:11 +00:00 · 2024-08-01 07:53:43 -07:00 · 2024-08-01 07:53:43 -07:00 · 6c596a50ea
commit 6c596a50ea
parent 6a8e7a8eb0
1 changed file with 23 additions and 0 deletions
--- a/FAQ.md
+++ b/FAQ.md
@@ -0,0 +1,23 @@
+# Frequently asked questions
+
+## How do we know AI companies/bots respect `robots.txt`?
+
+The short answer is that we don't. `robots.txt` is a well-established standard, but compliance is voluntary and there is no enforcement mechanism.
+
+## Can we block crawlers based on user agent strings?
+
+Yes, provided the crawlers identify themselves and your application/hosting supports doing so.
+
+## Why should we block these crawlers?
+
+They're extractive, confer no benefit on the creators of the data they ingest, and have wide-ranging negative externalities.
+
+**[How Tech Giants Cut Corners to Harvest Data for A.I.](https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html?unlocked_article_code=1.ik0.Ofja.L21c1wyW-0xj&ugrp=m)**
+> OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.
+
+**[How AI copyright lawsuits could make the whole industry go extinct](https://www.theverge.com/24062159/ai-copyright-fair-use-lawsuits-new-york-times-openai-chatgpt-decoder-podcast)**
+> The New York Times' lawsuit against OpenAI is part of a broader, industry-shaking copyright challenge that could define the future of AI.
+
+## How can I contribute?
+
+Open a pull request. It will be reviewed and acted upon appropriately. **We really appreciate contributions** — this is a community effort.
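
Note on the user-agent question in the FAQ above: the answer there depends on crawlers identifying themselves and on the application being able to inspect the `User-Agent` header. A minimal sketch of such a check follows, assuming a Python application. The names `BLOCKED_AGENT_TOKENS`, `is_blocked`, and `status_for` are hypothetical, and the tokens are placeholders rather than a vetted list of real crawlers. It is an illustration under those assumptions, not how this repository or any particular host does it.

```python
import re

# Hypothetical blocklist: placeholder tokens for illustration only,
# not a vetted list of real crawler names.
BLOCKED_AGENT_TOKENS = ["ExampleAIBot", "SampleCrawler"]

# One case-insensitive pattern that matches any token as a substring.
_BLOCKED_PATTERN = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_AGENT_TOKENS),
    re.IGNORECASE,
)


def is_blocked(user_agent):
    """Return True if the User-Agent header value contains a blocked token."""
    if not user_agent:
        # A missing or empty header cannot be matched, so it is not blocked here.
        return False
    return _BLOCKED_PATTERN.search(user_agent) is not None


def status_for(user_agent):
    """Return the HTTP status a server might send: 403 if blocked, else 200."""
    return 403 if is_blocked(user_agent) else 200
```

A request handler could call `status_for(request_headers.get("User-Agent"))` and refuse the request on 403. As with `robots.txt`, this only stops crawlers that announce themselves honestly; a crawler that sends a browser-like user agent passes the check.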