From 9295b6a963f0ccba30392e005419b42eed2b264e Mon Sep 17 00:00:00 2001
From: Glyn Normington
Date: Sat, 9 Nov 2024 04:45:47 +0000
Subject: [PATCH] Clarify our rationale

I deleted the point about excessive load on crawled sites, since any
crawler could be guilty of this and I wouldn't want our scope to creep
to all crawlers.

Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/53#issuecomment-2466042550
---
 FAQ.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/FAQ.md b/FAQ.md
index 4d58350..15db540 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -2,7 +2,7 @@
 
 ## Why should we block these crawlers?
 
-They're extractive, confer no benefit to the creators of data they're ingesting and also have wide-ranging negative externalities.
+They're extractive, confer no benefit to the creators of data they're ingesting, and have wide-ranging negative externalities, particularly copyright abuse and environmental impact.
 
 **[How Tech Giants Cut Corners to Harvest Data for A.I.](https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html?unlocked_article_code=1.ik0.Ofja.L21c1wyW-0xj&ugrp=m)**
 > OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.
@@ -10,7 +10,11 @@ They're extractive, confer no benefit to the creators of data they're ingesting
 **[How AI copyright lawsuits could make the whole industry go extinct](https://www.theverge.com/24062159/ai-copyright-fair-use-lawsuits-new-york-times-openai-chatgpt-decoder-podcast)**
 > The New York Times' lawsuit against OpenAI is part of a broader, industry-shaking copyright challenge that could define the future of AI.
 
-Crawlers also sometimes impact the performance of crawled sites, or even take them down.
+**[Reconciling the contrasting narratives on the environmental impact of large language models](https://www.nature.com/articles/s41598-024-76682-6)**
+> Studies have shown that the training of just one LLM can consume as much energy as five cars do across their lifetimes. The water footprint of AI is also substantial; for example, recent work has highlighted that water consumption associated with AI models involves data centers using millions of gallons of water per day for cooling. Additionally, the energy consumption and carbon emissions of AI are projected to grow quickly in the coming years [...].
+
+**[Scientists Predict AI to Generate Millions of Tons of E-Waste](https://www.sciencealert.com/scientists-predict-ai-to-generate-millions-of-tons-of-e-waste)**
+> we could end up with between 1.2 million and 5 million metric tons of additional electronic waste by the end of this decade [the 2020s].
 
 ## How do we know AI companies/bots respect `robots.txt`?
 