From b5d4d5bc74882b820031134041f2bc5757ea65fd Mon Sep 17 00:00:00 2001
From: Cory Dransfeldt
Date: Wed, 12 Jun 2024 19:01:23 -0700
Subject: [PATCH 1/3] chore: update readme

---
 README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9a370a9..d9d1c40 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,13 @@
-# AI robots.txt
+# ai.robots.txt
+**[Subscribe to updates via RSS/Atom by clicking on this link.](https://github.com/ai-robots-txt/ai.robots.txt/commits/main/robots.txt.atom)**
+
+_(Or paste the link into your preferred feed reader.)_
+
+---
+
 This is an open list of web crawlers associated with AI companies and the training of LLMs to block. We encourage you to contribute to and implement this list on your own site.
 A number of these crawlers have been sourced from [Dark Visitors](https://darkvisitors.com) and we appreciate the ongoing effort they put in to track these crawlers.

From 1a4868f1a2972eda02e034e24f53f0a28c65d9a4 Mon Sep 17 00:00:00 2001
From: Cory Dransfeldt
Date: Wed, 12 Jun 2024 19:05:15 -0700
Subject: [PATCH 2/3] chore: update readme

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index d9d1c40..2931ef6 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
-**[Subscribe to updates via RSS/Atom by clicking on this link.](https://github.com/ai-robots-txt/ai.robots.txt/commits/main/robots.txt.atom)**
+**[Subscribe to updates via RSS/Atom by clicking on this link.](https://github.com/ai-robots-txt/ai.robots.txt/releases.atom)**
 _(Or paste the link into your preferred feed reader.)_

From ce0b99124bdbe1c357e80b3811728da87030da6e Mon Sep 17 00:00:00 2001
From: Christian Sievers
Date: Thu, 13 Jun 2024 14:17:31 +0200
Subject: [PATCH 3/3] Update table-of-bot-metrics.md

add link to Apple docs and description for Applebot-Extended
---
 table-of-bot-metrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
index 213ac81..5c800ea 100644
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -4,7 +4,7 @@
 |Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
 |anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 |Applebot | Apple | Yes | Indexes sites to provide answers and search results for Siri users. | Irregular and may be prompted by user queries. | Used to answer queries from users; may included references to the indexed site. |
-|Applebot-Extended | | | | | |
+|Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | | | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. |
 |AwarioRssBot | | | | | |
 |AwarioSmartBot | | | | | |
 |Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. |