Merge branch 'main' of github.com:ai-robots-txt/ai.robots.txt

2025-05-20 09:13:11 +00:00 · 2024-06-10 13:23:55 -07:00 · 2024-06-10 13:23:55 -07:00 · d464f260e0
commit d464f260e0
parent fd10fbda94 7a5a21fdad
1 changed files with 5 additions and 5 deletions
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -1,6 +1,6 @@
 |Name            |Operator |Respects `robots.txt`  |Data use  |Visit regularity  |Description  |
 |----------------|---------|-----------------------|----------|------------------|-------------|
-| AdsBot-Google   | Google  | Yes (Exceptions for Dynamic Search Ads) | Analyzes website content for ad relevancy, improves ad serving for Google Ads. Data anonymized according to Google's Privacy Policy (https://policies.google.com/privacy?hl=en-US). Unclear on data retention or use by other products. | Varies depending on campaign activity and website updates. Crawls optimized to minimize impact, specific frequency not public. | Web crawler by Google Ads to analyze websites for ad effectiveness and ensure ad relevancy to webpage content. |
+| AdsBot-Google   | Google  | Yes (Exceptions for Dynamic Search Ads) | Analyzes website content for ad relevancy, improves ad serving for Google Ads. Data anonymized according to [Google's Privacy Policy](https://policies.google.com/privacy). Unclear on data retention or use by other products. | Varies depending on campaign activity and website updates. Crawls optimized to minimize impact, specific frequency not public. | Web crawler by Google Ads to analyze websites for ad effectiveness and ensure ad relevancy to webpage content. |
 |Amazonbot      | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
 |anthropic-ai  | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 |Applebot      | Apple         | Yes | Indexes sites to provide answers and search results for Siri users. | Irregular and may be prompted by user queries. | Used to answer queries from users; may include references to the indexed site. |
@@ -10,10 +10,10 @@
 |CCBot         | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. |
 |ChatGPT-User   | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. |
 |ClaudeBot      | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-|Claude-Web [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+|Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 |cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
-|DataForSeoBot |         |                       |          |                  |             |
-|Diffbot |         |                       |          |                  |             |
+|DataForSeoBot | [DataForSEO](https://dataforseo.com/) | [Yes](https://dataforseo.com/dataforseo-bot) | Backlink checking and SEO data collection to be resold to clients. | As often as every 5 seconds. | Operated by DataForSEO to check backlinks and scrape SEO data for resale. |
+|Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. |
 |FacebookBot    |         |                       |          |                  |             |
 |Google-Extended|         |                       |          |                  |             |
 |GoogleOther    |         |                       |          |                  |             |
@@ -26,7 +26,7 @@
 |omgilibot     |         |                       |          |                  |             |
 |peer39_crawler|         |                       |          |                  |             |
 |peer39_crawler/1.0|         |                       |          |                  |             |
-|PerplexityBot |         |                       |          |                  |             |
+|PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [Yes](https://docs.perplexity.ai/docs/perplexitybot) | Used to answer queries at the request of users. | Takes action based on user prompts.  | Operated by Perplexity to obtain results in response to user queries. |
 |PiplBot       |         |                       |          |                  |             |
 |scoop.it      |         |                       |          |                  |             |
 |Seekr         |         |                       |          |                  |             |
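The "Respects `robots.txt`" column only matters if a site actually publishes rules for these user agents. The following is a minimal sketch, assuming a hypothetical `robots.txt` that blocks three crawlers from the table (CCBot, DataForSeoBot, PerplexityBot) and allows everyone else. It uses Python's standard-library `urllib.robotparser` to check what a compliant crawler would be permitted to fetch. The `ROBOTS_TXT` contents, the `build_parser` helper, and the `example.com` URLs are illustrative only. Note also that `robots.txt` is advisory: crawlers marked "Unclear at this time" above may not honor it at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group blocks three crawlers named in the table,
# and a catch-all group allows every other user agent.
ROBOTS_TXT = """\
User-agent: CCBot
User-agent: DataForSeoBot
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # can_fetch() reports what a *compliant* crawler would be allowed to do;
    # it cannot force a non-compliant crawler to stay away.
    for agent in ["CCBot", "DataForSeoBot", "PerplexityBot", "Amazonbot"]:
        print(agent, rp.can_fetch(agent, "https://example.com/page"))
```

Running this prints `False` for the three blocked agents and `True` for Amazonbot, which falls through to the `*` group. Python's parser matches a group when its `User-agent` value is a case-insensitive substring of the crawler's product token. Individual crawlers may apply their own matching rules.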