From 04688e56fb3bef0e5a9339339c6544a54ebb79e4 Mon Sep 17 00:00:00 2001
From: KG <41345727+kg583@users.noreply.github.com>
Date: Thu, 16 May 2024 18:25:46 -0500
Subject: [PATCH 1/3] Update table-of-bot-metrics.md

Fix Markdown
---
 table-of-bot-metrics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
index aa011c2..464cca1 100644
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -10,7 +10,7 @@
 |CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. |
 |ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. |
 |ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-|Claude-Web [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+|Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 |cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
 |DataForSeoBot | | | | | |
 |Diffbot | | | | | |

From 2ca4154d4e67f135aff40ffa3bfcce0e2885ec10 Mon Sep 17 00:00:00 2001
From: Cory Dransfeldt
Date: Wed, 29 May 2024 09:14:35 -0700
Subject: [PATCH 2/3] chore: update perplexity entry

---
 table-of-bot-metrics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
index 464cca1..795f46c 100644
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -1,6 +1,6 @@
 |Name |Operator |Respects `robots.txt` |Data use |Visit regularity |Description |
 |----------------|---------|-----------------------|----------|------------------|-------------|
-| AdsBot-Google | Google | Yes (Exceptions for Dynamic Search Ads) | Analyzes website content for ad relevancy, improves ad serving for Google Ads. Data anonymized according to Google's Privacy Policy (https://policies.google.com/privacy?hl=en-US). Unclear on data retention or use by other products. | Varies depending on campaign activity and website updates. Crawls optimized to minimize impact, specific frequency not public. | Web crawler by Google Ads to analyze websites for ad effectiveness and ensure ad relevancy to webpage content. |
+| AdsBot-Google | Google | Yes (Exceptions for Dynamic Search Ads) | Analyzes website content for ad relevancy, improves ad serving for Google Ads. Data anonymized according to [Google's Privacy Policy](https://policies.google.com/privacy). Unclear on data retention or use by other products. | Varies depending on campaign activity and website updates. Crawls optimized to minimize impact, specific frequency not public. | Web crawler by Google Ads to analyze websites for ad effectiveness and ensure ad relevancy to webpage content. |
 |Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
 |anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 |Applebot | Apple | Yes | Indexes sites to provide answers and search results for Siri users. | Irregular and may be prompted by user queries. | Used to answer queries from users; may included references to the indexed site. |
@@ -26,7 +26,7 @@
 |omgilibot | | | | | |
 |peer39_crawler| | | | | |
 |peer39_crawler/1.0| | | | | |
-|PerplexityBot | | | | | |
+|PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [Yes](https://docs.perplexity.ai/docs/perplexitybot) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
 |PiplBot | | | | | |
 |scoop.it | | | | | |
 |Seekr | | | | | |

From 7a5a21fdad6f6ac8592669cf7f7d0e0e8081742f Mon Sep 17 00:00:00 2001
From: Cory Dransfeldt
Date: Wed, 29 May 2024 09:21:31 -0700
Subject: [PATCH 3/3] chore: update bots table

---
 table-of-bot-metrics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
index 795f46c..f64242b 100644
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -12,8 +12,8 @@
 |ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 |Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 |cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
-|DataForSeoBot | | | | | |
-|Diffbot | | | | | |
+|DataForSeoBot | [DataForSEO](https://dataforseo.com/) | [Yes](https://dataforseo.com/dataforseo-bot) | Backlink checking and SEO data collection to be resolt to clients. | As often as every 5 seconds. | Operated by DataForSEO to check backlinks and scrape SEO data for resale. |
+|Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. |
 |FacebookBot | | | | | |
 |Google-Extended| | | | | |
 |GoogleOther | | | | | |