From c17cae6e9da024121b7f2e705990f78a456be11a Mon Sep 17 00:00:00 2001 From: Glyn Normington Date: Wed, 17 Jul 2024 02:28:32 +0100 Subject: [PATCH 001/249] link to bot metrics table Make it easier to view the table. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 4e3db3a..8f51c9c 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ -This is an open list of web crawlers associated with AI companies and the training of LLMs to block. We encourage you to contribute to and implement this list on your own site. +This is an open list of web crawlers associated with AI companies and the training of LLMs to block. We encourage you to contribute to and implement this list on your own site. See [information about the listed crawlers](./table-of-bot-metrics.md). A number of these crawlers have been sourced from [Dark Visitors](https://darkvisitors.com) and we appreciate the ongoing effort they put in to track these crawlers. From 29729265320afcf8e9347f079dbbf8fbf15ffb28 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Fri, 26 Jul 2024 09:06:10 -0700 Subject: [PATCH 002/249] chore: add OAI-SearchBot --- robots.txt | 1 + table-of-bot-metrics.md | 1 + 2 files changed, 2 insertions(+) diff --git a/robots.txt b/robots.txt index 7d9cc92..111d678 100644 --- a/robots.txt +++ b/robots.txt @@ -17,6 +17,7 @@ User-agent: GoogleOther-Video User-agent: GPTBot User-agent: ImagesiftBot User-agent: img2dataset +User-agent: OAI-SearchBot User-agent: omgili User-agent: omgilibot User-agent: PerplexityBot diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index f30f6f9..b4d54cf 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -17,6 +17,7 @@ |GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | |GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | | img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | At the discretion of img2dataset users. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +|OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information | Crawls sites to surface as results in SearchGPT. | |omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | |omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | |PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. 
| From 6e323554c697ffbfc35b6f1299b9eaa05a092ae6 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Mon, 29 Jul 2024 08:27:31 -0700 Subject: [PATCH 003/249] chore: add Meta-ExternalAgent --- robots.txt | 1 + table-of-bot-metrics.md | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/robots.txt b/robots.txt index 111d678..091a675 100644 --- a/robots.txt +++ b/robots.txt @@ -17,6 +17,7 @@ User-agent: GoogleOther-Video User-agent: GPTBot User-agent: ImagesiftBot User-agent: img2dataset +User-agent: Meta-ExternalAgent User-agent: OAI-SearchBot User-agent: omgili User-agent: omgilibot diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index b4d54cf..5873926 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -16,7 +16,8 @@ |GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | |GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | |GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | At the discretion of img2dataset users. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. 
| +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | |OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information | Crawls sites to surface as results in SearchGPT. | |omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | |omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From d49e860b7464439089ccb131ce063e15459a05a8 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Mon, 29 Jul 2024 12:12:42 -0700 Subject: [PATCH 004/249] chore: add VelenPublicWebCrawler --- robots.txt | 1 + table-of-bot-metrics.md | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/robots.txt b/robots.txt index 091a675..d687d86 100644 --- a/robots.txt +++ b/robots.txt @@ -22,5 +22,6 @@ User-agent: OAI-SearchBot User-agent: omgili User-agent: omgilibot User-agent: PerplexityBot +User-agent: VelenPublicWebCrawler User-agent: YouBot Disallow: / diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 5873926..701e104 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -22,4 +22,5 @@ |omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | |omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | |PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -|YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information | Retrieves data used for You.com web search engine and LLMs. 
| +|VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +|YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information | Retrieves data used for You.com web search engine and LLMs. | \ No newline at end of file From 55b4505e30cab27376ee4f3b2a114a4977068611 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Mon, 29 Jul 2024 12:38:22 -0700 Subject: [PATCH 005/249] chore: add Timpibot --- robots.txt | 1 + table-of-bot-metrics.md | 31 ++++++++++++++++--------------- 2 files changed, 17 insertions(+), 15 deletions(-) diff --git a/robots.txt b/robots.txt index d687d86..6295e08 100644 --- a/robots.txt +++ b/robots.txt @@ -22,6 +22,7 @@ User-agent: OAI-SearchBot User-agent: omgili User-agent: omgilibot User-agent: PerplexityBot +User-agent: Timpibot User-agent: VelenPublicWebCrawler User-agent: YouBot Disallow: / diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 701e104..02cfca1 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -1,26 +1,27 @@ |Name |Operator |Respects `robots.txt` |Data use |Visit regularity |Description | |----------------|---------|-----------------------|----------|------------------|-------------| -|Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -|anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +|Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +|anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | |Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | |Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | |CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | |ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -|ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -|Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +|ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. 
| Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +|Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | |cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | |Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | |FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -|Google-Extended| Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -|GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -|GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -|GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -|GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +|Google-Extended| Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +|GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +|GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +|GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| +|GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | | img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -|OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information | Crawls sites to surface as results in SearchGPT. | -|omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -|omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." 
| +|OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +|omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +|omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | |PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -|VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -|YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information | Retrieves data used for You.com web search engine and LLMs. | \ No newline at end of file +|Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +|VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. 
| No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +|YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From fa7b64ae4bca1e4dfb6adceee6d8f0cd89797432 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Tue, 30 Jul 2024 10:28:46 -0700 Subject: [PATCH 006/249] chore: add Scrapy --- robots.txt | 1 + table-of-bot-metrics.md | 1 + 2 files changed, 2 insertions(+) diff --git a/robots.txt b/robots.txt index 6295e08..51c5e31 100644 --- a/robots.txt +++ b/robots.txt @@ -22,6 +22,7 @@ User-agent: OAI-SearchBot User-agent: omgili User-agent: omgilibot User-agent: PerplexityBot +User-agent: Scrapy User-agent: Timpibot User-agent: VelenPublicWebCrawler User-agent: YouBot diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 02cfca1..71958c2 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -22,6 +22,7 @@ |omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | |omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | |PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. 
| Operated by Perplexity to obtain results in response to user queries. | +|Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | |Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | |VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | |YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From df89722038fc42923f462320dd5f21ed92451228 Mon Sep 17 00:00:00 2001 From: nisbet-hubbard <87453615+nisbet-hubbard@users.noreply.github.com> Date: Wed, 31 Jul 2024 18:27:29 +0800 Subject: [PATCH 007/249] Add `PetalBot` (and `facebookexternalhit`?) 
---
 robots.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/robots.txt b/robots.txt
index 51c5e31..cf20329 100644
--- a/robots.txt
+++ b/robots.txt
@@ -10,6 +10,7 @@ User-agent: cohere-ai
 User-agent: Diffbot
 User-agent: FacebookBot
 User-agent: FriendlyCrawler
+User-agent: facebookexternalhit
 User-agent: Google-Extended
 User-agent: GoogleOther
 User-agent: GoogleOther-Image
@@ -22,6 +23,7 @@ User-agent: OAI-SearchBot
 User-agent: omgili
 User-agent: omgilibot
 User-agent: PerplexityBot
+User-agent: PetalBot
 User-agent: Scrapy
 User-agent: Timpibot
 User-agent: VelenPublicWebCrawler

From 6c596a50ea9ca620e4f48a400d8c3bc817a7bdb0 Mon Sep 17 00:00:00 2001
From: Cory Dransfeldt
Date: Thu, 1 Aug 2024 07:53:43 -0700
Subject: [PATCH 008/249] chore: move FAQ into repo

---
 FAQ.md | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 FAQ.md

diff --git a/FAQ.md b/FAQ.md
new file mode 100644
index 0000000..0bb1ac9
--- /dev/null
+++ b/FAQ.md
@@ -0,0 +1,23 @@
+# Frequently asked questions
+
+## How do we know AI companies/bots respect `robots.txt`?
+
+The short answer is that we don't. `robots.txt` is a well-established standard but compliance is voluntary. There is no enforcement mechanism.
+
+## Can we block crawlers based on user agent strings?
+
+Yes, provided the crawlers identify themselves and your application/hosting supports doing so.
+
+## Why should we block these crawlers?
+
+They're extractive, confer no benefit to the creators of data they're ingesting and also have wide-ranging negative externalities.
+
+**[How Tech Giants Cut Corners to Harvest Data for A.I.](https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html?unlocked_article_code=1.ik0.Ofja.L21c1wyW-0xj&ugrp=m)**
+> OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems.
+
+**[How AI copyright lawsuits could make the whole industry go extinct](https://www.theverge.com/24062159/ai-copyright-fair-use-lawsuits-new-york-times-openai-chatgpt-decoder-podcast)**
+> The New York Times' lawsuit against OpenAI is part of a broader, industry-shaking copyright challenge that could define the future of AI.
+
+## How can I contribute?
+
+Open a pull request. It will be reviewed and acted upon appropriately. **We really appreciate contributions** — this is a community effort.

From 17a84f2c2d26ff759c2996879557399e7f3d2505 Mon Sep 17 00:00:00 2001
From: Cory Dransfeldt
Date: Thu, 1 Aug 2024 15:06:49 -0700
Subject: [PATCH 009/249] chore: update robots table

---
 table-of-bot-metrics.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
index 71958c2..2db0cdc 100644
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -1,16 +1,17 @@
 |Name |Operator |Respects `robots.txt` |Data use |Visit regularity |Description |
 |----------------|---------|-----------------------|----------|------------------|-------------|
-|Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
-|anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+|Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
+|anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided.
| Scrapes data to train LLMs and AI products offered by Anthropic. | |Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | |Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | |CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | |ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -|ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -|Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +|ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +|Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| |cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | |Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | |FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +|facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | |Google-Extended| Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | |GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | |GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| @@ -21,6 +22,7 @@ |OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | |omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | |omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +|PetalBot | [Huawei](https://huawei.com/) | [Yes](https://webmaster.petalsearch.com/site/petalbot#blockpetalBot) | [Used to provide recommendations in Hauwei assistant and AI search services.](https://webmaster.petalsearch.com/site/petalbot#whatispetalBot) | [No explicit frequency provided.](https://webmaster.petalsearch.com/site/petalbot#pressure) | Operated by Huawei to provide search and AI assistant services. | |PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | |Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | |Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. 
| Makes data available for training AI models. | From 1fdc79dacb23e86671cbde3a43a7565f08126f38 Mon Sep 17 00:00:00 2001 From: Adam Newbold Date: Thu, 1 Aug 2024 18:17:19 -0400 Subject: [PATCH 010/249] Adding GitHub Action --- .github/workflows/main.yml | 25 +++++ code/action.php | 28 ++++++ robots.json | 191 +++++++++++++++++++++++++++++++++++++ 3 files changed, 244 insertions(+) create mode 100644 .github/workflows/main.yml create mode 100644 code/action.php create mode 100644 robots.json diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml new file mode 100644 index 0000000..df6678f --- /dev/null +++ b/.github/workflows/main.yml @@ -0,0 +1,25 @@ +on: [push] + +jobs: + ai-robots-txt: + runs-on: ubuntu-latest + name: ai-robots-txt + steps: + - uses: actions/checkout@v4 + with: + fetch-depth: 2 + - run: | + git config --global user.name "ai.robots.txt" + git config --global user.email "ai.robots.txt@users.noreply.github.com" + git rm robots.txt + git rm table-of-bot-metrics.md + git add -A + git commit -m "Removing previously generated files" + git push + php -f code/action.php + git config --global user.name "ai.robots.txt" + git config --global user.email "ai.robots.txt@users.noreply.github.com" + git add -A + git commit -m "${{ github.event.head_commit.message }}" + git push + shell: bash \ No newline at end of file diff --git a/code/action.php b/code/action.php new file mode 100644 index 0000000..52ebbe6 --- /dev/null +++ b/code/action.php @@ -0,0 +1,28 @@ + $details) { + $robots_txt .= 'User-agent: '.$robot."\n"; + $robots_table .= '| '.$robot.' | '.$details['operator'].' | '.$details['respect'].' | '.$details['function'].' | '.$details['frequency'].' | '.$details['description'].' 
| '."\n"; +} + +$robots_txt .= 'Disallow: /'; + +file_put_contents('robots.txt', $robots_txt); +file_put_contents('table-of-bot-metrics.md', $robots_table); diff --git a/robots.json b/robots.json new file mode 100644 index 0000000..523b2bf --- /dev/null +++ b/robots.json @@ -0,0 +1,191 @@ +{ + "Amazonbot": { + "operator": "Amazon", + "respect": "Yes", + "function": "Service improvement and enabling answers for Alexa users.", + "frequency": "No information. provided.", + "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses." + }, + "anthropic-ai": { + "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "respect": "Unclear at this time.", + "function": "Scrapes data to train Anthropic's AI products.", + "frequency": "No information. provided.", + "description": "Scrapes data to train LLMs and AI products offered by Anthropic." + }, + "Applebot-Extended": { + "operator": "[Apple](https:\/\/support.apple.com\/en-us\/119829#datausage)", + "respect": "Yes", + "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", + "frequency": "Unclear at this time.", + "description": "Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools." + }, + "Bytespider": { + "operator": "ByteDance", + "respect": "No", + "function": "LLM training.", + "frequency": "Unclear at this time.", + "description": "Downloads data to train LLMS, including ChatGPT competitors." 
+ }, + "CCBot": { + "operator": "[Common Crawl](https:\/\/commoncrawl.org)", + "respect": "[Yes](https:\/\/commoncrawl.org\/ccbot)", + "function": "Provides crawl data for an open source repository that has been used to train LLMs.", + "frequency": "Unclear at this time.", + "description": "Sources data that is made openly available and is used to train AI models." + }, + "ChatGPT-User": { + "operator": "[OpenAI](https:\/\/openai.com)", + "respect": "Yes", + "function": "Takes action based on user prompts.", + "frequency": "Only when prompted by a user.", + "description": "Used by plugins in ChatGPT to answer queries based on user input." + }, + "ClaudeBot": { + "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "respect": "Unclear at this time.", + "function": "Scrapes data to train Anthropic's AI products.", + "frequency": "No information. provided.", + "description": "Scrapes data to train LLMs and AI products offered by Anthropic." + }, + "Claude-Web": { + "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "respect": "Unclear at this time.", + "function": "Scrapes data to train Anthropic's AI products.", + "frequency": "No information. provided.", + "description": "Scrapes data to train LLMs and AI products offered by Anthropic." + }, + "cohere-ai": { + "operator": "[Cohere](https:\/\/cohere.com)", + "respect": "Unclear at this time.", + "function": "Retrieves data to provide responses to user-initiated prompts.", + "frequency": "Takes action based on user prompts.", + "description": "Retrieves data based on user prompts." + }, + "Diffbot": { + "operator": "[Diffbot](https:\/\/www.diffbot.com\/)", + "respect": "At the discretion of Diffbot users.", + "function": "Aggregates structured web data for monitoring and AI model training.", + "frequency": "Unclear at this time.", + "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training." 
+ }, + "FacebookBot": { + "operator": "Meta\/Facebook", + "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", + "function": "Training language models", + "frequency": "Up to 1 page per second", + "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." + }, + "Google-Extended": { + "operator": "Google", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "function": "LLM training.", + "frequency": "No information.", + "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search." + }, + "GoogleOther": { + "operator": "Google", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "function": "Scrapes data.", + "frequency": "No information.", + "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" + }, + "GoogleOther-Image": { + "operator": "Google", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "function": "Scrapes data.", + "frequency": "No information.", + "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" + }, + "GoogleOther-Video": { + "operator": "Google", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "function": "Scrapes data.", + "frequency": "No information.", + "description": "\"Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development.\"" + }, + "GPTBot": { + "operator": "[OpenAI](https:\/\/openai.com)", + "respect": "Yes", + "function": "Scrapes data to train OpenAI's products.", + "frequency": "No information.", + "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." + }, + "img2dataset": { + "operator": "[img2dataset](https:\/\/github.com\/rom1504\/img2dataset)", + "respect": "Unclear at this time.", + "function": "Scrapes images for use in LLMs.", + "frequency": "At the discretion of img2dataset users.", + "description": "Downloads large sets of images into datasets for LLM training or other purposes." + }, + "Meta-ExternalAgent": { + "operator": "[Meta](https:\/\/developers.facebook.com\/docs\/sharing\/webmasters\/web-crawlers)", + "respect": "Yes.", + "function": "Used to train models and improve products.", + "frequency": "No information.", + "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"" + }, + "OAI-SearchBot": { + "operator": "[OpenAI](https:\/\/openai.com)", + "respect": "[Yes](https:\/\/platform.openai.com\/docs\/bots)", + "function": "Search result generation.", + "frequency": "No information.", + "description": "Crawls sites to surface as results in SearchGPT." + }, + "omgili": { + "operator": "[Webz.io](https:\/\/webz.io\/)", + "respect": "[Yes](https:\/\/webz.io\/blog\/web-data\/what-is-the-omgili-bot-and-why-is-it-crawling-your-website\/)", + "function": "Data is sold.", + "frequency": "No information.", + "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training." 
+ }, + "omgilibot": { + "operator": "[Webz.io](https:\/\/webz.io\/)", + "respect": "[Yes](https:\/\/web.archive.org\/web\/20170704003301\/http:\/\/omgili.com\/Crawler.html)", + "function": "Data is sold.", + "frequency": "No information.", + "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io." + }, + "PerplexityBot": { + "operator": "[Perplexity](https:\/\/www.perplexity.ai\/)", + "respect": "[No](https:\/\/www.macstories.net\/stories\/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler\/)", + "function": "Used to answer queries at the request of users.", + "frequency": "Takes action based on user prompts.", + "description": "Operated by Perplexity to obtain results in response to user queries." + }, + "Scrapy": { + "operator": "[Zyte](https:\/\/www.zyte.com)", + "respect": "Unclear at this time.", + "function": "Scrapes data a variety of uses including training AI.", + "frequency": "No information.", + "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"" + }, + "Timpibot": { + "operator": "[Timpi](https:\/\/timpi.io)", + "respect": "Unclear at this time.", + "function": "Scrapes data for use in training LLMs.", + "frequency": "No information.", + "description": "Makes data available for training AI models." 
+ }, + "VelenPublicWebCrawler": { + "operator": "[Velen Crawler](https:\/\/velen.io)", + "respect": "[Yes](https:\/\/velen.io)", + "function": "Scrapes data for business data sets and machine learning models.", + "frequency": "No information.", + "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"" + }, + "YouBot": { + "operator": "[You](https:\/\/about.you.com\/youchat\/)", + "respect": "[Yes](https:\/\/about.you.com\/youbot\/)", + "function": "Scrapes data for search engine and LLMs.", + "frequency": "No information.", + "description": "Retrieves data used for You.com web search engine and LLMs." + }, + "TestBot2": { + "operator": "Testing operator", + "respect": "Testing respect", + "function": "Testing function", + "frequency": "Testing frequency", + "description": "Testing description" + } +} \ No newline at end of file From efabf3e721a8a7dbc8bda17bbdbd17f9ac7b6b71 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Thu, 1 Aug 2024 15:25:55 -0700 Subject: [PATCH 011/249] chore: remove test data --- robots.json | 7 ------- 1 file changed, 7 deletions(-) diff --git a/robots.json b/robots.json index 523b2bf..5559432 100644 --- a/robots.json +++ b/robots.json @@ -180,12 +180,5 @@ "function": "Scrapes data for search engine and LLMs.", "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." 
- }, - "TestBot2": { - "operator": "Testing operator", - "respect": "Testing respect", - "function": "Testing function", - "frequency": "Testing frequency", - "description": "Testing description" } } \ No newline at end of file From 747cc834c4c078cd47d4ebb90ec5bcd5ca626f9c Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 1 Aug 2024 22:29:01 +0000 Subject: [PATCH 012/249] Removing previously generated files --- robots.txt | 31 ------------------------------- table-of-bot-metrics.md | 30 ------------------------------ 2 files changed, 61 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index cf20329..0000000 --- a/robots.txt +++ /dev/null @@ -1,31 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: facebookexternalhit -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 2db0cdc..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,30 +0,0 @@ -|Name |Operator |Respects `robots.txt` |Data use |Visit regularity |Description | -|----------------|---------|-----------------------|----------|------------------|-------------| -|Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. 
| No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -|anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -|Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -|Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -|CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -|ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -|ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -|Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -|cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -|Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -|FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -|facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -|Google-Extended| Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -|GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -|GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -|GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -|GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -|OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -|omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -|omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| -|PetalBot | [Huawei](https://huawei.com/) | [Yes](https://webmaster.petalsearch.com/site/petalbot#blockpetalBot) | [Used to provide recommendations in Hauwei assistant and AI search services.](https://webmaster.petalsearch.com/site/petalbot#whatispetalBot) | [No explicit frequency provided.](https://webmaster.petalsearch.com/site/petalbot#pressure) | Operated by Huawei to provide search and AI assistant services. | -|PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -|Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -|Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -|VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -|YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| From f18f0d99b9b5d19c18c5d6ef95db0be04526633e Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 1 Aug 2024 22:29:02 +0000 Subject: [PATCH 013/249] chore: remove test data --- robots.txt | 27 +++++++++++++++++++++++++++ table-of-bot-metrics.md | 28 ++++++++++++++++++++++++++++ 2 files changed, 55 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..482470f --- /dev/null +++ b/robots.txt @@ -0,0 +1,27 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..4d84969 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,28 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. 
| Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. 
| Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. 
| Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. 
| Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From b20dfec1e401604761a5db229e59c8c0471ab96e Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Thu, 1 Aug 2024 15:33:07 -0700 Subject: [PATCH 014/249] chore: drop in additional data --- robots.json | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/robots.json b/robots.json index 5559432..159fe6e 100644 --- a/robots.json +++ b/robots.json @@ -76,6 +76,13 @@ "frequency": "Up to 1 page per second", "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." }, + "facebookexternalhit": { + "operator": "Meta\/Facebook", + "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", + "function": "No information.", + "frequency": "Unclear at this time.", + "description": "Unclear at this time." + }, "Google-Extended": { "operator": "Google", "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", @@ -153,6 +160,13 @@ "frequency": "Takes action based on user prompts.", "description": "Operated by Perplexity to obtain results in response to user queries." 
}, + "PetalBot": { + "operator": "[Huawei](https:\/\/huawei.com\/)", + "respect": "Yes", + "function": "Used to provide recommendations in Hauwei assistant and AI search services.", + "frequency": "No explicit frequency provided.", + "description": "Operated by Huawei to provide search and AI assistant services." + }, "Scrapy": { "operator": "[Zyte](https:\/\/www.zyte.com)", "respect": "Unclear at this time.", From 06b950bce9a48f6e903ccffaa7689a53fe9d445e Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 1 Aug 2024 22:33:23 +0000 Subject: [PATCH 015/249] Removing previously generated files --- robots.txt | 27 --------------------------- table-of-bot-metrics.md | 28 ---------------------------- 2 files changed, 55 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 482470f..0000000 --- a/robots.txt +++ /dev/null @@ -1,27 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 4d84969..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,28 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| 
Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. 
| Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. 
| -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From b144225ece0582afd92fd6322b1a10c2c455618c Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 1 Aug 2024 22:33:23 +0000 Subject: [PATCH 016/249] chore: drop in additional data --- robots.txt | 29 +++++++++++++++++++++++++++++ table-of-bot-metrics.md | 30 ++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..8ee7424 --- /dev/null +++ b/robots.txt @@ -0,0 +1,29 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: img2dataset +User-agent: 
Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..1407fd4 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,30 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. 
| Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. 
| "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. 
| No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. 
| Retrieves data used for You.com web search engine and LLMs. | From 349c35eed634ab090f8880dc0fad2d54a06f6036 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Fri, 2 Aug 2024 09:31:48 -0700 Subject: [PATCH 017/249] chore: contribution note --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index 8f51c9c..39ecb9b 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,10 @@ A number of these crawlers have been sourced from [Dark Visitors](https://darkvi If you'd like to add information about a crawler to the list, please make a pull request with the bot name added to `robots.txt`, `ai.txt`, and any relevant details in `table-of-bot-metrics.md` to help people understand what's crawling. +## Contributing + +A note about contributing: updates should be added/made to `robots.json`. A GitHub action, courtesy of [Adam](https://github.com/newbold), will then generate the updated `robots.txt` and `table-of-bot-metrics.md`. + ## Subscribe to updates You can subscribe to list updates via RSS/Atom with the releases feed: From 9d8d3de8ed7737b229bea996a90178e5199589e8 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Fri, 2 Aug 2024 16:31:59 +0000 Subject: [PATCH 018/249] Removing previously generated files --- robots.txt | 29 ----------------------------- table-of-bot-metrics.md | 30 ------------------------------ 2 files changed, 59 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 8ee7424..0000000 --- a/robots.txt +++ /dev/null @@ -1,29 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: 
GoogleOther-Video -User-agent: GPTBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 1407fd4..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,30 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. 
| -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. 
| -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." 
| -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From d8de1ebdd5e48fb27cba32181a9275c1dfd54ee8 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Fri, 2 Aug 2024 16:32:00 +0000 Subject: [PATCH 019/249] chore: contribution note --- robots.txt | 29 +++++++++++++++++++++++++++++ table-of-bot-metrics.md | 30 ++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..8ee7424 --- /dev/null +++ b/robots.txt @@ -0,0 +1,29 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..1407fd4 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,30 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. 
| Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From 74b15028394f2617d7bc805417b85c38554bb7d8 Mon Sep 17 00:00:00 2001 From: nisbet-hubbard <87453615+nisbet-hubbard@users.noreply.github.com> Date: Sat, 3 Aug 2024 14:04:58 +0800 Subject: [PATCH 020/249] Update FAQ.md --- FAQ.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/FAQ.md b/FAQ.md index 0bb1ac9..c0ec16f 100644 --- a/FAQ.md +++ b/FAQ.md @@ -8,6 +8,21 @@ The short answer is that we don't. `robots.txt` is a well-established standard b Yes, provided the crawlers identify themselves and your application/hosting supports doing so. +## What can we do if a bot doesn't respect `robots.txt`? + +That depends on your stack. 
+ +- Nginx + - [Blocking Bots with Nginx](https://rknight.me/blog/blocking-bots-with-nginx/) by Robb Knight + - [Blocking AI web crawlers](https://underlap.org/blocking-ai-web-crawlers) by Glyn Normington +- Apache httpd + - [Blockin' bots.](https://ethanmarcotte.com/wrote/blockin-bots/) by Ethan Marcotte + - [Blocking Bots With 11ty And Apache](https://flamedfury.com/posts/blocking-bots-with-11ty-and-apache/) by fLaMEd fury +> [!TIP] +> The snippets in these articles all use `mod_rewrite`, which [should be considered a last resort](https://httpd.apache.org/docs/trunk/rewrite/avoid.html). A good alternative that's less resource-intensive is `mod_setenvif`; see [httpd docs](https://httpd.apache.org/docs/trunk/rewrite/access.html#blocking-of-robots) for an example. +- Netlify + - [Blockin' bots on Netlify](https://www.jeremiak.com/blog/block-bots-netlify-edge-functions/) by Jeremia Kimelman + ## Why should we block these crawlers? They're extractive, confer no benefit to the creators of data they're ingesting and also have wide-ranging negative externalities. From b24e5cb3bb4e799f1856c22dc77439ddf22e9518 Mon Sep 17 00:00:00 2001 From: nisbet-hubbard <87453615+nisbet-hubbard@users.noreply.github.com> Date: Sat, 3 Aug 2024 14:12:50 +0800 Subject: [PATCH 021/249] Update FAQ.md --- FAQ.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/FAQ.md b/FAQ.md index c0ec16f..06fb2ef 100644 --- a/FAQ.md +++ b/FAQ.md @@ -22,6 +22,8 @@ That depends on your stack. > The snippets in these articles all use `mod_rewrite`, which [should be considered a last resort](https://httpd.apache.org/docs/trunk/rewrite/avoid.html). A good alternative that's less resource-intensive is `mod_setenvif`; see [httpd docs](https://httpd.apache.org/docs/trunk/rewrite/access.html#blocking-of-robots) for an example. 
- Netlify - [Blockin' bots on Netlify](https://www.jeremiak.com/blog/block-bots-netlify-edge-functions/) by Jeremia Kimelman +- Cloudflare + - [I’m blocking AI crawlers](https://roelant.net/en/2024/im-blocking-ai-crawlers-part-2/) by Roelant ## Why should we block these crawlers? From 2b56c72bacce5a4285e083d60fd1d4a20c033036 Mon Sep 17 00:00:00 2001 From: nisbet-hubbard <87453615+nisbet-hubbard@users.noreply.github.com> Date: Sat, 3 Aug 2024 14:27:25 +0800 Subject: [PATCH 022/249] Update FAQ.md --- FAQ.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/FAQ.md b/FAQ.md index 06fb2ef..c26a936 100644 --- a/FAQ.md +++ b/FAQ.md @@ -19,7 +19,7 @@ That depends on your stack. - [Blockin' bots.](https://ethanmarcotte.com/wrote/blockin-bots/) by Ethan Marcotte - [Blocking Bots With 11ty And Apache](https://flamedfury.com/posts/blocking-bots-with-11ty-and-apache/) by fLaMEd fury > [!TIP] -> The snippets in these articles all use `mod_rewrite`, which [should be considered a last resort](https://httpd.apache.org/docs/trunk/rewrite/avoid.html). A good alternative that's less resource-intensive is `mod_setenvif`; see [httpd docs](https://httpd.apache.org/docs/trunk/rewrite/access.html#blocking-of-robots) for an example. +> The snippets in these articles all use `mod_rewrite`, which [should be considered a last resort](https://httpd.apache.org/docs/trunk/rewrite/avoid.html). A good alternative that's less resource-intensive is `mod_setenvif`; see [httpd docs](https://httpd.apache.org/docs/trunk/rewrite/access.html#blocking-of-robots) for an example. You should also consider [setting this up in `httpd.conf` instead of `.htaccess`](https://httpd.apache.org/docs/trunk/howto/htaccess.html#when) if it's available to you. 
- Netlify - [Blockin' bots on Netlify](https://www.jeremiak.com/blog/block-bots-netlify-edge-functions/) by Jeremia Kimelman - Cloudflare From b1907d86be1cfe36bf90e1a70ca8fce7878f3c40 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 3 Aug 2024 14:27:46 +0000 Subject: [PATCH 023/249] Removing previously generated files --- robots.txt | 29 ----------------------------- table-of-bot-metrics.md | 30 ------------------------------ 2 files changed, 59 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 8ee7424..0000000 --- a/robots.txt +++ /dev/null @@ -1,29 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 1407fd4..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,30 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. 
| Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| From ffbad453f321a72740db83ca5d914fdf73f15c21 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 3 Aug 2024 14:27:47 +0000 Subject: [PATCH 024/249] Merge pull request #24 from nisbet-hubbard/patch-5 Add last line of defence to FAQ --- robots.txt | 29 +++++++++++++++++++++++++++++ table-of-bot-metrics.md | 30 ++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..8ee7424 --- /dev/null +++ b/robots.txt @@ -0,0 +1,29 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..1407fd4 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,30 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. 
| Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. 
| Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. 
| Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. 
| No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From 5826c18909c8004ec5dc55bf56eecfac09ccd11f Mon Sep 17 00:00:00 2001 From: Mirium999 Date: Sun, 4 Aug 2024 10:11:25 +0900 Subject: [PATCH 025/249] Add ICC-Crawler --- robots.json | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/robots.json b/robots.json index 159fe6e..b61946f 100644 --- a/robots.json +++ b/robots.json @@ -118,6 +118,13 @@ "frequency": "No information.", "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." 
}, + "ICC-Crawler": { + "operator": "[NICT](https:\/\/nict.go.jp)", + "respect": "Yes", + "function": "Scrapes data to train and support AI technologies.", + "frequency": "No information.", + "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business." + }, "img2dataset": { "operator": "[img2dataset](https:\/\/github.com\/rom1504\/img2dataset)", "respect": "Unclear at this time.", From 8c632e1ba45489763454526cb31da9d3f2eca8d1 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sun, 4 Aug 2024 01:21:55 +0000 Subject: [PATCH 026/249] Removing previously generated files --- robots.txt | 29 ----------------------------- table-of-bot-metrics.md | 30 ------------------------------ 2 files changed, 59 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 8ee7424..0000000 --- a/robots.txt +++ /dev/null @@ -1,29 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 1407fd4..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,30 +0,0 @@ -| Name | 
Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. 
| No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
|

From 8de5bc8e01cd3d53f6f11f3160fe912f5f3b504a Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 4 Aug 2024 01:21:56 +0000
Subject: [PATCH 027/249] Merge pull request #25 from mirium999/add_icc_crawler

Add ICC-Crawler
---
 robots.txt              | 30 ++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 31 +++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..eb78054
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,30 @@
+User-agent: Amazonbot
+User-agent: anthropic-ai
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: ClaudeBot
+User-agent: Claude-Web
+User-agent: cohere-ai
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: facebookexternalhit
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: GPTBot
+User-agent: ICC-Crawler
+User-agent: img2dataset
+User-agent: Meta-ExternalAgent
+User-agent: OAI-SearchBot
+User-agent: omgili
+User-agent: omgilibot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: YouBot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..9f9012d
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,31 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses.
| +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
|

From 1ca936ce115dad316417c571766106e7cde07ec8 Mon Sep 17 00:00:00 2001
From: Cory Dransfeldt
Date: Sun, 4 Aug 2024 12:28:48 -0700
Subject: [PATCH 028/249] chore: restore FriendlyCrawler + ImageSift

---
 robots.json | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/robots.json b/robots.json
index b61946f..5427572 100644
--- a/robots.json
+++ b/robots.json
@@ -83,6 +83,13 @@
         "frequency": "Unclear at this time.",
         "description": "Unclear at this time."
     },
+    "FriendlyCrawler": {
+        "operator": "Unknown",
+        "respect": "[Yes](https:\/\/imho.alex-kunz.com\/2024\/01\/25\/an-update-on-friendly-crawler)",
+        "function": "We are using the data from the crawler to build datasets for machine learning experiments.",
+        "frequency": "Unclear at this time.",
+        "description": "Unclear who the operator is; but data is used for training/machine learning."
+    },
     "Google-Extended": {
         "operator": "Google",
         "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)",
@@ -125,6 +132,13 @@
         "frequency": "No information.",
         "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business."
     },
+    "ImageSift": {
+        "operator": "[ImageSift](https:\/\/imagesift.com)",
+        "respect": "[Yes](https:\/\/imagesift.com\/about)",
+        "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products",
+        "frequency": "No information.",
+        "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images."
+    },
     "img2dataset": {
         "operator": "[img2dataset](https:\/\/github.com\/rom1504\/img2dataset)",
         "respect": "Unclear at this time.",

From 9a8fa6677272757df37188df4d3c450b57969dbd Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 4 Aug 2024 19:29:00 +0000
Subject: [PATCH 029/249] Removing previously generated files

---
 robots.txt              | 30 ------------------------------
 table-of-bot-metrics.md | 31 -------------------------------
 2 files changed, 61 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index eb78054..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-User-agent: Amazonbot
-User-agent: anthropic-ai
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: ClaudeBot
-User-agent: Claude-Web
-User-agent: cohere-ai
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: facebookexternalhit
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: GPTBot
-User-agent: ICC-Crawler
-User-agent: img2dataset
-User-agent: Meta-ExternalAgent
-User-agent: OAI-SearchBot
-User-agent: omgili
-User-agent: omgilibot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: YouBot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 9f9012d..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,31 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
|

From c7b781034ebd9acac9ae9ffd33aad846b746dc72 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 4 Aug 2024 19:29:01 +0000
Subject: [PATCH 030/249] chore: restore FriendlyCrawler + ImageSift

---
 robots.txt              | 32 ++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..88cdfcb
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,32 @@
+User-agent: Amazonbot
+User-agent: anthropic-ai
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: ClaudeBot
+User-agent: Claude-Web
+User-agent: cohere-ai
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: facebookexternalhit
+User-agent: FriendlyCrawler
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: GPTBot
+User-agent: ICC-Crawler
+User-agent: ImageSift
+User-agent: img2dataset
+User-agent: Meta-ExternalAgent
+User-agent: OAI-SearchBot
+User-agent: omgili
+User-agent: omgilibot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: YouBot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..4940112
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,33 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses.
| +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImageSift | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." 
| +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
|
+| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
+| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |

From 146fd4ffba68f5c326ab7990d386e5dd7d25da21 Mon Sep 17 00:00:00 2001
From: Joshua Sheard
Date: Sun, 4 Aug 2024 21:33:04 +0100
Subject: [PATCH 031/249] Fix Imagesift user agent

---
 robots.json | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/robots.json b/robots.json
index 5427572..ef8b335 100644
--- a/robots.json
+++ b/robots.json
@@ -132,7 +132,7 @@
         "frequency": "No information.",
         "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business."
     },
-    "ImageSift": {
+    "ImagesiftBot": {
         "operator": "[ImageSift](https:\/\/imagesift.com)",
         "respect": "[Yes](https:\/\/imagesift.com\/about)",
         "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products",
@@ -216,4 +216,4 @@
         "frequency": "No information.",
         "description": "Retrieves data used for You.com web search engine and LLMs."
     }
-}
\ No newline at end of file
+}

From 8dbbdbf44c868303620c607b3e93249b6c175162 Mon Sep 17 00:00:00 2001
From: Joshua Sheard
Date: Sun, 4 Aug 2024 21:38:02 +0100
Subject: [PATCH 032/249] Add Cloudflares first-party scraper blocking to FAQ

---
 FAQ.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/FAQ.md b/FAQ.md
index c26a936..59a44db 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -23,6 +23,7 @@ That depends on your stack.
 - Netlify
   - [Blockin' bots on Netlify](https://www.jeremiak.com/blog/block-bots-netlify-edge-functions/) by Jeremia Kimelman
 - Cloudflare
+  - [Block AI bots, scrapers and crawlers with a single click](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) by Cloudflare
   - [I’m blocking AI crawlers](https://roelant.net/en/2024/im-blocking-ai-crawlers-part-2/) by Roelant
 
 ## Why should we block these crawlers?

From 0072b8f5f08aadda50e4b4202ecd93a95f1971b4 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 4 Aug 2024 21:53:47 +0000
Subject: [PATCH 033/249] Removing previously generated files

---
 robots.txt | 32 --------------------------------
 table-of-bot-metrics.md | 33 ---------------------------------
 2 files changed, 65 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 88cdfcb..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-User-agent: Amazonbot
-User-agent: anthropic-ai
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: ClaudeBot
-User-agent: Claude-Web
-User-agent: cohere-ai
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: facebookexternalhit
-User-agent: FriendlyCrawler
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: GPTBot
-User-agent: ICC-Crawler
-User-agent: ImageSift
-User-agent: img2dataset
-User-agent: Meta-ExternalAgent
-User-agent: OAI-SearchBot
-User-agent: omgili
-User-agent: omgilibot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: YouBot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 4940112..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,33 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
-| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. |
-| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. |
-| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. |
-| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. |
-| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time.
| Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImageSift | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. 
| Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. 
| "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." |
-| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. |
-| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
-| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |

From c2f177870f88a2b3cf13aef13854e0865750eb35 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 4 Aug 2024 21:53:48 +0000
Subject: [PATCH 034/249] Merge pull request #27 from jsheard/patch-1

Fix Imagesift user agent
---
 robots.txt | 32 ++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..7f3cb46
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,32 @@
+User-agent: Amazonbot
+User-agent: anthropic-ai
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: ClaudeBot
+User-agent: Claude-Web
+User-agent: cohere-ai
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: facebookexternalhit
+User-agent: FriendlyCrawler
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: GPTBot
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: img2dataset
+User-agent: Meta-ExternalAgent
+User-agent: OAI-SearchBot
+User-agent: omgili
+User-agent: omgilibot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: YouBot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..2dafd43
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,33 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
+| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. |
+| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. |
+| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. |
+| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts.
| Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. 
| Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. 
| Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. 
| +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
|

From 1cfc0714984241e900c7eee71411b97596f4db98 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 4 Aug 2024 21:54:16 +0000
Subject: [PATCH 035/249] Removing previously generated files

---
 robots.txt | 32 --------------------------------
 table-of-bot-metrics.md | 33 ---------------------------------
 2 files changed, 65 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 7f3cb46..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-User-agent: Amazonbot
-User-agent: anthropic-ai
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: ClaudeBot
-User-agent: Claude-Web
-User-agent: cohere-ai
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: facebookexternalhit
-User-agent: FriendlyCrawler
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: GPTBot
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: img2dataset
-User-agent: Meta-ExternalAgent
-User-agent: OAI-SearchBot
-User-agent: omgili
-User-agent: omgilibot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: YouBot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 2dafd43..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,33 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses.
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." 
| -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
|
-| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
-| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |

From eb924b9856607b591f4574b2b5a38e5307fc19ff Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 4 Aug 2024 21:54:17 +0000
Subject: [PATCH 036/249] Merge pull request #28 from jsheard/patch-2

Add Cloudflares first-party scraper blocking to FAQ
---
 robots.txt | 32 ++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..7f3cb46
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,32 @@
+User-agent: Amazonbot
+User-agent: anthropic-ai
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: ClaudeBot
+User-agent: Claude-Web
+User-agent: cohere-ai
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: facebookexternalhit
+User-agent: FriendlyCrawler
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: GPTBot
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: img2dataset
+User-agent: Meta-ExternalAgent
+User-agent: OAI-SearchBot
+User-agent: omgili
+User-agent: omgilibot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: YouBot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..2dafd43
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,33 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
+| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. |
+| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. |
+| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. |
+| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. |
+| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided.
| Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. 
| "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. 
| At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. 
| "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From b0a93aeb700e98d49588568cd47cefc5c79f5220 Mon Sep 17 00:00:00 2001 From: John Bowdre Date: Sun, 4 Aug 2024 17:45:18 -0500 Subject: [PATCH 037/249] only build on changes to robots.json --- .github/workflows/main.yml | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index df6678f..ca5efd2 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -1,4 +1,7 @@ -on: [push] +on: + push: + paths: + - 'robots.json' jobs: ai-robots-txt: From b54e274bbc910d94f21248d35d8037968adb63b6 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 6 Aug 2024 15:44:53 +0000 Subject: [PATCH 038/249] Removing previously generated files --- robots.txt | 32 -------------------------------- table-of-bot-metrics.md | 33 --------------------------------- 2 files changed, 65 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 7f3cb46..0000000 --- a/robots.txt +++ /dev/null @@ -1,32 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended 
-User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 2dafd43..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,33 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. 
| -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. 
| Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. 
| Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. 
Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| From e12ddc0f42ff390c6f8a7e9d3c43018d5b99dbcc Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 6 Aug 2024 15:44:54 +0000 Subject: [PATCH 039/249] Merge pull request #29 from jbowdre/dev only build on changes to robots.json --- robots.txt | 32 ++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..7f3cb46 --- /dev/null +++ b/robots.txt @@ -0,0 +1,32 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..2dafd43 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,33 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. 
| +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." 
| +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
|
+| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
+| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |

From 192bf67631183766b4ade6a3c6479fe63af8e2fa Mon Sep 17 00:00:00 2001
From: Chenghao Mou
Date: Tue, 6 Aug 2024 17:01:56 +0100
Subject: [PATCH 040/249] add dark visitor workflow

---
 .github/workflows/daily_update.yml | 22 +++++++++++++++++
 code/dark_visitors.py              | 38 ++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)
 create mode 100644 .github/workflows/daily_update.yml
 create mode 100644 code/dark_visitors.py

diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml
new file mode 100644
index 0000000..1e36f7b
--- /dev/null
+++ b/.github/workflows/daily_update.yml
@@ -0,0 +1,22 @@
+name: Daily Update from Dark Visitors
+on:
+  schedule:
+    - cron: "0 0 * * *"
+
+jobs:
+  dark-visitors:
+    runs-on: ubuntu-latest
+    name: dark-visitors
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 2
+      - run: |
+          pip install beautifulsoup4 requests
+          git config --global user.name "dark-visitors"
+          git config --global user.email "dark-visitors@users.noreply.github.com"
+          python code/dark_visitors.py
+          git add -A
+          git commit -m "Daily update from Dark Visitors"
+          git push
+        shell: bash
\ No newline at end of file
diff --git a/code/dark_visitors.py b/code/dark_visitors.py
new file mode 100644
index 0000000..01965b9
--- /dev/null
+++ b/code/dark_visitors.py
@@ -0,0 +1,38 @@
+import json
+from pathlib import Path
+
+import requests
+from bs4 import BeautifulSoup
+
+session = requests.Session()
+response = session.get("https://darkvisitors.com/agents")
+soup = BeautifulSoup(response.text, "html.parser")
+
+existing_content = json.loads(Path("./robots.json").read_text())
+
+for section in soup.find_all("div", {"class": "agent-links-section"}):
+    category = section.find("h2").get_text()
+    for agent in section.find_all("a", href=True):
+        name = agent.find("div", {"class": "agent-name"}).get_text().strip()
+        desc = agent.find("p").get_text().strip()
+
+        if name in existing_content:
+            print(f"{name} already exists in robots.json")
+            continue
+        # Template:
+        # "Claude-Web": {
+        #     "operator": "[Anthropic](https:\/\/www.anthropic.com)",
+        #     "respect": "Unclear at this time.",
+        #     "function": "Scrapes data to train Anthropic's AI products.",
+        #     "frequency": "No information. provided.",
+        #     "description": "Scrapes data to train LLMs and AI products offered by Anthropic."
+        # }
+        existing_content[name] = {
+            "operator": "Unclear at this time.",
+            "respect": "Unclear at this time.",
+            "function": "Unclear at this time.",
+            "frequency": "Unclear at this time.",
+            "description": f"{desc} More info can be found at https://darkvisitors.com/agents{agent['href']}"
+        }
+
+Path("./robots.json").write_text(json.dumps(existing_content, indent=4))
\ No newline at end of file

From 8ab1e30a6c2373ed4d0714c3562d44f28d8326a9 Mon Sep 17 00:00:00 2001
From: Chenghao Mou
Date: Tue, 6 Aug 2024 17:12:26 +0100
Subject: [PATCH 041/249] test workflow

---
 .github/workflows/daily_update.yml |  2 +-
 code/dark_visitors.py              | 20 ++++++++++++++++++--
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml
index 1e36f7b..28f777f 100644
--- a/.github/workflows/daily_update.yml
+++ b/.github/workflows/daily_update.yml
@@ -1,7 +1,7 @@
 name: Daily Update from Dark Visitors
 on:
   schedule:
-    - cron: "0 0 * * *"
+    - cron: "*/10 * * * *"
 
 jobs:
   dark-visitors:
diff --git a/code/dark_visitors.py b/code/dark_visitors.py
index 01965b9..c7d11dc 100644
--- a/code/dark_visitors.py
+++ b/code/dark_visitors.py
@@ -9,6 +9,21 @@ response = session.get("https://darkvisitors.com/agents")
 soup = BeautifulSoup(response.text, "html.parser")
 
 existing_content = json.loads(Path("./robots.json").read_text())
+added = 0
+to_include = [
+    "AI Assistants",
+    "AI Data Scrapers",
+    "AI Search Crawlers",
+    "Archivers",
+    "Developer Helpers",
+    "Fetchers",
+    "Intelligence Gatherers",
+    "Scrapers",
+    "Search Engine Crawlers",
+    "SEO Crawlers",
+    "Uncategorized",
+    "Undocumented AI Agents"
+]
 
 for section in soup.find_all("div", {"class": "agent-links-section"}):
     category = section.find("h2").get_text()
@@ -17,7 +32,6 @@ for section in soup.find_all("div", {"class": "agent-links-section"}):
         desc = agent.find("p").get_text().strip()
 
         if name in existing_content:
-            print(f"{name} already exists in robots.json")
             continue
         # Template:
         # "Claude-Web": {
@@ -30,9 +44,11 @@ for section in soup.find_all("div", {"class": "agent-links-section"}):
         existing_content[name] = {
             "operator": "Unclear at this time.",
             "respect": "Unclear at this time.",
-            "function": "Unclear at this time.",
+            "function": f"{category}",
             "frequency": "Unclear at this time.",
             "description": f"{desc} More info can be found at https://darkvisitors.com/agents{agent['href']}"
         }
+        added += 1
 
+print(f"Added {added} new agents, total is now {len(existing_content)}")
 Path("./robots.json").write_text(json.dumps(existing_content, indent=4))
\ No newline at end of file

From b4d25bf0cb2fa75733ae052c26d468ac5475aa8c Mon Sep 17 00:00:00 2001
From: Glyn Normington
Date: Tue, 6 Aug 2024 17:20:26 +0100
Subject: [PATCH 042/249] Add FAQ

---
 FAQ.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/FAQ.md b/FAQ.md
index 59a44db..b0d2167 100644
--- a/FAQ.md
+++ b/FAQ.md
@@ -2,7 +2,13 @@
 
 ## How do we know AI companies/bots respect `robots.txt`?
 
-The short answer is that we don't. `robots.txt` is a well-established standard but compliance is voluntary. There is no enforcement mechanism.
+The short answer is that we don't. `robots.txt` is a well-established standard, but compliance is voluntary. There is no enforcement mechanism.
+
+## Why might AI web crawlers respect `robots.txt`?
+
+Larger and/or reputable companies developing AI models probably wouldn't want to damage their reputation by ignoring `robots.txt`.
+
+Also, given the contentious nature of AI and the possibility of legislation limiting its development, companies developing AI models will probably want to be seen to be behaving ethically, and so should (eventually) respect `robots.txt`.
 
 ## Can we block crawlers based on user agent strings?
 

From 83d9397f176c781f388bac1765c09ff7a95c9477 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Tue, 6 Aug 2024 16:21:00 +0000
Subject: [PATCH 043/249] Removing previously generated files

---
 robots.txt              | 32 --------------------------------
 table-of-bot-metrics.md | 33 ---------------------------------
 2 files changed, 65 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 7f3cb46..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-User-agent: Amazonbot
-User-agent: anthropic-ai
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: ClaudeBot
-User-agent: Claude-Web
-User-agent: cohere-ai
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: facebookexternalhit
-User-agent: FriendlyCrawler
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: GPTBot
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: img2dataset
-User-agent: Meta-ExternalAgent
-User-agent: OAI-SearchBot
-User-agent: omgili
-User-agent: omgilibot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: YouBot
-Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 2dafd43..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,33 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. 
| No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. 
| No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. 
| Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. 
| "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From 6d2285f5e0b391fc3646889dcb5a57989ba71623 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 6 Aug 2024 16:21:01 +0000 Subject: [PATCH 044/249] Add FAQ --- robots.txt | 32 ++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..7f3cb46 --- /dev/null +++ b/robots.txt @@ -0,0 +1,32 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot 
+User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..2dafd43 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,33 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. 
| Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. 
| Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From fdd261dad4f319735fb99cafa034fa09d0b82cf9 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Tue, 6 Aug 2024 16:27:02 +0000 Subject: [PATCH 045/249] Daily update from Dark Visitors --- robots.json | 5442 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 5402 insertions(+), 40 deletions(-) diff --git a/robots.json b/robots.json index ef8b335..dba55e9 100644 --- a/robots.json +++ b/robots.json @@ -7,14 +7,14 @@ "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses." }, "anthropic-ai": { - "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." 
}, "Applebot-Extended": { - "operator": "[Apple](https:\/\/support.apple.com\/en-us\/119829#datausage)", + "operator": "[Apple](https://support.apple.com/en-us/119829#datausage)", "respect": "Yes", "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", "frequency": "Unclear at this time.", @@ -28,192 +28,5554 @@ "description": "Downloads data to train LLMS, including ChatGPT competitors." }, "CCBot": { - "operator": "[Common Crawl](https:\/\/commoncrawl.org)", - "respect": "[Yes](https:\/\/commoncrawl.org\/ccbot)", + "operator": "[Common Crawl](https://commoncrawl.org)", + "respect": "[Yes](https://commoncrawl.org/ccbot)", "function": "Provides crawl data for an open source repository that has been used to train LLMs.", "frequency": "Unclear at this time.", "description": "Sources data that is made openly available and is used to train AI models." }, "ChatGPT-User": { - "operator": "[OpenAI](https:\/\/openai.com)", + "operator": "[OpenAI](https://openai.com)", "respect": "Yes", "function": "Takes action based on user prompts.", "frequency": "Only when prompted by a user.", "description": "Used by plugins in ChatGPT to answer queries based on user input." }, "ClaudeBot": { - "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Claude-Web": { - "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." 
}, "cohere-ai": { - "operator": "[Cohere](https:\/\/cohere.com)", + "operator": "[Cohere](https://cohere.com)", "respect": "Unclear at this time.", "function": "Retrieves data to provide responses to user-initiated prompts.", "frequency": "Takes action based on user prompts.", "description": "Retrieves data based on user prompts." }, "Diffbot": { - "operator": "[Diffbot](https:\/\/www.diffbot.com\/)", + "operator": "[Diffbot](https://www.diffbot.com/)", "respect": "At the discretion of Diffbot users.", "function": "Aggregates structured web data for monitoring and AI model training.", "frequency": "Unclear at this time.", "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training." }, "FacebookBot": { - "operator": "Meta\/Facebook", - "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", + "operator": "Meta/Facebook", + "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", "function": "Training language models", "frequency": "Up to 1 page per second", "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." }, "facebookexternalhit": { - "operator": "Meta\/Facebook", - "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", + "operator": "Meta/Facebook", + "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", "function": "No information.", "frequency": "Unclear at this time.", "description": "Unclear at this time." 
}, "FriendlyCrawler": { "operator": "Unknown", - "respect": "[Yes](https:\/\/imho.alex-kunz.com\/2024\/01\/25\/an-update-on-friendly-crawler)", + "respect": "[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)", "function": "We are using the data from the crawler to build datasets for machine learning experiments.", "frequency": "Unclear at this time.", "description": "Unclear who the operator is; but data is used for training/machine learning." }, "Google-Extended": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "LLM training.", "frequency": "No information.", "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search." }, "GoogleOther": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Image": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Video": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GPTBot": { - "operator": "[OpenAI](https:\/\/openai.com)", + "operator": "[OpenAI](https://openai.com)", "respect": "Yes", "function": "Scrapes data to train OpenAI's products.", "frequency": "No information.", "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." }, "ICC-Crawler": { - "operator": "[NICT](https:\/\/nict.go.jp)", + "operator": "[NICT](https://nict.go.jp)", "respect": "Yes", "function": "Scrapes data to train and support AI technologies.", "frequency": "No information.", "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business." }, "ImagesiftBot": { - "operator": "[ImageSift](https:\/\/imagesift.com)", - "respect": "[Yes](https:\/\/imagesift.com\/about)", + "operator": "[ImageSift](https://imagesift.com)", + "respect": "[Yes](https://imagesift.com/about)", "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products", "frequency": "No information.", "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images." }, "img2dataset": { - "operator": "[img2dataset](https:\/\/github.com\/rom1504\/img2dataset)", + "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", "respect": "Unclear at this time.", "function": "Scrapes images for use in LLMs.", "frequency": "At the discretion of img2dataset users.", "description": "Downloads large sets of images into datasets for LLM training or other purposes." }, "Meta-ExternalAgent": { - "operator": "[Meta](https:\/\/developers.facebook.com\/docs\/sharing\/webmasters\/web-crawlers)", + "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)", "respect": "Yes.", "function": "Used to train models and improve products.", "frequency": "No information.", "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"" }, "OAI-SearchBot": { - "operator": "[OpenAI](https:\/\/openai.com)", - "respect": "[Yes](https:\/\/platform.openai.com\/docs\/bots)", + "operator": "[OpenAI](https://openai.com)", + "respect": "[Yes](https://platform.openai.com/docs/bots)", "function": "Search result generation.", "frequency": "No information.", "description": "Crawls sites to surface as results in SearchGPT." }, "omgili": { - "operator": "[Webz.io](https:\/\/webz.io\/)", - "respect": "[Yes](https:\/\/webz.io\/blog\/web-data\/what-is-the-omgili-bot-and-why-is-it-crawling-your-website\/)", + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)", "function": "Data is sold.", "frequency": "No information.", "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training." 
}, "omgilibot": { - "operator": "[Webz.io](https:\/\/webz.io\/)", - "respect": "[Yes](https:\/\/web.archive.org\/web\/20170704003301\/http:\/\/omgili.com\/Crawler.html)", + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)", "function": "Data is sold.", "frequency": "No information.", "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io." }, "PerplexityBot": { - "operator": "[Perplexity](https:\/\/www.perplexity.ai\/)", - "respect": "[No](https:\/\/www.macstories.net\/stories\/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler\/)", + "operator": "[Perplexity](https://www.perplexity.ai/)", + "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", "function": "Used to answer queries at the request of users.", "frequency": "Takes action based on user prompts.", "description": "Operated by Perplexity to obtain results in response to user queries." }, "PetalBot": { - "operator": "[Huawei](https:\/\/huawei.com\/)", + "operator": "[Huawei](https://huawei.com/)", "respect": "Yes", "function": "Used to provide recommendations in Hauwei assistant and AI search services.", "frequency": "No explicit frequency provided.", "description": "Operated by Huawei to provide search and AI assistant services." 
}, "Scrapy": { - "operator": "[Zyte](https:\/\/www.zyte.com)", + "operator": "[Zyte](https://www.zyte.com)", "respect": "Unclear at this time.", "function": "Scrapes data a variety of uses including training AI.", "frequency": "No information.", "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"" }, "Timpibot": { - "operator": "[Timpi](https:\/\/timpi.io)", + "operator": "[Timpi](https://timpi.io)", "respect": "Unclear at this time.", "function": "Scrapes data for use in training LLMs.", "frequency": "No information.", "description": "Makes data available for training AI models." }, "VelenPublicWebCrawler": { - "operator": "[Velen Crawler](https:\/\/velen.io)", - "respect": "[Yes](https:\/\/velen.io)", + "operator": "[Velen Crawler](https://velen.io)", + "respect": "[Yes](https://velen.io)", "function": "Scrapes data for business data sets and machine learning models.", "frequency": "No information.", "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"" }, "YouBot": { - "operator": "[You](https:\/\/about.you.com\/youchat\/)", - "respect": "[Yes](https:\/\/about.you.com\/youbot\/)", + "operator": "[You](https://about.you.com/youchat/)", + "respect": "[Yes](https://about.you.com/youbot/)", "function": "Scrapes data for search engine and LLMs.", "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." + }, + "Meta-ExternalFetcher": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Assistants", + "frequency": "Unclear at this time.", + "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. 
More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher" + }, + "Applebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Search Crawlers", + "frequency": "Unclear at this time.", + "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot" + }, + "archive.org_bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "archive.org_bot is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/archive-org-bot" + }, + "Arquivo-web-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "Arquivo-web-crawler is an archiver operated by Arquivo.pt. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/arquivo-web-crawler" + }, + "heritrix": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "heritrix is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/heritrix" + }, + "ia_archiver": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "ia_archiver is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ia-archiver" + }, + "ia_archiver-web.archive.org": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "ia_archiver-web.archive.org is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ia-archiver-web-archive-org" + }, + "Nicecrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "Nicecrawler is an archiver operated by NiceCrawler. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nicecrawler" + }, + "2ip bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "2ip bot is a developer helper operated by 2IP. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/2ip-bot" + }, + "AhrefsSiteAudit": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "AhrefsSiteAudit is a developer helper operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ahrefssiteaudit" + }, + "BingPreview": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "BingPreview is a developer helper operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bingpreview" + }, + "Chrome-Lighthouse": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "Chrome-Lighthouse is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/chrome-lighthouse" + }, + "Dark Visitor": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "Dark Visitor is a developer helper operated by Dark Visitors. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dark-visitor" + }, + "deadlinkchecker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "deadlinkchecker is a developer helper operated by Dead Link Checker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deadlinkchecker" + }, + "Google-InspectionTool": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "Google-InspectionTool is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-inspectiontool" + }, + "rogerbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "rogerbot is a developer helper operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rogerbot" + }, + "SiteAuditBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "SiteAuditBot is a developer helper operated by Semrush. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteauditbot" + }, + "t3versionsBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "t3versionsBot is a developer helper operated by T3Versions. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/t3versionsbot" + }, + "W3C_CSS_Validator": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "W3C_CSS_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-css-validator" + }, + "W3C_Validator": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "W3C_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/w3c-validator" + }, + "WellKnownBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "WellKnownBot is a developer helper operated by Well-Known. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wellknownbot" + }, + "BazQux": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "BazQux is a fetcher operated by BazQux. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bazqux" + }, + "bitlybot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "bitlybot is a fetcher operated by Bitly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitlybot" + }, + "BublupBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "BublupBot is a fetcher operated by Bublup. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bublupbot" + }, + "Discordbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Discordbot is a fetcher operated by Discord. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/discordbot" + }, + "Embedly": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Embedly is a fetcher operated by Embedly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/embedly" + }, + "Feedly": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Feedly is a fetcher operated by Feedly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedly" + }, + "FlipboardProxy": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "FlipboardProxy is a fetcher operated by Flipboard. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/flipboardproxy" + }, + "FreshRSS": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "FreshRSS is a fetcher operated by FreshRSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freshrss" + }, + "Friendica": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Friendica is a fetcher operated by Friendica. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/friendica" + }, + "Google Web Preview": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Google Web Preview is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-web-preview" + }, + "Google-Read-Aloud": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Google-Read-Aloud is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/google-read-aloud" + }, + "Hatena": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Hatena is a fetcher operated by Hatena. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hatena" + }, + "Iframely": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Iframely is a fetcher operated by Iframely. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iframely" + }, + "inoreader": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "inoreader is a fetcher operated by Inoreader. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/inoreader" + }, + "LinkedInBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "LinkedInBot is a fetcher operated by LinkedIn. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkedinbot" + }, + "Mail.RU_Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Mail.RU_Bot is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mail-ru-bot" + }, + "Mastodon": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Mastodon is a fetcher operated by Mastodon. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mastodon" + }, + "Miniflux": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Miniflux is a fetcher operated by Miniflux. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/miniflux" + }, + "NewsBlur": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "NewsBlur is a fetcher operated by NewsBlur. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/newsblur" + }, + "Nextcloud": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Nextcloud is a fetcher operated by Nextcloud. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nextcloud" + }, + "Pinterestbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Pinterestbot is a fetcher operated by Pinterest. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pinterestbot" + }, + "PocketParser": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "PocketParser is a fetcher operated by Pocket. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pocketparser" + }, + "redditbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "redditbot is a fetcher operated by Reddit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/redditbot" + }, + "SerendeputyBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "SerendeputyBot is a fetcher operated by Serendeputy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/serendeputybot" + }, + "SimplePie": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "SimplePie is a fetcher operated by SimplePie. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplepie" + }, + "SkypeUriPreview": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "SkypeUriPreview is a fetcher operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/skypeuripreview" + }, + "Slackbot-LinkExpanding": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Slackbot-LinkExpanding is a fetcher operated by Slack. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/slackbot-linkexpanding" + }, + "Snap URL Preview Service": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Snap URL Preview Service is a fetcher operated by Snap. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/snap-url-preview-service" + }, + "snapchat": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "snapchat is a fetcher operated by Snapchat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/snapchat" + }, + "startmebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "startmebot is a fetcher operated by Start.me. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/startmebot" + }, + "Superfeedr": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Superfeedr is a fetcher operated by Superfeedr. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/superfeedr" + }, + "SurdotlyBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "SurdotlyBot is a fetcher operated by Sur.ly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/surdotlybot" + }, + "Synapse": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Synapse is a fetcher operated by Matrix. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/synapse" + }, + "TelegramBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "TelegramBot is a fetcher operated by Telegram. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/telegrambot" + }, + "Tiny Tiny RSS": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Tiny Tiny RSS is a fetcher operated by Tiny Tiny RSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tiny-tiny-rss" + }, + "Twitterbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Twitterbot is a fetcher operated by X. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twitterbot" + }, + "Viber": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Viber is a fetcher operated by Viber. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/viber" + }, + "vkShare": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "vkShare is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vkshare" + }, + "WhatsApp": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "WhatsApp is a fetcher operated by Meta. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/whatsapp" + }, + "Yahoo Link Preview": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Yahoo Link Preview is a fetcher operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-link-preview" + }, + "adbeat_bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "adbeat_bot is an intelligence gatherer operated by Adbeat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adbeat-bot" + }, + "AdsBot-Google": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AdsBot-Google is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adsbot-google" + }, + "AdsBot-Google-Mobile": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AdsBot-Google-Mobile is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adsbot-google-mobile" + }, + "aiHitBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "aiHitBot is an intelligence gatherer operated by aiHit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aihitbot" + }, + "AndersPinkBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AndersPinkBot is an intelligence gatherer operated by Anders Pink. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/anderspinkbot" + }, + "ArchiveBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "ArchiveBot is an intelligence gatherer operated by Wikimedia. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/archivebot" + }, + "AwarioBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AwarioBot is an intelligence gatherer operated by Awario. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariobot" + }, + "AwarioSmartBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AwarioSmartBot is an intelligence gatherer operated by Awario. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariosmartbot" + }, + "BitSightBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "BitSightBot is an intelligence gatherer operated by Bitsight. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitsightbot" + }, + "Blackboard": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Blackboard is an intelligence gatherer operated by Anthology. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/blackboard" + }, + "BrandVerity": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "BrandVerity is an intelligence gatherer operated by BrandVerity. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brandverity" + }, + "Cincraw": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Cincraw is an intelligence gatherer operated by CINC. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cincraw" + }, + "ev-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "ev-crawler is an intelligence gatherer operated by Headline. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ev-crawler" + }, + "Google-Safety": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Google-Safety is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-safety" + }, + "HubSpot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "HubSpot is an intelligence gatherer operated by HubSpot. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hubspot" + }, + "IonCrawl": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "IonCrawl is an intelligence gatherer operated by IONOS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ioncrawl" + }, + "Jugendschutzprogramm-Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Jugendschutzprogramm-Crawler is an intelligence gatherer operated by JusProg. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jugendschutzprogramm-crawler" + }, + "KStandBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "KStandBot is an intelligence gatherer operated by URL Classification. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kstandbot" + }, + "LightspeedSystemsCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "LightspeedSystemsCrawler is an intelligence gatherer operated by Lightspeed Systems. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lightspeedsystemscrawler" + }, + "linkfluence": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "linkfluence is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkfluence" + }, + "LinkWalker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "LinkWalker is an intelligence gatherer operated by Fortra. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkwalker" + }, + "magpie-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "magpie-crawler is an intelligence gatherer operated by Brandwatch. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/magpie-crawler" + }, + "Mediapartners-Google": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Mediapartners-Google is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediapartners-google" + }, + "Mediatoolkitbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Mediatoolkitbot is an intelligence gatherer operated by Determ. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediatoolkitbot" + }, + "MuckRack": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "MuckRack is an intelligence gatherer operated by Muck Rack. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/muckrack" + }, + "NetcraftSurveyAgent": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "NetcraftSurveyAgent is an intelligence gatherer operated by Netcraft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netcraftsurveyagent" + }, + "Netvibes": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Netvibes is an intelligence gatherer operated by Netvibes. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netvibes" + }, + "Pandalytics": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Pandalytics is an intelligence gatherer operated by Domainsbot. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pandalytics" + }, + "panscient.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "panscient.com is an intelligence gatherer operated by Panscient. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/panscient-com" + }, + "proximic": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "proximic is an intelligence gatherer operated by Comscore. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/proximic" + }, + "scoop.it": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "scoop.it is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scoop-it" + }, + "SeekportBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "SeekportBot is an intelligence gatherer operated by Seekport. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekportbot" + }, + "SMTBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "SMTBot is an intelligence gatherer operated by SimilarTech. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/smtbot" + }, + "trendictionbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "trendictionbot is an intelligence gatherer operated by Trendiction. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendictionbot" + }, + "TrendsmapResolver": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "TrendsmapResolver is an intelligence gatherer operated by Trendsmap. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendsmapresolver" + }, + "Turnitin": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Turnitin is an intelligence gatherer operated by Turnitin. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/turnitin" + }, + "TurnitinBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "TurnitinBot is an intelligence gatherer operated by Turnitin. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turnitinbot" + }, + "TweetmemeBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "TweetmemeBot is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tweetmemebot" + }, + "Twingly": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Twingly is an intelligence gatherer operated by Twingly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twingly" + }, + "um-LN": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "um-LN is an intelligence gatherer operated by Ubermetrics. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/um-ln" + }, + "virustotal": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "virustotal is an intelligence gatherer operated by VirusTotal. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/virustotal" + }, + "ZoominfoBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "ZoominfoBot is an intelligence gatherer operated by ZoomInfo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zoominfobot" + }, + "008": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "008 is a scraper operated by 80legs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/008" + }, + "Dataprovider.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "Dataprovider.com is a scraper operated by Dataprovider.com. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/dataprovider-com" + }, + "dcrawl": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "dcrawl is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dcrawl" + }, + "HTTrack": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "HTTrack is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/httrack" + }, + "HTTrack 3.0": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "HTTrack 3.0 is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/httrack-3-0" + }, + "MetaInspector": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "MetaInspector is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/metainspector" + }, + "newspaper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "newspaper is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newspaper" + }, + "Nutch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "Nutch is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nutch" + }, + "Offline Explorer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "Offline Explorer is a scraper operated by MetaProducts. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/offline-explorer" + }, + "OpenindexSpider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "OpenindexSpider is a scraper operated by Openindex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/openindexspider" + }, + "360Spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "360Spider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider" + }, + "AlexandriaOrgBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "AlexandriaOrgBot is a search engine crawler operated by Alexandria.org. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexandriaorgbot" + }, + "Atom Feed Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Atom Feed Robot is a search engine crawler operated by RSSMicro. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/atom-feed-robot" + }, + "Baiduspider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Baiduspider is a search engine crawler operated by Baidu. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider" + }, + "bingbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "bingbot is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bingbot" + }, + "coccocbot-web": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "coccocbot-web is a search engine crawler operated by Coc Coc. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccocbot-web" + }, + "Daum": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Daum is a search engine crawler operated by Daum. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/daum" + }, + "DuckDuckBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "DuckDuckBot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/duckduckbot" + }, + "DuckDuckGo-Favicons-Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "DuckDuckGo-Favicons-Bot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/duckduckgo-favicons-bot" + }, + "Feedfetcher-Google": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Feedfetcher-Google is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedfetcher-google" + }, + "Google Favicon": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Google Favicon is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/google-favicon" + }, + "Googlebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot" + }, + "Googlebot-Image": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot-Image is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-image" + }, + "Googlebot-Mobile": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot-Mobile is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-mobile" + }, + "Googlebot-News": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot-News is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-news" + }, + "Googlebot-Video": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot-Video is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-video" + }, + "HaoSouSpider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "HaoSouSpider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/haosouspider" + }, + "MojeekBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "MojeekBot is a search engine crawler operated by Mojeek. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mojeekbot" + }, + "msnbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "msnbot is a search engine crawler operated by Microsoft. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msnbot" + }, + "msnbot-media": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "msnbot-media is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msnbot-media" + }, + "Qwantify": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Qwantify is a search engine crawler operated by Qwant. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/qwantify" + }, + "SemanticScholarBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "SemanticScholarBot is a search engine crawler operated by AI2. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/semanticscholarbot" + }, + "SeznamBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "SeznamBot is a search engine crawler operated by Seznam. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seznambot" + }, + "Sogou web spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Sogou web spider is a search engine crawler operated by Sogou. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-web-spider" + }, + "teoma": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "teoma is a search engine crawler operated by Ask. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teoma" + }, + "TinEye": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "TinEye is a search engine crawler operated by TinEye. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tineye" + }, + "TinEye-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "TinEye-bot is a search engine crawler operated by TinEye. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tineye-bot" + }, + "yacybot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "yacybot is a search engine crawler operated by YaCy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yacybot" + }, + "Yahoo! Slurp": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Yahoo! Slurp is a search engine crawler operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-slurp" + }, + "Yandex": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Yandex is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yandex" + }, + "YandexBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "YandexBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexbot" + }, + "YandexImages": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "YandexImages is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandeximages" + }, + "YandexRenderResourcesBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "YandexRenderResourcesBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexrenderresourcesbot" + }, + "Yeti": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Yeti is a search engine crawler operated by Naver. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yeti" + }, + "YisouSpider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "YisouSpider is a search engine crawler operated by Yisou. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yisouspider" + }, + "ZumBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "ZumBot is a search engine crawler operated by ZUM Internet. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zumbot" + }, + "AhrefsBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "AhrefsBot is an SEO crawler operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ahrefsbot" + }, + "Barkrowler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "Barkrowler is an SEO crawler operated by Babbar. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/barkrowler" + }, + "BLEXBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "BLEXBot is an SEO crawler operated by SEO PowerSuite. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blexbot" + }, + "BrightEdge Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "BrightEdge Crawler is an SEO crawler operated by BrightEdge. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brightedge-crawler" + }, + "Cocolyzebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "Cocolyzebot is an SEO crawler operated by Cocolyze. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cocolyzebot" + }, + "DataForSeoBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "DataForSeoBot is an SEO crawler operated by DataForSEO. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dataforseobot" + }, + "DomainStatsBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "DomainStatsBot is an SEO crawler operated by Domainstats. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domainstatsbot" + }, + "dotbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "dotbot is an SEO crawler operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dotbot" + }, + "hypestat": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "hypestat is an SEO crawler operated by HypeStat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hypestat" + }, + "linkdexbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "linkdexbot is an SEO crawler operated by Linkdex. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkdexbot" + }, + "MJ12bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "MJ12bot is an SEO crawler operated by Majestic. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mj12bot" + }, + "online-webceo-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "online-webceo-bot is an SEO crawler operated by WebCEO. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/online-webceo-bot" + }, + "Screaming Frog SEO Spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "Screaming Frog SEO Spider is an SEO crawler operated by Screaming Frog. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/screaming-frog-seo-spider" + }, + "SemrushBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot" + }, + "SemrushBot-BA": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot-BA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ba" + }, + "SemrushBot-CT": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot-CT is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ct" + }, + "SemrushBot-SI": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot-SI is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-si" + }, + "SemrushBot-SWA": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot-SWA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-swa" + }, + "SenutoBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SenutoBot is an SEO crawler operated by Senuto. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/senutobot" + }, + "SeobilityBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SeobilityBot is an SEO crawler operated by Seobility. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seobilitybot" + }, + "SEOkicks": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SEOkicks is an SEO crawler operated by SEOkicks. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seokicks" + }, + "SEOlizer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SEOlizer is an SEO crawler operated by SEOLizer. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seolizer" + }, + "serpstatbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "serpstatbot is an SEO crawler operated by Serpstat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/serpstatbot" + }, + "SiteCheckerBotCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SiteCheckerBotCrawler is an SEO crawler operated by Sitechecker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitecheckerbotcrawler" + }, + "ZoomBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "ZoomBot is an SEO crawler operated by SEOZoom. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zoombot" + }, + "007ac9 Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "007ac9 Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/007ac9-crawler" + }, + "2ip.ru": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "2ip.ru is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/2ip-ru" + }, + "360Spider-Image": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "360Spider-Image is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider-image" + }, + "360Spider-Video": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "360Spider-Video is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider-video" + }, + "5emeRue": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "5emeRue is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/5emerue" + }, + "5erue": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "5erue is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/5erue" + }, + "A Patent Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "A Patent Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/a-patent-crawler" + }, + "A6-Indexer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "A6-Indexer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/a6-indexer" + }, + "Aboundex": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Aboundex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/aboundex" + }, + "AcademicBotRTU": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AcademicBotRTU is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/academicbotrtu" + }, + "acapbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "acapbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acapbot" + }, + "acoonbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "acoonbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acoonbot" + }, + "Acunetix Security Scanner": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Acunetix Security Scanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/acunetix-security-scanner" + }, + "Acunetix Web Vulnerability Scanner": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Acunetix Web Vulnerability Scanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acunetix-web-vulnerability-scanner" + }, + "AddSearchBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AddSearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/addsearchbot" + }, + "AddThis": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AddThis is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/addthis" + }, + "adequat": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "adequat is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/adequat" + }, + "adequat-systems": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "adequat-systems is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adequat-systems" + }, + "AdIdxBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AdIdxBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adidxbot" + }, + "ADmantX": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ADmantX is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/admantx" + }, + "adscanner": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "adscanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/adscanner" + }, + "AdsTxtCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AdsTxtCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adstxtcrawler" + }, + "AdvBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AdvBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/advbot" + }, + "AISearchBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AISearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aisearchbot" + }, + "Alexabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Alexabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/alexabot" + }, + "Alexibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Alexibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexibot" + }, + "AlphaBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AlphaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alphabot" + }, + "AmiSoftware": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AmiSoftware is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/amisoftware" + }, + "antibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "antibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/antibot" + }, + "AnyEvent": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AnyEvent is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/anyevent" + }, + "Apercite": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Apercite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/apercite" + }, + "AppInsights": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AppInsights is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/appinsights" + }, + "Aqua_Products": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Aqua_Products is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/aqua-products" + }, + "arabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "arabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/arabot" + }, + "Ask n read": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Ask n read is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ask-n-read" + }, + "asknread.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "asknread.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/asknread-com" + }, + "AspiegelBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AspiegelBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/aspiegelbot" + }, + "asterias": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "asterias is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/asterias" + }, + "Augure": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Augure is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/augure" + }, + "auramundi": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "auramundi is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/auramundi" + }, + "AwarioRssBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AwarioRssBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/awariorssbot" + }, + "awesomecrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "awesomecrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awesomecrawler" + }, + "B2B Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "B2B Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/b2b-bot" + }, + "b2w": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "b2w is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/b2w" + }, + "BackDoorBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BackDoorBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/backdoorbot" + }, + "BacklinkCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BacklinkCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/backlinkcrawler" + }, + "Baidu-YunGuanCe": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Baidu-YunGuanCe is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baidu-yunguance" + }, + "Baiduspider-image": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Baiduspider-image is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider-image" + }, + "Baiduspider-news": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Baiduspider-news is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/baiduspider-news" + }, + "Baiduspider-video": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Baiduspider-video is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider-video" + }, + "BDCbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BDCbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bdcbot" + }, + "BehloolBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BehloolBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/behloolbot" + }, + "betaBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "betaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/betabot" + }, + "Better Uptime Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Better Uptime Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/better-uptime-bot" + }, + "bidswitchbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "bidswitchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bidswitchbot" + }, + "BIGLOTRON": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BIGLOTRON is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/biglotron" + }, + "binlar": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "binlar is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/binlar" + }, + "Birdcrawlerbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Birdcrawlerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/birdcrawlerbot" + }, + "BitBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BitBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitbot" + }, + "Black Hole": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Black Hole is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/black-hole" + }, + "Blekkobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Blekkobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/blekkobot" + }, + "blogmuraBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "blogmuraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blogmurabot" + }, + "BlowFish": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BlowFish is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blowfish" + }, + "BLP_bbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BLP_bbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blp-bbot" + }, + "bnf.fr_bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "bnf.fr_bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bnf-fr-bot" + }, + "BomboraBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BomboraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bomborabot" + }, + "Bookmark search tool": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Bookmark search tool is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bookmark-search-tool" + }, + "bot-pge.chlooe.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "bot-pge.chlooe.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bot-pge-chlooe-com" + }, + "Bot.AraTurka.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Bot.AraTurka.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bot-araturka-com" + }, + "BotALot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BotALot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/botalot" + }, + "botify": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "botify is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/botify" + }, + "BotRightHere": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BotRightHere is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/botrighthere" + }, + "BoxcarBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BoxcarBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/boxcarbot" + }, + "brainobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "brainobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brainobot" + }, + "BrandONbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BrandONbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brandonbot" + }, + "BTWebClient": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BTWebClient is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/btwebclient" + }, + "BUbiNG": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BUbiNG is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bubing" + }, + "Buck": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Buck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/buck" + }, + "BuiltBotTough": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BuiltBotTough is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/builtbottough" + }, + "Bullseye": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Bullseye is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bullseye" + }, + "BunnySlippers": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BunnySlippers is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bunnyslippers" + }, + "buzzbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "buzzbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/buzzbot" + }, + "Caliperbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Caliperbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/caliperbot" + }, + "CapsuleChecker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CapsuleChecker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/capsulechecker" + }, + "careerbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "careerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/careerbot" + }, + "CC Metadata Scaper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CC Metadata Scaper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cc-metadata-scaper" + }, + "Cegbfeieh": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Cegbfeieh is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cegbfeieh" + }, + "centurybot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "centurybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/centurybot" + }, + "changedetection": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "changedetection is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/changedetection"
+    },
+    "CheckMarkNetwork": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CheckMarkNetwork is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/checkmarknetwork"
+    },
+    "CheeseBot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CheeseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cheesebot"
+    },
+    "CherryPicker": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CherryPicker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cherrypicker"
+    },
+    "CherryPickerElite": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CherryPickerElite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/cherrypickerelite"
+    },
+    "CherryPickerSE": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CherryPickerSE is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cherrypickerse"
+    },
+    "Cision": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Cision is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cision"
+    },
+    "CISPA Webcrawler": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CISPA Webcrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cispa-webcrawler"
+    },
+    "citeseerxbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "citeseerxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/citeseerxbot"
+    },
+    "Citoid": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Citoid is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/citoid"
+    },
+    "Claritybot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Claritybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/claritybot"
+    },
+    "Clickagy": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Clickagy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/clickagy"
+    },
+    "Cliqzbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Cliqzbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/cliqzbot"
+    },
+    "CloudFlare-AlwaysOnline": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CloudFlare-AlwaysOnline is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cloudflare-alwaysonline"
+    },
+    "coccoc": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "coccoc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccoc"
+    },
+    "coccocbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "coccocbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccocbot"
+    },
+    "coexel": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "coexel is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/coexel"
+    },
+    "Companybook-Crawler": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Companybook-Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/companybook-crawler"
+    },
+    "content crawler spider": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "content crawler spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/content-crawler-spider"
+    },
+    "ContextAd Bot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "ContextAd Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/contextad-bot"
+    },
+    "contxbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "contxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/contxbot"
+    },
+    "convera": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "convera is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/convera"
+    },
+    "ConveraCrawler": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "ConveraCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/converacrawler"
+    },
+    "Cookiebot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Cookiebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cookiebot"
+    },
+    "Copernic": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Copernic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/copernic"
+    },
+    "CopyRightCheck": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CopyRightCheck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/copyrightcheck"
+    },
+    "Corporama": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Corporama is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/corporama"
+    },
+    "cosmos": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "cosmos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cosmos"
+    },
+    "crawler4j": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "crawler4j is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/crawler4j"
+    },
+    "CrawlyProjectCrawler": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CrawlyProjectCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crawlyprojectcrawler"
+    },
+    "Crescent": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Crescent is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crescent"
+    },
+    "Crescent Internet ToolPak HTTP OLE Control v.1.0": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Crescent Internet ToolPak HTTP OLE Control v.1.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crescent-internet-toolpak-http-ole-control-v-1-0"
+    },
+    "CriteoBot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CriteoBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related.
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/criteobot"
+    },
+    "CrunchBot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CrunchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crunchbot"
+    },
+    "CrystalSemanticsBot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CrystalSemanticsBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crystalsemanticsbot"
+    },
+    "Curebot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Curebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/curebot"
+    },
+    "Cutbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Cutbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/cutbot"
+    },
+    "cXensebot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "cXensebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cxensebot"
+    },
+    "CyberPatrol": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "CyberPatrol is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cyberpatrol"
+    },
+    "DareBoost": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "DareBoost is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dareboost"
+    },
+    "Datafeedwatch": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Datafeedwatch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/datafeedwatch"
+    },
+    "datagnionbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "datagnionbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/datagnionbot"
+    },
+    "Datanyze": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Datanyze is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/datanyze"
+    },
+    "daumoa": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "daumoa is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/daumoa"
+    },
+    "deepcrawl": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "deepcrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/deepcrawl"
+    },
+    "deepnoc": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "deepnoc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deepnoc"
+    },
+    "DeuSu": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "DeuSu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deusu"
+    },
+    "Digg Deeper": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Digg Deeper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/digg-deeper"
+    },
+    "Digimind": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Digimind is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/digimind"
+    },
+    "Digincore bot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Digincore bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/digincore-bot"
+    },
+    "discobot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "discobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/discobot"
+    },
+    "Disqus": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Disqus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/disqus"
+    },
+    "DittoSpyder": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "DittoSpyder is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/dittospyder"
+    },
+    "DnyzBot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "DnyzBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dnyzbot"
+    },
+    "Domain Re-Animator Bot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Domain Re-Animator Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domain-re-animator-bot"
+    },
+    "DomainCrawler": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "DomainCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domaincrawler"
+    },
+    "Dow Jones Searchbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Dow Jones Searchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/dow-jones-searchbot"
+    },
+    "Download Ninja": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Download Ninja is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/download-ninja"
+    },
+    "Dragonbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Dragonbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dragonbot"
+    },
+    "drupact": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "drupact is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/drupact"
+    },
+    "Dubbotbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "Dubbotbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/dubbotbot"
+    },
+    "e.ventures Investment Crawler": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "e.ventures Investment Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/e-ventures-investment-crawler"
+    },
+    "EasyBib AutoCite": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "EasyBib AutoCite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/easybib-autocite"
+    },
+    "ec2linkfinder": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "ec2linkfinder is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ec2linkfinder"
+    },
+    "edisterbot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "edisterbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/edisterbot"
+    },
+    "electricmonk": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "electricmonk is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/electricmonk"
+    },
+    "elisabot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "elisabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/elisabot"
+    },
+    "ellisphere": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "ellisphere is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ellisphere"
+    },
+    "EmailCollector": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "Uncategorized",
+        "frequency": "Unclear at this time.",
+        "description": "EmailCollector is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/emailcollector" + }, + "EmailSiphon": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EmailSiphon is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/emailsiphon" + }, + "EmailWolf": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EmailWolf is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/emailwolf" + }, + "epicbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "epicbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/epicbot" + }, + "eright": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "eright is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/eright" + }, + "EroCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EroCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/erocrawler" + }, + "EtaoSpider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EtaoSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/etaospider" + }, + "europarchive.org": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "europarchive.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/europarchive-org" + }, + "evc-batch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "evc-batch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/evc-batch" + }, + "EveryoneSocialBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EveryoneSocialBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/everyonesocialbot" + }, + "Exabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Exabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/exabot" + }, + "Experibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Experibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/experibot" + }, + "ExtLinksBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ExtLinksBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/extlinksbot" + }, + "ExtractorPro": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ExtractorPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/extractorpro" + }, + "Eyeotabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Eyeotabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/eyeotabot" + }, + "EZID": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EZID is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ezid" + }, + "Ezooms": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Ezooms is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ezooms" + }, + "Facebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Facebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/facebot" + }, + "FairAd Client": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FairAd Client is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fairad-client" + }, + "FAST Enterprise Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FAST Enterprise Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fast-enterprise-crawler" + }, + "FAST-WebCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FAST-WebCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fast-webcrawler" + }, + "FediDB": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FediDB is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fedidb" + }, + "fedoraplanet": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "fedoraplanet is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fedoraplanet" + }, + "Feedbin": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Feedbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedbin" + }, + "feedbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "feedbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/feedbot" + }, + "FeedBurner": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FeedBurner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedburner" + }, + "Feedspot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Feedspot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedspot" + }, + "FeedValidator": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FeedValidator is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedvalidator" + }, + "FemtosearchBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FemtosearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/femtosearchbot" + }, + "Fever": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Fever is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fever" + }, + "FindITAnswersbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FindITAnswersbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/finditanswersbot" + }, + "findlink": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "findlink is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/findlink" + }, + "findthatfile": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "findthatfile is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/findthatfile" + }, + "findxbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "findxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/findxbot" + }, + "Flaming AttackBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Flaming AttackBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/flaming-attackbot" + }, + "Flamingo_SearchEngine": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Flamingo_SearchEngine is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/flamingo-searchengine" + }, + "fluffy": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "fluffy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fluffy" + }, + "Foobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Foobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/foobot" + }, + "fr-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "fr-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fr-crawler" + }, + "FreeWebMonitoring SiteChecker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FreeWebMonitoring SiteChecker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freewebmonitoring-sitechecker" + }, + "FreshpingBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FreshpingBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/freshpingbot" + }, + "fuelbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "fuelbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fuelbot" + }, + "Fyrebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Fyrebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fyrebot" + }, + "g00g1e.net": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "g00g1e.net is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/g00g1e-net" + }, + "G2 Web Services": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "G2 Web Services is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/g2-web-services" + }, + "g2reader-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "g2reader-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/g2reader-bot" + }, + "Gaisbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gaisbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gaisbot" + }, + "GarlikCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GarlikCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/garlikcrawler" + }, + "Genieo": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Genieo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/genieo" + }, + "GetRight": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GetRight is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/getright" + }, + "Gigablast": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gigablast is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gigablast" + }, + "Gigabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gigabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gigabot" + }, + "GingerCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GingerCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gingercrawler" + }, + "Gluten Free Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gluten Free Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gluten-free-crawler" + }, + "gnam gnam spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "gnam gnam spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gnam-gnam-spider" + }, + "GnowitNewsbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GnowitNewsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gnowitnewsbot" + }, + "Google-Adwords-Instant": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-Adwords-Instant is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/google-adwords-instant" + }, + "Google-Certificates-Bridge": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-Certificates-Bridge is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-certificates-bridge" + }, + "Google-PhysicalWeb": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-PhysicalWeb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-physicalweb" + }, + "Google-Site-Verification": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-Site-Verification is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-site-verification" + }, + "Google-Structured-Data-Testing-Tool": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-Structured-Data-Testing-Tool is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-structured-data-testing-tool" + }, + "google-xrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "google-xrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-xrawler" + }, + "Gowikibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gowikibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gowikibot" + }, + "grapeshot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "grapeshot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grapeshot" + }, + "GrapeshotCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GrapeshotCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grapeshotcrawler" + }, + "Grobbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Grobbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grobbot" + }, + "GroupHigh": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GroupHigh is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grouphigh" + }, + "grub-client": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "grub-client is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grub-client" + }, + "grub.org": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "grub.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/grub-org" + }, + "gsa-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "gsa-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gsa-crawler" + }, + "gslfbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "gslfbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gslfbot" + }, + "Gwene": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gwene is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gwene" + }, + "Harvest": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Harvest is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/harvest" + }, + "HawaiiBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "HawaiiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hawaiibot" + }, + "humanlinks": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "humanlinks is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/humanlinks" + }, + "hyscore.io": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "hyscore.io is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hyscore-io" + }, + "IAS crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "IAS crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ias-crawler" + }, + "ICBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ICBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/icbot" + }, + "ichiro": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ichiro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ichiro" + }, + "imrbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "imrbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/imrbot" + }, + "IndeedBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "IndeedBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/indeedbot" + }, + "INETDEX-BOT": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "INETDEX-BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/inetdex-bot" + }, + "InfoNaviRobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "InfoNaviRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/infonavirobot" + }, + "infoobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "infoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/infoobot" + }, + "infoseek": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "infoseek is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/infoseek" + }, + "integromedb": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "integromedb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/integromedb" + }, + "intelium_bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "intelium_bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/intelium-bot" + }, + "InterfaxScanBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "InterfaxScanBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/interfaxscanbot" + }, + "ip-web-crawler.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ip-web-crawler.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ip-web-crawler-com" + }, + "IRLbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "IRLbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/irlbot" + }, + "Iron33": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Iron33 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iron33" + }, + "iskanie": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "iskanie is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iskanie" + }, + "IsraBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "IsraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/israbot" + }, + "istellabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "istellabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/istellabot" + }, + "it2media-domain-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "it2media-domain-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/it2media-domain-crawler" + }, + "James BOT": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "James BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/james-bot" + }, + "JamesBOT": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JamesBOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jamesbot" + }, + "Jamie's Spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Jamie's Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jamies-spider" + }, + "JenkersBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JenkersBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jenkersbot" + }, + "JennyBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JennyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jennybot" + }, + "Jetbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Jetbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jetbot" + }, + "Jetty": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Jetty is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jetty" + }, + "JikeSpider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JikeSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jikespider" + }, + "JobboerseBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JobboerseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jobboersebot" + }, + "Jooblebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Jooblebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jooblebot" + }, + "jpg-newsbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "jpg-newsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jpg-newsbot" + }, + "jyxobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "jyxobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jyxobot" + }, + "k2spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "k2spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/k2spider" + }, + "K7MLWCBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "K7MLWCBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/k7mlwcbot" + }, + "kbcrawl": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "kbcrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kbcrawl" + }, + "Kemvibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Kemvibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kemvibot" + }, + "Kenjin Spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Kenjin Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kenjin-spider" + }, + "keys-so-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "keys-so-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/keys-so-bot" + }, + "Keyword Density": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Keyword Density is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/keyword-density" + }, + "Knowings": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Knowings is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/knowings" + }, + "KomodiaBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "KomodiaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/komodiabot" + }, + "KosmioBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "KosmioBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/kosmiobot" + }, + "Landau-Media-Spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Landau-Media-Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/landau-media-spider" + }, + "larbin": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "larbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/larbin" + }, + "Laserlikebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Laserlikebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/laserlikebot" + }, + "lb-spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lb-spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/lb-spider" + }, + "leadbox": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "leadbox is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/leadbox" + }, + "Leikibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Leikibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/leikibot" + }, + "LexiBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LexiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lexibot" + }, + "libWeb": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "libWeb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/libweb" + }, + "Linespider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Linespider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linespider" + }, + "Linguee Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Linguee Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linguee-bot" + }, + "linkapediabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "linkapediabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkapediabot" + }, + "LinkArchiver": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkArchiver is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkarchiver" + }, + "LinkCheck by Siteimprove.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkCheck by Siteimprove.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkcheck-by-siteimprove-com" + }, + "linkdex": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "linkdex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkdex" + }, + "LinkextractorPro": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkextractorPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkextractorpro" + }, + "LinkisBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkisBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkisbot" + }, + "linko": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "linko is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linko" + }, + "LinkpadBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkpadBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkpadbot" + }, + "LinkScan": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkScan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkscan" + }, + "lipperhey": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lipperhey is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/lipperhey" + }, + "LivelapBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LivelapBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/livelapbot" + }, + "lkxscan": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lkxscan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lkxscan" + }, + "LNSpiderguy": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LNSpiderguy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lnspiderguy" + }, + "lssbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/lssbot" + }, + "lssrocketcrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lssrocketcrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lssrocketcrawler" + }, + "ltx71": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ltx71 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ltx71" + }, + "Luminator-robots": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Luminator-robots is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/luminator-robots" + }, + "lwp-trivial": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lwp-trivial is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/lwp-trivial" + }, + "MaCoCu": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MaCoCu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/macocu" + }, + "mappydata": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mappydata is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mappydata" + }, + "Mata Hari": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Mata Hari is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mata-hari" + }, + "MauiBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MauiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mauibot" + }, + "MBCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MBCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mbcrawler" + }, + "MegaIndex": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MegaIndex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/megaindex" + }, + "MegaIndex.ru": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MegaIndex.ru is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/megaindex-ru" + }, + "Meltawer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Meltawer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/meltawer" + }, + "Meltwater": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Meltwater is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/meltwater" + }, + "MeltwaterNews": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MeltwaterNews is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/meltwaternews" + }, + "memorybot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "memorybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/memorybot" + }, + "mention": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mention is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mention" + }, + "MetaJobBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MetaJobBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/metajobbot" + }, + "MetaURI": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MetaURI is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/metauri" + }, + "MIIxpc": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MIIxpc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/miixpc" + }, + "mindUpBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mindUpBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mindupbot" + }, + "minicrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "minicrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/minicrawler" + }, + "Mister PiX": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Mister PiX is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mister-pix" + }, + "MixnodeCache": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MixnodeCache is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mixnodecache" + }, + "mlbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mlbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mlbot" + }, + "moatbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "moatbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moatbot" + }, + "moget": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "moget is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moget" + }, + "Mojeek": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Mojeek is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mojeek" + }, + "MoodleBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MoodleBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/moodlebot" + }, + "Moreover": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Moreover is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moreover" + }, + "MS Search 4.0 Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MS Search 4.0 Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ms-search-4-0-robot" + }, + "MS Search 6.0 Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MS Search 6.0 Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ms-search-6-0-robot" + }, + "MSIECrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MSIECrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/msiecrawler" + }, + "msrbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "msrbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msrbot" + }, + "MTRobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MTRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mtrobot" + }, + "Multiviewbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Multiviewbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/multiviewbot" + }, + "mytwip": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mytwip is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mytwip" + }, + "NAVER Blog Rssbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NAVER Blog Rssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/naver-blog-rssbot" + }, + "NaverBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NaverBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/naverbot" + }, + "Neevabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Neevabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/neevabot" + }, + "NerdByNature.Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NerdByNature.Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nerdbynature-bot" + }, + "nerdybot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "nerdybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nerdybot" + }, + "NetAnts": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NetAnts is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netants" + }, + "netEstate NE Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "netEstate NE Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netestate-ne-crawler" + }, + "Neticle Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Neticle Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/neticle-crawler" + }, + "NetMechanic": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NetMechanic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netmechanic" + }, + "netresearchserver": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "netresearchserver is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netresearchserver" + }, + "NetSystemsResearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NetSystemsResearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netsystemsresearch" + }, + "newsharecounts": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "newsharecounts is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/newsharecounts" + }, + "NewsNow": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NewsNow is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newsnow" + }, + "Newzbin": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Newzbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newzbin" + }, + "NextGenSearchBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NextGenSearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nextgensearchbot" + }, + "NICErsPRO": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NICErsPRO is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nicerspro" + }, + "niki-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "niki-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/niki-bot" + }, + "NimbleCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NimbleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nimblecrawler" + }, + "Nimbostratus-Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Nimbostratus-Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nimbostratus-bot" + }, + "NINJA bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NINJA bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ninja-bot" + }, + "NIXStatsbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NIXStatsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nixstatsbot" + }, + "NLUX_IAHarvester": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NLUX_IAHarvester is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nlux-iaharvester" + }, + "Nmap Scripting Engine": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Nmap Scripting Engine is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nmap-scripting-engine" + }, + "NPBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NPBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/npbot" + }, + "NTENTbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NTENTbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ntentbot" + }, + "Nuzzel": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Nuzzel is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nuzzel" + }, + "OdklBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OdklBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/odklbot" + }, + "officestorebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "officestorebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/officestorebot" + }, + "Openbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Openbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openbot" + }, + "Openfind": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Openfind is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openfind" + }, + "Openfind data gatherer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Openfind data gatherer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openfind-data-gatherer" + }, + "OpenGraphCheck": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OpenGraphCheck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/opengraphcheck" + }, + "OpenHoseBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OpenHoseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openhosebot" + }, + "opinion-tracker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "opinion-tracker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/opinion-tracker" + }, + "Oracle Ultra Search": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Oracle Ultra Search is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/oracle-ultra-search" + }, + "OrangeBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OrangeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/orangebot" + }, + "Orthogaffe": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Orthogaffe is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/orthogaffe" + }, + "outbrain": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "outbrain is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/outbrain" + }, + "OutclicksBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OutclicksBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/outclicksbot" + }, + "page2rss": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "page2rss is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/page2rss" + }, + "PagePeeker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PagePeeker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pagepeeker" + }, + "PageThing": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PageThing is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pagething" + }, + "peer39_crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "peer39_crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/peer39-crawler" + }, + "PerMan": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PerMan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/perman" + }, + "Pingdom": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Pingdom is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pingdom" + }, + "Pinterest": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Pinterest is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pinterest" + }, + "PiplBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PiplBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/piplbot" + }, + "postrank": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "postrank is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/postrank" + }, + "PR-CY.RU": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PR-CY.RU is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pr-cy-ru" + }, + "Primalbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Primalbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/primalbot" + }, + "PrivacyAwareBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PrivacyAwareBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/privacyawarebot" + }, + "ProPowerBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ProPowerBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/propowerbot" + }, + "ProWebWalker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ProWebWalker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/prowebwalker" + }, + "proxem": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "proxem is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/proxem" + }, + "psbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "psbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/psbot" + }, + "Pulsepoint": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Pulsepoint is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/pulsepoint" + }, + "purebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "purebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/purebot" + }, + "QueryN Metasearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "QueryN Metasearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/queryn-metasearch" + }, + "Qwam content intelligence": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Qwam content intelligence is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/qwam-content-intelligence" + }, + "Radiation Retriever 1.1": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Radiation Retriever 1.1 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/radiation-retriever-1-1" + }, + "RankActiveLinkBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RankActiveLinkBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rankactivelinkbot" + }, + "RankFlex": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RankFlex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rankflex" + }, + "Refindbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Refindbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/refindbot" + }, + "RegionStuttgartBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RegionStuttgartBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/regionstuttgartbot" + }, + "RepoMonkey": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RepoMonkey is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/repomonkey" + }, + "RepoMonkey Bait & Tackle": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RepoMonkey Bait & Tackle is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/repomonkey-bait-tackle" + }, + "RetrevoPageAnalyzer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RetrevoPageAnalyzer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/retrevopageanalyzer" + }, + "ReverseEngineeringBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ReverseEngineeringBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/reverseengineeringbot" + }, + "RidderBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RidderBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ridderbot" + }, + "Riddler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Riddler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/riddler" + }, + "Rivva": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Rivva is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rivva" + }, + "Robozilla": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Robozilla is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/robozilla" + }, + "rssbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "rssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rssbot" + }, + "RSSingBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RSSingBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rssingbot" + }, + "RukiCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RukiCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rukicrawler" + }, + "RuxitSynthetic": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RuxitSynthetic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ruxitsynthetic" + }, + "RyteBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RyteBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rytebot" + }, + "SafeDNSBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SafeDNSBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/safednsbot" + }, + "SafeSearch microdata crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SafeSearch microdata crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/safesearch-microdata-crawler" + }, + "SBL-BOT": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SBL-BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sbl-bot" + }, + "score3": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "score3 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/score3" + }, + "ScoutJet": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ScoutJet is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scoutjet" + }, + "scribdbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "scribdbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scribdbot" + }, + "Scrubby": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Scrubby is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/scrubby" + }, + "search.marginalia.nu": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "search.marginalia.nu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/search-marginalia-nu" + }, + "SearchAtlas": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SearchAtlas is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/searchatlas" + }, + "SearchmetricsBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SearchmetricsBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/searchmetricsbot" + }, + "searchpreview": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "searchpreview is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/searchpreview" + }, + "seekbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "seekbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekbot" + }, + "Seekport Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Seekport Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekport-crawler" + }, + "Seekr": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Seekr is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekr" + }, + "seewithkids": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "seewithkids is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seewithkids" + }, + "semanticbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "semanticbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semanticbot" + }, + "sempi.tech": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "sempi.tech is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sempi-tech" + }, + "SemrushBot-BM": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SemrushBot-BM is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-bm" + }, + "SemrushBot-SA": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SemrushBot-SA is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/semrushbot-sa" + }, + "sentibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "sentibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sentibot" + }, + "SEOkicks-Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SEOkicks-Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seokicks-robot" + }, + "seoscanners": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "seoscanners is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seoscanners" + }, + "seostar.co": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "seostar.co is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seostar-co" + }, + "SEOstats": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SEOstats is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seostats" + }, + "SimpleCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SimpleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplecrawler" + }, + "SimpleScraper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SimpleScraper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplescraper" + }, + "Sindup": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Sindup is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sindup" + }, + "sistrix crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "sistrix crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sistrix-crawler" + }, + "SiteBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SiteBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitebot" + }, + "sitecheck.internetseer.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "sitecheck.internetseer.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitecheck-internetseer-com" + }, + "siteexplorer.info": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "siteexplorer.info is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/siteexplorer-info" + }, + "Siteimprove": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Siteimprove is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteimprove" + }, + "Siteimprove.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Siteimprove.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteimprove-com" + }, + "SiteSnagger": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SiteSnagger is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitesnagger" + }, + "SiteSucker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SiteSucker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sitesucker" + }, + "Slack-ImgProxy": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Slack-ImgProxy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slack-imgproxy" + }, + "Slackbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Slackbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slackbot" + }, + "Slurp": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Slurp is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slurp" + }, + "SocialRankIOBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SocialRankIOBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/socialrankiobot" + }, + "Sogou": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Sogou is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou" + }, + "Sogou inst spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Sogou inst spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-inst-spider" + }, + "Sogou spider2": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Sogou spider2 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-spider2" + }, + "Sonic": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Sonic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sonic" + }, + "Sosospider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Sosospider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sosospider" + }, + "SpankBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SpankBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spankbot" + }, + "spanner": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "spanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spanner" + }, + "spbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "spbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/spbot" + }, + "Spinn3r": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Spinn3r is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spinn3r" + }, + "spotter": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "spotter is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spotter" + }, + "SputnikBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SputnikBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sputnikbot" + }, + "Storebot-Google": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Storebot-Google is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/storebot-google" + }, + "StorygizeBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "StorygizeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/storygizebot" + }, + "StractBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "StractBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/stractbot" + }, + "Streamline3Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Streamline3Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/streamline3bot" + }, + "SummalyBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SummalyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/summalybot" + }, + "summify": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "summify is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/summify" + }, + "SuperBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SuperBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/superbot" + }, + "SurveyBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SurveyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/surveybot" + }, + "suzuran": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "suzuran is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/suzuran" + }, + "Swiftbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Swiftbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/swiftbot" + }, + "SWIMGBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SWIMGBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/swimgbot" + }, + "Synthesio": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Synthesio is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/synthesio" + }, + "Sysomos": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Sysomos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sysomos" + }, + "Szukacz": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Szukacz is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/szukacz" + }, + "Taboolabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Taboolabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/taboolabot" + }, + "tagoobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "tagoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tagoobot" + }, + "Talkwater": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Talkwater is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/talkwater" + }, + "TangibleeBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "TangibleeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tangibleebot" + }, + "Teleport": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Teleport is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teleport" + }, + "TeleportPro": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "TeleportPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teleportpro" + }, + "Telesoft": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Telesoft is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/telesoft" + }, + "The Intraformant": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "The Intraformant is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/the-intraformant" + }, + "TheNomad": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "TheNomad is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/thenomad" + }, + "theoldreader.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "theoldreader.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/theoldreader-com" + }, + "Thinklab": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Thinklab is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/thinklab" + }, + "tigerbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "tigerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tigerbot" + }, + "Titan": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Titan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/titan" + }, + "toCrawl": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "toCrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tocrawl" + }, + "TombaPublicWebCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "TombaPublicWebCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tombapublicwebcrawler" + }, + "toplistbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "toplistbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/toplistbot" + }, + "ToutiaoSpider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ToutiaoSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/toutiaospider" + }, + "Traackr.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Traackr.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/traackr-com" + }, + "tracemyfile": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "tracemyfile is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tracemyfile" + }, + "trafilatura": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "trafilatura is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trafilatura" + }, + "trendeo": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "trendeo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendeo" + }, + "trendkite-akashic-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "trendkite-akashic-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendkite-akashic-crawler" + }, + "trendybuzz": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "trendybuzz is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/trendybuzz" + }, + "trovitBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "trovitBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trovitbot" + }, + "True_Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "True_Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/true-robot" + }, + "TruliaBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "TruliaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/truliabot" + }, + "turingos": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "turingos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/turingos" + }, + "tweetedtimes": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "tweetedtimes is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tweetedtimes" + }, + "twengabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "twengabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twengabot" + }, + "Twurly": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Twurly is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twurly" + }, + "UbiCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "UbiCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ubicrawler" + }, + "um-IC": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "um-IC is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/um-ic" + }, + "Updownerbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Updownerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/updownerbot" + }, + "Upflow": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Upflow is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/upflow" + }, + "Uptime-Kuma": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Uptime-Kuma is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/uptime-kuma" + }, + "Uptimebot.org": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Uptimebot.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/uptimebot-org" + }, + "UptimeRobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "UptimeRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/uptimerobot" + }, + "URL Control": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "URL Control is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/url-control" + }, + "URL_Spider_Pro": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "URL_Spider_Pro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/url-spider-pro" + }, + "urlappendbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "urlappendbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/urlappendbot" + }, + "URLy Warning": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "URLy Warning is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/urly-warning" + }, + "usasearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "usasearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/usasearch" + }, + "UsineNouvelleCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "UsineNouvelleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/usinenouvellecrawler" + }, + "UT-Dorkbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "UT-Dorkbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ut-dorkbot" + }, + "Validator.nu": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Validator.nu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/validator-nu" + }, + "VCI": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "VCI is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vci" + }, + "VCI WebViewer VCI WebViewer Win32": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "VCI WebViewer VCI WebViewer Win32 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/vci-webviewer-vci-webviewer-win32" + }, + "vebidoobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "vebidoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vebidoobot" + }, + "vecteurplus": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "vecteurplus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vecteurplus" + }, + "Veoozbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Veoozbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/veoozbot" + }, + "verticalsearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "verticalsearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/verticalsearch" + }, + "Vigil": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Vigil is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vigil" + }, + "VKRobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "VKRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vkrobot" + }, + "voilabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "voilabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/voilabot" + }, + "voltron": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "voltron is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/voltron" + }, + "VoluumDSP-content-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "VoluumDSP-content-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/voluumdsp-content-bot" + }, + "vsw": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "vsw is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vsw" + }, + "vuhuvBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "vuhuvBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vuhuvbot" + }, + "W3C_I18n-Checker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "W3C_I18n-Checker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/w3c-i18n-checker" + }, + "W3C_Unicorn": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "W3C_Unicorn is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-unicorn" + }, + "W3C-checklink": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "W3C-checklink is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-checklink" + }, + "W3C-mobileOK": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "W3C-mobileOK is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-mobileok" + }, + "WASALive-Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WASALive-Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/wasalive-bot" + }, + "wbsearchbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "wbsearchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wbsearchbot" + }, + "Web Image Collector": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Web Image Collector is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/web-image-collector" + }, + "web-archive-net.com.bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "web-archive-net.com.bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/web-archive-net-com-bot" + }, + "WebAuto": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebAuto is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webauto" + }, + "WebBandit": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebBandit is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webbandit" + }, + "WebCapture 2.0": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebCapture 2.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcapture-2-0" + }, + "webcompanycrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "webcompanycrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcompanycrawler" + }, + "WebCopier": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebCopier is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webcopier" + }, + "WebCopier v.2.2": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebCopier v.2.2 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcopier-v-2-2" + }, + "WebCopier v3.2a": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebCopier v3.2a is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcopier-v3-2a" + }, + "WebDataStats": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebDataStats is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webdatastats" + }, + "WebEnhancer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebEnhancer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webenhancer" + }, + "WebmasterWorldForumBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebmasterWorldForumBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webmasterworldforumbot" + }, + "webmon": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "webmon is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webmon" + }, + "WebReaper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebReaper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webreaper" + }, + "WebSauger": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebSauger is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/websauger" + }, + "Website Quester": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Website Quester is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/website-quester" + }, + "WebStripper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebStripper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webstripper" + }, + "WebZIP": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebZIP is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webzip" + }, + "winello": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "winello is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/winello" + }, + "WinHTTrack": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WinHTTrack is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/winhttrack" + }, + "WiseGuys Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WiseGuys Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wiseguys-robot" + }, + "wocbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "wocbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wocbot" + }, + "woobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "woobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/woobot" + }, + "woorankreview": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "woorankreview is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/woorankreview" + }, + "WordupInfoSearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WordupInfoSearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wordupinfosearch" + }, + "woriobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "woriobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/woriobot" + }, + "wotbox": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "wotbox is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/wotbox" + }, + "WWW-Collector-E": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WWW-Collector-E is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/www-collector-e" + }, + "WWW-Mechanize": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WWW-Mechanize is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/www-mechanize" + }, + "www.uptime.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "www.uptime.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/www-uptime-com" + }, + "Xenu": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Xenu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/xenu" + }, + "Xenu Link Sleuth": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Xenu Link Sleuth is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenu-link-sleuth" + }, + "Xenu's": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Xenu's is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenus" + }, + "Xenu's Link Sleuth 1.1c": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Xenu's Link Sleuth 1.1c is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenus-link-sleuth-1-1c" + }, + "xovibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "xovibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/xovibot" + }, + "Yahoo Pipes 1.0": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Yahoo Pipes 1.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-pipes-1-0" + }, + "YaK": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "YaK is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yak" + }, + "YandexMobileBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "YandexMobileBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexmobilebot" + }, + "YandexVideo": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "YandexVideo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yandexvideo" + }, + "yanga": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "yanga is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yanga" + }, + "Yellowbrandprotectionbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Yellowbrandprotectionbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yellowbrandprotectionbot" + }, + "yoozBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "yoozBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yoozbot" + }, + "YoudaoBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "YoudaoBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/youdaobot" + }, + "Youmag": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Youmag is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/youmag" + }, + "Zabbix": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zabbix is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zabbix" + }, + "Zao": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zao is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zao" + }, + "Zealbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zealbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zealbot" + }, + "zenback bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "zenback bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zenback-bot" + }, + "Zeus": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zeus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zeus" + }, + "Zeus Link Scout": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zeus Link Scout is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zeus-link-scout" + }, + "zgrab": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "zgrab is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zgrab" + }, + "Zite": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zite" + }, + "ZuperlistBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ZuperlistBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zuperlistbot" + }, + "ZyBORG": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ZyBORG is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zyborg" } -} +} \ No newline at end of file From 52d54cf1277fd6fab280453f77a0391fed10692e Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Tue, 6 Aug 2024 17:28:00 +0100 Subject: [PATCH 046/249] restore the cron --- .github/workflows/daily_update.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml index 28f777f..1e36f7b 100644 --- a/.github/workflows/daily_update.yml +++ b/.github/workflows/daily_update.yml @@ -1,7 +1,7 @@ name: Daily Update from Dark Visitors on: schedule: - - cron: "*/10 * * * *" + - cron: "0 0 * * *" jobs: dark-visitors: From 55e92f43243f1501d98098c203e5dfb0fb1c5b17 Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Tue, 6 Aug 2024 17:48:06 +0100 Subject: [PATCH 047/249] update existing ones --- .github/workflows/daily_update.yml | 2 +- code/dark_visitors.py | 60 +++++++++++++++++++----------- 2 files changed, 40 insertions(+), 22 deletions(-) diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml index 1e36f7b..28f777f 100644 --- a/.github/workflows/daily_update.yml +++ b/.github/workflows/daily_update.yml @@ -1,7 +1,7 @@ name: Daily Update from Dark Visitors on: schedule: - - cron: "0 0 * * *" + - cron: "*/10 * * * *" jobs: dark-visitors: diff --git a/code/dark_visitors.py b/code/dark_visitors.py index c7d11dc..2a84d58 100644 --- a/code/dark_visitors.py +++ b/code/dark_visitors.py @@ -9,7 +9,6 @@ response = session.get("https://darkvisitors.com/agents") soup = BeautifulSoup(response.text, "html.parser") existing_content = json.loads(Path("./robots.json").read_text()) -added = 0 to_include = [ "AI Assistants", "AI Data Scrapers", @@ -30,25 +29,44 @@ for section in soup.find_all("div", {"class": "agent-links-section"}): for agent in section.find_all("a", href=True): name = agent.find("div", {"class": "agent-name"}).get_text().strip() desc = 
agent.find("p").get_text().strip()
-
-        if name in existing_content:
-            continue
-        # Template:
-        # "Claude-Web": {
-        #     "operator": "[Anthropic](https:\/\/www.anthropic.com)",
-        #     "respect": "Unclear at this time.",
-        #     "function": "Scrapes data to train Anthropic's AI products.",
-        #     "frequency": "No information. provided.",
-        #     "description": "Scrapes data to train LLMs and AI products offered by Anthropic."
-        # }
-        existing_content[name] = {
-            "operator": "Unclear at this time.",
-            "respect": "Unclear at this time.",
-            "function": f"{category}",
-            "frequency": "Unclear at this time.",
-            "description": f"{desc} More info can be found at https://darkvisitors.com/agents{agent['href']}"
-        }
-        added += 1
-print(f"Added {added} new agents, total is now {len(existing_content)}")
+        # TODO: there seems to be a typo?
+        default_values = {
+            "Unclear at this time.",
+            "No information. provided.",
+            "No information.",
+            "No explicit frequency provided."
+        }
+        default_value = "Unclear at this time."
+
+        operator = default_value
+        if "operated by " in desc:
+            try:
+                operator = desc.split("operated by ", 1)[1].split(".", 1)[0].strip()
+            except Exception as e:
+                print(f"Error: {e}")
+
+
+        def consolidate(field: str, value: str) -> str:
+            # New entry
+            if name not in existing_content:
+                return value
+            # New field
+            if field not in existing_content[name]:
+                return value
+            # Unclear value
+            if existing_content[name][field] in default_values:
+                return value
+            # Existing value
+            return existing_content[name][field]
+
+        existing_content[name] = {
+            "operator": consolidate("operator", operator),
+            "respect": consolidate("respect", default_value),
+            "function": consolidate("function", f"{category}"),
+            "frequency": consolidate("frequency", default_value),
+            "description": consolidate("description", f"{desc} More info can be found at https://darkvisitors.com/agents{agent['href']}")
+        }
+
+print(f"Total: {len(existing_content)}")
 Path("./robots.json").write_text(json.dumps(existing_content,
indent=4)) \ No newline at end of file From 63c7e742c30c98a456cb665821e9e9ddfa681d25 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Tue, 6 Aug 2024 16:54:29 +0000 Subject: [PATCH 048/249] Daily update from Dark Visitors --- robots.json | 372 ++++++++++++++++++++++++++-------------------------- 1 file changed, 186 insertions(+), 186 deletions(-) diff --git a/robots.json b/robots.json index dba55e9..d5d400e 100644 --- a/robots.json +++ b/robots.json @@ -3,14 +3,14 @@ "operator": "Amazon", "respect": "Yes", "function": "Service improvement and enabling answers for Alexa users.", - "frequency": "No information. provided.", + "frequency": "Unclear at this time.", "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses." }, "anthropic-ai": { "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", - "frequency": "No information. provided.", + "frequency": "Unclear at this time.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Applebot-Extended": { @@ -45,14 +45,14 @@ "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", - "frequency": "No information. provided.", + "frequency": "Unclear at this time.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Claude-Web": { "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", - "frequency": "No information. provided.", + "frequency": "Unclear at this time.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." 
}, "cohere-ai": { @@ -79,9 +79,9 @@ "facebookexternalhit": { "operator": "Meta/Facebook", "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", - "function": "No information.", + "function": "Fetchers", "frequency": "Unclear at this time.", - "description": "Unclear at this time." + "description": "facebookexternalhit is a fetcher operated by Meta. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/facebookexternalhit" }, "FriendlyCrawler": { "operator": "Unknown", @@ -94,14 +94,14 @@ "operator": "Google", "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "LLM training.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search." }, "GoogleOther": { "operator": "Google", "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Image": { @@ -122,21 +122,21 @@ "operator": "[OpenAI](https://openai.com)", "respect": "Yes", "function": "Scrapes data to train OpenAI's products.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." 
}, "ICC-Crawler": { "operator": "[NICT](https://nict.go.jp)", "respect": "Yes", "function": "Scrapes data to train and support AI technologies.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business." }, "ImagesiftBot": { "operator": "[ImageSift](https://imagesift.com)", "respect": "[Yes](https://imagesift.com/about)", "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images." }, "img2dataset": { @@ -150,28 +150,28 @@ "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)", "respect": "Yes.", "function": "Used to train models and improve products.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"" }, "OAI-SearchBot": { "operator": "[OpenAI](https://openai.com)", "respect": "[Yes](https://platform.openai.com/docs/bots)", "function": "Search result generation.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "Crawls sites to surface as results in SearchGPT." 
}, "omgili": { "operator": "[Webz.io](https://webz.io/)", "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)", "function": "Data is sold.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training." }, "omgilibot": { "operator": "[Webz.io](https://webz.io/)", "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)", "function": "Data is sold.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io." }, "PerplexityBot": { @@ -185,35 +185,35 @@ "operator": "[Huawei](https://huawei.com/)", "respect": "Yes", "function": "Used to provide recommendations in Hauwei assistant and AI search services.", - "frequency": "No explicit frequency provided.", + "frequency": "Unclear at this time.", "description": "Operated by Huawei to provide search and AI assistant services." }, "Scrapy": { "operator": "[Zyte](https://www.zyte.com)", "respect": "Unclear at this time.", "function": "Scrapes data a variety of uses including training AI.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"" }, "Timpibot": { "operator": "[Timpi](https://timpi.io)", "respect": "Unclear at this time.", "function": "Scrapes data for use in training LLMs.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "Makes data available for training AI models." 
}, "VelenPublicWebCrawler": { "operator": "[Velen Crawler](https://velen.io)", "respect": "[Yes](https://velen.io)", "function": "Scrapes data for business data sets and machine learning models.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"" }, "YouBot": { "operator": "[You](https://about.you.com/youchat/)", "respect": "[Yes](https://about.you.com/youbot/)", "function": "Scrapes data for search engine and LLMs.", - "frequency": "No information.", + "frequency": "Unclear at this time.", "description": "Retrieves data used for You.com web search engine and LLMs." }, "Meta-ExternalFetcher": { @@ -231,721 +231,721 @@ "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot" }, "archive.org_bot": { - "operator": "Unclear at this time.", + "operator": "Internet Archive", "respect": "Unclear at this time.", "function": "Archivers", "frequency": "Unclear at this time.", "description": "archive.org_bot is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/archive-org-bot" }, "Arquivo-web-crawler": { - "operator": "Unclear at this time.", + "operator": "Arquivo", "respect": "Unclear at this time.", "function": "Archivers", "frequency": "Unclear at this time.", "description": "Arquivo-web-crawler is an archiver operated by Arquivo.pt. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/arquivo-web-crawler" }, "heritrix": { - "operator": "Unclear at this time.", + "operator": "Internet Archive", "respect": "Unclear at this time.", "function": "Archivers", "frequency": "Unclear at this time.", "description": "heritrix is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/heritrix" }, "ia_archiver": { - "operator": "Unclear at this time.", + "operator": "Internet Archive", "respect": "Unclear at this time.", "function": "Archivers", "frequency": "Unclear at this time.", "description": "ia_archiver is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ia-archiver" }, "ia_archiver-web.archive.org": { - "operator": "Unclear at this time.", + "operator": "Internet Archive", "respect": "Unclear at this time.", "function": "Archivers", "frequency": "Unclear at this time.", "description": "ia_archiver-web.archive.org is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ia-archiver-web-archive-org" }, "Nicecrawler": { - "operator": "Unclear at this time.", + "operator": "NiceCrawler", "respect": "Unclear at this time.", "function": "Archivers", "frequency": "Unclear at this time.", "description": "Nicecrawler is an archiver operated by NiceCrawler. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nicecrawler" }, "2ip bot": { - "operator": "Unclear at this time.", + "operator": "2IP", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "2ip bot is a developer helper operated by 2IP. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/2ip-bot" }, "AhrefsSiteAudit": { - "operator": "Unclear at this time.", + "operator": "Ahrefs", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "AhrefsSiteAudit is a developer helper operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ahrefssiteaudit" }, "BingPreview": { - "operator": "Unclear at this time.", + "operator": "Microsoft", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "BingPreview is a developer helper operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bingpreview" }, "Chrome-Lighthouse": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "Chrome-Lighthouse is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/chrome-lighthouse" }, "Dark Visitor": { - "operator": "Unclear at this time.", + "operator": "Dark Visitors", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "Dark Visitor is a developer helper operated by Dark Visitors. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dark-visitor" }, "deadlinkchecker": { - "operator": "Unclear at this time.", + "operator": "Dead Link Checker", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "deadlinkchecker is a developer helper operated by Dead Link Checker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/deadlinkchecker" }, "Google-InspectionTool": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "Google-InspectionTool is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-inspectiontool" }, "rogerbot": { - "operator": "Unclear at this time.", + "operator": "Moz", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "rogerbot is a developer helper operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rogerbot" }, "SiteAuditBot": { - "operator": "Unclear at this time.", + "operator": "Semrush", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "SiteAuditBot is a developer helper operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteauditbot" }, "t3versionsBot": { - "operator": "Unclear at this time.", + "operator": "T3Versions", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "t3versionsBot is a developer helper operated by T3Versions. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/t3versionsbot" }, "W3C_CSS_Validator": { - "operator": "Unclear at this time.", + "operator": "W3C", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "W3C_CSS_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-css-validator" }, "W3C_Validator": { - "operator": "Unclear at this time.", + "operator": "W3C", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "W3C_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-validator" }, "WellKnownBot": { - "operator": "Unclear at this time.", + "operator": "Well-Known", "respect": "Unclear at this time.", "function": "Developer Helpers", "frequency": "Unclear at this time.", "description": "WellKnownBot is a developer helper operated by Well-Known. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wellknownbot" }, "BazQux": { - "operator": "Unclear at this time.", + "operator": "BazQux", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "BazQux is a fetcher operated by BazQux. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bazqux" }, "bitlybot": { - "operator": "Unclear at this time.", + "operator": "Bitly", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "bitlybot is a fetcher operated by Bitly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitlybot" }, "BublupBot": { - "operator": "Unclear at this time.", + "operator": "Bublup", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "BublupBot is a fetcher operated by Bublup. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bublupbot" }, "Discordbot": { - "operator": "Unclear at this time.", + "operator": "Discord", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Discordbot is a fetcher operated by Discord. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/discordbot" }, "Embedly": { - "operator": "Unclear at this time.", + "operator": "Embedly", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Embedly is a fetcher operated by Embedly. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/embedly" }, "Feedly": { - "operator": "Unclear at this time.", + "operator": "Feedly", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Feedly is a fetcher operated by Feedly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedly" }, "FlipboardProxy": { - "operator": "Unclear at this time.", + "operator": "Flipboard", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "FlipboardProxy is a fetcher operated by Flipboard. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/flipboardproxy" }, "FreshRSS": { - "operator": "Unclear at this time.", + "operator": "FreshRSS", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "FreshRSS is a fetcher operated by FreshRSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freshrss" }, "Friendica": { - "operator": "Unclear at this time.", + "operator": "Friendica", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Friendica is a fetcher operated by Friendica. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/friendica" }, "Google Web Preview": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Google Web Preview is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-web-preview" }, "Google-Read-Aloud": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Google-Read-Aloud is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-read-aloud" }, "Hatena": { - "operator": "Unclear at this time.", + "operator": "Hatena", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Hatena is a fetcher operated by Hatena. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hatena" }, "Iframely": { - "operator": "Unclear at this time.", + "operator": "Iframely", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Iframely is a fetcher operated by Iframely. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iframely" }, "inoreader": { - "operator": "Unclear at this time.", + "operator": "Inoreader", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "inoreader is a fetcher operated by Inoreader. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/inoreader" }, "LinkedInBot": { - "operator": "Unclear at this time.", + "operator": "LinkedIn", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "LinkedInBot is a fetcher operated by LinkedIn. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkedinbot" }, "Mail.RU_Bot": { - "operator": "Unclear at this time.", + "operator": "VK", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Mail.RU_Bot is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mail-ru-bot" }, "Mastodon": { - "operator": "Unclear at this time.", + "operator": "Mastodon", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Mastodon is a fetcher operated by Mastodon. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mastodon" }, "Miniflux": { - "operator": "Unclear at this time.", + "operator": "Miniflux", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Miniflux is a fetcher operated by Miniflux. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/miniflux" }, "NewsBlur": { - "operator": "Unclear at this time.", + "operator": "NewsBlur", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "NewsBlur is a fetcher operated by NewsBlur. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newsblur" }, "Nextcloud": { - "operator": "Unclear at this time.", + "operator": "Nextcloud", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Nextcloud is a fetcher operated by Nextcloud. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nextcloud" }, "Pinterestbot": { - "operator": "Unclear at this time.", + "operator": "Pinterest", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Pinterestbot is a fetcher operated by Pinterest. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pinterestbot" }, "PocketParser": { - "operator": "Unclear at this time.", + "operator": "Pocket", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "PocketParser is a fetcher operated by Pocket. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pocketparser" }, "redditbot": { - "operator": "Unclear at this time.", + "operator": "Reddit", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "redditbot is a fetcher operated by Reddit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/redditbot" }, "SerendeputyBot": { - "operator": "Unclear at this time.", + "operator": "Serendeputy", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "SerendeputyBot is a fetcher operated by Serendeputy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/serendeputybot" }, "SimplePie": { - "operator": "Unclear at this time.", + "operator": "SimplePie", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "SimplePie is a fetcher operated by SimplePie. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplepie" }, "SkypeUriPreview": { - "operator": "Unclear at this time.", + "operator": "Microsoft", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "SkypeUriPreview is a fetcher operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/skypeuripreview" }, "Slackbot-LinkExpanding": { - "operator": "Unclear at this time.", + "operator": "Slack", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Slackbot-LinkExpanding is a fetcher operated by Slack. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slackbot-linkexpanding" }, "Snap URL Preview Service": { - "operator": "Unclear at this time.", + "operator": "Snap", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Snap URL Preview Service is a fetcher operated by Snap. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/snap-url-preview-service" }, "snapchat": { - "operator": "Unclear at this time.", + "operator": "Snapchat", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "snapchat is a fetcher operated by Snapchat. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/snapchat" }, "startmebot": { - "operator": "Unclear at this time.", + "operator": "Start", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "startmebot is a fetcher operated by Start.me. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/startmebot" }, "Superfeedr": { - "operator": "Unclear at this time.", + "operator": "Superfeedr", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Superfeedr is a fetcher operated by Superfeedr. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/superfeedr" }, "SurdotlyBot": { - "operator": "Unclear at this time.", + "operator": "Sur", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "SurdotlyBot is a fetcher operated by Sur.ly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/surdotlybot" }, "Synapse": { - "operator": "Unclear at this time.", + "operator": "Matrix", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Synapse is a fetcher operated by Matrix. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/synapse" }, "TelegramBot": { - "operator": "Unclear at this time.", + "operator": "Telegram", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "TelegramBot is a fetcher operated by Telegram. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/telegrambot" }, "Tiny Tiny RSS": { - "operator": "Unclear at this time.", + "operator": "Tiny Tiny RSS", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Tiny Tiny RSS is a fetcher operated by Tiny Tiny RSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tiny-tiny-rss" }, "Twitterbot": { - "operator": "Unclear at this time.", + "operator": "X", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Twitterbot is a fetcher operated by X. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twitterbot" }, "Viber": { - "operator": "Unclear at this time.", + "operator": "Viber", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Viber is a fetcher operated by Viber. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/viber" }, "vkShare": { - "operator": "Unclear at this time.", + "operator": "VK", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "vkShare is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vkshare" }, "WhatsApp": { - "operator": "Unclear at this time.", + "operator": "Meta", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "WhatsApp is a fetcher operated by Meta. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/whatsapp" }, "Yahoo Link Preview": { - "operator": "Unclear at this time.", + "operator": "Yahoo", "respect": "Unclear at this time.", "function": "Fetchers", "frequency": "Unclear at this time.", "description": "Yahoo Link Preview is a fetcher operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-link-preview" }, "adbeat_bot": { - "operator": "Unclear at this time.", + "operator": "Adbeat", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "adbeat_bot is an intelligence gatherer operated by Adbeat. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adbeat-bot" }, "AdsBot-Google": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "AdsBot-Google is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adsbot-google" }, "AdsBot-Google-Mobile": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "AdsBot-Google-Mobile is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adsbot-google-mobile" }, "aiHitBot": { - "operator": "Unclear at this time.", + "operator": "aiHit", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "aiHitBot is an intelligence gatherer operated by aiHit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/aihitbot" }, "AndersPinkBot": { - "operator": "Unclear at this time.", + "operator": "Anders Pink", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "AndersPinkBot is an intelligence gatherer operated by Anders Pink. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/anderspinkbot" }, "ArchiveBot": { - "operator": "Unclear at this time.", + "operator": "Wikimedia", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "ArchiveBot is an intelligence gatherer operated by Wikimedia. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/archivebot" }, "AwarioBot": { - "operator": "Unclear at this time.", + "operator": "Awario", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "AwarioBot is an intelligence gatherer operated by Awario. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariobot" }, "AwarioSmartBot": { - "operator": "Unclear at this time.", + "operator": "Awario", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "AwarioSmartBot is an intelligence gatherer operated by Awario. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariosmartbot" }, "BitSightBot": { - "operator": "Unclear at this time.", + "operator": "Bitsight", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "BitSightBot is an intelligence gatherer operated by Bitsight. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitsightbot" }, "Blackboard": { - "operator": "Unclear at this time.", + "operator": "Anthology", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Blackboard is an intelligence gatherer operated by Anthology. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blackboard" }, "BrandVerity": { - "operator": "Unclear at this time.", + "operator": "BrandVerity", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "BrandVerity is an intelligence gatherer operated by BrandVerity. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brandverity" }, "Cincraw": { - "operator": "Unclear at this time.", + "operator": "CINC", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Cincraw is an intelligence gatherer operated by CINC. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cincraw" }, "ev-crawler": { - "operator": "Unclear at this time.", + "operator": "Headline", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "ev-crawler is an intelligence gatherer operated by Headline. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ev-crawler" }, "Google-Safety": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Google-Safety is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-safety" }, "HubSpot": { - "operator": "Unclear at this time.", + "operator": "HubSpot", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "HubSpot is an intelligence gatherer operated by HubSpot. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/hubspot" }, "IonCrawl": { - "operator": "Unclear at this time.", + "operator": "IONOS", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "IonCrawl is an intelligence gatherer operated by IONOS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ioncrawl" }, "Jugendschutzprogramm-Crawler": { - "operator": "Unclear at this time.", + "operator": "JusProg", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Jugendschutzprogramm-Crawler is an intelligence gatherer operated by JusProg. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jugendschutzprogramm-crawler" }, "KStandBot": { - "operator": "Unclear at this time.", + "operator": "URL Classification", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "KStandBot is an intelligence gatherer operated by URL Classification. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kstandbot" }, "LightspeedSystemsCrawler": { - "operator": "Unclear at this time.", + "operator": "Lightspeed Systems", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "LightspeedSystemsCrawler is an intelligence gatherer operated by Lightspeed Systems. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lightspeedsystemscrawler" }, "linkfluence": { - "operator": "Unclear at this time.", + "operator": "Meltwater", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "linkfluence is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkfluence" }, "LinkWalker": { - "operator": "Unclear at this time.", + "operator": "Fortra", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "LinkWalker is an intelligence gatherer operated by Fortra. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkwalker" }, "magpie-crawler": { - "operator": "Unclear at this time.", + "operator": "Brandwatch", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "magpie-crawler is an intelligence gatherer operated by Brandwatch. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/magpie-crawler" }, "Mediapartners-Google": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Mediapartners-Google is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediapartners-google" }, "Mediatoolkitbot": { - "operator": "Unclear at this time.", + "operator": "Determ", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Mediatoolkitbot is an intelligence gatherer operated by Determ. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediatoolkitbot" }, "MuckRack": { - "operator": "Unclear at this time.", + "operator": "Muck Rack", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "MuckRack is an intelligence gatherer operated by Muck Rack. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/muckrack" }, "NetcraftSurveyAgent": { - "operator": "Unclear at this time.", + "operator": "Netcraft", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "NetcraftSurveyAgent is an intelligence gatherer operated by Netcraft. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netcraftsurveyagent" }, "Netvibes": { - "operator": "Unclear at this time.", + "operator": "Netvibes", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Netvibes is an intelligence gatherer operated by Netvibes. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netvibes" }, "Pandalytics": { - "operator": "Unclear at this time.", + "operator": "Domainsbot", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Pandalytics is an intelligence gatherer operated by Domainsbot. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pandalytics" }, "panscient.com": { - "operator": "Unclear at this time.", + "operator": "Panscient", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "panscient.com is an intelligence gatherer operated by Panscient. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/panscient-com" }, "proximic": { - "operator": "Unclear at this time.", + "operator": "Comscore", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "proximic is an intelligence gatherer operated by Comscore. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/proximic" }, "scoop.it": { - "operator": "Unclear at this time.", + "operator": "Meltwater", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "scoop.it is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scoop-it" }, "SeekportBot": { - "operator": "Unclear at this time.", + "operator": "Seekport", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "SeekportBot is an intelligence gatherer operated by Seekport. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekportbot" }, "SMTBot": { - "operator": "Unclear at this time.", + "operator": "SimilarTech", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "SMTBot is an intelligence gatherer operated by SimilarTech. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/smtbot" }, "trendictionbot": { - "operator": "Unclear at this time.", + "operator": "Trendiction", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "trendictionbot is an intelligence gatherer operated by Trendiction. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendictionbot" }, "TrendsmapResolver": { - "operator": "Unclear at this time.", + "operator": "Trendsmap", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "TrendsmapResolver is an intelligence gatherer operated by Trendsmap. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendsmapresolver" }, "Turnitin": { - "operator": "Unclear at this time.", + "operator": "Turnitin", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Turnitin is an intelligence gatherer operated by Turnitin. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/turnitin" }, "TurnitinBot": { - "operator": "Unclear at this time.", + "operator": "Turnitin", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "TurnitinBot is an intelligence gatherer operated by Turnitin. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turnitinbot" }, "TweetmemeBot": { - "operator": "Unclear at this time.", + "operator": "Meltwater", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "TweetmemeBot is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tweetmemebot" }, "Twingly": { - "operator": "Unclear at this time.", + "operator": "Twingly", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "Twingly is an intelligence gatherer operated by Twingly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twingly" }, "um-LN": { - "operator": "Unclear at this time.", + "operator": "Ubermetrics", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "um-LN is an intelligence gatherer operated by Ubermetrics. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/um-ln" }, "virustotal": { - "operator": "Unclear at this time.", + "operator": "VirusTotal", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "virustotal is an intelligence gatherer operated by VirusTotal. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/virustotal" }, "ZoominfoBot": { - "operator": "Unclear at this time.", + "operator": "ZoomInfo", "respect": "Unclear at this time.", "function": "Intelligence Gatherers", "frequency": "Unclear at this time.", "description": "ZoominfoBot is an intelligence gatherer operated by ZoomInfo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zoominfobot" }, "008": { - "operator": "Unclear at this time.", + "operator": "80legs", "respect": "Unclear at this time.", "function": "Scrapers", "frequency": "Unclear at this time.", "description": "008 is a scraper operated by 80legs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/008" }, "Dataprovider.com": { - "operator": "Unclear at this time.", + "operator": "Dataprovider", "respect": "Unclear at this time.", "function": "Scrapers", "frequency": "Unclear at this time.", @@ -994,441 +994,441 @@ "description": "Nutch is a scraper. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nutch" }, "Offline Explorer": { - "operator": "Unclear at this time.", + "operator": "MetaProducts", "respect": "Unclear at this time.", "function": "Scrapers", "frequency": "Unclear at this time.", "description": "Offline Explorer is a scraper operated by MetaProducts. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/offline-explorer" }, "OpenindexSpider": { - "operator": "Unclear at this time.", + "operator": "Openindex", "respect": "Unclear at this time.", "function": "Scrapers", "frequency": "Unclear at this time.", "description": "OpenindexSpider is a scraper operated by Openindex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openindexspider" }, "360Spider": { - "operator": "Unclear at this time.", + "operator": "Qihoo 360", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "360Spider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider" }, "AlexandriaOrgBot": { - "operator": "Unclear at this time.", + "operator": "Alexandria", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "AlexandriaOrgBot is a search engine crawler operated by Alexandria.org. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexandriaorgbot" }, "Atom Feed Robot": { - "operator": "Unclear at this time.", + "operator": "RSSMicro", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Atom Feed Robot is a search engine crawler operated by RSSMicro. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/atom-feed-robot" }, "Baiduspider": { - "operator": "Unclear at this time.", + "operator": "Baidu", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Baiduspider is a search engine crawler operated by Baidu. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider" }, "bingbot": { - "operator": "Unclear at this time.", + "operator": "Microsoft", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "bingbot is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bingbot" }, "coccocbot-web": { - "operator": "Unclear at this time.", + "operator": "Coc Coc", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "coccocbot-web is a search engine crawler operated by Coc Coc. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccocbot-web" }, "Daum": { - "operator": "Unclear at this time.", + "operator": "Daum", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Daum is a search engine crawler operated by Daum. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/daum" }, "DuckDuckBot": { - "operator": "Unclear at this time.", + "operator": "DuckDuckGo", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "DuckDuckBot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/duckduckbot" }, "DuckDuckGo-Favicons-Bot": { - "operator": "Unclear at this time.", + "operator": "DuckDuckGo", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "DuckDuckGo-Favicons-Bot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/duckduckgo-favicons-bot" }, "Feedfetcher-Google": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Feedfetcher-Google is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedfetcher-google" }, "Google Favicon": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Google Favicon is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-favicon" }, "Googlebot": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Googlebot is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/googlebot" }, "Googlebot-Image": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Googlebot-Image is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-image" }, "Googlebot-Mobile": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Googlebot-Mobile is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-mobile" }, "Googlebot-News": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Googlebot-News is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-news" }, "Googlebot-Video": { - "operator": "Unclear at this time.", + "operator": "Google", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Googlebot-Video is a search engine crawler operated by Google. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-video" }, "HaoSouSpider": { - "operator": "Unclear at this time.", + "operator": "Qihoo 360", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "HaoSouSpider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/haosouspider" }, "MojeekBot": { - "operator": "Unclear at this time.", + "operator": "Mojeek", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "MojeekBot is a search engine crawler operated by Mojeek. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mojeekbot" }, "msnbot": { - "operator": "Unclear at this time.", + "operator": "Microsoft", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "msnbot is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/msnbot" }, "msnbot-media": { - "operator": "Unclear at this time.", + "operator": "Microsoft", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "msnbot-media is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msnbot-media" }, "Qwantify": { - "operator": "Unclear at this time.", + "operator": "Qwant", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Qwantify is a search engine crawler operated by Qwant. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/qwantify" }, "SemanticScholarBot": { - "operator": "Unclear at this time.", + "operator": "AI2", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "SemanticScholarBot is a search engine crawler operated by AI2. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semanticscholarbot" }, "SeznamBot": { - "operator": "Unclear at this time.", + "operator": "Senzam", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "SeznamBot is a search engine crawler operated by Senzam. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seznambot" }, "Sogou web spider": { - "operator": "Unclear at this time.", + "operator": "Sogou", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Sogou web spider is a search engine crawler operated by Sogou. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-web-spider" }, "teoma": { - "operator": "Unclear at this time.", + "operator": "Ask", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "teoma is a search engine crawler operated by Ask. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teoma" }, "TinEye": { - "operator": "Unclear at this time.", + "operator": "TinEye", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "TinEye is a search engine crawler operated by TinEye. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tineye" }, "TinEye-bot": { - "operator": "Unclear at this time.", + "operator": "TinEye", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "TinEye-bot is a search engine crawler operated by TinEye. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tineye-bot" }, "yacybot": { - "operator": "Unclear at this time.", + "operator": "YaCy", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "yacybot is a search engine crawler operated by YaCy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yacybot" }, "Yahoo! Slurp": { - "operator": "Unclear at this time.", + "operator": "Yahoo", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Yahoo! Slurp is a search engine crawler operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-slurp" }, "Yandex": { - "operator": "Unclear at this time.", + "operator": "Yandex", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Yandex is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yandex" }, "YandexBot": { - "operator": "Unclear at this time.", + "operator": "Yandex", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "YandexBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexbot" }, "YandexImages": { - "operator": "Unclear at this time.", + "operator": "Yandex", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "YandexImages is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandeximages" }, "YandexRenderResourcesBot": { - "operator": "Unclear at this time.", + "operator": "Yandex", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "YandexRenderResourcesBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexrenderresourcesbot" }, "Yeti": { - "operator": "Unclear at this time.", + "operator": "Naver", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "Yeti is a search engine crawler operated by Naver. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yeti" }, "YisouSpider": { - "operator": "Unclear at this time.", + "operator": "Yisou", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "YisouSpider is a search engine crawler operated by Yisou. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yisouspider" }, "ZumBot": { - "operator": "Unclear at this time.", + "operator": "ZUM Internet", "respect": "Unclear at this time.", "function": "Search Engine Crawlers", "frequency": "Unclear at this time.", "description": "ZumBot is a search engine crawler operated by ZUM Internet. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zumbot" }, "AhrefsBot": { - "operator": "Unclear at this time.", + "operator": "Ahrefs", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "AhrefsBot is an SEO crawler operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ahrefsbot" }, "Barkrowler": { - "operator": "Unclear at this time.", + "operator": "Babbar", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "Barkrowler is an SEO crawler operated by Babbar. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/barkrowler" }, "BLEXBot": { - "operator": "Unclear at this time.", + "operator": "SEO PowerSuite", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "BLEXBot is an SEO crawler operated by SEO PowerSuite. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blexbot" }, "BrightEdge Crawler": { - "operator": "Unclear at this time.", + "operator": "BrightEdge", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "BrightEdge Crawler is an SEO crawler operated by BrightEdge. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brightedge-crawler" }, "Cocolyzebot": { - "operator": "Unclear at this time.", + "operator": "Cocolyze", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "Cocolyzebot is an SEO crawler operated by Cocolyze. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cocolyzebot" }, "DataForSeoBot": { - "operator": "Unclear at this time.", + "operator": "DataForSEO", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "DataForSeoBot is an SEO crawler operated by DataForSEO. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dataforseobot" }, "DomainStatsBot": { - "operator": "Unclear at this time.", + "operator": "Domainstats", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "DomainStatsBot is an SEO crawler operated by Domainstats. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domainstatsbot" }, "dotbot": { - "operator": "Unclear at this time.", + "operator": "Moz", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "dotbot is an SEO crawler operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dotbot" }, "hypestat": { - "operator": "Unclear at this time.", + "operator": "HypeStat", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "hypestat is an SEO crawler operated by HypeStat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hypestat" }, "linkdexbot": { - "operator": "Unclear at this time.", + "operator": "Linkdex", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "linkdexbot is an SEO crawler operated by Linkdex. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkdexbot" }, "MJ12bot": { - "operator": "Unclear at this time.", + "operator": "Majestic", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "MJ12bot is an SEO crawler operated by Majestic. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mj12bot" }, "online-webceo-bot": { - "operator": "Unclear at this time.", + "operator": "WebCEO", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "online-webceo-bot is an SEO crawler operated by WebCEO. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/online-webceo-bot" }, "Screaming Frog SEO Spider": { - "operator": "Unclear at this time.", + "operator": "Screaming Frog", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "Screaming Frog SEO Spider is an SEO crawler operated by Screaming Frog. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/screaming-frog-seo-spider" }, "SemrushBot": { - "operator": "Unclear at this time.", + "operator": "Semrush", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SemrushBot is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot" }, "SemrushBot-BA": { - "operator": "Unclear at this time.", + "operator": "Semrush", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SemrushBot-BA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ba" }, "SemrushBot-CT": { - "operator": "Unclear at this time.", + "operator": "Semrush", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SemrushBot-CT is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ct" }, "SemrushBot-SI": { - "operator": "Unclear at this time.", + "operator": "Semrush", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SemrushBot-SI is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-si" }, "SemrushBot-SWA": { - "operator": "Unclear at this time.", + "operator": "Semrush", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SemrushBot-SWA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-swa" }, "SenutoBot": { - "operator": "Unclear at this time.", + "operator": "Senuto", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SenutoBot is an SEO crawler operated by Senuto. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/senutobot" }, "SeobilityBot": { - "operator": "Unclear at this time.", + "operator": "Seobility", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SeobilityBot is an SEO crawler operated by Seobility. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seobilitybot" }, "SEOkicks": { - "operator": "Unclear at this time.", + "operator": "SEOkicks", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SEOkicks is an SEO crawler operated by SEOkicks. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seokicks" }, "SEOlizer": { - "operator": "Unclear at this time.", + "operator": "SEOLizer", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SEOlizer is an SEO crawler operated by SEOLizer. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seolizer" }, "serpstatbot": { - "operator": "Unclear at this time.", + "operator": "Serpstat", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "serpstatbot is an SEO crawler operated by Serpstat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/serpstatbot" }, "SiteCheckerBotCrawler": { - "operator": "Unclear at this time.", + "operator": "Sitechecker", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", "description": "SiteCheckerBotCrawler is an SEO crawler operated by Sitechecker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sitecheckerbotcrawler" }, "ZoomBot": { - "operator": "Unclear at this time.", + "operator": "SEOZoom", "respect": "Unclear at this time.", "function": "SEO Crawlers", "frequency": "Unclear at this time.", From 8c6482fb45c37a09706adf6b88e21e2bd36fcac7 Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Tue, 6 Aug 2024 18:12:41 +0100 Subject: [PATCH 049/249] restore the cron --- .github/workflows/daily_update.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml index 28f777f..1e36f7b 100644 --- a/.github/workflows/daily_update.yml +++ b/.github/workflows/daily_update.yml @@ -1,7 +1,7 @@ name: Daily Update from Dark Visitors on: schedule: - - cron: "*/10 * * * *" + - cron: "0 0 * * *" jobs: dark-visitors: From 2a3685385c783f6edec0025a5ced20090200415d Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Tue, 6 Aug 2024 19:33:49 +0100 Subject: [PATCH 050/249] restrict scope --- code/dark_visitors.py | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/code/dark_visitors.py b/code/dark_visitors.py index 2a84d58..3b9775b 100644 --- a/code/dark_visitors.py +++ b/code/dark_visitors.py @@ -13,14 +13,14 @@ to_include = [ "AI Assistants", "AI Data Scrapers", "AI Search Crawlers", - "Archivers", - "Developer Helpers", - "Fetchers", - "Intelligence Gatherers", - "Scrapers", - "Search Engine Crawlers", - "SEO Crawlers", - "Uncategorized", + # "Archivers", + # "Developer Helpers", + # "Fetchers", + # "Intelligence Gatherers", + # "Scrapers", + # "Search Engine Crawlers", + # "SEO Crawlers", + # "Uncategorized", "Undocumented AI Agents" ] @@ -29,8 +29,7 @@ for section in soup.find_all("div", {"class": "agent-links-section"}): for agent in section.find_all("a", href=True): name = agent.find("div", {"class": "agent-name"}).get_text().strip() desc = agent.find("p").get_text().strip() - - # TODO: 
there seems to be a typo? + default_values = { "Unclear at this time.", "No information. provided.", @@ -39,6 +38,7 @@ for section in soup.find_all("div", {"class": "agent-links-section"}): } default_value = "Unclear at this time." + # Parse the operator information from the description if possible operator = default_value if "operated by " in desc: try: @@ -46,7 +46,6 @@ for section in soup.find_all("div", {"class": "agent-links-section"}): except Exception as e: print(f"Error: {e}") - def consolidate(field: str, value: str) -> str: # New entry if name not in existing_content: @@ -55,7 +54,7 @@ for section in soup.find_all("div", {"class": "agent-links-section"}): if field not in existing_content[name]: return value # Unclear value - if existing_content[name][field] in default_values: + if existing_content[name][field] in default_values and value not in default_values: return value # Existing value return existing_content[name][field] From 379c339f978f1578d5b5088a4cade81ea880b8a3 Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Tue, 6 Aug 2024 19:40:01 +0100 Subject: [PATCH 051/249] skip push if no change --- .github/workflows/daily_update.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml index 1e36f7b..4cc04e0 100644 --- a/.github/workflows/daily_update.yml +++ b/.github/workflows/daily_update.yml @@ -16,7 +16,7 @@ jobs: git config --global user.name "dark-visitors" git config --global user.email "dark-visitors@users.noreply.github.com" python code/dark_visitors.py + git add -A - git commit -m "Daily update from Dark Visitors" - git push + git diff --quiet && git diff --staged --quiet || (git commit -m "Daily update from Dark Visitors" && git push) shell: bash \ No newline at end of file From 0b6eba8dd561dedad5c133245012af8431e7c664 Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Tue, 6 Aug 2024 19:40:43 +0100 Subject: [PATCH 052/249] skip push if no change --- 
.github/workflows/daily_update.yml | 1 - 1 file changed, 1 deletion(-) diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml index 4cc04e0..1901520 100644 --- a/.github/workflows/daily_update.yml +++ b/.github/workflows/daily_update.yml @@ -16,7 +16,6 @@ jobs: git config --global user.name "dark-visitors" git config --global user.email "dark-visitors@users.noreply.github.com" python code/dark_visitors.py - git add -A git diff --quiet && git diff --staged --quiet || (git commit -m "Daily update from Dark Visitors" && git push) shell: bash \ No newline at end of file From 4cf82b703fb212a6722efbaa1d8213da302a17cd Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Tue, 6 Aug 2024 19:50:38 +0100 Subject: [PATCH 053/249] restore original robots.json --- robots.json | 5480 +-------------------------------------------------- 1 file changed, 59 insertions(+), 5421 deletions(-) diff --git a/robots.json b/robots.json index d5d400e..d550d50 100644 --- a/robots.json +++ b/robots.json @@ -3,18 +3,18 @@ "operator": "Amazon", "respect": "Yes", "function": "Service improvement and enabling answers for Alexa users.", - "frequency": "Unclear at this time.", + "frequency": "No information. provided.", "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses." }, "anthropic-ai": { - "operator": "[Anthropic](https://www.anthropic.com)", + "operator": "[Anthropic](https:\/\/www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", - "frequency": "Unclear at this time.", + "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." 
}, "Applebot-Extended": { - "operator": "[Apple](https://support.apple.com/en-us/119829#datausage)", + "operator": "[Apple](https:\/\/support.apple.com\/en-us\/119829#datausage)", "respect": "Yes", "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", "frequency": "Unclear at this time.", @@ -28,5554 +28,192 @@ "description": "Downloads data to train LLMS, including ChatGPT competitors." }, "CCBot": { - "operator": "[Common Crawl](https://commoncrawl.org)", - "respect": "[Yes](https://commoncrawl.org/ccbot)", + "operator": "[Common Crawl](https:\/\/commoncrawl.org)", + "respect": "[Yes](https:\/\/commoncrawl.org\/ccbot)", "function": "Provides crawl data for an open source repository that has been used to train LLMs.", "frequency": "Unclear at this time.", "description": "Sources data that is made openly available and is used to train AI models." }, "ChatGPT-User": { - "operator": "[OpenAI](https://openai.com)", + "operator": "[OpenAI](https:\/\/openai.com)", "respect": "Yes", "function": "Takes action based on user prompts.", "frequency": "Only when prompted by a user.", "description": "Used by plugins in ChatGPT to answer queries based on user input." }, "ClaudeBot": { - "operator": "[Anthropic](https://www.anthropic.com)", + "operator": "[Anthropic](https:\/\/www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", - "frequency": "Unclear at this time.", + "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Claude-Web": { - "operator": "[Anthropic](https://www.anthropic.com)", + "operator": "[Anthropic](https:\/\/www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", - "frequency": "Unclear at this time.", + "frequency": "No information. 
provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "cohere-ai": { - "operator": "[Cohere](https://cohere.com)", + "operator": "[Cohere](https:\/\/cohere.com)", "respect": "Unclear at this time.", "function": "Retrieves data to provide responses to user-initiated prompts.", "frequency": "Takes action based on user prompts.", "description": "Retrieves data based on user prompts." }, "Diffbot": { - "operator": "[Diffbot](https://www.diffbot.com/)", + "operator": "[Diffbot](https:\/\/www.diffbot.com\/)", "respect": "At the discretion of Diffbot users.", "function": "Aggregates structured web data for monitoring and AI model training.", "frequency": "Unclear at this time.", "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training." }, "FacebookBot": { - "operator": "Meta/Facebook", - "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", + "operator": "Meta\/Facebook", + "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", "function": "Training language models", "frequency": "Up to 1 page per second", "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." }, "facebookexternalhit": { - "operator": "Meta/Facebook", - "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", - "function": "Fetchers", + "operator": "Meta\/Facebook", + "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", + "function": "No information.", "frequency": "Unclear at this time.", - "description": "facebookexternalhit is a fetcher operated by Meta. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/facebookexternalhit" + "description": "Unclear at this time." }, "FriendlyCrawler": { "operator": "Unknown", - "respect": "[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)", + "respect": "[Yes](https:\/\/imho.alex-kunz.com\/2024\/01\/25\/an-update-on-friendly-crawler)", "function": "We are using the data from the crawler to build datasets for machine learning experiments.", "frequency": "Unclear at this time.", "description": "Unclear who the operator is; but data is used for training/machine learning." }, "Google-Extended": { "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", "function": "LLM training.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search." }, "GoogleOther": { "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", "function": "Scrapes data.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Image": { "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Video": { "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GPTBot": { - "operator": "[OpenAI](https://openai.com)", + "operator": "[OpenAI](https:\/\/openai.com)", "respect": "Yes", "function": "Scrapes data to train OpenAI's products.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." }, "ICC-Crawler": { - "operator": "[NICT](https://nict.go.jp)", + "operator": "[NICT](https:\/\/nict.go.jp)", "respect": "Yes", "function": "Scrapes data to train and support AI technologies.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business." 
}, "ImagesiftBot": { - "operator": "[ImageSift](https://imagesift.com)", - "respect": "[Yes](https://imagesift.com/about)", + "operator": "[ImageSift](https:\/\/imagesift.com)", + "respect": "[Yes](https:\/\/imagesift.com\/about)", "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images." }, "img2dataset": { - "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", + "operator": "[img2dataset](https:\/\/github.com\/rom1504\/img2dataset)", "respect": "Unclear at this time.", "function": "Scrapes images for use in LLMs.", "frequency": "At the discretion of img2dataset users.", "description": "Downloads large sets of images into datasets for LLM training or other purposes." 
}, "Meta-ExternalAgent": { - "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)", + "operator": "[Meta](https:\/\/developers.facebook.com\/docs\/sharing\/webmasters\/web-crawlers)", "respect": "Yes.", "function": "Used to train models and improve products.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"" }, "OAI-SearchBot": { - "operator": "[OpenAI](https://openai.com)", - "respect": "[Yes](https://platform.openai.com/docs/bots)", + "operator": "[OpenAI](https:\/\/openai.com)", + "respect": "[Yes](https:\/\/platform.openai.com\/docs\/bots)", "function": "Search result generation.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "Crawls sites to surface as results in SearchGPT." }, "omgili": { - "operator": "[Webz.io](https://webz.io/)", - "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)", + "operator": "[Webz.io](https:\/\/webz.io\/)", + "respect": "[Yes](https:\/\/webz.io\/blog\/web-data\/what-is-the-omgili-bot-and-why-is-it-crawling-your-website\/)", "function": "Data is sold.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training." 
}, "omgilibot": { - "operator": "[Webz.io](https://webz.io/)", - "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)", + "operator": "[Webz.io](https:\/\/webz.io\/)", + "respect": "[Yes](https:\/\/web.archive.org\/web\/20170704003301\/http:\/\/omgili.com\/Crawler.html)", "function": "Data is sold.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io." }, "PerplexityBot": { - "operator": "[Perplexity](https://www.perplexity.ai/)", - "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", + "operator": "[Perplexity](https:\/\/www.perplexity.ai\/)", + "respect": "[No](https:\/\/www.macstories.net\/stories\/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler\/)", "function": "Used to answer queries at the request of users.", "frequency": "Takes action based on user prompts.", "description": "Operated by Perplexity to obtain results in response to user queries." }, "PetalBot": { - "operator": "[Huawei](https://huawei.com/)", + "operator": "[Huawei](https:\/\/huawei.com\/)", "respect": "Yes", "function": "Used to provide recommendations in Hauwei assistant and AI search services.", - "frequency": "Unclear at this time.", + "frequency": "No explicit frequency provided.", "description": "Operated by Huawei to provide search and AI assistant services." 
}, "Scrapy": { - "operator": "[Zyte](https://www.zyte.com)", + "operator": "[Zyte](https:\/\/www.zyte.com)", "respect": "Unclear at this time.", "function": "Scrapes data a variety of uses including training AI.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"" }, "Timpibot": { - "operator": "[Timpi](https://timpi.io)", + "operator": "[Timpi](https:\/\/timpi.io)", "respect": "Unclear at this time.", "function": "Scrapes data for use in training LLMs.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "Makes data available for training AI models." }, "VelenPublicWebCrawler": { - "operator": "[Velen Crawler](https://velen.io)", - "respect": "[Yes](https://velen.io)", + "operator": "[Velen Crawler](https:\/\/velen.io)", + "respect": "[Yes](https:\/\/velen.io)", "function": "Scrapes data for business data sets and machine learning models.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"" }, "YouBot": { - "operator": "[You](https://about.you.com/youchat/)", - "respect": "[Yes](https://about.you.com/youbot/)", + "operator": "[You](https:\/\/about.you.com\/youchat\/)", + "respect": "[Yes](https:\/\/about.you.com\/youbot\/)", "function": "Scrapes data for search engine and LLMs.", - "frequency": "Unclear at this time.", + "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." 
- }, - "Meta-ExternalFetcher": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "AI Assistants", - "frequency": "Unclear at this time.", - "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher" - }, - "Applebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "AI Search Crawlers", - "frequency": "Unclear at this time.", - "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot" - }, - "archive.org_bot": { - "operator": "Internet Archive", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "archive.org_bot is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/archive-org-bot" - }, - "Arquivo-web-crawler": { - "operator": "Arquivo", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "Arquivo-web-crawler is an archiver operated by Arquivo.pt. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/arquivo-web-crawler" - }, - "heritrix": { - "operator": "Internet Archive", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "heritrix is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/heritrix" - }, - "ia_archiver": { - "operator": "Internet Archive", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "ia_archiver is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ia-archiver" - }, - "ia_archiver-web.archive.org": { - "operator": "Internet Archive", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "ia_archiver-web.archive.org is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ia-archiver-web-archive-org" - }, - "Nicecrawler": { - "operator": "NiceCrawler", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "Nicecrawler is an archiver operated by NiceCrawler. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nicecrawler" - }, - "2ip bot": { - "operator": "2IP", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "2ip bot is a developer helper operated by 2IP. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/2ip-bot" - }, - "AhrefsSiteAudit": { - "operator": "Ahrefs", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "AhrefsSiteAudit is a developer helper operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ahrefssiteaudit" - }, - "BingPreview": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "BingPreview is a developer helper operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bingpreview" - }, - "Chrome-Lighthouse": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "Chrome-Lighthouse is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/chrome-lighthouse" - }, - "Dark Visitor": { - "operator": "Dark Visitors", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "Dark Visitor is a developer helper operated by Dark Visitors. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dark-visitor" - }, - "deadlinkchecker": { - "operator": "Dead Link Checker", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "deadlinkchecker is a developer helper operated by Dead Link Checker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deadlinkchecker" - }, - "Google-InspectionTool": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "Google-InspectionTool is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-inspectiontool" - }, - "rogerbot": { - "operator": "Moz", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "rogerbot is a developer helper operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/rogerbot" - }, - "SiteAuditBot": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "SiteAuditBot is a developer helper operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteauditbot" - }, - "t3versionsBot": { - "operator": "T3Versions", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "t3versionsBot is a developer helper operated by T3Versions. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/t3versionsbot" - }, - "W3C_CSS_Validator": { - "operator": "W3C", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "W3C_CSS_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-css-validator" - }, - "W3C_Validator": { - "operator": "W3C", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "W3C_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/w3c-validator" - }, - "WellKnownBot": { - "operator": "Well-Known", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "WellKnownBot is a developer helper operated by Well-Known. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wellknownbot" - }, - "BazQux": { - "operator": "BazQux", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "BazQux is a fetcher operated by BazQux. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bazqux" - }, - "bitlybot": { - "operator": "Bitly", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "bitlybot is a fetcher operated by Bitly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitlybot" - }, - "BublupBot": { - "operator": "Bublup", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "BublupBot is a fetcher operated by Bublup. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bublupbot" - }, - "Discordbot": { - "operator": "Discord", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Discordbot is a fetcher operated by Discord. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/discordbot" - }, - "Embedly": { - "operator": "Embedly", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Embedly is a fetcher operated by Embedly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/embedly" - }, - "Feedly": { - "operator": "Feedly", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Feedly is a fetcher operated by Feedly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedly" - }, - "FlipboardProxy": { - "operator": "Flipboard", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "FlipboardProxy is a fetcher operated by Flipboard. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/flipboardproxy" - }, - "FreshRSS": { - "operator": "FreshRSS", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "FreshRSS is a fetcher operated by FreshRSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freshrss" - }, - "Friendica": { - "operator": "Friendica", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Friendica is a fetcher operated by Friendica. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/friendica" - }, - "Google Web Preview": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Google Web Preview is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-web-preview" - }, - "Google-Read-Aloud": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Google-Read-Aloud is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/google-read-aloud" - }, - "Hatena": { - "operator": "Hatena", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Hatena is a fetcher operated by Hatena. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hatena" - }, - "Iframely": { - "operator": "Iframely", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Iframely is a fetcher operated by Iframely. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iframely" - }, - "inoreader": { - "operator": "Inoreader", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "inoreader is a fetcher operated by Inoreader. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/inoreader" - }, - "LinkedInBot": { - "operator": "LinkedIn", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "LinkedInBot is a fetcher operated by LinkedIn. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkedinbot" - }, - "Mail.RU_Bot": { - "operator": "VK", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Mail.RU_Bot is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mail-ru-bot" - }, - "Mastodon": { - "operator": "Mastodon", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Mastodon is a fetcher operated by Mastodon. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mastodon" - }, - "Miniflux": { - "operator": "Miniflux", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Miniflux is a fetcher operated by Miniflux. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/miniflux" - }, - "NewsBlur": { - "operator": "NewsBlur", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "NewsBlur is a fetcher operated by NewsBlur. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/newsblur" - }, - "Nextcloud": { - "operator": "Nextcloud", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Nextcloud is a fetcher operated by Nextcloud. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nextcloud" - }, - "Pinterestbot": { - "operator": "Pinterest", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Pinterestbot is a fetcher operated by Pinterest. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pinterestbot" - }, - "PocketParser": { - "operator": "Pocket", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "PocketParser is a fetcher operated by Pocket. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pocketparser" - }, - "redditbot": { - "operator": "Reddit", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "redditbot is a fetcher operated by Reddit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/redditbot" - }, - "SerendeputyBot": { - "operator": "Serendeputy", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "SerendeputyBot is a fetcher operated by Serendeputy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/serendeputybot" - }, - "SimplePie": { - "operator": "SimplePie", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "SimplePie is a fetcher operated by SimplePie. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplepie" - }, - "SkypeUriPreview": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "SkypeUriPreview is a fetcher operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/skypeuripreview" - }, - "Slackbot-LinkExpanding": { - "operator": "Slack", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Slackbot-LinkExpanding is a fetcher operated by Slack. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/slackbot-linkexpanding" - }, - "Snap URL Preview Service": { - "operator": "Snap", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Snap URL Preview Service is a fetcher operated by Snap. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/snap-url-preview-service" - }, - "snapchat": { - "operator": "Snapchat", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "snapchat is a fetcher operated by Snapchat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/snapchat" - }, - "startmebot": { - "operator": "Start", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "startmebot is a fetcher operated by Start.me. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/startmebot" - }, - "Superfeedr": { - "operator": "Superfeedr", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Superfeedr is a fetcher operated by Superfeedr. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/superfeedr" - }, - "SurdotlyBot": { - "operator": "Sur", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "SurdotlyBot is a fetcher operated by Sur.ly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/surdotlybot" - }, - "Synapse": { - "operator": "Matrix", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Synapse is a fetcher operated by Matrix. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/synapse" - }, - "TelegramBot": { - "operator": "Telegram", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "TelegramBot is a fetcher operated by Telegram. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/telegrambot" - }, - "Tiny Tiny RSS": { - "operator": "Tiny Tiny RSS", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Tiny Tiny RSS is a fetcher operated by Tiny Tiny RSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tiny-tiny-rss" - }, - "Twitterbot": { - "operator": "X", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Twitterbot is a fetcher operated by X. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twitterbot" - }, - "Viber": { - "operator": "Viber", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Viber is a fetcher operated by Viber. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/viber" - }, - "vkShare": { - "operator": "VK", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "vkShare is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vkshare" - }, - "WhatsApp": { - "operator": "Meta", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "WhatsApp is a fetcher operated by Meta. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/whatsapp" - }, - "Yahoo Link Preview": { - "operator": "Yahoo", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Yahoo Link Preview is a fetcher operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-link-preview" - }, - "adbeat_bot": { - "operator": "Adbeat", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "adbeat_bot is an intelligence gatherer operated by Adbeat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adbeat-bot" - }, - "AdsBot-Google": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AdsBot-Google is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adsbot-google" - }, - "AdsBot-Google-Mobile": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AdsBot-Google-Mobile is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/adsbot-google-mobile" - }, - "aiHitBot": { - "operator": "aiHit", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "aiHitBot is an intelligence gatherer operated by aiHit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aihitbot" - }, - "AndersPinkBot": { - "operator": "Anders Pink", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AndersPinkBot is an intelligence gatherer operated by Anders Pink. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/anderspinkbot" - }, - "ArchiveBot": { - "operator": "Wikimedia", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "ArchiveBot is an intelligence gatherer operated by Wikimedia. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/archivebot" - }, - "AwarioBot": { - "operator": "Awario", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AwarioBot is an intelligence gatherer operated by Awario. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/awariobot" - }, - "AwarioSmartBot": { - "operator": "Awario", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AwarioSmartBot is an intelligence gatherer operated by Awario. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariosmartbot" - }, - "BitSightBot": { - "operator": "Bitsight", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "BitSightBot is an intelligence gatherer operated by Bitsight. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitsightbot" - }, - "Blackboard": { - "operator": "Anthology", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Blackboard is an intelligence gatherer operated by Anthology. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blackboard" - }, - "BrandVerity": { - "operator": "BrandVerity", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "BrandVerity is an intelligence gatherer operated by BrandVerity. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/brandverity" - }, - "Cincraw": { - "operator": "CINC", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Cincraw is an intelligence gatherer operated by CINC. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cincraw" - }, - "ev-crawler": { - "operator": "Headline", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "ev-crawler is an intelligence gatherer operated by Headline. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ev-crawler" - }, - "Google-Safety": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Google-Safety is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-safety" - }, - "HubSpot": { - "operator": "HubSpot", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "HubSpot is an intelligence gatherer operated by HubSpot. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/hubspot" - }, - "IonCrawl": { - "operator": "IONOS", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "IonCrawl is an intelligence gatherer operated by IONOS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ioncrawl" - }, - "Jugendschutzprogramm-Crawler": { - "operator": "JusProg", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Jugendschutzprogramm-Crawler is an intelligence gatherer operated by JusProg. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jugendschutzprogramm-crawler" - }, - "KStandBot": { - "operator": "URL Classification", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "KStandBot is an intelligence gatherer operated by URL Classification. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kstandbot" - }, - "LightspeedSystemsCrawler": { - "operator": "Lightspeed Systems", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "LightspeedSystemsCrawler is an intelligence gatherer operated by Lightspeed Systems. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lightspeedsystemscrawler" - }, - "linkfluence": { - "operator": "Meltwater", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "linkfluence is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkfluence" - }, - "LinkWalker": { - "operator": "Fortra", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "LinkWalker is an intelligence gatherer operated by Fortra. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkwalker" - }, - "magpie-crawler": { - "operator": "Brandwatch", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "magpie-crawler is an intelligence gatherer operated by Brandwatch. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/magpie-crawler" - }, - "Mediapartners-Google": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Mediapartners-Google is an intelligence gatherer operated by Google. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediapartners-google" - }, - "Mediatoolkitbot": { - "operator": "Determ", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Mediatoolkitbot is an intelligence gatherer operated by Determ. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediatoolkitbot" - }, - "MuckRack": { - "operator": "Muck Rack", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "MuckRack is an intelligence gatherer operated by Muck Rack. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/muckrack" - }, - "NetcraftSurveyAgent": { - "operator": "Netcraft", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "NetcraftSurveyAgent is an intelligence gatherer operated by Netcraft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netcraftsurveyagent" - }, - "Netvibes": { - "operator": "Netvibes", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Netvibes is an intelligence gatherer operated by Netvibes. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netvibes" - }, - "Pandalytics": { - "operator": "Domainsbot", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Pandalytics is an intelligence gatherer operated by Domainsbot. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pandalytics" - }, - "panscient.com": { - "operator": "Panscient", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "panscient.com is an intelligence gatherer operated by Panscient. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/panscient-com" - }, - "proximic": { - "operator": "Comscore", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "proximic is an intelligence gatherer operated by Comscore. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/proximic" - }, - "scoop.it": { - "operator": "Meltwater", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "scoop.it is an intelligence gatherer operated by Meltwater. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scoop-it" - }, - "SeekportBot": { - "operator": "Seekport", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "SeekportBot is an intelligence gatherer operated by Seekport. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekportbot" - }, - "SMTBot": { - "operator": "SimilarTech", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "SMTBot is an intelligence gatherer operated by SimilarTech. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/smtbot" - }, - "trendictionbot": { - "operator": "Trendiction", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "trendictionbot is an intelligence gatherer operated by Trendiction. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendictionbot" - }, - "TrendsmapResolver": { - "operator": "Trendsmap", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "TrendsmapResolver is an intelligence gatherer operated by Trendsmap. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendsmapresolver" - }, - "Turnitin": { - "operator": "Turnitin", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Turnitin is an intelligence gatherer operated by Turnitin. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turnitin" - }, - "TurnitinBot": { - "operator": "Turnitin", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "TurnitinBot is an intelligence gatherer operated by Turnitin. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turnitinbot" - }, - "TweetmemeBot": { - "operator": "Meltwater", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "TweetmemeBot is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tweetmemebot" - }, - "Twingly": { - "operator": "Twingly", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Twingly is an intelligence gatherer operated by Twingly. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twingly" - }, - "um-LN": { - "operator": "Ubermetrics", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "um-LN is an intelligence gatherer operated by Ubermetrics. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/um-ln" - }, - "virustotal": { - "operator": "VirusTotal", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "virustotal is an intelligence gatherer operated by VirusTotal. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/virustotal" - }, - "ZoominfoBot": { - "operator": "ZoomInfo", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "ZoominfoBot is an intelligence gatherer operated by ZoomInfo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zoominfobot" - }, - "008": { - "operator": "80legs", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "008 is a scraper operated by 80legs. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/008" - }, - "Dataprovider.com": { - "operator": "Dataprovider", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "Dataprovider.com is a scraper operated by Dataprovider.com. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dataprovider-com" - }, - "dcrawl": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "dcrawl is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dcrawl" - }, - "HTTrack": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "HTTrack is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/httrack" - }, - "HTTrack 3.0": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "HTTrack 3.0 is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/httrack-3-0" - }, - "MetaInspector": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "MetaInspector is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/metainspector" - }, - "newspaper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "newspaper is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newspaper" - }, - "Nutch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "Nutch is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nutch" - }, - "Offline Explorer": { - "operator": "MetaProducts", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "Offline Explorer is a scraper operated by MetaProducts. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/offline-explorer" - }, - "OpenindexSpider": { - "operator": "Openindex", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "OpenindexSpider is a scraper operated by Openindex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openindexspider" - }, - "360Spider": { - "operator": "Qihoo 360", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "360Spider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider" - }, - "AlexandriaOrgBot": { - "operator": "Alexandria", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "AlexandriaOrgBot is a search engine crawler operated by Alexandria.org. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexandriaorgbot" - }, - "Atom Feed Robot": { - "operator": "RSSMicro", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Atom Feed Robot is a search engine crawler operated by RSSMicro. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/atom-feed-robot" - }, - "Baiduspider": { - "operator": "Baidu", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Baiduspider is a search engine crawler operated by Baidu. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider" - }, - "bingbot": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "bingbot is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bingbot" - }, - "coccocbot-web": { - "operator": "Coc Coc", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "coccocbot-web is a search engine crawler operated by Coc Coc. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccocbot-web" - }, - "Daum": { - "operator": "Daum", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Daum is a search engine crawler operated by Daum. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/daum" - }, - "DuckDuckBot": { - "operator": "DuckDuckGo", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "DuckDuckBot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/duckduckbot" - }, - "DuckDuckGo-Favicons-Bot": { - "operator": "DuckDuckGo", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "DuckDuckGo-Favicons-Bot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/duckduckgo-favicons-bot" - }, - "Feedfetcher-Google": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Feedfetcher-Google is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedfetcher-google" - }, - "Google Favicon": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Google Favicon is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-favicon" - }, - "Googlebot": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot" - }, - "Googlebot-Image": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot-Image is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-image" - }, - "Googlebot-Mobile": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot-Mobile is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-mobile" - }, - "Googlebot-News": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot-News is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-news" - }, - "Googlebot-Video": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot-Video is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-video" - }, - "HaoSouSpider": { - "operator": "Qihoo 360", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "HaoSouSpider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/haosouspider" - }, - "MojeekBot": { - "operator": "Mojeek", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "MojeekBot is a search engine crawler operated by Mojeek. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mojeekbot" - }, - "msnbot": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "msnbot is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msnbot" - }, - "msnbot-media": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "msnbot-media is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msnbot-media" - }, - "Qwantify": { - "operator": "Qwant", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Qwantify is a search engine crawler operated by Qwant. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/qwantify" - }, - "SemanticScholarBot": { - "operator": "AI2", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "SemanticScholarBot is a search engine crawler operated by AI2. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semanticscholarbot" - }, - "SeznamBot": { - "operator": "Senzam", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "SeznamBot is a search engine crawler operated by Senzam. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seznambot" - }, - "Sogou web spider": { - "operator": "Sogou", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Sogou web spider is a search engine crawler operated by Sogou. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-web-spider" - }, - "teoma": { - "operator": "Ask", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "teoma is a search engine crawler operated by Ask. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teoma" - }, - "TinEye": { - "operator": "TinEye", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "TinEye is a search engine crawler operated by TinEye. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tineye" - }, - "TinEye-bot": { - "operator": "TinEye", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "TinEye-bot is a search engine crawler operated by TinEye. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tineye-bot" - }, - "yacybot": { - "operator": "YaCy", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "yacybot is a search engine crawler operated by YaCy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yacybot" - }, - "Yahoo! Slurp": { - "operator": "Yahoo", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Yahoo! Slurp is a search engine crawler operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-slurp" - }, - "Yandex": { - "operator": "Yandex", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Yandex is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandex" - }, - "YandexBot": { - "operator": "Yandex", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "YandexBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexbot" - }, - "YandexImages": { - "operator": "Yandex", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "YandexImages is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandeximages" - }, - "YandexRenderResourcesBot": { - "operator": "Yandex", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "YandexRenderResourcesBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexrenderresourcesbot" - }, - "Yeti": { - "operator": "Naver", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Yeti is a search engine crawler operated by Naver. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yeti" - }, - "YisouSpider": { - "operator": "Yisou", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "YisouSpider is a search engine crawler operated by Yisou. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yisouspider" - }, - "ZumBot": { - "operator": "ZUM Internet", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "ZumBot is a search engine crawler operated by ZUM Internet. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zumbot" - }, - "AhrefsBot": { - "operator": "Ahrefs", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "AhrefsBot is an SEO crawler operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ahrefsbot" - }, - "Barkrowler": { - "operator": "Babbar", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "Barkrowler is an SEO crawler operated by Babbar. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/barkrowler" - }, - "BLEXBot": { - "operator": "SEO PowerSuite", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "BLEXBot is an SEO crawler operated by SEO PowerSuite. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/blexbot" - }, - "BrightEdge Crawler": { - "operator": "BrightEdge", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "BrightEdge Crawler is an SEO crawler operated by BrightEdge. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brightedge-crawler" - }, - "Cocolyzebot": { - "operator": "Cocolyze", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "Cocolyzebot is an SEO crawler operated by Cocolyze. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cocolyzebot" - }, - "DataForSeoBot": { - "operator": "DataForSEO", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "DataForSeoBot is an SEO crawler operated by DataForSEO. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dataforseobot" - }, - "DomainStatsBot": { - "operator": "Domainstats", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "DomainStatsBot is an SEO crawler operated by Domainstats. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/domainstatsbot" - }, - "dotbot": { - "operator": "Moz", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "dotbot is an SEO crawler operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dotbot" - }, - "hypestat": { - "operator": "HypeStat", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "hypestat is an SEO crawler operated by HypeStat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hypestat" - }, - "linkdexbot": { - "operator": "Linkdex", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "linkdexbot is an SEO crawler operated by Linkdex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkdexbot" - }, - "MJ12bot": { - "operator": "Majestic", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "MJ12bot is an SEO crawler operated by Majestic. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mj12bot" - }, - "online-webceo-bot": { - "operator": "WebCEO", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "online-webceo-bot is an SEO crawler operated by WebCEO. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/online-webceo-bot" - }, - "Screaming Frog SEO Spider": { - "operator": "Screaming Frog", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "Screaming Frog SEO Spider is an SEO crawler operated by Screaming Frog. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/screaming-frog-seo-spider" - }, - "SemrushBot": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot" - }, - "SemrushBot-BA": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot-BA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ba" - }, - "SemrushBot-CT": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot-CT is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ct" - }, - "SemrushBot-SI": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot-SI is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-si" - }, - "SemrushBot-SWA": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot-SWA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-swa" - }, - "SenutoBot": { - "operator": "Senuto", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SenutoBot is an SEO crawler operated by Senuto. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/senutobot" - }, - "SeobilityBot": { - "operator": "Seobility", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SeobilityBot is an SEO crawler operated by Seobility. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seobilitybot" - }, - "SEOkicks": { - "operator": "SEOkicks", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SEOkicks is an SEO crawler operated by SEOkicks. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seokicks" - }, - "SEOlizer": { - "operator": "SEOLizer", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SEOlizer is an SEO crawler operated by SEOLizer. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seolizer" - }, - "serpstatbot": { - "operator": "Serpstat", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "serpstatbot is an SEO crawler operated by Serpstat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/serpstatbot" - }, - "SiteCheckerBotCrawler": { - "operator": "Sitechecker", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SiteCheckerBotCrawler is an SEO crawler operated by Sitechecker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitecheckerbotcrawler" - }, - "ZoomBot": { - "operator": "SEOZoom", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "ZoomBot is an SEO crawler operated by SEOZoom. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zoombot" - }, - "007ac9 Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "007ac9 Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/007ac9-crawler" - }, - "2ip.ru": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "2ip.ru is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/2ip-ru" - }, - "360Spider-Image": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "360Spider-Image is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider-image" - }, - "360Spider-Video": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "360Spider-Video is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider-video" - }, - "5emeRue": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "5emeRue is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/5emerue" - }, - "5erue": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "5erue is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/5erue" - }, - "A Patent Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "A Patent Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/a-patent-crawler" - }, - "A6-Indexer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "A6-Indexer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/a6-indexer" - }, - "Aboundex": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Aboundex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aboundex" - }, - "AcademicBotRTU": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AcademicBotRTU is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/academicbotrtu" - }, - "acapbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "acapbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acapbot" - }, - "acoonbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "acoonbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acoonbot" - }, - "Acunetix Security Scanner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Acunetix Security Scanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acunetix-security-scanner" - }, - "Acunetix Web Vulnerability Scanner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Acunetix Web Vulnerability Scanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/acunetix-web-vulnerability-scanner" - }, - "AddSearchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AddSearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/addsearchbot" - }, - "AddThis": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AddThis is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/addthis" - }, - "adequat": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "adequat is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adequat" - }, - "adequat-systems": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "adequat-systems is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/adequat-systems" - }, - "AdIdxBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AdIdxBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adidxbot" - }, - "ADmantX": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ADmantX is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/admantx" - }, - "adscanner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "adscanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adscanner" - }, - "AdsTxtCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AdsTxtCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/adstxtcrawler" - }, - "AdvBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AdvBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/advbot" - }, - "AISearchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AISearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aisearchbot" - }, - "Alexabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Alexabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexabot" - }, - "Alexibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Alexibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/alexibot" - }, - "AlphaBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AlphaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alphabot" - }, - "AmiSoftware": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AmiSoftware is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/amisoftware" - }, - "antibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "antibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/antibot" - }, - "AnyEvent": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AnyEvent is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/anyevent" - }, - "Apercite": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Apercite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/apercite" - }, - "AppInsights": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AppInsights is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/appinsights" - }, - "Aqua_Products": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Aqua_Products is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aqua-products" - }, - "arabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "arabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/arabot" - }, - "Ask n read": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Ask n read is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ask-n-read" - }, - "asknread.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "asknread.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/asknread-com" - }, - "AspiegelBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AspiegelBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aspiegelbot" - }, - "asterias": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "asterias is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/asterias" - }, - "Augure": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Augure is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/augure" - }, - "auramundi": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "auramundi is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/auramundi" - }, - "AwarioRssBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AwarioRssBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariorssbot" - }, - "awesomecrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "awesomecrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/awesomecrawler" - }, - "B2B Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "B2B Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/b2b-bot" - }, - "b2w": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "b2w is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/b2w" - }, - "BackDoorBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BackDoorBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/backdoorbot" - }, - "BacklinkCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BacklinkCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/backlinkcrawler" - }, - "Baidu-YunGuanCe": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Baidu-YunGuanCe is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baidu-yunguance" - }, - "Baiduspider-image": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Baiduspider-image is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider-image" - }, - "Baiduspider-news": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Baiduspider-news is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider-news" - }, - "Baiduspider-video": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Baiduspider-video is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/baiduspider-video" - }, - "BDCbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BDCbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bdcbot" - }, - "BehloolBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BehloolBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/behloolbot" - }, - "betaBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "betaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/betabot" - }, - "Better Uptime Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Better Uptime Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/better-uptime-bot" - }, - "bidswitchbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "bidswitchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bidswitchbot" - }, - "BIGLOTRON": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BIGLOTRON is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/biglotron" - }, - "binlar": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "binlar is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/binlar" - }, - "Birdcrawlerbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Birdcrawlerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/birdcrawlerbot" - }, - "BitBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BitBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitbot" - }, - "Black Hole": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Black Hole is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/black-hole" - }, - "Blekkobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Blekkobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blekkobot" - }, - "blogmuraBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "blogmuraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/blogmurabot" - }, - "BlowFish": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BlowFish is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blowfish" - }, - "BLP_bbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BLP_bbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blp-bbot" - }, - "bnf.fr_bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "bnf.fr_bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bnf-fr-bot" - }, - "BomboraBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BomboraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bomborabot" - }, - "Bookmark search tool": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Bookmark search tool is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bookmark-search-tool" - }, - "bot-pge.chlooe.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "bot-pge.chlooe.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bot-pge-chlooe-com" - }, - "Bot.AraTurka.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Bot.AraTurka.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bot-araturka-com" - }, - "BotALot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BotALot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/botalot" - }, - "botify": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "botify is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/botify" - }, - "BotRightHere": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BotRightHere is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/botrighthere" - }, - "BoxcarBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BoxcarBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/boxcarbot" - }, - "brainobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "brainobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/brainobot" - }, - "BrandONbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BrandONbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brandonbot" - }, - "BTWebClient": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BTWebClient is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/btwebclient" - }, - "BUbiNG": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BUbiNG is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bubing" - }, - "Buck": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Buck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/buck" - }, - "BuiltBotTough": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BuiltBotTough is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/builtbottough" - }, - "Bullseye": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Bullseye is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bullseye" - }, - "BunnySlippers": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BunnySlippers is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bunnyslippers" - }, - "buzzbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "buzzbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/buzzbot" - }, - "Caliperbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Caliperbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/caliperbot" - }, - "CapsuleChecker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CapsuleChecker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/capsulechecker" - }, - "careerbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "careerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/careerbot" - }, - "CC Metadata Scaper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CC Metadata Scaper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cc-metadata-scaper" - }, - "Cegbfeieh": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cegbfeieh is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cegbfeieh" - }, - "centurybot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "centurybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/centurybot" - }, - "changedetection": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "changedetection is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/changedetection" - }, - "CheckMarkNetwork": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CheckMarkNetwork is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/checkmarknetwork" - }, - "CheeseBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CheeseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cheesebot" - }, - "CherryPicker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CherryPicker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cherrypicker" - }, - "CherryPickerElite": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CherryPickerElite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cherrypickerelite" - }, - "CherryPickerSE": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CherryPickerSE is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cherrypickerse" - }, - "Cision": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cision is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cision" - }, - "CISPA Webcrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CISPA Webcrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cispa-webcrawler" - }, - "citeseerxbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "citeseerxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/citeseerxbot" - }, - "Citoid": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Citoid is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/citoid" - }, - "Claritybot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Claritybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/claritybot" - }, - "Clickagy": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Clickagy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/clickagy" - }, - "Cliqzbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cliqzbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cliqzbot" - }, - "CloudFlare-AlwaysOnline": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CloudFlare-AlwaysOnline is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cloudflare-alwaysonline" - }, - "coccoc": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "coccoc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccoc" - }, - "coccocbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "coccocbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccocbot" - }, - "coexel": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "coexel is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coexel" - }, - "Companybook-Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Companybook-Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/companybook-crawler" - }, - "content crawler spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "content crawler spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/content-crawler-spider" - }, - "ContextAd Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ContextAd Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/contextad-bot" - }, - "contxbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "contxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/contxbot" - }, - "convera": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "convera is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/convera" - }, - "ConveraCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ConveraCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/converacrawler" - }, - "Cookiebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cookiebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cookiebot" - }, - "Copernic": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Copernic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/copernic" - }, - "CopyRightCheck": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CopyRightCheck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/copyrightcheck" - }, - "Corporama": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Corporama is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/corporama" - }, - "cosmos": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "cosmos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cosmos" - }, - "crawler4j": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "crawler4j is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crawler4j" - }, - "CrawlyProjectCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CrawlyProjectCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/crawlyprojectcrawler" - }, - "Crescent": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Crescent is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crescent" - }, - "Crescent Internet ToolPak HTTP OLE Control v.1.0": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Crescent Internet ToolPak HTTP OLE Control v.1.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crescent-internet-toolpak-http-ole-control-v-1-0" - }, - "CriteoBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CriteoBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/criteobot" - }, - "CrunchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CrunchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/crunchbot" - }, - "CrystalSemanticsBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CrystalSemanticsBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crystalsemanticsbot" - }, - "Curebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Curebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/curebot" - }, - "Cutbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cutbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cutbot" - }, - "cXensebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "cXensebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cxensebot" - }, - "CyberPatrol": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CyberPatrol is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cyberpatrol" - }, - "DareBoost": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DareBoost is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dareboost" - }, - "Datafeedwatch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Datafeedwatch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/datafeedwatch" - }, - "datagnionbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "datagnionbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/datagnionbot" - }, - "Datanyze": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Datanyze is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/datanyze" - }, - "daumoa": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "daumoa is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/daumoa" - }, - "deepcrawl": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "deepcrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deepcrawl" - }, - "deepnoc": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "deepnoc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/deepnoc" - }, - "DeuSu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DeuSu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deusu" - }, - "Digg Deeper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Digg Deeper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/digg-deeper" - }, - "Digimind": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Digimind is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/digimind" - }, - "Digincore bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Digincore bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/digincore-bot" - }, - "discobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "discobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/discobot" - }, - "Disqus": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Disqus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/disqus" - }, - "DittoSpyder": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DittoSpyder is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dittospyder" - }, - "DnyzBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DnyzBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/dnyzbot" - }, - "Domain Re-Animator Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Domain Re-Animator Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domain-re-animator-bot" - }, - "DomainCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DomainCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domaincrawler" - }, - "Dow Jones Searchbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Dow Jones Searchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dow-jones-searchbot" - }, - "Download Ninja": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Download Ninja is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/download-ninja" - }, - "Dragonbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Dragonbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dragonbot" - }, - "drupact": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "drupact is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/drupact" - }, - "Dubbotbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Dubbotbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dubbotbot" - }, - "e.ventures Investment Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "e.ventures Investment Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/e-ventures-investment-crawler" - }, - "EasyBib AutoCite": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EasyBib AutoCite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/easybib-autocite" - }, - "ec2linkfinder": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ec2linkfinder is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ec2linkfinder" - }, - "edisterbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "edisterbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/edisterbot" - }, - "electricmonk": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "electricmonk is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/electricmonk" - }, - "elisabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "elisabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/elisabot" - }, - "ellisphere": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ellisphere is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ellisphere" - }, - "EmailCollector": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EmailCollector is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/emailcollector" - }, - "EmailSiphon": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EmailSiphon is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/emailsiphon" - }, - "EmailWolf": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EmailWolf is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/emailwolf" - }, - "epicbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "epicbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/epicbot" - }, - "eright": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "eright is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/eright" - }, - "EroCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EroCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/erocrawler" - }, - "EtaoSpider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EtaoSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/etaospider" - }, - "europarchive.org": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "europarchive.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/europarchive-org" - }, - "evc-batch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "evc-batch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/evc-batch" - }, - "EveryoneSocialBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EveryoneSocialBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/everyonesocialbot" - }, - "Exabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Exabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/exabot" - }, - "Experibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Experibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/experibot" - }, - "ExtLinksBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ExtLinksBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/extlinksbot" - }, - "ExtractorPro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ExtractorPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/extractorpro" - }, - "Eyeotabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Eyeotabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/eyeotabot" - }, - "EZID": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EZID is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ezid" - }, - "Ezooms": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Ezooms is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ezooms" - }, - "Facebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Facebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/facebot" - }, - "FairAd Client": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FairAd Client is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fairad-client" - }, - "FAST Enterprise Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FAST Enterprise Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fast-enterprise-crawler" - }, - "FAST-WebCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FAST-WebCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fast-webcrawler" - }, - "FediDB": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FediDB is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fedidb" - }, - "fedoraplanet": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "fedoraplanet is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fedoraplanet" - }, - "Feedbin": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Feedbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedbin" - }, - "feedbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "feedbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedbot" - }, - "FeedBurner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FeedBurner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/feedburner" - }, - "Feedspot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Feedspot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedspot" - }, - "FeedValidator": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FeedValidator is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedvalidator" - }, - "FemtosearchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FemtosearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/femtosearchbot" - }, - "Fever": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Fever is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fever" - }, - "FindITAnswersbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FindITAnswersbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/finditanswersbot" - }, - "findlink": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "findlink is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/findlink" - }, - "findthatfile": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "findthatfile is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/findthatfile" - }, - "findxbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "findxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/findxbot" - }, - "Flaming AttackBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Flaming AttackBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/flaming-attackbot" - }, - "Flamingo_SearchEngine": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Flamingo_SearchEngine is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/flamingo-searchengine" - }, - "fluffy": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "fluffy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fluffy" - }, - "Foobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Foobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/foobot" - }, - "fr-crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "fr-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fr-crawler" - }, - "FreeWebMonitoring SiteChecker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FreeWebMonitoring SiteChecker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freewebmonitoring-sitechecker" - }, - "FreshpingBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FreshpingBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freshpingbot" - }, - "fuelbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "fuelbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fuelbot" - }, - "Fyrebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Fyrebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fyrebot" - }, - "g00g1e.net": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "g00g1e.net is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/g00g1e-net" - }, - "G2 Web Services": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "G2 Web Services is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/g2-web-services" - }, - "g2reader-bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "g2reader-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/g2reader-bot" - }, - "Gaisbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gaisbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gaisbot" - }, - "GarlikCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GarlikCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/garlikcrawler" - }, - "Genieo": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Genieo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/genieo" - }, - "GetRight": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GetRight is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/getright" - }, - "Gigablast": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gigablast is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gigablast" - }, - "Gigabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gigabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gigabot" - }, - "GingerCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GingerCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gingercrawler" - }, - "Gluten Free Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gluten Free Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gluten-free-crawler" - }, - "gnam gnam spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "gnam gnam spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gnam-gnam-spider" - }, - "GnowitNewsbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GnowitNewsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gnowitnewsbot" - }, - "Google-Adwords-Instant": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-Adwords-Instant is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-adwords-instant" - }, - "Google-Certificates-Bridge": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-Certificates-Bridge is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/google-certificates-bridge" - }, - "Google-PhysicalWeb": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-PhysicalWeb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-physicalweb" - }, - "Google-Site-Verification": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-Site-Verification is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-site-verification" - }, - "Google-Structured-Data-Testing-Tool": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-Structured-Data-Testing-Tool is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-structured-data-testing-tool" - }, - "google-xrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "google-xrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-xrawler" - }, - "Gowikibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gowikibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gowikibot" - }, - "grapeshot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "grapeshot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grapeshot" - }, - "GrapeshotCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GrapeshotCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grapeshotcrawler" - }, - "Grobbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Grobbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/grobbot" - }, - "GroupHigh": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GroupHigh is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grouphigh" - }, - "grub-client": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "grub-client is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grub-client" - }, - "grub.org": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "grub.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grub-org" - }, - "gsa-crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "gsa-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gsa-crawler" - }, - "gslfbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "gslfbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gslfbot" - }, - "Gwene": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gwene is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gwene" - }, - "Harvest": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Harvest is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/harvest" - }, - "HawaiiBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "HawaiiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/hawaiibot" - }, - "humanlinks": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "humanlinks is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/humanlinks" - }, - "hyscore.io": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "hyscore.io is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hyscore-io" - }, - "IAS crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "IAS crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ias-crawler" - }, - "ICBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ICBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/icbot" - }, - "ichiro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ichiro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ichiro" - }, - "imrbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "imrbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/imrbot" - }, - "IndeedBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "IndeedBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/indeedbot" - }, - "INETDEX-BOT": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "INETDEX-BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/inetdex-bot" - }, - "InfoNaviRobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "InfoNaviRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/infonavirobot" - }, - "infoobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "infoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/infoobot" - }, - "infoseek": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "infoseek is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/infoseek" - }, - "integromedb": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "integromedb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/integromedb" - }, - "intelium_bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "intelium_bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/intelium-bot" - }, - "InterfaxScanBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "InterfaxScanBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/interfaxscanbot" - }, - "ip-web-crawler.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ip-web-crawler.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ip-web-crawler-com" - }, - "IRLbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "IRLbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/irlbot" - }, - "Iron33": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Iron33 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iron33" - }, - "iskanie": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "iskanie is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iskanie" - }, - "IsraBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "IsraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/israbot" - }, - "istellabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "istellabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/istellabot" - }, - "it2media-domain-crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "it2media-domain-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/it2media-domain-crawler" - }, - "James BOT": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "James BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/james-bot" - }, - "JamesBOT": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JamesBOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jamesbot" - }, - "Jamie's Spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Jamie's Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jamies-spider" - }, - "JenkersBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JenkersBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jenkersbot" - }, - "JennyBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JennyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jennybot" - }, - "Jetbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Jetbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jetbot" - }, - "Jetty": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Jetty is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jetty" - }, - "JikeSpider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JikeSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jikespider" - }, - "JobboerseBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JobboerseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jobboersebot" - }, - "Jooblebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Jooblebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jooblebot" - }, - "jpg-newsbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "jpg-newsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jpg-newsbot" - }, - "jyxobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "jyxobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jyxobot" - }, - "k2spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "k2spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/k2spider" - }, - "K7MLWCBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "K7MLWCBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/k7mlwcbot" - }, - "kbcrawl": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "kbcrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/kbcrawl" - }, - "Kemvibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Kemvibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kemvibot" - }, - "Kenjin Spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Kenjin Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kenjin-spider" - }, - "keys-so-bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "keys-so-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/keys-so-bot" - }, - "Keyword Density": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Keyword Density is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/keyword-density" - }, - "Knowings": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Knowings is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/knowings" - }, - "KomodiaBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "KomodiaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/komodiabot" - }, - "KosmioBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "KosmioBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kosmiobot" - }, - "Landau-Media-Spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Landau-Media-Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/landau-media-spider" - }, - "larbin": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "larbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/larbin" - }, - "Laserlikebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Laserlikebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/laserlikebot" - }, - "lb-spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lb-spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lb-spider" - }, - "leadbox": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "leadbox is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/leadbox" - }, - "Leikibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Leikibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/leikibot" - }, - "LexiBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LexiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lexibot" - }, - "libWeb": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "libWeb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/libweb" - }, - "Linespider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Linespider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linespider" - }, - "Linguee Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Linguee Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linguee-bot" - }, - "linkapediabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "linkapediabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkapediabot" - }, - "LinkArchiver": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkArchiver is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkarchiver" - }, - "LinkCheck by Siteimprove.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkCheck by Siteimprove.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkcheck-by-siteimprove-com" - }, - "linkdex": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "linkdex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkdex" - }, - "LinkextractorPro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkextractorPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkextractorpro" - }, - "LinkisBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkisBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkisbot" - }, - "linko": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "linko is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linko" - }, - "LinkpadBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkpadBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkpadbot" - }, - "LinkScan": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkScan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkscan" - }, - "lipperhey": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lipperhey is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lipperhey" - }, - "LivelapBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LivelapBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/livelapbot" - }, - "lkxscan": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lkxscan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lkxscan" - }, - "LNSpiderguy": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LNSpiderguy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lnspiderguy" - }, - "lssbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lssbot" - }, - "lssrocketcrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lssrocketcrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/lssrocketcrawler" - }, - "ltx71": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ltx71 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ltx71" - }, - "Luminator-robots": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Luminator-robots is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/luminator-robots" - }, - "lwp-trivial": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lwp-trivial is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lwp-trivial" - }, - "MaCoCu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MaCoCu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/macocu" - }, - "mappydata": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mappydata is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mappydata" - }, - "Mata Hari": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Mata Hari is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mata-hari" - }, - "MauiBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MauiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mauibot" - }, - "MBCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MBCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mbcrawler" - }, - "MegaIndex": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MegaIndex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/megaindex" - }, - "MegaIndex.ru": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MegaIndex.ru is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/megaindex-ru" - }, - "Meltawer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Meltawer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/meltawer" - }, - "Meltwater": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Meltwater is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/meltwater" - }, - "MeltwaterNews": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MeltwaterNews is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/meltwaternews" - }, - "memorybot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "memorybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/memorybot" - }, - "mention": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mention is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mention" - }, - "MetaJobBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MetaJobBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/metajobbot" - }, - "MetaURI": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MetaURI is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/metauri" - }, - "MIIxpc": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MIIxpc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/miixpc" - }, - "mindUpBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mindUpBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mindupbot" - }, - "minicrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "minicrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/minicrawler" - }, - "Mister PiX": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Mister PiX is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mister-pix" - }, - "MixnodeCache": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MixnodeCache is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mixnodecache" - }, - "mlbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mlbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mlbot" - }, - "moatbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "moatbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/moatbot" - }, - "moget": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "moget is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moget" - }, - "Mojeek": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Mojeek is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mojeek" - }, - "MoodleBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MoodleBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moodlebot" - }, - "Moreover": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Moreover is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/moreover" - }, - "MS Search 4.0 Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MS Search 4.0 Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ms-search-4-0-robot" - }, - "MS Search 6.0 Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MS Search 6.0 Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ms-search-6-0-robot" - }, - "MSIECrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MSIECrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msiecrawler" - }, - "msrbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "msrbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/msrbot" - }, - "MTRobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MTRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mtrobot" - }, - "Multiviewbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Multiviewbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/multiviewbot" - }, - "mytwip": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mytwip is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mytwip" - }, - "NAVER Blog Rssbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NAVER Blog Rssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/naver-blog-rssbot" - }, - "NaverBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NaverBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/naverbot" - }, - "Neevabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Neevabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/neevabot" - }, - "NerdByNature.Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NerdByNature.Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nerdbynature-bot" - }, - "nerdybot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "nerdybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nerdybot" - }, - "NetAnts": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NetAnts is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netants" - }, - "netEstate NE Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "netEstate NE Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netestate-ne-crawler" - }, - "Neticle Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Neticle Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/neticle-crawler" - }, - "NetMechanic": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NetMechanic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/netmechanic" - }, - "netresearchserver": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "netresearchserver is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netresearchserver" - }, - "NetSystemsResearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NetSystemsResearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netsystemsresearch" - }, - "newsharecounts": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "newsharecounts is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newsharecounts" - }, - "NewsNow": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NewsNow is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/newsnow" - }, - "Newzbin": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Newzbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newzbin" - }, - "NextGenSearchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NextGenSearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nextgensearchbot" - }, - "NICErsPRO": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NICErsPRO is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nicerspro" - }, - "niki-bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "niki-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/niki-bot" - }, - "NimbleCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NimbleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nimblecrawler" - }, - "Nimbostratus-Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Nimbostratus-Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nimbostratus-bot" - }, - "NINJA bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NINJA bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ninja-bot" - }, - "NIXStatsbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NIXStatsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nixstatsbot" - }, - "NLUX_IAHarvester": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NLUX_IAHarvester is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nlux-iaharvester" - }, - "Nmap Scripting Engine": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Nmap Scripting Engine is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nmap-scripting-engine" - }, - "NPBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NPBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/npbot" - }, - "NTENTbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NTENTbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ntentbot" - }, - "Nuzzel": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Nuzzel is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nuzzel" - }, - "OdklBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OdklBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/odklbot" - }, - "officestorebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "officestorebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/officestorebot" - }, - "Openbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Openbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/openbot" - }, - "Openfind": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Openfind is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openfind" - }, - "Openfind data gatherer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Openfind data gatherer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openfind-data-gatherer" - }, - "OpenGraphCheck": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OpenGraphCheck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/opengraphcheck" - }, - "OpenHoseBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OpenHoseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/openhosebot" - }, - "opinion-tracker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "opinion-tracker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/opinion-tracker" - }, - "Oracle Ultra Search": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Oracle Ultra Search is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/oracle-ultra-search" - }, - "OrangeBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OrangeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/orangebot" - }, - "Orthogaffe": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Orthogaffe is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/orthogaffe" - }, - "outbrain": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "outbrain is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/outbrain" - }, - "OutclicksBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OutclicksBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/outclicksbot" - }, - "page2rss": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "page2rss is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/page2rss" - }, - "PagePeeker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PagePeeker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/pagepeeker" - }, - "PageThing": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PageThing is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pagething" - }, - "peer39_crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "peer39_crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/peer39-crawler" - }, - "PerMan": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PerMan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/perman" - }, - "Pingdom": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Pingdom is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/pingdom" - }, - "Pinterest": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Pinterest is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pinterest" - }, - "PiplBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PiplBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/piplbot" - }, - "postrank": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "postrank is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/postrank" - }, - "PR-CY.RU": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PR-CY.RU is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/pr-cy-ru" - }, - "Primalbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Primalbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/primalbot" - }, - "PrivacyAwareBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PrivacyAwareBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/privacyawarebot" - }, - "ProPowerBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ProPowerBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/propowerbot" - }, - "ProWebWalker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ProWebWalker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/prowebwalker" - }, - "proxem": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "proxem is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/proxem" - }, - "psbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "psbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/psbot" - }, - "Pulsepoint": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Pulsepoint is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pulsepoint" - }, - "purebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "purebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/purebot" - }, - "QueryN Metasearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "QueryN Metasearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/queryn-metasearch" - }, - "Qwam content intelligence": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Qwam content intelligence is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/qwam-content-intelligence" - }, - "Radiation Retriever 1.1": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Radiation Retriever 1.1 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/radiation-retriever-1-1" - }, - "RankActiveLinkBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RankActiveLinkBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/rankactivelinkbot" - }, - "RankFlex": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RankFlex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rankflex" - }, - "Refindbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Refindbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/refindbot" - }, - "RegionStuttgartBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RegionStuttgartBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/regionstuttgartbot" - }, - "RepoMonkey": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RepoMonkey is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/repomonkey" - }, - "RepoMonkey Bait & Tackle": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RepoMonkey Bait & Tackle is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/repomonkey-bait-tackle" - }, - "RetrevoPageAnalyzer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RetrevoPageAnalyzer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/retrevopageanalyzer" - }, - "ReverseEngineeringBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ReverseEngineeringBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/reverseengineeringbot" - }, - "RidderBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RidderBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ridderbot" - }, - "Riddler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Riddler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/riddler" - }, - "Rivva": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Rivva is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rivva" - }, - "Robozilla": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Robozilla is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/robozilla" - }, - "rssbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "rssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/rssbot" - }, - "RSSingBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RSSingBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rssingbot" - }, - "RukiCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RukiCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rukicrawler" - }, - "RuxitSynthetic": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RuxitSynthetic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ruxitsynthetic" - }, - "RyteBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RyteBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/rytebot" - }, - "SafeDNSBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SafeDNSBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/safednsbot" - }, - "SafeSearch microdata crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SafeSearch microdata crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/safesearch-microdata-crawler" - }, - "SBL-BOT": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SBL-BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sbl-bot" - }, - "score3": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "score3 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/score3" - }, - "ScoutJet": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ScoutJet is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scoutjet" - }, - "scribdbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "scribdbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scribdbot" - }, - "Scrubby": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Scrubby is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scrubby" - }, - "search.marginalia.nu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "search.marginalia.nu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/search-marginalia-nu" - }, - "SearchAtlas": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SearchAtlas is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/searchatlas" - }, - "SearchmetricsBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SearchmetricsBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/searchmetricsbot" - }, - "searchpreview": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "searchpreview is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/searchpreview" - }, - "seekbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "seekbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seekbot" - }, - "Seekport Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Seekport Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekport-crawler" - }, - "Seekr": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Seekr is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekr" - }, - "seewithkids": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "seewithkids is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seewithkids" - }, - "semanticbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "semanticbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/semanticbot" - }, - "sempi.tech": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "sempi.tech is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sempi-tech" - }, - "SemrushBot-BM": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SemrushBot-BM is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-bm" - }, - "SemrushBot-SA": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SemrushBot-SA is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-sa" - }, - "sentibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "sentibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sentibot" - }, - "SEOkicks-Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SEOkicks-Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seokicks-robot" - }, - "seoscanners": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "seoscanners is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seoscanners" - }, - "seostar.co": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "seostar.co is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seostar-co" - }, - "SEOstats": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SEOstats is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seostats" - }, - "SimpleCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SimpleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplecrawler" - }, - "SimpleScraper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SimpleScraper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplescraper" - }, - "Sindup": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sindup is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sindup" - }, - "sistrix crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "sistrix crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sistrix-crawler" - }, - "SiteBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SiteBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitebot" - }, - "sitecheck.internetseer.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "sitecheck.internetseer.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitecheck-internetseer-com" - }, - "siteexplorer.info": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "siteexplorer.info is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteexplorer-info" - }, - "Siteimprove": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Siteimprove is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/siteimprove" - }, - "Siteimprove.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Siteimprove.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteimprove-com" - }, - "SiteSnagger": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SiteSnagger is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitesnagger" - }, - "SiteSucker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SiteSucker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitesucker" - }, - "Slack-ImgProxy": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Slack-ImgProxy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/slack-imgproxy" - }, - "Slackbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Slackbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slackbot" - }, - "Slurp": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Slurp is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slurp" - }, - "SocialRankIOBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SocialRankIOBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/socialrankiobot" - }, - "Sogou": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sogou is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sogou" - }, - "Sogou inst spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sogou inst spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-inst-spider" - }, - "Sogou spider2": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sogou spider2 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-spider2" - }, - "Sonic": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sonic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sonic" - }, - "Sosospider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sosospider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sosospider" - }, - "SpankBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SpankBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spankbot" - }, - "spanner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "spanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spanner" - }, - "spbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "spbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spbot" - }, - "Spinn3r": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Spinn3r is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/spinn3r" - }, - "spotter": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "spotter is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spotter" - }, - "SputnikBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SputnikBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sputnikbot" - }, - "Storebot-Google": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Storebot-Google is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/storebot-google" - }, - "StorygizeBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "StorygizeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/storygizebot" - }, - "StractBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "StractBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/stractbot" - }, - "Streamline3Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Streamline3Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/streamline3bot" - }, - "SummalyBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SummalyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/summalybot" - }, - "summify": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "summify is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/summify" - }, - "SuperBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SuperBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/superbot" - }, - "SurveyBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SurveyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/surveybot" - }, - "suzuran": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "suzuran is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/suzuran" - }, - "Swiftbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Swiftbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/swiftbot" - }, - "SWIMGBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SWIMGBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/swimgbot" - }, - "Synthesio": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Synthesio is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/synthesio" - }, - "Sysomos": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sysomos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sysomos" - }, - "Szukacz": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Szukacz is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/szukacz" - }, - "Taboolabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Taboolabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/taboolabot" - }, - "tagoobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "tagoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tagoobot" - }, - "Talkwater": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Talkwater is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/talkwater" - }, - "TangibleeBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TangibleeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tangibleebot" - }, - "Teleport": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Teleport is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teleport" - }, - "TeleportPro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TeleportPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teleportpro" - }, - "Telesoft": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Telesoft is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/telesoft" - }, - "The Intraformant": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "The Intraformant is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/the-intraformant" - }, - "TheNomad": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TheNomad is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/thenomad" - }, - "theoldreader.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "theoldreader.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/theoldreader-com" - }, - "Thinklab": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Thinklab is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/thinklab" - }, - "tigerbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "tigerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tigerbot" - }, - "Titan": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Titan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/titan" - }, - "toCrawl": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "toCrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tocrawl" - }, - "TombaPublicWebCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TombaPublicWebCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tombapublicwebcrawler" - }, - "toplistbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "toplistbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/toplistbot" - }, - "ToutiaoSpider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ToutiaoSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/toutiaospider" - }, - "Traackr.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Traackr.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/traackr-com" - }, - "tracemyfile": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "tracemyfile is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tracemyfile" - }, - "trafilatura": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trafilatura is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/trafilatura" - }, - "trendeo": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trendeo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendeo" - }, - "trendkite-akashic-crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trendkite-akashic-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendkite-akashic-crawler" - }, - "trendybuzz": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trendybuzz is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendybuzz" - }, - "trovitBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trovitBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/trovitbot" - }, - "True_Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "True_Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/true-robot" - }, - "TruliaBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TruliaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/truliabot" - }, - "turingos": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "turingos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turingos" - }, - "tweetedtimes": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "tweetedtimes is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tweetedtimes" - }, - "twengabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "twengabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twengabot" - }, - "Twurly": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Twurly is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twurly" - }, - "UbiCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "UbiCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ubicrawler" - }, - "um-IC": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "um-IC is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/um-ic" - }, - "Updownerbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Updownerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/updownerbot" - }, - "Upflow": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Upflow is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/upflow" - }, - "Uptime-Kuma": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Uptime-Kuma is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/uptime-kuma" - }, - "Uptimebot.org": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Uptimebot.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/uptimebot-org" - }, - "UptimeRobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "UptimeRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/uptimerobot" - }, - "URL Control": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "URL Control is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/url-control" - }, - "URL_Spider_Pro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "URL_Spider_Pro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/url-spider-pro" - }, - "urlappendbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "urlappendbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/urlappendbot" - }, - "URLy Warning": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "URLy Warning is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/urly-warning" - }, - "usasearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "usasearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/usasearch" - }, - "UsineNouvelleCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "UsineNouvelleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/usinenouvellecrawler" - }, - "UT-Dorkbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "UT-Dorkbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ut-dorkbot" - }, - "Validator.nu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Validator.nu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/validator-nu" - }, - "VCI": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "VCI is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vci" - }, - "VCI WebViewer VCI WebViewer Win32": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "VCI WebViewer VCI WebViewer Win32 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vci-webviewer-vci-webviewer-win32" - }, - "vebidoobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "vebidoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/vebidoobot" - }, - "vecteurplus": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "vecteurplus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vecteurplus" - }, - "Veoozbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Veoozbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/veoozbot" - }, - "verticalsearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "verticalsearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/verticalsearch" - }, - "Vigil": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Vigil is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/vigil" - }, - "VKRobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "VKRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vkrobot" - }, - "voilabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "voilabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/voilabot" - }, - "voltron": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "voltron is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/voltron" - }, - "VoluumDSP-content-bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "VoluumDSP-content-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/voluumdsp-content-bot" - }, - "vsw": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "vsw is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vsw" - }, - "vuhuvBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "vuhuvBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vuhuvbot" - }, - "W3C_I18n-Checker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "W3C_I18n-Checker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-i18n-checker" - }, - "W3C_Unicorn": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "W3C_Unicorn is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/w3c-unicorn" - }, - "W3C-checklink": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "W3C-checklink is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-checklink" - }, - "W3C-mobileOK": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "W3C-mobileOK is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-mobileok" - }, - "WASALive-Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WASALive-Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wasalive-bot" - }, - "wbsearchbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "wbsearchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/wbsearchbot" - }, - "Web Image Collector": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Web Image Collector is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/web-image-collector" - }, - "web-archive-net.com.bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "web-archive-net.com.bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/web-archive-net-com-bot" - }, - "WebAuto": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebAuto is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webauto" - }, - "WebBandit": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebBandit is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webbandit" - }, - "WebCapture 2.0": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebCapture 2.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcapture-2-0" - }, - "webcompanycrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "webcompanycrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcompanycrawler" - }, - "WebCopier": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebCopier is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcopier" - }, - "WebCopier v.2.2": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebCopier v.2.2 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webcopier-v-2-2" - }, - "WebCopier v3.2a": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebCopier v3.2a is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcopier-v3-2a" - }, - "WebDataStats": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebDataStats is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webdatastats" - }, - "WebEnhancer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebEnhancer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webenhancer" - }, - "WebmasterWorldForumBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebmasterWorldForumBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webmasterworldforumbot" - }, - "webmon": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "webmon is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webmon" - }, - "WebReaper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebReaper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webreaper" - }, - "WebSauger": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebSauger is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/websauger" - }, - "Website Quester": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Website Quester is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/website-quester" - }, - "WebStripper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebStripper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webstripper" - }, - "WebZIP": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebZIP is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webzip" - }, - "winello": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "winello is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/winello" - }, - "WinHTTrack": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WinHTTrack is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/winhttrack" - }, - "WiseGuys Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WiseGuys Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wiseguys-robot" - }, - "wocbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "wocbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wocbot" - }, - "woobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "woobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/woobot" - }, - "woorankreview": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "woorankreview is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/woorankreview" - }, - "WordupInfoSearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WordupInfoSearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wordupinfosearch" - }, - "woriobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "woriobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/woriobot" - }, - "wotbox": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "wotbox is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wotbox" - }, - "WWW-Collector-E": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WWW-Collector-E is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/www-collector-e" - }, - "WWW-Mechanize": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WWW-Mechanize is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/www-mechanize" - }, - "www.uptime.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "www.uptime.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/www-uptime-com" - }, - "Xenu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Xenu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenu" - }, - "Xenu Link Sleuth": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Xenu Link Sleuth is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/xenu-link-sleuth" - }, - "Xenu's": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Xenu's is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenus" - }, - "Xenu's Link Sleuth 1.1c": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Xenu's Link Sleuth 1.1c is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenus-link-sleuth-1-1c" - }, - "xovibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "xovibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xovibot" - }, - "Yahoo Pipes 1.0": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Yahoo Pipes 1.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yahoo-pipes-1-0" - }, - "YaK": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "YaK is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yak" - }, - "YandexMobileBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "YandexMobileBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexmobilebot" - }, - "YandexVideo": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "YandexVideo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexvideo" - }, - "yanga": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "yanga is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yanga" - }, - "Yellowbrandprotectionbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Yellowbrandprotectionbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yellowbrandprotectionbot" - }, - "yoozBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "yoozBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yoozbot" - }, - "YoudaoBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "YoudaoBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/youdaobot" - }, - "Youmag": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Youmag is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/youmag" - }, - "Zabbix": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zabbix is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zabbix" - }, - "Zao": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zao is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zao" - }, - "Zealbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zealbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zealbot" - }, - "zenback bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "zenback bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zenback-bot" - }, - "Zeus": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zeus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zeus" - }, - "Zeus Link Scout": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zeus Link Scout is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zeus-link-scout" - }, - "zgrab": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "zgrab is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zgrab" - }, - "Zite": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zite" - }, - "ZuperlistBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ZuperlistBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zuperlistbot" - }, - "ZyBORG": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ZyBORG is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zyborg" } } \ No newline at end of file From d6a5e8cd8145705fa3cf244cb089c22d76d6a951 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 6 Aug 2024 18:50:53 +0000 Subject: [PATCH 054/249] Removing previously generated files --- robots.txt | 32 -------------------------------- table-of-bot-metrics.md | 33 --------------------------------- 2 files changed, 65 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 7f3cb46..0000000 --- a/robots.txt +++ /dev/null @@ -1,32 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image 
-User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 2dafd43..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,33 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. 
| Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. 
| -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. 
| -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| From 3d4bf2c3db702e4e1e6131cb4da6992c58a89a62 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 6 Aug 2024 18:50:54 +0000 Subject: [PATCH 055/249] restore original robots.json --- robots.txt | 32 ++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..7f3cb46 --- /dev/null +++ b/robots.txt @@ -0,0 +1,32 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..2dafd43 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,33 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. 
| +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." 
| +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From cebf8093914c9c9d64a9f551f8b7ecbe1558301c Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Wed, 7 Aug 2024 00:14:26 +0000 Subject: [PATCH 056/249] Daily update from Dark Visitors --- robots.json | 5444 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 5403 insertions(+), 41 deletions(-) diff --git a/robots.json b/robots.json index d550d50..00b594f 100644 --- a/robots.json +++ b/robots.json @@ -7,14 +7,14 @@ "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses." }, "anthropic-ai": { - "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Applebot-Extended": { - "operator": "[Apple](https:\/\/support.apple.com\/en-us\/119829#datausage)", + "operator": "[Apple](https://support.apple.com/en-us/119829#datausage)", "respect": "Yes", "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", "frequency": "Unclear at this time.", @@ -28,192 +28,5554 @@ "description": "Downloads data to train LLMS, including ChatGPT competitors." 
}, "CCBot": { - "operator": "[Common Crawl](https:\/\/commoncrawl.org)", - "respect": "[Yes](https:\/\/commoncrawl.org\/ccbot)", + "operator": "[Common Crawl](https://commoncrawl.org)", + "respect": "[Yes](https://commoncrawl.org/ccbot)", "function": "Provides crawl data for an open source repository that has been used to train LLMs.", "frequency": "Unclear at this time.", "description": "Sources data that is made openly available and is used to train AI models." }, "ChatGPT-User": { - "operator": "[OpenAI](https:\/\/openai.com)", + "operator": "[OpenAI](https://openai.com)", "respect": "Yes", "function": "Takes action based on user prompts.", "frequency": "Only when prompted by a user.", "description": "Used by plugins in ChatGPT to answer queries based on user input." }, "ClaudeBot": { - "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Claude-Web": { - "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "cohere-ai": { - "operator": "[Cohere](https:\/\/cohere.com)", + "operator": "[Cohere](https://cohere.com)", "respect": "Unclear at this time.", "function": "Retrieves data to provide responses to user-initiated prompts.", "frequency": "Takes action based on user prompts.", "description": "Retrieves data based on user prompts." 
}, "Diffbot": { - "operator": "[Diffbot](https:\/\/www.diffbot.com\/)", + "operator": "[Diffbot](https://www.diffbot.com/)", "respect": "At the discretion of Diffbot users.", "function": "Aggregates structured web data for monitoring and AI model training.", "frequency": "Unclear at this time.", "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training." }, "FacebookBot": { - "operator": "Meta\/Facebook", - "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", + "operator": "Meta/Facebook", + "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", "function": "Training language models", "frequency": "Up to 1 page per second", "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." }, "facebookexternalhit": { - "operator": "Meta\/Facebook", - "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", - "function": "No information.", + "operator": "Meta/Facebook", + "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", + "function": "Fetchers", "frequency": "Unclear at this time.", - "description": "Unclear at this time." + "description": "facebookexternalhit is a fetcher operated by Meta. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/facebookexternalhit" }, "FriendlyCrawler": { "operator": "Unknown", - "respect": "[Yes](https:\/\/imho.alex-kunz.com\/2024\/01\/25\/an-update-on-friendly-crawler)", + "respect": "[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)", "function": "We are using the data from the crawler to build datasets for machine learning experiments.", "frequency": "Unclear at this time.", "description": "Unclear who the operator is; but data is used for training/machine learning." }, "Google-Extended": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "LLM training.", "frequency": "No information.", "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search." }, "GoogleOther": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Image": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Video": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GPTBot": { - "operator": "[OpenAI](https:\/\/openai.com)", + "operator": "[OpenAI](https://openai.com)", "respect": "Yes", "function": "Scrapes data to train OpenAI's products.", "frequency": "No information.", "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." }, "ICC-Crawler": { - "operator": "[NICT](https:\/\/nict.go.jp)", + "operator": "[NICT](https://nict.go.jp)", "respect": "Yes", "function": "Scrapes data to train and support AI technologies.", "frequency": "No information.", "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business." }, "ImagesiftBot": { - "operator": "[ImageSift](https:\/\/imagesift.com)", - "respect": "[Yes](https:\/\/imagesift.com\/about)", + "operator": "[ImageSift](https://imagesift.com)", + "respect": "[Yes](https://imagesift.com/about)", "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products", "frequency": "No information.", "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images." }, "img2dataset": { - "operator": "[img2dataset](https:\/\/github.com\/rom1504\/img2dataset)", + "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", "respect": "Unclear at this time.", "function": "Scrapes images for use in LLMs.", "frequency": "At the discretion of img2dataset users.", "description": "Downloads large sets of images into datasets for LLM training or other purposes." }, "Meta-ExternalAgent": { - "operator": "[Meta](https:\/\/developers.facebook.com\/docs\/sharing\/webmasters\/web-crawlers)", + "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)", "respect": "Yes.", "function": "Used to train models and improve products.", "frequency": "No information.", "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"" }, "OAI-SearchBot": { - "operator": "[OpenAI](https:\/\/openai.com)", - "respect": "[Yes](https:\/\/platform.openai.com\/docs\/bots)", + "operator": "[OpenAI](https://openai.com)", + "respect": "[Yes](https://platform.openai.com/docs/bots)", "function": "Search result generation.", "frequency": "No information.", "description": "Crawls sites to surface as results in SearchGPT." }, "omgili": { - "operator": "[Webz.io](https:\/\/webz.io\/)", - "respect": "[Yes](https:\/\/webz.io\/blog\/web-data\/what-is-the-omgili-bot-and-why-is-it-crawling-your-website\/)", + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)", "function": "Data is sold.", "frequency": "No information.", "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training." 
}, "omgilibot": { - "operator": "[Webz.io](https:\/\/webz.io\/)", - "respect": "[Yes](https:\/\/web.archive.org\/web\/20170704003301\/http:\/\/omgili.com\/Crawler.html)", + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)", "function": "Data is sold.", "frequency": "No information.", "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io." }, "PerplexityBot": { - "operator": "[Perplexity](https:\/\/www.perplexity.ai\/)", - "respect": "[No](https:\/\/www.macstories.net\/stories\/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler\/)", + "operator": "[Perplexity](https://www.perplexity.ai/)", + "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", "function": "Used to answer queries at the request of users.", "frequency": "Takes action based on user prompts.", "description": "Operated by Perplexity to obtain results in response to user queries." }, "PetalBot": { - "operator": "[Huawei](https:\/\/huawei.com\/)", + "operator": "[Huawei](https://huawei.com/)", "respect": "Yes", "function": "Used to provide recommendations in Hauwei assistant and AI search services.", "frequency": "No explicit frequency provided.", "description": "Operated by Huawei to provide search and AI assistant services." 
}, "Scrapy": { - "operator": "[Zyte](https:\/\/www.zyte.com)", + "operator": "[Zyte](https://www.zyte.com)", "respect": "Unclear at this time.", "function": "Scrapes data a variety of uses including training AI.", "frequency": "No information.", "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"" }, "Timpibot": { - "operator": "[Timpi](https:\/\/timpi.io)", + "operator": "[Timpi](https://timpi.io)", "respect": "Unclear at this time.", "function": "Scrapes data for use in training LLMs.", "frequency": "No information.", "description": "Makes data available for training AI models." }, "VelenPublicWebCrawler": { - "operator": "[Velen Crawler](https:\/\/velen.io)", - "respect": "[Yes](https:\/\/velen.io)", + "operator": "[Velen Crawler](https://velen.io)", + "respect": "[Yes](https://velen.io)", "function": "Scrapes data for business data sets and machine learning models.", "frequency": "No information.", "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"" }, "YouBot": { - "operator": "[You](https:\/\/about.you.com\/youchat\/)", - "respect": "[Yes](https:\/\/about.you.com\/youbot\/)", + "operator": "[You](https://about.you.com/youchat/)", + "respect": "[Yes](https://about.you.com/youbot/)", "function": "Scrapes data for search engine and LLMs.", "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." + }, + "Meta-ExternalFetcher": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Assistants", + "frequency": "Unclear at this time.", + "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. 
More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher" + }, + "Applebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Search Crawlers", + "frequency": "Unclear at this time.", + "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot" + }, + "archive.org_bot": { + "operator": "Internet Archive", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "archive.org_bot is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/archive-org-bot" + }, + "Arquivo-web-crawler": { + "operator": "Arquivo", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "Arquivo-web-crawler is an archiver operated by Arquivo.pt. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/arquivo-web-crawler" + }, + "heritrix": { + "operator": "Internet Archive", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "heritrix is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/heritrix" + }, + "ia_archiver": { + "operator": "Internet Archive", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "ia_archiver is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ia-archiver" + }, + "ia_archiver-web.archive.org": { + "operator": "Internet Archive", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "ia_archiver-web.archive.org is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ia-archiver-web-archive-org" + }, + "Nicecrawler": { + "operator": "NiceCrawler", + "respect": "Unclear at this time.", + "function": "Archivers", + "frequency": "Unclear at this time.", + "description": "Nicecrawler is an archiver operated by NiceCrawler. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nicecrawler" + }, + "2ip bot": { + "operator": "2IP", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "2ip bot is a developer helper operated by 2IP. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/2ip-bot" + }, + "AhrefsSiteAudit": { + "operator": "Ahrefs", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "AhrefsSiteAudit is a developer helper operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ahrefssiteaudit" + }, + "BingPreview": { + "operator": "Microsoft", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "BingPreview is a developer helper operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bingpreview" + }, + "Chrome-Lighthouse": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "Chrome-Lighthouse is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/chrome-lighthouse" + }, + "Dark Visitor": { + "operator": "Dark Visitors", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "Dark Visitor is a developer helper operated by Dark Visitors. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/dark-visitor" + }, + "deadlinkchecker": { + "operator": "Dead Link Checker", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "deadlinkchecker is a developer helper operated by Dead Link Checker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deadlinkchecker" + }, + "Google-InspectionTool": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "Google-InspectionTool is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-inspectiontool" + }, + "rogerbot": { + "operator": "Moz", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "rogerbot is a developer helper operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rogerbot" + }, + "SiteAuditBot": { + "operator": "Semrush", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "SiteAuditBot is a developer helper operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/siteauditbot" + }, + "t3versionsBot": { + "operator": "T3Versions", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "t3versionsBot is a developer helper operated by T3Versions. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/t3versionsbot" + }, + "W3C_CSS_Validator": { + "operator": "W3C", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "W3C_CSS_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-css-validator" + }, + "W3C_Validator": { + "operator": "W3C", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "W3C_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-validator" + }, + "WellKnownBot": { + "operator": "Well-Known", + "respect": "Unclear at this time.", + "function": "Developer Helpers", + "frequency": "Unclear at this time.", + "description": "WellKnownBot is a developer helper operated by Well-Known. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/wellknownbot" + }, + "BazQux": { + "operator": "BazQux", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "BazQux is a fetcher operated by BazQux. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bazqux" + }, + "bitlybot": { + "operator": "Bitly", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "bitlybot is a fetcher operated by Bitly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitlybot" + }, + "BublupBot": { + "operator": "Bublup", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "BublupBot is a fetcher operated by Bublup. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bublupbot" + }, + "Discordbot": { + "operator": "Discord", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Discordbot is a fetcher operated by Discord. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/discordbot" + }, + "Embedly": { + "operator": "Embedly", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Embedly is a fetcher operated by Embedly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/embedly" + }, + "Feedly": { + "operator": "Feedly", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Feedly is a fetcher operated by Feedly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedly" + }, + "FlipboardProxy": { + "operator": "Flipboard", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "FlipboardProxy is a fetcher operated by Flipboard. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/flipboardproxy" + }, + "FreshRSS": { + "operator": "FreshRSS", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "FreshRSS is a fetcher operated by FreshRSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/freshrss" + }, + "Friendica": { + "operator": "Friendica", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Friendica is a fetcher operated by Friendica. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/friendica" + }, + "Google Web Preview": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Google Web Preview is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-web-preview" + }, + "Google-Read-Aloud": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Google-Read-Aloud is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-read-aloud" + }, + "Hatena": { + "operator": "Hatena", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Hatena is a fetcher operated by Hatena. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/hatena" + }, + "Iframely": { + "operator": "Iframely", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Iframely is a fetcher operated by Iframely. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iframely" + }, + "inoreader": { + "operator": "Inoreader", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "inoreader is a fetcher operated by Inoreader. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/inoreader" + }, + "LinkedInBot": { + "operator": "LinkedIn", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "LinkedInBot is a fetcher operated by LinkedIn. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkedinbot" + }, + "Mail.RU_Bot": { + "operator": "VK", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Mail.RU_Bot is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mail-ru-bot" + }, + "Mastodon": { + "operator": "Mastodon", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Mastodon is a fetcher operated by Mastodon. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mastodon" + }, + "Miniflux": { + "operator": "Miniflux", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Miniflux is a fetcher operated by Miniflux. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/miniflux" + }, + "NewsBlur": { + "operator": "NewsBlur", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "NewsBlur is a fetcher operated by NewsBlur. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newsblur" + }, + "Nextcloud": { + "operator": "Nextcloud", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Nextcloud is a fetcher operated by Nextcloud. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nextcloud" + }, + "Pinterestbot": { + "operator": "Pinterest", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Pinterestbot is a fetcher operated by Pinterest. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pinterestbot" + }, + "PocketParser": { + "operator": "Pocket", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "PocketParser is a fetcher operated by Pocket. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pocketparser" + }, + "redditbot": { + "operator": "Reddit", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "redditbot is a fetcher operated by Reddit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/redditbot" + }, + "SerendeputyBot": { + "operator": "Serendeputy", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "SerendeputyBot is a fetcher operated by Serendeputy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/serendeputybot" + }, + "SimplePie": { + "operator": "SimplePie", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "SimplePie is a fetcher operated by SimplePie. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplepie" + }, + "SkypeUriPreview": { + "operator": "Microsoft", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "SkypeUriPreview is a fetcher operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/skypeuripreview" + }, + "Slackbot-LinkExpanding": { + "operator": "Slack", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Slackbot-LinkExpanding is a fetcher operated by Slack. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slackbot-linkexpanding" + }, + "Snap URL Preview Service": { + "operator": "Snap", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Snap URL Preview Service is a fetcher operated by Snap. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/snap-url-preview-service" + }, + "snapchat": { + "operator": "Snapchat", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "snapchat is a fetcher operated by Snapchat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/snapchat" + }, + "startmebot": { + "operator": "Start", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "startmebot is a fetcher operated by Start.me. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/startmebot" + }, + "Superfeedr": { + "operator": "Superfeedr", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Superfeedr is a fetcher operated by Superfeedr. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/superfeedr" + }, + "SurdotlyBot": { + "operator": "Sur", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "SurdotlyBot is a fetcher operated by Sur.ly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/surdotlybot" + }, + "Synapse": { + "operator": "Matrix", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Synapse is a fetcher operated by Matrix. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/synapse" + }, + "TelegramBot": { + "operator": "Telegram", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "TelegramBot is a fetcher operated by Telegram. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/telegrambot" + }, + "Tiny Tiny RSS": { + "operator": "Tiny Tiny RSS", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Tiny Tiny RSS is a fetcher operated by Tiny Tiny RSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tiny-tiny-rss" + }, + "Twitterbot": { + "operator": "X", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Twitterbot is a fetcher operated by X. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/twitterbot" + }, + "Viber": { + "operator": "Viber", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Viber is a fetcher operated by Viber. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/viber" + }, + "vkShare": { + "operator": "VK", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "vkShare is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vkshare" + }, + "WhatsApp": { + "operator": "Meta", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "WhatsApp is a fetcher operated by Meta. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/whatsapp" + }, + "Yahoo Link Preview": { + "operator": "Yahoo", + "respect": "Unclear at this time.", + "function": "Fetchers", + "frequency": "Unclear at this time.", + "description": "Yahoo Link Preview is a fetcher operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yahoo-link-preview" + }, + "adbeat_bot": { + "operator": "Adbeat", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "adbeat_bot is an intelligence gatherer operated by Adbeat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adbeat-bot" + }, + "AdsBot-Google": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AdsBot-Google is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adsbot-google" + }, + "AdsBot-Google-Mobile": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AdsBot-Google-Mobile is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adsbot-google-mobile" + }, + "aiHitBot": { + "operator": "aiHit", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "aiHitBot is an intelligence gatherer operated by aiHit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/aihitbot" + }, + "AndersPinkBot": { + "operator": "Anders Pink", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AndersPinkBot is an intelligence gatherer operated by Anders Pink. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/anderspinkbot" + }, + "ArchiveBot": { + "operator": "Wikimedia", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "ArchiveBot is an intelligence gatherer operated by Wikimedia. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/archivebot" + }, + "AwarioBot": { + "operator": "Awario", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AwarioBot is an intelligence gatherer operated by Awario. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariobot" + }, + "AwarioSmartBot": { + "operator": "Awario", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "AwarioSmartBot is an intelligence gatherer operated by Awario. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/awariosmartbot" + }, + "BitSightBot": { + "operator": "Bitsight", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "BitSightBot is an intelligence gatherer operated by Bitsight. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitsightbot" + }, + "Blackboard": { + "operator": "Anthology", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Blackboard is an intelligence gatherer operated by Anthology. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blackboard" + }, + "BrandVerity": { + "operator": "BrandVerity", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "BrandVerity is an intelligence gatherer operated by BrandVerity. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brandverity" + }, + "Cincraw": { + "operator": "CINC", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Cincraw is an intelligence gatherer operated by CINC. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cincraw" + }, + "ev-crawler": { + "operator": "Headline", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "ev-crawler is an intelligence gatherer operated by Headline. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ev-crawler" + }, + "Google-Safety": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Google-Safety is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-safety" + }, + "HubSpot": { + "operator": "HubSpot", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "HubSpot is an intelligence gatherer operated by HubSpot. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hubspot" + }, + "IonCrawl": { + "operator": "IONOS", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "IonCrawl is an intelligence gatherer operated by IONOS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ioncrawl" + }, + "Jugendschutzprogramm-Crawler": { + "operator": "JusProg", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Jugendschutzprogramm-Crawler is an intelligence gatherer operated by JusProg. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jugendschutzprogramm-crawler" + }, + "KStandBot": { + "operator": "URL Classification", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "KStandBot is an intelligence gatherer operated by URL Classification. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kstandbot" + }, + "LightspeedSystemsCrawler": { + "operator": "Lightspeed Systems", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "LightspeedSystemsCrawler is an intelligence gatherer operated by Lightspeed Systems. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lightspeedsystemscrawler" + }, + "linkfluence": { + "operator": "Meltwater", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "linkfluence is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkfluence" + }, + "LinkWalker": { + "operator": "Fortra", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "LinkWalker is an intelligence gatherer operated by Fortra. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkwalker" + }, + "magpie-crawler": { + "operator": "Brandwatch", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "magpie-crawler is an intelligence gatherer operated by Brandwatch. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/magpie-crawler" + }, + "Mediapartners-Google": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Mediapartners-Google is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediapartners-google" + }, + "Mediatoolkitbot": { + "operator": "Determ", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Mediatoolkitbot is an intelligence gatherer operated by Determ. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediatoolkitbot" + }, + "MuckRack": { + "operator": "Muck Rack", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "MuckRack is an intelligence gatherer operated by Muck Rack. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/muckrack" + }, + "NetcraftSurveyAgent": { + "operator": "Netcraft", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "NetcraftSurveyAgent is an intelligence gatherer operated by Netcraft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netcraftsurveyagent" + }, + "Netvibes": { + "operator": "Netvibes", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Netvibes is an intelligence gatherer operated by Netvibes. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netvibes" + }, + "Pandalytics": { + "operator": "Domainsbot", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Pandalytics is an intelligence gatherer operated by Domainsbot. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pandalytics" + }, + "panscient.com": { + "operator": "Panscient", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "panscient.com is an intelligence gatherer operated by Panscient. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/panscient-com" + }, + "proximic": { + "operator": "Comscore", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "proximic is an intelligence gatherer operated by Comscore. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/proximic" + }, + "scoop.it": { + "operator": "Meltwater", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "scoop.it is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scoop-it" + }, + "SeekportBot": { + "operator": "Seekport", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "SeekportBot is an intelligence gatherer operated by Seekport. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekportbot" + }, + "SMTBot": { + "operator": "SimilarTech", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "SMTBot is an intelligence gatherer operated by SimilarTech. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/smtbot" + }, + "trendictionbot": { + "operator": "Trendiction", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "trendictionbot is an intelligence gatherer operated by Trendiction. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendictionbot" + }, + "TrendsmapResolver": { + "operator": "Trendsmap", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "TrendsmapResolver is an intelligence gatherer operated by Trendsmap. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendsmapresolver" + }, + "Turnitin": { + "operator": "Turnitin", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Turnitin is an intelligence gatherer operated by Turnitin. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turnitin" + }, + "TurnitinBot": { + "operator": "Turnitin", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "TurnitinBot is an intelligence gatherer operated by Turnitin. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turnitinbot" + }, + "TweetmemeBot": { + "operator": "Meltwater", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "TweetmemeBot is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tweetmemebot" + }, + "Twingly": { + "operator": "Twingly", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "Twingly is an intelligence gatherer operated by Twingly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twingly" + }, + "um-LN": { + "operator": "Ubermetrics", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "um-LN is an intelligence gatherer operated by Ubermetrics. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/um-ln" + }, + "virustotal": { + "operator": "VirusTotal", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "virustotal is an intelligence gatherer operated by VirusTotal. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/virustotal" + }, + "ZoominfoBot": { + "operator": "ZoomInfo", + "respect": "Unclear at this time.", + "function": "Intelligence Gatherers", + "frequency": "Unclear at this time.", + "description": "ZoominfoBot is an intelligence gatherer operated by ZoomInfo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zoominfobot" + }, + "008": { + "operator": "80legs", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "008 is a scraper operated by 80legs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/008" + }, + "Dataprovider.com": { + "operator": "Dataprovider", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "Dataprovider.com is a scraper operated by Dataprovider.com. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dataprovider-com" + }, + "dcrawl": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "dcrawl is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dcrawl" + }, + "HTTrack": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "HTTrack is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/httrack" + }, + "HTTrack 3.0": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "HTTrack 3.0 is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/httrack-3-0" + }, + "MetaInspector": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "MetaInspector is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/metainspector" + }, + "newspaper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "newspaper is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newspaper" + }, + "Nutch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "Nutch is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nutch" + }, + "Offline Explorer": { + "operator": "MetaProducts", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "Offline Explorer is a scraper operated by MetaProducts. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/offline-explorer" + }, + "OpenindexSpider": { + "operator": "Openindex", + "respect": "Unclear at this time.", + "function": "Scrapers", + "frequency": "Unclear at this time.", + "description": "OpenindexSpider is a scraper operated by Openindex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/openindexspider" + }, + "360Spider": { + "operator": "Qihoo 360", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "360Spider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider" + }, + "AlexandriaOrgBot": { + "operator": "Alexandria", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "AlexandriaOrgBot is a search engine crawler operated by Alexandria.org. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexandriaorgbot" + }, + "Atom Feed Robot": { + "operator": "RSSMicro", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Atom Feed Robot is a search engine crawler operated by RSSMicro. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/atom-feed-robot" + }, + "Baiduspider": { + "operator": "Baidu", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Baiduspider is a search engine crawler operated by Baidu. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/baiduspider" + }, + "bingbot": { + "operator": "Microsoft", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "bingbot is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bingbot" + }, + "coccocbot-web": { + "operator": "Coc Coc", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "coccocbot-web is a search engine crawler operated by Coc Coc. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccocbot-web" + }, + "Daum": { + "operator": "Daum", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Daum is a search engine crawler operated by Daum. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/daum" + }, + "DuckDuckBot": { + "operator": "DuckDuckGo", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "DuckDuckBot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/duckduckbot" + }, + "DuckDuckGo-Favicons-Bot": { + "operator": "DuckDuckGo", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "DuckDuckGo-Favicons-Bot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/duckduckgo-favicons-bot" + }, + "Feedfetcher-Google": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Feedfetcher-Google is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedfetcher-google" + }, + "Google Favicon": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Google Favicon is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-favicon" + }, + "Googlebot": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot" + }, + "Googlebot-Image": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot-Image is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-image" + }, + "Googlebot-Mobile": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot-Mobile is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-mobile" + }, + "Googlebot-News": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot-News is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-news" + }, + "Googlebot-Video": { + "operator": "Google", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Googlebot-Video is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-video" + }, + "HaoSouSpider": { + "operator": "Qihoo 360", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "HaoSouSpider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/haosouspider" + }, + "MojeekBot": { + "operator": "Mojeek", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "MojeekBot is a search engine crawler operated by Mojeek. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mojeekbot" + }, + "msnbot": { + "operator": "Microsoft", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "msnbot is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msnbot" + }, + "msnbot-media": { + "operator": "Microsoft", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "msnbot-media is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msnbot-media" + }, + "Qwantify": { + "operator": "Qwant", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Qwantify is a search engine crawler operated by Qwant. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/qwantify" + }, + "SemanticScholarBot": { + "operator": "AI2", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "SemanticScholarBot is a search engine crawler operated by AI2. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semanticscholarbot" + }, + "SeznamBot": { + "operator": "Seznam", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "SeznamBot is a search engine crawler operated by Seznam. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seznambot" + }, + "Sogou web spider": { + "operator": "Sogou", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Sogou web spider is a search engine crawler operated by Sogou. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-web-spider" + }, + "teoma": { + "operator": "Ask", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "teoma is a search engine crawler operated by Ask. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teoma" + }, + "TinEye": { + "operator": "TinEye", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "TinEye is a search engine crawler operated by TinEye. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tineye" + }, + "TinEye-bot": { + "operator": "TinEye", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "TinEye-bot is a search engine crawler operated by TinEye. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tineye-bot" + }, + "yacybot": { + "operator": "YaCy", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "yacybot is a search engine crawler operated by YaCy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yacybot" + }, + "Yahoo! Slurp": { + "operator": "Yahoo", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Yahoo! Slurp is a search engine crawler operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-slurp" + }, + "Yandex": { + "operator": "Yandex", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Yandex is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandex" + }, + "YandexBot": { + "operator": "Yandex", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "YandexBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexbot" + }, + "YandexImages": { + "operator": "Yandex", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "YandexImages is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yandeximages" + }, + "YandexRenderResourcesBot": { + "operator": "Yandex", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "YandexRenderResourcesBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexrenderresourcesbot" + }, + "Yeti": { + "operator": "Naver", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "Yeti is a search engine crawler operated by Naver. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yeti" + }, + "YisouSpider": { + "operator": "Yisou", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "YisouSpider is a search engine crawler operated by Yisou. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yisouspider" + }, + "ZumBot": { + "operator": "ZUM Internet", + "respect": "Unclear at this time.", + "function": "Search Engine Crawlers", + "frequency": "Unclear at this time.", + "description": "ZumBot is a search engine crawler operated by ZUM Internet. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zumbot" + }, + "AhrefsBot": { + "operator": "Ahrefs", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "AhrefsBot is an SEO crawler operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ahrefsbot" + }, + "Barkrowler": { + "operator": "Babbar", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "Barkrowler is an SEO crawler operated by Babbar. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/barkrowler" + }, + "BLEXBot": { + "operator": "SEO PowerSuite", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "BLEXBot is an SEO crawler operated by SEO PowerSuite. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blexbot" + }, + "BrightEdge Crawler": { + "operator": "BrightEdge", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "BrightEdge Crawler is an SEO crawler operated by BrightEdge. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/brightedge-crawler" + }, + "Cocolyzebot": { + "operator": "Cocolyze", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "Cocolyzebot is an SEO crawler operated by Cocolyze. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cocolyzebot" + }, + "DataForSeoBot": { + "operator": "DataForSEO", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "DataForSeoBot is an SEO crawler operated by DataForSEO. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dataforseobot" + }, + "DomainStatsBot": { + "operator": "Domainstats", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "DomainStatsBot is an SEO crawler operated by Domainstats. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domainstatsbot" + }, + "dotbot": { + "operator": "Moz", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "dotbot is an SEO crawler operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/dotbot" + }, + "hypestat": { + "operator": "HypeStat", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "hypestat is an SEO crawler operated by HypeStat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hypestat" + }, + "linkdexbot": { + "operator": "Linkdex", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "linkdexbot is an SEO crawler operated by Linkdex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkdexbot" + }, + "MJ12bot": { + "operator": "Majestic", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "MJ12bot is an SEO crawler operated by Majestic. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mj12bot" + }, + "online-webceo-bot": { + "operator": "WebCEO", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "online-webceo-bot is an SEO crawler operated by WebCEO. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/online-webceo-bot" + }, + "Screaming Frog SEO Spider": { + "operator": "Screaming Frog", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "Screaming Frog SEO Spider is an SEO crawler operated by Screaming Frog. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/screaming-frog-seo-spider" + }, + "SemrushBot": { + "operator": "Semrush", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot" + }, + "SemrushBot-BA": { + "operator": "Semrush", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot-BA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ba" + }, + "SemrushBot-CT": { + "operator": "Semrush", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot-CT is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ct" + }, + "SemrushBot-SI": { + "operator": "Semrush", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot-SI is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-si" + }, + "SemrushBot-SWA": { + "operator": "Semrush", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SemrushBot-SWA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-swa" + }, + "SenutoBot": { + "operator": "Senuto", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SenutoBot is an SEO crawler operated by Senuto. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/senutobot" + }, + "SeobilityBot": { + "operator": "Seobility", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SeobilityBot is an SEO crawler operated by Seobility. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seobilitybot" + }, + "SEOkicks": { + "operator": "SEOkicks", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SEOkicks is an SEO crawler operated by SEOkicks. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seokicks" + }, + "SEOlizer": { + "operator": "SEOLizer", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SEOlizer is an SEO crawler operated by SEOLizer. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seolizer" + }, + "serpstatbot": { + "operator": "Serpstat", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "serpstatbot is an SEO crawler operated by Serpstat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/serpstatbot" + }, + "SiteCheckerBotCrawler": { + "operator": "Sitechecker", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "SiteCheckerBotCrawler is an SEO crawler operated by Sitechecker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sitecheckerbotcrawler" + }, + "ZoomBot": { + "operator": "SEOZoom", + "respect": "Unclear at this time.", + "function": "SEO Crawlers", + "frequency": "Unclear at this time.", + "description": "ZoomBot is an SEO crawler operated by SEOZoom. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zoombot" + }, + "007ac9 Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "007ac9 Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/007ac9-crawler" + }, + "2ip.ru": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "2ip.ru is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/2ip-ru" + }, + "360Spider-Image": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "360Spider-Image is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/360spider-image" + }, + "360Spider-Video": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "360Spider-Video is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider-video" + }, + "5emeRue": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "5emeRue is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/5emerue" + }, + "5erue": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "5erue is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/5erue" + }, + "A Patent Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "A Patent Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/a-patent-crawler" + }, + "A6-Indexer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "A6-Indexer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/a6-indexer" + }, + "Aboundex": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Aboundex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aboundex" + }, + "AcademicBotRTU": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AcademicBotRTU is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/academicbotrtu" + }, + "acapbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "acapbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/acapbot" + }, + "acoonbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "acoonbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acoonbot" + }, + "Acunetix Security Scanner": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Acunetix Security Scanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acunetix-security-scanner" + }, + "Acunetix Web Vulnerability Scanner": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Acunetix Web Vulnerability Scanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acunetix-web-vulnerability-scanner" + }, + "AddSearchBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AddSearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/addsearchbot" + }, + "AddThis": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AddThis is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/addthis" + }, + "adequat": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "adequat is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adequat" + }, + "adequat-systems": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "adequat-systems is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adequat-systems" + }, + "AdIdxBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AdIdxBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/adidxbot" + }, + "ADmantX": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ADmantX is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/admantx" + }, + "adscanner": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "adscanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adscanner" + }, + "AdsTxtCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AdsTxtCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adstxtcrawler" + }, + "AdvBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AdvBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/advbot" + }, + "AISearchBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AISearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aisearchbot" + }, + "Alexabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Alexabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexabot" + }, + "Alexibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Alexibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexibot" + }, + "AlphaBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AlphaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/alphabot" + }, + "AmiSoftware": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AmiSoftware is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/amisoftware" + }, + "antibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "antibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/antibot" + }, + "AnyEvent": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AnyEvent is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/anyevent" + }, + "Apercite": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Apercite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/apercite" + }, + "AppInsights": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AppInsights is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/appinsights" + }, + "Aqua_Products": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Aqua_Products is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aqua-products" + }, + "arabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "arabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/arabot" + }, + "Ask n read": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Ask n read is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ask-n-read" + }, + "asknread.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "asknread.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/asknread-com" + }, + "AspiegelBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AspiegelBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aspiegelbot" + }, + "asterias": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "asterias is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/asterias" + }, + "Augure": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Augure is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/augure" + }, + "auramundi": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "auramundi is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/auramundi" + }, + "AwarioRssBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "AwarioRssBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariorssbot" + }, + "awesomecrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "awesomecrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awesomecrawler" + }, + "B2B Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "B2B Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/b2b-bot" + }, + "b2w": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "b2w is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/b2w" + }, + "BackDoorBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BackDoorBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/backdoorbot" + }, + "BacklinkCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BacklinkCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/backlinkcrawler" + }, + "Baidu-YunGuanCe": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Baidu-YunGuanCe is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/baidu-yunguance" + }, + "Baiduspider-image": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Baiduspider-image is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider-image" + }, + "Baiduspider-news": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Baiduspider-news is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider-news" + }, + "Baiduspider-video": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Baiduspider-video is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider-video" + }, + "BDCbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BDCbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bdcbot" + }, + "BehloolBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BehloolBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/behloolbot" + }, + "betaBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "betaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/betabot" + }, + "Better Uptime Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Better Uptime Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/better-uptime-bot" + }, + "bidswitchbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "bidswitchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bidswitchbot" + }, + "BIGLOTRON": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BIGLOTRON is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/biglotron" + }, + "binlar": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "binlar is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/binlar" + }, + "Birdcrawlerbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Birdcrawlerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/birdcrawlerbot" + }, + "BitBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BitBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bitbot" + }, + "Black Hole": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Black Hole is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/black-hole" + }, + "Blekkobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Blekkobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blekkobot" + }, + "blogmuraBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "blogmuraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blogmurabot" + }, + "BlowFish": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BlowFish is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/blowfish" + }, + "BLP_bbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BLP_bbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blp-bbot" + }, + "bnf.fr_bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "bnf.fr_bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bnf-fr-bot" + }, + "BomboraBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BomboraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bomborabot" + }, + "Bookmark search tool": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Bookmark search tool is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bookmark-search-tool" + }, + "bot-pge.chlooe.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "bot-pge.chlooe.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bot-pge-chlooe-com" + }, + "Bot.AraTurka.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Bot.AraTurka.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bot-araturka-com" + }, + "BotALot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BotALot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/botalot" + }, + "botify": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "botify is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/botify" + }, + "BotRightHere": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BotRightHere is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/botrighthere" + }, + "BoxcarBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BoxcarBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/boxcarbot" + }, + "brainobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "brainobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brainobot" + }, + "BrandONbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BrandONbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/brandonbot" + }, + "BTWebClient": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BTWebClient is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/btwebclient" + }, + "BUbiNG": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BUbiNG is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bubing" + }, + "Buck": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Buck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/buck" + }, + "BuiltBotTough": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BuiltBotTough is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/builtbottough" + }, + "Bullseye": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Bullseye is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bullseye" + }, + "BunnySlippers": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "BunnySlippers is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bunnyslippers" + }, + "buzzbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "buzzbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/buzzbot" + }, + "Caliperbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Caliperbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/caliperbot" + }, + "CapsuleChecker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CapsuleChecker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/capsulechecker" + }, + "careerbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "careerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/careerbot" + }, + "CC Metadata Scaper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CC Metadata Scaper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cc-metadata-scaper" + }, + "Cegbfeieh": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Cegbfeieh is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cegbfeieh" + }, + "centurybot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "centurybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/centurybot" + }, + "changedetection": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "changedetection is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/changedetection" + }, + "CheckMarkNetwork": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CheckMarkNetwork is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/checkmarknetwork" + }, + "CheeseBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CheeseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cheesebot" + }, + "CherryPicker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CherryPicker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cherrypicker" + }, + "CherryPickerElite": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CherryPickerElite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cherrypickerelite" + }, + "CherryPickerSE": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CherryPickerSE is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cherrypickerse" + }, + "Cision": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Cision is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cision" + }, + "CISPA Webcrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CISPA Webcrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cispa-webcrawler" + }, + "citeseerxbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "citeseerxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/citeseerxbot" + }, + "Citoid": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Citoid is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/citoid" + }, + "Claritybot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Claritybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/claritybot" + }, + "Clickagy": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Clickagy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/clickagy" + }, + "Cliqzbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Cliqzbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cliqzbot" + }, + "CloudFlare-AlwaysOnline": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CloudFlare-AlwaysOnline is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cloudflare-alwaysonline" + }, + "coccoc": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "coccoc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/coccoc" + }, + "coccocbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "coccocbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccocbot" + }, + "coexel": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "coexel is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coexel" + }, + "Companybook-Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Companybook-Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/companybook-crawler" + }, + "content crawler spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "content crawler spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/content-crawler-spider" + }, + "ContextAd Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ContextAd Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/contextad-bot" + }, + "contxbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "contxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/contxbot" + }, + "convera": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "convera is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/convera" + }, + "ConveraCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ConveraCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/converacrawler" + }, + "Cookiebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Cookiebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cookiebot" + }, + "Copernic": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Copernic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/copernic" + }, + "CopyRightCheck": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CopyRightCheck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/copyrightcheck" + }, + "Corporama": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Corporama is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/corporama" + }, + "cosmos": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "cosmos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cosmos" + }, + "crawler4j": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "crawler4j is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crawler4j" + }, + "CrawlyProjectCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CrawlyProjectCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crawlyprojectcrawler" + }, + "Crescent": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Crescent is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/crescent" + }, + "Crescent Internet ToolPak HTTP OLE Control v.1.0": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Crescent Internet ToolPak HTTP OLE Control v.1.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crescent-internet-toolpak-http-ole-control-v-1-0" + }, + "CriteoBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CriteoBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/criteobot" + }, + "CrunchBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CrunchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crunchbot" + }, + "CrystalSemanticsBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CrystalSemanticsBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crystalsemanticsbot" + }, + "Curebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Curebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/curebot" + }, + "Cutbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Cutbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cutbot" + }, + "cXensebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "cXensebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cxensebot" + }, + "CyberPatrol": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "CyberPatrol is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cyberpatrol" + }, + "DareBoost": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "DareBoost is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dareboost" + }, + "Datafeedwatch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Datafeedwatch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/datafeedwatch" + }, + "datagnionbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "datagnionbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/datagnionbot" + }, + "Datanyze": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Datanyze is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/datanyze" + }, + "daumoa": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "daumoa is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/daumoa" + }, + "deepcrawl": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "deepcrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deepcrawl" + }, + "deepnoc": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "deepnoc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deepnoc" + }, + "DeuSu": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "DeuSu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/deusu" + }, + "Digg Deeper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Digg Deeper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/digg-deeper" + }, + "Digimind": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Digimind is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/digimind" + }, + "Digincore bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Digincore bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/digincore-bot" + }, + "discobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "discobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/discobot" + }, + "Disqus": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Disqus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/disqus" + }, + "DittoSpyder": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "DittoSpyder is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dittospyder" + }, + "DnyzBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "DnyzBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dnyzbot" + }, + "Domain Re-Animator Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Domain Re-Animator Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/domain-re-animator-bot" + }, + "DomainCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "DomainCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domaincrawler" + }, + "Dow Jones Searchbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Dow Jones Searchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dow-jones-searchbot" + }, + "Download Ninja": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Download Ninja is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/download-ninja" + }, + "Dragonbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Dragonbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/dragonbot" + }, + "drupact": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "drupact is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/drupact" + }, + "Dubbotbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Dubbotbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dubbotbot" + }, + "e.ventures Investment Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "e.ventures Investment Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/e-ventures-investment-crawler" + }, + "EasyBib AutoCite": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EasyBib AutoCite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/easybib-autocite" + }, + "ec2linkfinder": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ec2linkfinder is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ec2linkfinder" + }, + "edisterbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "edisterbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/edisterbot" + }, + "electricmonk": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "electricmonk is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/electricmonk" + }, + "elisabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "elisabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/elisabot" + }, + "ellisphere": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ellisphere is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ellisphere" + }, + "EmailCollector": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EmailCollector is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/emailcollector" + }, + "EmailSiphon": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EmailSiphon is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/emailsiphon" + }, + "EmailWolf": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EmailWolf is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/emailwolf" + }, + "epicbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "epicbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/epicbot" + }, + "eright": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "eright is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/eright" + }, + "EroCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EroCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/erocrawler" + }, + "EtaoSpider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EtaoSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/etaospider" + }, + "europarchive.org": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "europarchive.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/europarchive-org" + }, + "evc-batch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "evc-batch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/evc-batch" + }, + "EveryoneSocialBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EveryoneSocialBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/everyonesocialbot" + }, + "Exabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Exabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/exabot" + }, + "Experibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Experibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/experibot" + }, + "ExtLinksBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ExtLinksBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/extlinksbot" + }, + "ExtractorPro": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ExtractorPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/extractorpro" + }, + "Eyeotabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Eyeotabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/eyeotabot" + }, + "EZID": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "EZID is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ezid" + }, + "Ezooms": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Ezooms is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ezooms" + }, + "Facebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Facebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/facebot" + }, + "FairAd Client": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FairAd Client is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fairad-client" + }, + "FAST Enterprise Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FAST Enterprise Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fast-enterprise-crawler" + }, + "FAST-WebCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FAST-WebCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fast-webcrawler" + }, + "FediDB": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FediDB is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fedidb" + }, + "fedoraplanet": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "fedoraplanet is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fedoraplanet" + }, + "Feedbin": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Feedbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedbin" + }, + "feedbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "feedbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedbot" + }, + "FeedBurner": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FeedBurner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedburner" + }, + "Feedspot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Feedspot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/feedspot" + }, + "FeedValidator": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FeedValidator is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedvalidator" + }, + "FemtosearchBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FemtosearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/femtosearchbot" + }, + "Fever": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Fever is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fever" + }, + "FindITAnswersbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FindITAnswersbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/finditanswersbot" + }, + "findlink": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "findlink is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/findlink" + }, + "findthatfile": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "findthatfile is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/findthatfile" + }, + "findxbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "findxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/findxbot" + }, + "Flaming AttackBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Flaming AttackBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/flaming-attackbot" + }, + "Flamingo_SearchEngine": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Flamingo_SearchEngine is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/flamingo-searchengine" + }, + "fluffy": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "fluffy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fluffy" + }, + "Foobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Foobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/foobot" + }, + "fr-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "fr-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fr-crawler" + }, + "FreeWebMonitoring SiteChecker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FreeWebMonitoring SiteChecker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freewebmonitoring-sitechecker" + }, + "FreshpingBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "FreshpingBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freshpingbot" + }, + "fuelbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "fuelbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fuelbot" + }, + "Fyrebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Fyrebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fyrebot" + }, + "g00g1e.net": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "g00g1e.net is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/g00g1e-net" + }, + "G2 Web Services": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "G2 Web Services is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/g2-web-services" + }, + "g2reader-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "g2reader-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/g2reader-bot" + }, + "Gaisbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gaisbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gaisbot" + }, + "GarlikCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GarlikCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/garlikcrawler" + }, + "Genieo": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Genieo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/genieo" + }, + "GetRight": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GetRight is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/getright" + }, + "Gigablast": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gigablast is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gigablast" + }, + "Gigabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gigabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gigabot" + }, + "GingerCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GingerCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gingercrawler" + }, + "Gluten Free Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gluten Free Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gluten-free-crawler" + }, + "gnam gnam spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "gnam gnam spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gnam-gnam-spider" + }, + "GnowitNewsbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GnowitNewsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gnowitnewsbot" + }, + "Google-Adwords-Instant": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-Adwords-Instant is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-adwords-instant" + }, + "Google-Certificates-Bridge": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-Certificates-Bridge is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-certificates-bridge" + }, + "Google-PhysicalWeb": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-PhysicalWeb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/google-physicalweb" + }, + "Google-Site-Verification": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-Site-Verification is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-site-verification" + }, + "Google-Structured-Data-Testing-Tool": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Google-Structured-Data-Testing-Tool is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-structured-data-testing-tool" + }, + "google-xrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "google-xrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-xrawler" + }, + "Gowikibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gowikibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gowikibot" + }, + "grapeshot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "grapeshot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grapeshot" + }, + "GrapeshotCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GrapeshotCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grapeshotcrawler" + }, + "Grobbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Grobbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grobbot" + }, + "GroupHigh": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "GroupHigh is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/grouphigh" + }, + "grub-client": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "grub-client is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grub-client" + }, + "grub.org": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "grub.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grub-org" + }, + "gsa-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "gsa-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gsa-crawler" + }, + "gslfbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "gslfbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gslfbot" + }, + "Gwene": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Gwene is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gwene" + }, + "Harvest": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Harvest is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/harvest" + }, + "HawaiiBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "HawaiiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hawaiibot" + }, + "humanlinks": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "humanlinks is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/humanlinks" + }, + "hyscore.io": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "hyscore.io is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hyscore-io" + }, + "IAS crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "IAS crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ias-crawler" + }, + "ICBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ICBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/icbot" + }, + "ichiro": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ichiro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ichiro" + }, + "imrbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "imrbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/imrbot" + }, + "IndeedBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "IndeedBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/indeedbot" + }, + "INETDEX-BOT": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "INETDEX-BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/inetdex-bot" + }, + "InfoNaviRobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "InfoNaviRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/infonavirobot" + }, + "infoobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "infoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/infoobot" + }, + "infoseek": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "infoseek is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/infoseek" + }, + "integromedb": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "integromedb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/integromedb" + }, + "intelium_bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "intelium_bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/intelium-bot" + }, + "InterfaxScanBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "InterfaxScanBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/interfaxscanbot" + }, + "ip-web-crawler.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ip-web-crawler.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ip-web-crawler-com" + }, + "IRLbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "IRLbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/irlbot" + }, + "Iron33": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Iron33 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/iron33" + }, + "iskanie": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "iskanie is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iskanie" + }, + "IsraBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "IsraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/israbot" + }, + "istellabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "istellabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/istellabot" + }, + "it2media-domain-crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "it2media-domain-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/it2media-domain-crawler" + }, + "James BOT": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "James BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/james-bot" + }, + "JamesBOT": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JamesBOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jamesbot" + }, + "Jamie's Spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Jamie's Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jamies-spider" + }, + "JenkersBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JenkersBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jenkersbot" + }, + "JennyBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JennyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jennybot" + }, + "Jetbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Jetbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jetbot" + }, + "Jetty": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Jetty is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jetty" + }, + "JikeSpider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JikeSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jikespider" + }, + "JobboerseBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "JobboerseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jobboersebot" + }, + "Jooblebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Jooblebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jooblebot" + }, + "jpg-newsbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "jpg-newsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jpg-newsbot" + }, + "jyxobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "jyxobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jyxobot" + }, + "k2spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "k2spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/k2spider" + }, + "K7MLWCBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "K7MLWCBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/k7mlwcbot" + }, + "kbcrawl": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "kbcrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kbcrawl" + }, + "Kemvibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Kemvibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/kemvibot" + }, + "Kenjin Spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Kenjin Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kenjin-spider" + }, + "keys-so-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "keys-so-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/keys-so-bot" + }, + "Keyword Density": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Keyword Density is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/keyword-density" + }, + "Knowings": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Knowings is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/knowings" + }, + "KomodiaBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "KomodiaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/komodiabot" + }, + "KosmioBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "KosmioBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kosmiobot" + }, + "Landau-Media-Spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Landau-Media-Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/landau-media-spider" + }, + "larbin": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "larbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/larbin" + }, + "Laserlikebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Laserlikebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/laserlikebot" + }, + "lb-spider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lb-spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lb-spider" + }, + "leadbox": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "leadbox is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/leadbox" + }, + "Leikibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Leikibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/leikibot" + }, + "LexiBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LexiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lexibot" + }, + "libWeb": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "libWeb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/libweb" + }, + "Linespider": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Linespider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linespider" + }, + "Linguee Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Linguee Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linguee-bot" + }, + "linkapediabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "linkapediabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkapediabot" + }, + "LinkArchiver": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkArchiver is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkarchiver" + }, + "LinkCheck by Siteimprove.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkCheck by Siteimprove.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkcheck-by-siteimprove-com" + }, + "linkdex": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "linkdex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkdex" + }, + "LinkextractorPro": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkextractorPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkextractorpro" + }, + "LinkisBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkisBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkisbot" + }, + "linko": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "linko is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linko" + }, + "LinkpadBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkpadBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkpadbot" + }, + "LinkScan": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LinkScan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkscan" + }, + "lipperhey": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lipperhey is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lipperhey" + }, + "LivelapBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LivelapBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/livelapbot" + }, + "lkxscan": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lkxscan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/lkxscan" + }, + "LNSpiderguy": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "LNSpiderguy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lnspiderguy" + }, + "lssbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lssbot" + }, + "lssrocketcrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lssrocketcrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lssrocketcrawler" + }, + "ltx71": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ltx71 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ltx71" + }, + "Luminator-robots": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Luminator-robots is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/luminator-robots" + }, + "lwp-trivial": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "lwp-trivial is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lwp-trivial" + }, + "MaCoCu": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MaCoCu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/macocu" + }, + "mappydata": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mappydata is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mappydata" + }, + "Mata Hari": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Mata Hari is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mata-hari" + }, + "MauiBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MauiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mauibot" + }, + "MBCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MBCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mbcrawler" + }, + "MegaIndex": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MegaIndex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/megaindex" + }, + "MegaIndex.ru": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MegaIndex.ru is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/megaindex-ru" + }, + "Meltawer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Meltawer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/meltawer" + }, + "Meltwater": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Meltwater is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/meltwater" + }, + "MeltwaterNews": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MeltwaterNews is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/meltwaternews" + }, + "memorybot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "memorybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/memorybot" + }, + "mention": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mention is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mention" + }, + "MetaJobBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MetaJobBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/metajobbot" + }, + "MetaURI": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MetaURI is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/metauri" + }, + "MIIxpc": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MIIxpc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/miixpc" + }, + "mindUpBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mindUpBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mindupbot" + }, + "minicrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "minicrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/minicrawler" + }, + "Mister PiX": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Mister PiX is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mister-pix" + }, + "MixnodeCache": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MixnodeCache is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mixnodecache" + }, + "mlbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mlbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mlbot" + }, + "moatbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "moatbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moatbot" + }, + "moget": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "moget is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/moget" + }, + "Mojeek": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Mojeek is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mojeek" + }, + "MoodleBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MoodleBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moodlebot" + }, + "Moreover": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Moreover is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moreover" + }, + "MS Search 4.0 Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MS Search 4.0 Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ms-search-4-0-robot" + }, + "MS Search 6.0 Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MS Search 6.0 Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ms-search-6-0-robot" + }, + "MSIECrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MSIECrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msiecrawler" + }, + "msrbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "msrbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msrbot" + }, + "MTRobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "MTRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mtrobot" + }, + "Multiviewbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Multiviewbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/multiviewbot" + }, + "mytwip": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "mytwip is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mytwip" + }, + "NAVER Blog Rssbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NAVER Blog Rssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/naver-blog-rssbot" + }, + "NaverBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NaverBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/naverbot" + }, + "Neevabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Neevabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/neevabot" + }, + "NerdByNature.Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NerdByNature.Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nerdbynature-bot" + }, + "nerdybot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "nerdybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nerdybot" + }, + "NetAnts": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NetAnts is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/netants" + }, + "netEstate NE Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "netEstate NE Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netestate-ne-crawler" + }, + "Neticle Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Neticle Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/neticle-crawler" + }, + "NetMechanic": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NetMechanic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netmechanic" + }, + "netresearchserver": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "netresearchserver is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/netresearchserver" + }, + "NetSystemsResearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NetSystemsResearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netsystemsresearch" + }, + "newsharecounts": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "newsharecounts is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newsharecounts" + }, + "NewsNow": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NewsNow is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newsnow" + }, + "Newzbin": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Newzbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/newzbin" + }, + "NextGenSearchBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NextGenSearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nextgensearchbot" + }, + "NICErsPRO": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NICErsPRO is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nicerspro" + }, + "niki-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "niki-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/niki-bot" + }, + "NimbleCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NimbleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nimblecrawler" + }, + "Nimbostratus-Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Nimbostratus-Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nimbostratus-bot" + }, + "NINJA bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NINJA bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ninja-bot" + }, + "NIXStatsbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NIXStatsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nixstatsbot" + }, + "NLUX_IAHarvester": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NLUX_IAHarvester is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nlux-iaharvester" + }, + "Nmap Scripting Engine": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Nmap Scripting Engine is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nmap-scripting-engine" + }, + "NPBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NPBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/npbot" + }, + "NTENTbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "NTENTbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ntentbot" + }, + "Nuzzel": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Nuzzel is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nuzzel" + }, + "OdklBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OdklBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/odklbot" + }, + "officestorebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "officestorebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/officestorebot" + }, + "Openbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Openbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openbot" + }, + "Openfind": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Openfind is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/openfind" + }, + "Openfind data gatherer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Openfind data gatherer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openfind-data-gatherer" + }, + "OpenGraphCheck": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OpenGraphCheck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/opengraphcheck" + }, + "OpenHoseBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OpenHoseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openhosebot" + }, + "opinion-tracker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "opinion-tracker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/opinion-tracker" + }, + "Oracle Ultra Search": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Oracle Ultra Search is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/oracle-ultra-search" + }, + "OrangeBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OrangeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/orangebot" + }, + "Orthogaffe": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Orthogaffe is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/orthogaffe" + }, + "outbrain": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "outbrain is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/outbrain" + }, + "OutclicksBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "OutclicksBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/outclicksbot" + }, + "page2rss": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "page2rss is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/page2rss" + }, + "PagePeeker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PagePeeker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pagepeeker" + }, + "PageThing": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PageThing is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/pagething" + }, + "peer39_crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "peer39_crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/peer39-crawler" + }, + "PerMan": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PerMan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/perman" + }, + "Pingdom": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Pingdom is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pingdom" + }, + "Pinterest": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Pinterest is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/pinterest" + }, + "PiplBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PiplBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/piplbot" + }, + "postrank": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "postrank is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/postrank" + }, + "PR-CY.RU": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PR-CY.RU is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pr-cy-ru" + }, + "Primalbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Primalbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/primalbot" + }, + "PrivacyAwareBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "PrivacyAwareBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/privacyawarebot" + }, + "ProPowerBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ProPowerBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/propowerbot" + }, + "ProWebWalker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ProWebWalker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/prowebwalker" + }, + "proxem": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "proxem is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/proxem" + }, + "psbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "psbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/psbot" + }, + "Pulsepoint": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Pulsepoint is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pulsepoint" + }, + "purebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "purebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/purebot" + }, + "QueryN Metasearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "QueryN Metasearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/queryn-metasearch" + }, + "Qwam content intelligence": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Qwam content intelligence is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/qwam-content-intelligence" + }, + "Radiation Retriever 1.1": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Radiation Retriever 1.1 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/radiation-retriever-1-1" + }, + "RankActiveLinkBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RankActiveLinkBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rankactivelinkbot" + }, + "RankFlex": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RankFlex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/rankflex" + }, + "Refindbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Refindbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/refindbot" + }, + "RegionStuttgartBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RegionStuttgartBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/regionstuttgartbot" + }, + "RepoMonkey": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RepoMonkey is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/repomonkey" + }, + "RepoMonkey Bait & Tackle": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RepoMonkey Bait & Tackle is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/repomonkey-bait-tackle" + }, + "RetrevoPageAnalyzer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RetrevoPageAnalyzer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/retrevopageanalyzer" + }, + "ReverseEngineeringBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ReverseEngineeringBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/reverseengineeringbot" + }, + "RidderBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RidderBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ridderbot" + }, + "Riddler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Riddler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/riddler" + }, + "Rivva": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Rivva is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rivva" + }, + "Robozilla": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Robozilla is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/robozilla" + }, + "rssbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "rssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rssbot" + }, + "RSSingBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RSSingBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/rssingbot" + }, + "RukiCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RukiCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rukicrawler" + }, + "RuxitSynthetic": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RuxitSynthetic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ruxitsynthetic" + }, + "RyteBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "RyteBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rytebot" + }, + "SafeDNSBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SafeDNSBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/safednsbot" + }, + "SafeSearch microdata crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SafeSearch microdata crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/safesearch-microdata-crawler" + }, + "SBL-BOT": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SBL-BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sbl-bot" + }, + "score3": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "score3 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/score3" + }, + "ScoutJet": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ScoutJet is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/scoutjet" + }, + "scribdbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "scribdbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scribdbot" + }, + "Scrubby": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Scrubby is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scrubby" + }, + "search.marginalia.nu": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "search.marginalia.nu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/search-marginalia-nu" + }, + "SearchAtlas": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SearchAtlas is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/searchatlas" + }, + "SearchmetricsBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SearchmetricsBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/searchmetricsbot" + }, + "searchpreview": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "searchpreview is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/searchpreview" + }, + "seekbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "seekbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekbot" + }, + "Seekport Crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Seekport Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seekport-crawler" + }, + "Seekr": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Seekr is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekr" + }, + "seewithkids": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "seewithkids is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seewithkids" + }, + "semanticbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "semanticbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semanticbot" + }, + "sempi.tech": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "sempi.tech is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sempi-tech" + }, + "SemrushBot-BM": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SemrushBot-BM is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-bm" + }, + "SemrushBot-SA": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SemrushBot-SA is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-sa" + }, + "sentibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "sentibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sentibot" + }, + "SEOkicks-Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SEOkicks-Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seokicks-robot" + }, + "seoscanners": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "seoscanners is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seoscanners" + }, + "seostar.co": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "seostar.co is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seostar-co" + }, + "SEOstats": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SEOstats is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seostats" + }, + "SimpleCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SimpleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/simplecrawler" + }, + "SimpleScraper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SimpleScraper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplescraper" + }, + "Sindup": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Sindup is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sindup" + }, + "sistrix crawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "sistrix crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sistrix-crawler" + }, + "SiteBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "SiteBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sitebot" + }, + "sitecheck.internetseer.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "sitecheck.internetseer.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitecheck-internetseer-com" + }, + "siteexplorer.info": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "siteexplorer.info is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteexplorer-info" + }, + "Siteimprove": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Siteimprove is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteimprove" + }, + "Siteimprove.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Siteimprove.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/siteimprove-com"
+ },
+ "SiteSnagger": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "SiteSnagger is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitesnagger"
+ },
+ "SiteSucker": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "SiteSucker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitesucker"
+ },
+ "Slack-ImgProxy": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Slack-ImgProxy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slack-imgproxy"
+ },
+ "Slackbot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Slackbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/slackbot"
+ },
+ "Slurp": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Slurp is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slurp"
+ },
+ "SocialRankIOBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "SocialRankIOBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/socialrankiobot"
+ },
+ "Sogou": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Sogou is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou"
+ },
+ "Sogou inst spider": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Sogou inst spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/sogou-inst-spider"
+ },
+ "Sogou spider2": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Sogou spider2 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-spider2"
+ },
+ "Sonic": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Sonic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sonic"
+ },
+ "Sosospider": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Sosospider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sosospider"
+ },
+ "SpankBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "SpankBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/spankbot"
+ },
+ "spanner": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "spanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spanner"
+ },
+ "spbot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "spbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spbot"
+ },
+ "Spinn3r": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Spinn3r is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spinn3r"
+ },
+ "spotter": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "spotter is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/spotter"
+ },
+ "SputnikBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "SputnikBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sputnikbot"
+ },
+ "Storebot-Google": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Storebot-Google is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/storebot-google"
+ },
+ "StorygizeBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "StorygizeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/storygizebot"
+ },
+ "StractBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "StractBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/stractbot"
+ },
+ "Streamline3Bot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Streamline3Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/streamline3bot"
+ },
+ "SummalyBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "SummalyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/summalybot"
+ },
+ "summify": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "summify is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/summify"
+ },
+ "SuperBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "SuperBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/superbot"
+ },
+ "SurveyBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "SurveyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/surveybot"
+ },
+ "suzuran": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "suzuran is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/suzuran"
+ },
+ "Swiftbot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Swiftbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/swiftbot"
+ },
+ "SWIMGBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "SWIMGBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/swimgbot"
+ },
+ "Synthesio": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Synthesio is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/synthesio"
+ },
+ "Sysomos": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Sysomos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sysomos"
+ },
+ "Szukacz": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Szukacz is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/szukacz"
+ },
+ "Taboolabot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Taboolabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/taboolabot"
+ },
+ "tagoobot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "tagoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tagoobot"
+ },
+ "Talkwater": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Talkwater is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/talkwater"
+ },
+ "TangibleeBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "TangibleeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tangibleebot"
+ },
+ "Teleport": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Teleport is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/teleport"
+ },
+ "TeleportPro": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "TeleportPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teleportpro"
+ },
+ "Telesoft": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Telesoft is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/telesoft"
+ },
+ "The Intraformant": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "The Intraformant is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/the-intraformant"
+ },
+ "TheNomad": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "TheNomad is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/thenomad"
+ },
+ "theoldreader.com": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "theoldreader.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/theoldreader-com"
+ },
+ "Thinklab": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Thinklab is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/thinklab"
+ },
+ "tigerbot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "tigerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tigerbot"
+ },
+ "Titan": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Titan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/titan"
+ },
+ "toCrawl": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "toCrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tocrawl"
+ },
+ "TombaPublicWebCrawler": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "TombaPublicWebCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tombapublicwebcrawler"
+ },
+ "toplistbot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "toplistbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/toplistbot"
+ },
+ "ToutiaoSpider": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "ToutiaoSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/toutiaospider"
+ },
+ "Traackr.com": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Traackr.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/traackr-com"
+ },
+ "tracemyfile": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "tracemyfile is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tracemyfile"
+ },
+ "trafilatura": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "trafilatura is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trafilatura"
+ },
+ "trendeo": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "trendeo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/trendeo"
+ },
+ "trendkite-akashic-crawler": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "trendkite-akashic-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendkite-akashic-crawler"
+ },
+ "trendybuzz": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "trendybuzz is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendybuzz"
+ },
+ "trovitBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "trovitBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trovitbot"
+ },
+ "True_Robot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "True_Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/true-robot"
+ },
+ "TruliaBot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "TruliaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/truliabot"
+ },
+ "turingos": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "turingos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turingos"
+ },
+ "tweetedtimes": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "tweetedtimes is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tweetedtimes"
+ },
+ "twengabot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "twengabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/twengabot"
+ },
+ "Twurly": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Twurly is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twurly"
+ },
+ "UbiCrawler": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "UbiCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ubicrawler"
+ },
+ "um-IC": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "um-IC is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/um-ic"
+ },
+ "Updownerbot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Updownerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/updownerbot"
+ },
+ "Upflow": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Upflow is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/upflow"
+ },
+ "Uptime-Kuma": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Uptime-Kuma is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/uptime-kuma"
+ },
+ "Uptimebot.org": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "Uptimebot.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/uptimebot-org"
+ },
+ "UptimeRobot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "UptimeRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/uptimerobot"
+ },
+ "URL Control": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "URL Control is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/url-control"
+ },
+ "URL_Spider_Pro": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "URL_Spider_Pro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/url-spider-pro"
+ },
+ "urlappendbot": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "urlappendbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/urlappendbot"
+ },
+ "URLy Warning": {
+ "operator": "Unclear at this time.",
+ "respect": "Unclear at this time.",
+ "function": "Uncategorized",
+ "frequency": "Unclear at this time.",
+ "description": "URLy Warning is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us.
More info can be found at https://darkvisitors.com/agents/agents/urly-warning" + }, + "usasearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "usasearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/usasearch" + }, + "UsineNouvelleCrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "UsineNouvelleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/usinenouvellecrawler" + }, + "UT-Dorkbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "UT-Dorkbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ut-dorkbot" + }, + "Validator.nu": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Validator.nu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/validator-nu" + }, + "VCI": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "VCI is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vci" + }, + "VCI WebViewer VCI WebViewer Win32": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "VCI WebViewer VCI WebViewer Win32 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vci-webviewer-vci-webviewer-win32" + }, + "vebidoobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "vebidoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vebidoobot" + }, + "vecteurplus": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "vecteurplus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/vecteurplus" + }, + "Veoozbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Veoozbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/veoozbot" + }, + "verticalsearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "verticalsearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/verticalsearch" + }, + "Vigil": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Vigil is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vigil" + }, + "VKRobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "VKRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/vkrobot" + }, + "voilabot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "voilabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/voilabot" + }, + "voltron": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "voltron is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/voltron" + }, + "VoluumDSP-content-bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "VoluumDSP-content-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/voluumdsp-content-bot" + }, + "vsw": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "vsw is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/vsw" + }, + "vuhuvBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "vuhuvBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vuhuvbot" + }, + "W3C_I18n-Checker": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "W3C_I18n-Checker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-i18n-checker" + }, + "W3C_Unicorn": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "W3C_Unicorn is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-unicorn" + }, + "W3C-checklink": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "W3C-checklink is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/w3c-checklink" + }, + "W3C-mobileOK": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "W3C-mobileOK is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-mobileok" + }, + "WASALive-Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WASALive-Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wasalive-bot" + }, + "wbsearchbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "wbsearchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wbsearchbot" + }, + "Web Image Collector": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Web Image Collector is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/web-image-collector" + }, + "web-archive-net.com.bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "web-archive-net.com.bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/web-archive-net-com-bot" + }, + "WebAuto": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebAuto is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webauto" + }, + "WebBandit": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebBandit is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webbandit" + }, + "WebCapture 2.0": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebCapture 2.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webcapture-2-0" + }, + "webcompanycrawler": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "webcompanycrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcompanycrawler" + }, + "WebCopier": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebCopier is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcopier" + }, + "WebCopier v.2.2": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebCopier v.2.2 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcopier-v-2-2" + }, + "WebCopier v3.2a": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebCopier v3.2a is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webcopier-v3-2a" + }, + "WebDataStats": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebDataStats is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webdatastats" + }, + "WebEnhancer": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebEnhancer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webenhancer" + }, + "WebmasterWorldForumBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebmasterWorldForumBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webmasterworldforumbot" + }, + "webmon": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "webmon is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webmon" + }, + "WebReaper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebReaper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webreaper" + }, + "WebSauger": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebSauger is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/websauger" + }, + "Website Quester": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Website Quester is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/website-quester" + }, + "WebStripper": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebStripper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webstripper" + }, + "WebZIP": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WebZIP is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webzip" + }, + "winello": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "winello is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/winello" + }, + "WinHTTrack": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WinHTTrack is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/winhttrack" + }, + "WiseGuys Robot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WiseGuys Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/wiseguys-robot" + }, + "wocbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "wocbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wocbot" + }, + "woobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "woobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/woobot" + }, + "woorankreview": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "woorankreview is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/woorankreview" + }, + "WordupInfoSearch": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WordupInfoSearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/wordupinfosearch" + }, + "woriobot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "woriobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/woriobot" + }, + "wotbox": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "wotbox is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wotbox" + }, + "WWW-Collector-E": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WWW-Collector-E is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/www-collector-e" + }, + "WWW-Mechanize": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "WWW-Mechanize is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/www-mechanize" + }, + "www.uptime.com": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "www.uptime.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/www-uptime-com" + }, + "Xenu": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Xenu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenu" + }, + "Xenu Link Sleuth": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Xenu Link Sleuth is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenu-link-sleuth" + }, + "Xenu's": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Xenu's is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/xenus" + }, + "Xenu's Link Sleuth 1.1c": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Xenu's Link Sleuth 1.1c is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenus-link-sleuth-1-1c" + }, + "xovibot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "xovibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xovibot" + }, + "Yahoo Pipes 1.0": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Yahoo Pipes 1.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-pipes-1-0" + }, + "YaK": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "YaK is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yak" + }, + "YandexMobileBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "YandexMobileBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexmobilebot" + }, + "YandexVideo": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "YandexVideo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexvideo" + }, + "yanga": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "yanga is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yanga" + }, + "Yellowbrandprotectionbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Yellowbrandprotectionbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yellowbrandprotectionbot" + }, + "yoozBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "yoozBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yoozbot" + }, + "YoudaoBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "YoudaoBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/youdaobot" + }, + "Youmag": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Youmag is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/youmag" + }, + "Zabbix": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zabbix is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zabbix" + }, + "Zao": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zao is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zao" + }, + "Zealbot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zealbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zealbot" + }, + "zenback bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "zenback bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zenback-bot" + }, + "Zeus": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zeus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zeus" + }, + "Zeus Link Scout": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zeus Link Scout is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zeus-link-scout" + }, + "zgrab": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "zgrab is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zgrab" + }, + "Zite": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "Zite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zite" + }, + "ZuperlistBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ZuperlistBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zuperlistbot" + }, + "ZyBORG": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "Uncategorized", + "frequency": "Unclear at this time.", + "description": "ZyBORG is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zyborg" } } \ No newline at end of file From 944bee0f5655f0f4bf5dd413d5ea8ba3677ae6aa Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Wed, 7 Aug 2024 11:31:58 +0100 Subject: [PATCH 057/249] call main after update --- .github/workflows/daily_update.yml | 7 +- .github/workflows/main.yml | 1 + code/dark_visitors.py | 2 + robots.json | 5446 +--------------------------- 4 files changed, 50 insertions(+), 5406 deletions(-) diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml index 1901520..9b0b4a2 100644 --- a/.github/workflows/daily_update.yml +++ b/.github/workflows/daily_update.yml @@ -1,7 +1,7 @@ name: Daily Update from Dark Visitors on: schedule: - - cron: "0 0 * * *" + - cron: "*/10 * * * *" jobs: dark-visitors: @@ -18,4 +18,7 @@ jobs: python code/dark_visitors.py git add -A git diff --quiet && git diff --staged --quiet || (git commit -m "Daily update from Dark Visitors" && git push) - shell: bash \ No newline at end of file + shell: bash + call-main: + uses: ./.github/workflows/main.yml + secrets: inherit \ No newline at end of file diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index ca5efd2..c3f3f57 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -1,4 +1,5 @@ on: + workflow_call: push: paths: - 'robots.json' diff --git a/code/dark_visitors.py b/code/dark_visitors.py index 3b9775b..e6f9c2e 100644 --- a/code/dark_visitors.py +++ b/code/dark_visitors.py @@ -26,6 
+26,8 @@ to_include = [ for section in soup.find_all("div", {"class": "agent-links-section"}): category = section.find("h2").get_text() + if category not in to_include: + continue for agent in section.find_all("a", href=True): name = agent.find("div", {"class": "agent-name"}).get_text().strip() desc = agent.find("p").get_text().strip() diff --git a/robots.json b/robots.json index 00b594f..ef8b335 100644 --- a/robots.json +++ b/robots.json @@ -7,14 +7,14 @@ "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses." }, "anthropic-ai": { - "operator": "[Anthropic](https://www.anthropic.com)", + "operator": "[Anthropic](https:\/\/www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Applebot-Extended": { - "operator": "[Apple](https://support.apple.com/en-us/119829#datausage)", + "operator": "[Apple](https:\/\/support.apple.com\/en-us\/119829#datausage)", "respect": "Yes", "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", "frequency": "Unclear at this time.", @@ -28,5554 +28,192 @@ "description": "Downloads data to train LLMS, including ChatGPT competitors." }, "CCBot": { - "operator": "[Common Crawl](https://commoncrawl.org)", - "respect": "[Yes](https://commoncrawl.org/ccbot)", + "operator": "[Common Crawl](https:\/\/commoncrawl.org)", + "respect": "[Yes](https:\/\/commoncrawl.org\/ccbot)", "function": "Provides crawl data for an open source repository that has been used to train LLMs.", "frequency": "Unclear at this time.", "description": "Sources data that is made openly available and is used to train AI models." 
}, "ChatGPT-User": { - "operator": "[OpenAI](https://openai.com)", + "operator": "[OpenAI](https:\/\/openai.com)", "respect": "Yes", "function": "Takes action based on user prompts.", "frequency": "Only when prompted by a user.", "description": "Used by plugins in ChatGPT to answer queries based on user input." }, "ClaudeBot": { - "operator": "[Anthropic](https://www.anthropic.com)", + "operator": "[Anthropic](https:\/\/www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Claude-Web": { - "operator": "[Anthropic](https://www.anthropic.com)", + "operator": "[Anthropic](https:\/\/www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "cohere-ai": { - "operator": "[Cohere](https://cohere.com)", + "operator": "[Cohere](https:\/\/cohere.com)", "respect": "Unclear at this time.", "function": "Retrieves data to provide responses to user-initiated prompts.", "frequency": "Takes action based on user prompts.", "description": "Retrieves data based on user prompts." }, "Diffbot": { - "operator": "[Diffbot](https://www.diffbot.com/)", + "operator": "[Diffbot](https:\/\/www.diffbot.com\/)", "respect": "At the discretion of Diffbot users.", "function": "Aggregates structured web data for monitoring and AI model training.", "frequency": "Unclear at this time.", "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training." 
}, "FacebookBot": { - "operator": "Meta/Facebook", - "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", + "operator": "Meta\/Facebook", + "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", "function": "Training language models", "frequency": "Up to 1 page per second", "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." }, "facebookexternalhit": { - "operator": "Meta/Facebook", - "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", - "function": "Fetchers", + "operator": "Meta\/Facebook", + "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", + "function": "No information.", "frequency": "Unclear at this time.", - "description": "facebookexternalhit is a fetcher operated by Meta. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/facebookexternalhit" + "description": "Unclear at this time." }, "FriendlyCrawler": { "operator": "Unknown", - "respect": "[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)", + "respect": "[Yes](https:\/\/imho.alex-kunz.com\/2024\/01\/25\/an-update-on-friendly-crawler)", "function": "We are using the data from the crawler to build datasets for machine learning experiments.", "frequency": "Unclear at this time.", "description": "Unclear who the operator is; but data is used for training/machine learning." 
}, "Google-Extended": { "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", "function": "LLM training.", "frequency": "No information.", "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search." }, "GoogleOther": { "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Image": { "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Video": { "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", + "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development.\"" }, "GPTBot": { - "operator": "[OpenAI](https://openai.com)", + "operator": "[OpenAI](https:\/\/openai.com)", "respect": "Yes", "function": "Scrapes data to train OpenAI's products.", "frequency": "No information.", "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." }, "ICC-Crawler": { - "operator": "[NICT](https://nict.go.jp)", + "operator": "[NICT](https:\/\/nict.go.jp)", "respect": "Yes", "function": "Scrapes data to train and support AI technologies.", "frequency": "No information.", "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business." }, "ImagesiftBot": { - "operator": "[ImageSift](https://imagesift.com)", - "respect": "[Yes](https://imagesift.com/about)", + "operator": "[ImageSift](https:\/\/imagesift.com)", + "respect": "[Yes](https:\/\/imagesift.com\/about)", "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products", "frequency": "No information.", "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images." }, "img2dataset": { - "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", + "operator": "[img2dataset](https:\/\/github.com\/rom1504\/img2dataset)", "respect": "Unclear at this time.", "function": "Scrapes images for use in LLMs.", "frequency": "At the discretion of img2dataset users.", "description": "Downloads large sets of images into datasets for LLM training or other purposes." 
}, "Meta-ExternalAgent": { - "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)", + "operator": "[Meta](https:\/\/developers.facebook.com\/docs\/sharing\/webmasters\/web-crawlers)", "respect": "Yes.", "function": "Used to train models and improve products.", "frequency": "No information.", "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"" }, "OAI-SearchBot": { - "operator": "[OpenAI](https://openai.com)", - "respect": "[Yes](https://platform.openai.com/docs/bots)", + "operator": "[OpenAI](https:\/\/openai.com)", + "respect": "[Yes](https:\/\/platform.openai.com\/docs\/bots)", "function": "Search result generation.", "frequency": "No information.", "description": "Crawls sites to surface as results in SearchGPT." }, "omgili": { - "operator": "[Webz.io](https://webz.io/)", - "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)", + "operator": "[Webz.io](https:\/\/webz.io\/)", + "respect": "[Yes](https:\/\/webz.io\/blog\/web-data\/what-is-the-omgili-bot-and-why-is-it-crawling-your-website\/)", "function": "Data is sold.", "frequency": "No information.", "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training." }, "omgilibot": { - "operator": "[Webz.io](https://webz.io/)", - "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)", + "operator": "[Webz.io](https:\/\/webz.io\/)", + "respect": "[Yes](https:\/\/web.archive.org\/web\/20170704003301\/http:\/\/omgili.com\/Crawler.html)", "function": "Data is sold.", "frequency": "No information.", "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io." 
}, "PerplexityBot": { - "operator": "[Perplexity](https://www.perplexity.ai/)", - "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", + "operator": "[Perplexity](https:\/\/www.perplexity.ai\/)", + "respect": "[No](https:\/\/www.macstories.net\/stories\/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler\/)", "function": "Used to answer queries at the request of users.", "frequency": "Takes action based on user prompts.", "description": "Operated by Perplexity to obtain results in response to user queries." }, "PetalBot": { - "operator": "[Huawei](https://huawei.com/)", + "operator": "[Huawei](https:\/\/huawei.com\/)", "respect": "Yes", "function": "Used to provide recommendations in Hauwei assistant and AI search services.", "frequency": "No explicit frequency provided.", "description": "Operated by Huawei to provide search and AI assistant services." }, "Scrapy": { - "operator": "[Zyte](https://www.zyte.com)", + "operator": "[Zyte](https:\/\/www.zyte.com)", "respect": "Unclear at this time.", "function": "Scrapes data a variety of uses including training AI.", "frequency": "No information.", "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"" }, "Timpibot": { - "operator": "[Timpi](https://timpi.io)", + "operator": "[Timpi](https:\/\/timpi.io)", "respect": "Unclear at this time.", "function": "Scrapes data for use in training LLMs.", "frequency": "No information.", "description": "Makes data available for training AI models." 
}, "VelenPublicWebCrawler": { - "operator": "[Velen Crawler](https://velen.io)", - "respect": "[Yes](https://velen.io)", + "operator": "[Velen Crawler](https:\/\/velen.io)", + "respect": "[Yes](https:\/\/velen.io)", "function": "Scrapes data for business data sets and machine learning models.", "frequency": "No information.", "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"" }, "YouBot": { - "operator": "[You](https://about.you.com/youchat/)", - "respect": "[Yes](https://about.you.com/youbot/)", + "operator": "[You](https:\/\/about.you.com\/youchat\/)", + "respect": "[Yes](https:\/\/about.you.com\/youbot\/)", "function": "Scrapes data for search engine and LLMs.", "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." - }, - "Meta-ExternalFetcher": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "AI Assistants", - "frequency": "Unclear at this time.", - "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher" - }, - "Applebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "AI Search Crawlers", - "frequency": "Unclear at this time.", - "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot" - }, - "archive.org_bot": { - "operator": "Internet Archive", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "archive.org_bot is an archiver operated by Internet Archive. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/archive-org-bot" - }, - "Arquivo-web-crawler": { - "operator": "Arquivo", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "Arquivo-web-crawler is an archiver operated by Arquivo.pt. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/arquivo-web-crawler" - }, - "heritrix": { - "operator": "Internet Archive", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "heritrix is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/heritrix" - }, - "ia_archiver": { - "operator": "Internet Archive", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "ia_archiver is an archiver operated by Internet Archive. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ia-archiver" - }, - "ia_archiver-web.archive.org": { - "operator": "Internet Archive", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "ia_archiver-web.archive.org is an archiver operated by Internet Archive. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ia-archiver-web-archive-org" - }, - "Nicecrawler": { - "operator": "NiceCrawler", - "respect": "Unclear at this time.", - "function": "Archivers", - "frequency": "Unclear at this time.", - "description": "Nicecrawler is an archiver operated by NiceCrawler. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nicecrawler" - }, - "2ip bot": { - "operator": "2IP", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "2ip bot is a developer helper operated by 2IP. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/2ip-bot" - }, - "AhrefsSiteAudit": { - "operator": "Ahrefs", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "AhrefsSiteAudit is a developer helper operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ahrefssiteaudit" - }, - "BingPreview": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "BingPreview is a developer helper operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bingpreview" - }, - "Chrome-Lighthouse": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "Chrome-Lighthouse is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/chrome-lighthouse" - }, - "Dark Visitor": { - "operator": "Dark Visitors", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "Dark Visitor is a developer helper operated by Dark Visitors. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dark-visitor" - }, - "deadlinkchecker": { - "operator": "Dead Link Checker", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "deadlinkchecker is a developer helper operated by Dead Link Checker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deadlinkchecker" - }, - "Google-InspectionTool": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "Google-InspectionTool is a developer helper operated by Google. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-inspectiontool" - }, - "rogerbot": { - "operator": "Moz", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "rogerbot is a developer helper operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rogerbot" - }, - "SiteAuditBot": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "SiteAuditBot is a developer helper operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteauditbot" - }, - "t3versionsBot": { - "operator": "T3Versions", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "t3versionsBot is a developer helper operated by T3Versions. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/t3versionsbot" - }, - "W3C_CSS_Validator": { - "operator": "W3C", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "W3C_CSS_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-css-validator" - }, - "W3C_Validator": { - "operator": "W3C", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "W3C_Validator is a developer helper operated by W3C. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-validator" - }, - "WellKnownBot": { - "operator": "Well-Known", - "respect": "Unclear at this time.", - "function": "Developer Helpers", - "frequency": "Unclear at this time.", - "description": "WellKnownBot is a developer helper operated by Well-Known. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wellknownbot" - }, - "BazQux": { - "operator": "BazQux", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "BazQux is a fetcher operated by BazQux. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bazqux" - }, - "bitlybot": { - "operator": "Bitly", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "bitlybot is a fetcher operated by Bitly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bitlybot" - }, - "BublupBot": { - "operator": "Bublup", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "BublupBot is a fetcher operated by Bublup. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bublupbot" - }, - "Discordbot": { - "operator": "Discord", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Discordbot is a fetcher operated by Discord. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/discordbot" - }, - "Embedly": { - "operator": "Embedly", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Embedly is a fetcher operated by Embedly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/embedly" - }, - "Feedly": { - "operator": "Feedly", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Feedly is a fetcher operated by Feedly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/feedly" - }, - "FlipboardProxy": { - "operator": "Flipboard", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "FlipboardProxy is a fetcher operated by Flipboard. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/flipboardproxy" - }, - "FreshRSS": { - "operator": "FreshRSS", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "FreshRSS is a fetcher operated by FreshRSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freshrss" - }, - "Friendica": { - "operator": "Friendica", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Friendica is a fetcher operated by Friendica. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/friendica" - }, - "Google Web Preview": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Google Web Preview is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/google-web-preview" - }, - "Google-Read-Aloud": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Google-Read-Aloud is a fetcher operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-read-aloud" - }, - "Hatena": { - "operator": "Hatena", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Hatena is a fetcher operated by Hatena. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hatena" - }, - "Iframely": { - "operator": "Iframely", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Iframely is a fetcher operated by Iframely. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iframely" - }, - "inoreader": { - "operator": "Inoreader", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "inoreader is a fetcher operated by Inoreader. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/inoreader" - }, - "LinkedInBot": { - "operator": "LinkedIn", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "LinkedInBot is a fetcher operated by LinkedIn. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkedinbot" - }, - "Mail.RU_Bot": { - "operator": "VK", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Mail.RU_Bot is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mail-ru-bot" - }, - "Mastodon": { - "operator": "Mastodon", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Mastodon is a fetcher operated by Mastodon. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mastodon" - }, - "Miniflux": { - "operator": "Miniflux", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Miniflux is a fetcher operated by Miniflux. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/miniflux" - }, - "NewsBlur": { - "operator": "NewsBlur", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "NewsBlur is a fetcher operated by NewsBlur. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newsblur" - }, - "Nextcloud": { - "operator": "Nextcloud", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Nextcloud is a fetcher operated by Nextcloud. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nextcloud" - }, - "Pinterestbot": { - "operator": "Pinterest", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Pinterestbot is a fetcher operated by Pinterest. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pinterestbot" - }, - "PocketParser": { - "operator": "Pocket", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "PocketParser is a fetcher operated by Pocket. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/pocketparser" - }, - "redditbot": { - "operator": "Reddit", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "redditbot is a fetcher operated by Reddit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/redditbot" - }, - "SerendeputyBot": { - "operator": "Serendeputy", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "SerendeputyBot is a fetcher operated by Serendeputy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/serendeputybot" - }, - "SimplePie": { - "operator": "SimplePie", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "SimplePie is a fetcher operated by SimplePie. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplepie" - }, - "SkypeUriPreview": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "SkypeUriPreview is a fetcher operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/skypeuripreview" - }, - "Slackbot-LinkExpanding": { - "operator": "Slack", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Slackbot-LinkExpanding is a fetcher operated by Slack. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slackbot-linkexpanding" - }, - "Snap URL Preview Service": { - "operator": "Snap", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Snap URL Preview Service is a fetcher operated by Snap. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/snap-url-preview-service" - }, - "snapchat": { - "operator": "Snapchat", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "snapchat is a fetcher operated by Snapchat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/snapchat" - }, - "startmebot": { - "operator": "Start", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "startmebot is a fetcher operated by Start.me. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/startmebot" - }, - "Superfeedr": { - "operator": "Superfeedr", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Superfeedr is a fetcher operated by Superfeedr. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/superfeedr" - }, - "SurdotlyBot": { - "operator": "Sur", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "SurdotlyBot is a fetcher operated by Sur.ly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/surdotlybot" - }, - "Synapse": { - "operator": "Matrix", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Synapse is a fetcher operated by Matrix. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/synapse" - }, - "TelegramBot": { - "operator": "Telegram", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "TelegramBot is a fetcher operated by Telegram. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/telegrambot" - }, - "Tiny Tiny RSS": { - "operator": "Tiny Tiny RSS", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Tiny Tiny RSS is a fetcher operated by Tiny Tiny RSS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tiny-tiny-rss" - }, - "Twitterbot": { - "operator": "X", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Twitterbot is a fetcher operated by X. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twitterbot" - }, - "Viber": { - "operator": "Viber", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Viber is a fetcher operated by Viber. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/viber" - }, - "vkShare": { - "operator": "VK", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "vkShare is a fetcher operated by VK. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/vkshare" - }, - "WhatsApp": { - "operator": "Meta", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "WhatsApp is a fetcher operated by Meta. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/whatsapp" - }, - "Yahoo Link Preview": { - "operator": "Yahoo", - "respect": "Unclear at this time.", - "function": "Fetchers", - "frequency": "Unclear at this time.", - "description": "Yahoo Link Preview is a fetcher operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-link-preview" - }, - "adbeat_bot": { - "operator": "Adbeat", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "adbeat_bot is an intelligence gatherer operated by Adbeat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adbeat-bot" - }, - "AdsBot-Google": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AdsBot-Google is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/adsbot-google" - }, - "AdsBot-Google-Mobile": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AdsBot-Google-Mobile is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adsbot-google-mobile" - }, - "aiHitBot": { - "operator": "aiHit", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "aiHitBot is an intelligence gatherer operated by aiHit. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aihitbot" - }, - "AndersPinkBot": { - "operator": "Anders Pink", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AndersPinkBot is an intelligence gatherer operated by Anders Pink. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/anderspinkbot" - }, - "ArchiveBot": { - "operator": "Wikimedia", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "ArchiveBot is an intelligence gatherer operated by Wikimedia. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/archivebot" - }, - "AwarioBot": { - "operator": "Awario", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AwarioBot is an intelligence gatherer operated by Awario. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariobot" - }, - "AwarioSmartBot": { - "operator": "Awario", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "AwarioSmartBot is an intelligence gatherer operated by Awario. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariosmartbot" - }, - "BitSightBot": { - "operator": "Bitsight", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "BitSightBot is an intelligence gatherer operated by Bitsight. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitsightbot" - }, - "Blackboard": { - "operator": "Anthology", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Blackboard is an intelligence gatherer operated by Anthology. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/blackboard" - }, - "BrandVerity": { - "operator": "BrandVerity", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "BrandVerity is an intelligence gatherer operated by BrandVerity. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brandverity" - }, - "Cincraw": { - "operator": "CINC", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Cincraw is an intelligence gatherer operated by CINC. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cincraw" - }, - "ev-crawler": { - "operator": "Headline", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "ev-crawler is an intelligence gatherer operated by Headline. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ev-crawler" - }, - "Google-Safety": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Google-Safety is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/google-safety" - }, - "HubSpot": { - "operator": "HubSpot", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "HubSpot is an intelligence gatherer operated by HubSpot. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hubspot" - }, - "IonCrawl": { - "operator": "IONOS", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "IonCrawl is an intelligence gatherer operated by IONOS. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ioncrawl" - }, - "Jugendschutzprogramm-Crawler": { - "operator": "JusProg", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Jugendschutzprogramm-Crawler is an intelligence gatherer operated by JusProg. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jugendschutzprogramm-crawler" - }, - "KStandBot": { - "operator": "URL Classification", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "KStandBot is an intelligence gatherer operated by URL Classification. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kstandbot" - }, - "LightspeedSystemsCrawler": { - "operator": "Lightspeed Systems", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "LightspeedSystemsCrawler is an intelligence gatherer operated by Lightspeed Systems. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lightspeedsystemscrawler" - }, - "linkfluence": { - "operator": "Meltwater", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "linkfluence is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkfluence" - }, - "LinkWalker": { - "operator": "Fortra", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "LinkWalker is an intelligence gatherer operated by Fortra. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkwalker" - }, - "magpie-crawler": { - "operator": "Brandwatch", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "magpie-crawler is an intelligence gatherer operated by Brandwatch. 
It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/magpie-crawler" - }, - "Mediapartners-Google": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Mediapartners-Google is an intelligence gatherer operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediapartners-google" - }, - "Mediatoolkitbot": { - "operator": "Determ", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Mediatoolkitbot is an intelligence gatherer operated by Determ. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mediatoolkitbot" - }, - "MuckRack": { - "operator": "Muck Rack", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "MuckRack is an intelligence gatherer operated by Muck Rack. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/muckrack" - }, - "NetcraftSurveyAgent": { - "operator": "Netcraft", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "NetcraftSurveyAgent is an intelligence gatherer operated by Netcraft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netcraftsurveyagent" - }, - "Netvibes": { - "operator": "Netvibes", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Netvibes is an intelligence gatherer operated by Netvibes. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netvibes" - }, - "Pandalytics": { - "operator": "Domainsbot", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Pandalytics is an intelligence gatherer operated by Domainsbot. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pandalytics" - }, - "panscient.com": { - "operator": "Panscient", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "panscient.com is an intelligence gatherer operated by Panscient. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/panscient-com" - }, - "proximic": { - "operator": "Comscore", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "proximic is an intelligence gatherer operated by Comscore. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/proximic" - }, - "scoop.it": { - "operator": "Meltwater", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "scoop.it is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scoop-it" - }, - "SeekportBot": { - "operator": "Seekport", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "SeekportBot is an intelligence gatherer operated by Seekport. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekportbot" - }, - "SMTBot": { - "operator": "SimilarTech", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "SMTBot is an intelligence gatherer operated by SimilarTech. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/smtbot" - }, - "trendictionbot": { - "operator": "Trendiction", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "trendictionbot is an intelligence gatherer operated by Trendiction. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendictionbot" - }, - "TrendsmapResolver": { - "operator": "Trendsmap", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "TrendsmapResolver is an intelligence gatherer operated by Trendsmap. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendsmapresolver" - }, - "Turnitin": { - "operator": "Turnitin", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Turnitin is an intelligence gatherer operated by Turnitin. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turnitin" - }, - "TurnitinBot": { - "operator": "Turnitin", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "TurnitinBot is an intelligence gatherer operated by Turnitin. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/turnitinbot" - }, - "TweetmemeBot": { - "operator": "Meltwater", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "TweetmemeBot is an intelligence gatherer operated by Meltwater. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tweetmemebot" - }, - "Twingly": { - "operator": "Twingly", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "Twingly is an intelligence gatherer operated by Twingly. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twingly" - }, - "um-LN": { - "operator": "Ubermetrics", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "um-LN is an intelligence gatherer operated by Ubermetrics. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/um-ln" - }, - "virustotal": { - "operator": "VirusTotal", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "virustotal is an intelligence gatherer operated by VirusTotal. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/virustotal" - }, - "ZoominfoBot": { - "operator": "ZoomInfo", - "respect": "Unclear at this time.", - "function": "Intelligence Gatherers", - "frequency": "Unclear at this time.", - "description": "ZoominfoBot is an intelligence gatherer operated by ZoomInfo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zoominfobot" - }, - "008": { - "operator": "80legs", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "008 is a scraper operated by 80legs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/008" - }, - "Dataprovider.com": { - "operator": "Dataprovider", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "Dataprovider.com is a scraper operated by Dataprovider.com. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dataprovider-com" - }, - "dcrawl": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "dcrawl is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/dcrawl" - }, - "HTTrack": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "HTTrack is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/httrack" - }, - "HTTrack 3.0": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "HTTrack 3.0 is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/httrack-3-0" - }, - "MetaInspector": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "MetaInspector is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/metainspector" - }, - "newspaper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "newspaper is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/newspaper" - }, - "Nutch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "Nutch is a scraper. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nutch" - }, - "Offline Explorer": { - "operator": "MetaProducts", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "Offline Explorer is a scraper operated by MetaProducts. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/offline-explorer" - }, - "OpenindexSpider": { - "operator": "Openindex", - "respect": "Unclear at this time.", - "function": "Scrapers", - "frequency": "Unclear at this time.", - "description": "OpenindexSpider is a scraper operated by Openindex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openindexspider" - }, - "360Spider": { - "operator": "Qihoo 360", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "360Spider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/360spider" - }, - "AlexandriaOrgBot": { - "operator": "Alexandria", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "AlexandriaOrgBot is a search engine crawler operated by Alexandria.org. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexandriaorgbot" - }, - "Atom Feed Robot": { - "operator": "RSSMicro", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Atom Feed Robot is a search engine crawler operated by RSSMicro. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/atom-feed-robot" - }, - "Baiduspider": { - "operator": "Baidu", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Baiduspider is a search engine crawler operated by Baidu. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider" - }, - "bingbot": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "bingbot is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bingbot" - }, - "coccocbot-web": { - "operator": "Coc Coc", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "coccocbot-web is a search engine crawler operated by Coc Coc. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccocbot-web" - }, - "Daum": { - "operator": "Daum", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Daum is a search engine crawler operated by Daum. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/daum" - }, - "DuckDuckBot": { - "operator": "DuckDuckGo", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "DuckDuckBot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/duckduckbot" - }, - "DuckDuckGo-Favicons-Bot": { - "operator": "DuckDuckGo", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "DuckDuckGo-Favicons-Bot is a search engine crawler operated by DuckDuckGo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/duckduckgo-favicons-bot" - }, - "Feedfetcher-Google": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Feedfetcher-Google is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedfetcher-google" - }, - "Google Favicon": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Google Favicon is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-favicon" - }, - "Googlebot": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot" - }, - "Googlebot-Image": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot-Image is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/googlebot-image" - }, - "Googlebot-Mobile": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot-Mobile is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-mobile" - }, - "Googlebot-News": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot-News is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-news" - }, - "Googlebot-Video": { - "operator": "Google", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Googlebot-Video is a search engine crawler operated by Google. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/googlebot-video" - }, - "HaoSouSpider": { - "operator": "Qihoo 360", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "HaoSouSpider is a search engine crawler operated by Qihoo 360. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/haosouspider" - }, - "MojeekBot": { - "operator": "Mojeek", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "MojeekBot is a search engine crawler operated by Mojeek. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mojeekbot" - }, - "msnbot": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "msnbot is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msnbot" - }, - "msnbot-media": { - "operator": "Microsoft", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "msnbot-media is a search engine crawler operated by Microsoft. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msnbot-media" - }, - "Qwantify": { - "operator": "Qwant", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Qwantify is a search engine crawler operated by Qwant. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/qwantify" - }, - "SemanticScholarBot": { - "operator": "AI2", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "SemanticScholarBot is a search engine crawler operated by AI2. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semanticscholarbot" - }, - "SeznamBot": { - "operator": "Senzam", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "SeznamBot is a search engine crawler operated by Senzam. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seznambot" - }, - "Sogou web spider": { - "operator": "Sogou", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Sogou web spider is a search engine crawler operated by Sogou. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-web-spider" - }, - "teoma": { - "operator": "Ask", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "teoma is a search engine crawler operated by Ask. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/teoma" - }, - "TinEye": { - "operator": "TinEye", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "TinEye is a search engine crawler operated by TinEye. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tineye" - }, - "TinEye-bot": { - "operator": "TinEye", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "TinEye-bot is a search engine crawler operated by TinEye. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tineye-bot" - }, - "yacybot": { - "operator": "YaCy", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "yacybot is a search engine crawler operated by YaCy. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yacybot" - }, - "Yahoo! Slurp": { - "operator": "Yahoo", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Yahoo! Slurp is a search engine crawler operated by Yahoo. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yahoo-slurp" - }, - "Yandex": { - "operator": "Yandex", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Yandex is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandex" - }, - "YandexBot": { - "operator": "Yandex", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "YandexBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexbot" - }, - "YandexImages": { - "operator": "Yandex", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "YandexImages is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandeximages" - }, - "YandexRenderResourcesBot": { - "operator": "Yandex", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "YandexRenderResourcesBot is a search engine crawler operated by Yandex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yandexrenderresourcesbot" - }, - "Yeti": { - "operator": "Naver", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "Yeti is a search engine crawler operated by Naver. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yeti" - }, - "YisouSpider": { - "operator": "Yisou", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "YisouSpider is a search engine crawler operated by Yisou. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yisouspider" - }, - "ZumBot": { - "operator": "ZUM Internet", - "respect": "Unclear at this time.", - "function": "Search Engine Crawlers", - "frequency": "Unclear at this time.", - "description": "ZumBot is a search engine crawler operated by ZUM Internet. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zumbot" - }, - "AhrefsBot": { - "operator": "Ahrefs", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "AhrefsBot is an SEO crawler operated by Ahrefs. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ahrefsbot" - }, - "Barkrowler": { - "operator": "Babbar", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "Barkrowler is an SEO crawler operated by Babbar. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/barkrowler" - }, - "BLEXBot": { - "operator": "SEO PowerSuite", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "BLEXBot is an SEO crawler operated by SEO PowerSuite. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blexbot" - }, - "BrightEdge Crawler": { - "operator": "BrightEdge", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "BrightEdge Crawler is an SEO crawler operated by BrightEdge. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brightedge-crawler" - }, - "Cocolyzebot": { - "operator": "Cocolyze", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "Cocolyzebot is an SEO crawler operated by Cocolyze. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cocolyzebot" - }, - "DataForSeoBot": { - "operator": "DataForSEO", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "DataForSeoBot is an SEO crawler operated by DataForSEO. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dataforseobot" - }, - "DomainStatsBot": { - "operator": "Domainstats", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "DomainStatsBot is an SEO crawler operated by Domainstats. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domainstatsbot" - }, - "dotbot": { - "operator": "Moz", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "dotbot is an SEO crawler operated by Moz. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dotbot" - }, - "hypestat": { - "operator": "HypeStat", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "hypestat is an SEO crawler operated by HypeStat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/hypestat" - }, - "linkdexbot": { - "operator": "Linkdex", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "linkdexbot is an SEO crawler operated by Linkdex. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkdexbot" - }, - "MJ12bot": { - "operator": "Majestic", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "MJ12bot is an SEO crawler operated by Majestic. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mj12bot" - }, - "online-webceo-bot": { - "operator": "WebCEO", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "online-webceo-bot is an SEO crawler operated by WebCEO. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/online-webceo-bot" - }, - "Screaming Frog SEO Spider": { - "operator": "Screaming Frog", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "Screaming Frog SEO Spider is an SEO crawler operated by Screaming Frog. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/screaming-frog-seo-spider" - }, - "SemrushBot": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot" - }, - "SemrushBot-BA": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot-BA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ba" - }, - "SemrushBot-CT": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot-CT is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-ct" - }, - "SemrushBot-SI": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot-SI is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/semrushbot-si" - }, - "SemrushBot-SWA": { - "operator": "Semrush", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SemrushBot-SWA is an SEO crawler operated by Semrush. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-swa" - }, - "SenutoBot": { - "operator": "Senuto", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SenutoBot is an SEO crawler operated by Senuto. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/senutobot" - }, - "SeobilityBot": { - "operator": "Seobility", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SeobilityBot is an SEO crawler operated by Seobility. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seobilitybot" - }, - "SEOkicks": { - "operator": "SEOkicks", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SEOkicks is an SEO crawler operated by SEOkicks. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seokicks" - }, - "SEOlizer": { - "operator": "SEOLizer", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SEOlizer is an SEO crawler operated by SEOLizer. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seolizer" - }, - "serpstatbot": { - "operator": "Serpstat", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "serpstatbot is an SEO crawler operated by Serpstat. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/serpstatbot" - }, - "SiteCheckerBotCrawler": { - "operator": "Sitechecker", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "SiteCheckerBotCrawler is an SEO crawler operated by Sitechecker. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitecheckerbotcrawler" - }, - "ZoomBot": { - "operator": "SEOZoom", - "respect": "Unclear at this time.", - "function": "SEO Crawlers", - "frequency": "Unclear at this time.", - "description": "ZoomBot is an SEO crawler operated by SEOZoom. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zoombot" - }, - "007ac9 Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "007ac9 Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/007ac9-crawler" - }, - "2ip.ru": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "2ip.ru is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/2ip-ru" - }, - "360Spider-Image": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "360Spider-Image is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/360spider-image" - }, - "360Spider-Video": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "360Spider-Video is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/360spider-video" - }, - "5emeRue": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "5emeRue is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/5emerue" - }, - "5erue": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "5erue is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/5erue" - }, - "A Patent Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "A Patent Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/a-patent-crawler" - }, - "A6-Indexer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "A6-Indexer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/a6-indexer" - }, - "Aboundex": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Aboundex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aboundex" - }, - "AcademicBotRTU": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AcademicBotRTU is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/academicbotrtu" - }, - "acapbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "acapbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acapbot" - }, - "acoonbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "acoonbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/acoonbot" - }, - "Acunetix Security Scanner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Acunetix Security Scanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acunetix-security-scanner" - }, - "Acunetix Web Vulnerability Scanner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Acunetix Web Vulnerability Scanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/acunetix-web-vulnerability-scanner" - }, - "AddSearchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AddSearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/addsearchbot" - }, - "AddThis": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AddThis is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/addthis" - }, - "adequat": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "adequat is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adequat" - }, - "adequat-systems": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "adequat-systems is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adequat-systems" - }, - "AdIdxBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AdIdxBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adidxbot" - }, - "ADmantX": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ADmantX is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/admantx" - }, - "adscanner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "adscanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adscanner" - }, - "AdsTxtCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AdsTxtCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/adstxtcrawler" - }, - "AdvBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AdvBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/advbot" - }, - "AISearchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AISearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/aisearchbot" - }, - "Alexabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Alexabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexabot" - }, - "Alexibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Alexibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alexibot" - }, - "AlphaBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AlphaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/alphabot" - }, - "AmiSoftware": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AmiSoftware is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/amisoftware" - }, - "antibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "antibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/antibot" - }, - "AnyEvent": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AnyEvent is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/anyevent" - }, - "Apercite": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Apercite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/apercite" - }, - "AppInsights": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AppInsights is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/appinsights" - }, - "Aqua_Products": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Aqua_Products is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aqua-products" - }, - "arabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "arabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/arabot" - }, - "Ask n read": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Ask n read is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ask-n-read" - }, - "asknread.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "asknread.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/asknread-com" - }, - "AspiegelBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AspiegelBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/aspiegelbot" - }, - "asterias": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "asterias is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/asterias" - }, - "Augure": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Augure is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/augure" - }, - "auramundi": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "auramundi is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/auramundi" - }, - "AwarioRssBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "AwarioRssBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awariorssbot" - }, - "awesomecrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "awesomecrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/awesomecrawler" - }, - "B2B Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "B2B Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/b2b-bot" - }, - "b2w": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "b2w is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/b2w" - }, - "BackDoorBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BackDoorBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/backdoorbot" - }, - "BacklinkCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BacklinkCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/backlinkcrawler" - }, - "Baidu-YunGuanCe": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Baidu-YunGuanCe is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baidu-yunguance" - }, - "Baiduspider-image": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Baiduspider-image is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/baiduspider-image" - }, - "Baiduspider-news": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Baiduspider-news is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider-news" - }, - "Baiduspider-video": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Baiduspider-video is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/baiduspider-video" - }, - "BDCbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BDCbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bdcbot" - }, - "BehloolBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BehloolBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/behloolbot" - }, - "betaBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "betaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/betabot" - }, - "Better Uptime Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Better Uptime Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/better-uptime-bot" - }, - "bidswitchbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "bidswitchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bidswitchbot" - }, - "BIGLOTRON": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BIGLOTRON is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/biglotron" - }, - "binlar": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "binlar is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/binlar" - }, - "Birdcrawlerbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Birdcrawlerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/birdcrawlerbot" - }, - "BitBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BitBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bitbot" - }, - "Black Hole": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Black Hole is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/black-hole" - }, - "Blekkobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Blekkobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blekkobot" - }, - "blogmuraBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "blogmuraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blogmurabot" - }, - "BlowFish": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BlowFish is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/blowfish" - }, - "BLP_bbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BLP_bbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/blp-bbot" - }, - "bnf.fr_bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "bnf.fr_bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bnf-fr-bot" - }, - "BomboraBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BomboraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bomborabot" - }, - "Bookmark search tool": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Bookmark search tool is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bookmark-search-tool" - }, - "bot-pge.chlooe.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "bot-pge.chlooe.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bot-pge-chlooe-com" - }, - "Bot.AraTurka.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Bot.AraTurka.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bot-araturka-com" - }, - "BotALot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BotALot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/botalot" - }, - "botify": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "botify is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/botify" - }, - "BotRightHere": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BotRightHere is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/botrighthere" - }, - "BoxcarBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BoxcarBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/boxcarbot" - }, - "brainobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "brainobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brainobot" - }, - "BrandONbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BrandONbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/brandonbot" - }, - "BTWebClient": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BTWebClient is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/btwebclient" - }, - "BUbiNG": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BUbiNG is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bubing" - }, - "Buck": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Buck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/buck" - }, - "BuiltBotTough": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BuiltBotTough is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/builtbottough" - }, - "Bullseye": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Bullseye is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/bullseye" - }, - "BunnySlippers": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "BunnySlippers is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/bunnyslippers" - }, - "buzzbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "buzzbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/buzzbot" - }, - "Caliperbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Caliperbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/caliperbot" - }, - "CapsuleChecker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CapsuleChecker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/capsulechecker" - }, - "careerbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "careerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/careerbot" - }, - "CC Metadata Scaper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CC Metadata Scaper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cc-metadata-scaper" - }, - "Cegbfeieh": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cegbfeieh is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cegbfeieh" - }, - "centurybot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "centurybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/centurybot" - }, - "changedetection": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "changedetection is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/changedetection" - }, - "CheckMarkNetwork": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CheckMarkNetwork is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/checkmarknetwork" - }, - "CheeseBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CheeseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cheesebot" - }, - "CherryPicker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CherryPicker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cherrypicker" - }, - "CherryPickerElite": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CherryPickerElite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cherrypickerelite" - }, - "CherryPickerSE": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CherryPickerSE is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cherrypickerse" - }, - "Cision": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cision is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cision" - }, - "CISPA Webcrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CISPA Webcrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cispa-webcrawler" - }, - "citeseerxbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "citeseerxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/citeseerxbot" - }, - "Citoid": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Citoid is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/citoid" - }, - "Claritybot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Claritybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/claritybot" - }, - "Clickagy": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Clickagy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/clickagy" - }, - "Cliqzbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cliqzbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cliqzbot" - }, - "CloudFlare-AlwaysOnline": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CloudFlare-AlwaysOnline is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cloudflare-alwaysonline" - }, - "coccoc": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "coccoc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coccoc" - }, - "coccocbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "coccocbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/coccocbot" - }, - "coexel": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "coexel is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/coexel" - }, - "Companybook-Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Companybook-Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/companybook-crawler" - }, - "content crawler spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "content crawler spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/content-crawler-spider" - }, - "ContextAd Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ContextAd Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/contextad-bot" - }, - "contxbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "contxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/contxbot" - }, - "convera": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "convera is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/convera" - }, - "ConveraCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ConveraCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/converacrawler" - }, - "Cookiebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cookiebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cookiebot" - }, - "Copernic": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Copernic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/copernic" - }, - "CopyRightCheck": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CopyRightCheck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/copyrightcheck" - }, - "Corporama": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Corporama is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/corporama" - }, - "cosmos": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "cosmos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/cosmos" - }, - "crawler4j": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "crawler4j is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crawler4j" - }, - "CrawlyProjectCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CrawlyProjectCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crawlyprojectcrawler" - }, - "Crescent": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Crescent is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crescent" - }, - "Crescent Internet ToolPak HTTP OLE Control v.1.0": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Crescent Internet ToolPak HTTP OLE Control v.1.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/crescent-internet-toolpak-http-ole-control-v-1-0" - }, - "CriteoBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CriteoBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/criteobot" - }, - "CrunchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CrunchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crunchbot" - }, - "CrystalSemanticsBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CrystalSemanticsBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/crystalsemanticsbot" - }, - "Curebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Curebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/curebot" - }, - "Cutbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Cutbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cutbot" - }, - "cXensebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "cXensebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cxensebot" - }, - "CyberPatrol": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "CyberPatrol is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/cyberpatrol" - }, - "DareBoost": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DareBoost is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/dareboost" - }, - "Datafeedwatch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Datafeedwatch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/datafeedwatch" - }, - "datagnionbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "datagnionbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/datagnionbot" - }, - "Datanyze": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Datanyze is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/datanyze" - }, - "daumoa": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "daumoa is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/daumoa" - }, - "deepcrawl": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "deepcrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deepcrawl" - }, - "deepnoc": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "deepnoc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deepnoc" - }, - "DeuSu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DeuSu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/deusu" - }, - "Digg Deeper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Digg Deeper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/digg-deeper" - }, - "Digimind": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Digimind is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/digimind" - }, - "Digincore bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Digincore bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/digincore-bot" - }, - "discobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "discobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/discobot" - }, - "Disqus": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Disqus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/disqus" - }, - "DittoSpyder": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DittoSpyder is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dittospyder" - }, - "DnyzBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DnyzBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dnyzbot" - }, - "Domain Re-Animator Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Domain Re-Animator Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/domain-re-animator-bot" - }, - "DomainCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "DomainCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/domaincrawler" - }, - "Dow Jones Searchbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Dow Jones Searchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dow-jones-searchbot" - }, - "Download Ninja": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Download Ninja is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/download-ninja" - }, - "Dragonbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Dragonbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dragonbot" - }, - "drupact": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "drupact is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/drupact" - }, - "Dubbotbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Dubbotbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/dubbotbot" - }, - "e.ventures Investment Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "e.ventures Investment Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/e-ventures-investment-crawler" - }, - "EasyBib AutoCite": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EasyBib AutoCite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/easybib-autocite" - }, - "ec2linkfinder": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ec2linkfinder is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ec2linkfinder" - }, - "edisterbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "edisterbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/edisterbot" - }, - "electricmonk": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "electricmonk is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/electricmonk" - }, - "elisabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "elisabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/elisabot" - }, - "ellisphere": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ellisphere is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ellisphere" - }, - "EmailCollector": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EmailCollector is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/emailcollector" - }, - "EmailSiphon": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EmailSiphon is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/emailsiphon" - }, - "EmailWolf": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EmailWolf is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/emailwolf" - }, - "epicbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "epicbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/epicbot" - }, - "eright": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "eright is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/eright" - }, - "EroCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EroCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/erocrawler" - }, - "EtaoSpider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EtaoSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/etaospider" - }, - "europarchive.org": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "europarchive.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/europarchive-org" - }, - "evc-batch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "evc-batch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/evc-batch" - }, - "EveryoneSocialBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EveryoneSocialBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/everyonesocialbot" - }, - "Exabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Exabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/exabot" - }, - "Experibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Experibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/experibot" - }, - "ExtLinksBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ExtLinksBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/extlinksbot" - }, - "ExtractorPro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ExtractorPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/extractorpro" - }, - "Eyeotabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Eyeotabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/eyeotabot" - }, - "EZID": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "EZID is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ezid" - }, - "Ezooms": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Ezooms is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ezooms" - }, - "Facebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Facebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/facebot" - }, - "FairAd Client": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FairAd Client is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fairad-client" - }, - "FAST Enterprise Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FAST Enterprise Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/fast-enterprise-crawler" - }, - "FAST-WebCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FAST-WebCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fast-webcrawler" - }, - "FediDB": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FediDB is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fedidb" - }, - "fedoraplanet": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "fedoraplanet is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fedoraplanet" - }, - "Feedbin": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Feedbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/feedbin" - }, - "feedbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "feedbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedbot" - }, - "FeedBurner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FeedBurner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedburner" - }, - "Feedspot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Feedspot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/feedspot" - }, - "FeedValidator": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FeedValidator is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/feedvalidator" - }, - "FemtosearchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FemtosearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/femtosearchbot" - }, - "Fever": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Fever is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fever" - }, - "FindITAnswersbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FindITAnswersbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/finditanswersbot" - }, - "findlink": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "findlink is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/findlink" - }, - "findthatfile": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "findthatfile is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/findthatfile" - }, - "findxbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "findxbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/findxbot" - }, - "Flaming AttackBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Flaming AttackBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/flaming-attackbot" - }, - "Flamingo_SearchEngine": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Flamingo_SearchEngine is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/flamingo-searchengine" - }, - "fluffy": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "fluffy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fluffy" - }, - "Foobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Foobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/foobot" - }, - "fr-crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "fr-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fr-crawler" - }, - "FreeWebMonitoring SiteChecker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FreeWebMonitoring SiteChecker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/freewebmonitoring-sitechecker" - }, - "FreshpingBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "FreshpingBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/freshpingbot" - }, - "fuelbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "fuelbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fuelbot" - }, - "Fyrebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Fyrebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/fyrebot" - }, - "g00g1e.net": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "g00g1e.net is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/g00g1e-net" - }, - "G2 Web Services": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "G2 Web Services is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/g2-web-services" - }, - "g2reader-bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "g2reader-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/g2reader-bot" - }, - "Gaisbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gaisbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gaisbot" - }, - "GarlikCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GarlikCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/garlikcrawler" - }, - "Genieo": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Genieo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/genieo" - }, - "GetRight": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GetRight is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/getright" - }, - "Gigablast": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gigablast is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gigablast" - }, - "Gigabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gigabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gigabot" - }, - "GingerCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GingerCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gingercrawler" - }, - "Gluten Free Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gluten Free Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gluten-free-crawler" - }, - "gnam gnam spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "gnam gnam spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gnam-gnam-spider" - }, - "GnowitNewsbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GnowitNewsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gnowitnewsbot" - }, - "Google-Adwords-Instant": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-Adwords-Instant is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-adwords-instant" - }, - "Google-Certificates-Bridge": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-Certificates-Bridge is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-certificates-bridge" - }, - "Google-PhysicalWeb": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-PhysicalWeb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-physicalweb" - }, - "Google-Site-Verification": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-Site-Verification is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-site-verification" - }, - "Google-Structured-Data-Testing-Tool": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Google-Structured-Data-Testing-Tool is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-structured-data-testing-tool" - }, - "google-xrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "google-xrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/google-xrawler" - }, - "Gowikibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gowikibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gowikibot" - }, - "grapeshot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "grapeshot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. 
If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grapeshot" - }, - "GrapeshotCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GrapeshotCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grapeshotcrawler" - }, - "Grobbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Grobbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grobbot" - }, - "GroupHigh": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "GroupHigh is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grouphigh" - }, - "grub-client": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "grub-client is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/grub-client" - }, - "grub.org": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "grub.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/grub-org" - }, - "gsa-crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "gsa-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gsa-crawler" - }, - "gslfbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "gslfbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/gslfbot" - }, - "Gwene": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Gwene is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/gwene" - }, - "Harvest": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Harvest is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/harvest" - }, - "HawaiiBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "HawaiiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/hawaiibot" - }, - "humanlinks": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "humanlinks is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/humanlinks" - }, - "hyscore.io": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "hyscore.io is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/hyscore-io" - }, - "IAS crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "IAS crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ias-crawler" - }, - "ICBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ICBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/icbot" - }, - "ichiro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ichiro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ichiro" - }, - "imrbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "imrbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/imrbot" - }, - "IndeedBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "IndeedBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/indeedbot" - }, - "INETDEX-BOT": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "INETDEX-BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/inetdex-bot" - }, - "InfoNaviRobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "InfoNaviRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/infonavirobot" - }, - "infoobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "infoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/infoobot" - }, - "infoseek": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "infoseek is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/infoseek" - }, - "integromedb": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "integromedb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/integromedb" - }, - "intelium_bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "intelium_bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/intelium-bot" - }, - "InterfaxScanBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "InterfaxScanBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/interfaxscanbot" - }, - "ip-web-crawler.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ip-web-crawler.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ip-web-crawler-com" - }, - "IRLbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "IRLbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/irlbot" - }, - "Iron33": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Iron33 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/iron33" - }, - "iskanie": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "iskanie is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/iskanie" - }, - "IsraBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "IsraBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/israbot" - }, - "istellabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "istellabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/istellabot" - }, - "it2media-domain-crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "it2media-domain-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/it2media-domain-crawler" - }, - "James BOT": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "James BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/james-bot" - }, - "JamesBOT": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JamesBOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jamesbot" - }, - "Jamie's Spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Jamie's Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jamies-spider" - }, - "JenkersBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JenkersBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jenkersbot" - }, - "JennyBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JennyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jennybot" - }, - "Jetbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Jetbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jetbot" - }, - "Jetty": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Jetty is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jetty" - }, - "JikeSpider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JikeSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jikespider" - }, - "JobboerseBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "JobboerseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/jobboersebot" - }, - "Jooblebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Jooblebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jooblebot" - }, - "jpg-newsbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "jpg-newsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jpg-newsbot" - }, - "jyxobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "jyxobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/jyxobot" - }, - "k2spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "k2spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/k2spider" - }, - "K7MLWCBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "K7MLWCBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/k7mlwcbot" - }, - "kbcrawl": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "kbcrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kbcrawl" - }, - "Kemvibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Kemvibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kemvibot" - }, - "Kenjin Spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Kenjin Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/kenjin-spider" - }, - "keys-so-bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "keys-so-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/keys-so-bot" - }, - "Keyword Density": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Keyword Density is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/keyword-density" - }, - "Knowings": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Knowings is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/knowings" - }, - "KomodiaBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "KomodiaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/komodiabot" - }, - "KosmioBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "KosmioBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/kosmiobot" - }, - "Landau-Media-Spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Landau-Media-Spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/landau-media-spider" - }, - "larbin": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "larbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/larbin" - }, - "Laserlikebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Laserlikebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/laserlikebot" - }, - "lb-spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lb-spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lb-spider" - }, - "leadbox": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "leadbox is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/leadbox" - }, - "Leikibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Leikibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/leikibot" - }, - "LexiBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LexiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/lexibot" - }, - "libWeb": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "libWeb is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/libweb" - }, - "Linespider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Linespider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linespider" - }, - "Linguee Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Linguee Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linguee-bot" - }, - "linkapediabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "linkapediabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkapediabot" - }, - "LinkArchiver": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkArchiver is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkarchiver" - }, - "LinkCheck by Siteimprove.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkCheck by Siteimprove.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkcheck-by-siteimprove-com" - }, - "linkdex": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "linkdex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkdex" - }, - "LinkextractorPro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkextractorPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkextractorpro" - }, - "LinkisBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkisBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkisbot" - }, - "linko": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "linko is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linko" - }, - "LinkpadBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkpadBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/linkpadbot" - }, - "LinkScan": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LinkScan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/linkscan" - }, - "lipperhey": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lipperhey is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lipperhey" - }, - "LivelapBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LivelapBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/livelapbot" - }, - "lkxscan": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lkxscan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lkxscan" - }, - "LNSpiderguy": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "LNSpiderguy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/lnspiderguy" - }, - "lssbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lssbot" - }, - "lssrocketcrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lssrocketcrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lssrocketcrawler" - }, - "ltx71": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ltx71 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ltx71" - }, - "Luminator-robots": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Luminator-robots is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/luminator-robots" - }, - "lwp-trivial": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "lwp-trivial is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/lwp-trivial" - }, - "MaCoCu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MaCoCu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/macocu" - }, - "mappydata": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mappydata is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mappydata" - }, - "Mata Hari": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Mata Hari is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mata-hari" - }, - "MauiBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MauiBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mauibot" - }, - "MBCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MBCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mbcrawler" - }, - "MegaIndex": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MegaIndex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/megaindex" - }, - "MegaIndex.ru": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MegaIndex.ru is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/megaindex-ru" - }, - "Meltawer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Meltawer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/meltawer" - }, - "Meltwater": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Meltwater is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/meltwater" - }, - "MeltwaterNews": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MeltwaterNews is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/meltwaternews" - }, - "memorybot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "memorybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/memorybot" - }, - "mention": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mention is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mention" - }, - "MetaJobBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MetaJobBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/metajobbot" - }, - "MetaURI": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MetaURI is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/metauri" - }, - "MIIxpc": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MIIxpc is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/miixpc" - }, - "mindUpBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mindUpBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mindupbot" - }, - "minicrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "minicrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/minicrawler" - }, - "Mister PiX": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Mister PiX is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mister-pix" - }, - "MixnodeCache": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MixnodeCache is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mixnodecache" - }, - "mlbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mlbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mlbot" - }, - "moatbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "moatbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moatbot" - }, - "moget": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "moget is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moget" - }, - "Mojeek": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Mojeek is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/mojeek" - }, - "MoodleBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MoodleBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moodlebot" - }, - "Moreover": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Moreover is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/moreover" - }, - "MS Search 4.0 Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MS Search 4.0 Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ms-search-4-0-robot" - }, - "MS Search 6.0 Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MS Search 6.0 Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/ms-search-6-0-robot" - }, - "MSIECrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MSIECrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msiecrawler" - }, - "msrbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "msrbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/msrbot" - }, - "MTRobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "MTRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mtrobot" - }, - "Multiviewbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Multiviewbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/multiviewbot" - }, - "mytwip": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "mytwip is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/mytwip" - }, - "NAVER Blog Rssbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NAVER Blog Rssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/naver-blog-rssbot" - }, - "NaverBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NaverBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/naverbot" - }, - "Neevabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Neevabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/neevabot" - }, - "NerdByNature.Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NerdByNature.Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nerdbynature-bot" - }, - "nerdybot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "nerdybot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nerdybot" - }, - "NetAnts": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NetAnts is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netants" - }, - "netEstate NE Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "netEstate NE Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/netestate-ne-crawler" - }, - "Neticle Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Neticle Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/neticle-crawler" - }, - "NetMechanic": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NetMechanic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netmechanic" - }, - "netresearchserver": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "netresearchserver is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/netresearchserver" - }, - "NetSystemsResearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NetSystemsResearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/netsystemsresearch" - }, - "newsharecounts": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "newsharecounts is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newsharecounts" - }, - "NewsNow": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NewsNow is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newsnow" - }, - "Newzbin": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Newzbin is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/newzbin" - }, - "NextGenSearchBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NextGenSearchBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nextgensearchbot" - }, - "NICErsPRO": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NICErsPRO is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nicerspro" - }, - "niki-bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "niki-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/niki-bot" - }, - "NimbleCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NimbleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nimblecrawler" - }, - "Nimbostratus-Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Nimbostratus-Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nimbostratus-bot" - }, - "NINJA bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NINJA bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ninja-bot" - }, - "NIXStatsbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NIXStatsbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nixstatsbot" - }, - "NLUX_IAHarvester": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NLUX_IAHarvester is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nlux-iaharvester" - }, - "Nmap Scripting Engine": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Nmap Scripting Engine is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/nmap-scripting-engine" - }, - "NPBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NPBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/npbot" - }, - "NTENTbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "NTENTbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ntentbot" - }, - "Nuzzel": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Nuzzel is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/nuzzel" - }, - "OdklBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OdklBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/odklbot" - }, - "officestorebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "officestorebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/officestorebot" - }, - "Openbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Openbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openbot" - }, - "Openfind": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Openfind is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openfind" - }, - "Openfind data gatherer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Openfind data gatherer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/openfind-data-gatherer" - }, - "OpenGraphCheck": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OpenGraphCheck is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/opengraphcheck" - }, - "OpenHoseBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OpenHoseBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/openhosebot" - }, - "opinion-tracker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "opinion-tracker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/opinion-tracker" - }, - "Oracle Ultra Search": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Oracle Ultra Search is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/oracle-ultra-search" - }, - "OrangeBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OrangeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/orangebot" - }, - "Orthogaffe": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Orthogaffe is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/orthogaffe" - }, - "outbrain": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "outbrain is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/outbrain" - }, - "OutclicksBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "OutclicksBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/outclicksbot" - }, - "page2rss": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "page2rss is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/page2rss" - }, - "PagePeeker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PagePeeker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pagepeeker" - }, - "PageThing": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PageThing is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pagething" - }, - "peer39_crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "peer39_crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/peer39-crawler" - }, - "PerMan": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PerMan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/perman" - }, - "Pingdom": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Pingdom is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pingdom" - }, - "Pinterest": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Pinterest is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pinterest" - }, - "PiplBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PiplBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/piplbot" - }, - "postrank": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "postrank is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/postrank" - }, - "PR-CY.RU": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PR-CY.RU is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pr-cy-ru" - }, - "Primalbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Primalbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/primalbot" - }, - "PrivacyAwareBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "PrivacyAwareBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/privacyawarebot" - }, - "ProPowerBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ProPowerBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/propowerbot" - }, - "ProWebWalker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ProWebWalker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/prowebwalker" - }, - "proxem": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "proxem is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/proxem" - }, - "psbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "psbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/psbot" - }, - "Pulsepoint": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Pulsepoint is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/pulsepoint" - }, - "purebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "purebot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/purebot" - }, - "QueryN Metasearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "QueryN Metasearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/queryn-metasearch" - }, - "Qwam content intelligence": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Qwam content intelligence is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/qwam-content-intelligence" - }, - "Radiation Retriever 1.1": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Radiation Retriever 1.1 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/radiation-retriever-1-1" - }, - "RankActiveLinkBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RankActiveLinkBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rankactivelinkbot" - }, - "RankFlex": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RankFlex is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rankflex" - }, - "Refindbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Refindbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/refindbot" - }, - "RegionStuttgartBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RegionStuttgartBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/regionstuttgartbot" - }, - "RepoMonkey": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RepoMonkey is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/repomonkey" - }, - "RepoMonkey Bait & Tackle": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RepoMonkey Bait & Tackle is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/repomonkey-bait-tackle" - }, - "RetrevoPageAnalyzer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RetrevoPageAnalyzer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/retrevopageanalyzer" - }, - "ReverseEngineeringBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ReverseEngineeringBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/reverseengineeringbot" - }, - "RidderBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RidderBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ridderbot" - }, - "Riddler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Riddler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/riddler" - }, - "Rivva": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Rivva is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/rivva" - }, - "Robozilla": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Robozilla is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/robozilla" - }, - "rssbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "rssbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rssbot" - }, - "RSSingBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RSSingBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rssingbot" - }, - "RukiCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RukiCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/rukicrawler" - }, - "RuxitSynthetic": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RuxitSynthetic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ruxitsynthetic" - }, - "RyteBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "RyteBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/rytebot" - }, - "SafeDNSBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SafeDNSBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/safednsbot" - }, - "SafeSearch microdata crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SafeSearch microdata crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/safesearch-microdata-crawler" - }, - "SBL-BOT": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SBL-BOT is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sbl-bot" - }, - "score3": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "score3 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/score3" - }, - "ScoutJet": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ScoutJet is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scoutjet" - }, - "scribdbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "scribdbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/scribdbot" - }, - "Scrubby": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Scrubby is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/scrubby" - }, - "search.marginalia.nu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "search.marginalia.nu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/search-marginalia-nu" - }, - "SearchAtlas": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SearchAtlas is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/searchatlas" - }, - "SearchmetricsBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SearchmetricsBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/searchmetricsbot" - }, - "searchpreview": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "searchpreview is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/searchpreview" - }, - "seekbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "seekbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekbot" - }, - "Seekport Crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Seekport Crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seekport-crawler" - }, - "Seekr": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Seekr is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seekr" - }, - "seewithkids": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "seewithkids is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seewithkids" - }, - "semanticbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "semanticbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semanticbot" - }, - "sempi.tech": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "sempi.tech is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sempi-tech" - }, - "SemrushBot-BM": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SemrushBot-BM is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/semrushbot-bm" - }, - "SemrushBot-SA": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SemrushBot-SA is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/semrushbot-sa" - }, - "sentibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "sentibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sentibot" - }, - "SEOkicks-Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SEOkicks-Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seokicks-robot" - }, - "seoscanners": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "seoscanners is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/seoscanners" - }, - "seostar.co": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "seostar.co is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seostar-co" - }, - "SEOstats": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SEOstats is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/seostats" - }, - "SimpleCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SimpleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/simplecrawler" - }, - "SimpleScraper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SimpleScraper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/simplescraper" - }, - "Sindup": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sindup is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sindup" - }, - "sistrix crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "sistrix crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sistrix-crawler" - }, - "SiteBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SiteBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitebot" - }, - "sitecheck.internetseer.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "sitecheck.internetseer.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sitecheck-internetseer-com" - }, - "siteexplorer.info": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "siteexplorer.info is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteexplorer-info" - }, - "Siteimprove": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Siteimprove is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteimprove" - }, - "Siteimprove.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Siteimprove.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/siteimprove-com" - }, - "SiteSnagger": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SiteSnagger is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sitesnagger" - }, - "SiteSucker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SiteSucker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sitesucker" - }, - "Slack-ImgProxy": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Slack-ImgProxy is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slack-imgproxy" - }, - "Slackbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Slackbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/slackbot" - }, - "Slurp": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Slurp is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/slurp" - }, - "SocialRankIOBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SocialRankIOBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/socialrankiobot" - }, - "Sogou": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sogou is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou" - }, - "Sogou inst spider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sogou inst spider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sogou-inst-spider" - }, - "Sogou spider2": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sogou spider2 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sogou-spider2" - }, - "Sonic": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sonic is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sonic" - }, - "Sosospider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sosospider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sosospider" - }, - "SpankBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SpankBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spankbot" - }, - "spanner": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "spanner is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/spanner" - }, - "spbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "spbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spbot" - }, - "Spinn3r": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Spinn3r is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spinn3r" - }, - "spotter": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "spotter is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/spotter" - }, - "SputnikBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SputnikBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/sputnikbot" - }, - "Storebot-Google": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Storebot-Google is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/storebot-google" - }, - "StorygizeBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "StorygizeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/storygizebot" - }, - "StractBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "StractBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/stractbot" - }, - "Streamline3Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Streamline3Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/streamline3bot" - }, - "SummalyBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SummalyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/summalybot" - }, - "summify": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "summify is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/summify" - }, - "SuperBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SuperBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/superbot" - }, - "SurveyBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SurveyBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/surveybot" - }, - "suzuran": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "suzuran is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/suzuran" - }, - "Swiftbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Swiftbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/swiftbot" - }, - "SWIMGBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "SWIMGBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/swimgbot" - }, - "Synthesio": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Synthesio is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/synthesio" - }, - "Sysomos": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Sysomos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/sysomos" - }, - "Szukacz": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Szukacz is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/szukacz" - }, - "Taboolabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Taboolabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/taboolabot" - }, - "tagoobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "tagoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tagoobot" - }, - "Talkwater": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Talkwater is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/talkwater" - }, - "TangibleeBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TangibleeBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tangibleebot" - }, - "Teleport": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Teleport is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/teleport" - }, - "TeleportPro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TeleportPro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/teleportpro" - }, - "Telesoft": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Telesoft is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/telesoft" - }, - "The Intraformant": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "The Intraformant is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/the-intraformant" - }, - "TheNomad": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TheNomad is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/thenomad" - }, - "theoldreader.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "theoldreader.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/theoldreader-com" - }, - "Thinklab": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Thinklab is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/thinklab" - }, - "tigerbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "tigerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tigerbot" - }, - "Titan": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Titan is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/titan" - }, - "toCrawl": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "toCrawl is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/tocrawl" - }, - "TombaPublicWebCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TombaPublicWebCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tombapublicwebcrawler" - }, - "toplistbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "toplistbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/toplistbot" - }, - "ToutiaoSpider": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ToutiaoSpider is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/toutiaospider" - }, - "Traackr.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Traackr.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/traackr-com" - }, - "tracemyfile": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "tracemyfile is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tracemyfile" - }, - "trafilatura": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trafilatura is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trafilatura" - }, - "trendeo": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trendeo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendeo" - }, - "trendkite-akashic-crawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trendkite-akashic-crawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/trendkite-akashic-crawler" - }, - "trendybuzz": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trendybuzz is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trendybuzz" - }, - "trovitBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "trovitBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/trovitbot" - }, - "True_Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "True_Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/true-robot" - }, - "TruliaBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "TruliaBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/truliabot" - }, - "turingos": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "turingos is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/turingos" - }, - "tweetedtimes": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "tweetedtimes is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/tweetedtimes" - }, - "twengabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "twengabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/twengabot" - }, - "Twurly": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Twurly is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/twurly" - }, - "UbiCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "UbiCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ubicrawler" - }, - "um-IC": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "um-IC is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/um-ic" - }, - "Updownerbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Updownerbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/updownerbot" - }, - "Upflow": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Upflow is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/upflow" - }, - "Uptime-Kuma": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Uptime-Kuma is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/uptime-kuma" - }, - "Uptimebot.org": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Uptimebot.org is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/uptimebot-org" - }, - "UptimeRobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "UptimeRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/uptimerobot" - }, - "URL Control": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "URL Control is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/url-control" - }, - "URL_Spider_Pro": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "URL_Spider_Pro is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/url-spider-pro" - }, - "urlappendbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "urlappendbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/urlappendbot" - }, - "URLy Warning": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "URLy Warning is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/urly-warning" - }, - "usasearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "usasearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/usasearch" - }, - "UsineNouvelleCrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "UsineNouvelleCrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/usinenouvellecrawler" - }, - "UT-Dorkbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "UT-Dorkbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/ut-dorkbot" - }, - "Validator.nu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Validator.nu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/validator-nu" - }, - "VCI": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "VCI is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/vci" - }, - "VCI WebViewer VCI WebViewer Win32": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "VCI WebViewer VCI WebViewer Win32 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vci-webviewer-vci-webviewer-win32" - }, - "vebidoobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "vebidoobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vebidoobot" - }, - "vecteurplus": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "vecteurplus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vecteurplus" - }, - "Veoozbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Veoozbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/veoozbot" - }, - "verticalsearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "verticalsearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/verticalsearch" - }, - "Vigil": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Vigil is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vigil" - }, - "VKRobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "VKRobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vkrobot" - }, - "voilabot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "voilabot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/voilabot" - }, - "voltron": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "voltron is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/voltron" - }, - "VoluumDSP-content-bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "VoluumDSP-content-bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/voluumdsp-content-bot" - }, - "vsw": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "vsw is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/vsw" - }, - "vuhuvBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "vuhuvBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/vuhuvbot" - }, - "W3C_I18n-Checker": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "W3C_I18n-Checker is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-i18n-checker" - }, - "W3C_Unicorn": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "W3C_Unicorn is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-unicorn" - }, - "W3C-checklink": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "W3C-checklink is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/w3c-checklink" - }, - "W3C-mobileOK": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "W3C-mobileOK is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/w3c-mobileok" - }, - "WASALive-Bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WASALive-Bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wasalive-bot" - }, - "wbsearchbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "wbsearchbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wbsearchbot" - }, - "Web Image Collector": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Web Image Collector is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/web-image-collector" - }, - "web-archive-net.com.bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "web-archive-net.com.bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/web-archive-net-com-bot" - }, - "WebAuto": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebAuto is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webauto" - }, - "WebBandit": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebBandit is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webbandit" - }, - "WebCapture 2.0": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebCapture 2.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcapture-2-0" - }, - "webcompanycrawler": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "webcompanycrawler is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webcompanycrawler" - }, - "WebCopier": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebCopier is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcopier" - }, - "WebCopier v.2.2": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebCopier v.2.2 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcopier-v-2-2" - }, - "WebCopier v3.2a": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebCopier v3.2a is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webcopier-v3-2a" - }, - "WebDataStats": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebDataStats is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webdatastats" - }, - "WebEnhancer": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebEnhancer is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webenhancer" - }, - "WebmasterWorldForumBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebmasterWorldForumBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webmasterworldforumbot" - }, - "webmon": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "webmon is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webmon" - }, - "WebReaper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebReaper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webreaper" - }, - "WebSauger": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebSauger is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/websauger" - }, - "Website Quester": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Website Quester is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/website-quester" - }, - "WebStripper": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebStripper is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/webstripper" - }, - "WebZIP": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WebZIP is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/webzip" - }, - "winello": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "winello is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/winello" - }, - "WinHTTrack": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WinHTTrack is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/winhttrack" - }, - "WiseGuys Robot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WiseGuys Robot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wiseguys-robot" - }, - "wocbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "wocbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/wocbot" - }, - "woobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "woobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/woobot" - }, - "woorankreview": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "woorankreview is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/woorankreview" - }, - "WordupInfoSearch": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WordupInfoSearch is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wordupinfosearch" - }, - "woriobot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "woriobot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/woriobot" - }, - "wotbox": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "wotbox is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/wotbox" - }, - "WWW-Collector-E": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WWW-Collector-E is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/www-collector-e" - }, - "WWW-Mechanize": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "WWW-Mechanize is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/www-mechanize" - }, - "www.uptime.com": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "www.uptime.com is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/www-uptime-com" - }, - "Xenu": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Xenu is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenu" - }, - "Xenu Link Sleuth": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Xenu Link Sleuth is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenu-link-sleuth" - }, - "Xenu's": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Xenu's is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xenus" - }, - "Xenu's Link Sleuth 1.1c": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Xenu's Link Sleuth 1.1c is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/xenus-link-sleuth-1-1c" - }, - "xovibot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "xovibot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/xovibot" - }, - "Yahoo Pipes 1.0": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Yahoo Pipes 1.0 is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yahoo-pipes-1-0" - }, - "YaK": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "YaK is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yak" - }, - "YandexMobileBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "YandexMobileBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yandexmobilebot" - }, - "YandexVideo": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "YandexVideo is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yandexvideo" - }, - "yanga": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "yanga is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yanga" - }, - "Yellowbrandprotectionbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Yellowbrandprotectionbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/yellowbrandprotectionbot" - }, - "yoozBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "yoozBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/yoozbot" - }, - "YoudaoBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "YoudaoBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/youdaobot" - }, - "Youmag": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Youmag is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/youmag" - }, - "Zabbix": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zabbix is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zabbix" - }, - "Zao": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zao is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zao" - }, - "Zealbot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zealbot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zealbot" - }, - "zenback bot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "zenback bot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zenback-bot" - }, - "Zeus": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zeus is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zeus" - }, - "Zeus Link Scout": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zeus Link Scout is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zeus-link-scout" - }, - "zgrab": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "zgrab is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zgrab" - }, - "Zite": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "Zite is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zite" - }, - "ZuperlistBot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ZuperlistBot is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. More info can be found at https://darkvisitors.com/agents/agents/zuperlistbot" - }, - "ZyBORG": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "Uncategorized", - "frequency": "Unclear at this time.", - "description": "ZyBORG is an uncategorized agent. It's not currently known to be artificially intelligent or AI-related. If you think that's incorrect or can provide more detail about its purpose, please contact us. 
More info can be found at https://darkvisitors.com/agents/agents/zyborg" } -} \ No newline at end of file +} From bd3eee7a30c991905c5ee0a05f7206a8e3921ea4 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Wed, 7 Aug 2024 10:32:12 +0000 Subject: [PATCH 058/249] Removing previously generated files --- robots.txt | 32 -------------------------------- table-of-bot-metrics.md | 33 --------------------------------- 2 files changed, 65 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 7f3cb46..0000000 --- a/robots.txt +++ /dev/null @@ -1,32 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 2dafd43..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,33 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From 30eaff1447ee154165e7145c4feb62cda0e19d98 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Wed, 7 Aug 2024 10:32:13 +0000 Subject: [PATCH 059/249] call main after update --- robots.txt | 32 ++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..7f3cb46 --- /dev/null +++ b/robots.txt @@ -0,0 +1,32 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..2dafd43 --- /dev/null +++ 
b/table-of-bot-metrics.md @@ -0,0 +1,33 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. 
| Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. 
| "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From d4f34363ec8656e5a3183b0720456f34423ef285 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Wed, 7 Aug 2024 10:40:50 +0000 Subject: [PATCH 060/249] Removing previously generated files --- robots.txt | 32 -------------------------------- table-of-bot-metrics.md | 33 --------------------------------- 2 files changed, 65 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 7f3cb46..0000000 --- a/robots.txt +++ /dev/null @@ -1,32 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot 
-User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 2dafd43..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,33 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. 
| Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. 
| Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| From 09c6b78b46155ae2f421ab1a2151d881e276cb70 Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Wed, 7 Aug 2024 11:45:37 +0100 Subject: [PATCH 061/249] fix job dependency --- .github/workflows/daily_update.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml index 9b0b4a2..6a51674 100644 --- a/.github/workflows/daily_update.yml +++ b/.github/workflows/daily_update.yml @@ -20,5 +20,6 @@ jobs: git diff --quiet && git diff --staged --quiet || (git commit -m "Daily update from Dark Visitors" && git push) shell: bash call-main: + needs: dark-visitors uses: ./.github/workflows/main.yml secrets: inherit \ No newline at end of file From 6a275366be10783826bfa4a7a8390f41b94213bb Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Wed, 7 Aug 2024 10:50:45 +0000 Subject: [PATCH 062/249] Daily update from Dark Visitors --- robots.json | 94 ++++++++++++++++++++++++++++++----------------------- 1 file changed, 54 insertions(+), 40 deletions(-) diff --git a/robots.json b/robots.json index ef8b335..745bb0b 100644 --- a/robots.json +++ b/robots.json @@ -7,14 +7,14 @@ "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses." }, "anthropic-ai": { - "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." 
}, "Applebot-Extended": { - "operator": "[Apple](https:\/\/support.apple.com\/en-us\/119829#datausage)", + "operator": "[Apple](https://support.apple.com/en-us/119829#datausage)", "respect": "Yes", "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", "frequency": "Unclear at this time.", @@ -28,192 +28,206 @@ "description": "Downloads data to train LLMS, including ChatGPT competitors." }, "CCBot": { - "operator": "[Common Crawl](https:\/\/commoncrawl.org)", - "respect": "[Yes](https:\/\/commoncrawl.org\/ccbot)", + "operator": "[Common Crawl](https://commoncrawl.org)", + "respect": "[Yes](https://commoncrawl.org/ccbot)", "function": "Provides crawl data for an open source repository that has been used to train LLMs.", "frequency": "Unclear at this time.", "description": "Sources data that is made openly available and is used to train AI models." }, "ChatGPT-User": { - "operator": "[OpenAI](https:\/\/openai.com)", + "operator": "[OpenAI](https://openai.com)", "respect": "Yes", "function": "Takes action based on user prompts.", "frequency": "Only when prompted by a user.", "description": "Used by plugins in ChatGPT to answer queries based on user input." }, "ClaudeBot": { - "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Claude-Web": { - "operator": "[Anthropic](https:\/\/www.anthropic.com)", + "operator": "[Anthropic](https://www.anthropic.com)", "respect": "Unclear at this time.", "function": "Scrapes data to train Anthropic's AI products.", "frequency": "No information. provided.", "description": "Scrapes data to train LLMs and AI products offered by Anthropic." 
}, "cohere-ai": { - "operator": "[Cohere](https:\/\/cohere.com)", + "operator": "[Cohere](https://cohere.com)", "respect": "Unclear at this time.", "function": "Retrieves data to provide responses to user-initiated prompts.", "frequency": "Takes action based on user prompts.", "description": "Retrieves data based on user prompts." }, "Diffbot": { - "operator": "[Diffbot](https:\/\/www.diffbot.com\/)", + "operator": "[Diffbot](https://www.diffbot.com/)", "respect": "At the discretion of Diffbot users.", "function": "Aggregates structured web data for monitoring and AI model training.", "frequency": "Unclear at this time.", "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training." }, "FacebookBot": { - "operator": "Meta\/Facebook", - "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", + "operator": "Meta/Facebook", + "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", "function": "Training language models", "frequency": "Up to 1 page per second", "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." }, "facebookexternalhit": { - "operator": "Meta\/Facebook", - "respect": "[Yes](https:\/\/developers.facebook.com\/docs\/sharing\/bot\/)", + "operator": "Meta/Facebook", + "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", "function": "No information.", "frequency": "Unclear at this time.", "description": "Unclear at this time." 
}, "FriendlyCrawler": { "operator": "Unknown", - "respect": "[Yes](https:\/\/imho.alex-kunz.com\/2024\/01\/25\/an-update-on-friendly-crawler)", + "respect": "[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)", "function": "We are using the data from the crawler to build datasets for machine learning experiments.", "frequency": "Unclear at this time.", "description": "Unclear who the operator is; but data is used for training/machine learning." }, "Google-Extended": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "LLM training.", "frequency": "No information.", "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search." }, "GoogleOther": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Image": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development.\"" }, "GoogleOther-Video": { "operator": "Google", - "respect": "[Yes](https:\/\/developers.google.com\/search\/docs\/crawling-indexing\/overview-google-crawlers)", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", "function": "Scrapes data.", "frequency": "No information.", "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" }, "GPTBot": { - "operator": "[OpenAI](https:\/\/openai.com)", + "operator": "[OpenAI](https://openai.com)", "respect": "Yes", "function": "Scrapes data to train OpenAI's products.", "frequency": "No information.", "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." }, "ICC-Crawler": { - "operator": "[NICT](https:\/\/nict.go.jp)", + "operator": "[NICT](https://nict.go.jp)", "respect": "Yes", "function": "Scrapes data to train and support AI technologies.", "frequency": "No information.", "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business." }, "ImagesiftBot": { - "operator": "[ImageSift](https:\/\/imagesift.com)", - "respect": "[Yes](https:\/\/imagesift.com\/about)", + "operator": "[ImageSift](https://imagesift.com)", + "respect": "[Yes](https://imagesift.com/about)", "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products", "frequency": "No information.", "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images." }, "img2dataset": { - "operator": "[img2dataset](https:\/\/github.com\/rom1504\/img2dataset)", + "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", "respect": "Unclear at this time.", "function": "Scrapes images for use in LLMs.", "frequency": "At the discretion of img2dataset users.", "description": "Downloads large sets of images into datasets for LLM training or other purposes." }, "Meta-ExternalAgent": { - "operator": "[Meta](https:\/\/developers.facebook.com\/docs\/sharing\/webmasters\/web-crawlers)", + "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)", "respect": "Yes.", "function": "Used to train models and improve products.", "frequency": "No information.", "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"" }, "OAI-SearchBot": { - "operator": "[OpenAI](https:\/\/openai.com)", - "respect": "[Yes](https:\/\/platform.openai.com\/docs\/bots)", + "operator": "[OpenAI](https://openai.com)", + "respect": "[Yes](https://platform.openai.com/docs/bots)", "function": "Search result generation.", "frequency": "No information.", "description": "Crawls sites to surface as results in SearchGPT." }, "omgili": { - "operator": "[Webz.io](https:\/\/webz.io\/)", - "respect": "[Yes](https:\/\/webz.io\/blog\/web-data\/what-is-the-omgili-bot-and-why-is-it-crawling-your-website\/)", + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)", "function": "Data is sold.", "frequency": "No information.", "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training." 
}, "omgilibot": { - "operator": "[Webz.io](https:\/\/webz.io\/)", - "respect": "[Yes](https:\/\/web.archive.org\/web\/20170704003301\/http:\/\/omgili.com\/Crawler.html)", + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)", "function": "Data is sold.", "frequency": "No information.", "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io." }, "PerplexityBot": { - "operator": "[Perplexity](https:\/\/www.perplexity.ai\/)", - "respect": "[No](https:\/\/www.macstories.net\/stories\/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler\/)", + "operator": "[Perplexity](https://www.perplexity.ai/)", + "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", "function": "Used to answer queries at the request of users.", "frequency": "Takes action based on user prompts.", "description": "Operated by Perplexity to obtain results in response to user queries." }, "PetalBot": { - "operator": "[Huawei](https:\/\/huawei.com\/)", + "operator": "[Huawei](https://huawei.com/)", "respect": "Yes", "function": "Used to provide recommendations in Hauwei assistant and AI search services.", "frequency": "No explicit frequency provided.", "description": "Operated by Huawei to provide search and AI assistant services." 
}, "Scrapy": { - "operator": "[Zyte](https:\/\/www.zyte.com)", + "operator": "[Zyte](https://www.zyte.com)", "respect": "Unclear at this time.", "function": "Scrapes data a variety of uses including training AI.", "frequency": "No information.", "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"" }, "Timpibot": { - "operator": "[Timpi](https:\/\/timpi.io)", + "operator": "[Timpi](https://timpi.io)", "respect": "Unclear at this time.", "function": "Scrapes data for use in training LLMs.", "frequency": "No information.", "description": "Makes data available for training AI models." }, "VelenPublicWebCrawler": { - "operator": "[Velen Crawler](https:\/\/velen.io)", - "respect": "[Yes](https:\/\/velen.io)", + "operator": "[Velen Crawler](https://velen.io)", + "respect": "[Yes](https://velen.io)", "function": "Scrapes data for business data sets and machine learning models.", "frequency": "No information.", "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"" }, "YouBot": { - "operator": "[You](https:\/\/about.you.com\/youchat\/)", - "respect": "[Yes](https:\/\/about.you.com\/youbot\/)", + "operator": "[You](https://about.you.com/youchat/)", + "respect": "[Yes](https://about.you.com/youbot/)", "function": "Scrapes data for search engine and LLMs.", "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." + }, + "Meta-ExternalFetcher": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Assistants", + "frequency": "Unclear at this time.", + "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. 
More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher" + }, + "Applebot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Search Crawlers", + "frequency": "Unclear at this time.", + "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot" } -} +} \ No newline at end of file From fbebbbfefb04788eb67123be9c4047bfaa92ee0c Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Wed, 7 Aug 2024 12:02:46 +0100 Subject: [PATCH 063/249] restore files deleted by failed workflow --- robots.txt | 32 ++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..7f3cb46 --- /dev/null +++ b/robots.txt @@ -0,0 +1,32 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md 
b/table-of-bot-metrics.md new file mode 100644 index 0000000..2dafd43 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,33 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. 
| Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. 
| "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. 
| At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. 
| "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From aaa55594e193e5756d3ab9d5659bce1950ac5a1e Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Wed, 7 Aug 2024 11:13:16 +0000 Subject: [PATCH 064/249] Removing previously generated files --- robots.txt | 32 -------------------------------- table-of-bot-metrics.md | 33 --------------------------------- 2 files changed, 65 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 7f3cb46..0000000 --- a/robots.txt +++ /dev/null @@ -1,32 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot 
-User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 2dafd43..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,33 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. 
| Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. 
| Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| From 366e49dc6d97017c397c7015eafdedde54a3da40 Mon Sep 17 00:00:00 2001 From: Chenghao Mou Date: Wed, 7 Aug 2024 12:21:40 +0100 Subject: [PATCH 065/249] restore files deleted by failed workflow and fix main commit message --- .github/workflows/daily_update.yml | 4 +++- .github/workflows/main.yml | 11 +++++++++- robots.txt | 32 +++++++++++++++++++++++++++++ table-of-bot-metrics.md | 33 ++++++++++++++++++++++++++++++ 4 files changed, 78 insertions(+), 2 deletions(-) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml index 6a51674..ae4e98a 100644 --- a/.github/workflows/daily_update.yml +++ b/.github/workflows/daily_update.yml @@ -22,4 +22,6 @@ jobs: call-main: needs: dark-visitors uses: ./.github/workflows/main.yml - secrets: inherit \ No newline at end of file + secrets: inherit + with: + message: "Daily update from Dark Visitors" \ No newline at end of file diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index c3f3f57..708621d 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -1,5 +1,10 @@ on: workflow_call: + inputs: + message: + type: string + required: true + description: The message to commit push: paths: - 'robots.json' @@ -24,6 +29,10 @@ jobs: git config --global user.name "ai.robots.txt" git config --global user.email "ai.robots.txt@users.noreply.github.com" git add -A - git commit -m "${{ github.event.head_commit.message }}" + if [ -n "${{ inputs.message }}" ]; then + git commit -m "${{ inputs.message }}" + else + git commit -m "${{ github.event.head_commit.message }}" + fi git push shell: bash \ No newline at end of file diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..7f3cb46 --- /dev/null +++ b/robots.txt @@ -0,0 +1,32 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot
+User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..2dafd43 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,33 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. 
| Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. 
| +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. 
| Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. 
Unknown if still used, `omgili` agent still used by Webz.io. |
+| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
+| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. |
+| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." |
+| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. |
+| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
+| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |

From 4a63c482c4f5129ae6e874061101f4341e05c79d Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Wed, 7 Aug 2024 11:31:02 +0000
Subject: [PATCH 066/249] Removing previously generated files

---
 robots.txt | 32 --------------------------------
 table-of-bot-metrics.md | 33 ---------------------------------
 2 files changed, 65 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 7f3cb46..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-User-agent: Amazonbot
-User-agent: anthropic-ai
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: ClaudeBot
-User-agent: Claude-Web
-User-agent: cohere-ai
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: facebookexternalhit
-User-agent: FriendlyCrawler
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: GPTBot
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: img2dataset
-User-agent: Meta-ExternalAgent
-User-agent: OAI-SearchBot
-User-agent: omgili
-User-agent: omgilibot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: YouBot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 2dafd43..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,33 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses.
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." 
| -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
|
-| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
-| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |

From b00067bc861b5a07863d78de5784ec8be96f85f6 Mon Sep 17 00:00:00 2001
From: Chenghao Mou
Date: Wed, 7 Aug 2024 12:36:21 +0100
Subject: [PATCH 067/249] restore files deleted by failed workflow and fix main commit message

---
 .github/workflows/main.yml | 4 ++--
 robots.txt | 32 ++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 33 +++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 2 deletions(-)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 708621d..4b127d7 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -29,8 +29,8 @@ jobs:
  git config --global user.name "ai.robots.txt"
  git config --global user.email "ai.robots.txt@users.noreply.github.com"
  git add -A
- if [ -n "${{ github.event.inputs.message }}" ]; then
-   git commit -m "${{ github.event.inputs.message }}"
+ if [ -n "${{ inputs.message }}" ]; then
+   git commit -m "${{ inputs.message }}"
  else
    git commit -m "${{ github.event.head_commit.message }}"
  fi
diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..7f3cb46
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,32 @@
+User-agent: Amazonbot
+User-agent: anthropic-ai
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: ClaudeBot
+User-agent: Claude-Web
+User-agent: cohere-ai
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: facebookexternalhit
+User-agent: FriendlyCrawler
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: GPTBot
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: img2dataset
+User-agent: Meta-ExternalAgent
+User-agent: OAI-SearchBot
+User-agent: omgili
+User-agent: omgilibot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: YouBot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..2dafd43
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,33 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
+| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. |
+| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors.
| +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. 
| +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. 
| Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. 
Unknown if still used, `omgili` agent still used by Webz.io. |
+| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
+| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. |
+| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." |
+| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. |
+| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
+| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |

From 8738c66c653062f9623b0ed649f4110bad329433 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Wed, 7 Aug 2024 11:40:59 +0000
Subject: [PATCH 068/249] Removing previously generated files

---
 robots.txt | 32 --------------------------------
 table-of-bot-metrics.md | 33 ---------------------------------
 2 files changed, 65 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 7f3cb46..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,32 +0,0 @@
-User-agent: Amazonbot
-User-agent: anthropic-ai
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: ClaudeBot
-User-agent: Claude-Web
-User-agent: cohere-ai
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: facebookexternalhit
-User-agent: FriendlyCrawler
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: GPTBot
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: img2dataset
-User-agent: Meta-ExternalAgent
-User-agent: OAI-SearchBot
-User-agent: omgili
-User-agent: omgilibot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: YouBot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 2dafd43..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,33 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses.
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." 
| -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From ab17662f966cdd787cdffd73ef2328aa2db5a8b5 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Wed, 7 Aug 2024 11:41:00 +0000 Subject: [PATCH 069/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..6f06862 --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: Meta-ExternalFetcher +User-agent: Applebot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md 
new file mode 100644 index 0000000..5f07c1a --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. 
| Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. 
| "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot |

From 6f96795edc2abe79953685f1e4a7c9b98b50bcc7 Mon Sep 17 00:00:00 2001
From: Chenghao Mou
Date: Wed, 7 Aug 2024 12:43:44 +0100
Subject: [PATCH 070/249] restore cron

---
 .github/workflows/daily_update.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml
index ae4e98a..2ae0398 100644
--- a/.github/workflows/daily_update.yml
+++ b/.github/workflows/daily_update.yml
@@ -1,7 +1,7 @@
 name: Daily Update from Dark Visitors
 on:
   schedule:
-    - cron: "*/10 * * * *"
+    - cron: "0 0 * * *"
 
 jobs:
   dark-visitors:

From 663b85cc07a75c94fab3f3503ecb3cd151be4baf Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Wed, 7 Aug 2024 22:40:24 +0000
Subject: [PATCH 071/249] Removing previously generated files

---
 robots.txt              | 34 ----------------------------------
 table-of-bot-metrics.md | 35 -----------------------------------
 2 files changed, 69 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 6f06862..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-User-agent: Amazonbot
-User-agent: anthropic-ai
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: ClaudeBot
-User-agent: Claude-Web
-User-agent: cohere-ai
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: facebookexternalhit
-User-agent: FriendlyCrawler
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: GPTBot
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: img2dataset
-User-agent: Meta-ExternalAgent
-User-agent: OAI-SearchBot
-User-agent: omgili
-User-agent: omgilibot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent:
YouBot -User-agent: Meta-ExternalFetcher -User-agent: Applebot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 5f07c1a..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. 
| -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. 
| -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. 
| Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 0122dea1e93728d13ad917d6095c67dc8617ec03 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Wed, 7 Aug 2024 22:40:24 +0000 Subject: [PATCH 072/249] Merge pull request #32 from ChenghaoMou/main Tracking Dark Visitors Automatically --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..6f06862 --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: Meta-ExternalFetcher +User-agent: Applebot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..5f07c1a --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. 
| No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 40f9325a4fc3fd97ce5124f5c8cd4938e4e8f4c1 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 8 Aug 2024 01:10:12 +0000 Subject: [PATCH 073/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 6f06862..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: Meta-ExternalFetcher -User-agent: Applebot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 5f07c1a..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 57f006150b14b81b5a9c31e9149f115fcbebe35f Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 8 Aug 2024 01:10:13 +0000 Subject: [PATCH 074/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..6f06862 --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: Meta-ExternalFetcher +User-agent: Applebot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..5f07c1a --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From ed7d7d3fdf7f7885be3347f05f350804d2191fc3 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Fri, 9 Aug 2024 01:11:11 +0000 Subject: [PATCH 075/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 6f06862..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: Meta-ExternalFetcher -User-agent: Applebot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 5f07c1a..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 21e5cd96a968db17df58759995e9baf98f30b231 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Fri, 9 Aug 2024 01:11:12 +0000 Subject: [PATCH 076/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..6f06862 --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: Meta-ExternalFetcher +User-agent: Applebot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..5f07c1a --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | From 46540633baf01cc0cd068610b173afa1ff38cec9 Mon Sep 17 00:00:00 2001 From: nisbet-hubbard <87453615+nisbet-hubbard@users.noreply.github.com> Date: Sat, 10 Aug 2024 08:22:28 +0800 Subject: [PATCH 077/249] Update README.md --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 39ecb9b..60ac9ec 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ -This is an open list of web crawlers associated with AI companies and the training of LLMs to block. We encourage you to contribute to and implement this list on your own site. See [information about the listed crawlers](./table-of-bot-metrics.md). +This is an open list of web crawlers associated with AI companies and the training of LLMs to block. 
We encourage you to contribute to and implement this list on your own site. See [information about the listed crawlers](./table-of-bot-metrics.md) and the [FAQ](https://github.com/ai-robots-txt/ai.robots.txt/blob/main/FAQ.md). A number of these crawlers have been sourced from [Dark Visitors](https://darkvisitors.com) and we appreciate the ongoing effort they put in to track these crawlers. @@ -22,6 +22,10 @@ https://github.com/ai-robots-txt/ai.robots.txt/releases.atom You can subscribe with [Feedly](https://feedly.com/i/subscription/feed/https://github.com/ai-robots-txt/ai.robots.txt/releases.atom), [Inoreader](https://www.inoreader.com/?add_feed=https://github.com/ai-robots-txt/ai.robots.txt/releases.atom), [The Old Reader](https://theoldreader.com/feeds/subscribe?url=https://github.com/ai-robots-txt/ai.robots.txt/releases.atom), [Feedbin](https://feedbin.me/?subscribe=https://github.com/ai-robots-txt/ai.robots.txt/releases.atom), or any other reader app. +## Report abusive crawlers + +If you use [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) alongside this list, you can report abusive crawlers that don't respect `robots.txt` [here](https://docs.google.com/forms/d/e/1FAIpQLScbUZ2vlNSdcsb8LyTeSF7uLzQI96s0BKGoJ6wQ6ocUFNOKEg/viewform). 
+ ## Additional resources - [Blocking Bots with Nginx](https://rknight.me/blog/blocking-bots-with-nginx/) by Robb Knight From 4242f8cc7ba3ea61bdff9954af03db472a71a3ba Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 10 Aug 2024 01:10:53 +0000 Subject: [PATCH 078/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 6f06862..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: Meta-ExternalFetcher -User-agent: Applebot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 5f07c1a..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. 
| No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 53449ad1bd5f4e2aa41f14f4ef98b7be473d01a0 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 10 Aug 2024 01:10:53 +0000 Subject: [PATCH 079/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..6f06862 --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: Meta-ExternalFetcher +User-agent: Applebot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..5f07c1a --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From cb98669cc299556e392e9b2155b420977e67eac4 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sun, 11 Aug 2024 01:16:03 +0000 Subject: [PATCH 080/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 6f06862..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: Meta-ExternalFetcher -User-agent: Applebot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 5f07c1a..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 6472e07f0921fd4517645ff694f89bc1b258ad37 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sun, 11 Aug 2024 01:16:04 +0000 Subject: [PATCH 081/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..6f06862 --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: Meta-ExternalFetcher +User-agent: Applebot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..5f07c1a --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 274d48b8f03403188f891b8a60cd69636a8a417c Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 12 Aug 2024 01:12:23 +0000 Subject: [PATCH 082/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 6f06862..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: Meta-ExternalFetcher -User-agent: Applebot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 5f07c1a..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 53a39b2f71bfeee8d015a8bf8feca16b00998ebf Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 12 Aug 2024 01:12:23 +0000 Subject: [PATCH 083/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..6f06862 --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: Meta-ExternalFetcher +User-agent: Applebot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..5f07c1a --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From f1d0c5b1fead27ab6594bd2b0405f72e8be2462d Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 13 Aug 2024 01:12:02 +0000 Subject: [PATCH 084/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 6f06862..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: Meta-ExternalFetcher -User-agent: Applebot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 5f07c1a..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 2e8e8af8e48935819de9800873421aa651c35d00 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 13 Aug 2024 01:12:03 +0000 Subject: [PATCH 085/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..6f06862 --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: Meta-ExternalFetcher +User-agent: Applebot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..5f07c1a --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 2c8ed062b9080367cb557fbb1efb589863c1da08 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Wed, 14 Aug 2024 01:11:02 +0000 Subject: [PATCH 086/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 6f06862..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: Meta-ExternalFetcher -User-agent: Applebot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 5f07c1a..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From df5b6ef6475623d5595f216a79b5d75cf8669a3a Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Wed, 14 Aug 2024 01:11:03 +0000 Subject: [PATCH 087/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..6f06862 --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: ClaudeBot +User-agent: Claude-Web +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: Meta-ExternalAgent +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: Meta-ExternalFetcher +User-agent: Applebot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..5f07c1a --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. 
| Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | From bc66d10afd14f5e8456aaf34ed7a7148d896c71b Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Wed, 14 Aug 2024 09:21:26 -0700 Subject: [PATCH 088/249] chore: update faq --- FAQ.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/FAQ.md b/FAQ.md index b0d2167..d60ca9a 100644 --- a/FAQ.md +++ b/FAQ.md @@ -14,6 +14,8 @@ Also, given the contentious nature of AI and the possibility of legislation limi Yes, provided the crawlers identify themselves and your application/hosting supports doing so. +Some crawlers — [such as Perplexity](https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/) — do not identify themselves via their user agent strings and, as such, are difficult to block. + ## What can we do if a bot doesn't respect `robots.txt`? That depends on your stack. 
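Note on the next two patches in this series (089/249 and 090/249): patch 089 adds `sort_keys=True` to the `json.dumps` call in `code/dark_visitors.py`, and patch 090 then rewrites `robots.json` with 225 insertions and 225 deletions. The sketch below is only an illustration of why such a one-argument change can touch nearly every line. The `sample` dictionary is a hypothetical three-entry excerpt shaped like `robots.json`, not the real file, and the variable names are assumptions made for this example.

```python
import json

# Hypothetical excerpt shaped like robots.json; field values are copied
# from the Amazonbot, anthropic-ai, and CCBot rows of the table.
sample = {
    "Amazonbot": {
        "operator": "Amazon",
        "respect": "Yes",
        "function": "Service improvement and enabling answers for Alexa users.",
        "frequency": "No information. provided.",
        "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses.",
    },
    "anthropic-ai": {
        "operator": "[Anthropic](https://www.anthropic.com)",
        "respect": "Unclear at this time.",
    },
    "CCBot": {
        "operator": "[Common Crawl](https://commoncrawl.org)",
        "respect": "[Yes](https://commoncrawl.org/ccbot)",
    },
}

# Before patch 089: keys are written in insertion order.
unsorted_text = json.dumps(sample, indent=4)

# After patch 089: keys are sorted at every nesting level.
sorted_text = json.dumps(sample, indent=4, sort_keys=True)

# Read the serialized text back to inspect the resulting key order
# (dicts preserve insertion order in Python 3.7+).
unsorted_fields = list(json.loads(unsorted_text)["Amazonbot"])
sorted_fields = list(json.loads(sorted_text)["Amazonbot"])
sorted_agents = list(json.loads(sorted_text))

print(unsorted_fields)
print(sorted_fields)
print(sorted_agents)
```

Two effects are visible here. Within each entry the fields are reordered alphabetically (`description`, `frequency`, `function`, `operator`, `respect`), which matches the reordering shown in the `robots.json` hunk of patch 090. At the top level, Python compares strings by code point, so capitalised agent names such as `CCBot` sort ahead of lowercase ones such as `anthropic-ai`. That would explain why `anthropic-ai` no longer directly follows `Amazonbot` after the change.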
From 407b9e12e6484f2ebcc649128145dc89eceaca1f Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Wed, 14 Aug 2024 17:10:29 -0700 Subject: [PATCH 089/249] chore: sort output --- code/dark_visitors.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/code/dark_visitors.py b/code/dark_visitors.py index e6f9c2e..484daa1 100644 --- a/code/dark_visitors.py +++ b/code/dark_visitors.py @@ -70,4 +70,4 @@ for section in soup.find_all("div", {"class": "agent-links-section"}): } print(f"Total: {len(existing_content)}") -Path("./robots.json").write_text(json.dumps(existing_content, indent=4)) \ No newline at end of file +Path("./robots.json").write_text(json.dumps(existing_content, indent=4, sort_keys=True)) \ No newline at end of file From 5937434aff9b09b487a59a54f11b0dd084c09d0a Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Thu, 15 Aug 2024 01:07:15 +0000 Subject: [PATCH 090/249] Daily update from Dark Visitors --- robots.json | 450 ++++++++++++++++++++++++++-------------------------- 1 file changed, 225 insertions(+), 225 deletions(-) diff --git a/robots.json b/robots.json index 745bb0b..e80f094 100644 --- a/robots.json +++ b/robots.json @@ -1,233 +1,233 @@ { "Amazonbot": { - "operator": "Amazon", - "respect": "Yes", + "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses.", + "frequency": "No information. provided.", "function": "Service improvement and enabling answers for Alexa users.", - "frequency": "No information. provided.", - "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses." - }, - "anthropic-ai": { - "operator": "[Anthropic](https://www.anthropic.com)", - "respect": "Unclear at this time.", - "function": "Scrapes data to train Anthropic's AI products.", - "frequency": "No information. provided.", - "description": "Scrapes data to train LLMs and AI products offered by Anthropic." 
- }, - "Applebot-Extended": { - "operator": "[Apple](https://support.apple.com/en-us/119829#datausage)", - "respect": "Yes", - "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", - "frequency": "Unclear at this time.", - "description": "Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools." - }, - "Bytespider": { - "operator": "ByteDance", - "respect": "No", - "function": "LLM training.", - "frequency": "Unclear at this time.", - "description": "Downloads data to train LLMS, including ChatGPT competitors." - }, - "CCBot": { - "operator": "[Common Crawl](https://commoncrawl.org)", - "respect": "[Yes](https://commoncrawl.org/ccbot)", - "function": "Provides crawl data for an open source repository that has been used to train LLMs.", - "frequency": "Unclear at this time.", - "description": "Sources data that is made openly available and is used to train AI models." - }, - "ChatGPT-User": { - "operator": "[OpenAI](https://openai.com)", - "respect": "Yes", - "function": "Takes action based on user prompts.", - "frequency": "Only when prompted by a user.", - "description": "Used by plugins in ChatGPT to answer queries based on user input." - }, - "ClaudeBot": { - "operator": "[Anthropic](https://www.anthropic.com)", - "respect": "Unclear at this time.", - "function": "Scrapes data to train Anthropic's AI products.", - "frequency": "No information. provided.", - "description": "Scrapes data to train LLMs and AI products offered by Anthropic." - }, - "Claude-Web": { - "operator": "[Anthropic](https://www.anthropic.com)", - "respect": "Unclear at this time.", - "function": "Scrapes data to train Anthropic's AI products.", - "frequency": "No information. provided.", - "description": "Scrapes data to train LLMs and AI products offered by Anthropic." 
- }, - "cohere-ai": { - "operator": "[Cohere](https://cohere.com)", - "respect": "Unclear at this time.", - "function": "Retrieves data to provide responses to user-initiated prompts.", - "frequency": "Takes action based on user prompts.", - "description": "Retrieves data based on user prompts." - }, - "Diffbot": { - "operator": "[Diffbot](https://www.diffbot.com/)", - "respect": "At the discretion of Diffbot users.", - "function": "Aggregates structured web data for monitoring and AI model training.", - "frequency": "Unclear at this time.", - "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training." - }, - "FacebookBot": { - "operator": "Meta/Facebook", - "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", - "function": "Training language models", - "frequency": "Up to 1 page per second", - "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." - }, - "facebookexternalhit": { - "operator": "Meta/Facebook", - "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", - "function": "No information.", - "frequency": "Unclear at this time.", - "description": "Unclear at this time." - }, - "FriendlyCrawler": { - "operator": "Unknown", - "respect": "[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)", - "function": "We are using the data from the crawler to build datasets for machine learning experiments.", - "frequency": "Unclear at this time.", - "description": "Unclear who the operator is; but data is used for training/machine learning." - }, - "Google-Extended": { - "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", - "function": "LLM training.", - "frequency": "No information.", - "description": "Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search." - }, - "GoogleOther": { - "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", - "function": "Scrapes data.", - "frequency": "No information.", - "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" - }, - "GoogleOther-Image": { - "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", - "function": "Scrapes data.", - "frequency": "No information.", - "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" - }, - "GoogleOther-Video": { - "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", - "function": "Scrapes data.", - "frequency": "No information.", - "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"" - }, - "GPTBot": { - "operator": "[OpenAI](https://openai.com)", - "respect": "Yes", - "function": "Scrapes data to train OpenAI's products.", - "frequency": "No information.", - "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." 
- }, - "ICC-Crawler": { - "operator": "[NICT](https://nict.go.jp)", - "respect": "Yes", - "function": "Scrapes data to train and support AI technologies.", - "frequency": "No information.", - "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business." - }, - "ImagesiftBot": { - "operator": "[ImageSift](https://imagesift.com)", - "respect": "[Yes](https://imagesift.com/about)", - "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products", - "frequency": "No information.", - "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images." - }, - "img2dataset": { - "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", - "respect": "Unclear at this time.", - "function": "Scrapes images for use in LLMs.", - "frequency": "At the discretion of img2dataset users.", - "description": "Downloads large sets of images into datasets for LLM training or other purposes." - }, - "Meta-ExternalAgent": { - "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)", - "respect": "Yes.", - "function": "Used to train models and improve products.", - "frequency": "No information.", - "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"" - }, - "OAI-SearchBot": { - "operator": "[OpenAI](https://openai.com)", - "respect": "[Yes](https://platform.openai.com/docs/bots)", - "function": "Search result generation.", - "frequency": "No information.", - "description": "Crawls sites to surface as results in SearchGPT." 
- }, - "omgili": { - "operator": "[Webz.io](https://webz.io/)", - "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)", - "function": "Data is sold.", - "frequency": "No information.", - "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training." - }, - "omgilibot": { - "operator": "[Webz.io](https://webz.io/)", - "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)", - "function": "Data is sold.", - "frequency": "No information.", - "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io." - }, - "PerplexityBot": { - "operator": "[Perplexity](https://www.perplexity.ai/)", - "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", - "function": "Used to answer queries at the request of users.", - "frequency": "Takes action based on user prompts.", - "description": "Operated by Perplexity to obtain results in response to user queries." - }, - "PetalBot": { - "operator": "[Huawei](https://huawei.com/)", - "respect": "Yes", - "function": "Used to provide recommendations in Hauwei assistant and AI search services.", - "frequency": "No explicit frequency provided.", - "description": "Operated by Huawei to provide search and AI assistant services." 
- }, - "Scrapy": { - "operator": "[Zyte](https://www.zyte.com)", - "respect": "Unclear at this time.", - "function": "Scrapes data a variety of uses including training AI.", - "frequency": "No information.", - "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"" - }, - "Timpibot": { - "operator": "[Timpi](https://timpi.io)", - "respect": "Unclear at this time.", - "function": "Scrapes data for use in training LLMs.", - "frequency": "No information.", - "description": "Makes data available for training AI models." - }, - "VelenPublicWebCrawler": { - "operator": "[Velen Crawler](https://velen.io)", - "respect": "[Yes](https://velen.io)", - "function": "Scrapes data for business data sets and machine learning models.", - "frequency": "No information.", - "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"" - }, - "YouBot": { - "operator": "[You](https://about.you.com/youchat/)", - "respect": "[Yes](https://about.you.com/youbot/)", - "function": "Scrapes data for search engine and LLMs.", - "frequency": "No information.", - "description": "Retrieves data used for You.com web search engine and LLMs." - }, - "Meta-ExternalFetcher": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "AI Assistants", - "frequency": "Unclear at this time.", - "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. 
More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher" + "operator": "Amazon", + "respect": "Yes" }, "Applebot": { - "operator": "Unclear at this time.", - "respect": "Unclear at this time.", - "function": "AI Search Crawlers", + "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot", "frequency": "Unclear at this time.", - "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot" + "function": "AI Search Crawlers", + "operator": "Unclear at this time.", + "respect": "Unclear at this time." + }, + "Applebot-Extended": { + "description": "Apple has a secondary user agent, Applebot-Extended ... 
[that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.", + "frequency": "Unclear at this time.", + "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", + "operator": "[Apple](https://support.apple.com/en-us/119829#datausage)", + "respect": "Yes" + }, + "Bytespider": { + "description": "Downloads data to train LLMS, including ChatGPT competitors.", + "frequency": "Unclear at this time.", + "function": "LLM training.", + "operator": "ByteDance", + "respect": "No" + }, + "CCBot": { + "description": "Sources data that is made openly available and is used to train AI models.", + "frequency": "Unclear at this time.", + "function": "Provides crawl data for an open source repository that has been used to train LLMs.", + "operator": "[Common Crawl](https://commoncrawl.org)", + "respect": "[Yes](https://commoncrawl.org/ccbot)" + }, + "ChatGPT-User": { + "description": "Used by plugins in ChatGPT to answer queries based on user input.", + "frequency": "Only when prompted by a user.", + "function": "Takes action based on user prompts.", + "operator": "[OpenAI](https://openai.com)", + "respect": "Yes" + }, + "Claude-Web": { + "description": "Scrapes data to train LLMs and AI products offered by Anthropic.", + "frequency": "No information. provided.", + "function": "Scrapes data to train Anthropic's AI products.", + "operator": "[Anthropic](https://www.anthropic.com)", + "respect": "Unclear at this time." + }, + "ClaudeBot": { + "description": "Scrapes data to train LLMs and AI products offered by Anthropic.", + "frequency": "No information. provided.", + "function": "Scrapes data to train Anthropic's AI products.", + "operator": "[Anthropic](https://www.anthropic.com)", + "respect": "Unclear at this time." 
+ }, + "Diffbot": { + "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training.", + "frequency": "Unclear at this time.", + "function": "Aggregates structured web data for monitoring and AI model training.", + "operator": "[Diffbot](https://www.diffbot.com/)", + "respect": "At the discretion of Diffbot users." + }, + "FacebookBot": { + "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically.", + "frequency": "Up to 1 page per second", + "function": "Training language models", + "operator": "Meta/Facebook", + "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)" + }, + "FriendlyCrawler": { + "description": "Unclear who the operator is; but data is used for training/machine learning.", + "frequency": "Unclear at this time.", + "function": "We are using the data from the crawler to build datasets for machine learning experiments.", + "operator": "Unknown", + "respect": "[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)" + }, + "GPTBot": { + "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies.", + "frequency": "No information.", + "function": "Scrapes data to train OpenAI's products.", + "operator": "[OpenAI](https://openai.com)", + "respect": "Yes" + }, + "Google-Extended": { + "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search.", + "frequency": "No information.", + "function": "LLM training.", + "operator": "Google", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)" + }, + "GoogleOther": { + "description": "\"Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development.\"", + "frequency": "No information.", + "function": "Scrapes data.", + "operator": "Google", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)" + }, + "GoogleOther-Image": { + "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"", + "frequency": "No information.", + "function": "Scrapes data.", + "operator": "Google", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)" + }, + "GoogleOther-Video": { + "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"", + "frequency": "No information.", + "function": "Scrapes data.", + "operator": "Google", + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)" + }, + "ICC-Crawler": { + "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business.", + "frequency": "No information.", + "function": "Scrapes data to train and support AI technologies.", + "operator": "[NICT](https://nict.go.jp)", + "respect": "Yes" + }, + "ImagesiftBot": { + "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images.", + "frequency": "No information.", + "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products", + "operator": "[ImageSift](https://imagesift.com)", + "respect": "[Yes](https://imagesift.com/about)" + }, + "Meta-ExternalAgent": { + "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"", + "frequency": "No information.", + "function": "Used to train models and improve products.", + "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)", + "respect": "Yes." + }, + "Meta-ExternalFetcher": { + "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher", + "frequency": "Unclear at this time.", + "function": "AI Assistants", + "operator": "Unclear at this time.", + "respect": "Unclear at this time." 
+ }, + "OAI-SearchBot": { + "description": "Crawls sites to surface as results in SearchGPT.", + "frequency": "No information.", + "function": "Search result generation.", + "operator": "[OpenAI](https://openai.com)", + "respect": "[Yes](https://platform.openai.com/docs/bots)" + }, + "PerplexityBot": { + "description": "Operated by Perplexity to obtain results in response to user queries.", + "frequency": "Takes action based on user prompts.", + "function": "Used to answer queries at the request of users.", + "operator": "[Perplexity](https://www.perplexity.ai/)", + "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)" + }, + "PetalBot": { + "description": "Operated by Huawei to provide search and AI assistant services.", + "frequency": "No explicit frequency provided.", + "function": "Used to provide recommendations in Hauwei assistant and AI search services.", + "operator": "[Huawei](https://huawei.com/)", + "respect": "Yes" + }, + "Scrapy": { + "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"", + "frequency": "No information.", + "function": "Scrapes data a variety of uses including training AI.", + "operator": "[Zyte](https://www.zyte.com)", + "respect": "Unclear at this time." + }, + "Timpibot": { + "description": "Makes data available for training AI models.", + "frequency": "No information.", + "function": "Scrapes data for use in training LLMs.", + "operator": "[Timpi](https://timpi.io)", + "respect": "Unclear at this time." 
+ }, + "VelenPublicWebCrawler": { + "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"", + "frequency": "No information.", + "function": "Scrapes data for business data sets and machine learning models.", + "operator": "[Velen Crawler](https://velen.io)", + "respect": "[Yes](https://velen.io)" + }, + "YouBot": { + "description": "Retrieves data used for You.com web search engine and LLMs.", + "frequency": "No information.", + "function": "Scrapes data for search engine and LLMs.", + "operator": "[You](https://about.you.com/youchat/)", + "respect": "[Yes](https://about.you.com/youbot/)" + }, + "anthropic-ai": { + "description": "Scrapes data to train LLMs and AI products offered by Anthropic.", + "frequency": "No information. provided.", + "function": "Scrapes data to train Anthropic's AI products.", + "operator": "[Anthropic](https://www.anthropic.com)", + "respect": "Unclear at this time." + }, + "cohere-ai": { + "description": "Retrieves data based on user prompts.", + "frequency": "Takes action based on user prompts.", + "function": "Retrieves data to provide responses to user-initiated prompts.", + "operator": "[Cohere](https://cohere.com)", + "respect": "Unclear at this time." + }, + "facebookexternalhit": { + "description": "Unclear at this time.", + "frequency": "Unclear at this time.", + "function": "No information.", + "operator": "Meta/Facebook", + "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)" + }, + "img2dataset": { + "description": "Downloads large sets of images into datasets for LLM training or other purposes.", + "frequency": "At the discretion of img2dataset users.", + "function": "Scrapes images for use in LLMs.", + "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", + "respect": "Unclear at this time." + }, + "omgili": { + "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. 
Data also sold for research purposes or LLM training.", + "frequency": "No information.", + "function": "Data is sold.", + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)" + }, + "omgilibot": { + "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io.", + "frequency": "No information.", + "function": "Data is sold.", + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)" } } \ No newline at end of file From 3ef9cb7ce4fda30ec48902076c7171644190867c Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Fri, 16 Aug 2024 01:10:13 +0000 Subject: [PATCH 091/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 6f06862..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: anthropic-ai -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: ClaudeBot -User-agent: Claude-Web -User-agent: cohere-ai -User-agent: Diffbot -User-agent: FacebookBot -User-agent: facebookexternalhit -User-agent: FriendlyCrawler -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: GPTBot -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: img2dataset -User-agent: Meta-ExternalAgent -User-agent: OAI-SearchBot -User-agent: omgili -User-agent: omgilibot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler 
-User-agent: YouBot -User-agent: Meta-ExternalFetcher -User-agent: Applebot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 5f07c1a..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. 
| -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. 
| -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. 
| Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | From 2a075cb2f13bac6a4995e6ecf0272ce58c1ca21c Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Fri, 16 Aug 2024 01:10:14 +0000 Subject: [PATCH 092/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..175610d --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..810093c --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 558d5871b2218f216cc5f67f2e6485403911b844 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 17 Aug 2024 01:08:17 +0000 Subject: [PATCH 093/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 175610d..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 810093c..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 3afcefdff5fd5cdcd80756f535ffced52472d5ea Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 17 Aug 2024 01:08:17 +0000 Subject: [PATCH 094/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..175610d --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..810093c --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 60ff792ba9839f2bde6e4d10c4f6e4ea2b707bb7 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sun, 18 Aug 2024 01:14:49 +0000 Subject: [PATCH 095/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 175610d..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 810093c..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From b8e68c12f35ee9eca6ffe32474e9a0db75766c65 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sun, 18 Aug 2024 01:14:50 +0000 Subject: [PATCH 096/249] Daily update from Dark Visitors --- robots.txt | 34 ++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..175610d --- /dev/null +++ b/robots.txt @@ -0,0 +1,34 @@ +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..810093c --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,35 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 2363e57608d9fda2ae6d5305eaa03d31f5f2e1b9 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Sun, 18 Aug 2024 11:34:08 -0700 Subject: [PATCH 097/249] chore: minor update --- .github/FUNDING.yml | 14 -------------- 1 file changed, 14 deletions(-) delete mode 100644 .github/FUNDING.yml diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml deleted file mode 100644 index 3c73694..0000000 --- a/.github/FUNDING.yml +++ /dev/null @@ -1,14 +0,0 @@ -# These are supported funding model platforms - -github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2] -patreon: # Replace with a single Patreon username -open_collective: # Replace with a single Open Collective username -ko_fi: # Replace with a single Ko-fi username -tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel -community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry -liberapay: # Replace with a single Liberapay username -issuehunt: # Replace with a single IssueHunt username -lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry -polar: # Replace with a single Polar username -buy_me_a_coffee: cory -custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2'] From 1d3194f75d69f9b4627abcfa918d59d9f1170b9c Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Sun, 18 Aug 2024 11:34:43 -0700 Subject: [PATCH 098/249] chore: update readme --- README.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/README.md b/README.md index 60ac9ec..c2d79f5 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,3 @@ If you use [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your- - [Blocking Bots With 11ty And Apache](https://flamedfury.com/posts/blocking-bots-with-11ty-and-apache/) by fLaMEd fury - [Blockin' bots on Netlify](https://www.jeremiak.com/blog/block-bots-netlify-edge-functions/) by Jeremia Kimelman - [Blocking AI web 
crawlers](https://underlap.org/blocking-ai-web-crawlers) by Glyn Normington - ---- - -Thank you to [Glyn](https://github.com/glyn) for pushing [me](https://coryd.dev) to set this up after [I posted about blocking these crawlers](https://coryd.dev/posts/2024/go-ahead-and-block-ai-web-crawlers/). From 394e447c78803d9cd33cf15a75b1ae1ba821b97e Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 19 Aug 2024 01:11:49 +0000 Subject: [PATCH 099/249] Removing previously generated files --- robots.txt | 34 ---------------------------------- table-of-bot-metrics.md | 35 ----------------------------------- 2 files changed, 69 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 175610d..0000000 --- a/robots.txt +++ /dev/null @@ -1,34 +0,0 @@ -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 810093c..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,35 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | 
-|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. 
| Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher |
-| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. |
-| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
-| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. |
-| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." |
-| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. |
-| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
-| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |
-| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
-| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. |
-| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. |
-| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
-| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |

From 591a99c320c8370992d8e755ab7e2a71cc04f321 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Mon, 19 Aug 2024 01:11:49 +0000
Subject: [PATCH 100/249] Daily update from Dark Visitors

---
 robots.txt              | 34 ++++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..175610d
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,34 @@
+User-agent: Amazonbot
+User-agent: Applebot
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: Claude-Web
+User-agent: ClaudeBot
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: FriendlyCrawler
+User-agent: GPTBot
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: Meta-ExternalAgent
+User-agent: Meta-ExternalFetcher
+User-agent: OAI-SearchBot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: YouBot
+User-agent: anthropic-ai
+User-agent: cohere-ai
+User-agent: facebookexternalhit
+User-agent: img2dataset
+User-agent: omgili
+User-agent: omgilibot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..810093c
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,35 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
+| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot |
+| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. |
+| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. |
+| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. |
+| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. |
+| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. |
+| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. |
+| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. |
+| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. |
+| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. |
+| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." |
+| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." |
+| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." |
+| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. |
+| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. |
+| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." |
+| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher |
+| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. |
+| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
+| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. |
+| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." |
+| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. |
+| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
+| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |
+| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
+| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. |
+| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. |
+| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
+| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |

From 7e0dd921dbe74814c29646549ea5bd1cbb8fe9fa Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Tue, 20 Aug 2024 01:10:11 +0000
Subject: [PATCH 101/249] Removing previously generated files

---
 robots.txt              | 34 ----------------------------------
 table-of-bot-metrics.md | 35 -----------------------------------
 2 files changed, 69 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 175610d..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-User-agent: Amazonbot
-User-agent: Applebot
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: Claude-Web
-User-agent: ClaudeBot
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: FriendlyCrawler
-User-agent: GPTBot
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: Meta-ExternalAgent
-User-agent: Meta-ExternalFetcher
-User-agent: OAI-SearchBot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: YouBot
-User-agent: anthropic-ai
-User-agent: cohere-ai
-User-agent: facebookexternalhit
-User-agent: img2dataset
-User-agent: omgili
-User-agent: omgilibot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 810093c..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,35 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
-| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot |
-| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. |
-| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. |
-| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. |
-| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. |
-| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. |
-| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. |
-| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. |
-| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. |
-| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. |
-| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." |
-| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." |
-| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." |
-| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. |
-| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. |
-| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." |
-| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher |
-| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. |
-| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
-| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. |
-| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." |
-| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. |
-| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
-| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |
-| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
-| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. |
-| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. |
-| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
-| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |

From 358df0833e56a1a0eac945fa9e599998b0c478a0 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Tue, 20 Aug 2024 01:10:11 +0000
Subject: [PATCH 102/249] Daily update from Dark Visitors

---
 robots.txt              | 34 ++++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..175610d
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,34 @@
+User-agent: Amazonbot
+User-agent: Applebot
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: Claude-Web
+User-agent: ClaudeBot
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: FriendlyCrawler
+User-agent: GPTBot
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: Meta-ExternalAgent
+User-agent: Meta-ExternalFetcher
+User-agent: OAI-SearchBot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: YouBot
+User-agent: anthropic-ai
+User-agent: cohere-ai
+User-agent: facebookexternalhit
+User-agent: img2dataset
+User-agent: omgili
+User-agent: omgilibot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..810093c
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,35 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
|
From fad335178f06d957dda878c0f622f42d8bf6506e Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Wed, 21 Aug 2024 01:10:10 +0000
Subject: [PATCH 103/249] Removing previously generated files

---
 robots.txt              | 34 ----------------------------------
 table-of-bot-metrics.md | 35 -----------------------------------
 2 files changed, 69 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 175610d..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-User-agent: Amazonbot
-User-agent: Applebot
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: Claude-Web
-User-agent: ClaudeBot
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: FriendlyCrawler
-User-agent: GPTBot
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: Meta-ExternalAgent
-User-agent: Meta-ExternalFetcher
-User-agent: OAI-SearchBot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: YouBot
-User-agent: anthropic-ai
-User-agent: cohere-ai
-User-agent: facebookexternalhit
-User-agent: img2dataset
-User-agent: omgili
-User-agent: omgilibot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 810093c..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,35 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
|
From 3580a7096ff3920a444ab271299ebbda2fe154a5 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Wed, 21 Aug 2024 01:10:11 +0000
Subject: [PATCH 104/249] Daily update from Dark Visitors

---
 robots.txt              | 34 ++++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..175610d
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,34 @@
+User-agent: Amazonbot
+User-agent: Applebot
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: Claude-Web
+User-agent: ClaudeBot
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: FriendlyCrawler
+User-agent: GPTBot
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: Meta-ExternalAgent
+User-agent: Meta-ExternalFetcher
+User-agent: OAI-SearchBot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: YouBot
+User-agent: anthropic-ai
+User-agent: cohere-ai
+User-agent: facebookexternalhit
+User-agent: img2dataset
+User-agent: omgili
+User-agent: omgilibot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..810093c
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,35 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. 
| Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From 7bfc1647a85422b2a9feb33495bb484023e4d0fb Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Thu, 22 Aug 2024 01:11:43 +0000 Subject: [PATCH 105/249] Daily update from Dark Visitors --- robots.json | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/robots.json b/robots.json index e80f094..b023009 100644 --- a/robots.json +++ b/robots.json @@ -181,6 +181,13 @@ "operator": "[Velen Crawler](https://velen.io)", "respect": "[Yes](https://velen.io)" }, + "Webzio-Extended": { + "description": "Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. 
More info can be found at https://darkvisitors.com/agents/agents/webzio-extended",
+        "frequency": "Unclear at this time.",
+        "function": "AI Data Scrapers",
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time."
+    },
     "YouBot": {
         "description": "Retrieves data used for You.com web search engine and LLMs.",
         "frequency": "No information.",
From 61d851baf518fa5b153193b80f19dcb80575202c Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Fri, 23 Aug 2024 01:10:53 +0000
Subject: [PATCH 106/249] Removing previously generated files

---
 robots.txt              | 34 ----------------------------------
 table-of-bot-metrics.md | 35 -----------------------------------
 2 files changed, 69 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 175610d..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-User-agent: Amazonbot
-User-agent: Applebot
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: Claude-Web
-User-agent: ClaudeBot
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: FriendlyCrawler
-User-agent: GPTBot
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: Meta-ExternalAgent
-User-agent: Meta-ExternalFetcher
-User-agent: OAI-SearchBot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: YouBot
-User-agent: anthropic-ai
-User-agent: cohere-ai
-User-agent: facebookexternalhit
-User-agent: img2dataset
-User-agent: omgili
-User-agent: omgilibot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 810093c..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,35 +0,0 @@
-| Name | Operator | Respects
`robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. 
| Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
|
From d95f2e80724107ef4706db1ff97acd31735d2747 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Fri, 23 Aug 2024 01:10:54 +0000
Subject: [PATCH 107/249] Daily update from Dark Visitors

---
 robots.txt | 35 +++++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..cfdd1ec
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,35 @@
+User-agent: Amazonbot
+User-agent: Applebot
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: Claude-Web
+User-agent: ClaudeBot
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: FriendlyCrawler
+User-agent: GPTBot
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: Meta-ExternalAgent
+User-agent: Meta-ExternalFetcher
+User-agent: OAI-SearchBot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: Webzio-Extended
+User-agent: YouBot
+User-agent: anthropic-ai
+User-agent: cohere-ai
+User-agent: facebookexternalhit
+User-agent: img2dataset
+User-agent: omgili
+User-agent: omgilibot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..02ecfb8
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,36 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
|
From ac1250cfa5164916111d013ecb9cfb9fc7422514 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sat, 24 Aug 2024 01:09:29 +0000
Subject: [PATCH 108/249] Removing previously generated files

---
 robots.txt | 35 -----------------------------------
 table-of-bot-metrics.md | 36 ------------------------------------
 2 files changed, 71 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index cfdd1ec..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,35 +0,0 @@
-User-agent: Amazonbot
-User-agent: Applebot
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: Claude-Web
-User-agent: ClaudeBot
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: FriendlyCrawler
-User-agent: GPTBot
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: Meta-ExternalAgent
-User-agent: Meta-ExternalFetcher
-User-agent: OAI-SearchBot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: Webzio-Extended
-User-agent: YouBot
-User-agent: anthropic-ai
-User-agent: cohere-ai
-User-agent: facebookexternalhit
-User-agent: img2dataset
-User-agent: omgili
-User-agent: omgilibot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 02ecfb8..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,36 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
|
From b202b9e1e3e30d45616eb4e6343d6574be43345b Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sat, 24 Aug 2024 01:09:29 +0000
Subject: [PATCH 109/249] Daily update from Dark Visitors

---
 robots.txt | 35 +++++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..cfdd1ec
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,35 @@
+User-agent: Amazonbot
+User-agent: Applebot
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: Claude-Web
+User-agent: ClaudeBot
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: FriendlyCrawler
+User-agent: GPTBot
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: Meta-ExternalAgent
+User-agent: Meta-ExternalFetcher
+User-agent: OAI-SearchBot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: Webzio-Extended
+User-agent: YouBot
+User-agent: anthropic-ai
+User-agent: cohere-ai
+User-agent: facebookexternalhit
+User-agent: img2dataset
+User-agent: omgili
+User-agent: omgilibot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..02ecfb8
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,36 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 907866301fd89ab265a5e570da536892448ed624 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sun, 25 Aug 2024 01:16:27 +0000 Subject: [PATCH 110/249] Removing previously generated files --- robots.txt | 35 ----------------------------------- table-of-bot-metrics.md | 36 ------------------------------------ 2 files changed, 71 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index cfdd1ec..0000000 --- a/robots.txt +++ /dev/null @@ -1,35 +0,0 @@ -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: Webzio-Extended -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 02ecfb8..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,36 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 42a7ca7eda25611132924518c41d8d0d6969727f Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sun, 25 Aug 2024 01:16:28 +0000 Subject: [PATCH 111/249] Daily update from Dark Visitors --- robots.txt | 35 +++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..cfdd1ec --- /dev/null +++ b/robots.txt @@ -0,0 +1,35 @@ +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: Webzio-Extended +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..02ecfb8 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,36 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 6cb9bc8ebfe90d078eff06003f005bb52390d9ca Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 26 Aug 2024 01:11:40 +0000 Subject: [PATCH 112/249] Removing previously generated files --- robots.txt | 35 ----------------------------------- table-of-bot-metrics.md | 36 ------------------------------------ 2 files changed, 71 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index cfdd1ec..0000000 --- a/robots.txt +++ /dev/null @@ -1,35 +0,0 @@ -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: Webzio-Extended -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 02ecfb8..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,36 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From ccec3eef152af0553389f3ef69552e539243f18b Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 26 Aug 2024 01:11:41 +0000 Subject: [PATCH 113/249] Daily update from Dark Visitors --- robots.txt | 35 +++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..cfdd1ec --- /dev/null +++ b/robots.txt @@ -0,0 +1,35 @@ +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: Webzio-Extended +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..02ecfb8 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,36 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 699862f4bde063bba1a75c1386e2ddccf3c91d42 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 27 Aug 2024 01:12:19 +0000 Subject: [PATCH 114/249] Removing previously generated files --- robots.txt | 35 ----------------------------------- table-of-bot-metrics.md | 36 ------------------------------------ 2 files changed, 71 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index cfdd1ec..0000000 --- a/robots.txt +++ /dev/null @@ -1,35 +0,0 @@ -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: Webzio-Extended -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 02ecfb8..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,36 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 84a2376f652c7f31c5b35010d2c19c5d73b81557 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 27 Aug 2024 01:12:20 +0000 Subject: [PATCH 115/249] Daily update from Dark Visitors --- robots.txt | 35 +++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..cfdd1ec --- /dev/null +++ b/robots.txt @@ -0,0 +1,35 @@ +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: Webzio-Extended +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..02ecfb8 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,36 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
|
+| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
+| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. |
+| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. |
+| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
+| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io.
|
From 00ef18f93c56682410093ee820342716fc5f38ce Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Wed, 28 Aug 2024 01:12:35 +0000
Subject: [PATCH 116/249] Removing previously generated files

---
 robots.txt              | 35 -----------------------------------
 table-of-bot-metrics.md | 36 ------------------------------------
 2 files changed, 71 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index cfdd1ec..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,35 +0,0 @@
-User-agent: Amazonbot
-User-agent: Applebot
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: Claude-Web
-User-agent: ClaudeBot
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: FriendlyCrawler
-User-agent: GPTBot
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: Meta-ExternalAgent
-User-agent: Meta-ExternalFetcher
-User-agent: OAI-SearchBot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: Webzio-Extended
-User-agent: YouBot
-User-agent: anthropic-ai
-User-agent: cohere-ai
-User-agent: facebookexternalhit
-User-agent: img2dataset
-User-agent: omgili
-User-agent: omgilibot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 02ecfb8..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,36 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
|
-| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
-| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. |
-| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. |
-| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
-| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io.
|
From 1d417ffab9329a4143334c1c8189f31b67560093 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Wed, 28 Aug 2024 01:12:35 +0000
Subject: [PATCH 117/249] Daily update from Dark Visitors

---
 robots.txt              | 35 +++++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..cfdd1ec
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,35 @@
+User-agent: Amazonbot
+User-agent: Applebot
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: Claude-Web
+User-agent: ClaudeBot
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: FriendlyCrawler
+User-agent: GPTBot
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: Meta-ExternalAgent
+User-agent: Meta-ExternalFetcher
+User-agent: OAI-SearchBot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: Webzio-Extended
+User-agent: YouBot
+User-agent: anthropic-ai
+User-agent: cohere-ai
+User-agent: facebookexternalhit
+User-agent: img2dataset
+User-agent: omgili
+User-agent: omgilibot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..02ecfb8
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,36 @@
+| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
+|-----|----------|-----------------------|----------|------------------|-------------|
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
|
+| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
+| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. |
+| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. |
+| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
+| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io.
|
From 71eefcdb052b3a38e206afb947f9f2177c7599ef Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Thu, 29 Aug 2024 01:13:19 +0000
Subject: [PATCH 118/249] Removing previously generated files

---
 robots.txt              | 35 -----------------------------------
 table-of-bot-metrics.md | 36 ------------------------------------
 2 files changed, 71 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index cfdd1ec..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,35 +0,0 @@
-User-agent: Amazonbot
-User-agent: Applebot
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: Claude-Web
-User-agent: ClaudeBot
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: FriendlyCrawler
-User-agent: GPTBot
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: Meta-ExternalAgent
-User-agent: Meta-ExternalFetcher
-User-agent: OAI-SearchBot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: Webzio-Extended
-User-agent: YouBot
-User-agent: anthropic-ai
-User-agent: cohere-ai
-User-agent: facebookexternalhit
-User-agent: img2dataset
-User-agent: omgili
-User-agent: omgilibot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 02ecfb8..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,36 +0,0 @@
-| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
-|-----|----------|-----------------------|----------|------------------|-------------|
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided.
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 6dc900b5826b046e937772517be8c4213eabfc7f Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 29 Aug 2024 01:13:19 +0000 Subject: [PATCH 119/249] Daily update from Dark Visitors --- robots.txt | 35 +++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 71 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..cfdd1ec --- /dev/null +++ b/robots.txt @@ -0,0 +1,35 @@ +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: Webzio-Extended +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..02ecfb8 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,36 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. 
| Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. 
| Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From 0f8723558f0f16bd4ad292a74ed1c3b883588a57 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Wed, 28 Aug 2024 20:07:32 -0700 Subject: [PATCH 120/249] chore: add ai2bot --- robots.json | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/robots.json b/robots.json index b023009..a1cfaa1 100644 --- a/robots.json +++ b/robots.json @@ -1,4 +1,18 @@ { + "AI2Bot": { + "description": "Explores 'certain domains' to find web content.", + "frequency": "No information. 
provided.", + "function": "Content is used to train open language models.", + "operator": "[Ai2](https://allenai.org/crawler)", + "respect": "Yes" + }, + "Ai2Bot-Dolma": { + "description": "Explores 'certain domains' to find web content.", + "frequency": "No information. provided.", + "function": "Content is used to train open language models.", + "operator": "[Ai2](https://allenai.org/crawler)", + "respect": "Yes" + }, "Amazonbot": { "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses.", "frequency": "No information. provided.", From 3bce634e4af7fb699bbc34686408377762680619 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 29 Aug 2024 03:07:51 +0000 Subject: [PATCH 121/249] Removing previously generated files --- robots.txt | 35 ----------------------------------- table-of-bot-metrics.md | 36 ------------------------------------ 2 files changed, 71 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index cfdd1ec..0000000 --- a/robots.txt +++ /dev/null @@ -1,35 +0,0 @@ -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: Webzio-Extended -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset 
-User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 02ecfb8..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,36 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. 
| Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. 
| -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
| -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From 008a34ceb42994df3c9e7f703bf8679c85c51fa2 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 29 Aug 2024 03:07:52 +0000 Subject: [PATCH 122/249] chore: add ai2bot --- robots.txt | 37 +++++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..4fdca4c --- /dev/null +++ b/robots.txt @@ -0,0 +1,37 @@ +User-agent: AI2Bot +User-agent: Ai2Bot-Dolma +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: Webzio-Extended +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..1a96903 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,38 @@ +| Name | Operator | Respects `robots.txt` | Data 
use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. 
| Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
| +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From b2970316d8ae56515662792a62893f056566026a Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Fri, 30 Aug 2024 01:13:29 +0000 Subject: [PATCH 123/249] Removing previously generated files --- robots.txt | 37 ------------------------------------- table-of-bot-metrics.md | 38 -------------------------------------- 2 files changed, 75 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 4fdca4c..0000000 --- a/robots.txt +++ /dev/null @@ -1,37 +0,0 @@ -User-agent: AI2Bot -User-agent: Ai2Bot-Dolma -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: Webzio-Extended -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 1a96903..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,38 +0,0 @@ -| Name | Operator | 
Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. 
| Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
| -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From 054c97ad4f6c8c8819451c1e7b77c00662f42a71 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Fri, 30 Aug 2024 01:13:29 +0000 Subject: [PATCH 124/249] Daily update from Dark Visitors --- robots.txt | 37 +++++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..4fdca4c --- /dev/null +++ b/robots.txt @@ -0,0 +1,37 @@ +User-agent: AI2Bot +User-agent: Ai2Bot-Dolma +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: Webzio-Extended +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..1a96903 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,38 @@ +| Name | Operator | Respects 
`robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. 
| Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
| +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From 9a4ebb57ee480141a5181d29b2645efafeeb2ec4 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 31 Aug 2024 01:13:04 +0000 Subject: [PATCH 125/249] Removing previously generated files --- robots.txt | 37 ------------------------------------- table-of-bot-metrics.md | 38 -------------------------------------- 2 files changed, 75 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 4fdca4c..0000000 --- a/robots.txt +++ /dev/null @@ -1,37 +0,0 @@ -User-agent: AI2Bot -User-agent: Ai2Bot-Dolma -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: Webzio-Extended -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 1a96903..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,38 +0,0 @@ -| Name | Operator | 
Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. 
| Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
|
-| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |

From 9a7f556d87fb09609f465547c4317a775346f55a Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sat, 31 Aug 2024 01:13:04 +0000
Subject: [PATCH 126/249] Daily update from Dark Visitors

---
 robots.txt | 37 +++++++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..4fdca4c
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,37 @@
+User-agent: AI2Bot
+User-agent: Ai2Bot-Dolma
+User-agent: Amazonbot
+User-agent: Applebot
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: Claude-Web
+User-agent: ClaudeBot
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: FriendlyCrawler
+User-agent: GPTBot
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: Meta-ExternalAgent
+User-agent: Meta-ExternalFetcher
+User-agent: OAI-SearchBot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: Webzio-Extended
+User-agent: YouBot
+User-agent: anthropic-ai
+User-agent: cohere-ai
+User-agent: facebookexternalhit
+User-agent: img2dataset
+User-agent: omgili
+User-agent: omgilibot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..1a96903
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,38 @@
+| Name | Operator | Respects
`robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. 
| Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
|
+| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |

From 01589718dff5cee3e270a928521984a407a5a979 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 1 Sep 2024 01:24:52 +0000
Subject: [PATCH 127/249] Removing previously generated files

---
 robots.txt | 37 -------------------------------------
 table-of-bot-metrics.md | 38 --------------------------------------
 2 files changed, 75 deletions(-)
 delete mode 100644 robots.txt
 delete mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
deleted file mode 100644
index 4fdca4c..0000000
--- a/robots.txt
+++ /dev/null
@@ -1,37 +0,0 @@
-User-agent: AI2Bot
-User-agent: Ai2Bot-Dolma
-User-agent: Amazonbot
-User-agent: Applebot
-User-agent: Applebot-Extended
-User-agent: Bytespider
-User-agent: CCBot
-User-agent: ChatGPT-User
-User-agent: Claude-Web
-User-agent: ClaudeBot
-User-agent: Diffbot
-User-agent: FacebookBot
-User-agent: FriendlyCrawler
-User-agent: GPTBot
-User-agent: Google-Extended
-User-agent: GoogleOther
-User-agent: GoogleOther-Image
-User-agent: GoogleOther-Video
-User-agent: ICC-Crawler
-User-agent: ImagesiftBot
-User-agent: Meta-ExternalAgent
-User-agent: Meta-ExternalFetcher
-User-agent: OAI-SearchBot
-User-agent: PerplexityBot
-User-agent: PetalBot
-User-agent: Scrapy
-User-agent: Timpibot
-User-agent: VelenPublicWebCrawler
-User-agent: Webzio-Extended
-User-agent: YouBot
-User-agent: anthropic-ai
-User-agent: cohere-ai
-User-agent: facebookexternalhit
-User-agent: img2dataset
-User-agent: omgili
-User-agent: omgilibot
-Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
deleted file mode 100644
index 1a96903..0000000
--- a/table-of-bot-metrics.md
+++ /dev/null
@@ -1,38 +0,0 @@
-| Name | Operator |
Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. 
| Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
|
-| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |

From 543e993b08de1b03844c4ad2312e2a36768cdace Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 1 Sep 2024 01:24:53 +0000
Subject: [PATCH 128/249] Daily update from Dark Visitors

---
 robots.txt | 37 +++++++++++++++++++++++++++++++++++++
 table-of-bot-metrics.md | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)
 create mode 100644 robots.txt
 create mode 100644 table-of-bot-metrics.md

diff --git a/robots.txt b/robots.txt
new file mode 100644
index 0000000..4fdca4c
--- /dev/null
+++ b/robots.txt
@@ -0,0 +1,37 @@
+User-agent: AI2Bot
+User-agent: Ai2Bot-Dolma
+User-agent: Amazonbot
+User-agent: Applebot
+User-agent: Applebot-Extended
+User-agent: Bytespider
+User-agent: CCBot
+User-agent: ChatGPT-User
+User-agent: Claude-Web
+User-agent: ClaudeBot
+User-agent: Diffbot
+User-agent: FacebookBot
+User-agent: FriendlyCrawler
+User-agent: GPTBot
+User-agent: Google-Extended
+User-agent: GoogleOther
+User-agent: GoogleOther-Image
+User-agent: GoogleOther-Video
+User-agent: ICC-Crawler
+User-agent: ImagesiftBot
+User-agent: Meta-ExternalAgent
+User-agent: Meta-ExternalFetcher
+User-agent: OAI-SearchBot
+User-agent: PerplexityBot
+User-agent: PetalBot
+User-agent: Scrapy
+User-agent: Timpibot
+User-agent: VelenPublicWebCrawler
+User-agent: Webzio-Extended
+User-agent: YouBot
+User-agent: anthropic-ai
+User-agent: cohere-ai
+User-agent: facebookexternalhit
+User-agent: img2dataset
+User-agent: omgili
+User-agent: omgilibot
+Disallow: /
\ No newline at end of file
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
new file mode 100644
index 0000000..1a96903
--- /dev/null
+++ b/table-of-bot-metrics.md
@@ -0,0 +1,38 @@
+| Name | Operator | Respects
`robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. 
| Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
| +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From 567bd00aec0cf1397478e8e1bb99965770721ebf Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 2 Sep 2024 01:15:07 +0000 Subject: [PATCH 129/249] Removing previously generated files --- robots.txt | 37 ------------------------------------- table-of-bot-metrics.md | 38 -------------------------------------- 2 files changed, 75 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 4fdca4c..0000000 --- a/robots.txt +++ /dev/null @@ -1,37 +0,0 @@ -User-agent: AI2Bot -User-agent: Ai2Bot-Dolma -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: Webzio-Extended -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 1a96903..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,38 +0,0 @@ -| Name | Operator | 
Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. 
| Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
| -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From c9325c9e18d00786de7d0cec50ab82bf9e84b6f5 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 2 Sep 2024 01:15:07 +0000 Subject: [PATCH 130/249] Daily update from Dark Visitors --- robots.txt | 37 +++++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..4fdca4c --- /dev/null +++ b/robots.txt @@ -0,0 +1,37 @@ +User-agent: AI2Bot +User-agent: Ai2Bot-Dolma +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: Webzio-Extended +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..1a96903 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,38 @@ +| Name | Operator | Respects 
`robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. 
| +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. 
Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. 
| Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. 
| "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. 
| +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From cc18b8617c7c01b343e7534bf948bff8667856cf Mon Sep 17 00:00:00 2001 From: nisbet-hubbard <87453615+nisbet-hubbard@users.noreply.github.com> Date: Tue, 3 Sep 2024 07:48:48 +0800 Subject: [PATCH 131/249] Update main.yml --- .github/workflows/main.yml | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index 4b127d7..bd10a45 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -20,14 +20,7 @@ jobs: - run: | git config --global user.name "ai.robots.txt" git config --global user.email "ai.robots.txt@users.noreply.github.com" - git rm robots.txt - git rm table-of-bot-metrics.md - git add -A - git commit -m "Removing previously generated files" - git push php -f code/action.php - git config --global user.name "ai.robots.txt" - git config --global user.email "ai.robots.txt@users.noreply.github.com" git add -A if [ -n "${{ inputs.message }}" ]; then git commit -m "${{ inputs.message }}" @@ -35,4 +28,4 @@ jobs: git commit -m "${{ github.event.head_commit.message }}" fi git push - shell: bash \ No newline at end of file + shell: bash From 7151f6c5695d704df25fa115734bacd0122f3fb2 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 3 Sep 2024 01:12:56 +0000 Subject: [PATCH 132/249] Removing previously generated files --- robots.txt | 37 ------------------------------------- table-of-bot-metrics.md | 38 -------------------------------------- 2 files changed, 75 deletions(-) delete mode 100644 robots.txt delete mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt deleted file mode 100644 index 4fdca4c..0000000 --- a/robots.txt +++ /dev/null @@ -1,37 +0,0 @@ -User-agent: 
AI2Bot -User-agent: Ai2Bot-Dolma -User-agent: Amazonbot -User-agent: Applebot -User-agent: Applebot-Extended -User-agent: Bytespider -User-agent: CCBot -User-agent: ChatGPT-User -User-agent: Claude-Web -User-agent: ClaudeBot -User-agent: Diffbot -User-agent: FacebookBot -User-agent: FriendlyCrawler -User-agent: GPTBot -User-agent: Google-Extended -User-agent: GoogleOther -User-agent: GoogleOther-Image -User-agent: GoogleOther-Video -User-agent: ICC-Crawler -User-agent: ImagesiftBot -User-agent: Meta-ExternalAgent -User-agent: Meta-ExternalFetcher -User-agent: OAI-SearchBot -User-agent: PerplexityBot -User-agent: PetalBot -User-agent: Scrapy -User-agent: Timpibot -User-agent: VelenPublicWebCrawler -User-agent: Webzio-Extended -User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot -Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md deleted file mode 100644 index 1a96903..0000000 --- a/table-of-bot-metrics.md +++ /dev/null @@ -1,38 +0,0 @@ -| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| -| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. 
| Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. 
| Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. 
| -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From fb5c995243c74389117589ed2a2b6d68abbb9a72 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 3 Sep 2024 01:12:57 +0000 Subject: [PATCH 133/249] Daily update from Dark Visitors --- robots.txt | 37 +++++++++++++++++++++++++++++++++++++ table-of-bot-metrics.md | 38 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+) create mode 100644 robots.txt create mode 100644 table-of-bot-metrics.md diff --git a/robots.txt b/robots.txt new file mode 100644 index 0000000..4fdca4c --- /dev/null +++ b/robots.txt @@ -0,0 +1,37 @@ +User-agent: AI2Bot +User-agent: Ai2Bot-Dolma +User-agent: Amazonbot +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: Diffbot +User-agent: FacebookBot +User-agent: FriendlyCrawler +User-agent: GPTBot +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: Webzio-Extended +User-agent: YouBot +User-agent: anthropic-ai +User-agent: cohere-ai +User-agent: facebookexternalhit +User-agent: img2dataset +User-agent: omgili +User-agent: omgilibot +Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md new file mode 100644 index 0000000..1a96903 --- /dev/null +++ b/table-of-bot-metrics.md @@ -0,0 +1,38 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. 
| Explores 'certain domains' to find web content. | +| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. | Sources data that is made openly available and is used to train AI models. | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. 
| No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. 
| Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. 
| Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| From 837329440466bc3bfe145d23610e7b87787b3ec8 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Fri, 6 Sep 2024 19:05:26 -0700 Subject: [PATCH 134/249] chore: add iaskspider/2.0 --- robots.json | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/robots.json b/robots.json index a1cfaa1..c31df62 100644 --- a/robots.json +++ b/robots.json @@ -125,6 +125,13 @@ "operator": "Google", "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)" }, + "iaskspider/2.0": { + "description": "Used to provide answers to user queries.", + "frequency": "Unclear at this time.", + "function": "Crawls sites to provide answers to user queries.", + "operator": "iAsk", + "respect": "No" + }, "ICC-Crawler": { "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business.", "frequency": "No information.", From 1c1b42368407484f765ddf7af2266cd9fa44fbb9 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 7 Sep 2024 02:05:43 +0000 Subject: [PATCH 135/249] chore: add iaskspider/2.0 --- robots.txt | 1 + table-of-bot-metrics.md | 1 + 2 files changed, 2 insertions(+) diff --git a/robots.txt b/robots.txt index 4fdca4c..da17a01 100644 --- a/robots.txt +++ b/robots.txt @@ -16,6 +16,7 @@ User-agent: Google-Extended User-agent: GoogleOther User-agent: GoogleOther-Image User-agent: GoogleOther-Video +User-agent: iaskspider/2.0 User-agent: ICC-Crawler User-agent: ImagesiftBot User-agent: Meta-ExternalAgent diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 1a96903..39b7959 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -18,6 +18,7 @@ | GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. 
For example, it may be used for one-off crawls for internal research and development." | | GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | | GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | | ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | | ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | | Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. 
| "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | From 809851ae88fdc0fe5129484fe0e6e000d2734ede Mon Sep 17 00:00:00 2001 From: Malte Ubl Date: Sat, 7 Sep 2024 15:59:25 -0700 Subject: [PATCH 136/249] Add instructions for AI bot blocking on Vercel --- FAQ.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/FAQ.md b/FAQ.md index d60ca9a..49cbdfb 100644 --- a/FAQ.md +++ b/FAQ.md @@ -33,6 +33,8 @@ That depends on your stack. - Cloudflare - [Block AI bots, scrapers and crawlers with a single click](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) by Cloudflare - [I’m blocking AI crawlers](https://roelant.net/en/2024/im-blocking-ai-crawlers-part-2/) by Roelant +- Vercel + - [Block AI Bots Firewall Rule](https://vercel.com/templates/firewall/block-ai-bots-firewall-rule) by Vercel ## Why should we block these crawlers? From 5963cbf9f79404095221f4e8c14ce0f54bd3b627 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Sun, 8 Sep 2024 01:19:31 +0000 Subject: [PATCH 137/249] Daily update from Dark Visitors --- robots.json | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/robots.json b/robots.json index c31df62..a53cebd 100644 --- a/robots.json +++ b/robots.json @@ -125,13 +125,6 @@ "operator": "Google", "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)" }, - "iaskspider/2.0": { - "description": "Used to provide answers to user queries.", - "frequency": "Unclear at this time.", - "function": "Crawls sites to provide answers to user queries.", - "operator": "iAsk", - "respect": "No" - }, "ICC-Crawler": { "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business.", "frequency": "No information.", @@ 
-237,6 +230,13 @@ "operator": "Meta/Facebook", "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)" }, + "iaskspider/2.0": { + "description": "Used to provide answers to user queries.", + "frequency": "Unclear at this time.", + "function": "Crawls sites to provide answers to user queries.", + "operator": "iAsk", + "respect": "No" + }, "img2dataset": { "description": "Downloads large sets of images into datasets for LLM training or other purposes.", "frequency": "At the discretion of img2dataset users.", From 6b8d7f5890d6bed722a95297996c054c210bd3b8 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 9 Sep 2024 01:16:21 +0000 Subject: [PATCH 138/249] Daily update from Dark Visitors --- robots.txt | 2 +- table-of-bot-metrics.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/robots.txt b/robots.txt index da17a01..e097a47 100644 --- a/robots.txt +++ b/robots.txt @@ -16,7 +16,6 @@ User-agent: Google-Extended User-agent: GoogleOther User-agent: GoogleOther-Image User-agent: GoogleOther-Video -User-agent: iaskspider/2.0 User-agent: ICC-Crawler User-agent: ImagesiftBot User-agent: Meta-ExternalAgent @@ -32,6 +31,7 @@ User-agent: YouBot User-agent: anthropic-ai User-agent: cohere-ai User-agent: facebookexternalhit +User-agent: iaskspider/2.0 User-agent: img2dataset User-agent: omgili User-agent: omgilibot diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 39b7959..d9441b5 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -18,7 +18,6 @@ | GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | | GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. 
| No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | | GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | | ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | | ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | | Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | @@ -34,6 +33,7 @@ | anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. 
| Scrapes data to train LLMs and AI products offered by Anthropic. | | cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | | facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | | img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | From 0106d4b15a4e1d3e0fe15ff0c137b53767afe7b4 Mon Sep 17 00:00:00 2001 From: Urvish Patel <169079981+urvish-p80@users.noreply.github.com> Date: Mon, 23 Sep 2024 08:19:27 -0400 Subject: [PATCH 139/249] Add additional resource - README.md A detailed blogpost to - See the live dashboard showing the websites that are blocking AI Bots such as GPTBot, CCBot, Google-extended and ByteSpider from crawling and scraping the content on their website. Learn which AI crawlers / scrapers do what and how to block them using Robots.txt. 
--- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index c2d79f5..9502b0d 100644 --- a/README.md +++ b/README.md @@ -33,3 +33,4 @@ If you use [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your- - [Blocking Bots With 11ty And Apache](https://flamedfury.com/posts/blocking-bots-with-11ty-and-apache/) by fLaMEd fury - [Blockin' bots on Netlify](https://www.jeremiak.com/blog/block-bots-netlify-edge-functions/) by Jeremia Kimelman - [Blocking AI web crawlers](https://underlap.org/blocking-ai-web-crawlers) by Glyn Normington +- [Block AI Bots from Crawling Websites Using Robots.txt](https://originality.ai/ai-bot-blocking) by Jonathan Gillham, Originality.AI From af05890b078a28251b8cd75e6a97ebf0441d1b35 Mon Sep 17 00:00:00 2001 From: Julian Mair <13933169+cityrolr@users.noreply.github.com> Date: Mon, 23 Sep 2024 23:27:27 +0200 Subject: [PATCH 140/249] Update README.md For people who don't use or don't want to use RSS for this, I've added a little explanation of how to subscribe to releases via GitHub. --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index c2d79f5..3d79036 100644 --- a/README.md +++ b/README.md @@ -22,6 +22,8 @@ https://github.com/ai-robots-txt/ai.robots.txt/releases.atom You can subscribe with [Feedly](https://feedly.com/i/subscription/feed/https://github.com/ai-robots-txt/ai.robots.txt/releases.atom), [Inoreader](https://www.inoreader.com/?add_feed=https://github.com/ai-robots-txt/ai.robots.txt/releases.atom), [The Old Reader](https://theoldreader.com/feeds/subscribe?url=https://github.com/ai-robots-txt/ai.robots.txt/releases.atom), [Feedbin](https://feedbin.me/?subscribe=https://github.com/ai-robots-txt/ai.robots.txt/releases.atom), or any other reader app. +Alternatively, you can also subscribe to new releases with your GitHub account by clicking the ⬇️ on "Watch" button at the top of this page, clicking "Custom" and selecting "Releases". 
+ ## Report abusive crawlers If you use [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) alongside this list, you can report abusive crawlers that don't respect `robots.txt` [here](https://docs.google.com/forms/d/e/1FAIpQLScbUZ2vlNSdcsb8LyTeSF7uLzQI96s0BKGoJ6wQ6ocUFNOKEg/viewform). From a6de89e6bdcc552a13ac7bd56b78017d251e01bc Mon Sep 17 00:00:00 2001 From: Greg Lindahl Date: Thu, 26 Sep 2024 21:41:28 +0000 Subject: [PATCH 141/249] feat: make CCBot entry more accurate --- robots.json | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/robots.json b/robots.json index a53cebd..12ed898 100644 --- a/robots.json +++ b/robots.json @@ -42,10 +42,10 @@ "respect": "No" }, "CCBot": { - "description": "Sources data that is made openly available and is used to train AI models.", - "frequency": "Unclear at this time.", - "function": "Provides crawl data for an open source repository that has been used to train LLMs.", - "operator": "[Common Crawl](https://commoncrawl.org)", + "description": "Web archive going back to 2008. 
[Cited in thousands of research papers per year](https://commoncrawl.org/research-papers).", + "frequency": "Monthly at present.", + "function": "Provides open crawl dataset, used for many purposes, including Machine Learning/AI.", + "operator": "[Common Crawl Foundation](https://commoncrawl.org)", "respect": "[Yes](https://commoncrawl.org/ccbot)" }, "ChatGPT-User": { From 44d975c799130d58380b49f5c2bbb4ba33f1ae1a Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Fri, 27 Sep 2024 00:21:49 +0000 Subject: [PATCH 142/249] Merge pull request #42 from commoncrawl/main feat: make CCBot entry more accurate --- table-of-bot-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index d9441b5..213b098 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -6,7 +6,7 @@ | Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | | Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | | Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides crawl data for an open source repository that has been used to train LLMs. | Unclear at this time. 
| Sources data that is made openly available and is used to train AI models. | +| CCBot | [Common Crawl Foundation](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). | | ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | | Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | From 7851cea4fd4e233cd7ed74e48c3316d114e29f3b Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Fri, 27 Sep 2024 01:18:04 +0000 Subject: [PATCH 143/249] Daily update from Dark Visitors --- robots.json | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/robots.json b/robots.json index 12ed898..f8c6876 100644 --- a/robots.json +++ b/robots.json @@ -139,6 +139,13 @@ "operator": "[ImageSift](https://imagesift.com)", "respect": "[Yes](https://imagesift.com/about)" }, + "Kangaroo Bot": { + "description": "Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot", + "frequency": "Unclear at this time.", + "function": "AI Data Scrapers", + "operator": "Unclear at this time.", + "respect": "Unclear at this time." 
+ }, "Meta-ExternalAgent": { "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"", "frequency": "No information.", From 632e9d65109584c2ed12ecf2f4a898e9a276e604 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 28 Sep 2024 01:17:19 +0000 Subject: [PATCH 144/249] Daily update from Dark Visitors --- robots.txt | 1 + table-of-bot-metrics.md | 1 + 2 files changed, 2 insertions(+) diff --git a/robots.txt b/robots.txt index e097a47..c11be04 100644 --- a/robots.txt +++ b/robots.txt @@ -18,6 +18,7 @@ User-agent: GoogleOther-Image User-agent: GoogleOther-Video User-agent: ICC-Crawler User-agent: ImagesiftBot +User-agent: Kangaroo Bot User-agent: Meta-ExternalAgent User-agent: Meta-ExternalFetcher User-agent: OAI-SearchBot diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 213b098..dfeb86a 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -20,6 +20,7 @@ | GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | | ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | | ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. 
| Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot | | Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | | Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | | OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. 
| From 6a988be27f37e175539920f6cdbf6aa4c89170b3 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Sat, 28 Sep 2024 13:58:00 -0700 Subject: [PATCH 145/249] chore: add sidetrade bot --- robots.json | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/robots.json b/robots.json index f8c6876..83c91a7 100644 --- a/robots.json +++ b/robots.json @@ -184,10 +184,17 @@ "Scrapy": { "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"", "frequency": "No information.", - "function": "Scrapes data a variety of uses including training AI.", + "function": "Scrapes data for a variety of uses including training AI.", "operator": "[Zyte](https://www.zyte.com)", "respect": "Unclear at this time." }, + "Sidetrade indexer bot": { + "description": "AI product training.", + "frequency": "No information.", + "function": "Extracts data for a variety of uses including training AI.", + "operator": "[Sidetrade](https://www.sidetrade.com)", + "respect": "Unclear at this time." 
+ }, "Timpibot": { "description": "Makes data available for training AI models.", "frequency": "No information.", From 6d9ce1d62aa29117e0f7badc23e0b16d0afc3573 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 28 Sep 2024 20:58:18 +0000 Subject: [PATCH 146/249] chore: add sidetrade bot --- robots.txt | 1 + table-of-bot-metrics.md | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/robots.txt b/robots.txt index c11be04..a593d88 100644 --- a/robots.txt +++ b/robots.txt @@ -25,6 +25,7 @@ User-agent: OAI-SearchBot User-agent: PerplexityBot User-agent: PetalBot User-agent: Scrapy +User-agent: Sidetrade indexer bot User-agent: Timpibot User-agent: VelenPublicWebCrawler User-agent: Webzio-Extended diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index dfeb86a..a77b4bb 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -26,7 +26,8 @@ | OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." 
| +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. | | Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | | VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | | Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. 
More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | From 9c2394f23bc83f06fbc8de410939045e5b3ba1bc Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Mon, 30 Sep 2024 16:25:20 -0700 Subject: [PATCH 147/249] chore: add ISSCyberRiskCrawler --- robots.json | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/robots.json b/robots.json index 83c91a7..c446cd7 100644 --- a/robots.json +++ b/robots.json @@ -132,6 +132,13 @@ "operator": "[NICT](https://nict.go.jp)", "respect": "Yes" }, + "ISSCyberRiskCrawler": { + "description": "Used to train machine learning based models to quantify cyber risk.", + "frequency": "No information.", + "function": "Scrapes data to train machine learning models.", + "operator": "[ISS-Corporate](https://iss-cyber.com)", + "respect": "No" + }, "ImagesiftBot": { "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images.", "frequency": "No information.", From 6da804e826b2f2b3d889389e961031d44a73f043 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 30 Sep 2024 23:50:18 +0000 Subject: [PATCH 148/249] chore: add ISSCyberRiskCrawler --- robots.txt | 1 + table-of-bot-metrics.md | 1 + 2 files changed, 2 insertions(+) diff --git a/robots.txt b/robots.txt index a593d88..739e44f 100644 --- a/robots.txt +++ b/robots.txt @@ -17,6 +17,7 @@ User-agent: GoogleOther User-agent: GoogleOther-Image User-agent: GoogleOther-Video User-agent: ICC-Crawler +User-agent: ISSCyberRiskCrawler User-agent: ImagesiftBot User-agent: Kangaroo Bot User-agent: Meta-ExternalAgent diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index a77b4bb..9f2ca90 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -19,6 +19,7 @@ | GoogleOther-Image | Google | 
[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | | GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | | ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ISSCyberRiskCrawler | [ISS-Corporate](https://iss-cyber.com) | No | Scrapes data to train machine learning models. | No information. | Used to train machine learning based models to quantify cyber risk. | | ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | | Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. 
More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot | | Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | From dc15afe84705b5324cd9ad6440a696dff6cc49a2 Mon Sep 17 00:00:00 2001 From: Laker Turner Date: Mon, 7 Oct 2024 17:38:01 +0100 Subject: [PATCH 149/249] Update robots.json with Claude respect link --- robots.json | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/robots.json b/robots.json index c446cd7..6236c72 100644 --- a/robots.json +++ b/robots.json @@ -67,7 +67,7 @@ "frequency": "No information. provided.", "function": "Scrapes data to train Anthropic's AI products.", "operator": "[Anthropic](https://www.anthropic.com)", - "respect": "Unclear at this time." + "respect": "[Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler)" }, "Diffbot": { "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training.", @@ -279,4 +279,4 @@ "operator": "[Webz.io](https://webz.io/)", "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)" } -} \ No newline at end of file +} From 9be286626d0a47761a1fa3524fb6407f4fa2de38 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Tue, 8 Oct 2024 02:30:17 +0000 Subject: [PATCH 150/249] Merge pull request #43 from lxjv/main Update robots.json with Claude respect link --- table-of-bot-metrics.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 9f2ca90..cf14641 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -9,7 +9,7 @@ | CCBot | [Common Crawl 
Foundation](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). | | ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | | Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| ClaudeBot | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. 
| | FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | From b1491d269460ca57581c2df7cf14b3f3fc4749f3 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Wed, 9 Oct 2024 01:17:37 +0000 Subject: [PATCH 151/249] Daily update from Dark Visitors --- robots.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/robots.json b/robots.json index 6236c72..03db17b 100644 --- a/robots.json +++ b/robots.json @@ -279,4 +279,4 @@ "operator": "[Webz.io](https://webz.io/)", "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)" } -} +} \ No newline at end of file From b229f5b9366a0b9a77a4573589ed861de16db435 Mon Sep 17 00:00:00 2001 From: Glyn Normington Date: Thu, 17 Oct 2024 12:25:54 +0100 Subject: [PATCH 152/249] Re-order the FAQ The "why" question should come first. --- FAQ.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/FAQ.md b/FAQ.md index 49cbdfb..1b3f247 100644 --- a/FAQ.md +++ b/FAQ.md @@ -1,5 +1,15 @@ # Frequently asked questions +## Why should we block these crawlers? + +They're extractive, confer no benefit to the creators of data they're ingesting and also have wide-ranging negative externalities. + +**[How Tech Giants Cut Corners to Harvest Data for A.I.](https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html?unlocked_article_code=1.ik0.Ofja.L21c1wyW-0xj&ugrp=m)** +> OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems. 
+ +**[How AI copyright lawsuits could make the whole industry go extinct](https://www.theverge.com/24062159/ai-copyright-fair-use-lawsuits-new-york-times-openai-chatgpt-decoder-podcast)** +> The New York Times' lawsuit against OpenAI is part of a broader, industry-shaking copyright challenge that could define the future of AI. + ## How do we know AI companies/bots respect `robots.txt`? The short answer is that we don't. `robots.txt` is a well-established standard, but compliance is voluntary. There is no enforcement mechanism. @@ -36,16 +46,6 @@ That depends on your stack. - Vercel - [Block AI Bots Firewall Rule](https://vercel.com/templates/firewall/block-ai-bots-firewall-rule) by Vercel -## Why should we block these crawlers? - -They're extractive, confer no benefit to the creators of data they're ingesting and also have wide-ranging negative externalities. - -**[How Tech Giants Cut Corners to Harvest Data for A.I.](https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html?unlocked_article_code=1.ik0.Ofja.L21c1wyW-0xj&ugrp=m)** -> OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems. - -**[How AI copyright lawsuits could make the whole industry go extinct](https://www.theverge.com/24062159/ai-copyright-fair-use-lawsuits-new-york-times-openai-chatgpt-decoder-podcast)** -> The New York Times' lawsuit against OpenAI is part of a broader, industry-shaking copyright challenge that could define the future of AI. - ## How can I contribute? Open a pull request. It will be reviewed and acted upon appropriately. **We really appreciate contributions** — this is a community effort. 
From e6bb7cae9ead3e33078c3b9632a44b3234f241ba Mon Sep 17 00:00:00 2001 From: Glyn Normington Date: Thu, 17 Oct 2024 12:27:05 +0100 Subject: [PATCH 153/249] Augment the "why" FAQ Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/40#issuecomment-2419078796 --- FAQ.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/FAQ.md b/FAQ.md index 1b3f247..4d58350 100644 --- a/FAQ.md +++ b/FAQ.md @@ -10,6 +10,8 @@ They're extractive, confer no benefit to the creators of data they're ingesting **[How AI copyright lawsuits could make the whole industry go extinct](https://www.theverge.com/24062159/ai-copyright-fair-use-lawsuits-new-york-times-openai-chatgpt-decoder-podcast)** > The New York Times' lawsuit against OpenAI is part of a broader, industry-shaking copyright challenge that could define the future of AI. +Crawlers also sometimes impact the performance of crawled sites, or even take them down. + ## How do we know AI companies/bots respect `robots.txt`? The short answer is that we don't. `robots.txt` is a well-established standard, but compliance is voluntary. There is no enforcement mechanism. 
From 7bb5efd462ffe1ef80e13468a660a82e6987df81 Mon Sep 17 00:00:00 2001
From: Ivan Sagalaev
Date: Thu, 17 Oct 2024 21:08:43 -0400
Subject: [PATCH 154/249] Sort the content case-insensitively before dumping to JSON

---
 code/dark_visitors.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/code/dark_visitors.py b/code/dark_visitors.py
index 484daa1..7d29c65 100644
--- a/code/dark_visitors.py
+++ b/code/dark_visitors.py
@@ -70,4 +70,6 @@ for section in soup.find_all("div", {"class": "agent-links-section"}):
         }
 
 print(f"Total: {len(existing_content)}")
-Path("./robots.json").write_text(json.dumps(existing_content, indent=4, sort_keys=True))
\ No newline at end of file
+sorted_keys = sorted(existing_content, key=lambda k: k.lower())
+existing_content = {k: existing_content[k] for k in sorted_keys}
+Path("./robots.json").write_text(json.dumps(existing_content, indent=4))
\ No newline at end of file

From cfaade6e2f8e55b462328262a381386079238943 Mon Sep 17 00:00:00 2001
From: Fabian Egli
Date: Sat, 19 Oct 2024 00:01:15 +0200
Subject: [PATCH 155/249] log the diff in the update action daily_update.yml

---
 .github/workflows/daily_update.yml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml
index 2ae0398..e0ce102 100644
--- a/.github/workflows/daily_update.yml
+++ b/.github/workflows/daily_update.yml
@@ -16,6 +16,7 @@ jobs:
           git config --global user.name "dark-visitors"
           git config --global user.email "dark-visitors@users.noreply.github.com"
           python code/dark_visitors.py
+          git --no-pager diff
           git add -A
           git diff --quiet && git diff --staged --quiet || (git commit -m "Daily update from Dark Visitors" && git push)
         shell: bash
@@ -24,4 +25,4 @@ jobs:
     uses: ./.github/workflows/main.yml
     secrets: inherit
     with:
-      message: "Daily update from Dark Visitors"
\ No newline at end of file
+      message: "Daily update from Dark Visitors"

From a46d06d436584273b99cdaa45837560f9d46204b Mon Sep 17 00:00:00 2001
From: Fabian Egli
Date: Sat, 19 Oct 2024 00:04:15 +0200
Subject: [PATCH 156/249] log changes made by the action in main.yml

---
 .github/workflows/main.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index bd10a45..140e0fd 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -21,6 +21,7 @@ jobs:
           git config --global user.name "ai.robots.txt"
           git config --global user.email "ai.robots.txt@users.noreply.github.com"
           php -f code/action.php
+          git --no-pager diff
           git add -A
           if [ -n "${{ inputs.message }}" ]; then
             git commit -m "${{ inputs.message }}"

From b3068a8d90c6cb091b25d6125d758cd02b774bbb Mon Sep 17 00:00:00 2001
From: Fabian Egli
Date: Sat, 19 Oct 2024 00:12:25 +0200
Subject: [PATCH 157/249] add some signposts

---
 .github/workflows/daily_update.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml
index e0ce102..6b6624a 100644
--- a/.github/workflows/daily_update.yml
+++ b/.github/workflows/daily_update.yml
@@ -15,7 +15,9 @@ jobs:
           pip install beautifulsoup4 requests
           git config --global user.name "dark-visitors"
           git config --global user.email "dark-visitors@users.noreply.github.com"
+          echo "Running update script ..."
           python code/dark_visitors.py
+          echo "... done."
           git --no-pager diff
           git add -A
           git diff --quiet && git diff --staged --quiet || (git commit -m "Daily update from Dark Visitors" && git push)

From b584f613cd29e1fbb88d5e55e24dda85f506927d Mon Sep 17 00:00:00 2001
From: Fabian Egli
Date: Sat, 19 Oct 2024 00:13:09 +0200
Subject: [PATCH 158/249] add some signposts to the log

---
 .github/workflows/main.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 140e0fd..3e3ddfc 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -20,7 +20,9 @@ jobs:
       - run: |
           git config --global user.name "ai.robots.txt"
           git config --global user.email "ai.robots.txt@users.noreply.github.com"
+          echo "Running update script ..."
           php -f code/action.php
+          echo "... done."
           git --no-pager diff
           git add -A
           if [ -n "${{ inputs.message }}" ]; then

From 25adc6b8027e832119fd73fa679c89cd602d2e62 Mon Sep 17 00:00:00 2001
From: Fabian Egli
Date: Sat, 19 Oct 2024 00:28:41 +0200
Subject: [PATCH 159/249] log git repository status

---
 .github/workflows/main.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 3e3ddfc..ea8edc5 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -20,6 +20,8 @@ jobs:
       - run: |
           git config --global user.name "ai.robots.txt"
           git config --global user.email "ai.robots.txt@users.noreply.github.com"
+          git log -1
+          git status
           echo "Running update script ..."
           php -f code/action.php
           echo "... done."
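Note on the key reordering in the robots.json update that follows (PATCH 160): it appears to be the first run after PATCH 154 replaced `sort_keys=True` with a case-insensitive sort of the top-level keys. The sketch below is not the project's script; it reuses only the two expressions from that patch, and the helper name and the sample dictionary are hypothetical. It compares the two orderings on a handful of user agent names taken from the list.

```python
import json


def dump_sorted_case_insensitively(content: dict) -> str:
    """Serialize `content` with its top-level keys ordered case-insensitively.

    Hypothetical helper: it wraps the two expressions added in PATCH 154.
    """
    sorted_keys = sorted(content, key=lambda k: k.lower())
    ordered = {k: content[k] for k in sorted_keys}
    return json.dumps(ordered, indent=4)


# Hypothetical sample: the names come from the list, the values are placeholders.
sample = {
    "GPTBot": {"respect": "Yes", "operator": "OpenAI"},
    "anthropic-ai": {"respect": "Unclear at this time.", "operator": "Anthropic"},
    "Amazonbot": {"respect": "Yes", "operator": "Amazon"},
    "Google-Extended": {"respect": "Yes", "operator": "Google"},
}

# Old behaviour: sort_keys=True compares code points, so every uppercase
# letter sorts before every lowercase one, and nested keys are sorted as well.
code_point_order = list(json.loads(json.dumps(sample, indent=4, sort_keys=True)))

# New behaviour: only the top-level keys are reordered, ignoring case.
folded = json.loads(dump_sorted_case_insensitively(sample))
folded_order = list(folded)

# Nested keys keep their insertion order because sort_keys is no longer passed.
nested_order = list(folded["GPTBot"])

print(code_point_order)
print(folded_order)
print(nested_order)
```

Under these assumptions, `anthropic-ai` moves from the end of the list to just after `Amazonbot`, and `GPTBot` moves after `Google-Extended`. Because `sort_keys` is dropped, the fields inside each entry are no longer re-sorted alphabetically, which is consistent with the changed field order in several entries of the next patch.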
From faf81efb126cfb94ad59b1164b6176357bdb337c Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Sat, 19 Oct 2024 01:17:15 +0000 Subject: [PATCH 160/249] Daily update from Dark Visitors --- robots.json | 274 ++++++++++++++++++++++++++-------------------------- 1 file changed, 137 insertions(+), 137 deletions(-) diff --git a/robots.json b/robots.json index 03db17b..db308d7 100644 --- a/robots.json +++ b/robots.json @@ -14,72 +14,93 @@ "respect": "Yes" }, "Amazonbot": { - "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses.", - "frequency": "No information. provided.", - "function": "Service improvement and enabling answers for Alexa users.", "operator": "Amazon", - "respect": "Yes" + "respect": "Yes", + "function": "Service improvement and enabling answers for Alexa users.", + "frequency": "No information. provided.", + "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses." + }, + "anthropic-ai": { + "operator": "[Anthropic](https://www.anthropic.com)", + "respect": "Unclear at this time.", + "function": "Scrapes data to train Anthropic's AI products.", + "frequency": "No information. provided.", + "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "Applebot": { - "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot", - "frequency": "Unclear at this time.", - "function": "AI Search Crawlers", "operator": "Unclear at this time.", - "respect": "Unclear at this time." 
+ "respect": "Unclear at this time.", + "function": "AI Search Crawlers", + "frequency": "Unclear at this time.", + "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot" }, "Applebot-Extended": { - "description": "Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools.", - "frequency": "Unclear at this time.", - "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", "operator": "[Apple](https://support.apple.com/en-us/119829#datausage)", - "respect": "Yes" + "respect": "Yes", + "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.", + "frequency": "Unclear at this time.", + "description": "Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools." }, "Bytespider": { - "description": "Downloads data to train LLMS, including ChatGPT competitors.", - "frequency": "Unclear at this time.", - "function": "LLM training.", "operator": "ByteDance", - "respect": "No" + "respect": "No", + "function": "LLM training.", + "frequency": "Unclear at this time.", + "description": "Downloads data to train LLMS, including ChatGPT competitors." }, "CCBot": { - "description": "Web archive going back to 2008. 
[Cited in thousands of research papers per year](https://commoncrawl.org/research-papers).", - "frequency": "Monthly at present.", - "function": "Provides open crawl dataset, used for many purposes, including Machine Learning/AI.", "operator": "[Common Crawl Foundation](https://commoncrawl.org)", - "respect": "[Yes](https://commoncrawl.org/ccbot)" + "respect": "[Yes](https://commoncrawl.org/ccbot)", + "function": "Provides open crawl dataset, used for many purposes, including Machine Learning/AI.", + "frequency": "Monthly at present.", + "description": "Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers)." }, "ChatGPT-User": { - "description": "Used by plugins in ChatGPT to answer queries based on user input.", - "frequency": "Only when prompted by a user.", - "function": "Takes action based on user prompts.", "operator": "[OpenAI](https://openai.com)", - "respect": "Yes" + "respect": "Yes", + "function": "Takes action based on user prompts.", + "frequency": "Only when prompted by a user.", + "description": "Used by plugins in ChatGPT to answer queries based on user input." }, "Claude-Web": { - "description": "Scrapes data to train LLMs and AI products offered by Anthropic.", - "frequency": "No information. provided.", - "function": "Scrapes data to train Anthropic's AI products.", "operator": "[Anthropic](https://www.anthropic.com)", - "respect": "Unclear at this time." + "respect": "Unclear at this time.", + "function": "Scrapes data to train Anthropic's AI products.", + "frequency": "No information. provided.", + "description": "Scrapes data to train LLMs and AI products offered by Anthropic." }, "ClaudeBot": { - "description": "Scrapes data to train LLMs and AI products offered by Anthropic.", - "frequency": "No information. 
provided.", - "function": "Scrapes data to train Anthropic's AI products.", "operator": "[Anthropic](https://www.anthropic.com)", - "respect": "[Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler)" + "respect": "[Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler)", + "function": "Scrapes data to train Anthropic's AI products.", + "frequency": "No information. provided.", + "description": "Scrapes data to train LLMs and AI products offered by Anthropic." + }, + "cohere-ai": { + "operator": "[Cohere](https://cohere.com)", + "respect": "Unclear at this time.", + "function": "Retrieves data to provide responses to user-initiated prompts.", + "frequency": "Takes action based on user prompts.", + "description": "Retrieves data based on user prompts." }, "Diffbot": { - "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training.", - "frequency": "Unclear at this time.", - "function": "Aggregates structured web data for monitoring and AI model training.", "operator": "[Diffbot](https://www.diffbot.com/)", - "respect": "At the discretion of Diffbot users." + "respect": "At the discretion of Diffbot users.", + "function": "Aggregates structured web data for monitoring and AI model training.", + "frequency": "Unclear at this time.", + "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training." 
}, "FacebookBot": { - "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically.", - "frequency": "Up to 1 page per second", + "operator": "Meta/Facebook", + "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", "function": "Training language models", + "frequency": "Up to 1 page per second", + "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." + }, + "facebookexternalhit": { + "description": "Unclear at this time.", + "frequency": "Unclear at this time.", + "function": "No information.", "operator": "Meta/Facebook", "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)" }, @@ -90,19 +111,12 @@ "operator": "Unknown", "respect": "[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)" }, - "GPTBot": { - "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies.", - "frequency": "No information.", - "function": "Scrapes data to train OpenAI's products.", - "operator": "[OpenAI](https://openai.com)", - "respect": "Yes" - }, "Google-Extended": { - "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search.", - "frequency": "No information.", - "function": "LLM training.", "operator": "Google", - "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)" + "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)", + "function": "LLM training.", + "frequency": "No information.", + "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search." 
}, "GoogleOther": { "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"", @@ -125,6 +139,20 @@ "operator": "Google", "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)" }, + "GPTBot": { + "operator": "[OpenAI](https://openai.com)", + "respect": "Yes", + "function": "Scrapes data to train OpenAI's products.", + "frequency": "No information.", + "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies." + }, + "iaskspider/2.0": { + "description": "Used to provide answers to user queries.", + "frequency": "Unclear at this time.", + "function": "Crawls sites to provide answers to user queries.", + "operator": "iAsk", + "respect": "No" + }, "ICC-Crawler": { "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business.", "frequency": "No information.", @@ -132,13 +160,6 @@ "operator": "[NICT](https://nict.go.jp)", "respect": "Yes" }, - "ISSCyberRiskCrawler": { - "description": "Used to train machine learning based models to quantify cyber risk.", - "frequency": "No information.", - "function": "Scrapes data to train machine learning models.", - "operator": "[ISS-Corporate](https://iss-cyber.com)", - "respect": "No" - }, "ImagesiftBot": { "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. 
Our web intelligence products use this index to enable search and retrieval of similar images.", "frequency": "No information.", @@ -146,40 +167,68 @@ "operator": "[ImageSift](https://imagesift.com)", "respect": "[Yes](https://imagesift.com/about)" }, - "Kangaroo Bot": { - "description": "Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot", - "frequency": "Unclear at this time.", - "function": "AI Data Scrapers", - "operator": "Unclear at this time.", + "img2dataset": { + "description": "Downloads large sets of images into datasets for LLM training or other purposes.", + "frequency": "At the discretion of img2dataset users.", + "function": "Scrapes images for use in LLMs.", + "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", "respect": "Unclear at this time." }, + "ISSCyberRiskCrawler": { + "description": "Used to train machine learning based models to quantify cyber risk.", + "frequency": "No information.", + "function": "Scrapes data to train machine learning models.", + "operator": "[ISS-Corporate](https://iss-cyber.com)", + "respect": "No" + }, + "Kangaroo Bot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Data Scrapers", + "frequency": "Unclear at this time.", + "description": "Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. 
More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot" + }, "Meta-ExternalAgent": { - "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"", - "frequency": "No information.", - "function": "Used to train models and improve products.", "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)", - "respect": "Yes." + "respect": "Yes.", + "function": "Used to train models and improve products.", + "frequency": "No information.", + "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\"" }, "Meta-ExternalFetcher": { - "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher", - "frequency": "Unclear at this time.", - "function": "AI Assistants", "operator": "Unclear at this time.", - "respect": "Unclear at this time." + "respect": "Unclear at this time.", + "function": "AI Assistants", + "frequency": "Unclear at this time.", + "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher" }, "OAI-SearchBot": { - "description": "Crawls sites to surface as results in SearchGPT.", - "frequency": "No information.", - "function": "Search result generation.", "operator": "[OpenAI](https://openai.com)", - "respect": "[Yes](https://platform.openai.com/docs/bots)" + "respect": "[Yes](https://platform.openai.com/docs/bots)", + "function": "Search result generation.", + "frequency": "No information.", + "description": "Crawls sites to surface as results in SearchGPT." 
+ }, + "omgili": { + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)", + "function": "Data is sold.", + "frequency": "No information.", + "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training." + }, + "omgilibot": { + "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io.", + "frequency": "No information.", + "function": "Data is sold.", + "operator": "[Webz.io](https://webz.io/)", + "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)" }, "PerplexityBot": { - "description": "Operated by Perplexity to obtain results in response to user queries.", - "frequency": "Takes action based on user prompts.", - "function": "Used to answer queries at the request of users.", "operator": "[Perplexity](https://www.perplexity.ai/)", - "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)" + "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", + "function": "Used to answer queries at the request of users.", + "frequency": "Takes action based on user prompts.", + "description": "Operated by Perplexity to obtain results in response to user queries." }, "PetalBot": { "description": "Operated by Huawei to provide search and AI assistant services.", @@ -203,11 +252,11 @@ "respect": "Unclear at this time." }, "Timpibot": { - "description": "Makes data available for training AI models.", - "frequency": "No information.", - "function": "Scrapes data for use in training LLMs.", "operator": "[Timpi](https://timpi.io)", - "respect": "Unclear at this time." 
+ "respect": "Unclear at this time.", + "function": "Scrapes data for use in training LLMs.", + "frequency": "No information.", + "description": "Makes data available for training AI models." }, "VelenPublicWebCrawler": { "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"", @@ -217,66 +266,17 @@ "respect": "[Yes](https://velen.io)" }, "Webzio-Extended": { - "description": "Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended", - "frequency": "Unclear at this time.", - "function": "AI Data Scrapers", "operator": "Unclear at this time.", - "respect": "Unclear at this time." + "respect": "Unclear at this time.", + "function": "AI Data Scrapers", + "frequency": "Unclear at this time.", + "description": "Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended" }, "YouBot": { - "description": "Retrieves data used for You.com web search engine and LLMs.", - "frequency": "No information.", - "function": "Scrapes data for search engine and LLMs.", "operator": "[You](https://about.you.com/youchat/)", - "respect": "[Yes](https://about.you.com/youbot/)" - }, - "anthropic-ai": { - "description": "Scrapes data to train LLMs and AI products offered by Anthropic.", - "frequency": "No information. provided.", - "function": "Scrapes data to train Anthropic's AI products.", - "operator": "[Anthropic](https://www.anthropic.com)", - "respect": "Unclear at this time." 
- }, - "cohere-ai": { - "description": "Retrieves data based on user prompts.", - "frequency": "Takes action based on user prompts.", - "function": "Retrieves data to provide responses to user-initiated prompts.", - "operator": "[Cohere](https://cohere.com)", - "respect": "Unclear at this time." - }, - "facebookexternalhit": { - "description": "Unclear at this time.", - "frequency": "Unclear at this time.", - "function": "No information.", - "operator": "Meta/Facebook", - "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)" - }, - "iaskspider/2.0": { - "description": "Used to provide answers to user queries.", - "frequency": "Unclear at this time.", - "function": "Crawls sites to provide answers to user queries.", - "operator": "iAsk", - "respect": "No" - }, - "img2dataset": { - "description": "Downloads large sets of images into datasets for LLM training or other purposes.", - "frequency": "At the discretion of img2dataset users.", - "function": "Scrapes images for use in LLMs.", - "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", - "respect": "Unclear at this time." - }, - "omgili": { - "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training.", + "respect": "[Yes](https://about.you.com/youbot/)", + "function": "Scrapes data for search engine and LLMs.", "frequency": "No information.", - "function": "Data is sold.", - "operator": "[Webz.io](https://webz.io/)", - "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)" - }, - "omgilibot": { - "description": "Legacy user agent initially used for Omgili search engine. 
Unknown if still used, `omgili` agent still used by Webz.io.", - "frequency": "No information.", - "function": "Data is sold.", - "operator": "[Webz.io](https://webz.io/)", - "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)" + "description": "Retrieves data used for You.com web search engine and LLMs." } } \ No newline at end of file From bdf30be7dcce79152af6b95d4520c23600a4ca13 Mon Sep 17 00:00:00 2001 From: Glyn Normington Date: Sat, 19 Oct 2024 04:33:46 +0100 Subject: [PATCH 161/249] Dump out file contents in PHP script --- code/action.php | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/code/action.php b/code/action.php index 52ebbe6..f6a1d3d 100644 --- a/code/action.php +++ b/code/action.php @@ -12,6 +12,8 @@ It generates: */ $robots = json_decode(file_get_contents('robots.json'), 1); +var_dump($robots); + $robots_txt = null; $robots_table = '| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |'."\n"; @@ -24,5 +26,8 @@ foreach($robots as $robot => $details) { $robots_txt .= 'Disallow: /'; +var_dump($robots_txt); +var_dump($robots_table); + file_put_contents('robots.txt', $robots_txt); file_put_contents('table-of-bot-metrics.md', $robots_table); From a80bd18fb8f27cf234fa2f21e79a6fa99f7878dd Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 19 Oct 2024 03:34:29 +0000 Subject: [PATCH 162/249] Dump out file contents in PHP script --- robots.txt | 18 +++++++++--------- table-of-bot-metrics.md | 18 +++++++++--------- 2 files changed, 18 insertions(+), 18 deletions(-) diff --git a/robots.txt b/robots.txt index 739e44f..13681f3 100644 --- a/robots.txt +++ b/robots.txt @@ -1,6 +1,7 @@ User-agent: AI2Bot User-agent: Ai2Bot-Dolma User-agent: Amazonbot +User-agent: anthropic-ai User-agent: Applebot User-agent: Applebot-Extended User-agent: Bytespider @@ -8,21 +9,27 @@ User-agent: CCBot User-agent: ChatGPT-User User-agent: Claude-Web User-agent: ClaudeBot +User-agent: cohere-ai 
User-agent: Diffbot User-agent: FacebookBot +User-agent: facebookexternalhit User-agent: FriendlyCrawler -User-agent: GPTBot User-agent: Google-Extended User-agent: GoogleOther User-agent: GoogleOther-Image User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: iaskspider/2.0 User-agent: ICC-Crawler -User-agent: ISSCyberRiskCrawler User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: ISSCyberRiskCrawler User-agent: Kangaroo Bot User-agent: Meta-ExternalAgent User-agent: Meta-ExternalFetcher User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot User-agent: PerplexityBot User-agent: PetalBot User-agent: Scrapy @@ -31,11 +38,4 @@ User-agent: Timpibot User-agent: VelenPublicWebCrawler User-agent: Webzio-Extended User-agent: YouBot -User-agent: anthropic-ai -User-agent: cohere-ai -User-agent: facebookexternalhit -User-agent: iaskspider/2.0 -User-agent: img2dataset -User-agent: omgili -User-agent: omgilibot Disallow: / \ No newline at end of file diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index cf14641..111ccbb 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -3,6 +3,7 @@ | AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | | Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. | | Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| | Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | | Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | | Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | @@ -10,21 +11,27 @@ | ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | | Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. 
| Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | | FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | | Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | | GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | | GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. 
| "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." |
 | GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." |
+| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. |
+| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. |
 | ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. |
-| ISSCyberRiskCrawler | [ISS-Corporate](https://iss-cyber.com) | No | Scrapes data to train machine learning models. | No information. | Used to train machine learning based models to quantify cyber risk. |
 | ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. |
+| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. |
+| ISSCyberRiskCrawler | [ISS-Corporate](https://iss-cyber.com) | No | Scrapes data to train machine learning models. | No information. | Used to train machine learning based models to quantify cyber risk. |
 | Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot |
 | Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." |
 | Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher |
 | OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. |
+| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
+| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |
 | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
 | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. |
 | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." |
@@ -33,10 +40,3 @@
 | VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
 | Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended |
 | YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. |
-| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
-| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. |
-| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. |
-| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. |
-| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
-| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |

From 38a388097cd18f620da38391398608edd1d4786b Mon Sep 17 00:00:00 2001
From: Glyn Normington
Date: Sat, 19 Oct 2024 04:42:27 +0100
Subject: [PATCH 163/249] Fix typo and trigger rerun of main job

---
 code/dark_visitors.py |  2 +-
 robots.json           | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/code/dark_visitors.py b/code/dark_visitors.py
index 7d29c65..5de65fe 100644
--- a/code/dark_visitors.py
+++ b/code/dark_visitors.py
@@ -34,7 +34,7 @@ for section in soup.find_all("div", {"class": "agent-links-section"}):
 
         default_values = {
             "Unclear at this time.",
-            "No information. provided.",
+            "No information provided.",
             "No information.",
             "No explicit frequency provided."
         }
diff --git a/robots.json b/robots.json
index db308d7..c50d63c 100644
--- a/robots.json
+++ b/robots.json
@@ -1,14 +1,14 @@
 {
     "AI2Bot": {
         "description": "Explores 'certain domains' to find web content.",
-        "frequency": "No information. provided.",
+        "frequency": "No information provided.",
         "function": "Content is used to train open language models.",
         "operator": "[Ai2](https://allenai.org/crawler)",
         "respect": "Yes"
     },
     "Ai2Bot-Dolma": {
         "description": "Explores 'certain domains' to find web content.",
-        "frequency": "No information. provided.",
+        "frequency": "No information provided.",
         "function": "Content is used to train open language models.",
         "operator": "[Ai2](https://allenai.org/crawler)",
         "respect": "Yes"
@@ -17,14 +17,14 @@
         "operator": "Amazon",
         "respect": "Yes",
         "function": "Service improvement and enabling answers for Alexa users.",
-        "frequency": "No information. provided.",
+        "frequency": "No information provided.",
         "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses."
     },
     "anthropic-ai": {
         "operator": "[Anthropic](https://www.anthropic.com)",
         "respect": "Unclear at this time.",
         "function": "Scrapes data to train Anthropic's AI products.",
-        "frequency": "No information. provided.",
+        "frequency": "No information provided.",
         "description": "Scrapes data to train LLMs and AI products offered by Anthropic."
     },
     "Applebot": {
@@ -66,14 +66,14 @@
         "operator": "[Anthropic](https://www.anthropic.com)",
         "respect": "Unclear at this time.",
         "function": "Scrapes data to train Anthropic's AI products.",
-        "frequency": "No information. provided.",
+        "frequency": "No information provided.",
         "description": "Scrapes data to train LLMs and AI products offered by Anthropic."
     },
     "ClaudeBot": {
         "operator": "[Anthropic](https://www.anthropic.com)",
         "respect": "[Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler)",
         "function": "Scrapes data to train Anthropic's AI products.",
-        "frequency": "No information. provided.",
+        "frequency": "No information provided.",
         "description": "Scrapes data to train LLMs and AI products offered by Anthropic."
     },
     "cohere-ai": {

From 6a359e7fd719285b75c4bf6aa8d95403d7573c4e Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sat, 19 Oct 2024 03:43:00 +0000
Subject: [PATCH 164/249] Fix typo and trigger rerun of main job

---
 table-of-bot-metrics.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
index 111ccbb..0e6884c 100644
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -1,16 +1,16 @@
 | Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |
 |-----|----------|-----------------------|----------|------------------|-------------|
-| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. |
-| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information. provided. | Explores 'certain domains' to find web content. |
-| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information. provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
-| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. |
+| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. |
+| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. |
+| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 | Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot |
 | Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. |
 | Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. |
 | CCBot | [Common Crawl Foundation](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). |
 | ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. |
-| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
-| ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information. provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
+| ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 | cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
 | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. |
 | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. |

From 6bb598820ec670db0c333b4950362a8844c3c0ab Mon Sep 17 00:00:00 2001
From: fabianegli
Date: Fri, 18 Oct 2024 23:24:13 +0200
Subject: [PATCH 165/249] ignore venv

---
 .gitignore | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 496ee2c..edef0f5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,3 @@
-.DS_Store
\ No newline at end of file
+.DS_Store
+.venv
+venv

From 0c05461f84ca0d8ad9fa525821c10dbd0937db92 Mon Sep 17 00:00:00 2001
From: fabianegli
Date: Sat, 19 Oct 2024 13:06:34 +0200
Subject: [PATCH 166/249] simplify repo and added some tests

---
 .github/workflows/daily_update.yml      |   9 +-
 .github/workflows/main.yml              |  36 ---
 .gitignore                              |   1 +
 code/action.php                         |  33 ---
 code/dark_visitors.py                   | 161 ++++++++------
 code/test_files/robots.json             | 282 ++++++++++++++++++++++++
 code/test_files/robots.txt              |  41 ++++
 code/test_files/table-of-bot-metrics.md |  42 ++++
 code/tests.py                           |  21 ++
 robots.txt                              |   2 +-
 table-of-bot-metrics.md                 |  80 +++----
 11 files changed, 527 insertions(+), 181 deletions(-)
 delete mode 100644 .github/workflows/main.yml
 delete mode 100644 code/action.php
 create mode 100644 code/test_files/robots.json
 create mode 100644 code/test_files/robots.txt
 create mode 100644 code/test_files/table-of-bot-metrics.md
 create mode 100644 code/tests.py

diff --git a/.github/workflows/daily_update.yml b/.github/workflows/daily_update.yml
index 6b6624a..11eeab3 100644
--- a/.github/workflows/daily_update.yml
+++ b/.github/workflows/daily_update.yml
@@ -1,5 +1,8 @@
 name: Daily Update from Dark Visitors
 on:
+  push:
+    branches:
+      - "main"
   schedule:
     - cron: "0 0 * * *"
 
@@ -22,9 +25,3 @@ jobs:
           git add -A
           git diff --quiet && git diff --staged --quiet || (git commit -m "Daily update from Dark Visitors" && git push)
         shell: bash
-  call-main:
-    needs: dark-visitors
-    uses: ./.github/workflows/main.yml
-    secrets: inherit
-    with:
-      message: "Daily update from Dark Visitors"
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
deleted file mode 100644
index ea8edc5..0000000
--- a/.github/workflows/main.yml
+++ /dev/null
@@ -1,36 +0,0 @@
-on:
-  workflow_call:
-    inputs:
-      message:
-        type: string
-        required: true
-        description: The message to commit
-  push:
-    paths:
-      - 'robots.json'
-
-jobs:
-  ai-robots-txt:
-    runs-on: ubuntu-latest
-    name: ai-robots-txt
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          fetch-depth: 2
-      - run: |
-          git config --global user.name "ai.robots.txt"
-          git config --global user.email "ai.robots.txt@users.noreply.github.com"
-          git log -1
-          git status
-          echo "Running update script ..."
-          php -f code/action.php
-          echo "... done."
-          git --no-pager diff
-          git add -A
-          if [ -n "${{ inputs.message }}" ]; then
-            git commit -m "${{ inputs.message }}"
-          else
-            git commit -m "${{ github.event.head_commit.message }}"
-          fi
-          git push
-        shell: bash
diff --git a/.gitignore b/.gitignore
index edef0f5..cbe1c29 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,4 @@
 .DS_Store
 .venv
 venv
+__pycache__
diff --git a/code/action.php b/code/action.php
deleted file mode 100644
index f6a1d3d..0000000
--- a/code/action.php
+++ /dev/null
@@ -1,33 +0,0 @@
- $details) {
-    $robots_txt .= 'User-agent: '.$robot."\n";
-    $robots_table .= '| '.$robot.' | '.$details['operator'].' | '.$details['respect'].' | '.$details['function'].' | '.$details['frequency'].' | '.$details['description'].' | '."\n";
-}
-
-$robots_txt .= 'Disallow: /';
-
-var_dump($robots_txt);
-var_dump($robots_table);
-
-file_put_contents('robots.txt', $robots_txt);
-file_put_contents('table-of-bot-metrics.md', $robots_table);
diff --git a/code/dark_visitors.py b/code/dark_visitors.py
index 5de65fe..838ce67 100644
--- a/code/dark_visitors.py
+++ b/code/dark_visitors.py
@@ -4,72 +4,103 @@ from pathlib import Path
 import requests
 from bs4 import BeautifulSoup
 
-session = requests.Session()
-response = session.get("https://darkvisitors.com/agents")
-soup = BeautifulSoup(response.text, "html.parser")
 
-existing_content = json.loads(Path("./robots.json").read_text())
-to_include = [
-    "AI Assistants",
-    "AI Data Scrapers",
-    "AI Search Crawlers",
-    # "Archivers",
-    # "Developer Helpers",
-    # "Fetchers",
-    # "Intelligence Gatherers",
-    # "Scrapers",
-    # "Search Engine Crawlers",
-    # "SEO Crawlers",
-    # "Uncategorized",
-    "Undocumented AI Agents"
-]
+def get_updated_robots_json():
+    session = requests.Session()
+    response = session.get("https://darkvisitors.com/agents")
+    soup = BeautifulSoup(response.text, "html.parser")
 
-for section in soup.find_all("div", {"class": "agent-links-section"}):
-    category = section.find("h2").get_text()
-    if category not in to_include:
-        continue
-    for agent in section.find_all("a", href=True):
-        name = agent.find("div", {"class": "agent-name"}).get_text().strip()
-        desc = agent.find("p").get_text().strip()
-
-        default_values = {
-            "Unclear at this time.",
-            "No information provided.",
-            "No information.",
-            "No explicit frequency provided."
-        }
-        default_value = "Unclear at this time."
-
-        # Parse the operator information from the description if possible
-        operator = default_value
-        if "operated by " in desc:
-            try:
-                operator = desc.split("operated by ", 1)[1].split(".", 1)[0].strip()
-            except Exception as e:
-                print(f"Error: {e}")
-
-        def consolidate(field: str, value: str) -> str:
-            # New entry
-            if name not in existing_content:
-                return value
-            # New field
-            if field not in existing_content[name]:
-                return value
-            # Unclear value
-            if existing_content[name][field] in default_values and value not in default_values:
-                return value
-            # Existing value
-            return existing_content[name][field]
+    existing_content = json.loads(Path("./robots.json").read_text())
+    to_include = [
+        "AI Assistants",
+        "AI Data Scrapers",
+        "AI Search Crawlers",
+        # "Archivers",
+        # "Developer Helpers",
+        # "Fetchers",
+        # "Intelligence Gatherers",
+        # "Scrapers",
+        # "Search Engine Crawlers",
+        # "SEO Crawlers",
+        # "Uncategorized",
+        "Undocumented AI Agents",
+    ]
 
-        existing_content[name] = {
-            "operator": consolidate("operator", operator),
-            "respect": consolidate("respect", default_value),
-            "function": consolidate("function", f"{category}"),
-            "frequency": consolidate("frequency", default_value),
-            "description": consolidate("description", f"{desc} More info can be found at https://darkvisitors.com/agents{agent['href']}")
-        }
+    for section in soup.find_all("div", {"class": "agent-links-section"}):
+        category = section.find("h2").get_text()
+        if category not in to_include:
+            continue
+        for agent in section.find_all("a", href=True):
+            name = agent.find("div", {"class": "agent-name"}).get_text().strip()
+            desc = agent.find("p").get_text().strip()
 
-print(f"Total: {len(existing_content)}")
-sorted_keys = sorted(existing_content, key=lambda k: k.lower())
-existing_content = {k: existing_content[k] for k in sorted_keys}
-Path("./robots.json").write_text(json.dumps(existing_content, indent=4))
\ No newline at end of file
+            default_values = {
+                "Unclear at this time.",
+                "No information provided.",
+                "No information.",
+                "No explicit frequency provided.",
+            }
+            default_value = "Unclear at this time."
+
+            # Parse the operator information from the description if possible
+            operator = default_value
+            if "operated by " in desc:
+                try:
+                    operator = desc.split("operated by ", 1)[1].split(".", 1)[0].strip()
+                except Exception as e:
+                    print(f"Error: {e}")
+
+            def consolidate(field: str, value: str) -> str:
+                # New entry
+                if name not in existing_content:
+                    return value
+                # New field
+                if field not in existing_content[name]:
+                    return value
+                # Unclear value
+                if (
+                    existing_content[name][field] in default_values
+                    and value not in default_values
+                ):
+                    return value
+                # Existing value
+                return existing_content[name][field]
+
+            existing_content[name] = {
+                "operator": consolidate("operator", operator),
+                "respect": consolidate("respect", default_value),
+                "function": consolidate("function", f"{category}"),
+                "frequency": consolidate("frequency", default_value),
+                "description": consolidate(
+                    "description",
+                    f"{desc} More info can be found at https://darkvisitors.com/agents{agent['href']}",
+                ),
+            }
+
+    print(f"Total: {len(existing_content)}")
+    sorted_keys = sorted(existing_content, key=lambda k: k.lower())
+    sorted_robots = {k: existing_content[k] for k in sorted_keys}
+    return sorted_robots
+
+
+def json_to_txt(robots_json):
+    robots_txt = "\n".join(f"User-agent: {k}" for k in robots_json.keys())
+    robots_txt += "\nDisallow: /\n"
+    return robots_txt
+
+
+def json_to_table(robots_json):
+    table = "| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |\n"
+    table += "|-----|----------|-----------------------|----------|------------------|-------------|\n"
+
+    for name, robot in robots_json.items():
+        table += f'| {name} | {robot["operator"]} | {robot["respect"]} | {robot["function"]} | {robot["frequency"]} | {robot["description"]} |\n'
+
+    return table
+
+
+if __name__ == "__main__":
+    robots_json = get_updated_robots_json()
+    Path("./robots.json").write_text(json.dumps(robots_json, indent=4))
+    Path("./robots.txt").write_text(json_to_txt(robots_json))
+    Path("./table-of-bot-metrics.md").write_text(json_to_table(robots_json))
diff --git a/code/test_files/robots.json b/code/test_files/robots.json
new file mode 100644
index 0000000..c50d63c
--- /dev/null
+++ b/code/test_files/robots.json
@@ -0,0 +1,282 @@
+{
+    "AI2Bot": {
+        "description": "Explores 'certain domains' to find web content.",
+        "frequency": "No information provided.",
+        "function": "Content is used to train open language models.",
+        "operator": "[Ai2](https://allenai.org/crawler)",
+        "respect": "Yes"
+    },
+    "Ai2Bot-Dolma": {
+        "description": "Explores 'certain domains' to find web content.",
+        "frequency": "No information provided.",
+        "function": "Content is used to train open language models.",
+        "operator": "[Ai2](https://allenai.org/crawler)",
+        "respect": "Yes"
+    },
+    "Amazonbot": {
+        "operator": "Amazon",
+        "respect": "Yes",
+        "function": "Service improvement and enabling answers for Alexa users.",
+        "frequency": "No information provided.",
+        "description": "Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses."
+    },
+    "anthropic-ai": {
+        "operator": "[Anthropic](https://www.anthropic.com)",
+        "respect": "Unclear at this time.",
+        "function": "Scrapes data to train Anthropic's AI products.",
+        "frequency": "No information provided.",
+        "description": "Scrapes data to train LLMs and AI products offered by Anthropic."
+    },
+    "Applebot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "AI Search Crawlers",
+        "frequency": "Unclear at this time.",
+        "description": "Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot"
+    },
+    "Applebot-Extended": {
+        "operator": "[Apple](https://support.apple.com/en-us/119829#datausage)",
+        "respect": "Yes",
+        "function": "Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others.",
+        "frequency": "Unclear at this time.",
+        "description": "Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools."
+    },
+    "Bytespider": {
+        "operator": "ByteDance",
+        "respect": "No",
+        "function": "LLM training.",
+        "frequency": "Unclear at this time.",
+        "description": "Downloads data to train LLMS, including ChatGPT competitors."
+    },
+    "CCBot": {
+        "operator": "[Common Crawl Foundation](https://commoncrawl.org)",
+        "respect": "[Yes](https://commoncrawl.org/ccbot)",
+        "function": "Provides open crawl dataset, used for many purposes, including Machine Learning/AI.",
+        "frequency": "Monthly at present.",
+        "description": "Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers)."
+    },
+    "ChatGPT-User": {
+        "operator": "[OpenAI](https://openai.com)",
+        "respect": "Yes",
+        "function": "Takes action based on user prompts.",
+        "frequency": "Only when prompted by a user.",
+        "description": "Used by plugins in ChatGPT to answer queries based on user input."
+    },
+    "Claude-Web": {
+        "operator": "[Anthropic](https://www.anthropic.com)",
+        "respect": "Unclear at this time.",
+        "function": "Scrapes data to train Anthropic's AI products.",
+        "frequency": "No information provided.",
+        "description": "Scrapes data to train LLMs and AI products offered by Anthropic."
+    },
+    "ClaudeBot": {
+        "operator": "[Anthropic](https://www.anthropic.com)",
+        "respect": "[Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler)",
+        "function": "Scrapes data to train Anthropic's AI products.",
+        "frequency": "No information provided.",
+        "description": "Scrapes data to train LLMs and AI products offered by Anthropic."
+    },
+    "cohere-ai": {
+        "operator": "[Cohere](https://cohere.com)",
+        "respect": "Unclear at this time.",
+        "function": "Retrieves data to provide responses to user-initiated prompts.",
+        "frequency": "Takes action based on user prompts.",
+        "description": "Retrieves data based on user prompts."
+    },
+    "Diffbot": {
+        "operator": "[Diffbot](https://www.diffbot.com/)",
+        "respect": "At the discretion of Diffbot users.",
+        "function": "Aggregates structured web data for monitoring and AI model training.",
+        "frequency": "Unclear at this time.",
+        "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training."
+    },
+    "FacebookBot": {
+        "operator": "Meta/Facebook",
+        "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)",
+        "function": "Training language models",
+        "frequency": "Up to 1 page per second",
+        "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically."
+    },
+    "facebookexternalhit": {
+        "description": "Unclear at this time.",
+        "frequency": "Unclear at this time.",
+        "function": "No information.",
+        "operator": "Meta/Facebook",
+        "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)"
+    },
+    "FriendlyCrawler": {
+        "description": "Unclear who the operator is; but data is used for training/machine learning.",
+        "frequency": "Unclear at this time.",
+        "function": "We are using the data from the crawler to build datasets for machine learning experiments.",
+        "operator": "Unknown",
+        "respect": "[Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler)"
+    },
+    "Google-Extended": {
+        "operator": "Google",
+        "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)",
+        "function": "LLM training.",
+        "frequency": "No information.",
+        "description": "Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search."
+    },
+    "GoogleOther": {
+        "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"",
+        "frequency": "No information.",
+        "function": "Scrapes data.",
+        "operator": "Google",
+        "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)"
+    },
+    "GoogleOther-Image": {
+        "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"",
+        "frequency": "No information.",
+        "function": "Scrapes data.",
+        "operator": "Google",
+        "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)"
+    },
+    "GoogleOther-Video": {
+        "description": "\"Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.\"",
+        "frequency": "No information.",
+        "function": "Scrapes data.",
+        "operator": "Google",
+        "respect": "[Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)"
+    },
+    "GPTBot": {
+        "operator": "[OpenAI](https://openai.com)",
+        "respect": "Yes",
+        "function": "Scrapes data to train OpenAI's products.",
+        "frequency": "No information.",
+        "description": "Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies."
+    },
+    "iaskspider/2.0": {
+        "description": "Used to provide answers to user queries.",
+        "frequency": "Unclear at this time.",
+        "function": "Crawls sites to provide answers to user queries.",
+        "operator": "iAsk",
+        "respect": "No"
+    },
+    "ICC-Crawler": {
+        "description": "Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business.",
+        "frequency": "No information.",
+        "function": "Scrapes data to train and support AI technologies.",
+        "operator": "[NICT](https://nict.go.jp)",
+        "respect": "Yes"
+    },
+    "ImagesiftBot": {
+        "description": "Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images.",
+        "frequency": "No information.",
+        "function": "ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products",
+        "operator": "[ImageSift](https://imagesift.com)",
+        "respect": "[Yes](https://imagesift.com/about)"
+    },
+    "img2dataset": {
+        "description": "Downloads large sets of images into datasets for LLM training or other purposes.",
+        "frequency": "At the discretion of img2dataset users.",
+        "function": "Scrapes images for use in LLMs.",
+        "operator": "[img2dataset](https://github.com/rom1504/img2dataset)",
+        "respect": "Unclear at this time."
+    },
+    "ISSCyberRiskCrawler": {
+        "description": "Used to train machine learning based models to quantify cyber risk.",
+        "frequency": "No information.",
+        "function": "Scrapes data to train machine learning models.",
+        "operator": "[ISS-Corporate](https://iss-cyber.com)",
+        "respect": "No"
+    },
+    "Kangaroo Bot": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "AI Data Scrapers",
+        "frequency": "Unclear at this time.",
+        "description": "Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot"
+    },
+    "Meta-ExternalAgent": {
+        "operator": "[Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers)",
+        "respect": "Yes.",
+        "function": "Used to train models and improve products.",
+        "frequency": "No information.",
+        "description": "\"The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly.\""
+    },
+    "Meta-ExternalFetcher": {
+        "operator": "Unclear at this time.",
+        "respect": "Unclear at this time.",
+        "function": "AI Assistants",
+        "frequency": "Unclear at this time.",
+        "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher"
+    },
+    "OAI-SearchBot": {
+        "operator": "[OpenAI](https://openai.com)",
+        "respect": "[Yes](https://platform.openai.com/docs/bots)",
+        "function": "Search result generation.",
+        "frequency": "No information.",
+        "description": "Crawls sites to surface as results in SearchGPT."
+    },
+    "omgili": {
+        "operator": "[Webz.io](https://webz.io/)",
+        "respect": "[Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/)",
+        "function": "Data is sold.",
+        "frequency": "No information.",
+        "description": "Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training."
+    },
+    "omgilibot": {
+        "description": "Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io.",
+        "frequency": "No information.",
+        "function": "Data is sold.",
+        "operator": "[Webz.io](https://webz.io/)",
+        "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)"
+    },
+    "PerplexityBot": {
+        "operator": "[Perplexity](https://www.perplexity.ai/)",
+        "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)",
+        "function": "Used to answer queries at the request of users.",
+        "frequency": "Takes action based on user prompts.",
+        "description": "Operated by Perplexity to obtain results in response to user queries."
+    },
+    "PetalBot": {
+        "description": "Operated by Huawei to provide search and AI assistant services.",
+        "frequency": "No explicit frequency provided.",
+        "function": "Used to provide recommendations in Hauwei assistant and AI search services.",
+        "operator": "[Huawei](https://huawei.com/)",
+        "respect": "Yes"
+    },
+    "Scrapy": {
+        "description": "\"AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets.\"",
+        "frequency": "No information.",
+        "function": "Scrapes data for a variety of uses including training AI.",
+        "operator": "[Zyte](https://www.zyte.com)",
+        "respect": "Unclear at this time."
+    },
+    "Sidetrade indexer bot": {
+        "description": "AI product training.",
+        "frequency": "No information.",
+        "function": "Extracts data for a variety of uses including training AI.",
+        "operator": "[Sidetrade](https://www.sidetrade.com)",
+        "respect": "Unclear at this time."
+    },
+    "Timpibot": {
+        "operator": "[Timpi](https://timpi.io)",
+        "respect": "Unclear at this time.",
+        "function": "Scrapes data for use in training LLMs.",
+        "frequency": "No information.",
+        "description": "Makes data available for training AI models."
+ }, + "VelenPublicWebCrawler": { + "description": "\"Our goal with this crawler is to build business datasets and machine learning models to better understand the web.\"", + "frequency": "No information.", + "function": "Scrapes data for business data sets and machine learning models.", + "operator": "[Velen Crawler](https://velen.io)", + "respect": "[Yes](https://velen.io)" + }, + "Webzio-Extended": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Data Scrapers", + "frequency": "Unclear at this time.", + "description": "Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended" + }, + "YouBot": { + "operator": "[You](https://about.you.com/youchat/)", + "respect": "[Yes](https://about.you.com/youbot/)", + "function": "Scrapes data for search engine and LLMs.", + "frequency": "No information.", + "description": "Retrieves data used for You.com web search engine and LLMs." 
+ } +} \ No newline at end of file diff --git a/code/test_files/robots.txt b/code/test_files/robots.txt new file mode 100644 index 0000000..927f6f4 --- /dev/null +++ b/code/test_files/robots.txt @@ -0,0 +1,41 @@ +User-agent: AI2Bot +User-agent: Ai2Bot-Dolma +User-agent: Amazonbot +User-agent: anthropic-ai +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Bytespider +User-agent: CCBot +User-agent: ChatGPT-User +User-agent: Claude-Web +User-agent: ClaudeBot +User-agent: cohere-ai +User-agent: Diffbot +User-agent: FacebookBot +User-agent: facebookexternalhit +User-agent: FriendlyCrawler +User-agent: Google-Extended +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GPTBot +User-agent: iaskspider/2.0 +User-agent: ICC-Crawler +User-agent: ImagesiftBot +User-agent: img2dataset +User-agent: ISSCyberRiskCrawler +User-agent: Kangaroo Bot +User-agent: Meta-ExternalAgent +User-agent: Meta-ExternalFetcher +User-agent: OAI-SearchBot +User-agent: omgili +User-agent: omgilibot +User-agent: PerplexityBot +User-agent: PetalBot +User-agent: Scrapy +User-agent: Sidetrade indexer bot +User-agent: Timpibot +User-agent: VelenPublicWebCrawler +User-agent: Webzio-Extended +User-agent: YouBot +Disallow: / diff --git a/code/test_files/table-of-bot-metrics.md b/code/test_files/table-of-bot-metrics.md new file mode 100644 index 0000000..257ba99 --- /dev/null +++ b/code/test_files/table-of-bot-metrics.md @@ -0,0 +1,42 @@ +| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | +|-----|----------|-----------------------|----------|------------------|-------------| +| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | +| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. 
| Explores 'certain domains' to find web content. | +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl Foundation](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. 
| +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. 
| Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. 
| +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| ISSCyberRiskCrawler | [ISS-Corporate](https://iss-cyber.com) | No | Scrapes data to train machine learning models. | No information. | Used to train machine learning based models to quantify cyber risk. | +| Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. 
More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. 
| +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | diff --git a/code/tests.py b/code/tests.py new file mode 100644 index 0000000..ffa7574 --- /dev/null +++ b/code/tests.py @@ -0,0 +1,21 @@ +"""These tests can be run with pytest. 
+This requires pytest: pip install pytest +cd to the `code` directory and run `pytest` +""" + +import json +from pathlib import Path + +from dark_visitors import json_to_txt, json_to_table + + +def test_robots_txt_creation(): + robots_json = json.loads(Path("test_files/robots.json").read_text()) + robots_txt = json_to_txt(robots_json) + assert Path("test_files/robots.txt").read_text() == robots_txt + + +def test_table_of_bot_metrices_md(): + robots_json = json.loads(Path("test_files/robots.json").read_text()) + robots_table = json_to_table(robots_json) + assert Path("test_files/table-of-bot-metrics.md").read_text() == robots_table diff --git a/robots.txt b/robots.txt index 13681f3..927f6f4 100644 --- a/robots.txt +++ b/robots.txt @@ -38,4 +38,4 @@ User-agent: Timpibot User-agent: VelenPublicWebCrawler User-agent: Webzio-Extended User-agent: YouBot -Disallow: / \ No newline at end of file +Disallow: / diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 0e6884c..257ba99 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -1,42 +1,42 @@ | Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | |-----|----------|-----------------------|----------|------------------|-------------| -| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | -| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | -| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. 
| No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | -| CCBot | [Common Crawl Foundation](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. 
| -| ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | -| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | -| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. 
| -| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | -| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. 
| Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | -| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | -| ISSCyberRiskCrawler | [ISS-Corporate](https://iss-cyber.com) | No | Scrapes data to train machine learning models. | No information. | Used to train machine learning based models to quantify cyber risk. | -| Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. 
| -| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | -| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | -| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | -| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. | -| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. 
| -| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | -| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | +| AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | +| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | +| Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | +| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. 
More info can be found at https://darkvisitors.com/agents/agents/applebot | +| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | +| CCBot | [Common Crawl Foundation](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). | +| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. 
| +| Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | +| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | +| FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | +| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| +| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | +| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | +| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | +| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| ISSCyberRiskCrawler | [ISS-Corporate](https://iss-cyber.com) | No | Scrapes data to train machine learning models. | No information. | Used to train machine learning based models to quantify cyber risk. 
| +| Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot | +| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | +| omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. 
| Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | +| Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | +| Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. | +| Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | +| VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | +| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| From 7e2b3ab0372080ba885a6c1969d8101135f6bae8 Mon Sep 17 00:00:00 2001 From: fabianegli Date: Sat, 19 Oct 2024 19:09:34 +0200 Subject: [PATCH 167/249] rename action --- .github/workflows/{daily_update.yml => ai_robots_update.yml} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename .github/workflows/{daily_update.yml => ai_robots_update.yml} (95%) diff --git a/.github/workflows/daily_update.yml b/.github/workflows/ai_robots_update.yml similarity index 95% rename from .github/workflows/daily_update.yml rename to .github/workflows/ai_robots_update.yml index 11eeab3..ea5c760 100644 --- a/.github/workflows/daily_update.yml +++ b/.github/workflows/ai_robots_update.yml @@ -1,4 +1,4 @@ -name: Daily Update from Dark Visitors +name: Updates for AI robots files on: push: branches: From 6ab8fb2d37082f524ca7a5d724669e8175e9f94f Mon Sep 17 00:00:00 2001 From: fabianegli Date: Sat, 19 Oct 2024 19:11:01 +0200 Subject: [PATCH 168/249] no more failure when run without network --- code/dark_visitors.py | 69 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 60 insertions(+), 9 deletions(-) diff --git a/code/dark_visitors.py b/code/dark_visitors.py index 838ce67..820c9c1 100644 --- a/code/dark_visitors.py +++ b/code/dark_visitors.py @@ -5,12 +5,27 @@ import requests from bs4 import BeautifulSoup -def get_updated_robots_json(): - session = requests.Session() - response = session.get("https://darkvisitors.com/agents") - soup = BeautifulSoup(response.text, "html.parser") +def load_robots_json(): + """Load the robots.json contents into a dictionary.""" + return json.loads(Path("./robots.json").read_text(encoding="utf-8")) - existing_content = json.loads(Path("./robots.json").read_text()) + +def get_agent_soup(): + """Retrieve current known agents from darkvisitors.com""" + session = requests.Session() + try: + response = session.get("https://darkvisitors.com/agents") + except requests.exceptions.ConnectionError: + print( + "ERROR: Could not gather the current agents 
from https://darkvisitors.com/agents" + ) + return + return BeautifulSoup(response.text, "html.parser") + + +def updated_robots_json(soup): + """Update AI scraper information with data from darkvisitors.""" + existing_content = load_robots_json() to_include = [ "AI Assistants", "AI Data Scrapers", @@ -83,13 +98,31 @@ def get_updated_robots_json(): return sorted_robots +def ingest_darkvisitors(): + + old_robots_json = load_robots_json() + soup = get_agent_soup() + if soup: + robots_json = updated_robots_json(soup) + print( + "robots.json is unchanged." + if robots_json == old_robots_json + else "robots.json got updates." + ) + Path("./robots.json").write_text( + json.dumps(robots_json, indent=4), encoding="utf-8" + ) + + def json_to_txt(robots_json): + """Compose the robots.txt from the robots.json file.""" robots_txt = "\n".join(f"User-agent: {k}" for k in robots_json.keys()) robots_txt += "\nDisallow: /\n" return robots_txt def json_to_table(robots_json): + """Compose a markdown table with the information in robots.json""" table = "| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |\n" table += "|-----|----------|-----------------------|----------|------------------|-------------|\n" @@ -99,8 +132,26 @@ def json_to_table(robots_json): return table +def update_file_if_changed(file_name, converter): + """Update files if newer content is available and log the (in)actions.""" + new_content = converter(load_robots_json()) + old_content = Path(file_name).read_text(encoding="utf-8") + if old_content == new_content: + print(f"{file_name} is already up to date.") + else: + Path(file_name).write_text(new_content, encoding="utf-8") + print(f"{file_name} has been updated.") + + +def conversions(): + """Triggers the conversions from the json file.""" + update_file_if_changed(file_name="./robots.txt", converter=json_to_txt) + update_file_if_changed( + file_name="./table-of-bot-metrics.md", + converter=json_to_table, + ) + + if __name__ == 
"__main__": - robots_json = get_updated_robots_json() - Path("./robots.json").write_text(json.dumps(robots_json, indent=4)) - Path("./robots.txt").write_text(json_to_txt(robots_json)) - Path("./table-of-bot-metrics.md").write_text(json_to_table(robots_json)) + ingest_darkvisitors() + conversions() From 3ab22bc49887325dde1ce74d0b5952fcef87e2ea Mon Sep 17 00:00:00 2001 From: fabianegli Date: Sat, 19 Oct 2024 19:56:41 +0200 Subject: [PATCH 169/249] make conversions and updates separately triggerable --- .github/workflows/ai_robots_update.yml | 13 ++++++++--- code/dark_visitors.py | 30 ++++++++++++++++++++++++-- 2 files changed, 38 insertions(+), 5 deletions(-) diff --git a/.github/workflows/ai_robots_update.yml b/.github/workflows/ai_robots_update.yml index ea5c760..b346e10 100644 --- a/.github/workflows/ai_robots_update.yml +++ b/.github/workflows/ai_robots_update.yml @@ -18,10 +18,17 @@ jobs: pip install beautifulsoup4 requests git config --global user.name "dark-visitors" git config --global user.email "dark-visitors@users.noreply.github.com" - echo "Running update script ..." - python code/dark_visitors.py + echo "Updating robots.json with data from darkvisitor.com ..." + python code/dark_visitors.py --update echo "... done." git --no-pager diff git add -A - git diff --quiet && git diff --staged --quiet || (git commit -m "Daily update from Dark Visitors" && git push) + git diff --quiet && git diff --staged --quiet || (git commit -m "Update from Dark Visitors" && git push) + + echo "Updating robots.txt and table-of-bot-metrics.md if necessary ..." + python code/dark_visitors.py --convert + echo "... done." 
+ git --no-pager diff + git add -A + git diff --quiet && git diff --staged --quiet || (git commit -m "Updated from new robots.json" && git push) shell: bash diff --git a/code/dark_visitors.py b/code/dark_visitors.py index 820c9c1..cf44e8e 100644 --- a/code/dark_visitors.py +++ b/code/dark_visitors.py @@ -153,5 +153,31 @@ def conversions(): if __name__ == "__main__": - ingest_darkvisitors() - conversions() + import argparse + + parser = argparse.ArgumentParser() + parser = argparse.ArgumentParser( + prog="ai-robots", + description="Collects and updates information about web scrapers of AI companies.", + epilog="One of the flags must be set.\n", + ) + parser.add_argument( + "--update", + action="store_true", + help="Update the robots.json file with data from darkvisitors.com/agents", + ) + parser.add_argument( + "--convert", + action="store_true", + help="Create the robots.txt and markdown table from robots.json", + ) + args = parser.parse_args() + + if not (args.update or args.convert): + print("ERROR: please provide one of the possible flags.") + parser.print_help() + + if args.update: + ingest_darkvisitors() + if args.convert: + conversions() From fe5f4076738888d51a7f8719f503294996050d6f Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Sun, 27 Oct 2024 00:54:47 +0000 Subject: [PATCH 170/249] Update from Dark Visitors --- robots.json | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/robots.json b/robots.json index c50d63c..4922c84 100644 --- a/robots.json +++ b/robots.json @@ -1,10 +1,10 @@ { "AI2Bot": { - "description": "Explores 'certain domains' to find web content.", - "frequency": "No information provided.", - "function": "Content is used to train open language models.", "operator": "[Ai2](https://allenai.org/crawler)", - "respect": "Yes" + "respect": "Yes", + "function": "Content is used to train open language models.", + "frequency": "No information provided.", + "description": "Explores 'certain domains' to find web content." 
}, "Ai2Bot-Dolma": { "description": "Explores 'certain domains' to find web content.", From bc0a0ad0e97f93c152d582ad7b67543b399a3158 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Tue, 29 Oct 2024 00:52:12 +0000 Subject: [PATCH 171/249] Update from Dark Visitors --- robots.json | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/robots.json b/robots.json index 4922c84..dbd5ae4 100644 --- a/robots.json +++ b/robots.json @@ -90,6 +90,13 @@ "frequency": "Unclear at this time.", "description": "Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training." }, + "DuckAssistBot": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Assistants", + "frequency": "Unclear at this time.", + "description": "DuckAssistBot is used by DuckDuckGo's DuckAssist feature to fetch content and generate realtime AI answers to user searches. More info can be found at https://darkvisitors.com/agents/agents/duckassistbot" + }, "FacebookBot": { "operator": "Meta/Facebook", "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)", From 9e06cf3bc9eb9cd4947eb1a887cfa07ecde117b3 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Tue, 29 Oct 2024 00:52:12 +0000 Subject: [PATCH 172/249] Updated from new robots.json --- robots.txt | 1 + table-of-bot-metrics.md | 1 + 2 files changed, 2 insertions(+) diff --git a/robots.txt b/robots.txt index 927f6f4..4b9cc6a 100644 --- a/robots.txt +++ b/robots.txt @@ -11,6 +11,7 @@ User-agent: Claude-Web User-agent: ClaudeBot User-agent: cohere-ai User-agent: Diffbot +User-agent: DuckAssistBot User-agent: FacebookBot User-agent: facebookexternalhit User-agent: FriendlyCrawler diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 257ba99..fe6baa2 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -13,6 +13,7 @@ | ClaudeBot | [Anthropic](https://www.anthropic.com) | 
[Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | +| DuckAssistBot | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | DuckAssistBot is used by DuckDuckGo's DuckAssist feature to fetch content and generate realtime AI answers to user searches. More info can be found at https://darkvisitors.com/agents/agents/duckassistbot | | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | | facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | | FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. 
| From 9295b6a963f0ccba30392e005419b42eed2b264e Mon Sep 17 00:00:00 2001 From: Glyn Normington Date: Sat, 9 Nov 2024 04:45:47 +0000 Subject: [PATCH 173/249] Clarify our rationale I deleted the point about excessive load on crawled sites as any other crawler could potentially be guilty of this and I wouldn't want our scope to creep to all crawlers. Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/53#issuecomment-2466042550 --- FAQ.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/FAQ.md b/FAQ.md index 4d58350..15db540 100644 --- a/FAQ.md +++ b/FAQ.md @@ -2,7 +2,7 @@ ## Why should we block these crawlers? -They're extractive, confer no benefit to the creators of data they're ingesting and also have wide-ranging negative externalities. +They're extractive, confer no benefit to the creators of data they're ingesting and also have wide-ranging negative externalities: particularly copyright abuse and environmental impact. **[How Tech Giants Cut Corners to Harvest Data for A.I.](https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html?unlocked_article_code=1.ik0.Ofja.L21c1wyW-0xj&ugrp=m)** > OpenAI, Google and Meta ignored corporate policies, altered their own rules and discussed skirting copyright law as they sought online information to train their newest artificial intelligence systems. @@ -10,7 +10,11 @@ They're extractive, confer no benefit to the creators of data they're ingesting **[How AI copyright lawsuits could make the whole industry go extinct](https://www.theverge.com/24062159/ai-copyright-fair-use-lawsuits-new-york-times-openai-chatgpt-decoder-podcast)** > The New York Times' lawsuit against OpenAI is part of a broader, industry-shaking copyright challenge that could define the future of AI. -Crawlers also sometimes impact the performance of crawled sites, or even take them down. 
+**[Reconciling the contrasting narratives on the environmental impact of large language models](https://www.nature.com/articles/s41598-024-76682-6) +> Studies have shown that the training of just one LLM can consume as much energy as five cars do across their lifetimes. The water footprint of AI is also substantial; for example, recent work has highlighted that water consumption associated with AI models involves data centers using millions of gallons of water per day for cooling. Additionally, the energy consumption and carbon emissions of AI are projected to grow quickly in the coming years [...]. + +**[Scientists Predict AI to Generate Millions of Tons of E-Waste](https://www.sciencealert.com/scientists-predict-ai-to-generate-millions-of-tons-of-e-waste) +> we could end up with between 1.2 million and 5 million metric tons of additional electronic waste by the end of this decade [the 2020's]. ## How do we know AI companies/bots respect `robots.txt`? From 2c88909be39ad7e0b113e1245fbf1d134b267e8b Mon Sep 17 00:00:00 2001 From: Glyn Normington Date: Sun, 10 Nov 2024 01:02:18 +0000 Subject: [PATCH 174/249] Fix formatting --- FAQ.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/FAQ.md b/FAQ.md index 15db540..044f710 100644 --- a/FAQ.md +++ b/FAQ.md @@ -10,10 +10,10 @@ They're extractive, confer no benefit to the creators of data they're ingesting **[How AI copyright lawsuits could make the whole industry go extinct](https://www.theverge.com/24062159/ai-copyright-fair-use-lawsuits-new-york-times-openai-chatgpt-decoder-podcast)** > The New York Times' lawsuit against OpenAI is part of a broader, industry-shaking copyright challenge that could define the future of AI. 
-**[Reconciling the contrasting narratives on the environmental impact of large language models](https://www.nature.com/articles/s41598-024-76682-6) +**[Reconciling the contrasting narratives on the environmental impact of large language models](https://www.nature.com/articles/s41598-024-76682-6)** > Studies have shown that the training of just one LLM can consume as much energy as five cars do across their lifetimes. The water footprint of AI is also substantial; for example, recent work has highlighted that water consumption associated with AI models involves data centers using millions of gallons of water per day for cooling. Additionally, the energy consumption and carbon emissions of AI are projected to grow quickly in the coming years [...]. -**[Scientists Predict AI to Generate Millions of Tons of E-Waste](https://www.sciencealert.com/scientists-predict-ai-to-generate-millions-of-tons-of-e-waste) +**[Scientists Predict AI to Generate Millions of Tons of E-Waste](https://www.sciencealert.com/scientists-predict-ai-to-generate-millions-of-tons-of-e-waste)** > we could end up with between 1.2 million and 5 million metric tons of additional electronic waste by the end of this decade [the 2020's]. ## How do we know AI companies/bots respect `robots.txt`? From d50615d3947e524ca92fbaa05eaac6c5bc59121d Mon Sep 17 00:00:00 2001 From: Glyn Normington Date: Sun, 10 Nov 2024 01:06:13 +0000 Subject: [PATCH 175/249] Improve formatting This clarifies that the scope of the tip is Apache httpd. --- FAQ.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/FAQ.md b/FAQ.md index 044f710..967cf41 100644 --- a/FAQ.md +++ b/FAQ.md @@ -42,8 +42,8 @@ That depends on your stack.
- Apache httpd - [Blockin' bots.](https://ethanmarcotte.com/wrote/blockin-bots/) by Ethan Marcotte - [Blocking Bots With 11ty And Apache](https://flamedfury.com/posts/blocking-bots-with-11ty-and-apache/) by fLaMEd fury -> [!TIP] -> The snippets in these articles all use `mod_rewrite`, which [should be considered a last resort](https://httpd.apache.org/docs/trunk/rewrite/avoid.html). A good alternative that's less resource-intensive is `mod_setenvif`; see [httpd docs](https://httpd.apache.org/docs/trunk/rewrite/access.html#blocking-of-robots) for an example. You should also consider [setting this up in `httpd.conf` instead of `.htaccess`](https://httpd.apache.org/docs/trunk/howto/htaccess.html#when) if it's available to you. + > [!TIP] + > The snippets in these articles all use `mod_rewrite`, which [should be considered a last resort](https://httpd.apache.org/docs/trunk/rewrite/avoid.html). A good alternative that's less resource-intensive is `mod_setenvif`; see [httpd docs](https://httpd.apache.org/docs/trunk/rewrite/access.html#blocking-of-robots) for an example. You should also consider [setting this up in `httpd.conf` instead of `.htaccess`](https://httpd.apache.org/docs/trunk/howto/htaccess.html#when) if it's available to you. - Netlify - [Blockin' bots on Netlify](https://www.jeremiak.com/blog/block-bots-netlify-edge-functions/) by Jeremia Kimelman - Cloudflare From adfd4af872d5fd91817915ac0ca539165e1e0cd2 Mon Sep 17 00:00:00 2001 From: "Y. 
Meyer-Norwood" <106889957+norwd@users.noreply.github.com> Date: Mon, 11 Nov 2024 12:58:40 +1300 Subject: [PATCH 176/249] Create upload-robots-txt-file-to-release.yml --- .../upload-robots-txt-file-to-release.yml | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+) create mode 100644 .github/workflows/upload-robots-txt-file-to-release.yml diff --git a/.github/workflows/upload-robots-txt-file-to-release.yml b/.github/workflows/upload-robots-txt-file-to-release.yml new file mode 100644 index 0000000..df57bee --- /dev/null +++ b/.github/workflows/upload-robots-txt-file-to-release.yml @@ -0,0 +1,23 @@ +--- + +name: "Upload robots.txt file to release" +run-name: "Upload robots.txt file to release" + +on: + release: + types: + - published + +jobs: + upload-robots-txt-file-to-release: + name: "Upload robots.txt file to release" + runs-on: ubuntu-latest + steps: + - name: "Checkout" + uses: actions/checkout@v4 + + - name: "Upload" + run: gh --repo "${REPO}" release upload "${TAG}" robots.txt + env: + REPO: ${{ github.repository }} + TAG: ${{ github.event.release.tag_name }} From 94ceb3cffdc3001dccfdfbd48140cd8057116242 Mon Sep 17 00:00:00 2001 From: "Y. 
Meyer-Norwood" <106889957+norwd@users.noreply.github.com> Date: Mon, 11 Nov 2024 13:04:55 +1300 Subject: [PATCH 177/249] Add authentication for `gh` command --- .github/workflows/upload-robots-txt-file-to-release.yml | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/.github/workflows/upload-robots-txt-file-to-release.yml b/.github/workflows/upload-robots-txt-file-to-release.yml index df57bee..370feb6 100644 --- a/.github/workflows/upload-robots-txt-file-to-release.yml +++ b/.github/workflows/upload-robots-txt-file-to-release.yml @@ -8,6 +8,9 @@ on: types: - published +permissions: + contents: write + jobs: upload-robots-txt-file-to-release: name: "Upload robots.txt file to release" @@ -19,5 +22,6 @@ jobs: - name: "Upload" run: gh --repo "${REPO}" release upload "${TAG}" robots.txt env: + GH_TOKEN: ${{ github.token }} REPO: ${{ github.repository }} TAG: ${{ github.event.release.tag_name }} From e8f0784a0058f8a737ef150ae132ecb13051979d Mon Sep 17 00:00:00 2001 From: "Y. Meyer-Norwood" <106889957+norwd@users.noreply.github.com> Date: Wed, 13 Nov 2024 10:26:37 +1300 Subject: [PATCH 178/249] Explicitly use release tag for checkout --- .github/workflows/upload-robots-txt-file-to-release.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/workflows/upload-robots-txt-file-to-release.yml b/.github/workflows/upload-robots-txt-file-to-release.yml index 370feb6..5bf2b29 100644 --- a/.github/workflows/upload-robots-txt-file-to-release.yml +++ b/.github/workflows/upload-robots-txt-file-to-release.yml @@ -18,6 +18,8 @@ jobs: steps: - name: "Checkout" uses: actions/checkout@v4 + with: + ref: ${{ github.event.release.tag_name }} - name: "Upload" run: gh --repo "${REPO}" release upload "${TAG}" robots.txt From 80002f5e17e5fd3ab87cd74c17f6a102e9cd634e Mon Sep 17 00:00:00 2001 From: Glyn Normington Date: Tue, 19 Nov 2024 03:33:45 +0000 Subject: [PATCH 179/249] Allow facebookexternalhit At the time of writing, this crawler does not appear to be for the purpose of AI. 
See: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/ (accessed on 19 November 2024). Fixes https://github.com/ai-robots-txt/ai.robots.txt/issues/40 --- robots.json | 7 ------- 1 file changed, 7 deletions(-) diff --git a/robots.json b/robots.json index dbd5ae4..21fc1de 100644 --- a/robots.json +++ b/robots.json @@ -104,13 +104,6 @@ "frequency": "Up to 1 page per second", "description": "Officially used for training Meta \"speech recognition technology,\" unknown if used to train Meta AI specifically." }, - "facebookexternalhit": { - "description": "Unclear at this time.", - "frequency": "Unclear at this time.", - "function": "No information.", - "operator": "Meta/Facebook", - "respect": "[Yes](https://developers.facebook.com/docs/sharing/bot/)" - }, "FriendlyCrawler": { "description": "Unclear who the operator is; but data is used for training/machine learning.", "frequency": "Unclear at this time.", From 58985737e783aff099fc9dd06b895179d7833c34 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Tue, 19 Nov 2024 16:46:21 +0000 Subject: [PATCH 180/249] Updated from new robots.json --- robots.txt | 1 - table-of-bot-metrics.md | 1 - 2 files changed, 2 deletions(-) diff --git a/robots.txt b/robots.txt index 4b9cc6a..1865026 100644 --- a/robots.txt +++ b/robots.txt @@ -13,7 +13,6 @@ User-agent: cohere-ai User-agent: Diffbot User-agent: DuckAssistBot User-agent: FacebookBot -User-agent: facebookexternalhit User-agent: FriendlyCrawler User-agent: Google-Extended User-agent: GoogleOther diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index fe6baa2..d1eed4b 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -15,7 +15,6 @@ | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. 
| Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | | DuckAssistBot | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | DuckAssistBot is used by DuckDuckGo's DuckAssist feature to fetch content and generate realtime AI answers to user searches. More info can be found at https://darkvisitors.com/agents/agents/duckassistbot | | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | -| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | | FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | | Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | | GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| From 37065f911805370426a9a33b1af5f400b24a0c16 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Sun, 24 Nov 2024 00:57:05 +0000 Subject: [PATCH 181/249] Update from Dark Visitors --- robots.json | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/robots.json b/robots.json index 21fc1de..51a5b50 100644 --- a/robots.json +++ b/robots.json @@ -223,6 +223,13 @@ "operator": "[Webz.io](https://webz.io/)", "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)" }, + "PanguBot": { + "operator": "the Chinese company Huawei", + "respect": "Unclear at this time.", + "function": "AI Data Scrapers", + "frequency": "Unclear at this time.", + "description": "PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu. More info can be found at https://darkvisitors.com/agents/agents/pangubot" + }, "PerplexityBot": { "operator": "[Perplexity](https://www.perplexity.ai/)", "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", From 609ddca39295d1ec8ec47bc4c1c609135bc238d3 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Sun, 24 Nov 2024 00:57:06 +0000 Subject: [PATCH 182/249] Updated from new robots.json --- robots.txt | 1 + table-of-bot-metrics.md | 1 + 2 files changed, 2 insertions(+) diff --git a/robots.txt b/robots.txt index 1865026..c41ed6d 100644 --- a/robots.txt +++ b/robots.txt @@ -30,6 +30,7 @@ User-agent: Meta-ExternalFetcher User-agent: OAI-SearchBot User-agent: omgili User-agent: omgilibot +User-agent: PanguBot User-agent: PerplexityBot User-agent: PetalBot User-agent: Scrapy diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index d1eed4b..e905d2f 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -32,6 +32,7 @@ | OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | 
Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. |
 | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
 | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |
+| PanguBot | the Chinese company Huawei | Unclear at this time. | AI Data Scrapers | Unclear at this time. | PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu. More info can be found at https://darkvisitors.com/agents/agents/pangubot |
 | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
 | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. |
 | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets."
 |
From bd38c3019412afdb53ee394e2dd8ccc9294b83a3 Mon Sep 17 00:00:00 2001
From: fabianegli
Date: Tue, 26 Nov 2024 09:12:11 +0100
Subject: [PATCH 183/249] specify file encodings in tests

---
 code/tests.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/code/tests.py b/code/tests.py
index ffa7574..16e4fe3 100644
--- a/code/tests.py
+++ b/code/tests.py
@@ -10,12 +10,12 @@ from dark_visitors import json_to_txt, json_to_table
 
 
 def test_robots_txt_creation():
-    robots_json = json.loads(Path("test_files/robots.json").read_text())
+    robots_json = json.loads(Path("test_files/robots.json").read_text(encoding="utf-8"))
     robots_txt = json_to_txt(robots_json)
-    assert Path("test_files/robots.txt").read_text() == robots_txt
+    assert Path("test_files/robots.txt").read_text(encoding="utf-8") == robots_txt
 
 
 def test_table_of_bot_metrices_md():
-    robots_json = json.loads(Path("test_files/robots.json").read_text())
+    robots_json = json.loads(Path("test_files/robots.json").read_text(encoding="utf-8"))
     robots_table = json_to_table(robots_json)
-    assert Path("test_files/table-of-bot-metrics.md").read_text() == robots_table
+    assert Path("test_files/table-of-bot-metrics.md").read_text(encoding="utf-8") == robots_table
From b64284d6846da62e1ad78146c4fcb6e7ff0eb80c Mon Sep 17 00:00:00 2001
From: fabianegli
Date: Tue, 26 Nov 2024 09:41:46 +0100
Subject: [PATCH 184/249] restore correct attribution logic to before PR #55

---
 .github/workflows/ai_robots_update.yml | 16 ++++-------
 .github/workflows/main.yml             | 38 ++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 10 deletions(-)
 create mode 100644 .github/workflows/main.yml

diff --git a/.github/workflows/ai_robots_update.yml b/.github/workflows/ai_robots_update.yml
index b346e10..654b0b5 100644
--- a/.github/workflows/ai_robots_update.yml
+++ b/.github/workflows/ai_robots_update.yml
@@ -1,8 +1,5 @@
 name: Updates for AI robots files
 on:
-  push:
-    branches:
-      - "main"
   schedule:
     - cron: "0 0 * * *"
 
@@ -24,11 +21,10
@@ jobs:
           git --no-pager diff
           git add -A
           git diff --quiet && git diff --staged --quiet || (git commit -m "Update from Dark Visitors" && git push)
-
-          echo "Updating robots.txt and table-of-bot-metrics.md if necessary ..."
-          python code/dark_visitors.py --convert
-          echo "... done."
-          git --no-pager diff
-          git add -A
-          git diff --quiet && git diff --staged --quiet || (git commit -m "Updated from new robots.json" && git push)
         shell: bash
+  call-main:
+    needs: dark-visitors
+    uses: ./.github/workflows/main.yml
+    secrets: inherit
+    with:
+      message: "Update from Dark Visitors"
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
new file mode 100644
index 0000000..a4c47d6
--- /dev/null
+++ b/.github/workflows/main.yml
@@ -0,0 +1,38 @@
+on:
+  workflow_call:
+    inputs:
+      message:
+        type: string
+        required: true
+        description: The message to commit
+  push:
+    paths:
+      - 'robots.json'
+    branches:
+      - "main"
+
+jobs:
+  ai-robots-txt:
+    runs-on: ubuntu-latest
+    name: ai-robots-txt
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 2
+      - run: |
+          git config --global user.name "ai.robots.txt"
+          git config --global user.email "ai.robots.txt@users.noreply.github.com"
+          git log -1
+          git status
+          echo "Updating robots.txt and table-of-bot-metrics.md if necessary ..."
+          python code/dark_visitors.py --convert
+          echo "... done."
+          git --no-pager diff
+          git add -A
+          if [ -n "${{ inputs.message }}" ]; then
+            git commit -m "${{ inputs.message }}"
+          else
+            git commit -m "${{ github.event.head_commit.message }}"
+          fi
+          git push
+        shell: bash
From eb8e1a49b5fd36b57490b37831d26013223b4eb9 Mon Sep 17 00:00:00 2001
From: fabianegli
Date: Fri, 29 Nov 2024 09:02:47 +0100
Subject: [PATCH 185/249] Revert "specify file encodings in tests"

This reverts commit bd38c3019412afdb53ee394e2dd8ccc9294b83a3.
---
 code/tests.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/code/tests.py b/code/tests.py
index 16e4fe3..ffa7574 100644
--- a/code/tests.py
+++ b/code/tests.py
@@ -10,12 +10,12 @@ from dark_visitors import json_to_txt, json_to_table
 
 
 def test_robots_txt_creation():
-    robots_json = json.loads(Path("test_files/robots.json").read_text(encoding="utf-8"))
+    robots_json = json.loads(Path("test_files/robots.json").read_text())
     robots_txt = json_to_txt(robots_json)
-    assert Path("test_files/robots.txt").read_text(encoding="utf-8") == robots_txt
+    assert Path("test_files/robots.txt").read_text() == robots_txt
 
 
 def test_table_of_bot_metrices_md():
-    robots_json = json.loads(Path("test_files/robots.json").read_text(encoding="utf-8"))
+    robots_json = json.loads(Path("test_files/robots.json").read_text())
     robots_table = json_to_table(robots_json)
-    assert Path("test_files/table-of-bot-metrics.md").read_text(encoding="utf-8") == robots_table
+    assert Path("test_files/table-of-bot-metrics.md").read_text() == robots_table
From 2036a68c1f6d5b217439976000e8f7162e2dbb3f Mon Sep 17 00:00:00 2001
From: dark-visitors
Date: Wed, 4 Dec 2024 00:55:50 +0000
Subject: [PATCH 186/249] Update from Dark Visitors

---
 robots.json | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/robots.json b/robots.json
index 51a5b50..1c00b63 100644
--- a/robots.json
+++ b/robots.json
@@ -83,6 +83,13 @@
         "frequency": "Takes action based on user prompts.",
         "description": "Retrieves data based on user prompts."
     },
+    "cohere-training-data-crawler": {
+        "operator": "Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products",
+        "respect": "Unclear at this time.",
+        "function": "AI Data Scrapers",
+        "frequency": "Unclear at this time.",
+        "description": "cohere-training-data-crawler is a web crawler operated by Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products.
More info can be found at https://darkvisitors.com/agents/agents/cohere-training-data-crawler"
+    },
     "Diffbot": {
         "operator": "[Diffbot](https://www.diffbot.com/)",
         "respect": "At the discretion of Diffbot users.",
From 3a43714908dd7df42a9ecf35c107e609bc2f9120 Mon Sep 17 00:00:00 2001
From: Glyn Normington
Date: Sat, 4 Jan 2025 04:55:34 +0000
Subject: [PATCH 187/249] Rename Python code

The name dark_visitors.py gives the impression that the code is entirely
related to the dark visitors website, whereas the update command relates
to dark visitors and the convert command is unrelated to dark visitors.

---
 .github/workflows/ai_robots_update.yml | 2 +-
 .github/workflows/main.yml             | 2 +-
 code/{dark_visitors.py => robots.py}   | 0
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename code/{dark_visitors.py => robots.py} (100%)

diff --git a/.github/workflows/ai_robots_update.yml b/.github/workflows/ai_robots_update.yml
index 654b0b5..59e785d 100644
--- a/.github/workflows/ai_robots_update.yml
+++ b/.github/workflows/ai_robots_update.yml
@@ -16,7 +16,7 @@ jobs:
           git config --global user.name "dark-visitors"
           git config --global user.email "dark-visitors@users.noreply.github.com"
           echo "Updating robots.json with data from darkvisitor.com ..."
-          python code/dark_visitors.py --update
+          python code/robots.py --update
           echo "... done."
           git --no-pager diff
           git add -A
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index a4c47d6..40ac9ab 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -25,7 +25,7 @@ jobs:
           git log -1
           git status
           echo "Updating robots.txt and table-of-bot-metrics.md if necessary ..."
-          python code/dark_visitors.py --convert
+          python code/robots.py --convert
           echo "... done."
           git --no-pager diff
           git add -A
diff --git a/code/dark_visitors.py b/code/robots.py
similarity index 100%
rename from code/dark_visitors.py
rename to code/robots.py
From e4c12ee2f84e2cb6643f7eeb7dd6eb50c6e91df8 Mon Sep 17 00:00:00 2001
From: Glyn Normington
Date: Sat, 4 Jan 2025 05:03:48 +0000
Subject: [PATCH 188/249] Rename in test code

---
 code/tests.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/code/tests.py b/code/tests.py
index ffa7574..9cf35fe 100644
--- a/code/tests.py
+++ b/code/tests.py
@@ -6,7 +6,7 @@ cd to the `code` directory and run `pytest`
 import json
 from pathlib import Path
 
-from dark_visitors import json_to_txt, json_to_table
+from robots import json_to_txt, json_to_table
 
 
 def test_robots_txt_creation():
From 996b9c678cbdd90dea414006cc14027b29118d5c Mon Sep 17 00:00:00 2001
From: Glyn Normington
Date: Sat, 4 Jan 2025 05:28:41 +0000
Subject: [PATCH 189/249] Improve job name

The purpose of the job is to convert the JSON file to the other files.

---
 .github/workflows/ai_robots_update.yml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/ai_robots_update.yml b/.github/workflows/ai_robots_update.yml
index 59e785d..7e11ce8 100644
--- a/.github/workflows/ai_robots_update.yml
+++ b/.github/workflows/ai_robots_update.yml
@@ -22,7 +22,8 @@ jobs:
           git add -A
           git diff --quiet && git diff --staged --quiet || (git commit -m "Update from Dark Visitors" && git push)
         shell: bash
-  call-main:
+  convert:
+    name: convert
     needs: dark-visitors
     uses: ./.github/workflows/main.yml
     secrets: inherit
From 9e372d069625f2a2939c19fb8bfc703548a2ae42 Mon Sep 17 00:00:00 2001
From: Glyn Normington
Date: Sun, 5 Jan 2025 01:45:33 +0000
Subject: [PATCH 190/249] Ensure dependency installed

Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/60#issuecomment-2571437913
Ref: https://stackoverflow.com/questions/11783875/importerror-no-module-named-bs4-beautifulsoup

---
 .github/workflows/main.yml | 1 +
 1 file changed, 1
insertion(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index a4c47d6..cb5fefc 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -20,6 +20,7 @@ jobs:
         with:
           fetch-depth: 2
       - run: |
+          pip install beautifulsoup4
           git config --global user.name "ai.robots.txt"
           git config --global user.email "ai.robots.txt@users.noreply.github.com"
           git log -1
From c01a68403687f44ef3235ee726ff70b9d6a133f4 Mon Sep 17 00:00:00 2001
From: Glyn Normington
Date: Sun, 5 Jan 2025 05:03:50 +0000
Subject: [PATCH 191/249] Convert robots.json more frequently

Specifically, when github workflows or code is changed as either of
these can affect the conversion results.

Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/60

---
 .github/workflows/main.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index cb5fefc..4abbe2b 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -8,6 +8,8 @@ on:
   push:
     paths:
       - 'robots.json'
+      - '.github/workflows/**'
+      - 'code/**'
     branches:
       - "main"
 
From ca8620e28b8b3baddc34852e3cb2ece2bf89d18d Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Sun, 5 Jan 2025 05:05:20 +0000
Subject: [PATCH 192/249] Merge pull request #63 from glyn/push-paths

Convert robots.json more frequently

---
 robots.txt              | 1 +
 table-of-bot-metrics.md | 1 +
 2 files changed, 2 insertions(+)

diff --git a/robots.txt b/robots.txt
index c41ed6d..1ae5558 100644
--- a/robots.txt
+++ b/robots.txt
@@ -10,6 +10,7 @@ User-agent: ChatGPT-User
 User-agent: Claude-Web
 User-agent: ClaudeBot
 User-agent: cohere-ai
+User-agent: cohere-training-data-crawler
 User-agent: Diffbot
 User-agent: DuckAssistBot
 User-agent: FacebookBot
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
index e905d2f..1106d0f 100644
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -12,6 +12,7 @@
 | Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time.
| Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 | ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. |
 | cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. |
+| cohere-training-data-crawler | Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products | Unclear at this time. | AI Data Scrapers | Unclear at this time. | cohere-training-data-crawler is a web crawler operated by Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products. More info can be found at https://darkvisitors.com/agents/agents/cohere-training-data-crawler |
 | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. |
 | DuckAssistBot | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | DuckAssistBot is used by DuckDuckGo's DuckAssist feature to fetch content and generate realtime AI answers to user searches.
More info can be found at https://darkvisitors.com/agents/agents/duckassistbot |
 | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. |
From 83cd54647015829bbf241931e3d602c6081d2a1c Mon Sep 17 00:00:00 2001
From: Fabian Egli
Date: Mon, 6 Jan 2025 11:39:41 +0100
Subject: [PATCH 193/249] allow Action to succeed even if no changes were made

Before, the Action would fail in case there were no changes made to any
files by the converter.

---
 .github/workflows/main.yml | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 4abbe2b..d26a5a0 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -32,6 +32,13 @@ jobs:
           echo "... done."
           git --no-pager diff
           git add -A
+          if [ "$(git diff --staged)" ]; then
+            # To have the action run successfully, if no changes are staged, we
+            # manually skip the later commits because they fail with exit code 1
+            # and this would then display as a failure for the Action.
+            echo "No staged changes to commit. Skipping commit and push."
+            exit 0
+          fi
           if [ -n "${{ inputs.message }}" ]; then
             git commit -m "${{ inputs.message }}"
           else
From 30ee95701162ac8f67cf6183641b2a140fcde721 Mon Sep 17 00:00:00 2001
From: Fabian Egli
Date: Mon, 6 Jan 2025 12:05:42 +0100
Subject: [PATCH 194/249] bail when NO changes are staged

---
 .github/workflows/main.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index d26a5a0..ac20d99 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -32,7 +32,7 @@ jobs:
           echo "... done."
           git --no-pager diff
           git add -A
-          if [ "$(git diff --staged)" ]; then
+          if [ -z "$(git diff --staged)" ]; then
             # To have the action run successfully, if no changes are staged, we
             # manually skip the later commits because they fail with exit code 1
             # and this would then display as a failure for the Action.
From 143f8f228588b1f66bc1435fc21457f610807d5f Mon Sep 17 00:00:00 2001
From: Jordan Atwood
Date: Mon, 6 Jan 2025 12:34:38 -0800
Subject: [PATCH 195/249] Block SemrushBot

---
 robots.json | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/robots.json b/robots.json
index 1c00b63..c444cb4 100644
--- a/robots.json
+++ b/robots.json
@@ -258,6 +258,13 @@
         "operator": "[Zyte](https://www.zyte.com)",
         "respect": "Unclear at this time."
     },
+    "SemrushBot": {
+        "operator": "[Semrush](https://www.semrush.com/)",
+        "respect": "[Yes](https://www.semrush.com/bot/)",
+        "function": "Scrapes data for use in LLM article-writing tool.",
+        "frequency": "Roughly once every 10 seconds.",
+        "description": "SemrushBot is a bot which, among other functions, scrapes data for use in ContentShake AI tool reports."
+    },
     "Sidetrade indexer bot": {
         "description": "AI product training.",
         "frequency": "No information.",
From ec454b71d3984e58f323bb71631847dfe6b51b78 Mon Sep 17 00:00:00 2001
From: "ai.robots.txt"
Date: Mon, 6 Jan 2025 20:51:56 +0000
Subject: [PATCH 196/249] Merge pull request #67 from Nightfirecat/semrushbot

Block SemrushBot

---
 robots.txt              | 1 +
 table-of-bot-metrics.md | 1 +
 2 files changed, 2 insertions(+)

diff --git a/robots.txt b/robots.txt
index 1ae5558..5c32c96 100644
--- a/robots.txt
+++ b/robots.txt
@@ -35,6 +35,7 @@ User-agent: PanguBot
 User-agent: PerplexityBot
 User-agent: PetalBot
 User-agent: Scrapy
+User-agent: SemrushBot
 User-agent: Sidetrade indexer bot
 User-agent: Timpibot
 User-agent: VelenPublicWebCrawler
diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md
index 1106d0f..31c9367 100644
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -37,6 +37,7 @@
 | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
 | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. |
 | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." |
+| SemrushBot | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Scrapes data for use in LLM article-writing tool. | Roughly once every 10 seconds.
| SemrushBot is a bot which, among other functions, scrapes data for use in ContentShake AI tool reports. |
 | Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. |
 | Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. |
 | VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
From 933aa6159da9dbe7025f6294e98a6d3e326b43a3 Mon Sep 17 00:00:00 2001
From: Massimo Gismondi
Date: Tue, 7 Jan 2025 11:02:29 +0100
Subject: [PATCH 197/249] Implementing htaccess generation

---
 .htaccess                 |  3 +++
 code/robots.py            | 22 +++++++++++++++++++++-
 code/test_files/.htaccess |  3 +++
 code/tests.py             |  8 +++++++-
 4 files changed, 34 insertions(+), 2 deletions(-)
 create mode 100644 .htaccess
 create mode 100644 code/test_files/.htaccess

diff --git a/.htaccess b/.htaccess
new file mode 100644
index 0000000..31ba5f7
--- /dev/null
+++ b/.htaccess
@@ -0,0 +1,3 @@
+RewriteEngine On
+RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC]
+RewriteRule .* - [F,L]
\ No newline at
end of file
diff --git a/code/robots.py b/code/robots.py
index cf44e8e..d35d74b 100644
--- a/code/robots.py
+++ b/code/robots.py
@@ -132,10 +132,26 @@ def json_to_table(robots_json):
     return table
 
 
+def json_to_htaccess(robot_json):
+    htaccess = "RewriteEngine On\n"
+    htaccess += "RewriteCond %{HTTP_USER_AGENT} ^.*("
+
+    robots = map(lambda el: el.replace(" ", "\\ "), robot_json.keys())
+    htaccess += "|".join(robots)
+    htaccess += ").*$ [NC]\n"
+    htaccess += "RewriteRule .* - [F,L]"
+    return htaccess
+
+
 def update_file_if_changed(file_name, converter):
     """Update files if newer content is available and log the (in)actions."""
     new_content = converter(load_robots_json())
-    old_content = Path(file_name).read_text(encoding="utf-8")
+    filepath = Path(file_name)
+    if not filepath.exists():
+        filepath.write_text(new_content, encoding="utf-8")
+        print(f"{file_name} has been created.")
+        return
+    old_content = filepath.read_text(encoding="utf-8")
     if old_content == new_content:
         print(f"{file_name} is already up to date.")
     else:
@@ -150,6 +166,10 @@ def conversions():
         file_name="./table-of-bot-metrics.md",
         converter=json_to_table,
     )
+    update_file_if_changed(
+        file_name="./.htaccess",
+        converter=json_to_htaccess,
+    )
 
 
 if __name__ == "__main__":
diff --git a/code/test_files/.htaccess b/code/test_files/.htaccess
new file mode 100644
index 0000000..a34bf55
--- /dev/null
+++ b/code/test_files/.htaccess
@@ -0,0 +1,3 @@
+RewriteEngine On
+RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\
bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC]
+RewriteRule .* - [F,L]
\ No newline at end of file
diff --git a/code/tests.py b/code/tests.py
index 9cf35fe..6f778c3 100644
--- a/code/tests.py
+++ b/code/tests.py
@@ -6,7 +6,7 @@ cd to the `code` directory and run `pytest`
 import json
 from pathlib import Path
 
-from robots import json_to_txt, json_to_table
+from robots import json_to_txt, json_to_table, json_to_htaccess
 
 
 def test_robots_txt_creation():
@@ -19,3 +19,9 @@ def test_table_of_bot_metrices_md():
     robots_json = json.loads(Path("test_files/robots.json").read_text())
     robots_table = json_to_table(robots_json)
     assert Path("test_files/table-of-bot-metrics.md").read_text() == robots_table
+
+
+def test_htaccess_creation():
+    robots_json = json.loads(Path("test_files/robots.json").read_text())
+    robots_htaccess = json_to_htaccess(robots_json)
+    assert Path("test_files/.htaccess").read_text() == robots_htaccess
From 189e75bbfd06715a5d30972d3aa4c23974aecee0 Mon Sep 17 00:00:00 2001
From: Massimo Gismondi
Date: Fri, 17 Jan 2025 21:25:23 +0100
Subject: [PATCH 198/249] Adding usage instructions

---
 README.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/README.md b/README.md
index b3c2e7c..45c8f3a 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,19 @@ A number of these crawlers have been sourced from [Dark Visitors](https://darkvi
 
 If you'd like to add information about a crawler to the list, please make a pull request with the bot name added to `robots.txt`, `ai.txt`, and any relevant details in `table-of-bot-metrics.md` to help people understand what's crawling.
 
+## Usage
+
+Many visitors will find these files from this repository most useful:
+- `robots.txt`
+- `.htaccess`
+
+The first one tells search engine and AI crawlers which parts of your website should be scanned or avoided. The webpages of your server are returned anyway, but the crawler "pledges" not to use them.
By default, the provided `robots.txt` tells every AI crawler not to scan any page in your website. This is not bulletproof, as an evil crawler could simply ignore the `robots.txt` content.
+
+The second one tells your own webserver to return an error page when one of the listed AI crawlers tries to request a page from your website. A `.htaccess` file does not work on every webserver, but works correctly on most common and cheap shared hosting providers. The majority of AI crawlers set a "User Agent" string in every request they send, by which they are identifiable: this string is used to filter the request. Instead of simply hoping the crawler pledges to respect our intention, this solution actively sends back a bad webpage (an error or an empty page). Note that this solution isn't bulletproof either, as anyone can fake the sent User Agent.
+
+We suggest adding both files, as some crawlers may respect `robots.txt` while not having an identifiable User Agent; on the other hand, other crawlers may not respect the `robots.txt`, but they provide a identifiable User Agent by which we can filter them out.
+
+
 ## Contributing
 
 A note about contributing: updates should be added/made to `robots.json`. A GitHub action, courtesy of [Adam](https://github.com/newbold), will then generate the updated `robots.txt` and `table-of-bot-metrics.md`.
From b455af66e7903e76162d43f3e8f0900084fb9539 Mon Sep 17 00:00:00 2001
From: Massimo Gismondi
Date: Fri, 17 Jan 2025 21:42:08 +0100
Subject: [PATCH 199/249] Adding clarification about performance and code comment

---
 README.md      | 3 ++-
 code/robots.py | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 45c8f3a..dd84a16 100644
--- a/README.md
+++ b/README.md
@@ -18,8 +18,9 @@ The first one tells search engine and AI crawlers which parts of your website sh
 
 The second one tells your own webserver to return an error page when one of the listed AI crawlers tries to request a page from your website.
A `.htaccess` file does not work on every webserver, but works correctly on most common and cheap shared hosting providers. The majority of AI crawlers set a "User Agent" string in every request they send, by which they are identifiable: this string is used to filter the request. Instead of simply hoping the crawler pledges to respect our intention, this solution actively sends back a bad webpage (an error or an empty page). Note that this solution isn't bulletproof either, as anyone can fake the sent User Agent.
 
-We suggest adding both files, as some crawlers may respect `robots.txt` while not having an identifiable User Agent; on the other hand, other crawlers may not respect the `robots.txt`, but they provide a identifiable User Agent by which we can filter them out.
+Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/current/howto/htaccess.html), more performant methods than an `.htaccess` file exist. Nevertheless, most shared hosting providers only allow `.htaccess` configuration.
 
+We suggest adding both files, as some crawlers may respect `robots.txt` while not having an identifiable User Agent; on the other hand, other crawlers may not respect the `robots.txt`, but they provide a identifiable User Agent by which we can filter them out.
 
 ## Contributing
 
diff --git a/code/robots.py b/code/robots.py
index d35d74b..f2ddbb8 100644
--- a/code/robots.py
+++ b/code/robots.py
@@ -133,7 +133,9 @@ def json_to_table(robots_json):
 
 
 def json_to_htaccess(robot_json):
-    htaccess = "RewriteEngine On\n"
+    # Creates a .htaccess filter file. It uses a regular expression to filter out
+    #User agents that contain any of the blocked values.
+    htaccess += "RewriteEngine On\n"
     htaccess += "RewriteCond %{HTTP_USER_AGENT} ^.*("
 
     robots = map(lambda el: el.replace(" ", "\\ "), robot_json.keys())
From 8aee2f24bb03a8d91a2fb17c3a98628411239d40 Mon Sep 17 00:00:00 2001
From: Massimo Gismondi <24638827+MassiminoilTrace@users.noreply.github.com>
Date: Sat, 18 Jan 2025 12:39:07 +0100
Subject: [PATCH 200/249] Fixed space in comment

Co-authored-by: Glyn Normington

---
 code/robots.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/code/robots.py b/code/robots.py
index f2ddbb8..0172330 100644
--- a/code/robots.py
+++ b/code/robots.py
@@ -134,7 +134,7 @@ def json_to_table(robots_json):
 
 def json_to_htaccess(robot_json):
     # Creates a .htaccess filter file. It uses a regular expression to filter out
-    #User agents that contain any of the blocked values.
+    # User agents that contain any of the blocked values.
     htaccess += "RewriteEngine On\n"
     htaccess += "RewriteCond %{HTTP_USER_AGENT} ^.*("
 
From 1cc4b59dfc4acd5666478efea658b1adf1af8aee Mon Sep 17 00:00:00 2001
From: Massimo Gismondi <24638827+MassiminoilTrace@users.noreply.github.com>
Date: Sat, 18 Jan 2025 12:40:03 +0100
Subject: [PATCH 201/249] Shortened htaccess instructions

Co-authored-by: Glyn Normington

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index dd84a16..badd23b 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ Many visitors will find these files from this repository most useful:
 - `robots.txt`
 - `.htaccess`
 
-The first one tells search engine and AI crawlers which parts of your website should be scanned or avoided. The webpages of your server are returned anyway, but the crawler "pledges" not to use them. By default, the provided `robots.txt` tells every AI crawler not to scan any page in your website. This is not bulletproof, as an evil crawler could simply ignore the `robots.txt` content.
+`robots.txt` implements the Robots Exclusion Protocol ([RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html)).
 
 The second one tells your own webserver to return an error page when one of the listed AI crawlers tries to request a page from your website. A `.htaccess` file does not work on every webserver, but works correctly on most common and cheap shared hosting providers. The majority of AI crawlers set a "User Agent" string in every request they send, by which they are identifiable: this string is used to filter the request. Instead of simply hoping the crawler pledges to respect our intention, this solution actively sends back a bad webpage (an error or an empty page). Note that this solution isn't bulletproof either, as anyone can fake the sent User Agent.
 
From d65128d10acfd14b714488170b3a261912cc3729 Mon Sep 17 00:00:00 2001
From: Massimo Gismondi <24638827+MassiminoilTrace@users.noreply.github.com>
Date: Sat, 18 Jan 2025 12:41:09 +0100
Subject: [PATCH 202/249] Removed paragraph in favour of future FAQ.md

Co-authored-by: Glyn Normington

---
 README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.md b/README.md
index badd23b..505a8dd 100644
--- a/README.md
+++ b/README.md
@@ -20,7 +20,6 @@ The second one tells your own webserver to return an error page when one of the
 
 Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/current/howto/htaccess.html), more performant methods than an `.htaccess` file exist. Nevertheless, most shared hosting providers only allow `.htaccess` configuration.
 
-We suggest adding both files, as some crawlers may respect `robots.txt` while not having an identifiable User Agent; on the other hand, other crawlers may not respect the `robots.txt`, but they provide a identifiable User Agent by which we can filter them out.
## Contributing From 5aa08bc0022e8e9960e4cf52359ca2d910f795bf Mon Sep 17 00:00:00 2001 From: Joshua Sheard Date: Sun, 19 Jan 2025 22:03:50 +0000 Subject: [PATCH 203/249] Add Crawlspace --- robots.json | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/robots.json b/robots.json index c444cb4..d71c80b 100644 --- a/robots.json +++ b/robots.json @@ -90,6 +90,13 @@ "frequency": "Unclear at this time.", "description": "cohere-training-data-crawler is a web crawler operated by Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products. More info can be found at https://darkvisitors.com/agents/agents/cohere-training-data-crawler" }, + "Crawlspace": { + "operator": "[Crawlspace](https://crawlspace.dev)", + "respect": "[Yes](https://news.ycombinator.com/item?id=42756654)", + "function": "Scrapes data", + "frequency": "Unclear at this time.", + "description": "Provides crawling services for any purpose, but most likely to be used for AI model training." + }, "Diffbot": { "operator": "[Diffbot](https://www.diffbot.com/)", "respect": "At the discretion of Diffbot users.", @@ -300,4 +307,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." } -} \ No newline at end of file +} From 70fd6c0fb13cdf4f0525bf061556e8e50ca7b8d9 Mon Sep 17 00:00:00 2001 From: Massimo Gismondi <24638827+MassiminoilTrace@users.noreply.github.com> Date: Mon, 20 Jan 2025 06:25:07 +0100 Subject: [PATCH 204/249] Add mention of htaccess in readme Co-authored-by: Glyn Normington --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 505a8dd..cd8d467 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/ ## Contributing -A note about contributing: updates should be added/made to `robots.json`. 
A GitHub action, courtesy of [Adam](https://github.com/newbold), will then generate the updated `robots.txt` and `table-of-bot-metrics.md`. +A note about contributing: updates should be added/made to `robots.json`. A GitHub action will then generate the updated `robots.txt`, `table-of-bot-metrics.md`, and `.htaccess`. ## Subscribe to updates From 013b7abfa1f2126e9320ddbab90ff87af54b092c Mon Sep 17 00:00:00 2001 From: Massimo Gismondi <24638827+MassiminoilTrace@users.noreply.github.com> Date: Mon, 20 Jan 2025 06:27:02 +0100 Subject: [PATCH 205/249] Update README.md Co-authored-by: Glyn Normington --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index cd8d467..1417a85 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,9 @@ Many visitors will find these files from this repository most useful: `robots.txt` implements the Robots Exclusion Protocol ([RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html)). -The second one tells your own webserver to return an error page when one of the listed AI crawlers tries to request a page from your website. A `.htaccess` file does not work on every webserver, but works correctly on most common and cheap shared hosting providers. The majority of AI crawlers set a "User Agent" string in every request they send, by which they are identifiable: this string is used to filter the request. Instead of simply hoping the crawler pledges to respect our intention, this solution actively sends back a bad webpage (an error or an empty page). Note that this solution isn't bulletproof either, as anyone can fake the sent User Agent. +### `.htaccess` + +`.htaccess` may be used to configure web servers such as [Apache httpd](https://httpd.apache.org/) to return an error page when one of the listed AI crawlers sends a request to the web server. 
Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/current/howto/htaccess.html), more performant methods than an `.htaccess` file exist. Nevertheless, most shared hosting providers only allow `.htaccess` configuration. From 52241bdca6c9930f7b225264cd862b5f98a2d68f Mon Sep 17 00:00:00 2001 From: Massimo Gismondi <24638827+MassiminoilTrace@users.noreply.github.com> Date: Mon, 20 Jan 2025 06:27:56 +0100 Subject: [PATCH 206/249] Update README.md Co-authored-by: Glyn Normington --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1417a85..bb6558c 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ Many visitors will find these files from this repository most useful: `.htaccess` may be used to configure web servers such as [Apache httpd](https://httpd.apache.org/) to return an error page when one of the listed AI crawlers sends a request to the web server. -Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/current/howto/htaccess.html), more performant methods than an `.htaccess` file exist. Nevertheless, most shared hosting providers only allow `.htaccess` configuration. +Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/current/howto/htaccess.html), more performant methods than an `.htaccess` file exist. 
## Contributing From 33c38ee70b3a45343ddb360ae79e743e42bc8f76 Mon Sep 17 00:00:00 2001 From: Massimo Gismondi <24638827+MassiminoilTrace@users.noreply.github.com> Date: Mon, 20 Jan 2025 06:28:32 +0100 Subject: [PATCH 207/249] Update README.md Co-authored-by: Glyn Normington --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index bb6558c..648f5ed 100644 --- a/README.md +++ b/README.md @@ -10,10 +10,12 @@ If you'd like to add information about a crawler to the list, please make a pull ## Usage -Many visitors will find these files from this repository most useful: +This repository provides the following files: - `robots.txt` - `.htaccess` +### `robots.txt` + `robots.txt` implements the Robots Exclusion Protocol ([RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html)). ### `.htaccess` From a9956f7825080467adbbda6e41d7dfbaee47210b Mon Sep 17 00:00:00 2001 From: Massimo Gismondi Date: Mon, 20 Jan 2025 06:50:48 +0100 Subject: [PATCH 208/249] Removed additional sections --- README.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/README.md b/README.md index 648f5ed..065b0b7 100644 --- a/README.md +++ b/README.md @@ -14,14 +14,9 @@ This repository provides the following files: - `robots.txt` - `.htaccess` -### `robots.txt` - `robots.txt` implements the Robots Exclusion Protocol ([RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html)). -### `.htaccess` - `.htaccess` may be used to configure web servers such as [Apache httpd](https://httpd.apache.org/) to return an error page when one of the listed AI crawlers sends a request to the web server. - Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/current/howto/htaccess.html), more performant methods than an `.htaccess` file exist. 
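Reader's note on the `.htaccess` generation touched by the surrounding patches: the following is a minimal sketch, not the repository's code, of how a user-agent list might be turned into the `RewriteCond`/`RewriteRule` pair seen in the `.htaccess` diffs. The names `build_pattern`, `build_htaccess`, and `is_blocked` are hypothetical, and Python's `re` module is used only as an approximate stand-in for Apache's regular expression engine.

```python
import re


def build_pattern(user_agents):
    """Build the alternation pattern used in the RewriteCond line."""
    # Escape spaces only, mirroring the approach in the patches; Apache
    # separates directive arguments on whitespace, so a bare space would
    # end the pattern early.
    escaped = [name.replace(" ", "\\ ") for name in user_agents]
    return "^.*(" + "|".join(escaped) + ").*$"


def build_htaccess(user_agents):
    """Return .htaccess text that answers listed user agents with 403."""
    return (
        "RewriteEngine On\n"
        f"RewriteCond %{{HTTP_USER_AGENT}} {build_pattern(user_agents)} [NC]\n"
        "RewriteRule .* - [F,L]\n"
    )


def is_blocked(user_agent, user_agents):
    """Approximate the [NC] (case-insensitive) match with Python's re."""
    return re.search(build_pattern(user_agents), user_agent, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(build_htaccess(["GPTBot", "Kangaroo Bot"]))
```

Because only spaces are escaped, other regex metacharacters in a name (for example the `.` in `iaskspider/2.0`) keep their special meaning and match slightly more than the literal string; that is harmless for blocking purposes but worth knowing when testing.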
From 4f03818280e7979697250ac5d59da12290db2e9f Mon Sep 17 00:00:00 2001 From: Massimo Gismondi Date: Mon, 20 Jan 2025 06:51:06 +0100 Subject: [PATCH 209/249] Removed if condition and added a little comments --- code/robots.py | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/code/robots.py b/code/robots.py index 0172330..087b00b 100644 --- a/code/robots.py +++ b/code/robots.py @@ -135,9 +135,10 @@ def json_to_table(robots_json): def json_to_htaccess(robot_json): # Creates a .htaccess filter file. It uses a regular expression to filter out # User agents that contain any of the blocked values. - htaccess += "RewriteEngine On\n" + htaccess = "RewriteEngine On\n" htaccess += "RewriteCond %{HTTP_USER_AGENT} ^.*(" + # Escape spaces in each User Agent to build the regular expression robots = map(lambda el: el.replace(" ", "\\ "), robot_json.keys()) htaccess += "|".join(robots) htaccess += ").*$ [NC]\n" @@ -149,10 +150,8 @@ def update_file_if_changed(file_name, converter): """Update files if newer content is available and log the (in)actions.""" new_content = converter(load_robots_json()) filepath = Path(file_name) - if not filepath.exists(): - filepath.write_text(new_content, encoding="utf-8") - print(f"{file_name} has been created.") - return + # "touch" will create the file if it doesn't exist yet + filepath.touch() old_content = filepath.read_text(encoding="utf-8") if old_content == new_content: print(f"{file_name} is already up to date.") From 7427d96bac08d59276292ca7a66d77365f7d26b9 Mon Sep 17 00:00:00 2001 From: Joshua Sheard Date: Mon, 20 Jan 2025 10:59:02 +0000 Subject: [PATCH 210/249] Update robots.json Co-authored-by: Glyn Normington --- robots.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/robots.json b/robots.json index d71c80b..465a61c 100644 --- a/robots.json +++ b/robots.json @@ -95,7 +95,7 @@ "respect": "[Yes](https://news.ycombinator.com/item?id=42756654)", "function": "Scrapes data", "frequency": "Unclear 
at this time.", - "description": "Provides crawling services for any purpose, but most likely to be used for AI model training." + "description": "Provides crawling services for any purpose, probably including AI model training." }, "Diffbot": { "operator": "[Diffbot](https://www.diffbot.com/)", From 6c552a3daa591f47a81936ebc41c822dc35b9fa2 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Mon, 20 Jan 2025 17:45:42 +0000 Subject: [PATCH 211/249] Merge pull request #71 from jsheard/patch-1 Add Crawlspace --- .htaccess | 2 +- robots.txt | 1 + table-of-bot-metrics.md | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/.htaccess b/.htaccess index 31ba5f7..beaddc3 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] +RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] 
RewriteRule .* - [F,L] \ No newline at end of file diff --git a/robots.txt b/robots.txt index 5c32c96..fd388fd 100644 --- a/robots.txt +++ b/robots.txt @@ -11,6 +11,7 @@ User-agent: Claude-Web User-agent: ClaudeBot User-agent: cohere-ai User-agent: cohere-training-data-crawler +User-agent: Crawlspace User-agent: Diffbot User-agent: DuckAssistBot User-agent: FacebookBot diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 31c9367..f44c585 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -13,6 +13,7 @@ | ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | | cohere-training-data-crawler | Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products | Unclear at this time. | AI Data Scrapers | Unclear at this time. | cohere-training-data-crawler is a web crawler operated by Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products. More info can be found at https://darkvisitors.com/agents/agents/cohere-training-data-crawler | +| Crawlspace | [Crawlspace](https://crawlspace.dev) | [Yes](https://news.ycombinator.com/item?id=42756654) | Scrapes data | Unclear at this time. | Provides crawling services for any purpose, probably including AI model training. | | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. 
| Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | | DuckAssistBot | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | DuckAssistBot is used by DuckDuckGo's DuckAssist feature to fetch content and generate realtime AI answers to user searches. More info can be found at https://darkvisitors.com/agents/agents/duckassistbot | | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | From 9c060dee1c9cead8a3cb1092bdf8615cf33f3656 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Tue, 21 Jan 2025 00:49:22 +0000 Subject: [PATCH 212/249] Update from Dark Visitors --- robots.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/robots.json b/robots.json index 465a61c..4d7d582 100644 --- a/robots.json +++ b/robots.json @@ -307,4 +307,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." } -} +} \ No newline at end of file From 05b79b8a5886983c818eaad107fcf6c7de5fad3a Mon Sep 17 00:00:00 2001 From: nisbet-hubbard <87453615+nisbet-hubbard@users.noreply.github.com> Date: Mon, 27 Jan 2025 19:41:03 +0800 Subject: [PATCH 213/249] Update robots.json --- robots.json | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/robots.json b/robots.json index 4d7d582..7f3cba3 100644 --- a/robots.json +++ b/robots.json @@ -265,12 +265,19 @@ "operator": "[Zyte](https://www.zyte.com)", "respect": "Unclear at this time." 
}, - "SemrushBot": { + "SemrushBot-OCOB": { "operator": "[Semrush](https://www.semrush.com/)", "respect": "[Yes](https://www.semrush.com/bot/)", - "function": "Scrapes data for use in LLM article-writing tool.", + "function": "Crawls your site for ContentShake AI tool.", "frequency": "Roughly once every 10 seconds.", - "description": "SemrushBot is a bot which, among other functions, scrapes data for use in ContentShake AI tool reports." + "description": "You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL)." + }, + "SemrushBot-SWA": { + "operator": "[Semrush](https://www.semrush.com/)", + "respect": "[Yes](https://www.semrush.com/bot/)", + "function": "Checks URLs on your site for SWA tool.", + "frequency": "Roughly once every 10 seconds.", + "description": "You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL)." }, "Sidetrade indexer bot": { "description": "AI product training.", @@ -307,4 +314,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." 
} -} \ No newline at end of file +} From 89d4c6e5ca03f0aedec09b9191e2aece6f2efec3 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 1 Feb 2025 10:51:01 +0000 Subject: [PATCH 214/249] Merge pull request #73 from nisbet-hubbard/patch-8 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Actually block Semrush’s AI tools --- .htaccess | 2 +- robots.txt | 3 ++- table-of-bot-metrics.md | 3 ++- 3 files changed, 5 insertions(+), 3 deletions(-) diff --git a/.htaccess b/.htaccess index beaddc3..97482e2 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] +RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] RewriteRule .* - [F,L] \ No newline at end of file diff --git a/robots.txt b/robots.txt index fd388fd..3839e55 100644 
--- a/robots.txt +++ b/robots.txt @@ -36,7 +36,8 @@ User-agent: PanguBot User-agent: PerplexityBot User-agent: PetalBot User-agent: Scrapy -User-agent: SemrushBot +User-agent: SemrushBot-OCOB +User-agent: SemrushBot-SWA User-agent: Sidetrade indexer bot User-agent: Timpibot User-agent: VelenPublicWebCrawler diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index f44c585..b51bbae 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -38,7 +38,8 @@ | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| SemrushBot | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Scrapes data for use in LLM article-writing tool. | Roughly once every 10 seconds. | SemrushBot is a bot which, among other functions, scrapes data for use in ContentShake AI tool reports. | +| SemrushBot-OCOB | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). 
| +| SemrushBot-SWA | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Checks URLs on your site for SWA tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). | | Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. | | Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | | VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | From bebffccc0ced8c420276c93f3109c2e71cd5ca0c Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Sun, 2 Feb 2025 00:52:50 +0000 Subject: [PATCH 215/249] Update from Dark Visitors --- robots.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/robots.json b/robots.json index 7f3cba3..79762a0 100644 --- a/robots.json +++ b/robots.json @@ -314,4 +314,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." 
} -} +} \ No newline at end of file From 261a2b83b90fe89f1d842066709c019fd1dba30f Mon Sep 17 00:00:00 2001 From: always-be-testing Date: Fri, 14 Feb 2025 12:26:19 -0500 Subject: [PATCH 216/249] update README to inclide list of ai bots Cloudflare considers verified --- README.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/README.md b/README.md index 065b0b7..6758570 100644 --- a/README.md +++ b/README.md @@ -40,6 +40,19 @@ Alternatively, you can also subscribe to new releases with your GitHub account b If you use [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) alongside this list, you can report abusive crawlers that don't respect `robots.txt` [here](https://docs.google.com/forms/d/e/1FAIpQLScbUZ2vlNSdcsb8LyTeSF7uLzQI96s0BKGoJ6wQ6ocUFNOKEg/viewform). + +If you are unable to make use of [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) and/or have WAF rules that make use of [Cloudflare's Verified Bots](https://radar.cloudflare.com/traffic/verified-bots) conditions, please note that the following AI web crawlers are considered verified bots by Cloudflare: +- Amazonbot +- Applebot +- CCBot +- ChatGPT-User +- DuckAssistBot +- GoogleOther +- GPTBot +- OAI-SearchBot +- PerplexityBot +- PetalBot + ## Additional resources - [Blocking Bots with Nginx](https://rknight.me/blog/blocking-bots-with-nginx/) by Robb Knight From e396a2ec781095c5e2659eefb99c46ab7715a664 Mon Sep 17 00:00:00 2001 From: always-be-testing Date: Fri, 14 Feb 2025 12:31:20 -0500 Subject: [PATCH 217/249] forgot to include heading --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 6758570..e70d283 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,7 @@ Alternatively, you can also subscribe to new releases with your GitHub account b If you use 
[Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) alongside this list, you can report abusive crawlers that don't respect `robots.txt` [here](https://docs.google.com/forms/d/e/1FAIpQLScbUZ2vlNSdcsb8LyTeSF7uLzQI96s0BKGoJ6wQ6ocUFNOKEg/viewform). - +## Cloudflare Verified Bots If you are unable to make use of [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) and/or have WAF rules that make use of [Cloudflare's Verified Bots](https://radar.cloudflare.com/traffic/verified-bots) conditions, please note that the following AI web crawlers are considered verified bots by Cloudflare: - Amazonbot - Applebot From f99339922fa9afdbb00e18bb99105e81cd3f8e88 Mon Sep 17 00:00:00 2001 From: always-be-testing Date: Fri, 14 Feb 2025 12:36:33 -0500 Subject: [PATCH 218/249] grammar update and include syntax for verified bot condition --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e70d283..f471ede 100644 --- a/README.md +++ b/README.md @@ -41,7 +41,7 @@ Alternatively, you can also subscribe to new releases with your GitHub account b If you use [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) alongside this list, you can report abusive crawlers that don't respect `robots.txt` [here](https://docs.google.com/forms/d/e/1FAIpQLScbUZ2vlNSdcsb8LyTeSF7uLzQI96s0BKGoJ6wQ6ocUFNOKEg/viewform). 
## Cloudflare Verified Bots -If you are unable to make use of [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) and/or have WAF rules that make use of [Cloudflare's Verified Bots](https://radar.cloudflare.com/traffic/verified-bots) conditions, please note that the following AI web crawlers are considered verified bots by Cloudflare: +If you are unable to make use of [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) and/or have WAF rules that use the `cf.bot_management.verified_bot` condition based on [Cloudflare's Verified Bots](https://radar.cloudflare.com/traffic/verified-bots), please note that the following AI web crawlers are considered verified bots by Cloudflare: - Amazonbot - Applebot - CCBot From af87b85d7f00bc285cb414280e02d2f42284a9d8 Mon Sep 17 00:00:00 2001 From: always-be-testing Date: Fri, 14 Feb 2025 12:39:08 -0500 Subject: [PATCH 219/249] include return after heading --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index f471ede..303f009 100644 --- a/README.md +++ b/README.md @@ -41,6 +41,7 @@ Alternatively, you can also subscribe to new releases with your GitHub account b If you use [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) alongside this list, you can report abusive crawlers that don't respect `robots.txt` [here](https://docs.google.com/forms/d/e/1FAIpQLScbUZ2vlNSdcsb8LyTeSF7uLzQI96s0BKGoJ6wQ6ocUFNOKEg/viewform). 
## Cloudflare Verified Bots + If you are unable to make use of [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) and/or have WAF rules that use the `cf.bot_management.verified_bot` condition based on [Cloudflare's Verified Bots](https://radar.cloudflare.com/traffic/verified-bots), please note that the following AI web crawlers are considered verified bots by Cloudflare: - Amazonbot - Applebot From 5b13c2e504c843c2a95981cee1c2655d9f21c8f4 Mon Sep 17 00:00:00 2001 From: always-be-testing Date: Sat, 15 Feb 2025 11:22:10 -0500 Subject: [PATCH 220/249] add more concise message about verified bots Co-authored-by: Glyn Normington --- README.md | 16 +--------------- 1 file changed, 1 insertion(+), 15 deletions(-) diff --git a/README.md b/README.md index 303f009..a206c83 100644 --- a/README.md +++ b/README.md @@ -39,21 +39,7 @@ Alternatively, you can also subscribe to new releases with your GitHub account b ## Report abusive crawlers If you use [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) alongside this list, you can report abusive crawlers that don't respect `robots.txt` [here](https://docs.google.com/forms/d/e/1FAIpQLScbUZ2vlNSdcsb8LyTeSF7uLzQI96s0BKGoJ6wQ6ocUFNOKEg/viewform). 
- -## Cloudflare Verified Bots - -If you are unable to make use of [Cloudflare's hard block](https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click) and/or have WAF rules that use the `cf.bot_management.verified_bot` condition based on [Cloudflare's Verified Bots](https://radar.cloudflare.com/traffic/verified-bots), please note that the following AI web crawlers are considered verified bots by Cloudflare: -- Amazonbot -- Applebot -- CCBot -- ChatGPT-User -- DuckAssistBot -- GoogleOther -- GPTBot -- OAI-SearchBot -- PerplexityBot -- PetalBot - +But even if you don't use Cloudflare's hard block, their list of [verified bots](https://radar.cloudflare.com/traffic/verified-bots) may come in handy. ## Additional resources - [Blocking Bots with Nginx](https://rknight.me/blog/blocking-bots-with-nginx/) by Robb Knight From a9ec4ffa6fd1816ee6c1c146fa75983abc0b2edc Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Sun, 16 Feb 2025 13:36:39 -0800 Subject: [PATCH 221/249] chore: add Brightbot 1.0 --- robots.json | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/robots.json b/robots.json index 79762a0..a634634 100644 --- a/robots.json +++ b/robots.json @@ -41,6 +41,13 @@ "frequency": "Unclear at this time.", "description": "Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools." }, + "Brightbot 1.0": { + "operator": "Browsing.ai", + "respect": "Unclear at this time.", + "function": "LLM/AI training.", + "frequency": "Unclear at this time.", + "description": "Scrapes data to train LLMs and AI products focused on website customer support." + }, "Bytespider": { "operator": "ByteDance", "respect": "No", @@ -314,4 +321,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." 
} -} \ No newline at end of file +} From 693289bb29c42b7a526d8210d1f743ca3608690d Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sun, 16 Feb 2025 21:37:52 +0000 Subject: [PATCH 222/249] chore: add Brightbot 1.0 --- .htaccess | 2 +- robots.txt | 1 + table-of-bot-metrics.md | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/.htaccess b/.htaccess index 97482e2..512c274 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] +RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot\ 1.0|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] RewriteRule .* - [F,L] \ No newline at end of file diff --git a/robots.txt b/robots.txt index 3839e55..80c40e8 100644 --- a/robots.txt +++ b/robots.txt @@ -4,6 +4,7 @@ User-agent: Amazonbot User-agent: anthropic-ai User-agent: Applebot 
User-agent: Applebot-Extended +User-agent: Brightbot 1.0 User-agent: Bytespider User-agent: CCBot User-agent: ChatGPT-User diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index b51bbae..af32bf2 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -6,6 +6,7 @@ | anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | | Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Brightbot 1.0 | Browsing.ai | Unclear at this time. | LLM/AI training. | Unclear at this time. | Scrapes data to train LLMs and AI products focused on website customer support. | | Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | | CCBot | [Common Crawl Foundation](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). 
| | ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | From abfd6dfcd15267ed03b5fda4cd3eac2512604ed2 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Mon, 17 Feb 2025 00:53:32 +0000 Subject: [PATCH 223/249] Update from Dark Visitors --- robots.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/robots.json b/robots.json index a634634..cdc7bb5 100644 --- a/robots.json +++ b/robots.json @@ -321,4 +321,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." } -} +} \ No newline at end of file From c0d418cd875b432fd4558be57ad3c009326b631e Mon Sep 17 00:00:00 2001 From: Dennis Camera Date: Mon, 17 Feb 2025 21:00:57 +0100 Subject: [PATCH 224/249] .htaccess: Allow robots access to /robots.txt --- .htaccess | 2 +- code/robots.py | 2 +- code/test_files/.htaccess | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/.htaccess b/.htaccess index 512c274..c42f99e 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot\ 1.0|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] -RewriteRule .* - [F,L] \ No newline at end of file +RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/code/robots.py b/code/robots.py index 087b00b..bb18e70 100644 --- 
a/code/robots.py +++ b/code/robots.py @@ -142,7 +142,7 @@ def json_to_htaccess(robot_json): robots = map(lambda el: el.replace(" ", "\\ "), robot_json.keys()) htaccess += "|".join(robots) htaccess += ").*$ [NC]\n" - htaccess += "RewriteRule .* - [F,L]" + htaccess += "RewriteRule !^/?robots\\.txt$ - [F,L]\n" return htaccess diff --git a/code/test_files/.htaccess b/code/test_files/.htaccess index a34bf55..2e78674 100644 --- a/code/test_files/.htaccess +++ b/code/test_files/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] -RewriteRule .* - [F,L] \ No newline at end of file +RewriteRule !^/?robots\.txt$ - [F,L] From a884a2afb9dbc7338b0faa24b3c10308adbc48e4 Mon Sep 17 00:00:00 2001 From: Dennis Camera Date: Mon, 17 Feb 2025 21:00:57 +0100 Subject: [PATCH 225/249] .htaccess: Make regex in RewriteCond safe Improve the regular expression by removing unneeded anchors and escaping special characters (not just space) to prevent false positives or a misbehaving rewrite rule. 
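A minimal sketch of the idea, assuming the crawler names are plain strings: each name is escaped as a literal and the names are joined into a single unanchored alternation group. The helper name `build_user_agent_pattern` and the sample names are illustrative only, and Python's `re` stands in for Apache's PCRE, which is close but not identical.

```python
import re


def build_user_agent_pattern(names):
    """Join literal crawler names into one alternation group.

    re.escape neutralizes regex metacharacters, so a name such as
    "crawler.with.dots" only matches itself. No ^.* or .*$ wrappers are
    needed because an unanchored search already matches anywhere in the
    string.
    """
    # Single quotes inside the f-string keep this valid before Python 3.12.
    return f"({'|'.join(map(re.escape, names))})"


# Hypothetical sample names, chosen to include regex metacharacters.
sample_names = ["GPTBot", "Brightbot 1.0", "iaskspider/2.0"]
pattern = re.compile(build_user_agent_pattern(sample_names), re.IGNORECASE)
```

Searching a full User-Agent header with `pattern.search(...)` mirrors how an unanchored `RewriteCond` pattern is applied with the `[NC]` flag.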
--- .htaccess | 2 +- code/robots.py | 19 ++++++++++--------- code/test_files/.htaccess | 2 +- 3 files changed, 12 insertions(+), 11 deletions(-) diff --git a/.htaccess b/.htaccess index c42f99e..2313293 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Brightbot\ 1.0|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|cohere-training-data-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot-OCOB|SemrushBot-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] +RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/code/robots.py b/code/robots.py index bb18e70..a8a674d 100644 --- a/code/robots.py +++ b/code/robots.py @@ -1,8 +1,9 @@ import json -from pathlib import Path - +import re import requests + from bs4 import BeautifulSoup +from pathlib import Path def load_robots_json(): @@ -99,7 +100,6 @@ def updated_robots_json(soup): def ingest_darkvisitors(): 
- old_robots_json = load_robots_json() soup = get_agent_soup() if soup: @@ -132,16 +132,17 @@ def json_to_table(robots_json): return table +def list_to_pcre(lst): + # Python re is not 100% identical to PCRE which is used by Apache, but it + # should probably be close enough in the real world for re.escape to work. + return f"({"|".join(map(re.escape, lst))})" + + def json_to_htaccess(robot_json): # Creates a .htaccess filter file. It uses a regular expression to filter out # User agents that contain any of the blocked values. htaccess = "RewriteEngine On\n" - htaccess += "RewriteCond %{HTTP_USER_AGENT} ^.*(" - - # Escape spaces in each User Agent to build the regular expression - robots = map(lambda el: el.replace(" ", "\\ "), robot_json.keys()) - htaccess += "|".join(robots) - htaccess += ").*$ [NC]\n" + htaccess += f"RewriteCond %{{HTTP_USER_AGENT}} {list_to_pcre(robot_json.keys())} [NC]\n" htaccess += "RewriteRule !^/?robots\\.txt$ - [F,L]\n" return htaccess diff --git a/code/test_files/.htaccess b/code/test_files/.htaccess index 2e78674..90ddcf2 100644 --- a/code/test_files/.htaccess +++ b/code/test_files/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} ^.*(AI2Bot|Ai2Bot-Dolma|Amazonbot|anthropic-ai|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|cohere-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|GPTBot|iaskspider/2.0|ICC-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|omgili|omgilibot|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot).*$ [NC] +RewriteCond %{HTTP_USER_AGENT} 
(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] RewriteRule !^/?robots\.txt$ - [F,L] From 0bd3fa63b832ffd8fa908675656c7007021f6654 Mon Sep 17 00:00:00 2001 From: Dennis Camera Date: Tue, 18 Feb 2025 10:12:04 +0100 Subject: [PATCH 226/249] table-of-bot-metrics.md: Escape robot names for Markdown table Some characters which could occur in a crawler's name have a special meaning in Markdown. They are escaped to prevent them from having unintended side effects. The escaping is only applied to the first (Name) column of the table. The remaining columns are expected to already be Markdown-encoded in robots.json. 
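As a rough illustration of the escaping described here, the sketch below backslash-escapes Markdown-special characters in the Name column only. The character set, the helper names `escape_markdown_name` and `table_row`, and the two-column row are assumptions made for this example, not the project's actual code.

```python
import re

# Characters treated as special by Markdown in this sketch (an assumed set).
# The hyphen is placed last so it is read literally, not as a range.
_MD_SPECIAL = re.compile(r"([\\`*_{}\[\]()<>#+.!|-])")


def escape_markdown_name(name):
    """Prefix each Markdown-special character in name with a backslash."""
    return _MD_SPECIAL.sub(r"\\\1", name)


def table_row(name, operator):
    # Only the name is escaped; operator is assumed to be Markdown already.
    return f"| {escape_markdown_name(name)} | {operator} |"
```

Keeping the hyphen at the end of the character class matters: written between two other characters it would define a range and could silently pull in extra characters such as the comma.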
--- code/robots.py | 8 ++++++-- table-of-bot-metrics.md | 40 ++++++++++++++++++++-------------------- 2 files changed, 26 insertions(+), 22 deletions(-) diff --git a/code/robots.py b/code/robots.py index a8a674d..62fb061 100644 --- a/code/robots.py +++ b/code/robots.py @@ -121,13 +121,17 @@ def json_to_txt(robots_json): return robots_txt +def escape_md(s): + return re.sub(r"([]*\\|`(){}<>#+-.!_[])", r"\\\1", s) + + def json_to_table(robots_json): """Compose a markdown table with the information in robots.json""" table = "| Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description |\n" - table += "|-----|----------|-----------------------|----------|------------------|-------------|\n" + table += "|------|----------|-----------------------|----------|------------------|-------------|\n" for name, robot in robots_json.items(): - table += f'| {name} | {robot["operator"]} | {robot["respect"]} | {robot["function"]} | {robot["frequency"]} | {robot["description"]} |\n' + table += f'| {escape_md(name)} | {robot["operator"]} | {robot["respect"]} | {robot["function"]} | {robot["frequency"]} | {robot["description"]} |\n' return table diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index af32bf2..ce82047 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -1,48 +1,48 @@ | Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| +|------|----------|-----------------------|----------|------------------|-------------| | AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | -| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. 
| +| Ai2Bot\-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | | Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| anthropic\-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | -| Brightbot 1.0 | Browsing.ai | Unclear at this time. | LLM/AI training. | Unclear at this time. | Scrapes data to train LLMs and AI products focused on website customer support. | +| Applebot\-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. 
| Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Brightbot 1\.0 | Browsing.ai | Unclear at this time. | LLM/AI training. | Unclear at this time. | Scrapes data to train LLMs and AI products focused on website customer support. | | Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | | CCBot | [Common Crawl Foundation](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). | -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ChatGPT\-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude\-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. 
| No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | -| cohere-training-data-crawler | Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products | Unclear at this time. | AI Data Scrapers | Unclear at this time. | cohere-training-data-crawler is a web crawler operated by Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products. More info can be found at https://darkvisitors.com/agents/agents/cohere-training-data-crawler | +| cohere\-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| cohere\-training\-data\-crawler | Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products | Unclear at this time. | AI Data Scrapers | Unclear at this time. | cohere-training-data-crawler is a web crawler operated by Cohere to download training data for its LLMs (Large Language Models) that power its enterprise AI products. More info can be found at https://darkvisitors.com/agents/agents/cohere-training-data-crawler | | Crawlspace | [Crawlspace](https://crawlspace.dev) | [Yes](https://news.ycombinator.com/item?id=42756654) | Scrapes data | Unclear at this time. | Provides crawling services for any purpose, probably including AI model training. | | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. 
| Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. | | DuckAssistBot | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | DuckAssistBot is used by DuckDuckGo's DuckAssist feature to fetch content and generate realtime AI answers to user searches. More info can be found at https://darkvisitors.com/agents/agents/duckassistbot | | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | | FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| Google\-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | | GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther\-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther\-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | | GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. 
| Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| iaskspider/2\.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | +| ICC\-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | | ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | | img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | | ISSCyberRiskCrawler | [ISS-Corporate](https://iss-cyber.com) | No | Scrapes data to train machine learning models. | No information. | Used to train machine learning based models to quantify cyber risk. | | Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. 
More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| Meta\-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta\-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI\-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. 
| Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | | PanguBot | the Chinese company Huawei | Unclear at this time. | AI Data Scrapers | Unclear at this time. | PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu. More info can be found at https://darkvisitors.com/agents/agents/pangubot | | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | -| SemrushBot-OCOB | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). 
| -| SemrushBot-SWA | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Checks URLs on your site for SWA tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). | +| SemrushBot\-OCOB | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). | +| SemrushBot\-SWA | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Checks URLs on your site for SWA tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). | | Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. | | Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | | VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. 
More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| Webzio\-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | | YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. | From 17b826a6d3868cf87fb52adf95f52872ac5c4437 Mon Sep 17 00:00:00 2001 From: Dennis Camera Date: Tue, 18 Feb 2025 10:13:27 +0100 Subject: [PATCH 227/249] Update tests and convert to stock unittest For these simple tests Python's built-in unittest framework is more than enough. No additional dependencies are required. Added some more test cases with "special" characters to test the escaping code better. 
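A hedged sketch of what such a stock `unittest` case could look like, using a few of the "special" names from the test fixtures. `names_to_pattern` is a hypothetical stand-in for the generator under test, and the class and method names are illustrative.

```python
import re
import unittest


def names_to_pattern(names):
    # Stand-in for the generator under test (hypothetical helper name).
    return "(" + "|".join(map(re.escape, names)) + ")"


class SpecialCharacterNames(unittest.TestCase):
    # Names taken from the test fixtures added in this patch.
    NAMES = ["crawler.with.dots", "star***crawler", "2^32$", "curl|sudo bash"]

    def test_each_name_matches_itself(self):
        pattern = re.compile(names_to_pattern(self.NAMES))
        for name in self.NAMES:
            with self.subTest(name=name):
                self.assertIsNotNone(pattern.search(f"Mozilla/5.0 {name}"))

    def test_dots_are_literal(self):
        # An escaped dot must not behave like the "any character" wildcard.
        pattern = re.compile(names_to_pattern(["crawler.with.dots"]))
        self.assertIsNone(pattern.search("crawlerXwithXdots"))
```

Saved in a module, a case like this runs with `python -m unittest` and needs nothing beyond the standard library.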
--- code/test_files/.htaccess | 2 +- code/test_files/robots.json | 44 ++++++++++++++++- code/test_files/robots.txt | 6 +++ code/test_files/table-of-bot-metrics.md | 38 +++++++++------ code/tests.py | 65 ++++++++++++++++++------- 5 files changed, 120 insertions(+), 35 deletions(-) mode change 100644 => 100755 code/tests.py diff --git a/code/test_files/.htaccess b/code/test_files/.htaccess index 90ddcf2..7e39092 100644 --- a/code/test_files/.htaccess +++ b/code/test_files/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] +RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot|crawler\.with\.dots|star\*\*\*crawler|Is\ this\ a\ crawler\?|a\[mazing\]\{42\}\(robot\)|2\^32\$|curl\|sudo\ bash) [NC] RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/code/test_files/robots.json b/code/test_files/robots.json index c50d63c..b0cbfbb 100644 --- a/code/test_files/robots.json +++ b/code/test_files/robots.json @@ 
-278,5 +278,47 @@ "function": "Scrapes data for search engine and LLMs.", "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." + }, + "crawler.with.dots": { + "operator": "Test suite", + "respect": "No", + "function": "To ensure the code works correctly.", + "frequency": "No information.", + "description": "When used in the .htaccess regular expression dots need to be escaped." + }, + "star***crawler": { + "operator": "Test suite", + "respect": "No", + "function": "To ensure the code works correctly.", + "frequency": "No information.", + "description": "When used in the .htaccess regular expression stars need to be escaped." + }, + "Is this a crawler?": { + "operator": "Test suite", + "respect": "No", + "function": "To ensure the code works correctly.", + "frequency": "No information.", + "description": "When used in the .htaccess regular expression spaces and question marks need to be escaped." + }, + "a[mazing]{42}(robot)": { + "operator": "Test suite", + "respect": "No", + "function": "To ensure the code works correctly.", + "frequency": "No information.", + "description": "When used in the .htaccess regular expression parantheses, braces, etc. need to be escaped." + }, + "2^32$": { + "operator": "Test suite", + "respect": "No", + "function": "To ensure the code works correctly.", + "frequency": "No information.", + "description": "When used in the .htaccess regular expression RE anchor characters need to be escaped." + }, + "curl|sudo bash": { + "operator": "Test suite", + "respect": "No", + "function": "To ensure the code works correctly.", + "frequency": "No information.", + "description": "When used in the .htaccess regular expression pipes need to be escaped." 
} -} \ No newline at end of file +} diff --git a/code/test_files/robots.txt b/code/test_files/robots.txt index 927f6f4..03c3c25 100644 --- a/code/test_files/robots.txt +++ b/code/test_files/robots.txt @@ -38,4 +38,10 @@ User-agent: Timpibot User-agent: VelenPublicWebCrawler User-agent: Webzio-Extended User-agent: YouBot +User-agent: crawler.with.dots +User-agent: star***crawler +User-agent: Is this a crawler? +User-agent: a[mazing]{42}(robot) +User-agent: 2^32$ +User-agent: curl|sudo bash Disallow: / diff --git a/code/test_files/table-of-bot-metrics.md b/code/test_files/table-of-bot-metrics.md index 257ba99..88af6c0 100644 --- a/code/test_files/table-of-bot-metrics.md +++ b/code/test_files/table-of-bot-metrics.md @@ -1,35 +1,35 @@ | Name | Operator | Respects `robots.txt` | Data use | Visit regularity | Description | -|-----|----------|-----------------------|----------|------------------|-------------| +|------|----------|-----------------------|----------|------------------|-------------| | AI2Bot | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | -| Ai2Bot-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | +| Ai2Bot\-Dolma | [Ai2](https://allenai.org/crawler) | Yes | Content is used to train open language models. | No information provided. | Explores 'certain domains' to find web content. | | Amazonbot | Amazon | Yes | Service improvement and enabling answers for Alexa users. | No information provided. | Includes references to crawled website when surfacing answers via Alexa; does not clearly outline other uses. | -| anthropic-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. 
| Scrapes data to train LLMs and AI products offered by Anthropic. | +| anthropic\-ai | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | Applebot | Unclear at this time. | Unclear at this time. | AI Search Crawlers | Unclear at this time. | Applebot is a web crawler used by Apple to index search results that allow the Siri AI Assistant to answer user questions. Siri's answers normally contain references to the website. More info can be found at https://darkvisitors.com/agents/agents/applebot | -| Applebot-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | +| Applebot\-Extended | [Apple](https://support.apple.com/en-us/119829#datausage) | Yes | Powers features in Siri, Spotlight, Safari, Apple Intelligence, and others. | Unclear at this time. | Apple has a secondary user agent, Applebot-Extended ... [that is] used to train Apple's foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools. | | Bytespider | ByteDance | No | LLM training. | Unclear at this time. | Downloads data to train LLMS, including ChatGPT competitors. | | CCBot | [Common Crawl Foundation](https://commoncrawl.org) | [Yes](https://commoncrawl.org/ccbot) | Provides open crawl dataset, used for many purposes, including Machine Learning/AI. | Monthly at present. | Web archive going back to 2008. [Cited in thousands of research papers per year](https://commoncrawl.org/research-papers). 
| -| ChatGPT-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | -| Claude-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | +| ChatGPT\-User | [OpenAI](https://openai.com) | Yes | Takes action based on user prompts. | Only when prompted by a user. | Used by plugins in ChatGPT to answer queries based on user input. | +| Claude\-Web | [Anthropic](https://www.anthropic.com) | Unclear at this time. | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | | ClaudeBot | [Anthropic](https://www.anthropic.com) | [Yes](https://support.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) | Scrapes data to train Anthropic's AI products. | No information provided. | Scrapes data to train LLMs and AI products offered by Anthropic. | -| cohere-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | +| cohere\-ai | [Cohere](https://cohere.com) | Unclear at this time. | Retrieves data to provide responses to user-initiated prompts. | Takes action based on user prompts. | Retrieves data based on user prompts. | | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. 
| | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. | | facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. | | FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. | -| Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | +| Google\-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. | | GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | -| GoogleOther-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." 
| -| GoogleOther-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther\-Image | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | +| GoogleOther\-Video | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." | | GPTBot | [OpenAI](https://openai.com) | Yes | Scrapes data to train OpenAI's products. | No information. | Data is used to train current and future models, removed paywalled data, PII and data that violates the company's policies. | -| iaskspider/2.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | -| ICC-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | +| iaskspider/2\.0 | iAsk | No | Crawls sites to provide answers to user queries. | Unclear at this time. | Used to provide answers to user queries. | +| ICC\-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. 
| Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | | ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | | img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | | ISSCyberRiskCrawler | [ISS-Corporate](https://iss-cyber.com) | No | Scrapes data to train machine learning models. | No information. | Used to train machine learning based models to quantify cyber risk. | | Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot | -| Meta-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | -| Meta-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. 
| Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | -| OAI-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | +| Meta\-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | +| Meta\-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| OAI\-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. 
| | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | @@ -38,5 +38,11 @@ | Sidetrade indexer bot | [Sidetrade](https://www.sidetrade.com) | Unclear at this time. | Extracts data for a variety of uses including training AI. | No information. | AI product training. | | Timpibot | [Timpi](https://timpi.io) | Unclear at this time. | Scrapes data for use in training LLMs. | No information. | Makes data available for training AI models. | | VelenPublicWebCrawler | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information. | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." | -| Webzio-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | +| Webzio\-Extended | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Webzio-Extended is a web crawler used by Webz.io to maintain a repository of web crawl data that it sells to other companies, including those using it to train AI models. More info can be found at https://darkvisitors.com/agents/agents/webzio-extended | | YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information. | Retrieves data used for You.com web search engine and LLMs. 
| +| crawler\.with\.dots | Test suite | No | To ensure the code works correctly. | No information. | When used in the .htaccess regular expression dots need to be escaped. | +| star\*\*\*crawler | Test suite | No | To ensure the code works correctly. | No information. | When used in the .htaccess regular expression stars need to be escaped. | +| Is this a crawler? | Test suite | No | To ensure the code works correctly. | No information. | When used in the .htaccess regular expression spaces and question marks need to be escaped. | +| a\[mazing\]\{42\}\(robot\) | Test suite | No | To ensure the code works correctly. | No information. | When used in the .htaccess regular expression parantheses, braces, etc. need to be escaped. | +| 2^32$ | Test suite | No | To ensure the code works correctly. | No information. | When used in the .htaccess regular expression RE anchor characters need to be escaped. | +| curl\|sudo bash | Test suite | No | To ensure the code works correctly. | No information. | When used in the .htaccess regular expression pipes need to be escaped. | diff --git a/code/tests.py b/code/tests.py old mode 100644 new mode 100755 index 6f778c3..94cbb47 --- a/code/tests.py +++ b/code/tests.py @@ -1,27 +1,58 @@ -"""These tests can be run with pytest. 
-This requires pytest: pip install pytest -cd to the `code` directory and run `pytest` -""" +#!/usr/bin/env python3 +"""To run these tests just execute this script.""" import json -from pathlib import Path +import unittest from robots import json_to_txt, json_to_table, json_to_htaccess +class RobotsUnittestExtensions: + def loadJson(self, pathname): + with open(pathname, "rt") as f: + return json.load(f) -def test_robots_txt_creation(): - robots_json = json.loads(Path("test_files/robots.json").read_text()) - robots_txt = json_to_txt(robots_json) - assert Path("test_files/robots.txt").read_text() == robots_txt + def assertEqualsFile(self, f, s): + with open(f, "rt") as f: + f_contents = f.read() + + return self.assertMultiLineEqual(f_contents, s) -def test_table_of_bot_metrices_md(): - robots_json = json.loads(Path("test_files/robots.json").read_text()) - robots_table = json_to_table(robots_json) - assert Path("test_files/table-of-bot-metrics.md").read_text() == robots_table +class TestRobotsTXTGeneration(unittest.TestCase, RobotsUnittestExtensions): + maxDiff = 8192 + + def setUp(self): + self.robots_dict = self.loadJson("test_files/robots.json") + + def test_robots_txt_generation(self): + robots_txt = json_to_txt(self.robots_dict) + self.assertEqualsFile("test_files/robots.txt", robots_txt) -def test_htaccess_creation(): - robots_json = json.loads(Path("test_files/robots.json").read_text()) - robots_htaccess = json_to_htaccess(robots_json) - assert Path("test_files/.htaccess").read_text() == robots_htaccess +class TestTableMetricsGeneration(unittest.TestCase, RobotsUnittestExtensions): + maxDiff = 32768 + + def setUp(self): + self.robots_dict = self.loadJson("test_files/robots.json") + + def test_table_generation(self): + robots_table = json_to_table(self.robots_dict) + self.assertEqualsFile("test_files/table-of-bot-metrics.md", robots_table) + + +class TestHtaccessGeneration(unittest.TestCase, RobotsUnittestExtensions): + maxDiff = 8192 + + def setUp(self): + 
self.robots_dict = self.loadJson("test_files/robots.json") + + def test_htaccess_generation(self): + robots_htaccess = json_to_htaccess(self.robots_dict) + self.assertEqualsFile("test_files/.htaccess", robots_htaccess) + + +if __name__ == "__main__": + import os + os.chdir(os.path.dirname(__file__)) + + unittest.main(verbosity=2) From c7c1e7b96fe74f90590f4d375c1bab4be53a4044 Mon Sep 17 00:00:00 2001 From: Dennis Camera Date: Tue, 18 Feb 2025 10:15:10 +0100 Subject: [PATCH 228/249] robots.py: Make executable --- code/robots.py | 2 ++ 1 file changed, 2 insertions(+) mode change 100644 => 100755 code/robots.py diff --git a/code/robots.py b/code/robots.py old mode 100644 new mode 100755 index 62fb061..6bf7920 --- a/code/robots.py +++ b/code/robots.py @@ -1,3 +1,5 @@ +#!/usr/bin/env python3 + import json import re import requests From 1d55a205e4c8447829abdd34098ef9b0fedefee1 Mon Sep 17 00:00:00 2001 From: Glyn Normington Date: Tue, 18 Feb 2025 05:08:28 +0000 Subject: [PATCH 229/249] Document testing in README Fixes: https://github.com/ai-robots-txt/ai.robots.txt/issues/81 --- README.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/README.md b/README.md index a206c83..30a85da 100644 --- a/README.md +++ b/README.md @@ -24,6 +24,11 @@ Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/ A note about contributing: updates should be added/made to `robots.json`. A GitHub action will then generate the updated `robots.txt`, `table-of-bot-metrics.md`, and `.htaccess`. 
+You can run the tests by [installing](https://www.python.org/about/gettingstarted/) Python 3 and issuing: +```console +code/tests.py +``` + ## Subscribe to updates You can subscribe to list updates via RSS/Atom with the releases feed: From 8a7489633326465fd7e83fecece6740440d38eb6 Mon Sep 17 00:00:00 2001 From: Dennis Camera Date: Tue, 18 Feb 2025 10:23:40 +0100 Subject: [PATCH 230/249] Add workflow to run tests on pull request or push to main --- .github/workflows/run-tests.yml | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) create mode 100644 .github/workflows/run-tests.yml diff --git a/.github/workflows/run-tests.yml b/.github/workflows/run-tests.yml new file mode 100644 index 0000000..c98861f --- /dev/null +++ b/.github/workflows/run-tests.yml @@ -0,0 +1,21 @@ +on: + pull_request: + branches: + - main + push: + branches: + - main +jobs: + run-tests: + runs-on: ubuntu-latest + steps: + - name: Check out repository + uses: actions/checkout@v4 + with: + fetch-depth: 2 + - name: Install dependencies + run: | + pip install -U requests beautifulsoup4 + - name: Run tests + run: | + code/tests.py From 6ecfcdfcbfd1bd36da1982b7a4f9f95cbeb8101a Mon Sep 17 00:00:00 2001 From: deyigifts Date: Mon, 24 Mar 2025 14:16:57 +0800 Subject: [PATCH 231/249] Update perplexity bot Update based on perplexity bot docs --- robots.json | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/robots.json b/robots.json index cdc7bb5..eaac816 100644 --- a/robots.json +++ b/robots.json @@ -253,10 +253,17 @@ }, "PerplexityBot": { "operator": "[Perplexity](https://www.perplexity.ai/)", - "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", + "respect": "[Yes](https://docs.perplexity.ai/guides/bots)", + "function": "Search result generation.", + "frequency": "No information.", + "description": "Crawls sites to surface as results in Perplexity." 
+ }, + "Perplexity‑User": { + "operator": "[Perplexity](https://www.perplexity.ai/)", + "respect": "[No](https://docs.perplexity.ai/guides/bots)", "function": "Used to answer queries at the request of users.", - "frequency": "Takes action based on user prompts.", - "description": "Operated by Perplexity to obtain results in response to user queries." + "frequency": "Only when prompted by a user.", + "description": "Visit web pages to help provide an accurate answer and include links to the page in Perplexity response." }, "PetalBot": { "description": "Operated by Huawei to provide search and AI assistant services.", @@ -321,4 +328,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." } -} \ No newline at end of file +} From da85207314724c02d151a7bdfcdca3ef3fd056a1 Mon Sep 17 00:00:00 2001 From: Thomas Leister Date: Thu, 27 Mar 2025 12:27:09 +0100 Subject: [PATCH 232/249] Implement new function "json_to_nginx" which outputs an Nginx configuration snippet --- code/robots.py | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/code/robots.py b/code/robots.py index 6bf7920..f58f2b8 100755 --- a/code/robots.py +++ b/code/robots.py @@ -152,6 +152,12 @@ def json_to_htaccess(robot_json): htaccess += "RewriteRule !^/?robots\\.txt$ - [F,L]\n" return htaccess +def json_to_nginx(robot_json): + # Creates an Nginx config file. This config snippet can be included in + # nginx server{} blocks to block AI bots. 
+ config = f"if ($http_user_agent ~* \"{list_to_pcre(robot_json.keys())}\") {{\n return 403;\n}}" + return config + def update_file_if_changed(file_name, converter): """Update files if newer content is available and log the (in)actions.""" @@ -178,6 +184,10 @@ def conversions(): file_name="./.htaccess", converter=json_to_htaccess, ) + update_file_if_changed( + file_name="./nginx-block-ai-bots.conf", + converter=json_to_nginx, + ) if __name__ == "__main__": From 5a312c5f4d1fcd89c17f4d6cb360ad7230857402 Mon Sep 17 00:00:00 2001 From: Thomas Leister Date: Thu, 27 Mar 2025 12:28:11 +0100 Subject: [PATCH 233/249] Mention Nginx config feature in README --- README.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 30a85da..b984672 100644 --- a/README.md +++ b/README.md @@ -13,16 +13,19 @@ If you'd like to add information about a crawler to the list, please make a pull This repository provides the following files: - `robots.txt` - `.htaccess` +- `nginx-block-ai-bots.conf` `robots.txt` implements the Robots Exclusion Protocol ([RFC 9309](https://www.rfc-editor.org/rfc/rfc9309.html)). `.htaccess` may be used to configure web servers such as [Apache httpd](https://httpd.apache.org/) to return an error page when one of the listed AI crawlers sends a request to the web server. Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/current/howto/htaccess.html), more performant methods than an `.htaccess` file exist. +`nginx-block-ai-bots.conf` implements a Nginx configuration snippet that can be included in any virtual host `server {}` block via the `include` directive. + ## Contributing -A note about contributing: updates should be added/made to `robots.json`. A GitHub action will then generate the updated `robots.txt`, `table-of-bot-metrics.md`, and `.htaccess`. +A note about contributing: updates should be added/made to `robots.json`. 
A GitHub action will then generate the updated `robots.txt`, `table-of-bot-metrics.md`, `.htaccess` and `nginx-block-ai-bots.conf`. You can run the tests by [installing](https://www.python.org/about/gettingstarted/) Python 3 and issuing: ```console From 4f3f4cd0dd0f421c2787b1336d37b8da06998882 Mon Sep 17 00:00:00 2001 From: Thomas Leister Date: Thu, 27 Mar 2025 12:28:50 +0100 Subject: [PATCH 234/249] Add assembled version of nginx-block-ai-bots.conf file --- nginx-block-ai-bots.conf | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 nginx-block-ai-bots.conf diff --git a/nginx-block-ai-bots.conf b/nginx-block-ai-bots.conf new file mode 100644 index 0000000..ce30520 --- /dev/null +++ b/nginx-block-ai-bots.conf @@ -0,0 +1,3 @@ +if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot)") { + return 403; +} \ No newline at end of file From 7c3b5a2cb21f5404cf4e2af1acf8689ba77d7b06 Mon Sep 17 00:00:00 2001 From: Thomas Leister Date: Thu, 27 Mar 2025 16:12:18 +0100 Subject: [PATCH 235/249] Add tests for Nginx config generator --- code/test_files/nginx-block-ai-bots.conf | 3 +++ code/tests.py | 12 +++++++++++- 2 files changed, 14 insertions(+), 1 deletion(-) create mode 100644 code/test_files/nginx-block-ai-bots.conf diff --git a/code/test_files/nginx-block-ai-bots.conf b/code/test_files/nginx-block-ai-bots.conf new file mode 100644 index 0000000..d1b559e --- /dev/null +++ 
b/code/test_files/nginx-block-ai-bots.conf @@ -0,0 +1,3 @@ +if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot|crawler\.with\.dots|star\*\*\*crawler|Is\ this\ a\ crawler\?|a\[mazing\]\{42\}\(robot\)|2\^32\$|curl\|sudo\ bash)") { + return 403; +} \ No newline at end of file diff --git a/code/tests.py b/code/tests.py index 94cbb47..61d69b4 100755 --- a/code/tests.py +++ b/code/tests.py @@ -4,7 +4,7 @@ import json import unittest -from robots import json_to_txt, json_to_table, json_to_htaccess +from robots import json_to_txt, json_to_table, json_to_htaccess, json_to_nginx class RobotsUnittestExtensions: def loadJson(self, pathname): @@ -50,6 +50,16 @@ class TestHtaccessGeneration(unittest.TestCase, RobotsUnittestExtensions): robots_htaccess = json_to_htaccess(self.robots_dict) self.assertEqualsFile("test_files/.htaccess", robots_htaccess) +class TestNginxConfigGeneration(unittest.TestCase, RobotsUnittestExtensions): + maxDiff = 8192 + + def setUp(self): + self.robots_dict = self.loadJson("test_files/robots.json") + + def test_nginx_generation(self): + robots_nginx = json_to_nginx(self.robots_dict) + self.assertEqualsFile("test_files/nginx-block-ai-bots.conf", robots_nginx) + if __name__ == "__main__": import os From 68d1d93714bbe4931811f301c7030ca979d95b39 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 27 Mar 2025 19:29:30 +0000 Subject: [PATCH 236/249] Merge pull request #91 from deyigifts/perplexity-user Update perplexity bots --- .htaccess | 2 +- 
robots.txt | 1 + table-of-bot-metrics.md | 3 ++- 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/.htaccess b/.htaccess index 2313293..2f5d0e4 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] +RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|Perplexity‑User|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/robots.txt b/robots.txt index 80c40e8..8c79fc2 100644 --- a/robots.txt +++ b/robots.txt @@ -35,6 +35,7 @@ User-agent: omgili User-agent: omgilibot User-agent: PanguBot User-agent: PerplexityBot +User-agent: Perplexity‑User User-agent: PetalBot User-agent: Scrapy User-agent: SemrushBot-OCOB diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 
ce82047..0cc2264 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -36,7 +36,8 @@ | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | | PanguBot | the Chinese company Huawei | Unclear at this time. | AI Data Scrapers | Unclear at this time. | PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu. More info can be found at https://darkvisitors.com/agents/agents/pangubot | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [Yes](https://docs.perplexity.ai/guides/bots) | Search result generation. | No information. | Crawls sites to surface as results in Perplexity. | +| Perplexity‑User | [Perplexity](https://www.perplexity.ai/) | [No](https://docs.perplexity.ai/guides/bots) | Used to answer queries at the request of users. | Only when prompted by a user. | Visit web pages to help provide an accurate answer and include links to the page in Perplexity response. 
| | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | | SemrushBot\-OCOB | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). | From 6851413c52b91b9729bbbfd75f84af364b490bde Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 27 Mar 2025 19:49:15 +0000 Subject: [PATCH 237/249] Merge pull request #94 from ThomasLeister/feature/implement-nginx-configuration-snippet-export Implement Nginx configuration snippet export --- nginx-block-ai-bots.conf | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/nginx-block-ai-bots.conf b/nginx-block-ai-bots.conf index ce30520..72d65ec 100644 --- a/nginx-block-ai-bots.conf +++ b/nginx-block-ai-bots.conf @@ -1,3 +1,3 @@ -if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ 
Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot)") { +if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|Perplexity‑User|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot)") { return 403; } \ No newline at end of file From ec18af76242c1b62bbbfc7e1df72098b423402a6 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Thu, 27 Mar 2025 12:51:22 -0700 Subject: [PATCH 238/249] Revert "Merge pull request #91 from deyigifts/perplexity-user" This reverts commit 68d1d93714bbe4931811f301c7030ca979d95b39. 
--- .htaccess | 2 +- robots.txt | 1 - table-of-bot-metrics.md | 3 +-- 3 files changed, 2 insertions(+), 4 deletions(-) diff --git a/.htaccess b/.htaccess index 2f5d0e4..2313293 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|Perplexity‑User|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] +RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/robots.txt b/robots.txt index 8c79fc2..80c40e8 100644 --- a/robots.txt +++ b/robots.txt @@ -35,7 +35,6 @@ User-agent: omgili User-agent: omgilibot User-agent: PanguBot User-agent: PerplexityBot -User-agent: Perplexity‑User User-agent: PetalBot User-agent: Scrapy User-agent: SemrushBot-OCOB diff --git a/table-of-bot-metrics.md 
b/table-of-bot-metrics.md index 0cc2264..ce82047 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -36,8 +36,7 @@ | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | | PanguBot | the Chinese company Huawei | Unclear at this time. | AI Data Scrapers | Unclear at this time. | PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu. More info can be found at https://darkvisitors.com/agents/agents/pangubot | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [Yes](https://docs.perplexity.ai/guides/bots) | Search result generation. | No information. | Crawls sites to surface as results in Perplexity. | -| Perplexity‑User | [Perplexity](https://www.perplexity.ai/) | [No](https://docs.perplexity.ai/guides/bots) | Used to answer queries at the request of users. | Only when prompted by a user. | Visit web pages to help provide an accurate answer and include links to the page in Perplexity response. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. 
| | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | | SemrushBot\-OCOB | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). | From c249de99a317b54e8891f1682dbf514e7763986e Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Fri, 28 Mar 2025 00:54:28 +0000 Subject: [PATCH 239/249] Update from Dark Visitors --- robots.json | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/robots.json b/robots.json index eaac816..e907c8b 100644 --- a/robots.json +++ b/robots.json @@ -258,7 +258,7 @@ "frequency": "No information.", "description": "Crawls sites to surface as results in Perplexity." }, - "Perplexity‑User": { + "Perplexity\u2011User": { "operator": "[Perplexity](https://www.perplexity.ai/)", "respect": "[No](https://docs.perplexity.ai/guides/bots)", "function": "Used to answer queries at the request of users.", @@ -328,4 +328,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." 
} -} +} \ No newline at end of file From 5b8650b99b35ff2aa1aa9ae26183b312edc48d45 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Sat, 29 Mar 2025 00:54:10 +0000 Subject: [PATCH 240/249] Update from Dark Visitors --- .htaccess | 2 +- robots.txt | 1 + table-of-bot-metrics.md | 3 ++- 3 files changed, 4 insertions(+), 2 deletions(-) diff --git a/.htaccess b/.htaccess index 2313293..2f5d0e4 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] +RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|Perplexity‑User|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/robots.txt b/robots.txt index 80c40e8..8c79fc2 100644 --- a/robots.txt +++ b/robots.txt @@ -35,6 +35,7 @@ User-agent: 
omgili User-agent: omgilibot User-agent: PanguBot User-agent: PerplexityBot +User-agent: Perplexity‑User User-agent: PetalBot User-agent: Scrapy User-agent: SemrushBot-OCOB diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index ce82047..0cc2264 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -36,7 +36,8 @@ | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | | PanguBot | the Chinese company Huawei | Unclear at this time. | AI Data Scrapers | Unclear at this time. | PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu. More info can be found at https://darkvisitors.com/agents/agents/pangubot | -| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | +| PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [Yes](https://docs.perplexity.ai/guides/bots) | Search result generation. | No information. | Crawls sites to surface as results in Perplexity. 
| +| Perplexity‑User | [Perplexity](https://www.perplexity.ai/) | [No](https://docs.perplexity.ai/guides/bots) | Used to answer queries at the request of users. | Only when prompted by a user. | Visit web pages to help provide an accurate answer and include links to the page in Perplexity response. | | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | | SemrushBot\-OCOB | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). 
|

From 6b0349f37ddf69ef9ec0e09a884b351f4a0e4b43 Mon Sep 17 00:00:00 2001
From: Frederic Barthelemy
Date: Fri, 4 Apr 2025 15:20:30 -0700
Subject: [PATCH 241/249] fix python complaining about f-string syntax

```
python code/tests.py
Traceback (most recent call last):
  File "/Users/fbarthelemy/Code/ai.robots.txt/code/tests.py", line 7, in <module>
    from robots import json_to_txt, json_to_table, json_to_htaccess, json_to_nginx
  File "/Users/fbarthelemy/Code/ai.robots.txt/code/robots.py", line 144
    return f"({"|".join(map(re.escape, lst))})"
                ^
SyntaxError: f-string: expecting '}'
```
---
 code/robots.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/code/robots.py b/code/robots.py
index f58f2b8..90c0e8c 100755
--- a/code/robots.py
+++ b/code/robots.py
@@ -141,7 +141,8 @@ def json_to_table(robots_json):
 def list_to_pcre(lst):
     # Python re is not 100% identical to PCRE which is used by Apache, but it
     # should probably be close enough in the real world for re.escape to work.
-    return f"({"|".join(map(re.escape, lst))})"
+    formatted = "|".join(map(re.escape, lst))
+    return f"({formatted})"
 
 
 def json_to_htaccess(robot_json):

From 5f5a89c38c27b676c3212f6ea3895d31f315f37e Mon Sep 17 00:00:00 2001
From: Frederic Barthelemy
Date: Fri, 4 Apr 2025 17:34:14 -0700
Subject: [PATCH 242/249] Fix html-mangled hyphen in Perplexity-Users

Fixes: #99
---
 .htaccess | 2 +-
 code/robots.py | 15 +++++++++++++++
 code/test_files/.htaccess | 2 +-
 code/test_files/nginx-block-ai-bots.conf | 2 +-
 code/test_files/robots.json | 7 +++++++
 code/test_files/robots.txt | 1 +
 code/test_files/table-of-bot-metrics.md | 1 +
 code/tests.py | 5 +++++
 nginx-block-ai-bots.conf | 2 +-
 robots.json | 14 +++++++-------
 robots.txt | 2 +-
 table-of-bot-metrics.md | 2 +-
 12 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/.htaccess b/.htaccess
index 2f5d0e4..27a7e11 100644
--- a/.htaccess
+++ b/.htaccess
@@ -1,3 +1,3 @@
 RewriteEngine On
-RewriteCond %{HTTP_USER_AGENT}
(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|Perplexity‑User|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] +RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/code/robots.py b/code/robots.py index 90c0e8c..d158b36 100755 --- a/code/robots.py +++ b/code/robots.py @@ -50,6 +50,7 @@ def updated_robots_json(soup): continue for agent in section.find_all("a", href=True): name = agent.find("div", {"class": "agent-name"}).get_text().strip() + name = clean_robot_name(name) desc = agent.find("p").get_text().strip() default_values = { @@ -101,6 +102,20 @@ def updated_robots_json(soup): return sorted_robots +def clean_robot_name(name): + """ Clean the robot name by removing some characters that were mangled by html software once. 
"""
+    # This was specifically spotted in "Perplexity-User"
+    # Looks like a non-breaking hyphen introduced by the HTML rendering software
+    # Reading the source page for Perplexity: https://docs.perplexity.ai/guides/bots
+    # You can see the bot is listed several times as "Perplexity‑User" with a normal hyphen,
+    # and it's only the Row-Heading that has the special hyphen
+    #
+    # Technically, there's no reason there wouldn't someday be a bot that
+    # actually uses a non-breaking hyphen, but that seems unlikely,
+    # so this solution should be fine for now.
+    return re.sub(r"\u2011", "-", name)
+
+
 def ingest_darkvisitors():
     old_robots_json = load_robots_json()
     soup = get_agent_soup()
diff --git a/code/test_files/.htaccess b/code/test_files/.htaccess
index 7e39092..f0d6783 100644
--- a/code/test_files/.htaccess
+++ b/code/test_files/.htaccess
@@ -1,3 +1,3 @@
 RewriteEngine On
-RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot|crawler\.with\.dots|star\*\*\*crawler|Is\ this\ a\ crawler\?|a\[mazing\]\{42\}\(robot\)|2\^32\$|curl\|sudo\ bash) [NC]
+RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\
Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot|crawler\.with\.dots|star\*\*\*crawler|Is\ this\ a\ crawler\?|a\[mazing\]\{42\}\(robot\)|2\^32\$|curl\|sudo\ bash) [NC] RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/code/test_files/nginx-block-ai-bots.conf b/code/test_files/nginx-block-ai-bots.conf index d1b559e..c569b15 100644 --- a/code/test_files/nginx-block-ai-bots.conf +++ b/code/test_files/nginx-block-ai-bots.conf @@ -1,3 +1,3 @@ -if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot|crawler\.with\.dots|star\*\*\*crawler|Is\ this\ a\ crawler\?|a\[mazing\]\{42\}\(robot\)|2\^32\$|curl\|sudo\ bash)") { +if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot|crawler\.with\.dots|star\*\*\*crawler|Is\ this\ a\ crawler\?|a\[mazing\]\{42\}\(robot\)|2\^32\$|curl\|sudo\ bash)") { return 403; } \ No 
newline at end of file diff --git a/code/test_files/robots.json b/code/test_files/robots.json index b0cbfbb..385f284 100644 --- a/code/test_files/robots.json +++ b/code/test_files/robots.json @@ -223,6 +223,13 @@ "operator": "[Webz.io](https://webz.io/)", "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)" }, + "Perplexity-User": { + "operator": "[Perplexity](https://www.perplexity.ai/)", + "respect": "[No](https://docs.perplexity.ai/guides/bots)", + "function": "Used to answer queries at the request of users.", + "frequency": "Only when prompted by a user.", + "description": "Visit web pages to help provide an accurate answer and include links to the page in Perplexity response." + }, "PerplexityBot": { "operator": "[Perplexity](https://www.perplexity.ai/)", "respect": "[No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/)", diff --git a/code/test_files/robots.txt b/code/test_files/robots.txt index 03c3c25..ee201f8 100644 --- a/code/test_files/robots.txt +++ b/code/test_files/robots.txt @@ -30,6 +30,7 @@ User-agent: Meta-ExternalFetcher User-agent: OAI-SearchBot User-agent: omgili User-agent: omgilibot +User-agent: Perplexity-User User-agent: PerplexityBot User-agent: PetalBot User-agent: Scrapy diff --git a/code/test_files/table-of-bot-metrics.md b/code/test_files/table-of-bot-metrics.md index 88af6c0..9b280aa 100644 --- a/code/test_files/table-of-bot-metrics.md +++ b/code/test_files/table-of-bot-metrics.md @@ -32,6 +32,7 @@ | OAI\-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. 
| Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| Perplexity\-User | [Perplexity](https://www.perplexity.ai/) | [No](https://docs.perplexity.ai/guides/bots) | Used to answer queries at the request of users. | Only when prompted by a user. | Visit web pages to help provide an accurate answer and include links to the page in Perplexity response. | | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. | | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." 
| diff --git a/code/tests.py b/code/tests.py index 61d69b4..f58b445 100755 --- a/code/tests.py +++ b/code/tests.py @@ -60,6 +60,11 @@ class TestNginxConfigGeneration(unittest.TestCase, RobotsUnittestExtensions): robots_nginx = json_to_nginx(self.robots_dict) self.assertEqualsFile("test_files/nginx-block-ai-bots.conf", robots_nginx) +class TestRobotsNameCleaning(unittest.TestCase): + def test_clean_name(self): + from robots import clean_robot_name + + self.assertEqual(clean_robot_name("Perplexity‑User"), "Perplexity-User") if __name__ == "__main__": import os diff --git a/nginx-block-ai-bots.conf b/nginx-block-ai-bots.conf index 72d65ec..0577bd9 100644 --- a/nginx-block-ai-bots.conf +++ b/nginx-block-ai-bots.conf @@ -1,3 +1,3 @@ -if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|PerplexityBot|Perplexity‑User|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot)") { +if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ 
Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot)") { return 403; } \ No newline at end of file diff --git a/robots.json b/robots.json index e907c8b..8fd7572 100644 --- a/robots.json +++ b/robots.json @@ -251,6 +251,13 @@ "frequency": "Unclear at this time.", "description": "PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu. More info can be found at https://darkvisitors.com/agents/agents/pangubot" }, + "Perplexity-User": { + "operator": "[Perplexity](https://www.perplexity.ai/)", + "respect": "[No](https://docs.perplexity.ai/guides/bots)", + "function": "Used to answer queries at the request of users.", + "frequency": "Only when prompted by a user.", + "description": "Visit web pages to help provide an accurate answer and include links to the page in Perplexity response." + }, "PerplexityBot": { "operator": "[Perplexity](https://www.perplexity.ai/)", "respect": "[Yes](https://docs.perplexity.ai/guides/bots)", @@ -258,13 +265,6 @@ "frequency": "No information.", "description": "Crawls sites to surface as results in Perplexity." }, - "Perplexity\u2011User": { - "operator": "[Perplexity](https://www.perplexity.ai/)", - "respect": "[No](https://docs.perplexity.ai/guides/bots)", - "function": "Used to answer queries at the request of users.", - "frequency": "Only when prompted by a user.", - "description": "Visit web pages to help provide an accurate answer and include links to the page in Perplexity response." 
- }, "PetalBot": { "description": "Operated by Huawei to provide search and AI assistant services.", "frequency": "No explicit frequency provided.", diff --git a/robots.txt b/robots.txt index 8c79fc2..c531918 100644 --- a/robots.txt +++ b/robots.txt @@ -34,8 +34,8 @@ User-agent: OAI-SearchBot User-agent: omgili User-agent: omgilibot User-agent: PanguBot +User-agent: Perplexity-User User-agent: PerplexityBot -User-agent: Perplexity‑User User-agent: PetalBot User-agent: Scrapy User-agent: SemrushBot-OCOB diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index 0cc2264..d92df34 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -36,8 +36,8 @@ | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | | PanguBot | the Chinese company Huawei | Unclear at this time. | AI Data Scrapers | Unclear at this time. | PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu. More info can be found at https://darkvisitors.com/agents/agents/pangubot | +| Perplexity\-User | [Perplexity](https://www.perplexity.ai/) | [No](https://docs.perplexity.ai/guides/bots) | Used to answer queries at the request of users. | Only when prompted by a user. | Visit web pages to help provide an accurate answer and include links to the page in Perplexity response. 
| | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [Yes](https://docs.perplexity.ai/guides/bots) | Search result generation. | No information. | Crawls sites to surface as results in Perplexity. | -| Perplexity‑User | [Perplexity](https://www.perplexity.ai/) | [No](https://docs.perplexity.ai/guides/bots) | Used to answer queries at the request of users. | Only when prompted by a user. | Visit web pages to help provide an accurate answer and include links to the page in Perplexity response. | | PetalBot | [Huawei](https://huawei.com/) | Yes | Used to provide recommendations in Hauwei assistant and AI search services. | No explicit frequency provided. | Operated by Huawei to provide search and AI assistant services. | | Scrapy | [Zyte](https://www.zyte.com) | Unclear at this time. | Scrapes data for a variety of uses including training AI. | No information. | "AI and machine learning applications often need large amounts of quality data, and web data extraction is a fast, efficient way to build structured data sets." | | SemrushBot\-OCOB | [Semrush](https://www.semrush.com/) | [Yes](https://www.semrush.com/bot/) | Crawls your site for ContentShake AI tool. | Roughly once every 10 seconds. | You enter one text (on-demand) and we will make suggestions on it (the tool uses AI but we are not actively crawling the web, you need to manually enter one text/URL). 
| From c6f308cbd0a00166f5085fa4adc98630c767e11e Mon Sep 17 00:00:00 2001 From: Frederic Barthelemy Date: Sat, 5 Apr 2025 09:01:52 -0700 Subject: [PATCH 243/249] PR Feedback: log special-case, comment consistency --- code/robots.py | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/code/robots.py b/code/robots.py index d158b36..86ea413 100755 --- a/code/robots.py +++ b/code/robots.py @@ -107,13 +107,16 @@ def clean_robot_name(name): # This was specifically spotted in "Perplexity-User" # Looks like a non-breaking hyphen introduced by the HTML rendering software # Reading the source page for Perplexity: https://docs.perplexity.ai/guides/bots - # You can see the bot is listed several times as "Perplexity‑User" with a normal hyphen, + # You can see the bot is listed several times as "Perplexity-User" with a normal hyphen, # and it's only the Row-Heading that has the special hyphen # # Technically, there's no reason there wouldn't someday be a bot that # actually uses a non-breaking hyphen, but that seems unlikely, # so this solution should be fine for now. - return re.sub(r"\u2011", "-", name) + result = re.sub(r"\u2011", "-", name) + if result != name: + print(f"\tCleaned '{name}' to '{result}' - unicode/html mangled chars normalized.") + return result def ingest_darkvisitors(): From b65f45e408461560a32f44f05860f80655737467 Mon Sep 17 00:00:00 2001 From: Cory Dransfeldt Date: Thu, 10 Apr 2025 10:12:51 -0700 Subject: [PATCH 244/249] chore(robots.json): adds imgproxy crawler --- robots.json | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/robots.json b/robots.json index 8fd7572..4c9f7d7 100644 --- a/robots.json +++ b/robots.json @@ -195,6 +195,13 @@ "operator": "[img2dataset](https://github.com/rom1504/img2dataset)", "respect": "Unclear at this time." 
}, + "imgproxy": { + "frequency": "No information.", + "function": "Not documented or explained on operator's site.", + "operator": "[imgproxy](https://imgproxy.net)", + "respect": "Unclear at this time.", + "description": "AI-powered image processing." + }, "ISSCyberRiskCrawler": { "description": "Used to train machine learning based models to quantify cyber risk.", "frequency": "No information.", @@ -328,4 +335,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." } -} \ No newline at end of file +} From 4a764bba18f10167cb5f7107c8721e5dc208100f Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Thu, 10 Apr 2025 19:22:34 +0000 Subject: [PATCH 245/249] Merge pull request #102 from ai-robots-txt/imgproxy-bot chore(robots.json): adds imgproxy crawler --- .htaccess | 2 +- nginx-block-ai-bots.conf | 2 +- robots.txt | 1 + table-of-bot-metrics.md | 1 + 4 files changed, 4 insertions(+), 2 deletions(-) diff --git a/.htaccess b/.htaccess index 27a7e11..c0e5fbb 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] +RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 
1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/nginx-block-ai-bots.conf b/nginx-block-ai-bots.conf index 0577bd9..a6bbfa2 100644 --- a/nginx-block-ai-bots.conf +++ b/nginx-block-ai-bots.conf @@ -1,3 +1,3 @@ -if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot)") { +if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo\ 
Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot)") { return 403; } \ No newline at end of file diff --git a/robots.txt b/robots.txt index c531918..de25a56 100644 --- a/robots.txt +++ b/robots.txt @@ -26,6 +26,7 @@ User-agent: iaskspider/2.0 User-agent: ICC-Crawler User-agent: ImagesiftBot User-agent: img2dataset +User-agent: imgproxy User-agent: ISSCyberRiskCrawler User-agent: Kangaroo Bot User-agent: Meta-ExternalAgent diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index d92df34..b3e51fe 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -28,6 +28,7 @@ | ICC\-Crawler | [NICT](https://nict.go.jp) | Yes | Scrapes data to train and support AI technologies. | No information. | Use the collected data for artificial intelligence technologies; provide data to third parties, including commercial companies; those companies can use the data for their own business. | | ImagesiftBot | [ImageSift](https://imagesift.com) | [Yes](https://imagesift.com/about) | ImageSiftBot is a web crawler that scrapes the internet for publicly available images to support our suite of web intelligence products | No information. | Once images and text are downloaded from a webpage, ImageSift analyzes this data from the page and stores the information in an index. Our web intelligence products use this index to enable search and retrieval of similar images. | | img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | Unclear at this time. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. | +| imgproxy | [imgproxy](https://imgproxy.net) | Unclear at this time. | Not documented or explained on operator's site. | No information. 
| AI-powered image processing. | | ISSCyberRiskCrawler | [ISS-Corporate](https://iss-cyber.com) | No | Scrapes data to train machine learning models. | No information. | Used to train machine learning based models to quantify cyber risk. | | Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot | | Meta\-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | From 305188b2e78855d4e7193f29a3e7205f96fa86f6 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Fri, 11 Apr 2025 00:55:52 +0000 Subject: [PATCH 246/249] Update from Dark Visitors --- robots.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/robots.json b/robots.json index 4c9f7d7..eff38ac 100644 --- a/robots.json +++ b/robots.json @@ -335,4 +335,4 @@ "frequency": "No information.", "description": "Retrieves data used for You.com web search engine and LLMs." 
} -} +} \ No newline at end of file From d9f882a9b21170754c4b37ff1bbc237171876684 Mon Sep 17 00:00:00 2001 From: Joshua Sheard Date: Mon, 14 Apr 2025 15:46:01 +0100 Subject: [PATCH 247/249] Include "AI Agents" from Dark Visitors --- code/robots.py | 1 + 1 file changed, 1 insertion(+) diff --git a/code/robots.py b/code/robots.py index 86ea413..8a06b55 100755 --- a/code/robots.py +++ b/code/robots.py @@ -30,6 +30,7 @@ def updated_robots_json(soup): """Update AI scraper information with data from darkvisitors.""" existing_content = load_robots_json() to_include = [ + "AI Agents", "AI Assistants", "AI Data Scrapers", "AI Search Crawlers", From a96e33098975edf1c05c8d9684b36b9fa31f7ef2 Mon Sep 17 00:00:00 2001 From: dark-visitors Date: Tue, 15 Apr 2025 00:57:01 +0000 Subject: [PATCH 248/249] Update from Dark Visitors --- robots.json | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/robots.json b/robots.json index eff38ac..8bba6b2 100644 --- a/robots.json +++ b/robots.json @@ -230,6 +230,13 @@ "frequency": "Unclear at this time.", "description": "Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher" }, + "NovaAct": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Agents", + "frequency": "Unclear at this time.", + "description": "Nova Act is an AI agent created by Amazon that can use a web browser. It can intelligently navigate and interact with websites to complete multi-step tasks on behalf of a human user. 
More info can be found at https://darkvisitors.com/agents/agents/novaact" + }, "OAI-SearchBot": { "operator": "[OpenAI](https://openai.com)", "respect": "[Yes](https://platform.openai.com/docs/bots)", @@ -251,6 +258,13 @@ "operator": "[Webz.io](https://webz.io/)", "respect": "[Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html)" }, + "Operator": { + "operator": "Unclear at this time.", + "respect": "Unclear at this time.", + "function": "AI Agents", + "frequency": "Unclear at this time.", + "description": "Operator is an AI agent created by OpenAI that can use a web browser. It can intelligently navigate and interact with websites to complete multi-step tasks on behalf of a human user. More info can be found at https://darkvisitors.com/agents/agents/operator" + }, "PanguBot": { "operator": "the Chinese company Huawei", "respect": "Unclear at this time.", From e0cdb278fbd243f554579fe5050850f124b286a8 Mon Sep 17 00:00:00 2001 From: "ai.robots.txt" Date: Wed, 16 Apr 2025 00:57:11 +0000 Subject: [PATCH 249/249] Update from Dark Visitors --- .htaccess | 2 +- nginx-block-ai-bots.conf | 2 +- robots.txt | 2 ++ table-of-bot-metrics.md | 2 ++ 4 files changed, 6 insertions(+), 2 deletions(-) diff --git a/.htaccess b/.htaccess index c0e5fbb..d10e796 100644 --- a/.htaccess +++ b/.htaccess @@ -1,3 +1,3 @@ RewriteEngine On -RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ 
bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] +RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|NovaAct|OAI\-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot) [NC] RewriteRule !^/?robots\.txt$ - [F,L] diff --git a/nginx-block-ai-bots.conf b/nginx-block-ai-bots.conf index a6bbfa2..c37cef5 100644 --- a/nginx-block-ai-bots.conf +++ b/nginx-block-ai-bots.conf @@ -1,3 +1,3 @@ -if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot)") { +if ($http_user_agent ~* "(AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Brightbot\ 
1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Crawlspace|Diffbot|DuckAssistBot|FacebookBot|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|imgproxy|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|NovaAct|OAI\-SearchBot|omgili|omgilibot|Operator|PanguBot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot)") { return 403; } \ No newline at end of file diff --git a/robots.txt b/robots.txt index de25a56..1e3aa80 100644 --- a/robots.txt +++ b/robots.txt @@ -31,9 +31,11 @@ User-agent: ISSCyberRiskCrawler User-agent: Kangaroo Bot User-agent: Meta-ExternalAgent User-agent: Meta-ExternalFetcher +User-agent: NovaAct User-agent: OAI-SearchBot User-agent: omgili User-agent: omgilibot +User-agent: Operator User-agent: PanguBot User-agent: Perplexity-User User-agent: PerplexityBot diff --git a/table-of-bot-metrics.md b/table-of-bot-metrics.md index b3e51fe..4c87b41 100644 --- a/table-of-bot-metrics.md +++ b/table-of-bot-metrics.md @@ -33,9 +33,11 @@ | Kangaroo Bot | Unclear at this time. | Unclear at this time. | AI Data Scrapers | Unclear at this time. | Kangaroo Bot is used by the company Kangaroo LLM to download data to train AI models tailored to Australian language and culture. More info can be found at https://darkvisitors.com/agents/agents/kangaroo-bot | | Meta\-ExternalAgent | [Meta](https://developers.facebook.com/docs/sharing/webmasters/web-crawlers) | Yes. | Used to train models and improve products. | No information. | "The Meta-ExternalAgent crawler crawls the web for use cases such as training AI models or improving products by indexing content directly." | | Meta\-ExternalFetcher | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. 
| Meta-ExternalFetcher is dispatched by Meta AI products in response to user prompts, when they need to fetch an individual links. More info can be found at https://darkvisitors.com/agents/agents/meta-externalfetcher | +| NovaAct | Unclear at this time. | Unclear at this time. | AI Agents | Unclear at this time. | Nova Act is an AI agent created by Amazon that can use a web browser. It can intelligently navigate and interact with websites to complete multi-step tasks on behalf of a human user. More info can be found at https://darkvisitors.com/agents/agents/novaact | | OAI\-SearchBot | [OpenAI](https://openai.com) | [Yes](https://platform.openai.com/docs/bots) | Search result generation. | No information. | Crawls sites to surface as results in SearchGPT. | | omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information. | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. | | omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information. | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. | +| Operator | Unclear at this time. | Unclear at this time. | AI Agents | Unclear at this time. | Operator is an AI agent created by OpenAI that can use a web browser. It can intelligently navigate and interact with websites to complete multi-step tasks on behalf of a human user. More info can be found at https://darkvisitors.com/agents/agents/operator | | PanguBot | the Chinese company Huawei | Unclear at this time. | AI Data Scrapers | Unclear at this time. | PanguBot is a web crawler operated by the Chinese company Huawei. It's used to download training data for its multimodal LLM (Large Language Model) called PanGu. 
More info can be found at https://darkvisitors.com/agents/agents/pangubot | | Perplexity\-User | [Perplexity](https://www.perplexity.ai/) | [No](https://docs.perplexity.ai/guides/bots) | Used to answer queries at the request of users. | Only when prompted by a user. | Visit web pages to help provide an accurate answer and include links to the page in Perplexity response. | | PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [Yes](https://docs.perplexity.ai/guides/bots) | Search result generation. | No information. | Crawls sites to surface as results in Perplexity. |
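
The name clean-up in `code/robots.py` and the escaped alternation written to `.htaccess` and `nginx-block-ai-bots.conf` can be sketched together as below. This is a minimal illustration, not the repository's actual generator: `clean_robot_name` mirrors the behaviour shown in the patch (without its log line), while `build_user_agent_pattern` and the case-insensitive sort are assumptions made for the example.

```python
import re

NON_BREAKING_HYPHEN = "\u2011"


def clean_robot_name(name: str) -> str:
    """Replace non-breaking hyphens (U+2011) with plain ASCII hyphens."""
    return name.replace(NON_BREAKING_HYPHEN, "-")


def build_user_agent_pattern(names):
    """Hypothetical helper: join cleaned, regex-escaped names into one alternation.

    Names are de-duplicated and sorted case-insensitively (an assumption),
    then escaped with re.escape so hyphens, spaces and dots match literally.
    """
    cleaned = sorted({clean_robot_name(n) for n in names}, key=str.lower)
    return "(" + "|".join(re.escape(n) for n in cleaned) + ")"


if __name__ == "__main__":
    pattern = build_user_agent_pattern(
        ["PerplexityBot", "Perplexity\u2011User", "Brightbot 1.0"]
    )
    print(pattern)
    # Case-insensitive search, comparable to [NC] in Apache or ~* in nginx.
    user_agent = "Mozilla/5.0 (compatible; Perplexity-User/1.0)"
    print(bool(re.search(pattern, user_agent, re.IGNORECASE)))
```

Normalizing before escaping matters here: a name that still carries U+2011 would produce a pattern that never matches a user agent sent with an ordinary hyphen, which is the situation the `Perplexity-User` fix addresses.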