Updated from new robots.json

2025-05-20 01:03:11 +00:00 · 2024-11-19 16:46:21 +00:00 · 58985737e7
commit 58985737e7
parent 584e66cb99
2 changed files with 0 additions and 2 deletions
--- a/robots.txt
+++ b/robots.txt
@@ -13,7 +13,6 @@ User-agent: cohere-ai
 User-agent: Diffbot
 User-agent: DuckAssistBot
 User-agent: FacebookBot
-User-agent: facebookexternalhit
 User-agent: FriendlyCrawler
 User-agent: Google-Extended
 User-agent: GoogleOther
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -15,7 +15,6 @@
 | Diffbot | [Diffbot](https://www.diffbot.com/) | At the discretion of Diffbot users. | Aggregates structured web data for monitoring and AI model training. | Unclear at this time. | Diffbot is an application used to parse web pages into structured data; this data is used for monitoring or AI model training. |
 | DuckAssistBot | Unclear at this time. | Unclear at this time. | AI Assistants | Unclear at this time. | DuckAssistBot is used by DuckDuckGo's DuckAssist feature to fetch content and generate realtime AI answers to user searches. More info can be found at https://darkvisitors.com/agents/agents/duckassistbot |
 | FacebookBot | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | Training language models | Up to 1 page per second | Officially used for training Meta "speech recognition technology," unknown if used to train Meta AI specifically. |
-| facebookexternalhit | Meta/Facebook | [Yes](https://developers.facebook.com/docs/sharing/bot/) | No information. | Unclear at this time. | Unclear at this time. |
 | FriendlyCrawler | Unknown | [Yes](https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler) | We are using the data from the crawler to build datasets for machine learning experiments. | Unclear at this time. | Unclear who the operator is; but data is used for training/machine learning. |
 | Google-Extended | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | LLM training. | No information. | Used to train Gemini and Vertex AI generative APIs. Does not impact a site's inclusion or ranking in Google Search. |
 | GoogleOther | Google | [Yes](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers) | Scrapes data. | No information. | "Used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development." |