diff --git a/src/posts/2024/go-ahead-and-block-ai-web-crawlers.md b/src/posts/2024/go-ahead-and-block-ai-web-crawlers.md
index 728e794f..601f519d 100644
--- a/src/posts/2024/go-ahead-and-block-ai-web-crawlers.md
+++ b/src/posts/2024/go-ahead-and-block-ai-web-crawlers.md
@@ -39,71 +39,27 @@ User-agent: *
 Disallow:
 
 User-agent: AdsBot-Google
-Disallow: /
-
 User-agent: Amazonbot
-Disallow: /
-
 User-agent: anthropic-ai
-Disallow: /
-
 User-agent: AwarioRssBot
-Disallow: /
-
 User-agent: AwarioSmartBot
-Disallow: /
-
 User-agent: Bytespider
-Disallow: /
-
 User-agent: CCBot
-Disallow: /
-
 User-agent: ChatGPT-User
-Disallow: /
-
 User-agent: ClaudeBot
-Disallow: /
-
 User-agent: Claude-Web
-Disallow: /
-
 User-agent: cohere-ai
-Disallow: /
-
 User-agent: DataForSeoBot
-Disallow: /
-
 User-agent: FacebookBot
-Disallow: /
-
 User-agent: Google-Extended
-Disallow: /
-
 User-agent: GPTBot
-Disallow: /
-
 User-agent: ImagesiftBot
-Disallow: /
-
 User-agent: magpie-crawler
-Disallow: /
-
 User-agent: omgili
-Disallow: /
-
 User-agent: omgilibot
-Disallow: /
-
 User-agent: peer39_crawler
-Disallow: /
-
 User-agent: peer39_crawler/1.0
-Disallow: /
-
 User-agent: PerplexityBot
-Disallow: /
-
 User-agent: YouBot
 Disallow: /
 ```
@@ -112,4 +68,6 @@ Disallow: /
 - [I’m blocking AI-crawlers](https://roelant.net/en/2023/im-blocking-ai-crawlers/)
 - [Block the Bots that Feed “AI” Models by Scraping Your Website](https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/)
 
+**Update March 27, 2024:** Many thanks to Jens for pointing out that the `User-agent` rules can be safely combined preceding a `Disallow` statement.
+
 [^1]: I've yet to definitively identify Arc Search's user agent but I'd like to, so I can block it and share it — but that assumes they respect `robots.txt` declarations.
\ No newline at end of file
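To sanity-check the update note's claim that several `User-agent` lines can share a single `Disallow` rule, here is a minimal sketch that feeds the regrouped file to Python's standard-library `urllib.robotparser`. It assumes the `User-agent: *` line from the first hunk header sits directly above the `Disallow:` context line, and `build_parser` is a hypothetical helper name. CPython's parser is only a convenient stand-in: none of these crawlers is known to use it, and each vendor's own matching rules may differ.

```python
from urllib.robotparser import RobotFileParser

# robots.txt body after this change (assumes `User-agent: *` is the line
# directly above the first context line of the hunk).
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: AdsBot-Google
User-agent: Amazonbot
User-agent: anthropic-ai
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: cohere-ai
User-agent: DataForSeoBot
User-agent: FacebookBot
User-agent: Google-Extended
User-agent: GPTBot
User-agent: ImagesiftBot
User-agent: magpie-crawler
User-agent: omgili
User-agent: omgilibot
User-agent: peer39_crawler
User-agent: peer39_crawler/1.0
User-agent: PerplexityBot
User-agent: YouBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Hypothetical helper: parse robots.txt text without any network fetch."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rp = build_parser(ROBOTS_TXT)

# Every agent in the combined group is bound to the one `Disallow: /` rule.
for agent in ("GPTBot", "CCBot", "ClaudeBot", "YouBot"):
    print(agent, rp.can_fetch(agent, "/posts/"))

# An agent outside the group falls back to the `User-agent: *` group,
# whose empty `Disallow:` permits everything.
print("Googlebot", rp.can_fetch("Googlebot", "/posts/"))
```

Running it prints `False` for the four grouped agents and `True` for `Googlebot`. One caveat specific to this parser: keep the grouped `User-agent` lines contiguous, because CPython discards a run of `User-agent` lines that is followed by a blank line before any rule, so those agents would silently fall back to the `*` group.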