# ai.robots.txt

This is an open list of web crawlers associated with AI companies and the training of LLMs, which you may wish to block. We encourage you to contribute to this list and implement it on your own site. See the information about the listed crawlers and the FAQ.
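As a minimal sketch of how the list is applied, a site's `robots.txt` can name each crawler and disallow it from all paths. The user agents shown here are a small illustrative subset of the full list:

```txt
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Bytespider
Disallow: /
```

Note that `robots.txt` is advisory: well-behaved crawlers honor it, but it does not technically prevent access.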

A number of these crawlers have been sourced from Dark Visitors, and we appreciate the ongoing effort they put into tracking them.

If you'd like to add information about a crawler to the list, please make a pull request with the bot name added to robots.txt, ai.txt, and any relevant details in table-of-bot-metrics.md to help people understand what's crawling.

## Contributing

A note about contributing: updates should be made to robots.json. A GitHub Action, courtesy of Adam, will then regenerate robots.txt and table-of-bot-metrics.md from it.
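For illustration only, an entry in robots.json might look like the following; the field names and values here are assumptions based on the generated table's columns, so check the existing file for the actual schema before submitting:

```json
{
  "ExampleBot": {
    "operator": "Example AI Co.",
    "respect": "Unclear",
    "function": "LLM training data collection",
    "frequency": "Unclear",
    "description": "Hypothetical crawler entry shown for format only."
  }
}
```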

## Subscribe to updates

You can subscribe to list updates via RSS/Atom with the releases feed:

https://github.com/ai-robots-txt/ai.robots.txt/releases.atom

You can subscribe with Feedly, Inoreader, The Old Reader, Feedbin, or any other reader app.

## Report abusive crawlers

If you use Cloudflare's hard block alongside this list, you can report abusive crawlers that don't respect robots.txt.

## Additional resources

- A detailed blog post with a live dashboard showing websites that block AI bots such as GPTBot, CCBot, Google-Extended, and Bytespider, explaining what the various AI crawlers/scrapers do and how to block them using robots.txt.