mirror of
https://github.com/ai-robots-txt/ai.robots.txt.git
synced 2025-05-17 16:03:10 +00:00
Compare commits
2 commits
678380727e...9a9b1b41c0
| Author | SHA1 | Date |
|---|---|---|
| | 9a9b1b41c0 | |
| | 36a52a88d8 | |
2 changed files with 38 additions and 0 deletions
@@ -35,6 +35,8 @@ Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/
\`\`\`
(Note that the path of the `haproxy-block-ai-bots.txt` may be different in your environment.)

[Bing uses the data it crawls for AI and training; you may opt out by adding a `meta` tag to the `head` of your site.](./docs/additional-steps/bing.md)

## Contributing

A note about contributing: updates should be added/made to `robots.json`. A GitHub action will then generate the updated `robots.txt`, `table-of-bot-metrics.md`, `.htaccess` and `nginx-block-ai-bots.conf`.

36
docs/additional-steps/bing.md
Normal file
@@ -0,0 +1,36 @@

# Bing (bingbot)

It's not well publicised, but Bing uses the data it crawls for AI and training.

However, the current thinking is that blocking a search engine of this size using `robots.txt` is quite a drastic approach: Bing is second only to Google, and blocking it could significantly affect how your website appears in search results.

Additionally, Bing powers a number of search engines such as Yahoo and AOL, and its search results are also used in DuckDuckGo, amongst others.

Fortunately, Bing supports a relatively simple opt-out method that requires one additional step.

## How to opt out of AI training

You must add a meta tag to the `<head>` of your webpage. The tag needs to be added to every page on your website.

The line you need to add is:

```plaintext
<meta name="robots" content="noarchive">
```

By adding this line, you are telling Bing: "Do not use the content for training Microsoft's generative AI foundation models."
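
Because the tag has to appear on every page, it can help to check pages programmatically. The following is a minimal sketch, not part of this repository: the helper names `has_noarchive` and `add_noarchive` are hypothetical, and it assumes each page is available as an HTML string with a literal `<head>` tag. It uses only the Python standard library.

```python
import re
from html.parser import HTMLParser

# The tag from the section above.
NOARCHIVE_TAG = '<meta name="robots" content="noarchive">'


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noarchive(html: str) -> bool:
    """Return True if the page already carries a robots noarchive directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noarchive" in finder.directives


def add_noarchive(html: str) -> str:
    """Insert the tag right after the opening <head> tag if it is missing.

    Pages without a <head> tag are returned unchanged.
    """
    if has_noarchive(html):
        return html
    return re.sub(
        r"(<head\b[^>]*>)",
        lambda match: match.group(1) + "\n" + NOARCHIVE_TAG,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
```

Note that if a page already has a robots meta tag with other directives (for example `noindex`), this sketch adds a second tag rather than merging the two; merging them by hand may be preferable. For sites built from templates, adding the line once to the shared `<head>` template is simpler than post-processing each page.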

## Will my site be negatively affected?

The short answer is no.
The original use of `noarchive` has been retired by all search engines. Google retired its use in 2024.

Adding this meta tag to your page(s) will not affect how your site appears in search engines, or affect it in any other meaningful way.

It is now used solely by a handful of crawlers, such as Bingbot and Amazonbot, as a signal not to use your data for AI and training.

## Resources

Bing Blog AI opt-out announcement: https://blogs.bing.com/webmaster/september-2023/Announcing-new-options-for-webmasters-to-control-usage-of-their-content-in-Bing-Chat

Bing meta tag information, including AI opt-out: https://www.bing.com/webmasters/help/which-robots-metatags-does-bing-support-5198d240