chore: additional resources

Cory Dransfeldt 2024-04-01 09:55:46 -07:00 committed by GitHub
parent 5e02ebc168
commit d7064d23fe

@@ -6,6 +6,26 @@ This is an open list of web crawlers associated with AI companies and the traini
A number of these crawlers have been sourced from [Dark Visitors](https://darkvisitors.com), and we appreciate the ongoing effort they put into tracking them.

---

## Additional resources

**Spawning.ai**

[Create an ai.txt](https://spawning.ai/ai-txt#create): an additional avenue for blocking crawlers. Example file that blocks everything for all user agents:
```text
# Spawning AI
# Prevent datasets from using the following file types
User-Agent: *
Disallow: /
Disallow: *
```
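
How a given crawler reads ai.txt is up to that crawler. Assuming it applies the same rules as robots.txt (an assumption; neither this README nor the example states it), the example file can be checked with Python's standard-library `urllib.robotparser`. The helper `is_allowed`, the user agent `ExampleBot`, and the `example.com` URLs are hypothetical placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

# The example ai.txt from above, copied verbatim.
AI_TXT = """\
# Spawning AI
# Prevent datasets from using the following file types
User-Agent: *
Disallow: /
Disallow: *
"""


def is_allowed(ai_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules.

    Assumption: ai.txt is interpreted with robots.txt semantics, which is
    what RobotFileParser implements. Real AI crawlers may differ.
    """
    parser = RobotFileParser()
    parser.parse(ai_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical user agent and URLs; both checks print False.
    for url in ("https://example.com/", "https://example.com/images/photo.jpg"):
        print(url, is_allowed(AI_TXT, "ExampleBot", url))
```

Under these assumed semantics, Python's parser treats `*` in a `Disallow` path as a literal character rather than a wildcard, so here the blocking comes entirely from `Disallow: /`.
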

**[Have I Been Trained?](https://haveibeentrained.com/)**

Search datasets for your content and request its removal.

---

Thank you to [Glyn](https://github.com/glyn) for pushing [me](https://coryd.dev) to set this up after [I posted about blocking these crawlers](https://coryd.dev/posts/2024/go-ahead-and-block-ai-web-crawlers/).