chore: update bots table

Cory Dransfeldt 2024-07-10 19:47:23 -07:00
parent 0ca6bce87e
commit 570fd36ea2

@@ -18,7 +18,5 @@
| img2dataset | [img2dataset](https://github.com/rom1504/img2dataset) | At the discretion of img2dataset users. | Scrapes images for use in LLMs. | At the discretion of img2dataset users. | Downloads large sets of images into datasets for LLM training or other purposes. |
|omgili | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
|omgilibot | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |
|peer39_crawler| [Peer39](https://www.peer39.com/) | [Yes](https://www.peer39.com/crawler-notice) | Targeted advertising. | No information | Web crawler used to "enhance the visibility of your site to advertisers who value and seek out such quality content." |
|peer39_crawler| [Peer39](https://www.peer39.com/) | [Yes](https://www.peer39.com/crawler-notice) | Targeted advertising. | No information | Web crawler used to "enhance the visibility of your site to advertisers who value and seek out such quality content." |
|PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts. | Operated by Perplexity to obtain results in response to user queries. |
|YouBot | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information | Retrieves data used for You.com web search engine and LLMs. |