chore: add VelenPublicWebCrawler

2025-04-05 19:37:45 +00:00 · 2024-07-29 12:12:42 -07:00 · 2024-07-29 12:12:42 -07:00 · d49e860b74
commit d49e860b74
parent 6e323554c6
2 changed files with 3 additions and 1 deletion
--- a/robots.txt
+++ b/robots.txt
@@ -22,5 +22,6 @@ User-agent: OAI-SearchBot
 User-agent: omgili
 User-agent: omgilibot
 User-agent: PerplexityBot
+User-agent: VelenPublicWebCrawler
 User-agent: YouBot
 Disallow: /
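
Review note: the robots.txt hunk above adds the new agent to a group of consecutive `User-agent` lines that share a single `Disallow: /` rule. One way to sanity-check that grouping is Python's standard-library `urllib.robotparser`. The following is a minimal sketch under stated assumptions, not part of the commit: the excerpt contains only the lines visible in the hunk, `is_blocked` is a hypothetical helper name, and `ExampleUnlistedBot` is a made-up agent used for contrast.

```python
from urllib.robotparser import RobotFileParser

# Only the lines visible in the hunk above; the real robots.txt lists more agents.
ROBOTS_EXCERPT = """\
User-agent: omgili
User-agent: omgilibot
User-agent: PerplexityBot
User-agent: VelenPublicWebCrawler
User-agent: YouBot
Disallow: /
"""


def is_blocked(agent: str, path: str = "/", robots_text: str = ROBOTS_EXCERPT) -> bool:
    """Return True if `agent` may not fetch `path` under `robots_text`.

    Hypothetical helper for illustration; it reflects how Python's parser
    reads the file, not how any particular crawler behaves.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    # ExampleUnlistedBot is a made-up agent that matches no group in the excerpt.
    for agent in ("VelenPublicWebCrawler", "YouBot", "ExampleUnlistedBot"):
        print(f"{agent}: blocked={is_blocked(agent)}")
```

This only confirms how a standards-following parser interprets the file; it says nothing about whether a given crawler actually honors robots.txt.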
--- a/table-of-bot-metrics.md
+++ b/table-of-bot-metrics.md
@@ -22,4 +22,5 @@
 |omgili        | [Webz.io](https://webz.io/) | [Yes](https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/) | Data is sold. | No information | Crawls sites for APIs used by Hootsuite, Sprinklr, NetBase, and other companies. Data also sold for research purposes or LLM training. |
 |omgilibot     | [Webz.io](https://webz.io/) | [Yes](https://web.archive.org/web/20170704003301/http://omgili.com/Crawler.html) | Data is sold. | No information | Legacy user agent initially used for Omgili search engine. Unknown if still used, `omgili` agent still used by Webz.io. |
 |PerplexityBot | [Perplexity](https://www.perplexity.ai/) | [No](https://www.macstories.net/stories/wired-confirms-perplexity-is-bypassing-efforts-by-websites-to-block-its-web-crawler/) | Used to answer queries at the request of users. | Takes action based on user prompts.  | Operated by Perplexity to obtain results in response to user queries. |
+|VelenPublicWebCrawler        | [Velen Crawler](https://velen.io) | [Yes](https://velen.io) | Scrapes data for business data sets and machine learning models. | No information | "Our goal with this crawler is to build business datasets and machine learning models to better understand the web." |
 |YouBot        | [You](https://about.you.com/youchat/) | [Yes](https://about.you.com/youbot/) | Scrapes data for search engine and LLMs. | No information | Retrieves data used for You.com web search engine and LLMs. |