Commit graph

298 commits

Author SHA1 Message Date
Glyn Normington
24666e8b15 Merge pull request #58 from fabianegli/fabianegli-restore-attribution
Restore attribution
2024-11-29 09:05:16 +00:00
fabianegli
eb8e1a49b5 Revert "specify file encodings in tests"
This reverts commit bd38c30194.
2024-11-29 09:02:47 +01:00
fabianegli
b64284d684 restore correct attribution logic to before PR #55 2024-11-26 09:41:46 +01:00
fabianegli
bd38c30194 specify file encodings in tests 2024-11-26 09:12:11 +01:00
dark-visitors
609ddca392 Updated from new robots.json 2024-11-24 00:57:06 +00:00
dark-visitors
37065f9118 Update from Dark Visitors 2024-11-24 00:57:05 +00:00
dark-visitors
58985737e7 Updated from new robots.json 2024-11-19 16:46:21 +00:00
584e66cb99 Merge pull request #56 from glyn/40-exclude-facebookexternalhit
Allow facebookexternalhit
2024-11-19 08:46:05 -08:00
Glyn Normington
80002f5e17 Allow facebookexternalhit
At the time of writing, this crawler does not
appear to be for the purpose of AI.

See: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
(accessed on 19 November 2024).

Fixes https://github.com/ai-robots-txt/ai.robots.txt/issues/40
2024-11-19 03:33:45 +00:00
Glyn Normington
71db599b41 Merge pull request #55 from norwd/feature/add-robots.txt-file-to-release
Create workflow to upload `robots.txt` file as release artefact
2024-11-13 01:39:11 +00:00
Y. Meyer-Norwood
e8f0784a00 Explicitly use release tag for checkout 2024-11-13 10:26:37 +13:00
Y. Meyer-Norwood
94ceb3cffd Add authentication for gh command 2024-11-11 13:04:55 +13:00
Y. Meyer-Norwood
adfd4af872 Create upload-robots-txt-file-to-release.yml 2024-11-11 12:58:40 +13:00
Glyn Normington
d50615d394 Improve formatting
This clarifies the scope of the tip is Apache httpd.
2024-11-10 01:06:13 +00:00
Glyn Normington
2c88909be3 Fix formatting 2024-11-10 01:02:18 +00:00
Glyn Normington
6f58ddc623 Merge pull request #54 from glyn/rationale
Clarify our rationale
2024-11-10 00:58:29 +00:00
Glyn Normington
9295b6a963 Clarify our rationale
I deleted the point about excessive load on
crawled sites as any other crawler could potentially
be guilty of this and I wouldn't want our scope to
creep to all crawlers.

Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/53#issuecomment-2466042550
2024-11-09 04:45:47 +00:00
dark-visitors
9e06cf3bc9 Updated from new robots.json 2024-10-29 00:52:12 +00:00
dark-visitors
bc0a0ad0e9 Update from Dark Visitors 2024-10-29 00:52:12 +00:00
dark-visitors
fe5f407673 Update from Dark Visitors 2024-10-27 00:54:47 +00:00
Adam Newbold
a66b16827d Merge pull request #51 from fabianegli/php-to-python-plus-tests
PHP to Python plus tests and stuff
2024-10-22 21:32:58 -04:00
fabianegli
3ab22bc498 make conversions and updates separately triggerable 2024-10-19 19:56:41 +02:00
fabianegli
6ab8fb2d37 no more failure when run without network 2024-10-19 19:11:01 +02:00
fabianegli
7e2b3ab037 rename action 2024-10-19 19:09:34 +02:00
fabianegli
0c05461f84 simplify repo and added some tests 2024-10-19 13:06:34 +02:00
fabianegli
6bb598820e ignore venv 2024-10-19 11:56:00 +02:00
Glyn Normington
d62cab66c5 Merge pull request #50 from glyn/fix-typo
Fix typo and trigger rerun of main job
2024-10-19 04:43:09 +01:00
ai.robots.txt
6a359e7fd7 Fix typo and trigger rerun of main job 2024-10-19 03:43:00 +00:00
Glyn Normington
38a388097c Fix typo and trigger rerun of main job 2024-10-19 04:42:27 +01:00
Glyn Normington
83c8603071 Merge pull request #49 from glyn/php-diagnostics
PHP diagnostics
2024-10-19 04:34:53 +01:00
ai.robots.txt
a80bd18fb8 Dump out file contents in PHP script 2024-10-19 03:34:29 +00:00
Glyn Normington
bdf30be7dc Dump out file contents in PHP script 2024-10-19 04:33:46 +01:00
Glyn Normington
4d47b17c45 Merge pull request #47 from fabianegli/fabianegli-patch-1
log the diff in the update actions
2024-10-19 02:58:05 +01:00
dark-visitors
faf81efb12 Daily update from Dark Visitors 2024-10-19 01:17:15 +00:00
Fabian Egli
25adc6b802 log git repository status 2024-10-19 00:28:41 +02:00
Fabian Egli
b584f613cd add some signposts to the log 2024-10-19 00:13:09 +02:00
Fabian Egli
b3068a8d90 add some signposts 2024-10-19 00:12:25 +02:00
Fabian Egli
a46d06d436 log changes made by the action in main.yml 2024-10-19 00:04:15 +02:00
Fabian Egli
cfaade6e2f log the diff in the update action daily_update.yml 2024-10-19 00:01:15 +02:00
04f630f7f8 Merge pull request #45 from glyn/faq-update
Update the FAQ
2024-10-18 06:35:47 -07:00
Glyn Normington
898c8ab82d Merge pull request #46 from isagalaev/case-insensitive-sorting
Sort the content of robots.json by keys, case-insensitively
2024-10-18 07:57:56 +01:00
Ivan Sagalaev
7bb5efd462 Sort the content case-insensitively before dumping to JSON 2024-10-17 21:08:43 -04:00
Glyn Normington
e6bb7cae9e Augment the "why" FAQ
Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/40#issuecomment-2419078796
2024-10-17 12:27:05 +01:00
Glyn Normington
b229f5b936 Re-order the FAQ
The "why" question should come first.
2024-10-17 12:25:54 +01:00
dark-visitors
b1491d2694 Daily update from Dark Visitors 2024-10-09 01:17:37 +00:00
ai.robots.txt
9be286626d Merge pull request #43 from lxjv/main
Update robots.json with Claude respect link
2024-10-08 02:30:17 +00:00
Glyn Normington
01993b98c3 Merge pull request #43 from lxjv/main
Update robots.json with Claude respect link
2024-10-08 03:30:07 +01:00
Laker Turner
dc15afe847 Update robots.json with Claude respect link 2024-10-07 17:38:01 +01:00
ai.robots.txt
6da804e826 chore: add ISSCyberRiskCrawler 2024-09-30 23:50:18 +00:00
9c2394f23b chore: add ISSCyberRiskCrawler 2024-09-30 16:25:20 -07:00