Commit graph

302 commits

Author SHA1 Message Date
Glyn Normington
c01a684036 Convert robots.json more frequently
Specifically, when GitHub workflows or code
are changed, as either of these can affect the
conversion results.

Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/60
2025-01-05 05:03:50 +00:00
Glyn Normington
d2be15447c Merge pull request #62 from ai-robots-txt/missing-dependency
Ensure dependency installed
2025-01-05 01:46:27 +00:00
Glyn Normington
9e372d0696 Ensure dependency installed
Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/60#issuecomment-2571437913
Ref: https://stackoverflow.com/questions/11783875/importerror-no-module-named-bs4-beautifulsoup
2025-01-05 01:45:33 +00:00
dark-visitors
2036a68c1f Update from Dark Visitors 2024-12-04 00:55:50 +00:00
Glyn Normington
24666e8b15 Merge pull request #58 from fabianegli/fabianegli-restore-attribution
Restore attribution
2024-11-29 09:05:16 +00:00
fabianegli
eb8e1a49b5 Revert "specify file encodings in tests"
This reverts commit bd38c30194.
2024-11-29 09:02:47 +01:00
fabianegli
b64284d684 restore correct attribution logic to before PR #55 2024-11-26 09:41:46 +01:00
fabianegli
bd38c30194 specify file encodings in tests 2024-11-26 09:12:11 +01:00
dark-visitors
609ddca392 Updated from new robots.json 2024-11-24 00:57:06 +00:00
dark-visitors
37065f9118 Update from Dark Visitors 2024-11-24 00:57:05 +00:00
dark-visitors
58985737e7 Updated from new robots.json 2024-11-19 16:46:21 +00:00
584e66cb99 Merge pull request #56 from glyn/40-exclude-facebookexternalhit
Allow facebookexternalhit
2024-11-19 08:46:05 -08:00
Glyn Normington
80002f5e17 Allow facebookexternalhit
At the time of writing, this crawler does not
appear to be for the purpose of AI.

See: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
(accessed on 19 November 2024).

Fixes https://github.com/ai-robots-txt/ai.robots.txt/issues/40
2024-11-19 03:33:45 +00:00
Glyn Normington
71db599b41 Merge pull request #55 from norwd/feature/add-robots.txt-file-to-release
Create workflow to upload `robots.txt` file as release artefact
2024-11-13 01:39:11 +00:00
Y. Meyer-Norwood
e8f0784a00 Explicitly use release tag for checkout 2024-11-13 10:26:37 +13:00
Y. Meyer-Norwood
94ceb3cffd Add authentication for gh command 2024-11-11 13:04:55 +13:00
Y. Meyer-Norwood
adfd4af872 Create upload-robots-txt-file-to-release.yml 2024-11-11 12:58:40 +13:00
Glyn Normington
d50615d394 Improve formatting
This clarifies the scope of the tip is Apache httpd.
2024-11-10 01:06:13 +00:00
Glyn Normington
2c88909be3 Fix formatting 2024-11-10 01:02:18 +00:00
Glyn Normington
6f58ddc623 Merge pull request #54 from glyn/rationale
Clarify our rationale
2024-11-10 00:58:29 +00:00
Glyn Normington
9295b6a963 Clarify our rationale
I deleted the point about excessive load on
crawled sites as any other crawler could potentially
be guilty of this and I wouldn't want our scope to
creep to all crawlers.

Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/53#issuecomment-2466042550
2024-11-09 04:45:47 +00:00
dark-visitors
9e06cf3bc9 Updated from new robots.json 2024-10-29 00:52:12 +00:00
dark-visitors
bc0a0ad0e9 Update from Dark Visitors 2024-10-29 00:52:12 +00:00
dark-visitors
fe5f407673 Update from Dark Visitors 2024-10-27 00:54:47 +00:00
Adam Newbold
a66b16827d Merge pull request #51 from fabianegli/php-to-python-plus-tests
PHP to Python plus tests and stuff
2024-10-22 21:32:58 -04:00
fabianegli
3ab22bc498 make conversions and updates separately triggerable 2024-10-19 19:56:41 +02:00
fabianegli
6ab8fb2d37 no more failure when run without network 2024-10-19 19:11:01 +02:00
fabianegli
7e2b3ab037 rename action 2024-10-19 19:09:34 +02:00
fabianegli
0c05461f84 simplify repo and added some tests 2024-10-19 13:06:34 +02:00
fabianegli
6bb598820e ignore venv 2024-10-19 11:56:00 +02:00
Glyn Normington
d62cab66c5 Merge pull request #50 from glyn/fix-typo
Fix typo and trigger rerun of main job
2024-10-19 04:43:09 +01:00
ai.robots.txt
6a359e7fd7 Fix typo and trigger rerun of main job 2024-10-19 03:43:00 +00:00
Glyn Normington
38a388097c Fix typo and trigger rerun of main job 2024-10-19 04:42:27 +01:00
Glyn Normington
83c8603071 Merge pull request #49 from glyn/php-diagnostics
PHP diagnostics
2024-10-19 04:34:53 +01:00
ai.robots.txt
a80bd18fb8 Dump out file contents in PHP script 2024-10-19 03:34:29 +00:00
Glyn Normington
bdf30be7dc Dump out file contents in PHP script 2024-10-19 04:33:46 +01:00
Glyn Normington
4d47b17c45 Merge pull request #47 from fabianegli/fabianegli-patch-1
log the diff in the update actions
2024-10-19 02:58:05 +01:00
dark-visitors
faf81efb12 Daily update from Dark Visitors 2024-10-19 01:17:15 +00:00
Fabian Egli
25adc6b802 log git repository status 2024-10-19 00:28:41 +02:00
Fabian Egli
b584f613cd add some signposts to the log 2024-10-19 00:13:09 +02:00
Fabian Egli
b3068a8d90 add some signposts 2024-10-19 00:12:25 +02:00
Fabian Egli
a46d06d436 log changes made by the action in main.yml 2024-10-19 00:04:15 +02:00
Fabian Egli
cfaade6e2f log the diff in the update action daily_update.yml 2024-10-19 00:01:15 +02:00
04f630f7f8 Merge pull request #45 from glyn/faq-update
Update the FAQ
2024-10-18 06:35:47 -07:00
Glyn Normington
898c8ab82d Merge pull request #46 from isagalaev/case-insensitive-sorting
Sort the content of robots.json by keys, case-insensitively
2024-10-18 07:57:56 +01:00
Ivan Sagalaev
7bb5efd462 Sort the content case-insensitively before dumping to JSON 2024-10-17 21:08:43 -04:00
Glyn Normington
e6bb7cae9e Augment the "why" FAQ
Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/40#issuecomment-2419078796
2024-10-17 12:27:05 +01:00
Glyn Normington
b229f5b936 Re-order the FAQ
The "why" question should come first.
2024-10-17 12:25:54 +01:00
dark-visitors
b1491d2694 Daily update from Dark Visitors 2024-10-09 01:17:37 +00:00
ai.robots.txt
9be286626d Merge pull request #43 from lxjv/main
Update robots.json with Claude respect link
2024-10-08 02:30:17 +00:00
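
Note on commit 7bb5efd462 ("Sort the content case-insensitively before dumping to JSON"): the log does not show the change itself, so the following is only a minimal sketch of the general technique, not the repository's actual code. The function name, the sample crawler entries, and the indent setting are all assumptions made for illustration.

```python
import json


def dump_sorted_case_insensitive(entries: dict) -> str:
    """Return JSON text whose top-level keys are ordered case-insensitively.

    Hypothetical helper; the real project code may differ.
    """
    # str.casefold() gives a case-insensitive sort key, so "Bytespider"
    # sorts between "anthropic-ai" and "CCBot" instead of ahead of every
    # lower-case name, as a plain code-point sort would place it.
    ordered = dict(sorted(entries.items(), key=lambda item: item[0].casefold()))
    return json.dumps(ordered, indent=4)


if __name__ == "__main__":
    # Sample data invented for this sketch.
    sample = {"cohere-ai": {}, "CCBot": {}, "anthropic-ai": {}, "Bytespider": {}}
    print(dump_sorted_case_insensitive(sample))
```

A plain `sorted(entries)` compares code points, which puts every upper-case name before any lower-case one; sorting on the casefolded key avoids that, so mixed-case names land in a stable alphabetical order.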