ai.robots.txt
6c552a3daa
Merge pull request #71 from jsheard/patch-1
...
Add Crawlspace
2025-01-20 17:45:42 +00:00
Glyn Normington
f621fb4852
Merge pull request #71 from jsheard/patch-1
...
Add Crawlspace
2025-01-20 17:45:29 +00:00
Joshua Sheard
7427d96bac
Update robots.json
...
Co-authored-by: Glyn Normington <work@underlap.org>
2025-01-20 10:59:02 +00:00
Glyn Normington
81cc81b35e
Merge pull request #68 from MassiminoilTrace/main
...
Implementing automatic htaccess generation
2025-01-20 07:33:54 +00:00
Massimo Gismondi
4f03818280
Removed if condition and added a little comments
2025-01-20 06:51:06 +01:00
Massimo Gismondi
a9956f7825
Removed additional sections
2025-01-20 06:50:48 +01:00
Massimo Gismondi
33c38ee70b
Update README.md
...
Co-authored-by: Glyn Normington <work@underlap.org>
2025-01-20 06:28:32 +01:00
Massimo Gismondi
52241bdca6
Update README.md
...
Co-authored-by: Glyn Normington <work@underlap.org>
2025-01-20 06:27:56 +01:00
Massimo Gismondi
013b7abfa1
Update README.md
...
Co-authored-by: Glyn Normington <work@underlap.org>
2025-01-20 06:27:02 +01:00
Massimo Gismondi
70fd6c0fb1
Add mention of htaccess in readme
...
Co-authored-by: Glyn Normington <work@underlap.org>
2025-01-20 06:25:07 +01:00
Joshua Sheard
5aa08bc002
Add Crawlspace
2025-01-19 22:03:50 +00:00
Massimo Gismondi
d65128d10a
Removed paragraph in favour of future FAQ.md
...
Co-authored-by: Glyn Normington <work@underlap.org>
2025-01-18 12:41:09 +01:00
Massimo Gismondi
1cc4b59dfc
Shortened htaccess instructions
...
Co-authored-by: Glyn Normington <work@underlap.org>
2025-01-18 12:40:03 +01:00
Massimo Gismondi
8aee2f24bb
Fixed space in comment
...
Co-authored-by: Glyn Normington <work@underlap.org>
2025-01-18 12:39:07 +01:00
Massimo Gismondi
b455af66e7
Adding clarification about performance and code comment
2025-01-17 21:42:08 +01:00
Massimo Gismondi
189e75bbfd
Adding usage instructions
2025-01-17 21:25:23 +01:00
Massimo Gismondi
933aa6159d
Implementing htaccess generation
2025-01-07 11:02:29 +01:00
Glyn Normington
b7f908e305
Merge pull request #66 from fabianegli/patch-1
...
Allow Action to succeed even if no changes were made
2025-01-07 03:54:40 +00:00
ai.robots.txt
ec454b71d3
Merge pull request #67 from Nightfirecat/semrushbot
...
Block SemrushBot
2025-01-06 20:51:56 +00:00
565dca3dc0
Merge pull request #67 from Nightfirecat/semrushbot
...
Block SemrushBot
2025-01-06 12:51:43 -08:00
Jordan Atwood
143f8f2285
Block SemrushBot
2025-01-06 12:34:38 -08:00
8e98cc6049
Merge pull request #61 from glyn/improve-naming
...
Rename Python code
2025-01-06 08:10:47 -08:00
Fabian Egli
30ee957011
bail when NO changes are staged
2025-01-06 12:05:42 +01:00
Fabian Egli
83cd546470
allow Action to succeed even if no changes were made
...
Before, the Action would fail in case there were no changes made to any files by the converter.
2025-01-06 11:39:41 +01:00
ai.robots.txt
ca8620e28b
Merge pull request #63 from glyn/push-paths
...
Convert robots.json more frequently
2025-01-05 05:05:20 +00:00
Glyn Normington
b9df958b39
Merge pull request #63 from glyn/push-paths
...
Convert robots.json more frequently
2025-01-05 05:05:01 +00:00
Glyn Normington
c01a684036
Convert robots.json more frequently
...
Specifically, when github workflows or code
is changed as either of these can affect the
conversion results.
Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/60
2025-01-05 05:03:50 +00:00
Glyn Normington
d2be15447c
Merge pull request #62 from ai-robots-txt/missing-dependency
...
Ensure dependency installed
2025-01-05 01:46:27 +00:00
Glyn Normington
9e372d0696
Ensure dependency installed
...
Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/60#issuecomment-2571437913
Ref: https://stackoverflow.com/questions/11783875/importerror-no-module-named-bs4-beautifulsoup
2025-01-05 01:45:33 +00:00
Glyn Normington
996b9c678c
Improve job name
...
The purpose of the job is to convert the JSON file
to the other files.
2025-01-04 05:28:41 +00:00
Glyn Normington
e4c12ee2f8
Rename in test code
2025-01-04 05:03:48 +00:00
Glyn Normington
3a43714908
Rename Python code
...
The name dark_visitors.py gives the impression that the code is entirely
related to the dark visitors website, whereas the update command relates
to dark visitors and the convert command is unrelated to dark visitors.
2025-01-04 04:55:34 +00:00
dark-visitors
2036a68c1f
Update from Dark Visitors
2024-12-04 00:55:50 +00:00
Glyn Normington
24666e8b15
Merge pull request #58 from fabianegli/fabianegli-restore-attribution
...
Restore attribution
2024-11-29 09:05:16 +00:00
fabianegli
eb8e1a49b5
Revert "specify file encodings in tests"
...
This reverts commit bd38c30194.
2024-11-29 09:02:47 +01:00
fabianegli
b64284d684
restore correct attribution logic to before PR #55
2024-11-26 09:41:46 +01:00
fabianegli
bd38c30194
specify file encodings in tests
2024-11-26 09:12:11 +01:00
dark-visitors
609ddca392
Updated from new robots.json
2024-11-24 00:57:06 +00:00
dark-visitors
37065f9118
Update from Dark Visitors
2024-11-24 00:57:05 +00:00
dark-visitors
58985737e7
Updated from new robots.json
2024-11-19 16:46:21 +00:00
584e66cb99
Merge pull request #56 from glyn/40-exclude-facebookexternalhit
...
Allow facebookexternalhit
2024-11-19 08:46:05 -08:00
Glyn Normington
80002f5e17
Allow facebookexternalhit
...
At the time of writing, this crawler does not
appear to be for the purpose of AI.
See: https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/
(accessed on 19 November 2024).
Fixes https://github.com/ai-robots-txt/ai.robots.txt/issues/40
2024-11-19 03:33:45 +00:00
Glyn Normington
71db599b41
Merge pull request #55 from norwd/feature/add-robots.txt-file-to-release
...
Create workflow to upload `robots.txt` file as release artefact
2024-11-13 01:39:11 +00:00
Y. Meyer-Norwood
e8f0784a00
Explicitly use release tag for checkout
2024-11-13 10:26:37 +13:00
Y. Meyer-Norwood
94ceb3cffd
Add authentication for gh command
2024-11-11 13:04:55 +13:00
Y. Meyer-Norwood
adfd4af872
Create upload-robots-txt-file-to-release.yml
2024-11-11 12:58:40 +13:00
Glyn Normington
d50615d394
Improve formatting
...
This clarifies the scope of the tip is Apache httpd.
2024-11-10 01:06:13 +00:00
Glyn Normington
2c88909be3
Fix formatting
2024-11-10 01:02:18 +00:00
Glyn Normington
6f58ddc623
Merge pull request #54 from glyn/rationale
...
Clarify our rationale
2024-11-10 00:58:29 +00:00
Glyn Normington
9295b6a963
Clarify our rationale
...
I deleted the point about excessive load on
crawled sites as any other crawler could potentially
be guilty of this and I wouldn't want our scope to
creep to all crawlers.
Ref: https://github.com/ai-robots-txt/ai.robots.txt/issues/53#issuecomment-2466042550
2024-11-09 04:45:47 +00:00