diff --git a/README.md b/README.md
index 232b3ed..6183857 100644
--- a/README.md
+++ b/README.md
@@ -44,6 +44,7 @@ Note that, as stated in the [httpd documentation](https://httpd.apache.org/docs/
 middleware plugin for [Traefik](https://traefik.io/traefik/) to automatically add
 rules of [robots.txt](./robots.txt) file on-the-fly.
+- alternatively, you can [manually configure Traefik](./docs/traefik-manual-setup.md) to centrally serve a static `robots.txt`.
 
 ## Contributing
 
 A note about contributing: updates should be added/made to `robots.json`. A GitHub action will then generate the updated `robots.txt`, `table-of-bot-metrics.md`, `.htaccess` and `nginx-block-ai-bots.conf`.
diff --git a/docs/traefik-manual-setup.md b/docs/traefik-manual-setup.md
new file mode 100644
index 0000000..2bb8d33
--- /dev/null
+++ b/docs/traefik-manual-setup.md
@@ -0,0 +1,45 @@
+# Intro
+If you're using Traefik as the reverse proxy in your Docker setup, you might want to use it to centrally serve the `/robots.txt` file for all of your Traefik-fronted services.
+
+This can be achieved by configuring a single lightweight service that serves static files and defining a high-priority Traefik HTTP router rule.
+
+# Setup
+Define a single service to serve the one `robots.txt` to rule them all. I'm using a lean `nginx:alpine` Docker image in this example:
+
+```
+services:
+  robots:
+    image: nginx:alpine
+    container_name: robots-server
+    volumes:
+      - ./static/:/usr/share/nginx/html/:ro
+    labels:
+      - "traefik.enable=true"
+      # Router for all /robots.txt requests
+      - "traefik.http.routers.robots.rule=Path(`/robots.txt`)"
+      - "traefik.http.routers.robots.entrypoints=web,websecure"
+      - "traefik.http.routers.robots.priority=3000"
+      - "traefik.http.routers.robots.service=robots"
+      - "traefik.http.routers.robots.tls.certresolver=letsencrypt"
+      - "traefik.http.services.robots.loadbalancer.server.port=80"
+    networks:
+      - external_network
+
+networks:
+  external_network:
+    name: traefik_external_network
+    external: true
+```
+
+The Traefik HTTP router rule deliberately contains no `Host` matcher, so it matches `/robots.txt` requests on every hostname. Traefik will log a warning about this because no certificate domain can be derived from the rule for the TLS setup, but the router works as intended. The explicit priority of 3000 ensures this rule is evaluated before your other routers, which would otherwise take precedence under Traefik's default rule-length-based priority.
+
+Place your `robots.txt` in the local `./static/` directory and NGINX will serve it for all services behind your Traefik proxy.
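+
+# Verification
+To confirm the high-priority router takes precedence, you can request `/robots.txt` through any of your existing services. A minimal check, where `service-a.example.com` and `service-b.example.com` stand in as placeholders for hostnames already routed through this Traefik instance:
+
+```
+# Both requests should return the same centrally served file from ./static/
+curl -s https://service-a.example.com/robots.txt
+curl -s https://service-b.example.com/robots.txt
+```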