• blog.gregoweb.ovh
7 results for tag robots.txt
  • Using HAProxy to protect me from scrapers
    Tue Apr 29 07:39:25 2025 - permalink -
    - https://dgl.cx/2025/04/using-haproxy-to-stop-scrapers
    anubis haproxy honeypot host robots.txt security
  • How to block the robots that scrape your site's content to train LLMs?
    # via nginx config (the regex is quoted because some bot names contain spaces):

    if ($http_user_agent ~* "(AI2Bot|Ai2Bot-Dolma|Amazonbot|Applebot|Applebot-Extended|Bytespider|CCBot|ChatGPT-User|Claude-Web|ClaudeBot|Diffbot|FacebookBot|FriendlyCrawler|GPTBot|Google-Extended|GoogleOther|GoogleOther-Image|GoogleOther-Video|ICC-Crawler|ImagesiftBot|Kangaroo Bot|Meta-ExternalAgent|Meta-ExternalFetcher|OAI-SearchBot|PerplexityBot|PetalBot|Scrapy|Sidetrade indexer bot|Timpibot|VelenPublicWebCrawler|Webzio-Extended|YouBot|anthropic-ai|cohere-ai|facebookexternalhit|iaskspider/2.0|img2dataset|omgili|omgilibot)") {
       return 403;
    }
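
The user-agent match above can be tried offline before it is deployed. The following is a minimal Python sketch, not part of the linked article: `BLOCKED_AGENTS` is a shortened, hypothetical subset of the list, and `is_blocked` is an illustrative helper name. It assumes a case-insensitive substring match, which is how nginx's `~*` operator treats an unanchored alternation.

```python
import re

# Hypothetical, abbreviated subset of the bot names in the nginx rule;
# extend it with the full list as needed.
BLOCKED_AGENTS = [
    "GPTBot",
    "ClaudeBot",
    "CCBot",
    "Bytespider",
    "Kangaroo Bot",
    "PerplexityBot",
]

# re.escape keeps spaces, dots and slashes literal; IGNORECASE mirrors nginx's ~*.
BLOCKED_RE = re.compile(
    "|".join(re.escape(name) for name in BLOCKED_AGENTS),
    re.IGNORECASE,
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent string contains one of the blocked names."""
    return BLOCKED_RE.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    ]
    for ua in samples:
        print(is_blocked(ua), ua)
```

Running a list of real User-Agent strings from the access log through a check like this is one way to spot false positives (for example, a legitimate client whose name happens to contain a blocked substring) before returning 403 to it.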
    Mon Oct 7 15:56:02 2024 - permalink -
    - https://www.geeek.org/comment-bloquer-robots-aspirent-contenu-pour-llm/
    admin ia llm nginx robots.txt
  • Blocking the AI force-feeders // /home/lord
    Tue Apr 16 09:14:13 2024 - permalink -
    - https://lord.re/fast-posts/76-bloquer-les-gaveurs-dia/
    ia nginx robots.txt
  • Dark Visitors - A List of Known AI Agents on the Internet
    Thu Mar 28 16:50:59 2024 - permalink -
    - https://darkvisitors.com/
    ia robots.txt
  • gregoweb/Robots.txt-Templates
    Thu Mar 28 16:41:46 2024 - permalink -
    - https://github.com/gregoweb/Robots.txt-Templates
    robots.txt
  • ai-robots-txt/ai.robots.txt: A list of AI agents and robots to block.
    Thu Mar 28 16:35:19 2024 - permalink -
    - https://github.com/ai-robots-txt/ai.robots.txt
    robots.txt
  • AI bots (OpenAI ChatGPT et al.) - how to block them - Didier J. MARY (blog)
    Mon Feb 19 17:41:12 2024 - permalink -
    - https://www.didiermary.fr/bloquer-ai-bots-chatgpt-openai/
    ia robots.txt