Prevent OpenAI from crawling your content

Add this to your website’s robots.txt to discourage GPTBot from crawling your content:

User-agent: GPTBot
Disallow: /

Documentation: GPTBot - OpenAI API
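
If you want to see how a bot that does play by the rules would read those two lines, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `example.com` domain and the `SomeOtherBot` name are placeholders I made up for illustration, and this says nothing about how GPTBot itself parses the file.

```python
from urllib.robotparser import RobotFileParser

# The two rules from above, as a well-behaved crawler would receive them.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com and SomeOtherBot are placeholders, not real names from this post.
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/blog/")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/blog/")

print("GPTBot may crawl:", gptbot_allowed)
print("SomeOtherBot may crawl:", other_allowed)
```

The rule only names GPTBot, so every other crawler is still free to fetch the page.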

I say “discourage” because robots.txt is only a request: few bots read the file, and those that do are under no obligation to follow its rules. You just hope they will. Like a ward against evil forces or a garlic necklace. Are bots vampires?

GPTBot might share similarities with Dracula and Nandor, as it consumes websites to feed OpenAI’s models, which in turn allows the company and its customers to profit from existing work on the Internet.

I did think it was cool that my work is somehow immortalized inside that big ball of virtual neurons, but the influence my work has on those connections is infinitesimally small, much like its influence on the world. And since those models live inside a computer, they are unlikely to survive a rogue billionaire.

Just in case they do, though, I added another rule to my robots.txt:

User-agent: GPTBot
Disallow: /
Allow: /blog/2023/prevent-crawling-openai/
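
Whether that exception actually wins depends on how the crawler resolves conflicting rules. The Robots Exclusion Protocol (RFC 9309) says the most specific match, meaning the longest matching path, should be used, with `Allow` winning a tie. Below is a rough sketch of that longest-match idea. The `is_allowed` helper is my own hypothetical function, it handles plain path prefixes only (no `*` or `$` wildcards), and the second test path is invented.

```python
def is_allowed(rules, path):
    """Longest-match check for one user agent.

    rules is a list of (directive, prefix) pairs, e.g. ("Disallow", "/").
    Hypothetical helper: plain prefixes only, no wildcard support.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


GPTBOT_RULES = [
    ("Disallow", "/"),
    ("Allow", "/blog/2023/prevent-crawling-openai/"),
]

post_ok = is_allowed(GPTBOT_RULES, "/blog/2023/prevent-crawling-openai/")
# Made-up path standing in for any other page on the site.
other_ok = is_allowed(GPTBOT_RULES, "/blog/2023/some-other-post/")

print("this post:", post_ok)
print("another post:", other_ok)
```

A parser that stops at the first matching rule instead would hit `Disallow: /` and never reach the exception, so listing the `Allow` line first is the safer ordering if you care about that case.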

Via @beep